Note this was limited to the 50 states, DC, and PR.
K-Means Clustering on COVID-19 Cases
I thought it might be interesting to run K-Means on every individual case in the US.
To do so, I exploded the county dataset from its aggregated form so that each row represents a single case of COVID-19.
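A minimal sketch of that explode step, in plain Python. The `CountyRow` fields, the `explode_cases` helper, and the numbers are all made up for illustration; the original dataset's column names are not shown here.

```python
from typing import List, NamedTuple, Tuple


class CountyRow(NamedTuple):
    """One aggregated row: a county location plus its case count (hypothetical schema)."""
    name: str
    lat: float
    lon: float
    cases: int


def explode_cases(rows: List[CountyRow]) -> List[Tuple[float, float]]:
    """Return one (lat, lon) point per reported case."""
    points = []
    for row in rows:
        # Repeat the county's coordinates once for every case in that county.
        points.extend([(row.lat, row.lon)] * row.cases)
    return points


# Made-up example data, not real case counts.
counties = [
    CountyRow("County A", 40.0, -75.0, 3),
    CountyRow("County B", 35.0, -100.0, 1),
    CountyRow("County C", 45.0, -120.0, 0),
]
points = explode_cases(counties)
```

A county with zero cases contributes no rows, and the total number of rows equals the total number of cases.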
It could be interesting to normalize the data for population.
I ran K-Means for k in range(2, 22, 2), i.e. k = 2, 4, ..., 20.
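The sweep over k could look roughly like the sketch below. It is not the code used for these results: it is a small standard-library implementation of Lloyd's algorithm that weights each location by its case count, which has the same objective as clustering the exploded rows. The function names, the toy coordinates, and the use of plain Euclidean distance are all assumptions.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: (point[0] - centroids[j][0]) ** 2 + (point[1] - centroids[j][1]) ** 2,
    )


def weighted_kmeans(points, weights, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D points, where each point counts `weight` times."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # needs k <= len(points)
    for _ in range(iters):
        # Assignment step: label every point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the weighted mean of its members.
        updated = []
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for (_p, w) in members)
            if total == 0:
                updated.append(centroids[j])  # empty cluster: leave it in place
            else:
                updated.append((
                    sum(p[0] * w for (p, w) in members) / total,
                    sum(p[1] * w for (p, w) in members) / total,
                ))
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


# Toy data: two well-separated groups, with made-up weights standing in for case counts.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
weights = [1, 3, 2, 2]

# Same k grid as above, skipping values of k larger than the toy dataset.
results = {
    k: weighted_kmeans(points, weights, k)
    for k in range(2, 22, 2)
    if k <= len(points)
}
```

On real data a library implementation would be the more practical choice; the point here is only to show the k grid and the weighting by case count.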
Some interesting findings:
k=2 - East vs. West. Even though the 26 states east of the Mississippi River make up only ~58% of the total US population, ~85% of US cases fall in the cluster whose centroid is near Carroll County, Maryland.
k=4 - The northeast (~61%), the south (~14%), the midwest (~16%), and the west (~9%).
k=12 - New Orleans emerges as a separate cluster from Texas. The DMV, Baltimore, and Philly get a cluster of their own.
k=16 - New England and Boston finally get their own cluster.
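Assuming the percentages above are each cluster's share of total cases, they can be computed from the cluster labels and per-location case counts as in this sketch. The `cluster_shares` helper and the example values are hypothetical.

```python
from collections import defaultdict


def cluster_shares(labels, weights):
    """Fraction of the total weight (e.g. case count) that falls in each cluster."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}


# Made-up labels and case counts for four locations.
labels = [0, 0, 0, 1]
weights = [1, 3, 2, 2]
shares = cluster_shares(labels, weights)
```

With per-case (exploded) rows, the same function works by passing a weight of 1 for every row.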
Time Series View
Note both of these plots have a log scale on the y-axis.