Plotting COVID-19 Cases in the US

Published 04-22-2020 20:07:54

Map View

  • Note that the map is limited to the 50 states, DC, and PR.

K-Means Clustering on COVID-19 Cases

  • I thought it might be interesting to run K-Means on all the cases in the US. To do so, I exploded the county dataset from its aggregated form so that each row represents a single case of COVID-19. A possible next step would be to normalize the data by population.
  • Ran K-Means for k in range(2, 22, 2), i.e. every even k from 2 to 20.
  • Some interesting findings:
    • k=2 - East vs West. Even though the 26 states east of the Mississippi River make up only ~58% of the total US population, ~85% of US cases fall in the cluster with its centroid near Carroll County, Maryland.
    • k=4 - The northeast (~61%), the south (~14%), the midwest (~16%), and the west (~9%).
    • k=12 - New Orleans emerges as a separate cluster from Texas. The DMV, Baltimore, and Philly get their own cluster.
    • k=16 - New England and Boston finally get their own cluster.
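
The explode-then-cluster idea above can be sketched roughly as follows. This is not the code behind the plots: the county rows, coordinates, and counts are made up, the helper names (explode, kmeans, cluster_shares) are hypothetical, and a plain Lloyd's iteration from the standard library stands in for whatever K-Means implementation was actually used. It also assumes that latitude/longitude can be treated as flat 2-D coordinates, which is only an approximation.

```python
import random

# Hypothetical aggregated rows: (county label, latitude, longitude, case count).
# The coordinates and counts are made up for illustration.
counties = [
    ("east_a", 40.7, -74.0, 50),
    ("east_b", 39.3, -76.6, 30),
    ("east_c", 42.4, -71.1, 20),
    ("west_a", 34.1, -118.2, 10),
    ("west_b", 47.6, -122.3, 5),
]

def explode(rows):
    """Turn one row per county into one (lat, lon) point per case."""
    return [(lat, lon) for _, lat, lon, cases in rows for _ in range(cases)]

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)  # distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, labels

def cluster_shares(labels, k):
    """Fraction of all cases that falls in each cluster."""
    return [labels.count(j) / len(labels) for j in range(k)]

points = explode(counties)
centroids, labels = kmeans(points, k=2)
shares = cluster_shares(labels, 2)
```

Because every case becomes its own point, counties with many cases pull the centroids toward themselves, which is why the cluster shares track case counts rather than geography alone. Exploding is simple but memory-hungry; passing the case counts as per-point weights (for example via the sample_weight argument of scikit-learn's KMeans.fit) should be an equivalent alternative that avoids materializing every row.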

Time Series View

  • Note that both of these plots use a log scale on the y-axis.