K-Means Visualization


Points are generated semi-randomly, 135 in total. Three clusters are generated first: a 30-point cluster in the top left, a 20-point cluster in the middle right, and a 35-point cluster at the bottom. The remaining 50 points are then drawn uniformly at random across the entire canvas. Initial centroids are chosen randomly; no special seeding algorithm (such as k-means++) is used. You can also keep the same points and generate new random initial centroids.
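The point generation described above can be sketched roughly as follows. This is only an illustration, not the page's actual code: the canvas size (600 by 400), the cluster centers, the Gaussian spread, and the function names `generate_points` and `random_centroids` are all assumptions made for the example. Only the point counts (30, 20, 35, and 50) come from the description.

```python
import random


def generate_points(width=600, height=400, spread=30.0, seed=None):
    """Return 135 (x, y) points: three clusters plus uniform noise.

    Canvas size, cluster centers, and spread are assumed values.
    """
    rng = random.Random(seed)
    # (count, center_x, center_y); y grows downward, as on a canvas.
    clusters = [
        (30, 0.2 * width, 0.2 * height),   # top left
        (20, 0.8 * width, 0.5 * height),   # middle right
        (35, 0.5 * width, 0.85 * height),  # bottom
    ]
    points = []
    for count, cx, cy in clusters:
        for _ in range(count):
            # Gaussian scatter around the center, clamped to the canvas.
            x = min(max(rng.gauss(cx, spread), 0.0), width)
            y = min(max(rng.gauss(cy, spread), 0.0), height)
            points.append((x, y))
    # 50 background points, uniform over the whole canvas.
    for _ in range(50):
        points.append((rng.uniform(0, width), rng.uniform(0, height)))
    return points


def random_centroids(k, width=600, height=400, seed=None):
    """Pick k initial centroids uniformly at random (no k-means++)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(k)]
```

Keeping the points fixed while calling `random_centroids` again with a different seed mirrors the option to reuse the same points with new initial centroids.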

Some stats are also shown below. They demonstrate the speed of convergence and the fact that k-means converges only to a local minimum, not necessarily the global minimum (try different initial centroids and compare the error term and the cluster sizes). An iteration is defined as one cluster assignment followed by one re-computation of the centroids.
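A minimal sketch of the loop behind these stats, under a few assumptions: the error term is taken to be the sum of squared distances from each point to its assigned centroid, convergence means the assignments stop changing, and a centroid with no assigned points is simply left where it is. The page may define any of these differently, and the function names here are made up for the example.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for px, py in points:
        dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def recompute(points, labels, centroids):
    """Move each centroid to the mean of its points (assumption: an
    empty cluster keeps its old centroid)."""
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            mean_x = sum(x for x, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            updated.append((mean_x, mean_y))
        else:
            updated.append(old)
    return updated


def kmeans(points, centroids, max_iter=100):
    """Run k-means; return (centroids, labels, iterations, error, sizes)."""
    centroids = list(centroids)
    labels = None
    iterations = 0
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        centroids = recompute(points, labels, centroids)
        iterations += 1  # one assignment plus one re-computation
    # Assumed error term: sum of squared point-to-centroid distances.
    error = sum(
        (px - centroids[label][0]) ** 2 + (py - centroids[label][1]) ** 2
        for (px, py), label in zip(points, labels)
    )
    sizes = [labels.count(j) for j in range(len(centroids))]
    return centroids, labels, iterations, error, sizes
```

The local-minimum behaviour shows up even on four points. With `points = [(0, 0), (0, 1), (10, 0), (10, 1)]`, starting from centroids `[(0, 0), (10, 0)]` converges with an error of 1.0, while starting from `[(5, 0), (5, 1)]` also converges after one iteration but with an error of 100.0, because the centroids split the data top and bottom instead of left and right.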

The bad initial centroids button chooses initial centroids so that the algorithm (usually) converges with an empty cluster.
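How the button picks its centroids is not stated, so the following is only a hypothetical illustration of how an empty cluster can arise: if one centroid starts farther from every point than some other centroid, no point is ever assigned to it. The data, the centroid positions, and the helper `cluster_sizes` are invented for the example.

```python
def cluster_sizes(points, centroids):
    """Count how many points are nearest to each centroid."""
    sizes = [0] * len(centroids)
    for px, py in points:
        dists = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
        sizes[dists.index(min(dists))] += 1
    return sizes


# Two tight groups of points and three centroids. The first two centroids
# already sit at the means of their groups; the third is far from all data.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 9.0), (9.0, 9.0)]
centroids = [(1.5, 1.0), (8.5, 9.0), (100.0, 100.0)]

sizes = cluster_sizes(points, centroids)
print(sizes)  # [2, 2, 0]
```

Because the first two centroids are already the means of their assigned points, another iteration changes nothing, and the third cluster stays empty at convergence.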

How many clusters?
Number of Iterations
Converged after TBD iterations
Cluster Sizes
Error