“Same stats, different graphs”

Do you know what “simulated annealing” is? If not, then check out this cool project by Justin Matejka and George Fitzmaurice. The full title of their excellent work is “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing,” and here is an excerpt:

The key insight behind our approach is that while it is relatively difficult to generate a dataset from scratch with particular statistical properties, it is relatively easy to take an existing dataset, modify it slightly, and maintain those statistical properties. We do this by choosing a point at random, moving it a little bit, then checking that the statistical properties of the set haven’t strayed outside of the acceptable bounds (in this particular case, we are ensuring that the means, standard deviations, and correlations remain the same to two decimal places.)

Fig 3. Making a number of small changes to a dataset on the left, while maintaining the same overall statistical properties (to two decimal places), shown on the right.
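
To make the perturb-and-check step concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code: the function names, the Gaussian step size, and the synthetic starting cloud are all assumptions chosen for illustration, and "the same to two decimal places" is interpreted here as equality after rounding.

```python
import numpy as np

def summary_stats(points):
    """Means, standard deviations, and Pearson correlation of an (n, 2) array."""
    x, y = points[:, 0], points[:, 1]
    return np.array([x.mean(), y.mean(), x.std(ddof=1), y.std(ddof=1),
                     np.corrcoef(x, y)[0, 1]])

def same_to_two_decimals(stats_a, stats_b):
    """Assumed reading of the excerpt: equal after rounding to two decimals."""
    return np.array_equal(np.round(stats_a, 2), np.round(stats_b, 2))

def perturb_once(points, target_stats, rng, step=0.1):
    """Move one random point a little; keep the move only if the stats hold."""
    candidate = points.copy()
    i = rng.integers(len(points))
    candidate[i] += rng.normal(scale=step, size=2)  # hypothetical step size
    if same_to_two_decimals(summary_stats(candidate), target_stats):
        return candidate
    return points  # reject: the statistics strayed outside the bounds

rng = np.random.default_rng(0)
original = rng.normal(loc=50.0, scale=15.0, size=(150, 2))  # made-up data
target = summary_stats(original)

points = original
for _ in range(5000):
    points = perturb_once(points, target, rng)
```

Each candidate is compared against the statistics of the original dataset rather than the previous iteration, so small changes cannot accumulate into a drift past the second decimal place.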

Repeating this subtle “perturbation” process enough times results in a completely different dataset. However, as mentioned above, in order for these datasets to be effective tools for underscoring the importance of visualizing your data, they need to be visually distinct and clearly different. We accomplish this by biasing the random point movements towards a particular shape. In the animation below, we show the process of 200,000 iterations of perturbations towards a ‘circle’ shape:

Fig 4. Transforming a random cloud of points into a circle, while maintaining the same statistical properties.
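
The shape-biased version can be sketched along the same lines. The following is an illustration under stated assumptions, not the authors' implementation: a move is kept if it brings the point closer to the circle or, with a probability given by a linearly cooling "temperature", even if it does not (the simulated annealing part), and in either case only if the summary statistics still round to the same two decimals. The cooling schedule, step size, iteration count, and the choice of a circle whose radius matches the cloud's spread are all hypothetical.

```python
import numpy as np

def summary_stats(points):
    """Means, standard deviations, and Pearson correlation of an (n, 2) array."""
    x, y = points[:, 0], points[:, 1]
    return np.array([x.mean(), y.mean(), x.std(ddof=1), y.std(ddof=1),
                     np.corrcoef(x, y)[0, 1]])

def circle_distance(points, center, radius):
    """Distance from one point, or each of many points, to the circle's edge."""
    return np.abs(np.linalg.norm(points - center, axis=-1) - radius)

def anneal_to_circle(points, center, radius, iterations=20000,
                     step=0.2, t_start=0.4, t_end=0.01, seed=0):
    rng = np.random.default_rng(seed)
    target = np.round(summary_stats(points), 2)
    current = points.copy()
    for k in range(iterations):
        # Assumed schedule: temperature cools linearly from t_start to t_end.
        temperature = t_start + (t_end - t_start) * k / iterations
        i = rng.integers(len(current))
        old = current[i].copy()
        proposal = old + rng.normal(scale=step, size=2)
        closer = (circle_distance(proposal, center, radius)
                  < circle_distance(old, center, radius))
        # Keep moves towards the shape; occasionally keep a worse one.
        if not (closer or rng.random() < temperature):
            continue
        current[i] = proposal
        # Undo the move if any statistic changed at two decimal places.
        if not np.array_equal(np.round(summary_stats(current), 2), target):
            current[i] = old
    return current

rng = np.random.default_rng(1)
cloud = rng.normal(loc=50.0, scale=15.0, size=(150, 2))  # made-up data
center = cloud.mean(axis=0)
# Points spread evenly on a circle have var(x) + var(y) = radius**2,
# so this radius is compatible with the cloud's standard deviations.
radius = np.sqrt(cloud.var(axis=0, ddof=1).sum())

result = anneal_to_circle(cloud, center, radius)
before = circle_distance(cloud, center, radius).mean()
after = circle_distance(result, center, radius).mean()
```

Comparing `before` and `after` shows how far the cloud has moved towards the circle; a run as short as this one only goes part of the way, which is consistent with the excerpt's use of 200,000 iterations.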