Better Spatial Geometries for Clustering Algorithms
Everybody please start using this
A thing I sometimes dabble on nights and weekends is improved geometries for clustering algorithms. Just added new geometries for larger dimensions based on sphere packing. Software packages which do clustering should be able to switch to these and magically work better for only minor computational cost. Benchmarks here:
This metric is a good measure of how much unnecessary noise is added to distances and how easily points can slide past each other when being annealed.
The lattice implementation benchmarks look a little bumpy because of strangeness in sphere packing in different numbers of dimensions. It is optimal in dimensions 2, 3, 4, 6, and 8, but there are improvements which could be implemented with a bunch of work in 5, 7, and 9 dimensions. Above that the best option is an open mathematical problem but those are large numbers of dimensions to use for clustering anyway.
For visualization 2d lattice should be used. It 'crystallizes' nicely as shown here:
For applications which need antipodes (opposite points) Torus is best.
This is based on the sphere packing thoughts I had earlier. It turns out the extra 2 spheres for the kissing number in 7 dimensions come from squeezing spheres in at the north and south poles, which Henry Cohn kindly explained to me. I had the bright idea of trying to squeeze an extra sphere in just the north pole for 6 dimensions and then anneal everything else out of the way, which sadly doesn’t quite work but did result in a new record for the spherical code for 6 dimensions and 73 spheres.
It also turns out that although this kissing number construction forms lattices in 6 and 8 dimensions (the construction in 8 dimensions is explained on Wikipedia) in 5 and 7 dimensions because they’re odd going to the offset of the ‘filled in antipode’ twice would result in landing on a lattice point which has been knocked out because it’s of the opposite parity. It’s a bit bizarre that these match exactly the kissing numbers from the optimal packings anyway, which seems to imply that they can be used to tesselate space to form a sphere packing which is of maximal density.