Vector Quantization (k-means problem)
5 papers with code • 0 benchmarks • 0 datasets
Given a data set $X$ of $d$-dimensional numeric vectors and a number $k$, find a codebook $C$ of $k$ $d$-dimensional vectors such that the sum of squared distances from each $x \in X$ to its nearest $c \in C$, i.e. $\sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2$, is as small as possible. This is also known as the k-means problem and is NP-hard.
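The objective above is usually attacked with Lloyd's algorithm, the standard local-search heuristic (it finds a local, not necessarily global, optimum). A minimal NumPy sketch, with function names of our own choosing:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: a local-search heuristic for the k-means objective."""
    rng = np.random.default_rng(seed)
    # Initialize the codebook with k distinct points drawn from X.
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest codebook vector per point.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each codebook vector to its cluster mean.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                C[j] = members.mean(axis=0)
    return C, labels

def cost(X, C):
    """The k-means objective: sum of squared distances to the nearest center."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```

Both steps can only decrease the objective, so the cost is monotonically non-increasing; the final codebook quality still depends heavily on initialization, which is what most of the papers below try to improve.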
Benchmarks
These leaderboards are used to track progress in Vector Quantization (k-means problem)
Libraries
Use these libraries to find Vector Quantization (k-means problem) models and implementations

Most implemented papers
Fast K-Means with Accurate Bounds
We propose a novel accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster.
Breathing K-Means
For larger values of $m$, e.g., $m = 20$, breathing k-means likely is the new SOTA for the k-means problem.
Learning the k in k-means
The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution.
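The core of that test can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: G-means projects each cluster onto the axis connecting two child centers found by 2-means, while here we use the top principal component as a stand-in projection axis before applying an Anderson-Darling normality test:

```python
import numpy as np
from scipy.stats import anderson

def should_split(X):
    """G-means-style split check (sketch): reject the Gaussian hypothesis
    for cluster X by testing a 1-d projection for normality."""
    Xc = X - X.mean(axis=0)
    # Top principal direction via SVD (stand-in for the child-center axis).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    # Anderson-Darling test against the normal distribution.
    result = anderson(proj, dist='norm')
    # Keep the split if the statistic exceeds the critical value at the
    # strictest tabulated significance level (1%).
    return bool(result.statistic > result.critical_values[-1])
```

A cluster that fails the test (returns `True`) is split and the check recurses on the children, which is how G-means grows $k$ automatically.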
The Effect of Points Dispersion on the $k$-nn Search in Random Projection Forests
$k$-nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest.
Data Aggregation for Hierarchical Clustering
Hierarchical Agglomerative Clustering (HAC) is likely the earliest and most flexible clustering method, because it can be used with many distances, similarities, and various linkage strategies.