Vector Quantization with Sorting Transformation
Abstract
Vector quantization is a compression technique for vector data based on nearest neighbor representation. It builds a collection of codewords to represent the entire vector space, and each vector is then represented by its nearest codeword, with the distance between the two being the compression error. To improve this nearest neighbor representation, we propose applying a sorting transformation to the vector data so that the elements within each vector are sorted. We show that, among all permutation transformations, the sorting transformation minimizes the L2 distance between vectors and maximizes similarity measures such as cosine similarity and Pearson correlation. Combining the sorting transformation with vector quantization can substantially reduce compression errors, at the cost of storing the sorting permutation for each compressed vector. Through experimental validation on compression and nearest neighbor retrieval, we show that this is a beneficial trade-off for vector quantization on low-dimensional vectors, a common scenario in vector quantization applications.
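The following Python sketch illustrates the idea summarized above: sort each vector, quantize the sorted version against a codebook, and keep the permutation so the original ordering can be restored. The function names, the fixed random codebook (a trained codebook, e.g. from k-means, would be used in practice), and the toy sanity check are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import permutations

# Illustrative sketch only; names and the fixed random codebook are assumptions.

def sort_transform(x):
    """Return the sorted vector and the permutation needed to undo the sort."""
    perm = np.argsort(x)
    return x[perm], perm

def nearest_codeword(v, codebook):
    """Index of the codeword closest to v in L2 distance (the compression error)."""
    return int(np.argmin(np.linalg.norm(codebook - v, axis=1)))

def compress(x, codebook):
    """Represent x by (codeword index, sorting permutation)."""
    sorted_x, perm = sort_transform(x)
    return nearest_codeword(sorted_x, codebook), perm

def decompress(idx, perm, codebook):
    """Restore the original element order by inverting the permutation."""
    return codebook[idx][np.argsort(perm)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Codewords live in the space of sorted vectors, so sort each random codeword.
    codebook = np.sort(rng.normal(size=(16, 4)), axis=1)
    x = rng.normal(size=4)

    idx, perm = compress(x, codebook)
    x_hat = decompress(idx, perm, codebook)
    print("compression error:", np.linalg.norm(x - x_hat))

    # Toy check of the claim: among all permutations of y, sorting both vectors
    # gives the smallest L2 distance to sorted x (rearrangement inequality).
    y = rng.normal(size=4)
    best = min(np.linalg.norm(np.sort(x) - np.array(p)) for p in permutations(y))
    assert np.isclose(best, np.linalg.norm(np.sort(x) - np.sort(y)))
```

The per-vector storage overhead is the permutation itself, which needs roughly log2(d!) bits for a d-dimensional vector; this is why the trade-off is most favorable in the low-dimensional regime discussed in the abstract.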