Maintaining Bernoulli samples over evolving multisets
Rainer Gemulla, Wolfgang Lehner, et al.
SIGMOD/PODS 2007
Large-scale machine learning (ML) algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. For performance, it is crucial to fit the data into single-node or distributed main memory and to enable fast matrix-vector operations on in-memory data. General-purpose heavyweight and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Compressed linear algebra (CLA) avoids these problems by applying lightweight lossless database compression techniques to matrices and then executing linear algebra operations such as matrix-vector multiplication directly on the compressed representations. The key ingredients are effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Experiments on an initial implementation in SystemML show in-memory operation performance close to the uncompressed case together with good compression ratios, yielding end-to-end performance improvements of up to 26x or reduced memory requirements.
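The abstract describes executing matrix-vector multiplication directly on column-compressed data. As a rough illustration only (not SystemML's actual implementation), the following minimal Python sketch uses an offset-list style encoding per column, where each distinct value stores the row positions at which it occurs; the names CompressedColumn and compressed_matvec are hypothetical.

import numpy as np

# Hypothetical offset-list encoding of one matrix column:
# each distinct nonzero value maps to the row indices where it occurs.
class CompressedColumn:
    def __init__(self, column):
        self.n_rows = len(column)
        self.offsets = {}                      # distinct value -> list of row indices
        for row, value in enumerate(column):
            if value != 0.0:                   # zeros need no offsets
                self.offsets.setdefault(value, []).append(row)

    def axpy(self, scale, y):
        # Add scale * (this column) into the result vector y,
        # touching each distinct value once instead of each row.
        for value, rows in self.offsets.items():
            contrib = value * scale
            for r in rows:
                y[r] += contrib

def compressed_matvec(columns, x):
    # y = M @ x computed directly on the per-column compressed data.
    y = np.zeros(columns[0].n_rows)
    for j, col in enumerate(columns):
        col.axpy(x[j], y)
    return y

# Usage: compress a small matrix column-wise and compare with NumPy.
M = np.array([[1.0, 0.0], [1.0, 2.0], [1.0, 2.0], [0.0, 2.0]])
cols = [CompressedColumn(M[:, j]) for j in range(M.shape[1])]
x = np.array([3.0, 0.5])
assert np.allclose(compressed_matvec(cols, x), M @ x)

The sketch shows why operating on the compressed form can pay off: columns with few distinct values require one scalar multiplication per distinct value rather than one per row, while the data itself stays compressed in memory.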
Peter J. Haas, Christian König
SIGMOD 2004
Ihab Ilyas, Volker Markl, et al.
ICAC 2004
Kevin Beyer, Rainer Gemulla, et al.
Communications of the ACM