SAO: A stream index for answering linear optimization queries
Abstract
Linear optimization queries retrieve the top-K tuples in a sliding window of a data stream that maximize/minimize the linearly weighted sums of certain attribute values. To efficiently answer such queries against a large relation, an onion index was previously proposed to properly organize all the tuples in the relation. However, such an onion index does not work in a streaming environment due to fast tuple arrival rate and limited memory. In this paper, we propose a SAO index to approximately answer arbitrary linear optimization queries against a data stream. It uses a small amount of memory to efficiently keep track of the most "important" tuples in a sliding window of a data stream. The index maintenance cost is small because the great majority of the incoming tuples do not cause any changes to the index and are quickly discarded. At any time, for any linear optimization query, we can retrieve from the SAO index the approximate top-K tuples in the sliding window almost instantly. The larger the amount of available memory, the better the quality of the answers is. More importantly, for a given amount of memory, the quality of the answers can be further improved by dynamically allocating a larger portion of the memory to the outer layers of the SAO index. We evaluate the effectiveness of this SAO index through a prototype implementation. © 2007 IEEE.