Approximate counting of inversions in a data stream
Abstract
Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model [14, 2] is a natural setting to design efficient algorithms. We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve is O(log n log log n) through a deterministic algorithm. In contrast, we derive an Ω(n) lower bound for randomized exact computation for this problem; thus approximation is essential. We also consider two generalizations of this problem: (1) approximating the number of inversions between two permutations, for which we obtain a randomized O(√n log n)-space algorithm, and (2) approximating the number of inversions in a general list, for which we obtain a randomized O(√n log2 n)-space two-pass algorithm. In contrast, we derive Ω(n)-space lower bounds for deterministic approximate computation for these problems; thus both randomization and approximation are essential. All our algorithms use only O(log n) time per data item.