Fast, expressive top-k matching
Abstract
Top-k matching is a fundamental problem underlying online advertising platforms, mobile social networks, etc. Distributed processes (e.g., advertisers) specify predicates, which we call subscriptions, for events (e.g., user actions) they wish to react to. Subscriptions define weights for elementary constraints on individual event attributes and do not require that events match all constraints. An event is multicast only to the processes with the k highest match scores for that event - this score is the aggregation of the weights of all constraints in a subscription matching the event. However, state-of-the-art approaches to top-k matching support only rigid models of events and subscriptions, which leads to suboptimal matches. We present a novel model of weighted top-k matching which is more expressive than the state-of-the-art, and a corresponding efficient algorithm. Our model supports attributes with intervals, weights specified by producers of events or by subscriptions, negative weights, prorating of matched constraints, and the ability to vary scores dynamically with system parameters. Our algorithm exhibits time and space complexities which are competitive with state-of-the-art algorithms regardless of our added expressiveness - O(M logN+S log k) and O(MN +k) respectively, with N the number of constraints, M the number of event attributes, and S the number of matching constraints. Through empirical evaluation with both statistically generated and real-world data we demonstrate that our algorithm is (a) equally or more efficient and scalable than the state-ofthe art without exploiting our added expressiveness, and it (b) significantly outperforms existing approaches upgraded - if possible at all - to match our expressiveness.