Recursive fact-finding: A streaming approach to truth estimation in crowdsourcing applications
Abstract
This paper presents a streaming approach to solve the truth estimation problem in crowdsourcing applications. We consider a category of crowdsourcing applications where a group of individuals volunteer (or are recruited) to share certain observations or measurements about the physical world. Examples include reporting locations of gas stations that remain operational after a natural disaster or reporting locations of potholes on city streets. We call such applications social sensing. Ascertaining the correctness of reported observations is a key challenge in such applications, referred to as the truth estimation problem. This problem is made difficult by the fact that the reliability of individual sources is usually unknown a priori, since any concerned citizen may, in principle, participate. Moreover, the timescales of crowdsourcing campaigns of interest can be as short as a few hours or days, which does not offer enough history for a reputation system to converge. Instead, recent work, including our own, developed fact-finding algorithms to solve this problem by iteratively assessing the credibility of sources and their claims in the absence of reputation scores. Such algorithms, however, operate on the entire dataset of reported observations in a batch fashion, which makes them less suited to applications where new observations arrive continually. In this paper, we describe a streaming fact-finder that recursively updates previous estimates based on new data. The recursive algorithm solves an expectation maximization (EM) problem to determine the odds of correctness of different observations. We compare the performance of our recursive EM algorithm to a batch EM algorithm, as well as to several state-of-the-art fact-finders, through extensive simulations. We also demonstrate convergence of the recursive algorithm to the results of the batch version through a real social sensing experiment.
Our evaluation shows that the proposed approach can process data streams much more efficiently while keeping the truth estimation accuracy close to that of the (much slower) batch algorithm. Ours is therefore the first fact-finder developed with explicit consideration of the continuous update needs of crowdsourcing applications. © 2013 IEEE.
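To make the streaming idea concrete, the following is a minimal Python sketch of an incremental EM-style fact-finder for binary claims. It is not the recursive EM algorithm developed in the paper: it swaps in a simpler scheme that keeps running sufficient statistics and re-estimates parameters after each batch of new claims. Everything in it is an assumption made for illustration, including the generative model (each source i reports a true claim with probability a[i] and a false claim with probability b[i], and claims are true with prior d), the names `posterior_true` and `StreamingFactFinder`, and the initial values.

```python
import math


def posterior_true(reports, a, b, d):
    """E-step for one claim: P(claim is true | which sources reported it).

    reports[i] is 1 if source i reported the claim and 0 otherwise.
    a[i], b[i] and d are the assumed model parameters described above.
    """
    log_t = math.log(d)
    log_f = math.log(1.0 - d)
    for r, ai, bi in zip(reports, a, b):
        log_t += math.log(ai if r else 1.0 - ai)
        log_f += math.log(bi if r else 1.0 - bi)
    # Normalize in log space to avoid underflow with many sources.
    m = max(log_t, log_f)
    pt = math.exp(log_t - m)
    pf = math.exp(log_f - m)
    return pt / (pt + pf)


class StreamingFactFinder:
    """Hypothetical incremental estimator (illustrative, not the paper's)."""

    def __init__(self, n_sources, a0=0.6, b0=0.4, d0=0.5, eps=1e-3):
        self.a = [a0] * n_sources
        self.b = [b0] * n_sources
        self.d = d0
        self.eps = eps
        # Running sufficient statistics accumulated over all batches so far.
        self.rep_true = [0.0] * n_sources   # sum of z_j over claims source i reported
        self.rep_false = [0.0] * n_sources  # sum of (1 - z_j) over those claims
        self.mass_true = 0.0                # sum of z_j over all claims
        self.mass_false = 0.0               # sum of (1 - z_j) over all claims

    def _clip(self, p):
        # Keep probabilities strictly inside (0, 1) so the logs stay finite.
        return min(1.0 - self.eps, max(self.eps, p))

    def update(self, batch):
        """Score a batch of new claims, then refresh the parameters.

        batch is a list of report vectors, one per new claim.
        Returns the posterior probability of truth for each new claim.
        """
        if not batch:
            return []
        # E-step on the new claims only, using the current parameters.
        z = [posterior_true(r, self.a, self.b, self.d) for r in batch]
        for reports, zj in zip(batch, z):
            self.mass_true += zj
            self.mass_false += 1.0 - zj
            for i, r in enumerate(reports):
                if r:
                    self.rep_true[i] += zj
                    self.rep_false[i] += 1.0 - zj
        # M-step from the accumulated statistics; old claims are not revisited.
        total = self.mass_true + self.mass_false
        self.d = self._clip(self.mass_true / total)
        for i in range(len(self.a)):
            self.a[i] = self._clip(self.rep_true[i] / max(self.mass_true, self.eps))
            self.b[i] = self._clip(self.rep_false[i] / max(self.mass_false, self.eps))
        return z
```

A caller would construct `StreamingFactFinder(n_sources)` once and call `update` each time new claims arrive. Because earlier claims are never rescored, the cost of each update depends only on the size of the new batch, which is the property that separates a streaming fact-finder from a batch one. The trade-off in this simplified version is that early claims keep the posteriors computed under cruder parameter estimates.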