Shuchi Chawla, Robert Krauthgamer, et al.
Computational Complexity
The Web harbors a large number of communities - groups of content-creators sharing a common interest - each of which manifests itself as a set of interlinked Web pages. Newgroups and commercial Web directories together contain of the order of 20,000 such communities; our particular interest here is on emerging communities - those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
Shuchi Chawla, Robert Krauthgamer, et al.
Computational Complexity
Christos H. Papadimitriou, Prabhakar Raghavan, et al.
FOCS 1994
Tukan Batu, Sanjoy Dasgupta, et al.
STOC 2002
David Liben-Nowell, Jasmine Novak, et al.
PNAS