Publication
IBM J. Res. Dev
Paper
A scalable architecture for real-time analysis of microblogging data
Abstract
As events take place in the real world, e.g., sports games and marketing campaigns, people react and interact on online social networks (OSNs), especially microblog services such as Twitter, generating a large stream of data. Analyzing this data presents an opportunity for researchers and companies to better understand human behavior (both on the network and in real life) during the event's lifespan. Designing automated systems to conduct these analyses in fractions of a minute (or even seconds) is subject to many challenges: the volume of data is large, the number of posts in future events cannot be predicted, and the system needs to be always available and running smoothly to avoid information loss and delays in delivering the analytics results. In this paper, we present a scalable architecture for real-time analysis of microblogging data that handles large volumes of posts by using modular parallel workflows. This architecture, which has been implemented on the IBM InfoSphere Streams platform, was tested on a real-world use case: sentiment analysis of Twitter posts during the games of the 2013 Fédération Internationale de Football Association (FIFA) Confederations Cup. The system successfully coped with the challenges of this task.
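The idea of modular parallel workflows can be illustrated with a minimal Python sketch. It is not the paper's implementation, which runs on IBM InfoSphere Streams; the function names, the two-stage split, and the tiny word lists below are all hypothetical and chosen only to show small stages chained into one workflow, with several identical workflows processing posts side by side before their results are aggregated.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy lexicon for illustration only; not taken from the paper.
POSITIVE = {"goal", "great", "win", "love"}
NEGATIVE = {"foul", "bad", "lose", "hate"}


def tokenize(post):
    """Stage 1: lowercase the post and strip common punctuation from each word."""
    return [word.strip(".,!?#@") for word in post.lower().split()]


def score(tokens):
    """Stage 2: label the post by comparing positive and negative lexicon hits."""
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def analyze(post):
    """One workflow: the two stages chained for a single post."""
    return score(tokenize(post))


def run_parallel(posts, workers=4):
    """Fan posts out to `workers` identical workflows and count the labels."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(analyze, posts))


if __name__ == "__main__":
    sample = ["Great goal!", "What a bad foul", "Kickoff at 4pm"]
    print(run_parallel(sample))
```

Threads keep the sketch short; a production streaming system would more likely distribute the workflows across processes or hosts and read posts from an unbounded stream rather than a list. Because each stage is a separate function, a stage can be swapped out (for example, replacing the lexicon scorer with a trained classifier) without touching the fan-out logic.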