Workshop paper

Efficiently querying archived data using Hadoop

Abstract

The need to analyze structured data for business intelligence applications such as customer churn analysis, social network analysis, and telecom network monitoring is well known. However, the volumes to which such data is expected to grow will make solutions built around data warehouses hard to scale. As data sizes grow, data must be moved from the warehouse to archives more frequently, and current file-based archive models leave the archived data unusable for any kind of insight extraction. In this paper, we present an active archival solution for data warehouses that uses the Hadoop Distributed File System (HDFS) to store the data in an always-available and cost-effective manner. We investigate various schemes for storing structured data within HDFS, and our empirical evaluations show that a combination of the Universal scheme model and a column store is best suited for the active archival solution. © 2010 ACM.
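The abstract does not describe the storage layout in detail. As a rough illustration only, the sketch below assumes that a "universal" scheme means merging tables with different schemas into one wide table whose columns are the union of all attributes (missing values left empty), and that a column store keeps each attribute as its own sequence so that a query reads only the columns it needs. The functions `to_universal_columns` and `scan` and the `_table` tag column are hypothetical names for this example; this is not the authors' HDFS implementation, and it keeps everything in memory rather than in HDFS files.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
Columns = Dict[str, List[Optional[Any]]]


def to_universal_columns(tables: Dict[str, List[Row]]) -> Columns:
    """Merge rows from several tables into one universal table, stored column-wise.

    Hypothetical illustration: the universal schema is the sorted union of all
    column names, plus a reserved '_table' column recording each row's source
    table. Attributes a row does not have are filled with None.
    """
    # Universal schema: union of every column name across all input tables.
    schema = sorted({col for rows in tables.values() for row in rows for col in row})
    columns: Columns = {"_table": []}
    for col in schema:
        columns[col] = []
    # Append each row to every column, padding absent attributes with None.
    for name, rows in tables.items():
        for row in rows:
            columns["_table"].append(name)
            for col in schema:
                columns[col].append(row.get(col))
    return columns


def scan(columns: Columns, wanted: List[str]) -> Columns:
    """Read only the requested columns, as a column store would."""
    return {col: columns[col] for col in wanted}


if __name__ == "__main__":
    tables = {
        "customers": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}],
        "calls": [{"id": 1, "duration": 30}],
    }
    cols = to_universal_columns(tables)
    print(scan(cols, ["id", "duration"]))
```

In this layout a query over `id` and `duration` never touches the `name` column, which is the usual motivation for pairing a wide, sparse universal table with column-oriented storage.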
