Scalable performance of System S for extract-transform-load processing
Abstract
ETL (Extract-Transform-Load) processing plays an increasingly critical role in analyzing business data and in taking appropriate business actions based on the results. As the volume of business data to be analyzed grows and quick responses become more critical to business success, there is strong demand for scalable, high-performance ETL processors. In this paper, we evaluate a distributed data stream processing engine called System S for this purpose. Because System S was originally built as a data stream processing engine, we first perform a qualitative study to determine whether its programming model is suitable for representing an ETL workflow. Second, we conduct performance studies with a representative ETL scenario. Through our series of experiments, we found that the SPADE programming model of System S and its runtime environment naturally fit the requirements of handling massive amounts of ETL data in a highly scalable manner. Copyright 2010 ACM.