Exploiting correlation and parallelism of materialized-view recommendation for distributed data warehouses
Abstract
Many large enterprises require access to distributed data warehouses for business intelligence (BI) applications. Typically, distributed data warehouses are integrated into a centralized data warehouse for ease of maintenance. However, this approach must overcome the complexity of data loading and job scheduling, as well as scalability issues. On the other hand, a fully federated system may not be feasible for data-intensive BI applications. A hybrid approach based on intelligent data placement is more flexible and applicable than either the centralized or the full-federation configuration. The current implementation of the hybrid approach to integrating distributed data warehouses aggregates selected data from various remote sources as materialized views and caches them at the federation server to improve the performance of complex BI query workloads. In this paper, we propose an improvement that, in conjunction with the current hybrid approach to data placement, recommends Materialized Query Tables (MQTs) for backend servers, for the benefit of load distribution and easy maintenance of aggregated data. Our approach considers the correlation between backend servers and recommends MQTs that are well coordinated among the backend servers and optimized for a given workload. We also exploit parallelism among the backend servers so that the running time of our approach grows almost linearly (rather than exponentially) with the number of backend servers, without sacrificing recommendation quality. Experimental evaluations validate the effectiveness and efficiency of our approach. © 2007 IEEE.