Publication
IEEE Transactions on Knowledge and Data Engineering
Paper

Automatic fragment detection in dynamic web pages and its impact on caching

Abstract

Constructing Web pages from fragments has been shown to provide significant benefits for both content generation and caching. For a Web site to use fragment-based content generation, however, good methods are needed for fragmenting the Web pages. Manual fragmentation of Web pages is expensive, error-prone, and unscalable. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in Web sites serving dynamic content. Our approach analyzes Web pages with respect to their information-sharing behavior, personalization characteristics, and change patterns. We identify fragments that are shared among multiple documents or that have different lifetime or personalization characteristics. Our approach has three unique features. First, we propose a framework for fragment detection, which includes a hierarchical and fragment-aware model for dynamic Web pages and a compact and effective data structure for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that effectively detects fragments based on their lifetime and personalization characteristics. This paper shows the results when the algorithms are applied to real Web sites. We evaluate the proposed scheme through a series of experiments, showing the benefits and costs of the algorithms. We also study the impact of using the fragments detected by our system on key parameters such as disk space utilization, network bandwidth consumption, and load on the origin servers. © 2005 IEEE.
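
The abstract does not describe the paper's data structure or algorithms in detail, so the following is only an illustrative sketch of the general idea of finding maximal fragments shared among multiple documents. It assumes a hypothetical page model in which each node is a `(tag, text, children)` tuple, and it substitutes plain exact-match subtree hashing for whatever technique the authors actually use. The names `annotate`, `collect_maximal`, `detect_shared_fragments`, and `min_pages` are invented for this example.

```python
import hashlib
from collections import defaultdict

# Hypothetical page model (an assumption, not the paper's):
# a node is a tuple (tag, text, children), where children is a list of nodes.


def annotate(node, page_id, seen_in):
    """Hash every subtree bottom-up and record which pages contain each hash."""
    tag, text, children = node
    kids = [annotate(child, page_id, seen_in) for child in children]
    # A subtree's digest depends on its tag, its text, and its children's digests.
    payload = "\x00".join([tag, text] + [kid["digest"] for kid in kids])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    seen_in[digest].add(page_id)
    return {"digest": digest, "node": node, "kids": kids}


def collect_maximal(ann, seen_in, min_pages, found):
    """Walk top-down; stop at the first subtree that enough pages share."""
    pages = seen_in[ann["digest"]]
    if len(pages) >= min_pages:
        found[ann["digest"]] = (ann["node"], pages)
        return  # do not descend: nested shared subtrees are not maximal here
    for kid in ann["kids"]:
        collect_maximal(kid, seen_in, min_pages, found)


def detect_shared_fragments(pages, min_pages=2):
    """Return {digest: (node, page_ids)} for shared subtrees, maximal per page."""
    seen_in = defaultdict(set)
    annotated = [annotate(root, pid, seen_in) for pid, root in pages.items()]
    found = {}
    for ann in annotated:
        collect_maximal(ann, seen_in, min_pages, found)
    return found
```

Because the sketch matches subtrees only when they are byte-for-byte identical, it would miss fragments that differ slightly between pages, and it says nothing about lifetime or personalization characteristics, which the abstract treats with a separate algorithm.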