Publication
DCC 2005
Conference paper

Off-line compression by extensible motifs

Abstract

Summary form only given. We present lossy off-line data compression techniques by textual substitution in which the patterns used in compression are chosen among the extensible motifs that recur in the text string with at least a pre-specified minimum frequency. A motif is to be interpreted here as a sequence of intermixed solid and don't-care characters that obeys, in addition, some saturation conditions: most notably, it must not be possible to eliminate any don't cares in the pattern without forfeiting some of its occurrences. Motif discovery and motif-driven parses of various kinds were previously introduced and used in Apostolico et al. (2003) and Apostolico et al. (2004). Whereas the motifs considered in those studies are "rigid", here we assume that each sequence of gaps present in a motif comes endowed with some individually prescribed degree of elasticity, whereby the same pattern may be stretched to fit segments of the source that match at all the solid characters but otherwise differ in length. This is expected to reduce the size of the codebook, and hence to improve compression.
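To illustrate the difference between a rigid and an extensible motif, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical representation in which a motif is a list whose items are either a solid character or a `(lo, hi)` pair giving the allowed number of don't-care characters in that gap, and it uses Python's standard `re` module to locate occurrences. The names `motif_to_regex` and `occurrences` are introduced here for illustration only.

```python
import re


def motif_to_regex(motif):
    """Translate a motif into a regular-expression string.

    Each item is either a solid character (a one-character str) or a
    (lo, hi) tuple meaning "between lo and hi don't-care characters".
    This representation is an assumption made for this sketch.
    """
    parts = []
    for item in motif:
        if isinstance(item, tuple):
            lo, hi = item
            parts.append(".{%d,%d}" % (lo, hi))  # elastic gap
        else:
            parts.append(re.escape(item))  # solid character
    return "".join(parts)


def occurrences(motif, text):
    """Return one (start, end) span for every start position that matches."""
    pattern = re.compile(motif_to_regex(motif), re.DOTALL)
    spans = []
    for start in range(len(text)):
        m = pattern.match(text, start)
        if m:
            spans.append((start, m.end()))
    return spans


text = "abcaxxcazzzc"
rigid = ["a", (1, 1), "c"]       # gap of exactly one don't care
extensible = ["a", (1, 3), "c"]  # gap may stretch from one to three

print(occurrences(rigid, text))       # [(0, 3)]
print(occurrences(extensible, text))  # [(0, 3), (3, 7), (7, 12)]
```

In this toy input the rigid pattern covers only one segment, whereas the single extensible pattern covers three segments of different lengths. That is the intuition behind the expected codebook saving: one elastic entry can stand in for several rigid ones. The sketch does not check the saturation conditions or the frequency threshold mentioned in the abstract.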
