Distributed data mining in a chain store database of short transactions
Abstract
In this paper, we broaden the horizon of traditional rule mining by introducing a new framework of causality rule mining in a distributed chain store database. Specifically, the causality rule explored in this paper consists of a sequence of triggering events and a set of consequential events, and is designed with the capability of mining non-sequential, inter-transaction information. Hence, the causality rule mining provides a very general framework for rule derivation. Note, however, that the procedure of causality rule mining is very costly particularly in the presence of a huge number of candidate sets and a distributed database, and in our opinion, cannot be dealt with by direct extensions from existing rule mining methods. Consequently, we devise in this paper a series of level matching algorithms, including Level Matching (abbreviatedly as LM), Level Matching with Selective Scan (abbreviatedly as LMS), and Distributed Level Matching (abbreviatedly as Distributed LM), to minimize the computing cost needed for the distributed data mining of causality rules. In addition, the phenomena of time window constraints are also taken into consideration for the development of our algorithms. As a result of properly employing the technologies of level matching and selective scan, the proposed algorithms present good efficiency and scalability in the ming of local and global causality rules. Scale-up experiments show that the proposed algorithms scale well with the number of sites and the number of customer transactions.