Charles H. Bennett, Aram W. Harrow, et al.
IEEE Trans. Inf. Theory
Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on the fly and mining it there; tight coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the architectural alternatives shows that, from a performance perspective, the Cache option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which is a factor of 3 to 4 slower than Cache. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors such as automatic parallelization, ease of development, portability, and interoperability. As a byproduct of this study, we identify some primitives for native support in database systems for decision-support applications. © 2000 Kluwer Academic Publishers.
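To make the idea of expressing a mining step as a SQL query concrete, the following is a minimal sketch of support counting for item pairs with a plain self-join, run through Python's standard `sqlite3` module. It is an illustration only, not one of the formulations evaluated in the paper: the `transactions(tid, item)` table layout, the `frequent_pairs` helper, and the sample data are all assumptions made for this example, and the query assumes each `(tid, item)` row appears at most once.

```python
import sqlite3


def frequent_pairs(rows, min_support):
    """Return item pairs that co-occur in at least min_support transactions.

    rows is an iterable of (tid, item) tuples; this layout is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

    # Self-join on the transaction id generates every item pair inside a
    # transaction; a.item < b.item keeps each unordered pair exactly once.
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM transactions a
        JOIN transactions b ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result


# Illustrative data: three small transactions.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
]
pairs = frequent_pairs(rows, 2)
print(pairs)
```

With a minimum support of 2, only the pairs that appear together in at least two transactions survive the `HAVING` filter. Larger itemsets would need further joins against the surviving pairs, which hints at why the cost of a pure SQL formulation can grow quickly.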
Yigal Hoffner, Simon Field, et al.
EDOC 2004
Zohar Feldman, Avishai Mandelbaum
WSC 2010
Sonia Cafieri, Jon Lee, et al.
Journal of Global Optimization