Embedded predictive modeling in a parallel relational database

A. Dorneich; E. Pednault; R. Natarajan; F. Tipu

doi:10.1145/1141277.1141409

SAC 2006

Conference paper

23 Apr 2006



Abstract

This paper describes a methodology for embedding predictive modeling algorithms in a commercial parallel database, specifically the parallel editions of IBM DB2 Universal Database, although many aspects of the overall approach can be used with other commercial parallel databases. This parallelization approach was implemented in the Version 8.2 release of DB2 Intelligent Miner Modeling to support a new predictive modeling algorithm called Transform Regression. This database-embedded mining algorithm provides all the usual benefits of database-embedded mining, including easier integration into large enterprise applications, the ability to perform entire data mining workflows directly from an SQL-based programming interface, reduced data transfer costs between the database and the data mining application, and faster, parallel data access during query processing. In addition to these benefits, however, a significant part of the data mining computation is also parallelized without the use of any sophisticated parallel programming constructs or any specialized message-passing and parallel synchronization libraries. Copyright 2006 ACM.
