Publication
VLDB 2005
Conference paper

Parallel querying with non-dedicated computers

Abstract

We present DITN, a new method of parallel querying based on dynamic outsourcing of join processing tasks to non-dedicated, heterogeneous computers. In DITN, partitioning is not the means of parallelism. Data layout decisions are taken outside the scope of the DBMS, and handled within the storage software; query processors see a "Data In The Network" image. This allows gradual scaleout as the work-load grows, by using non-dedicated computers. A typical operator in a parallel query plan is Exchange [7]. We argue that Exchange is unsuitable for non-dedicated machines because it poorly addresses node heterogeneity, and is vulnerable to failures or load spikes during query execution. DITN uses an alternate intra-fragment parallelism where each node executes an independent select-project-join-aggregate-group by block, with no tuple exchange between nodes. This method cleanly handles heterogeneous nodes, and well adapts during execution to node failures or load spikes. Initial experiments suggest that DITN performs competitively with a traditional configuration of dedicated machines and well-partitioned data for up to 10 processors at least. At the same time, DITN gives significant flexibility in terms of gradual scaleout and handling of heterogeneity, load bursts, and failures.

Date

Publication

VLDB 2005

Authors

Share