Making big data simple with dashDB Local
Abstract
In this paper we introduce dashDB Local, a new Big Data and SQL warehousing technology from IBM designed to dramatically simplify traditional data warehousing deployments while offering next-generation query performance and the analytic richness of the Apache Spark ecosystem. Designed as a Software Defined Technology, dashDB Local automatically adapts to the hardware platform it runs on, which simplifies deployment. In experiments on multiple workloads, we deployed large clusters in under 30 minutes and observed workload speedups of several factors compared to industry-leading appliance technology and market-leading data warehouses. We see dashDB Local providing five novel value propositions: 1. Improved deployment, with automatic adaptation to the target hardware enabling fully configured data warehouse and Big Data clusters, from gigabytes to petabytes, to be deployed in minutes. 2. Superior workload performance through advances in in-memory columnar algorithms. 3. Rich polyglot language support, covering many dialects of Big Data and SQL languages. 4. Full-stack integration of Apache Spark into the processing engine, co-existing with the rich SQL engine. 5. Compatibility with dashDB as a service in the cloud, providing the flexibility to develop and run applications and analytics on premises or in the cloud, as desired.