CODS: Evolving data efficiently and scalably in column oriented databases
Abstract
Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and evolving the data to the updated schema (data evolution). In relational databases it is often necessitated by changes in data or workload, a suboptimal initial schema design, or new knowledge about the database. Although commercial RDBMSs optimize query processing well, evolving the data through SQL queries during a database evolution is shown to be prohibitively costly. We designed and developed CODS, a platform for efficient data-level data evolution in column-oriented databases. CODS evolves the data to the new schema without materializing query results and without the unnecessary compression and decompression that occur in traditional query-level approaches. CODS improves the efficiency of data evolution by orders of magnitude compared with commercial or open-source RDBMSs. © 2010 VLDB Endowment.