Real-time data quality analysis

Arun Iyengar; Dhaval Patel; Shrey Shrivastava; Nianjun Zhou; Anuradha Bhamidipaty

doi:10.1109/CogMI50398.2020.00022

CogMI 2020

Conference paper

01 Oct 2020

Abstract

Data quality is critically important for big data and machine learning applications. Data quality systems can analyze data sets for quality and detect potential errors. They can also provide remediation to fix problems encountered in analyzing data sets. This paper discusses key features of data quality analysis systems. We also present new algorithms for efficiently maintaining updated data quality metrics on changing data sets. Our algorithms consider anomalies in data regions when determining how much different regions of data contribute to overall data metrics. We also make intelligent choices about which data metrics to update, and how frequently, in order to limit the overhead of data quality metric updates.
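
To make the idea of maintaining a metric on a changing data set concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not specify one. It assumes a single metric (the fraction of missing values) and a data set split into named regions, and it shows how replacing one region can update the overall metric without rescanning the others. The names `RegionStats`, `MissingRateTracker`, and `update_region` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegionStats:
    """Counts kept per region; the metric is derived from these."""
    total: int = 0
    missing: int = 0


@dataclass
class MissingRateTracker:
    """Hypothetical tracker: overall missing-value rate across regions."""
    regions: Dict[str, RegionStats] = field(default_factory=dict)
    total: int = 0
    missing: int = 0

    def update_region(self, name: str, values: List[Optional[float]]) -> None:
        # Scan only the region that changed.
        old = self.regions.get(name, RegionStats())
        new = RegionStats(
            total=len(values),
            missing=sum(v is None for v in values),
        )
        # Adjust the global counts by the difference between old and new.
        self.total += new.total - old.total
        self.missing += new.missing - old.missing
        self.regions[name] = new

    def missing_rate(self) -> float:
        # An empty data set is reported as having no missing values.
        return self.missing / self.total if self.total else 0.0


tracker = MissingRateTracker()
tracker.update_region("a", [1.0, None, 3.0, 4.0])
tracker.update_region("b", [None, None])
rate_before = tracker.missing_rate()

# Region "b" is repaired; only its counts are recomputed.
tracker.update_region("b", [1.0, 2.0])
rate_after = tracker.missing_rate()
```

Keeping per-region counts is one simple way to bound update cost by the size of the changed region. Weighting regions by detected anomalies, or choosing which metrics to refresh and how often, would sit on top of a structure like this and is not shown here.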
