Publication
IBM J. Res. Dev
Paper

Curating and integrating user-generated health data from multiple sources to support healthcare analytics

Abstract

As the volume and variety of healthcare-related data continue to grow, the analysis and use of these data will increasingly depend on the ability to appropriately collect, curate, and integrate disparate data from many different sources, including user-generated health data. We describe our approach to, and highlight our experiences with, the development of a robust data curation process that supports healthcare analytics. The process consists of the following steps: collection, understanding, validation, cleaning, integration, enrichment, and storage. It has been successfully applied to the processing of a variety of data types, including clinical data from electronic health records and observational studies, genomic data, microbiome data, self-reported data from surveys, and self-tracked data from wearables from more than 600 subjects. The curated data have been used to support a number of healthcare analytic applications, including descriptive analytics, data visualization, patient stratification, and predictive modeling.
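
To make the middle steps of the process concrete, the following is a minimal Python sketch of validation, cleaning, integration, and enrichment applied to two toy sources. It is not the authors' implementation: the function names, the field names (subject_id, daily_steps, sleep_quality, met_step_goal), the sample records, and the 8000-step threshold are all assumptions made for illustration, and the collection, understanding, and storage steps are left out.

```python
from collections import defaultdict


def validate(records):
    # Validation (illustrative rule): drop records with no usable subject identifier.
    return [r for r in records if str(r.get("subject_id") or "").strip()]


def clean(records):
    # Cleaning: normalize identifiers so the same subject matches across sources.
    return [{**r, "subject_id": str(r["subject_id"]).strip().lower()} for r in records]


def integrate(*sources):
    # Integration: merge the records from every source into one dict per subject.
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record["subject_id"]].update(record)
    return dict(merged)


def enrich(subjects, step_goal=8000):
    # Enrichment: add a derived field. The threshold is a hypothetical value.
    for subject in subjects.values():
        if "daily_steps" in subject:
            subject["met_step_goal"] = subject["daily_steps"] >= step_goal
    return subjects


def curate(*sources):
    # Validate and clean each source separately, then integrate and enrich.
    prepared = [clean(validate(source)) for source in sources]
    return enrich(integrate(*prepared))


# Hypothetical sample data standing in for wearable and survey sources.
wearables = [
    {"subject_id": " S01 ", "daily_steps": 9500},
    {"subject_id": None, "daily_steps": 4000},  # fails validation
]
surveys = [
    {"subject_id": "s01", "sleep_quality": "good"},
    {"subject_id": "S02", "sleep_quality": "poor"},
]

curated = curate(wearables, surveys)
```

Here the wearable and survey records for subject s01 end up in a single merged entry, while the record without an identifier is dropped before integration. Keeping each step as a separate function is one possible design; it lets individual rules be tested or replaced without touching the rest of the chain.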
