Survey of public assay data: Opportunities and challenges to understanding antimicrobial resistance
Abstract
Learning methods allow researchers to make predictions, draw inferences, and automate the generation of mathematical models. These models are crucial to solving real-world problems such as antimicrobial resistance (AR). Machine learning (ML) methods depend on ground truth data to achieve high specificity and sensitivity. Because AR assay data are limited, it is essential to understand the distribution of the ground truth data, the analyses for which it is suited, and any limitations that bias downstream methods. In this paper, we report an analysis of bacterial biochemical assay data associated with whole-genome sequencing (WGS) and discuss important implications of using assay data in combination with genetic features as training data for ML models.