About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Paper
Selection of 51 predictors from 13,782 candidate multimodal features using machine learning improves coronary artery disease prediction
Abstract
Current cardiovascular risk assessment tools use a small number of predictors. Here, we study how machine learning might: (1) enable principled selection from a large multimodal set of candidate variables and (2) improve prediction of incident coronary artery disease (CAD) events. An elastic net-based Cox model (ML4HEN-COX) trained and evaluated in 173,274 UK Biobank participants selected 51 predictors from 13,782 candidates. Beyond most traditional risk factors, ML4HEN-COX selected a polygenic score, waist circumference, socioeconomic deprivation, and several hematologic indices. A more than 30-fold gradient in 10-year risk estimates was noted across ML4HEN-COX quintiles, ranging from 0.25% to 7.8%. ML4HEN-COX improved discrimination of incident CAD (C-statistic = 0.796) compared with the Framingham risk score, pooled cohort equations, and QRISK3 (range 0.754–0.761). This approach to variable selection and model assessment is readily generalizable to a broad range of complex datasets and disease endpoints.