IBM Research and the Broad Institute seek to unravel the true risks of genetic diseases
In 2019, IBM and the Broad Institute of MIT and Harvard started a multi-year collaborative research program to develop powerful predictive models that can potentially enable clinicians to identify patients at serious risk for cardiovascular disease 1.
At the start of our collaboration, we proposed an approach to develop AI-based models that combine and analyze a multitude of genetic risk factors within an individual’s genome, along with their clinical health records and biomarker data, to more accurately predict the onset of complex and often fatal conditions, such as heart attacks, sudden cardiac death, and atrial fibrillation. In order to effectively combine these different modalities of patient health information, a solid understanding of the individual data components is necessary.
During the initial phases of our partnership, the team has focused on building this foundation of improved understanding, characterization, and quantification of the component genomic, clinical, and biomarker data.
As a significant result of our work together, we recently published research in Nature Communications 2 along with health technology company Color, a co-first author of the paper. Focused on genetic risk, the team examined whether polygenic background (the many common variants and other factors spread across an individual’s genome) can influence the occurrence of disease in tier 1 genomic conditions – such as familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome.
Our work focused on one key question: Can disease risk from a single genetic variant – such as a variant in the APOB, LDLR, or PCSK9 genes associated with familial hypercholesterolemia – be significantly raised or lowered by multiple risk factors that may each cause minor changes within a wide range of seemingly unconnected cellular pathways?
We analyzed de-identified data from more than 80,000 individuals in two large data sets (UK Biobank and Color). The findings confirm and extend prior observations of the high variability of disease incidence in carriers of monogenic risk variants.
Among carriers of a monogenic risk variant, there are substantial variations in disease risk based on polygenic background. In other words, being a monogenic risk variant carrier does not automatically result in developing the disease.
As we note in our new research, the probability of developing coronary artery disease (CAD) by age 75 years ranges from 17.5 percent for carriers of a familial hypercholesterolemia monogenic risk variant in the lowest percentile of the polygenic score, to 77.9 percent for those in the highest polygenic score percentile. This means monogenic carriers with a low-risk polygenic score can have a CAD risk approaching the average for noncarriers (13 percent). Conversely, noncarriers with a high-risk polygenic score may have a CAD risk approaching the average for carriers (41 percent).
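To make the spread in these figures concrete, the short Python sketch below compares the quoted probabilities as a risk ratio and an odds ratio. It is illustrative arithmetic only: the four probabilities are the ones quoted above, while the function and constant names are hypothetical, and nothing here reproduces the statistical models used in the paper.

```python
# Illustrative arithmetic on the probabilities quoted in the text.
# All names below are hypothetical and are not taken from the paper.

def odds(p: float) -> float:
    """Convert a probability in [0, 1) to odds."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p / (1.0 - p)


def risk_ratio(p_high: float, p_low: float) -> float:
    """Ratio of two absolute risks."""
    return p_high / p_low


def odds_ratio(p_high: float, p_low: float) -> float:
    """Ratio of the odds corresponding to two absolute risks."""
    return odds(p_high) / odds(p_low)


# Probability of CAD by age 75, as quoted in the text.
CARRIER_LOW_PGS = 0.175   # carriers, lowest polygenic score percentile
CARRIER_HIGH_PGS = 0.779  # carriers, highest polygenic score percentile
NONCARRIER_AVG = 0.13     # average for noncarriers
CARRIER_AVG = 0.41        # average for carriers

print("carrier high vs. low polygenic score:")
print("  risk ratio:", round(risk_ratio(CARRIER_HIGH_PGS, CARRIER_LOW_PGS), 2))
print("  odds ratio:", round(odds_ratio(CARRIER_HIGH_PGS, CARRIER_LOW_PGS), 2))
print("low-score carrier minus noncarrier average (percentage points):",
      round((CARRIER_LOW_PGS - NONCARRIER_AVG) * 100, 1))
```

Among carriers, the highest-percentile risk is about 4.5 times the lowest-percentile risk, while a low-score carrier sits only 4.5 percentage points above the noncarrier average.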
For individuals with familial hypercholesterolemia, these refined risk estimates, obtained by combining monogenic and polygenic risks, may better inform shared decision making about the timing and intensity of lipid-lowering therapy. Overall, accounting for polygenic background is likely to increase the accuracy of risk estimation for individuals who inherit a monogenic risk variant, and could provide clinicians with another piece of the puzzle as to the true risk of an individual actually developing a condition. When combined with other clinical information, such as medical history, lifestyle and other risk factors, this could help doctors to better assess prevention plans as well as potential treatments and therapies.
By leveraging large databases to combine and analyze medical and genomic data from tens of thousands of people, we have been able to shed significant new light on a number of serious, chronic diseases that affect millions of people globally, such as breast cancer, familial hypercholesterolemia and Lynch syndrome.
Ultimately, our findings unveil a silver lining: even if an individual carries a genetic mutation associated with one of these diseases, their absolute risk might not be as set in stone as previously thought. In fact, their absolute risk might be nearly equivalent to an individual who doesn’t carry the mutation at all – depending on other factors and mutations within their specific genome.
In the context of our larger collaborative goal of developing multi-modal AI-based risk prediction models, findings from this study (and similar ones that are underway) can help inform how we can best combine the genetic risk information with the clinical and biomarker data. As we continue to develop a solid foundational understanding of the component data types and move towards integrating them and modeling them jointly, we remain optimistic that our work can potentially help advance precision medicine for patients.