Refining inclusive genetic prediction models for diverse populations
Genetic research faces a pressing problem: most existing models for estimating a person's genetic risk of a particular disease or trait are built primarily on data from people of European descent. As a result, they make less accurate predictions for people of non-European ancestry, including individuals of mixed, or admixed, ancestry.
To tackle this disparity, researchers at MIT and elsewhere have developed a model that incorporates genetic data from people of many ancestries around the world. The approach aims to make genetic predictions more accurate, particularly for underrepresented groups, and to promote health equity by spreading the benefits of genomic sequencing more evenly across the globe.
"For people of African ancestry, our model proved to be about 60 percent more accurate on average," says Manolis Kellis, a professor at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Broad Institute of MIT and Harvard. "For people of admixed genetic backgrounds more broadly, who have been excluded from most previous models, the accuracy of our model increased by an average of about 18 percent."
The researchers hope that their new method, which is now available to the broader scientific community, will improve health outcomes for a wide range of people and help reduce health inequities. By making predictions more accurate for admixed and ancestry-diverse individuals, they aim to ensure that the benefits of human genetics research are shared equitably.
The work builds on the Human Genome Project, which sequenced and mapped the human genome, and on subsequent large-scale cohort studies of how genetic variants influence disease risk and other differences between individuals. These studies have shown that individual genetic variants generally have only small effects on their own, but that many such small effects together influence the risk of heart disease, diabetes, stroke, and psychiatric disorders such as schizophrenia.
However, because most genome-wide association studies have included few people of non-European descent, polygenic risk models built from these studies have limited applicability to non-European populations. People from different geographic regions can carry distinct patterns of genetic variation, shaped by genetic drift, population history, and environmental pressures. For instance, genetic variants that protect against malaria are more common in people of African descent than in other populations, and these variants also affect other immune-related traits.
To overcome these limitations, the MIT team used computational and statistical techniques that model each individual's genetic profile rather than grouping individuals by population. This let them include people of admixed ancestry, a growing segment of the population that was underrepresented in most previous models.
When the researchers applied their model to data from the UK Biobank, they found that its predictions were more accurate for every genetic ancestry group than those of models trained only on European-ancestry individuals. The largest gains were for people of African ancestry, for whom accuracy improved by an average of 61 percent, even though they made up only about 1.5 percent of the UK Biobank samples. Accuracy also improved for people of South Asian ancestry (11 percent), white British people (5 percent), and individuals of admixed ancestry (18 percent).
As the model is developed further and integrated into disease risk assessment, it could have substantial implications for personalized medicine, disease diagnosis, and the reduction of health disparities. The researchers now plan to expand it with more diverse datasets, including data from the United States, and to apply it to traits beyond those analyzed in this study.
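The idea that many small effects add up is usually formalized as a polygenic score: a weighted sum of a person's risk-allele counts, with one per-allele effect size per variant. The Python sketch below is a minimal illustration under that standard additive assumption; the function name `polygenic_score`, the 1,000 variants, and the uniform effect size of 0.01 are hypothetical, not values from the study.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[int], effects: Sequence[float]) -> float:
    """Additive score: sum over variants of (risk-allele count x per-allele effect).

    Dosages are 0, 1, or 2 copies of the risk allele; `effects` holds one
    (hypothetical) effect size per variant, in the same order.
    """
    if len(dosages) != len(effects):
        raise ValueError("need one effect size per variant")
    return sum(d * b for d, b in zip(dosages, effects))


# Hypothetical setup: 1,000 variants, each adding 0.01 per risk allele.
n_variants = 1000
effects = [0.01] * n_variants

one_copy_each = polygenic_score([1] * n_variants, effects)
two_copies_each = polygenic_score([2] * n_variants, effects)
largest_single_variant = 2 * max(effects)  # most any one variant can contribute

print(round(largest_single_variant, 3))  # 0.02
print(round(one_copy_each, 3))           # 10.0
print(round(two_copies_each, 3))         # 20.0
```

No single variant moves the score by more than 0.02 here, yet a person carrying two risk alleles at every variant scores 10 points higher than one carrying a single allele at each, which is the cumulative pattern the studies describe.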
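One concrete consequence of population-specific variation is that a variant can be common in one group but never vary in another, so a model trained only on the second group has no data from which to estimate that variant's effect. The sketch below is a minimal illustration, assuming genotypes are stored as 0/1/2 allele counts per person; it computes per-group allele frequencies and flags such variants. The group names, variant IDs, and genotypes are hypothetical toy data, not real variants or frequencies.

```python
from typing import Dict, List

# Hypothetical layout: variant ID -> one 0/1/2 allele count per person.
Genotypes = Dict[str, List[int]]


def allele_frequency(dosages: List[int]) -> float:
    """Alternate-allele frequency for diploid individuals (two copies each)."""
    return sum(dosages) / (2 * len(dosages))


def frequency_table(groups: Dict[str, Genotypes]) -> Dict[str, Dict[str, float]]:
    """Allele frequency of every variant, computed separately within each group."""
    return {
        group: {variant: allele_frequency(d) for variant, d in genos.items()}
        for group, genos in groups.items()
    }


def uninformative_variants(genos: Genotypes) -> List[str]:
    """Variants whose genotype never varies in this group: a model trained
    only on this group has no information about their effects."""
    return [variant for variant, d in genos.items() if len(set(d)) == 1]


# Hypothetical toy data: var_b is absent from population_A.
groups = {
    "population_A": {"var_a": [0, 1, 0, 1], "var_b": [0, 0, 0, 0]},
    "population_B": {"var_a": [1, 0, 1, 0], "var_b": [1, 2, 0, 1]},
}

print(frequency_table(groups))
# {'population_A': {'var_a': 0.25, 'var_b': 0.0}, 'population_B': {'var_a': 0.25, 'var_b': 0.5}}
print(uninformative_variants(groups["population_A"]))  # ['var_b']
```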
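The benefit of fitting one model on every individual's genotypes, rather than training on a single population, can be illustrated with a toy simulation. This sketch assumes ridge regression as a stand-in; the article does not describe the study's actual statistical method, and the allele frequencies, effect sizes, sample sizes, and group labels (`eur`, `afr`, `adm`) below are all invented. Variant 1 never varies in the simulated European-ancestry cohort, so a model trained on that cohort alone cannot learn its effect, while a pooled model that also includes African-ancestry and admixed individuals can, without needing any ancestry label.

```python
from typing import Tuple

import numpy as np

rng = np.random.default_rng(seed=0)
TRUE_EFFECTS = np.array([0.5, 0.5])  # hypothetical per-allele effects of two variants


def simulate(n: int, freqs: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Draw 0/1/2 dosages at the given allele frequencies and a noisy additive trait."""
    X = rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)
    y = X @ TRUE_EFFECTS + rng.normal(0.0, 0.5, size=n)
    return X, y


def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression without intercept: (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


# Hypothetical allele frequencies: variant 1 is absent from the "eur" cohort.
X_eur, y_eur = simulate(2000, np.array([0.3, 0.0]))
X_afr, y_afr = simulate(200, np.array([0.3, 0.5]))
X_adm, y_adm = simulate(200, np.array([0.3, 0.25]))
X_test, y_test = simulate(1000, np.array([0.3, 0.5]))  # held-out "afr"-like people

beta_eur_only = fit_ridge(X_eur, y_eur)
beta_pooled = fit_ridge(
    np.vstack([X_eur, X_afr, X_adm]),  # every individual, no ancestry labels
    np.concatenate([y_eur, y_afr, y_adm]),
)


def heldout_mse(beta: np.ndarray) -> float:
    """Mean squared prediction error on the held-out individuals."""
    return float(np.mean((X_test @ beta - y_test) ** 2))


print("variant-1 effect, eur-only vs pooled:", beta_eur_only[1], beta_pooled[1])
print("held-out MSE, eur-only:", round(heldout_mse(beta_eur_only), 3))
print("held-out MSE, pooled:  ", round(heldout_mse(beta_pooled), 3))
```

Because variant 1's column is all zeros in the European-only training data, the closed-form ridge solution sets its coefficient to zero, and that model's error on the held-out individuals is correspondingly larger.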
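Figures such as "61 percent" are relative improvements: the gain over the baseline model divided by the baseline's accuracy. The sketch below assumes that definition and that the average is taken over traits with equal weight; the article does not name the accuracy metric, and the function names and per-trait values here are hypothetical.

```python
from typing import Iterable, Tuple


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline` for a higher-is-better metric."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (new - baseline) / baseline


def mean_relative_improvement(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average of per-trait percentage gains, each trait weighted equally."""
    gains = [relative_improvement(b, n) for b, n in pairs]
    if not gains:
        raise ValueError("need at least one trait")
    return sum(gains) / len(gains)


# Hypothetical (baseline, new) accuracy values for two traits: gains of 60% and 50%.
traits = [(0.05, 0.08), (0.10, 0.15)]
print(round(mean_relative_improvement(traits), 1))  # 55.0
```

A relative figure says nothing about absolute accuracy on its own: going from 0.05 to 0.08 is a 60 percent gain even though the new value is still small.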
"Our work highlights the power of diversity, equity, and inclusion efforts in the context of genomics research," Tanigawa says. The research was funded by the National Institutes of Health.
- Genetic research has brought a pressing issue to light: genetic risk models are less accurate for people of non-European ancestry.
- To address this inequity, scientists have developed a model that uses genetic data from people of many ancestries, aiming to improve predictions for underrepresented groups.
- The MIT-led team reported an average accuracy increase of about 60% for people of African ancestry and 18% for those of admixed ancestry.
- The researchers hope their methodology, now available to the wider scientific community, will improve health outcomes for diverse populations and help reduce health disparities.
- The work builds on the Human Genome Project and subsequent studies showing that individual genetic variants, though small in effect on their own, collectively influence the risk of medical conditions including heart disease and psychiatric disorders.
- However, most genome-wide association studies included few people of non-European descent, limiting the applicability of polygenic risk models to these populations.
- To overcome this limitation, the MIT team used computational techniques that model each individual's genetic profile, which made it possible to include individuals of admixed ancestry.
- Applied to UK Biobank data, the new model made more accurate predictions for all genetic ancestry groups, with the largest gain for people of African ancestry and smaller gains for people of South Asian, white British, and admixed ancestry.