European Skew in Genetic Research Databases Won’t Abate Without More Concerted Effort
By Robert I. Field, Anthony W. Orlando and Arnold J. Rosoff
The “pangenome” project that is mapping the genetic sequences of 47 diverse human beings from around the world is a much-needed step forward for science and humanity. The lack of racial diversity in genetic databases used in research has been noted for some time, and it has raised growing concerns about the development of clinical applications based on research results. The pangenome does not eliminate these concerns, but it calls greater attention to the significant racial underrepresentation that remains in most databases currently used in research.
For genomic medicine to have widespread effectiveness, it is important that it be based on the study of a diverse pool of subjects. This is especially true in the development of “precision medicine,” in which therapies are tailored to a patient’s genetic characteristics. If a patient’s genetic traits are not represented in a database that was used for the research that led to a treatment, that treatment may be less effective or even risky for them.
Black Americans, as well as members of other racial and ethnic minorities, are most commonly underrepresented in medical research, yet they are the ones most vulnerable to the effects of such underrepresentation. This can create yet another instance of racial disadvantage in health care. However, a recent study of genetic researchers found that investigators tend to give only limited consideration to demographic diversity when selecting a database to use, with more attention paid to ease of access and other logistical considerations.
Lack of Diversity in an Illustrative Area – Alzheimer’s Disease Research
We conducted an analysis to investigate the extent to which the diversity of research databases has changed over time. We focused on an area of genetic research that would be manageable in size yet illustrative, susceptibility to Alzheimer’s disease, and reviewed more than 150 studies published between 1997 and 2022. We measured the diversity of each database as the reported percentage of subjects who are not of European ancestry, an approach similar to one used in a study of the racial diversity of genetic samples used to study cardiovascular disease. Weighting all studies equally, we found that 92 percent reported the diversity of the databases used. However, since larger samples may be more important in developing precision medicine tools, we also weighted studies by sample size. When studies were weighted this way, the share reporting demographic information rose to 96 percent.
Our analysis confirmed widespread underrepresentation among the studies that reported database demographic information. In only 23 percent of them did the databases qualify as diverse. The largest portion of studies, 42 percent, contained predominantly European genomes. The skew was even more pronounced when we weighted the studies by sample size. Using this metric, 84 percent of the genomes studied were European.
These studies are counterbalanced somewhat by a number of studies containing predominantly non-European genomes; however, these studies tend to use very small samples. Hence, underrepresentation is partly the result of the lack of diverse study populations but also partly the result of the small sample size when non-European subjects are included. We also found that genome-wide association studies (GWASs) are more likely than single-gene studies to use predominantly European databases, yet these are the studies for which broad genetic representation is most important. Moreover, the pattern appears to have changed little over the study period.
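The gap between the equal-weighted and sample-size-weighted figures can be illustrated with a short sketch. The three studies below are hypothetical numbers chosen only to show the arithmetic; they are not drawn from our data, and the names `Study`, `equal_weighted_mean`, and `sample_weighted_mean` are illustrative assumptions rather than the code used in our analysis.

```python
from dataclasses import dataclass


@dataclass
class Study:
    n_subjects: int       # total number of subjects reported by the study
    pct_european: float   # share of subjects of European ancestry, 0 to 100


def equal_weighted_mean(studies):
    """Average European share, counting every study once."""
    return sum(s.pct_european for s in studies) / len(studies)


def sample_weighted_mean(studies):
    """Average European share, weighting each study by its sample size."""
    total_subjects = sum(s.n_subjects for s in studies)
    weighted_sum = sum(s.pct_european * s.n_subjects for s in studies)
    return weighted_sum / total_subjects


# Hypothetical data: one large, mostly European study and two small,
# mostly non-European studies.
studies = [
    Study(n_subjects=50_000, pct_european=95.0),
    Study(n_subjects=200, pct_european=10.0),
    Study(n_subjects=300, pct_european=30.0),
]

equal = equal_weighted_mean(studies)
weighted = sample_weighted_mean(studies)
print(f"Equal-weighted European share:  {equal:.1f}%")
print(f"Sample-weighted European share: {weighted:.1f}%")
```

In this made-up example, two of the three studies are predominantly non-European, so the equal-weighted average looks fairly balanced. Because those two studies are small, they barely move the sample-weighted average, which stays close to the large study's figure. This is the same mechanism by which small non-European samples do little to offset large European ones.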
We also investigated the role of journals in publishing Alzheimer’s disease genetic studies with differing levels of diversity. Of note is the number of journals that have published only studies with 100 percent European subjects. However, there is also a sizeable minority of journals that have published studies with less than 50 percent European subjects. Thus, the nature of the journal itself, rather than trends over time, appears to matter most for the representativeness of databases used in published research. This suggests that disseminating research with diverse subject pools is not merely theoretically possible; several journals are already doing it.
Implications for Clinical Applications
What does this mean for clinical applications of genetic research? Clearly, it raises concerns over effectiveness, since excluding the genomes of large segments of the population means that genotypes prevalent in those populations could be missed. It may also affect the willingness of members of underrepresented groups to use those applications. Patients who are Black or members of other demographic groups that have a history of disadvantage in the health care system may feel especially suspicious of precision medicine and other genetically based therapies.
Racial disparities in care already pervade much of the American health care system, and the perception and experience of these disparities have diminished trust in medicine among many Black patients. Previous findings have noted underrepresentation of Black participants in many conventional clinical trials. Research suggests that lack of racial representation in the research used to develop new genetically based clinical tools could reinforce disparities in their use and foster further inequities in care.
Potential Solutions
Initiatives such as the pangenome project will add important resources for genetic researchers seeking more diverse pools of subjects for study. An especially important contribution may be made by the NIH’s All of Us initiative, which seeks to build a database with a million genomes from a racially and ethnically diverse population. Several health systems are also creating their own research databases with diverse racial and ethnic representation.
Another approach would be for commercial genetic testing companies to reach out more aggressively to members of underrepresented communities. The industry is aware of this challenge, and some companies claim to be striving toward this goal. Our study suggests that more work in this regard is needed.
Our research findings also point to another potential approach: amending journal policies to incentivize researchers to seek out more diverse databases by considering genetic diversity in reviewing studies submitted for publication. At a minimum, journal editors could require discussion of diversity as a limitation on the generalizability of findings. Journals serve as gatekeepers for research diffusion by determining which findings are published and by setting the preconditions for publication. Their ability to use these powers to promote awareness of research on racial inequities in health care has been noted by others.
Conclusion
Genomic medicine holds tremendous promise for curing diseases and saving lives. However, it is still a developing field. As it evolves, it should take care not to extend disparities that are already endemic in American health care. Our study indicates that, in at least one important field of genetic research, members of historically excluded populations remain underrepresented. The same is likely true in other fields. Clearly, continued efforts to increase diversity in genetic research are needed.
Robert I. Field, JD, MPH, PhD, is Professor of Law at the Kline School of Law and Professor of Health Management and Policy at the Dornsife School of Public Health at Drexel University.
Anthony W. Orlando, MSc, MPW, PhD, is Assistant Professor of Finance, Real Estate, and Law at the College of Business Administration at California State Polytechnic University, Pomona.
Arnold J. Rosoff, JD, is Professor Emeritus of Legal Studies and Health Care Management at The Wharton School of the University of Pennsylvania.