Bringing Diversity to Genomic Data

Lack of diversity in genomic data is a well-recognized problem that creates challenges for researchers in identifying variants associated with disease, and for clinicians in diagnosing and treating patients. In fact, ethnic minorities may run the risk of being misdiagnosed because of the lack of diversity in genomic databases.

In response, research organizations have launched numerous initiatives to improve genomic diversity. Additionally, scientists are employing strategies such as reaching out to minority populations and collaborating with community health organizations to encourage increased participation in genomic studies. In the meantime, clinical laboratory professionals need to consider the limitations of existing genomic data when analyzing patient samples, according to experts.

Impact of Lack of Diversity

The Genome-Wide Association Studies (GWAS) Catalog, a National Human Genome Research Institute (NHGRI) and European Bioinformatics Institute-sponsored compendium of all published GWAS that have assayed at least 100,000 single nucleotide polymorphisms, illustrates the genomic diversity challenge. The first study showing that GWAS participants skewed heavily toward European populations was published 7 years ago.

An updated analysis by University of Washington researchers found that diversity has “gotten a bit better,” said study co-author Stephanie Fullerton, DPhil, an associate professor of bioethics and humanities (Nature 2016;538:161–4). “But we still have a real problem with representation of genomes of people who are ethnic minorities,” she added.

Fullerton and her colleague Alice Popejoy found genomic data diversity has improved because more Asian populations have been included. However, African Americans and Hispanics remain underrepresented. Lack of diversity limits geneticists’ ability to investigate associations between variants and disease and response to medications, the authors noted. Not having accurate and more complete genomic information also matters in clinical practice—individuals might be prescribed the wrong medications or undergo unnecessary surveillance and health interventions, which increases healthcare costs, said Fullerton.

In pursuit of precision healthcare, researchers need to move from GWAS to whole genome and whole exome sequencing, which enables them to assess rarer variations that are clinically relevant for diagnosis and treatment, noted Fullerton. These rare variants are “more likely to be ancestry-specific,” said Lawrence C. Brody, PhD, director of the Division of Genomics and Society and senior investigator at NHGRI. By definition, they also are less common, so researchers need to study more people to discern their effects.

However, whole genome sequencing also has limitations because of a lack of diversity in study populations. For example, a recent analysis found that African Americans had been misdiagnosed with the inherited heart disease hypertrophic cardiomyopathy (HCM) because this population had been underrepresented in genomic control databases (N Engl J Med 2016;375:655–65).

“It is well recognized that various demographic groups have been poorly represented in past genomic studies and in control databases that have been used to identify causative genetic variants,” said the study’s first author, Arjun Manrai, PhD, a research fellow in the department of biomedical informatics at Harvard Medical School in Boston. This study demonstrates how such underrepresentation sometimes leads to genetic misdiagnoses in minority groups.
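To see why the makeup of a control database matters, consider a minimal sketch in Python. Every number in it is invented for illustration and none comes from the study or from any real database; the 1% cutoff is likewise an assumption, standing in for the general idea that a variant common in healthy controls is unlikely to cause a rare disease.

```python
# Hypothetical illustration only: all counts are invented, not taken from
# the cited study or from any real control database.

def allele_frequency(alt_alleles: int, total_alleles: int) -> float:
    """Fraction of sampled chromosomes that carry the variant."""
    return alt_alleles / total_alleles

# Assumed cutoff: a variant at or above this frequency in healthy controls
# is treated as too common to cause a rare inherited disease.
BENIGN_THRESHOLD = 0.01

# Hypothetical control cohort dominated by one ancestry group.
controls = {
    "european": {"alt": 1, "total": 9800},
    "african": {"alt": 6, "total": 200},
}

# Frequency when all controls are pooled, ignoring ancestry.
pooled_af = allele_frequency(
    sum(group["alt"] for group in controls.values()),
    sum(group["total"] for group in controls.values()),
)

# Frequency within each ancestry group.
per_group_af = {
    name: allele_frequency(group["alt"], group["total"])
    for name, group in controls.items()
}

print(f"pooled: {pooled_af:.4%}, looks rare: {pooled_af < BENIGN_THRESHOLD}")
for name, af in per_group_af.items():
    print(f"{name}: {af:.4%}, looks rare: {af < BENIGN_THRESHOLD}")
```

With these invented counts, the pooled frequency (0.07%) falls below the assumed cutoff, so the variant looks rare enough to be suspicious. Among the African-ancestry controls alone it is 3%, which is above the cutoff. A database with few such controls can therefore make a common, harmless variant look like a disease-causing one.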

“Studies are urgently needed, both for HCM and other inherited diseases, to fully measure the impact of genomics in risk management and diagnosis across diverse populations,” added Manrai.

Toward Increased Diversity

One strategy to improve genomic diversity is using large-scale control databases to classify new genetic variants and systematically reassess prior variant classifications, said Manrai. For example, the genome Aggregation Database (gnomAD) contains sequencing data from more than 140,000 individuals from ancestrally diverse populations.

“We and others are using such resources to systematically re-evaluate prior assertions about which variants are believed to cause disease,” he said, adding that this approach is improving variant classification. Other large-scale control sequence databases such as the International Genome Sample Resource, 1000 Genomes Project, and the National Heart, Lung, and Blood Institute’s (NHLBI) GO Exome Sequencing Project, “are absolutely critical,” Manrai emphasized.

Another NHLBI project, Trans-Omics for Precision Medicine (TOPMed), is pursuing whole genome sequencing of diverse populations typically underrepresented in research. As Popejoy and Fullerton noted in their analysis, about half of TOPMed’s samples are from Americans of European descent, while 30%, 10%, and 8% are from African Americans, Latin Americans, and Asians, respectively. In contrast, of the 35 million samples represented in GWAS as of 2016, 81% were of European ancestry, while 3%, 0.54%, and 14% were of African, Hispanic and Latin American, and Asian ancestry, respectively.
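For a sense of scale, the GWAS percentages can be converted into approximate sample counts. The short Python sketch below uses only the 35 million total and the four percentages quoted above; the variable names are illustrative, and the results are rounded estimates rather than figures reported in the analysis.

```python
# Figures quoted in the article from Popejoy and Fullerton's analysis.
TOTAL_GWAS_SAMPLES = 35_000_000
gwas_share_percent = {
    "European": 81,
    "African": 3,
    "Hispanic and Latin American": 0.54,
    "Asian": 14,
}

# Convert each percentage into an approximate number of samples.
approx_counts = {
    ancestry: round(TOTAL_GWAS_SAMPLES * pct / 100)
    for ancestry, pct in gwas_share_percent.items()
}

for ancestry, count in approx_counts.items():
    print(f"{ancestry}: about {count:,} samples")
```

On these figures, the 0.54% share corresponds to roughly 189,000 samples, compared with about 28 million samples of European ancestry.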

Additionally, NHGRI’s Population Architecture using Genomics and Epidemiology (PAGE) study aims to better understand the genetics of diverse populations, said Brody. One goal of PAGE involves measuring genetic variants using a specially designed multi-ethnic genotyping array.

NHGRI is also leading the design of the genetic and genomic aspects of the National Institutes of Health’s (NIH) landmark All of Us program, said Brody. All of Us plans to enroll 1 million people across the U.S. from diverse social, racial, ethnic, ancestral, geographic, and economic backgrounds. Although All of Us is not primarily a genomic study, having genetic information from so many people will be an excellent resource for interpreting the rest of the U.S. population’s genetic information, he explained.

Another NIH program, Human Heredity and Health in Africa (H3Africa), is being conducted by Africa-based researchers who are studying people within their respective countries. These investigators are developing plans to share their data with the global research community. This information will help “quite a few but not all” African Americans whose ancestry may be represented in these databases, Brody added.

As seminal as H3Africa might be, simply knowing about African genetic diversity is not sufficient if researchers want to understand the way genes and diseases relate, cautioned Fullerton. “We need to be doing that analysis in context. In other words, it’s just as important to analyze African American genomes if we’re interested in providing equitable healthcare to African Americans.”

The National Cancer Institute-funded Detroit Research on Cancer Survivors (ROCS) will provide researchers a resource for studying germline and somatic genetic variations in lung, breast, prostate, and colon cancers among African Americans, said Ann G. Schwartz, PhD, MPH, deputy director and executive vice president for research and academic affairs at the Barbara Ann Karmanos Cancer Institute and professor and associate chair of oncology at Wayne State University School of Medicine in Detroit.

In addition to genetic data, Schwartz and her colleagues plan to assess associations between outcomes and treatments received, family history of cancer, and a multitude of other potential prognostic factors.

Community Outreach

As efforts such as All of Us, TOPMed, and ROCS get under way, researchers interested in improving genomic diversity have several strategies available to make databases more inclusive.

Reaching out to more diverse communities is a critical starting point, noted Fullerton. However, enticing people from traditionally underrepresented groups to participate in genomic studies remains a challenge, she cautioned. Researchers need to meet potential study participants in community healthcare settings, where academic research is not usually conducted.

Collaborating with community health organizations to answer questions that may be of interest to them and providing genetic services in exchange for genomic information might be one way to attract participation from underrepresented populations, suggested Fullerton. “We also need to put genomic research in context and think about how it intersects with the provision of healthcare and the problem of return of research results,” she said.

In addition to reaching out to minority populations to conduct large genomic studies, researchers also need “access to funding to support expensive genomic research,” said Schwartz.

NHGRI is offering several grants to encourage collecting data from diverse populations, noted Brody. For example, the Clinical Sequencing Exploratory Research program is providing funds for genomic sequencing in healthcare settings that serve “ancestrally and socioeconomically diverse patients.”

U.S.-based researchers also need to inspire scientists from across the globe to generate and share their genomic data, said Brody. “For example, we might have a small number of Indonesians or Bolivians in this country, but they are important to us and we want to be able to interpret their genomes. If researchers in Indonesia or Bolivia could share their data, it would be helpful to people in the United States.” In turn, data from the U.S. would benefit other countries, he explained.

On the Frontlines

While researchers and organizations continue to find ways to improve genomic data diversity, clinical laboratory professionals need to be cognizant of the limitations of available genomic and exomic databases, said Fullerton. Specifically, clinical laboratories need to consider that ancestral origins are important when interpreting genetic variants, added Brody. Laboratorians should be aware of which populations were included in the reference databases the lab is using and have information on the ancestry of the patients they are testing, he said.

Laboratorians also need to consider the differences between rare variants and common variants. Most common variants are found across the globe, said Brody. Rare variants, by contrast, are more likely to be ancestry-specific, and individuals who have a strongly penetrant rare variant associated with a specific medical condition such as cystic fibrosis or breast cancer are more likely to develop the associated disease.

Some but not all clinical laboratories have protocols for periodically updating their internal variant classifications, using resources such as control databases and guidelines on interpreting variants, said Manrai. “This can be arduous but is absolutely critical,” he said. “Variant classifications are proving to be much more fluid than once believed.”

The American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology in 2015 jointly issued standards and guidelines for interpreting sequence variants, and ACMG in 2013 issued and has continued to update recommendations for reporting incidental findings from clinical genome and exome sequencing.

Overall, while large-scale control databases are crucial to understanding the range of normal genomic variation across ancestrally diverse groups, “we lack a comparably powerful resource for researchers and physicians to explore the spectrum of normal clinical variation in individuals of diverse ancestry in the context of their genomic findings,” said Manrai. “This would be a game changer.”

Heather Lindsey is a writer in Maplewood, New Jersey.