Reference Human Genomes and the Missing African DNA

20 June 2020


Dr Abdulrazak Ibrahim

Have you heard of 54gene? It is a start-up whose mission is to increase access to genomic data from African populations. Wait a second: what does this mean? To understand this jargon, you first need to know more about the language of life.

The language of life, as encoded in our DNA, is written using four letters: A, C, G and T. Humans typically have about 3 billion of those letters packed into each cell. If the DNA of one cell were stretched out, it would be about 2 m long. Doing that for the DNA in all our cells (a human body contains about 30 trillion cells) would cover roughly twice the diameter of the Solar System. A comparison of the 3 billion letters in your cell with your child’s will show that 1.5 billion of those letters came from you and the other half from the partner with whom you had the child. Brothers and sisters of the same parents will thus have roughly the same letters. This is why siblings look alike. But they also look different, because the way these letters are arranged in their individual DNA is different.
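The length estimate above is simple arithmetic, and a minimal sketch makes it easy to check. It uses only the article's round figures (2 m of DNA per cell, 30 trillion cells); the conversion to astronomical units (AU, the mean Earth-Sun distance) is added here for scale and is not from the article.

```python
# Round figures taken from the article.
DNA_PER_CELL_M = 2.0      # metres of DNA in one cell
CELLS_IN_BODY = 30e12     # about 30 trillion cells

# Added for scale (not from the article): 1 AU is defined as 149,597,870,700 m.
AU_M = 1.495978707e11

total_m = DNA_PER_CELL_M * CELLS_IN_BODY
total_au = total_m / AU_M

print(f"{total_m:.1e} m")                    # 6.0e+13 m
print(f"{total_m / 1e12:.0f} billion km")    # 60 billion km
print(f"{total_au:.0f} AU")                  # 401 AU
```

How this compares with "the diameter of the Solar System" depends on where one draws the Solar System's edge, so the sketch reports the total length rather than a ratio.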

For example, if one sibling has the sequence “AACTG” in one region, another may have “AAGTG” in the same region. Notice that the difference here is only in the third letter. In the jargon of molecular biology, this is called a Single Nucleotide Polymorphism (SNP, pronounced “snip”). On average, SNPs occur about once in every 1,000 letters of DNA. SNPs and other sources of variation are significant enough to cause two individuals with largely the same DNA to look different. So, when you inherit 50% of your DNA from each of your parents, you also inherit SNPs from each of them, which together form your own unique set of SNPs. The more closely related you are to a person, the more similar your SNPs are. This is why people of the same tribe, ethnicity or language group also tend to look similar.
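The sibling example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the two sequences are already aligned and of equal length, and the helper name `find_snps` is hypothetical, not part of any genomics library. Real variant detection works on far longer sequences and has to handle insertions, deletions and sequencing errors.

```python
def find_snps(seq_a, seq_b):
    """Return (position, letter_a, letter_b) for every single-letter difference.

    Positions are 1-based, so the "third letter" is position 3.
    Assumes both sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


sibling_1 = "AACTG"
sibling_2 = "AAGTG"
print(find_snps(sibling_1, sibling_2))  # [(3, 'C', 'G')]

# At roughly one SNP per 1,000 letters, 3 billion letters imply
# on the order of 3 million SNPs.
expected_snps = 3_000_000_000 // 1_000
print(expected_snps)  # 3000000
```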

To understand the biology of any living system, including its SNPs, scientists have developed techniques to sequence and read all the DNA letters of organisms, including humans. This is crucial, for example, for designing drugs or vaccines. However, it is not practical to sequence the DNA of every single human being. Consequently, biologists rely on what is known as the reference genome: an assembly of DNA letters from people of various ethnicities, put together to construct an average representation of the genomes of all humans. A reference genome can be used to study whether or not a drug can work in humans.

While the much-celebrated Human Genome Project was completed in 2003, a 2018 review of the current human reference genome revealed that as many as 300 million DNA letters from African populations are missing from it. Indeed, close to 10 million variations were only recently identified in African DNA. Despite having the highest genetic diversity of all human populations, Africa has contributed only about 3% of all genetic data used for research and drug discovery. This has created a genomic data “inequality” that disenfranchises the very region recognized as the cradle of humanity. The implication is that many drugs and other useful biological products have been designed over several years without sufficient African genomic data in mind or at hand.


With recent global attention focusing on discussions around systemic racism, slavery and associated demographic shifts, the need for more knowledge about the African human genome becomes even more important. Perhaps one opportunity this offers is to underscore the need for scientists who work with reference genomes to forge partnerships that will see more centres across Africa, Asia, Europe, the Americas and the rest of the world collaborating to generate genomic data and fill the gap left by the missing African DNA. This will not only help us understand our genomes better, and therefore foster drug discovery, but may also help prevent the negative consequences of biases that have fed into systemic racism over the years.

This requires investment in, and commitment to, research efforts and institutional capacity development that prioritize equitable genome-wide studies and leverage Africa’s biobanking potential. Obviously, African research communities and their institutions must recognize this opportunity first and initiate the process.

In Nigeria, at least, that challenge has been picked up by start-ups like 54gene, whose ambitious African whole-genome project promises to improve the quality of healthcare worldwide. But neither research centres nor the private sector can do this alone. Global partners, publishing agencies, science communicators and governments need to come together in a concerted effort to bridge the gap in genomic data and to address human health and nutrition challenges equitably and sustainably.

Abdulrazak Ibrahim (PhD) is a molecular biologist at Ahmadu Bello University, Zaria, Nigeria.