Lead
We are all Africans!
The complexity of human migration patterns can finally be
unearthed with the help of the Genographic Project, writes Varun Aggarwal
It is no secret that President Obama's origins lie in Africa, but what if
the same were said about you? What if technology could reach deep into history
to discover where your ancestors came from, and told you that the place was
Africa? If you want to know about your genetic background, dating back
as far as 150,000 years, it is possible with the help of the Genographic Project.
The Genographic Project is a nonprofit, five-year (2005 to 2010), global research
partnership of the National Geographic Society, IBM and the Waitt Family Foundation.
Over that period, the Project will attempt to collect and analyze DNA from the
blood samples of over 100,000 indigenous people, making it the world's
largest study of its kind in the field of anthropological genetics. The resulting
data will map world migratory patterns dating back some 150,000 years and will
fill in the huge gaps in our knowledge of humankind's migratory history.
This data will eventually comprise the largest database of its kind. In addition
to the field research component, the project invites the public around the world
to participate in the study by purchasing a Genographic Public Participation
Kit from National Geographic (www.nationalgeographic.com/genographic).
By sending in a simple cheek swab sample, a participant will learn about his
or her own deep ancestry while contributing to the overall Project. Dr. Ajay
Royyuru, IBM's Lead Scientist for the Genographic Project, who also
heads IBM's Computational Biology Research Center, said, "So far,
over 270,000 members of the public have joined by purchasing Participation Kits
from over 130 countries, two and a half times our five-year goal in the
first three years." Samples are now typically genotyped and available to the
users within four weeks of being received by the lab.
The indigenous blood samples are processed at one of the Project's ten regional
centers. Extracted DNA is analyzed at the centers using a standardized set of
scientific protocols, looking at genetic markers on the Y chromosome and
mitochondrial DNA. All data is uploaded to the central database. Public
Participant cheek swab samples are processed at the University of Arizona
Research Laboratories via Family Tree DNA.
The common origins of the entire human population were established
based on genetic evidence well before the Genographic Project. Royyuru explained,
"We are all Africans. The human species appeared in Africa between 2.5 million
and 250,000 years before the present day. Human migration within and
out of Africa has occurred substantively in the last 100,000 years. On the basis
of genetic markers, one defines population groups that share common ancestry.
Such data can be analyzed as a phylogeny (tree). Populations on nearby branches
share more recent common ancestors. The branches labeled L-haplogroups
from mitochondrial DNA analysis are indeed the ones connected closest to the
root; therefore these are populations ancestral to all other (non-L) populations.
The majority of the present-day L population is on the African continent."
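The tree reasoning in that quote can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the tree is a heavily simplified toy (intermediate branches are omitted and only a handful of mitochondrial haplogroup labels appear), and the function names are hypothetical. It is not Genographic Project data or software.

```python
# Toy mitochondrial haplogroup tree, stored as child -> parent.
# ASSUMPTION: heavily simplified for illustration; the real tree has many
# more branches and intermediate nodes between the ones shown here.
PARENT = {
    "L0": "root",
    "L3": "root",  # intermediate nodes omitted
    "M": "L3",
    "N": "L3",
    "R": "N",
}


def lineage(haplogroup):
    """Return the path from a haplogroup up to the root, inclusive."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(haplogroup):
    """Number of branches separating a haplogroup from the root."""
    return len(lineage(haplogroup)) - 1


def common_ancestor(a, b):
    """Most recent node shared by the lineages of a and b."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    return None


if __name__ == "__main__":
    for name in ("L0", "L3", "M", "R"):
        print(name, "depth:", depth(name), "lineage:", lineage(name))
    print("M and R share:", common_ancestor("M", "R"))
    print("L0 and M share:", common_ancestor("L0", "M"))
```

In this toy tree the L branches sit nearest the root, and every non-L lineage passes through an L node on its way back, which is the sense in which L populations are ancestral to the others. Groups on nearby branches (M and R here) meet at a more recent shared node than groups on distant ones (L0 and M).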
Technology, a backbone
IBM supports the Genographic Project in three ways. First,
the scientists in the field use an IBM client solution that simplifies
data collection and allows them to transmit this data
securely to a central repository. Second, IBM designed and built a solution
called the DNA Analysis Repository (DAR) that houses the genetic information
of hundreds of thousands of volunteers who have donated DNA to the Genographic
Project, as well as data submitted by the scientists. Lastly, IBM's Computational
Biology Center, one of the world's foremost life sciences research facilities,
is helping to analyze the data to infer patterns of ancestry, and will eventually
open this massive database to researchers around the world at the conclusion
of the five-year Genographic Project.
IBM has developed client software that allows the Principal
Investigators (PIs) in the field to collect, store and transfer the data that
they are collecting. By creating a simple user interface and linking the software
with genotyping equipment from Applied Biosystems, the Genographic client software
allows the PI to create expeditions, manage phenotyping and genotyping, and
then securely transmit that data back to a central repository for further
study by the entire Genographic Consortium. "Currently, we have 11 scientists
working in the field and have collected tens of thousands of DNA samples from
participating indigenous groups, whose partnership is a vital component of the
Genographic Project," said Royyuru. IBM has also delivered a series
of collaborative tools, including wikis and blogs, that allow the scientific
teams to collaborate quickly with each other.
As the Genographic Project is an international effort to collect and analyze
human DNA to answer questions about our migratory paths on an unprecedented
scale, a central DNA repository is critical to the success of the Project. IBM
has developed a solution called the DNA Analysis Repository (DAR) to manage this
mountain of genetic information. The DAR sits at National Geographic
headquarters in Washington, DC, and its components are BladeCenters running
Linux and a host of IBM software, including DB2 and WebSphere. The DAR accepts
data securely from the PIs in the field via the client described above, from
Family Tree DNA (the company performing genetic analysis of the public samples)
and from the public participants on the National Geographic Genographic Web
site (also running on IBM Linux BladeCenter servers). This central repository,
and an IBM reporting interface to query it, allow scientists all over the world
to analyze this data to draw the migratory paths of our species back tens of
thousands of years. At the conclusion of the Genographic Project, this database
will be made available to scientists to encourage further study. Public participants
can also access DAR data through the National Geographic Genographic Web site
to view the results of their DNA analysis.
IBM's Computational Biology Center provides critical analysis of the gathered
data to infer patterns of descent and shed light on the migratory paths of our
species. The IBM CBC team, along with the rest of the Genographic Project Consortium,
has authored several papers detailing the findings from the Genographic Project,
with many more on the way.
Prof. Ramasamy Pitchappan of Madurai Kamaraj University
is leading the field research for the project in India. He has conducted
several expeditions within India to obtain samples from selected indigenous
populations. He has already conducted expeditions in Tamil Nadu and Orissa,
and will soon lead another expedition to Assam.
Evidence from recorded history (for instance, cultural,
anthropological, linguistic) and the limited genetic evidence gathered
from prior studies indicate that the Indian subcontinent has a high diversity
of populations and holds significant clues to global migratory patterns.
Prior work by Prof. Pitchappan and Dr. Spencer Wells, for instance,
revealed the migration of the Australian indigenous population from Africa
via Southern India. Many such patterns in global migration can be uncovered
by relating and analyzing the data from different regions of the world.
Overcoming challenges
Such an enormous project cannot succeed without clearing a few hurdles. The biggest
hurdle that stood in its way was protecting the personal data of hundreds of
thousands of participants. Royyuru explained, "We are asking a person to
volunteer something that is personal, sequencing regions of their genomes; this
is what defines them, what is unique to them. There is an enormous amount of
sensitivity to such data, which we fully respect. We cannot compromise on confidentiality
and privacy. Our team is looking solely at what the scientific facts tell us."
The road ahead
IBM and National Geographic have developed educational and training material
for students and teachers to learn about the project, population genetics,
global migration, and the cultural history of humankind.
The Genographic Legacy Fund seeks to reciprocate the positive contribution made
to the project by traditional and indigenous peoples by directing funds to cultural,
educational, and revitalization efforts within indigenous communities. Proceeds
from the sale of the Genographic Participation Kits help fund future field research
and a legacy project, which will build on National Geographic's 117-year-long
focus on world cultures. The legacy project will support education and cultural
preservation projects among participating indigenous groups.
The project offers great insights into human diversity and will allow
researchers to learn things that we do not already know in an area that we are
eager to study: information-based medicine. The understanding of how medicine
relates to a population, why one solution works for some people and not for
others, how to minimize side effects and maximize benefits: these are all
vital for the future of healthcare. To reach this understanding,
you have to get to the root of what population diversity means. The data from
the Genographic Project, while not having any medical content, will far exceed
anything we could ever get in a medical study.
varun.aggarwal@expressindia.com