The Human Genome Discovery

The Greatest Biological Development in Science History

Scientists have cracked the code, the longest, tiniest imaginable, most important, oldest code: the code of human life, the DNA sequence of humanity. The numbers are staggering: written in just a four-letter alphabet (A, T, C, G), the human genome is around 3 billion letters long (or about one billion "words" in length, since each word, a codon, is three letters long), and there are around 600 billion-trillion copies of it on Earth (6 billion people times 100 trillion cells per person). It took about 3 billion years to create (the age of life on Earth) and only 15 years to decipher if one starts at the beginning of the Human Genome Project. Alternatively, it might be argued that it has taken several hundred thousand years (the age of Homo sapiens) for humans to look inside themselves and figure out their vital essence.
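
The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the round numbers quoted above; the variable names are illustrative.

```python
# Round figures quoted in the text (all approximate).
genome_letters = 3_000_000_000          # about 3 billion nucleotides
letters_per_codon = 3                   # each "word" (codon) is three letters
people = 6_000_000_000                  # about 6 billion people
cells_per_person = 100 * 10**12         # about 100 trillion cells each

# Number of three-letter "words" in one copy of the genome.
codons = genome_letters // letters_per_codon

# Number of genome copies on Earth, counting one copy per cell.
copies = people * cells_per_person

print(f"codons per genome: {codons:,}")
print(f"genome copies on Earth: {copies:.0e}")
```

Here 600 billion-trillion is 600 x 10^9 x 10^12, which equals 6 x 10^23.
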
The human genome is the crown jewel of 20 years of biological research, the most important accomplishment in the field to date. On a scale unmatched in the history of biology, it has been a massive project built on the scientific endeavors of decades of dedicated investigators. In effect, biologists have climbed Mount Sinai and brought back the hitherto secret scriptures of life.

Without this "biological Rosetta Stone,"
Nature's four-letter texts would be as incomprehensible
as a message from an alien civilization.

     The first edition of this most sacred sequence in science has been released by two groups: the publicly supported International Human Genome Sequencing Consortium, whose principal spokesperson in the United States is Francis Collins; and the privately funded Celera Genomics, headed by Craig Venter. The results have appeared in landmark publications. Science published the research of Celera Genomics, while Nature published that of the Human Genome Project.
     The genome encodes the proteins that form the structural elements of life and that regulate numerous biological processes. Genes provide the characteristics that distinguish one individual from another and allow these features to be passed from one generation to the next through reproduction, thereby providing the microscopic mechanism for evolution. For these reasons, the genome is often called the blueprint for life. In short, the sequence of the human genome and similar sequences for other organisms comprise the Books of Life, the Bible of Biology so to speak.
     The genome is composed of chromosomes. In humans, there are 24 different types, which are labeled chromosome 1, chromosome 2, . . ., chromosome 22, X and Y. Thus, the Great Code is contained in 24 volumes. Humans, like other higher forms of life, are diploid (that is, their chromosomes are duplicated in the nucleus of a cell). There are 23 pairs, 22 of which are matched: There are two copies of chromosomes 1 through 22, and then either an XX pair for females or an XY pair for males. Each chromosome consists of a long DNA molecule wrapped into a compact form around proteins known as histones – roughly like the way thread is wound about a bobbin. DNA is composed of two long chains of nucleotides bound and twisted about each other to form a helix. The nucleotides are of four types: adenine (symbol A), guanine (symbol G), cytosine (symbol C) and thymine (symbol T). Specifying the nucleotide sequence as a series of "biological letters," such as CTATGAT . . ., determines the DNA molecule.

In effect, biologists have climbed Mount Sinai
and brought back the hitherto secret scriptures of life.

The remarkable scientific accomplishment that has been achieved is to provide nearly complete DNA sequences for the 24 human chromosomes. Within a relatively short period of time, these sequences will be precisely known. Eventually, the genomes of almost every living creature on Earth will be part of the scientific data bank, the sum of which constitutes the Library of Life.
Genes are certain sections of the DNA that code for proteins. Messenger ribonucleic acid, abbreviated mRNA, transports the information in the DNA to the protein-producing machinery of a cell. In a given cell, certain genes are turned on, meaning that they are allowed to generate the proteins that they code for, while other genes are switched off. The genes that are turned on determine the function of a cell.
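
The path from a gene to a protein can be illustrated with a minimal sketch. It assumes the standard genetic code, a made-up nine-letter "gene," and reading that starts at the first letter; real genes are far longer, and a cell locates the starting point itself. The function names are illustrative only.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in the
# order T, C, A, G for each position, and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Copy the coding strand of DNA into mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: start codon, one more codon, stop codon.
toy_gene = "ATGGCCTAA"
toy_mrna = transcribe(toy_gene)
toy_protein = translate(toy_mrna)
print(toy_mrna, "->", toy_protein)
```

The output uses one-letter amino acid symbols, so M stands for methionine and A for alanine.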

The Y chromosome is a junkyard.

The amino acid sequence of a protein coded by a gene is determined from the genetic code. Without this "biological Rosetta stone," Nature's four-letter texts would be as incomprehensible as a message from an alien civilization. Less than 1.5% of the genome encodes proteins; the rest consists of non-coding sequences, a sizeable fraction of which is junk, meaning that it appears to have no present biological purpose. In fact, the human genome is a genetic jungle full of sequences of "freeloaders," "parasites," "hitchhikers," "ancient viral invaders," and "evolutionary fossils" that are all competing for space on the DNA molecule. The "hitchhikers," scientifically known as transposons, have copied themselves and jumped from place to place. It appears that some stretches of sequences date back to the days of unicellular life in the Precambrian, more than 700 million years ago. It may be humbling to know that bacteria carry much less excess baggage: their coding regions appear one after another with a minimum amount of junk DNA.
Feminists will be happy to learn that the male-defining Y chromosome is a junkyard, full of repetitive, non-functional nucleotide sequences. Furthermore, there are many copies of sperm-production genes in the Y chromosome; it is as though males are afraid of sterility or trying to defend themselves against female invasion. What is worse is that evolution has reduced it to a little stump in comparison with the other chromosomes and that it will be stuck with these features for a long time: Because the Y chromosome does not recombine (that is, it does not undergo sequence shuffling during reproduction), it is slow to evolve. On the other hand, this renders it useful in molecular anthropology, which uses DNA to deduce various relations among Homo sapiens during the past 200,000 years.

The human genome appears to contain only a third
as many genes as had been previously estimated.

     Although many proteins have been studied in detail, the DNA sequences will eventually provide a comprehensive list of all the proteins that the body makes. Defects in the proteins, which are caused by sequence errors in the genes, are responsible for much of human disease.
     From the viewpoint of the human genome, individuals are 99.9% identical. Yet, the residual 0.1% leads to several million spelling differences, with some such variations leading to dramatically higher risks of certain cancers and other diseases. These differences are known as polymorphisms, the most important type of which is the single nucleotide polymorphism, or SNP (pronounced "snip"). SNPs are a main source of genetic variation.
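
As a rough check on "several million," the sketch below applies the 0.1% figure to the genome size of about 3.2 billion base pairs given in the statistics table of this report, and compares it with the table's SNP rate of roughly 1 per 1,500 base pairs. Both inputs are approximate, so only the order of magnitude is meaningful.

```python
genome_bp = 3_200_000_000        # approximate genome size from the table

# 0.1% of the genome differs between two individuals (1 letter in 1,000).
differences = genome_bp // 1_000

# The table's SNP rate: roughly 1 SNP per 1,500 base pairs.
snps_from_rate = genome_bp // 1_500

print(f"differences at 0.1%: {differences:,}")
print(f"SNPs at 1 per 1,500 bp: {snps_from_rate:,}")
```

Both estimates land in the range of two to three million.
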
     So what have we already learned from the human genome projects?
     Surprise! The human genome appears to contain only a third as many genes as had been previously estimated. Scientists had expected to find as many as 100,000 genes. But the latest results suggest somewhere between 26,000 and 40,000 genes, with 30,000 being the favored figure. However, the old rule "one gene – one protein" appears to be wrong. Depending on the circumstances, a single gene may be able to initiate the manufacturing of several proteins, so that the number of distinct proteins in a human body is probably around 100,000. This is because a gene consists of exons separated by introns. All the coding for the protein is in the exons. When a protein is made, the introns are removed and the exons are spliced together, but it turns out that there are often several ways in which the splicing can be done.
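
How one gene can yield several proteins can be sketched with a deliberately simplified model of splicing. The assumption here, which is only one of the ways splicing can vary, is that the first and last exons are always kept while any of the exons in between may be skipped. The exon sequences and the function name are made up for illustration.

```python
from itertools import combinations


def splice_variants(exons: list[str]) -> list[str]:
    """List the spliced messages obtained by optionally skipping inner exons.

    The first and last exons are always kept and exon order never changes.
    Requires at least two exons.
    """
    first, *middle, last = exons
    variants = []
    for keep_count in range(len(middle) + 1):
        for kept in combinations(middle, keep_count):
            variants.append("".join([first, *kept, last]))
    return variants


# Hypothetical gene with four exons (the introns are already removed).
toy_exons = ["ATG", "GCC", "AAA", "TAA"]
toy_variants = splice_variants(toy_exons)
for variant in toy_variants:
    print(variant)
```

With two skippable exons there are four possible messages; in this simplified model, each additional skippable exon doubles the count, which suggests how some 30,000 genes might account for around 100,000 proteins.
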
     It also appears that the total number of genes is not a leading factor in biological sophistication: The roundworm, for example, has 19,000 genes, while the fruit fly possesses 13,600. These organisms are relatively simple invertebrates. For example, the roundworm has only 960 cells, whereas a human has 100 trillion, and the 100 billion brain cells in a human should be compared to the 300 neurons of the roundworm.
     Other results: Initial findings indicate a larger amount of junk DNA than had been expected. For a long time, biologists have known that much of the genome consists of repeating elements that have copied and inserted themselves into the sequence and whose only purpose appears to be to reproduce themselves, an idea that Richard Dawkins has called "the selfish gene." Although most junk DNA seems to serve no extant biological purpose, it might play a role in evolution. It should be somewhat humbling that non-coding DNA makes up more than 98.5% of the genome. In other words, less than 1.5% of the genome is used for coding proteins. This small percentage is half of what was thought to be the number before the sequencing projects were done.

The longest gene is dystrophin,
a muscle protein with 2,400,000 base pairs.

     Another interesting result is that whole blocks of genes are copied from one chromosome to another. This might have occurred in evolution tens of millions of years ago as a protective mechanism. Chromosome 19 is the biggest culprit, sharing genetic blocks with 16 other chromosomes. It also appears to be the one most densely packed with genes (see the Table of Human Genome Statistics below). Large-scale block transfers have also been seen in the genome of the mouse. These duplicated fragments of DNA that have been inserted back into the chromosomes have shaped the size and architecture of the genomes of these mammals.
     The human genome also contains vast regions of repeating sequences. Scientists at Celera Genomics estimate that almost 50% of the genome consists of these repeaters. Two of them, the "freeloaders" called LINE1 and Alu, make up about 17% and 10%, respectively, of the DNA in human chromosomes.
     Here are some interesting statistics that have emerged from the human sequencing projects:

Table of Human Genome Statistics

Topic	Statistic
Total size of the genome:	approximately 3,200,000,000 bp*
Percentage of adenine plus thymine (A + T) in the genome:	54%
Percentage of cytosine plus guanine (C + G) in the genome:	38%
Percentage of bases not yet determined:	9%
Most gene-dense chromosome:	chromosome 19 with 23 genes per 1,000,000 bp*
Least gene-dense chromosomes:	chromosome 13 and Y with 5 genes per 1,000,000 bp*
Percentage of DNA spanned by genes:	between 25% and 38%
Percentage of exons:	1.1% to 1.4%
Percentage of introns:	24% to 37%
Percentage of intergenic DNA:	64% to 74%
The average size of a gene:	27,000 bp*
The longest gene:	dystrophin (a muscle protein) with 2,400,000 bp*
Average length of an intron:	3,300 bp*
Most common length of an intron:	87 bp*
Occurrence rate of SNPs:	roughly 1 per 1,500 bp*
Occurrence rate of genes:	about 12 per 1,000,000 bp*

*bp = base pair

Note that the percentage of thymine (T) in the genome is the same as adenine (A) because these two nucleotides appear in complementary positions in the two strands that make up DNA. Likewise, the percentage of guanine (G) is the same as cytosine (C).
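
This pairing rule can be illustrated with a short sketch. It assumes the usual pairing of A with T and C with G, builds the partner strand of a sequence, and counts the letters across both strands. The function names are illustrative; the sample sequence is the fragment CTATGAT used earlier in this report.

```python
from collections import Counter

# Each nucleotide pairs with exactly one partner on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, read in its own direction (reversed)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def duplex_counts(strand: str) -> Counter:
    """Count each letter over both strands of the double helix."""
    return Counter(strand) + Counter(complementary_strand(strand))


fragment = "CTATGAT"
partner = complementary_strand(fragment)
counts = duplex_counts(fragment)
print(fragment, "pairs with", partner)
print({base: counts[base] for base in "ATCG"})
```

Whatever single strand is supplied, the counts over the two strands show as many A's as T's and as many C's as G's.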

The day will come when a medical checkup consists of a DNA readout.

     With the annotation of the human genome, a lot of progress has been made. What is the next great challenge for genetic biologists?
     Imagine that an engineer presents to you the blueprints for a Chevrolet that are full of lines of gibberish letters. Apparently, the design plans are in the form of a code. How are you going to build the car? Fortunately, you have a "GM decoding booklet" which allows you to translate the letters into known words. So you begin the task of reading the blueprints. Soon you uncover car phrases: "assemble spokes into wheel frame and attach hubcap, place bulb into socket of red tinted chamber, cast silver colored cylinder so that it has a unique key hole, . . . " Suddenly you become dispirited. The words are grouped in phrases that allow you to construct small parts but absent are the instructions for putting all the parts together. Locally, you understand; globally you are lost.
     This is the current situation in biology. The "GM decoding booklet" is like the genetic code, the words in the automobile blueprints are like codons, and the small car parts are like proteins. Unfortunately, biologists do not presently know how to combine a specific set of proteins to provide a cell with a particular function. Nature miraculously does this automatically. It is like throwing all the small parts that you have constructed from the blueprint manuals into adjacent piles and having a car amazingly emerge. So the next great goal in understanding life is to figure out how proteins collectively interact to carry out cellular processes. At the genetic level, biologists must learn to deduce the biological consequences of having a whole ensemble of genes turned on.
     In general terms, life at the elementary level is well understood. Its processes are metabolism, transcription of DNA into RNA, translation of RNA into proteins and DNA replication. As we learn more about genes and proteins, a more detailed understanding of life will be achieved.

Manipulating the genes of humans and living creatures
will allow humankind to do
what has been traditionally attributed to God.

     What will all this newly found genome knowledge bring? The answer is a revolution, the genetic revolution. The human nucleotide sequence marks the beginning of a whole new approach to biology.
     In the early 20th century, physicists uncovered the dynamics of the atom, which is known as quantum mechanics. That discovery led to the electronics revolution and the technology that we so much enjoy today. Now biologists will lead the way. Coming is the biotechnological revolution. It will last for decades, perhaps even several centuries.
     We are already entering the age of genetic-based medicine. The new knowledge of the human nucleotide sequences will accelerate the development of therapeutic drugs that function at the molecular level. More accurate medical diagnoses will be available. Doctors will be able to address the fundamental causes of countless human disorders and will have a better chance of predicting the side effects of drugs. On the horizon are cures for cancer and heart disease.
     Eventually, scientists will be able to identify all of the genes contributing to a given disease. Individuals will know for which sicknesses they are most at risk, giving them the possibility of making health-driven lifestyle changes or of taking preventative medical steps. Doctors will be able to tailor treatment to individuals.
     The day will come when a medical checkup consists of a DNA readout and genetic flaws will be corrected soon after or even before birth. Scientists will tell us how our physical abilities, intelligence, external characteristics and personality are affected by the variations in SNPs. Genetic manipulation will provide ways to overcome the limits imposed by our evolutionary past.
     The human genome sequence is a powerful tool for gaining insight into our genetic heritage and where we stand in the evolutionary scheme of things. The evolutionary tree can be determined by comparing the genomes of Earth's species.
     Eventually, we shall be able to take control of our own biological destiny when scientists learn to manipulate the human genome at will. No longer will we be at the mercy of the forces of natural selection. We shall be able to modify in part our vital essence. This will not be the intention in the beginning. Initially, the goal will be to correct defective genes. But gradually genetic manipulation will expand to allow couples to select features of their offspring. "Pro-choice" will take on a new meaning. At some point, scientists will have almost complete mastery of the genome. Moreover, genetic manipulation will not be confined only to humans. Long before it is used on mankind, it will be applied to animals and plants.
     One can imagine the genetics-dominated world of the late 21st century: There will be fruits, vegetables and meats that are genetically modified for higher nutritional value. Sheep, mink, pigs, cows and other livestock will have their genes adjusted to yield higher output. Zoos will house unusual animals that differ notably from the animals from which they were derived. In place of refineries will be vast vats of swamp-like liquids containing bacteria, which, like domesticated farm animals, will produce high-tech genetically designed products that will provide a wide range of humanity's needs: food, energy, chemicals and medicines.
     Manipulating the genes of humans and living creatures will allow mankind to do what has been traditionally attributed to God. Indeed, President Clinton described the human genome as "the language in which God created Man." In response, Sydney Brenner of the Salk Institute for Biological Studies in San Diego said, "Perhaps now we can view the Bible as the language in which Man created God."

This report was prepared by the staff of Jupiter Scientific, an organization devoted to the promotion of science through books, the internet and other means of communication.
