“…the acquisition of the sequence is only the beginning.
The sequence information provides a starting point from
which the real research into the thousands of diseases
that have a genetic basis can begin.” J. CRAIG VENTER 1
The Human Genome Project
In 1986 Nobel laureate Renato Dulbecco challenged the
scientific community to sequence the complete human genome. “Its
significance,” he said, “would be comparable to that of the effort that
led to the conquest of space, and it should be carried out with the
same spirit.”2 Dulbecco also argued that such a project should be “an
international undertaking, because the sequence of the human DNA is
the reality of the species, and everything that happens in the world
depends upon those sequences.”
Like the conquest of space, sequencing the human genome required
the development of wholly new technologies. The human genome,
containing more than three billion nucleotides, is vast. In 1986 DNA
sequencing had yet to be automated and, consequently, was slow and
tedious. Moreover, computer software for sequence analysis was just
being developed. Similar to the Apollo project that met President
Kennedy’s goal of a manned lunar landing by 1970, the genome
project also succeeded — beyond the dreams of the scientists who
During the 1990s rapid progress was made in developing automated
sequencing methods and improving computer hardware and software.
By 2003 biologists had sequenced genomes from about one hundred
different species. These species included dozens of bacteria and other
microbes, as well as the model systems: yeast, fruit fly, nematode, and
mouse. The capstone, of course, was the completion of the human
genome sequence. In June 2000 two rival teams jointly announced the
completion of a draft sequence of the entire human genome,
consisting of more than three billion nucleotides; both drafts were
published in February 2001.
Is human DNA “the reality of the species”? Do we now have all the
information we need to define human life? Perhaps surprisingly, the
answers are no. Genetics is more than just DNA. While DNA is the
blueprint for life, proteins carry out most cellular functions; DNA just
codes for RNA, which codes for protein.
One major surprise emerged from the sequencing of the human
genome. Although some scientists expected to find at least 100,000
genes coding for proteins, only about 30,000–35,000 of such genes
appear to be in the human genome. These genes comprise only about
two percent of the entire DNA. What is the rest of the DNA doing?
Biologists once thought that this noncoding DNA was just junk, and
hence called it “junk DNA.” As we will see below, evidence now
suggests that some junk DNA may have functions.
The quest to understand the workings of human cells will not be over
until we understand how this genetic blueprint is used to produce a
particular set of proteins — the proteome — for each type of cell and
how these proteins control the physiology of the cell. (See the Proteins
and Proteomics unit.) We should think of the human genome as a
database of critical information that serves as a tool for exploring the
workings of the cell and, ultimately, understanding how a complex
living organism functions.
Sequencing a Genome
Sequencing a genome is an enormous task. It requires not only finding
the nucleotide sequence of small pieces of the genome, but also
ordering those small pieces together into the whole genome. A useful
analogy is a puzzle, where you must first put together the pieces of a
smaller puzzle and then assemble those pieces into a much larger
picture. Two general strategies have been used in the sequencing of
large genomes: clone-based sequencing and whole genome
sequencing (Fig. 1).
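The idea of ordering small pieces into a whole can be illustrated with a toy
example. The sketch below is not the method used by either genome project; it
is a minimal greedy approach, assuming error-free reads, no repeated regions,
and reads that all come from the same strand. The function names (overlap,
greedy_assemble), the min_len parameter, and the sample reads are hypothetical
and chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a
    prefix of `b`, or 0 if no such match of at least `min_len` exists."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two pieces with the largest overlap.

    Toy assumptions (not true of real data): reads are error-free,
    come from one strand, and the sequence has no long repeats.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        # Compare every ordered pair of pieces and remember the best overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave the pieces separate
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three short overlapping reads, given out of order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Real genomes contain many repeated sequences, so a purely greedy merge like
this one can join pieces in the wrong place. That is one reason mapping
information is useful when ordering pieces of a large genome.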
In clone-based sequencing (also known as hierarchical shotgun
sequencing) the first step is mapping. One first constructs a map of the
chromosomes, marking them at regular intervals of...