Structural Biochemistry/Genetic code/Genome Sequencing

The genomes of many organisms, from bacteria to multicellular eukaryotes and including humans, have been completely sequenced, thanks to high-volume, rapid DNA sequencing techniques such as those based on fluorescent dideoxynucleotide chain terminators. In the shotgun approach, genomic DNA is sheared into random fragments, each fragment is sequenced, and computers then assemble the complete sequence by matching overlapping regions between the fragments. The top-down method, by contrast, requires a detailed initial physical map of thousands of clones. The first eukaryotic genome to be sequenced completely was that of baker's yeast.

The human genome sequence progressed from a draft in 2001 to a finished sequence in 2004. It comprises over three billion base pairs of DNA distributed among 24 distinct chromosomes (22 autosomes plus the X and Y sex chromosomes). It contains approximately 25,000 genes and 10,000 pseudogenes, which are mutated copies of genes that are no longer functional. Many genes give rise to more than one protein through alternative splicing of mRNA and through modification of proteins after translation. Much of the non-coding DNA consists of millions of copies of mobile genetic elements such as SINEs (short interspersed elements) and LINEs (long interspersed elements).

Sequencing projects require both rapid sequencing techniques and efficient methods for assembling many short stretches of 300 to 500 base pairs into a complete sequence.
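The assembly step can be illustrated with a minimal sketch in Python. It assumes error-free reads from a single strand and uses a simple greedy strategy (repeatedly merging the two reads with the longest suffix-to-prefix overlap). The function names `overlap` and `greedy_assemble` and the `min_len` threshold are illustrative choices, not part of any particular assembler. Real assemblers must also handle sequencing errors, both strands, and repeated sequences.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest overlap; return the remaining contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the largest suffix/prefix overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping 8-base reads from a hypothetical 16-base sequence.
reads = ["CATCGAAC", "AGCTTAGG", "TAGGCATC"]
print(greedy_assemble(reads))  # ['AGCTTAGGCATCGAAC']
```

The greedy choice is easy to follow but can misassemble sequences that contain repeats longer than the overlaps, which is one reason repetitive elements make genome assembly difficult.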

The Sanger Dideoxy Method
Frederick Sanger and his coworkers developed a method in which DNA fragments are generated by controlled termination of replication. This technique became preferred over alternative methods because of its simplicity. Four reaction mixtures are prepared, each containing the template strand, a DNA primer, DNA polymerase, and the four deoxyribonucleoside triphosphates. Each mixture also contains a small amount of the 2',3'-dideoxy analog of one of the nucleotides, a different analog for each mixture. The 2',3'-dideoxy analog lacks the 3'-hydroxyl group that DNA polymerase needs to form the next phosphodiester bond, so incorporating it terminates the chain. Because the concentration of the dideoxy analog is low, chain termination occurs only occasionally: most of the time DNA polymerase adds a normal nucleotide and the new strand keeps growing. Eventually every growing strand is terminated by a dideoxy analog, but at different positions, so the fragments have various lengths. The four sets of chain-terminated fragments are then separated by electrophoresis, and the base sequence of the DNA is read from the order of the fragment lengths. Because each type of dideoxy analog can be labeled with a different fluorescent marker, the fragments can be detected as a chromatographic trace at four different wavelengths. Fluorescence detection is the preferred method because it eliminates the use of radioactive reagents and can be automated. This method can sequence up to one million bases per day.
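The logic of chain termination can be sketched as a small simulation in Python. This is an idealized model, not a description of any instrument. The names `sanger_fragments` and `read_sequence`, the termination probability `p_stop`, and the number of molecules are assumptions made for illustration. Fragment lengths are counted from the end of the primer, and electrophoresis is modeled simply as sorting fragments by length.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template, terminator, p_stop=0.2, n_molecules=2000, rng=None):
    """Simulate one reaction mixture containing a single dideoxy analog.

    Returns the set of fragment lengths that end in `terminator`.
    """
    rng = rng or random.Random(0)
    # The polymerase reads the template 3'->5', so the new strand
    # (written 5'->3') is the reverse complement of the template.
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))
    lengths = set()
    for _ in range(n_molecules):
        for pos, base in enumerate(new_strand, start=1):
            # At each matching position the dideoxy analog is incorporated
            # only occasionally; otherwise the chain keeps growing.
            if base == terminator and rng.random() < p_stop:
                lengths.add(pos)
                break
    return lengths


def read_sequence(template, p_stop=0.2, n_molecules=2000, seed=0):
    """Run the four reactions and read the bases in order of fragment length."""
    rng = random.Random(seed)
    length_to_base = {}
    for base in "ACGT":
        for n in sanger_fragments(template, base, p_stop, n_molecules, rng):
            length_to_base[n] = base
    # Shortest fragment first, as on a gel. A length that was never
    # produced would be silently missing from the read.
    return "".join(length_to_base[n] for n in sorted(length_to_base))


template = "TACGGATCCA"  # hypothetical template, written 5'->3'
print(read_sequence(template))
```

With enough molecules, every position of the new strand is represented by at least one terminated fragment, so the read should reproduce the strand complementary to the template. Lowering `n_molecules` or raising `p_stop` makes long fragments rare, which mirrors why read lengths are limited in practice.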