Structural Biochemistry/Single Molecule DNA Sequencing

There are several single molecule DNA sequencing (SMDS) techniques under development, yet only single molecule sequencing by cyclic synthesis is currently advanced enough to produce sequence information in a massively parallel way directly from single DNA molecules. This technology relies on DNA polymerase incorporating fluorescently labeled nucleotides into strands complementary to DNA templates that are immobilized on a surface. The individual DNA strands are separated by a few microns, so each can be monitored as an independent entity. The fluorescent signal of each incorporated labeled nucleotide is then detected, one cycle at a time, using fluorescence microscopy. Because each DNA molecule is sequenced separately, the molecules do not need to be kept in synchrony with one another. Tens of millions of molecules can be sequenced in parallel in a single small reaction volume, so the method readily delivers high-throughput sequencing at low cost. The technique currently produces short read lengths, which makes it suited to resequencing applications in which a reference sequence is already available. A single reference genome can then serve as the template against which the short reads from thousands of genomes are aligned. These data can be used to find rare mutations and genetic heterogeneity in many target samples with high accuracy, at high rates, and at low cost. The ability to extract such a massive amount of sequence information could give cancer research, and the study of other genetic diseases, a powerful new tool.
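To see why short reads are not a barrier once tens of millions of molecules are read in parallel, the average number of times each base of a reference is covered can be estimated as (number of reads × read length) / genome length, the mean coverage used in the Lander-Waterman model. The sketch below is only an illustration; the 25-base read length, the 10 million reads and the 5-megabase genome are hypothetical values, not figures for any particular instrument.

```python
def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of reads covering each base: N * L / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical run: 10 million molecules, 25-base reads, 5-megabase genome.
reads = 10_000_000
length = 25
genome = 5_000_000
print(f"mean coverage: {mean_coverage(reads, length, genome):.0f}x")  # mean coverage: 50x
```

Even with reads of only a few tens of bases, this hypothetical run observes each base of a small genome dozens of times, which is the kind of oversampling resequencing relies on. Reads that cannot be placed uniquely on the reference would lower the useful coverage, which this estimate ignores.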

Single molecule sequencing has been pursued for almost two decades as a possible replacement for the Sanger method. DNA sequencing by cyclic synthesis (SBS) differs from the Sanger method, which relies on separating amplified DNA strands by length, each strand ending in a label whose color identifies its last base. In SBS, by contrast, the synthesis itself is monitored as it proceeds, by various methods that follow many reactions in parallel, which speeds up sequencing and reduces cost. Of all the cyclic-extension approaches, single molecule sequencing has the highest sequence information density, that is, the largest number of sequence reads per unit area.
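Read density is set mainly by how closely the sequenced features can be packed while still being resolved as separate spots. As a rough sketch, assume features sit on an ideal square grid with a fixed center-to-center spacing, so the density is 1/spacing². The spacings below (2 µm for single molecules, 10 µm for a hypothetical amplified cluster) are assumptions chosen only to illustrate the arithmetic.

```python
def reads_per_mm2(spacing_um: float) -> float:
    """Reads per square millimetre for features on a square grid.

    Assumes every grid position carries exactly one readable feature.
    """
    if spacing_um <= 0:
        raise ValueError("spacing must be positive")
    per_side = 1000.0 / spacing_um  # features along one 1 mm edge
    return per_side ** 2


single = reads_per_mm2(2.0)    # hypothetical single-molecule spacing
cluster = reads_per_mm2(10.0)  # hypothetical amplified-cluster spacing
print(f"single molecules: {single:,.0f} reads/mm^2")   # 250,000
print(f"clusters:         {cluster:,.0f} reads/mm^2")  # 10,000
print(f"density ratio:    {single / cluster:.0f}x")    # 25x
```

Because density scales with the inverse square of the spacing, a five-fold smaller footprint gives a 25-fold gain. Molecules attached at random positions on a real surface would yield fewer resolvable spots than this idealized grid.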

Current Sanger sequencing methods require a large amount of DNA to be replicated, and each sequencing run is then performed on one sequence at a time, a lengthy and expensive route. DNA sequencing by cyclic synthesis instead sequences millions of fragments in parallel, and SMDS by cyclic synthesis requires no amplification of the DNA at all. Together, these features make whole-genome sequencing both far cheaper and much faster, allowing many genomes to be sequenced rapidly and compared statistically.

The basic scheme of SMDS by cyclic synthesis is as follows:

1) DNA is sheared into short fragments.

2) A common DNA tail is added to the end of each fragment.

3) The DNA fragments are immobilized on a glass surface carrying primers that are complementary to the common DNA tail.

4) All bound fragments are then sequenced in parallel by repeated cycles of:

4a) Polymerase extension by one base using a fluorescently labeled nucleotide.

4b) Detection by total internal reflection microscopy (TIRM), imaging multiple fields of view to record incorporation events on tens of millions of DNA fragments.

4c) Removal of the dye molecule.

4d) Return to 4a with a different nucleotide.

5) The read from each molecule is aligned to a known reference sequence.

6) Analysis of these alignments reveals the sequence of the target DNA.
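The cycle in steps 4a to 4d can be sketched as a toy simulation. It assumes that at most one labeled base is added per molecule per cycle (as in step 4a), that nucleotides are flowed in a fixed A, C, G, T order, and that every incorporation is detected without error. Each molecule is represented directly by the bases its growing strand should receive. The function names and the flow order are hypothetical, chosen for illustration.

```python
def simulate_cycles(strands: list[str], flow_order: str = "ACGT",
                    max_cycles: int = 100) -> list[list[int]]:
    """Toy model of steps 4a-4d; returns the cycles at which each spot lit up.

    strands[i] is the sequence of bases molecule i should incorporate
    (the complement of its template).
    """
    progress = [0] * len(strands)   # bases incorporated so far, per molecule
    events = [[] for _ in strands]  # cycles with a detected signal, per molecule
    for c in range(max_cycles):
        base = flow_order[c % len(flow_order)]  # 4a: flow one labeled nucleotide type
        for i, strand in enumerate(strands):
            pos = progress[i]
            # Only molecules whose next required base matches can extend by one.
            if pos < len(strand) and strand[pos] == base:
                events[i].append(c)  # 4b: signal imaged at this molecule's spot
                progress[i] += 1     # 4c: dye removed; strand is one base longer
        # 4d: continue with the next nucleotide until every strand is complete
        if all(p == len(s) for p, s in zip(progress, strands)):
            break
    return events


def decode(events: list[list[int]], flow_order: str = "ACGT") -> list[str]:
    """Recover each read from the flow cycles in which its spot lit up."""
    return ["".join(flow_order[c % len(flow_order)] for c in cycles)
            for cycles in events]


molecules = ["GATTACA", "CCGT", "TTAG"]
events = simulate_cycles(molecules)
print(events)          # [[2, 4, 7, 11, 12, 13, 16], [1, 5, 6, 7], [3, 7, 8, 10]]
print(decode(events))  # ['GATTACA', 'CCGT', 'TTAG']
```

Here CCGT is finished after 8 cycles while GATTACA needs 17: each molecule advances at its own pace, and its sequence is read off from which nucleotide was present whenever its spot lit up, so no synchronization between molecules is needed. In this toy model a missed detection would silently drop a base from the read, one reason the alignment step has to tolerate disagreement with the reference.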

Sequencing DNA by single molecule fluorescence calls for careful experimental design, so that the incorporation of individual nucleotides into each DNA template can be observed. The goal is to collect the sequence information from each molecule on its own. Because multiple fields of view are imaged to monitor incorporations on millions of templates simultaneously, the position of every molecule must be tracked precisely from one cycle to the next. The sequence read from each molecule is then aligned to the reference sequence. For sufficiently long reads, alignment to the reference is possible even when the read and the reference disagree at some positions. Such a disagreement can come either from a genuine sequencing error or from the very information under study, that is, the mutations, polymorphisms, or heterogeneity that resequencing reveals. To gather enough statistics for a meaningful picture of the DNA sequence, the sample must be oversampled: reading each position many times averages out random errors and reveals the true sequence content of the sample. Because the number of strands sequenced at the same time is enormous, this requirement is not a strong limitation of the method.
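Alignment with tolerated mismatches and oversampling can be illustrated with a deliberately simple approach: each read is placed at the reference offset with the fewest mismatches (a brute-force, ungapped search), and the aligned reads are piled up so that a majority vote at each position averages out random errors. Practical aligners use indexed search and handle insertions and deletions; the function names, the two-mismatch limit and the toy reads below are hypothetical.

```python
from collections import Counter


def best_offset(read: str, reference: str, max_mismatches: int = 2):
    """Offset at which `read` matches `reference` with the fewest mismatches.

    Ungapped brute-force search; returns None if no offset is within
    `max_mismatches`. Ties go to the leftmost offset.
    """
    best, best_mm = None, max_mismatches + 1
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:
            best, best_mm = off, mm
    return best


def consensus(reads: list[str], reference: str, max_mismatches: int = 2) -> str:
    """Majority-vote base at each position; uncovered positions keep the reference."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        off = best_offset(read, reference, max_mismatches)
        if off is None:
            continue  # read could not be placed; drop it
        for i, base in enumerate(read):
            pileup[off + i][base] += 1
    return "".join(counts.most_common(1)[0][0] if counts else ref
                   for ref, counts in zip(reference, pileup))


def variants(called: str, reference: str) -> list[tuple[int, str, str]]:
    """(position, reference base, called base) wherever the two differ."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, called)) if r != c]


reference = "AGCTTGACCATGCAGTCGAT"
reads = [
    "AGCTTGAC", "AGCTTGAC",              # two correct reads of the sample's start
    "AGATTGAC",                          # same region, random error at position 2
    "TGACCAGG", "ACCAGGCA", "CAGGCAGT",  # overlapping reads with T->G at position 10
]
called = consensus(reads, reference)
print(called)                       # AGCTTGACCAGGCAGTCGAT
print(variants(called, reference))  # [(10, 'T', 'G')]
```

With only one erroneous read among three covering position 2, the vote restores the reference base, while the change at position 10, seen consistently in three independent reads, is reported as a variant. How many agreeing reads should be required before calling a variant is a statistical choice this sketch does not attempt.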

SMDS by cyclic synthesis minimizes cost and increases throughput compared with current Sanger sequencing methods. The ability to sequence millions of molecules in parallel at very high density and high data rates, without the constraint of synchronous incorporation, establishes this method as a viable option for massive DNA resequencing applications. Significant reductions in reagent use, combined with minimal sample preparation, lower the cost and time of resequencing and virtually eliminate amplification bias. A microfluidic implementation of the method could reduce the cost of the reagents and of the device as a whole even further. In addition, using Förster resonance energy transfer (FRET) as a local illumination source in single molecule fluorescence sequencing helps reduce noise and false-positive signals from nonspecific binding of nucleotides, and is applicable in other situations where tightly confined excitation light is desirable. Cleavable fluorescent markers substantially increase read lengths in single molecule sequencing, because removing each dye eliminates steric interactions between adjacent dyes. Further increases in read length are expected from optimizing the reaction conditions and the choice of DNA polymerase.
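FRET can act as a tightly confined excitation source because transfer efficiency falls off steeply with distance: E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the Förster radius, the distance at which E = 0.5. The sketch below evaluates this standard formula; the 5 nm Förster radius is an assumed, illustrative value, since the real R0 depends on the dye pair used.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    r0_nm defaults to an assumed 5 nm Förster radius for illustration.
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be non-negative and R0 positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


for r in (2.5, 5.0, 10.0, 20.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.4f}")
# r =  2.5 nm  ->  E = 0.9846
# r =  5.0 nm  ->  E = 0.5000
# r = 10.0 nm  ->  E = 0.0154
# r = 20.0 nm  ->  E = 0.0002
```

Doubling the distance from R0 cuts the efficiency from 0.5 to about 0.015. Assuming the donor sits close to the site of incorporation, labeled nucleotides only a few Förster radii away are barely excited, which is how a FRET-based scheme can suppress background from free or nonspecifically bound nucleotides.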