Cracking The Code: Sanger Sequencing And The Human Genome Project

Last week, I wrote about the potential of personalized medicine, and I briefly mentioned the role of genetic sequencing and the Human Genome Project (HGP) in making personalized medicine possible. This week, I want to delve a little deeper into what exactly goes into sequencing a genome. The HGP, which took 13 years to complete, was performed using a process known as Sanger sequencing. Invented in 1977 by Fred Sanger, it is a laborious and costly way to sequence DNA. In the HGP, relatively tiny fragments of the human genome were sequenced multiple times and aligned together, piece by piece, until they formed a full chromosome (humans have 23 pairs of chromosomes). Shortly after the HGP, researchers began pushing for faster, cheaper sequencing technology. The goal was the “$1,000 genome,” a phrase that quickly became “shorthand for the promise of DNA-sequencing capability made so affordable that individuals might think the once-in-a-lifetime expenditure to have a full personal genome sequence read to a disk for doctors to reference is worthwhile.” In the 17 years since the HGP, scientists have more than delivered on this promise. Next-Generation Sequencing (NGS) is cheaper and faster than ever before, with the cost of a genome dipping below the $1,000 benchmark.

“Cell karyotype exhibiting trisomy” by National Institutes of Health (NIH) is licensed under CC BY-NC 2.0

DNA has a zipper structure where one strand has bases that complement those in the other strand; the end of the Adenine (A) base only fits into the end of the Thymine (T) base, and the end of the Guanine (G) base only fits into the end of the Cytosine (C) base. So a sequence like ACG would be complemented with the sequence TGC. The two strands zip together and incorrect pairings lead to a bulge in the zipper that can severely impede its functioning. This complementary zipper-like structure of DNA is essential to the cell’s ability to copy DNA and follow its instructions for building protein. It also makes Sanger sequencing possible.
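To make the pairing rule concrete, here is a minimal Python sketch. The names `PAIRS`, `complement`, and `is_zipped` are my own, made up for illustration, and the sketch ignores strand direction just as the zipper analogy does.

```python
# Pairing rules from the paragraph above: A fits T, G fits C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())

def is_zipped(strand_a: str, strand_b: str) -> bool:
    """True when every position pairs correctly (no bulge in the zipper)."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b.upper()

print(complement("ACG"))        # TGC
print(is_zipped("ACG", "TGA"))  # False: the last pair (G with A) does not fit
```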

Sanger sequencing starts with a DNA primer, a short piece of single-stranded DNA that is complementary to a known portion of the template DNA. When the solution is heated up, the template DNA denatures (splits apart into single strands). Through a cycle of heating and cooling, the primer binds to its complementary sequence in the template DNA. With the primer bound to the template, a protein called DNA polymerase can add DNA bases to the end of the primer in complement with the rest of the template. This is a normal job for a DNA polymerase enzyme, one of the key cellular proteins that performs DNA replication. Without a reliable DNA polymerase, cells would not be able to faithfully replicate their DNA during cell division.
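Primer binding can be sketched as a simple search: the primer sticks wherever its complement appears in the template. This is a simplified illustration, not lab software. The function names and the example sequences are hypothetical, and the sketch ignores strand direction and the chemistry of heating and cooling.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)

def find_binding_site(template: str, primer: str) -> int:
    """Return the template position where the primer pairs base for base, or -1."""
    return template.find(complement(primer))

template = "GATTACAGGCT"  # made-up template sequence
primer = "CTAA"           # complementary to the known stretch "GATT"
print(find_binding_site(template, primer))  # 0
```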

“File:NHGRI Fact Sheet- Deoxyribonucleic Acid (DNA) (26990477451).jpg” by National Human Genome Research Institute (NHGRI) from Bethesda, MD, USA is licensed under CC BY 2.0

DNA polymerase is able to add free-standing DNA bases onto the end of a primer to form a growing chain that is complementary to the template. Think of it as laying down a toy railroad track: the polymerase fits a new DNA base onto the end of the existing track. It continues down the track, base by base, adding new pieces of the track ahead of it according to the instruction manual provided by the template strand. Typically, DNA polymerases make errors in only 1 out of 100,000 bases, and most of these mutations are repaired by other cellular mechanisms. But in Sanger sequencing, researchers generally use the most “high-fidelity” polymerases, which can have error rates as low as 1 in a million bases.
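To get a feel for what those error rates mean, here is a rough back-of-the-envelope sketch in Python. It assumes each base is copied independently with the same chance of error, which is a simplification, and it uses a 2000-base fragment like the ones sequenced in the HGP. The function names are my own.

```python
def expected_errors(length: int, bases_per_error: int) -> float:
    """Average number of miscopied bases when copying `length` bases."""
    return length / bases_per_error

def chance_error_free(length: int, bases_per_error: int) -> float:
    """Probability that a copy of `length` bases has no errors at all,
    assuming every base is copied independently (a simplification)."""
    return (1 - 1 / bases_per_error) ** length

fragment = 2000  # fragment size in bases, the HGP figure quoted in this post
for name, rate in [("typical", 100_000), ("high-fidelity", 1_000_000)]:
    print(name, expected_errors(fragment, rate), round(chance_error_free(fragment, rate), 4))
```

Under these assumptions, even the typical polymerase copies a fragment of that size without a single mistake the vast majority of the time.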

During cellular replication, DNA polymerase continues adding bases to the track until the template runs out of instructions. While that produces a fairly accurate complement of the template strand, it doesn’t by itself tell us anything about the sequence. To learn the sequence, scientists use specially tagged and modified bases. The special bases are added randomly to the growing DNA track and, once they are added, they stop the polymerase from adding any more bases. These “end-track” bases are called dideoxy (dd) analogs, and like the end track piece on a railroad set, they can only connect to the track on one side. Once the “end-track” base is added, no more bases can be added and the track stops. Because the reaction extends many copies of the primer at once, you get a solution of single-stranded DNA of every possible length, from a single base to the entire length of the template. Each strand is tagged with the “end-track” base that stopped its extension.
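The random stopping is easier to picture with a small simulation. This is a toy model under stated assumptions, not a description of real reaction chemistry: the 20% stop chance (`p_stop`), the function names, and the example template are all made up for illustration.

```python
import random

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_once(template: str, rng: random.Random, p_stop: float = 0.2) -> str:
    """Copy the template base by base, stopping when an "end-track" base is drawn.
    The last base of the returned strand is the one that ended the track
    (or just the final base, if the template ran out first)."""
    strand = []
    for base in template:
        strand.append(PAIRS[base])
        if rng.random() < p_stop:  # assumed chance of drawing a dd base
            break
    return "".join(strand)

def run_reaction(template: str, copies: int = 1000, seed: int = 0) -> list[str]:
    """Extend many primer copies, as happens in a single reaction vial."""
    rng = random.Random(seed)
    return [extend_once(template, rng) for _ in range(copies)]

fragments = run_reaction("TGCATGGC")  # made-up template region after the primer
print(sorted({len(f) for f in fragments}))  # which fragment lengths showed up
```

With enough copies, every length from one base up to the full template shows up, and that full ladder of lengths is what the sequencing step relies on.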

Originally, each “end-track” base (ddA, ddT, ddC, ddG) was added to its own polymerase reaction vial, and the products were labeled (with a radioactive tag in the original method) so they could be seen. Once the reactions were completed and every instance of each base was marked by an “end-track” base, the four reactions were run out in four lanes on a gel that separates the molecules by size. The result was the full DNA sequence displayed vertically along the gel. The bases closest to the primer showed up as bands at the bottom of the gel because those strands were terminated by an “end-track” base right away, and the shortest strands travel farthest. Whether a band was in the A, T, C, or G lane told the scientist what base exists at that position.
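Reading a four-lane gel amounts to sorting the bands by fragment length and noting which lane each one sits in. Here is a minimal sketch, with a made-up `lanes` dictionary standing in for a real gel and a hypothetical `read_gel` helper.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """lanes maps a base letter to the fragment lengths (bands) seen in that lane.
    Reading from the shortest fragment (bottom of the gel) to the longest (top)
    spells out the newly built strand."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))

# Made-up example: band positions are given as fragment lengths in bases.
lanes = {"A": [1, 5], "C": [2, 7], "G": [3, 6], "T": [4]}
print(read_gel(lanes))  # ACGTAGC
```

Note that this reads the strand the polymerase built. The template itself is the complement of that sequence.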

Today, Sanger sequencing is done by attaching a different fluorescent dye to each of the four “end-track” bases. The reaction is done in one vial and run through a thin gel-filled capillary tube that separates the products based on size. A machine reads and records the fluorescent signals coming off the molecules as they exit the tube, translating each signal into a base letter.
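The machine's job can be sketched as a lookup from dye color to base letter, applied in the order the fragments leave the tube. The color assignments and the `signals` data below are hypothetical, chosen only to illustrate the idea, and real instruments work with continuous signal traces rather than tidy events.

```python
# Hypothetical dye-to-base assignment; real instruments define their own.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}

def call_bases(signals: list[tuple[float, str]]) -> str:
    """signals holds (exit_time, dye_color) events from the detector.
    Shorter fragments exit the capillary first, so sorting by time
    puts the bases in sequence order."""
    return "".join(DYE_TO_BASE[color] for _, color in sorted(signals))

signals = [(3.1, "blue"), (1.0, "green"), (2.2, "red")]  # made-up detector events
print(call_bases(signals))  # ATC
```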

“File:Cost of sequencing a full human genome, OWID.svg” by Our World In Data is licensed under CC BY 3.0

Sanger sequencing is usually the cheapest and easiest form of sequencing for specific small regions of the genome. But for large-scale genome sequencing, the cost, time, and amount of labor rise steeply. In the case of the HGP, which was completed before NGS was established, 2000 bp fragments of the genome were sequenced from a library of bacterial clones of the human genome. The individual fragment sequences were aligned together to create the full human genome sequence.
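The alignment step can be pictured as stitching fragments together wherever the end of one matches the start of the next. The sketch below is a deliberately simple illustration, not the method the HGP actually used: it assumes the fragments are already in left-to-right order and error-free, and the names `overlap`, `assemble`, and `min_len` are my own.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def assemble(fragments: list[str], min_len: int = 3) -> str:
    """Merge fragments that are assumed to be in left-to-right order.
    A fragment with no overlap is simply appended to the end."""
    contig = fragments[0]
    for frag in fragments[1:]:
        size = overlap(contig, frag, min_len)
        contig += frag[size:]  # add only the part not already covered
    return contig

reads = ["ATGGCGTAC", "GTACCTTAG", "TTAGGACA"]  # made-up overlapping fragments
print(assemble(reads))  # ATGGCGTACCTTAGGACA
```

Real assembly also has to work out the order of the fragments and cope with sequencing errors and repeated regions, which is part of why each region was sequenced multiple times.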

Today, there are several methods of next-generation sequencing available that use parallel sequencing reactions (multiple reactions happening simultaneously) at a micro-scale to significantly cut time and cost. Next week, I’ll go over some of the cutting-edge sequencing techniques that have made getting your genome sequenced easier and cheaper than ever. I’ll also briefly discuss how NGS has catapulted us into the world of commercialized genetics.

Comment below or email me at contact@anyonecanscience.com to let me know what you think about this week’s blog post and tell me what sorts of topics you want me to cover in the future. And subscribe below for weekly science posts sent straight to your email!
