Last year we spoke with Dr. Yuan Gao about his experience using Illumina's Genome Analyzer.
Since then, he has published a paper in Nature Methods and is preparing four more manuscripts
presenting data generated by the Genome Analyzer. He recently sat down with us to discuss his
current accomplishments with this high-throughput sequencing system.
Yes, we are now 1000× more efficient at capturing our genomic regions of interest, which are
50,000 exons in the human genome, than at the time of our paper. At the time of the Nature
Methods publication we could only capture about 20%, or approximately 10,000 exons. There
were definitely some efficiency problems, which were due to the capture protocol. In
half a year’s work and over numerous runs, we have increased our exon capture efficiency
more than 1000×, such that we can now capture more than 90% of the same 50,000 exon targets.
This dramatic increase in efficiency is a result of very fast turnaround time for evaluating
and optimizing our capture protocol through a joint effort with Dr. Jin Billy Li and Dr. Kun
Zhang from Dr. Church's lab at Harvard and Dr. Zhang's lab at UCSD. Such speedy results
were made possible by the ease of use, the fast workflow, and the data quality of the Genome
Analyzer. We are able to prepare a sample with several conditions, run it, and have feedback
in minimal time. The speed and the simplicity of the Genome Analyzer workflow helped us to
dramatically shorten the optimization cycles and reach our goal faster.
My lab is very small. It’s me, one assistant, and one student. With the Genome
Analyzer, we have a very powerful tool. If we can dream up some experiments where we
can leverage this sequencing technology, we can try it out relatively quickly, and with ease.
We can do a lot of exploration work and perform a wide range of experiments. Besides the
targeted resequencing work, I am involved with comparative whole-genome studies of E. coli
and Geobacter. My collaborators, Dr. Palsson's lab and Dr. Lovley's lab, have been evolving
E. coli and Geobacter to use different energy sources or different electron acceptors and
need to find the causative mutations. So far, we have sequenced more than 20 strains and
were able to identify every mutation reported by a previous study published in Nature
Genetics in 2006 that used tiling arrays and mass spec. Additionally, we discovered
mutations that the earlier study had missed (false negatives).
Tiling arrays have a very high false-positive rate, and even
after mass spec filtering, there are a lot of false positives, so it is quite time
consuming to verify them individually. We have essentially no false positives because our
coverage is very high and the consensus accuracy is thus very good. We even identified
a few reference genome errors in the process. We are preparing a manuscript comparing
the tiling array and Illumina sequencing platforms.
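The link between high coverage and consensus accuracy can be illustrated with a minimal sketch. It is not the analysis used in Dr. Gao's lab: it assumes read errors are independent and that every erroneous read shows the same wrong base, which tends to overstate the risk. The function name and the coverage depths are hypothetical; the 1% per-base error corresponds to the "over 99%" accuracy quoted later in the interview.

```python
from math import comb


def consensus_error_probability(coverage: int, per_base_error: float) -> float:
    """Probability that a strict majority of reads at one position are wrong.

    Simplifying assumptions (not from the interview): errors are independent
    across reads, and all erroneous reads agree on the same wrong base.
    """
    if coverage < 1:
        raise ValueError("coverage must be at least 1")
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    majority = coverage // 2 + 1  # smallest count that outvotes the correct reads
    return sum(
        comb(coverage, k)
        * per_base_error ** k
        * (1.0 - per_base_error) ** (coverage - k)
        for k in range(majority, coverage + 1)
    )


if __name__ == "__main__":
    # Hypothetical coverage depths; 0.01 is a 1% per-base error rate.
    for depth in (1, 5, 20, 50):
        print(depth, consensus_error_probability(depth, 0.01))
```

Under these assumptions the chance of a wrong consensus call falls steeply as more reads cover a position, which is consistent with the near absence of false positives described above.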
We are also sequencing various strains of Geobacter, a bacterium that can use organic
waste to produce electricity. The strain of Geobacter evolved in Dr. Lovley’s lab
is extremely interesting because this organism has accumulated so many mutations. My
collaborators spent over a year using tiling arrays to study this organism, but it was
difficult because of the high number of mutations. Using the Genome Analyzer, we
discovered more than 10,000 mutations in the first run.
We are preparing manuscripts for both of these experiments after just a few months of work. Without the Genome
Analyzer, it would have taken years to get this kind of work done and published.
It’s really the simplicity of the workflow, the ease of library construction
and the automation, and the speed and amount of data generated by the Genome Analyzer
that have enabled us to obtain these amazing data. Any small lab will be able to do
the same amount of work we are doing, if not more.
Last year when you interviewed me, I projected I would be getting 3 Gb by this summer,
which we are routinely generating for single-read runs. We are getting an average read
length of 40 to 41 bases, and we've calculated our accuracy rate out to 32 bases to be
over 99%, which we are very pleased with. Now with our paired-end module, we are going
to start doing de novo sequencing of small genomes and we expect to double our output
per run.
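As a rough back-of-the-envelope sketch, the figures above can be turned into read counts and fold coverage. It assumes that "3 Gb" means 3 billion bases per run, uses the lower end of the quoted 40 to 41 base read length, and takes about 4.6 million bases as the size of an E. coli genome. That genome size and the idea of devoting a whole run to one genome are assumptions for illustration, not figures from the interview.

```python
def reads_per_run(total_bases: int, read_length: int) -> int:
    """Number of whole reads that add up to the given run output."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return total_bases // read_length


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


RUN_OUTPUT_BASES = 3_000_000_000  # "3 Gb", read here as bases per run
READ_LENGTH = 40                  # lower end of the quoted 40 to 41 bases
ECOLI_GENOME_BASES = 4_600_000    # approximate size; assumption, not from the interview

if __name__ == "__main__":
    print("reads per run:", reads_per_run(RUN_OUTPUT_BASES, READ_LENGTH))
    print("fold coverage:", round(fold_coverage(RUN_OUTPUT_BASES, ECOLI_GENOME_BASES)))
```

In practice a run would likely be divided among several strains, so the coverage per genome would be a fraction of this single-genome figure.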
Illumina’s long-insert paired-end protocol will give us a big advantage for
assembling repetitive and homologous regions. But there are small indels,
maybe only 1 or 2 bp, that we are also interested in studying. Combining the current
short-insert protocol with the long-insert protocol will offer some theoretical
advantage which hopefully will let us move a lot of expensive work from other
sequencing platforms to the Genome Analyzer.
- Read the October 2007 iCommunity article "How to Sequence DNA at 200 MPH." (PDF)
Yuan Gao, Ph.D.
Center for the
Study of Biological Complexity and
Department of Computer Science,
Virginia Commonwealth University
"The operational cost, ease of use, and scalability of the Genome Analyzer
have put the power of large-scale genomic experimentation into my hands. My small
lab is now doing the kinds of experiments once only possible at the large genome
centers. Its low sample input requirements, simple workflow, high-quality data,
and applications flexibility distinguish the Illumina Genome Analyzer from
other high-throughput sequencing technologies."