User Experiences: Q&A with Yuan Gao

Last year we spoke with Dr. Yuan Gao about his experience using Illumina's Genome Analyzer. Since then, he has published a paper in Nature Methods and is preparing four more manuscripts presenting data generated by the Genome Analyzer. He recently sat down with us to discuss his current accomplishments with this high-throughput sequencing system.


Have you made improvements to the targeted resequencing method described in your October 2007 Nature Methods publication?

Yes, we are now 1000× more efficient at capturing our genomic regions of interest, 50,000 exons in the human genome, than we were at the time of our paper. At the time of the Nature Methods publication we could capture only about 20% of them, or approximately 10,000 exons. There were definitely some efficiency problems, which were due to the capture protocol. In half a year’s work and over numerous runs, we have increased our exon capture efficiency more than 1000×, so that we can now capture more than 90% of the same 50,000 exon targets.

This dramatic increase in efficiency is the result of a very fast turnaround time for evaluating and optimizing our capture protocol, achieved through a joint effort with Dr. Jin Billy Li and Dr. Kun Zhang from Dr. Church's lab at Harvard and Dr. Zhang's lab at UCSD. Such speedy results were made possible by the ease of use, the fast workflow, and the data quality of the Genome Analyzer. We are able to prepare a sample under several conditions, run it, and have feedback in minimal time. The speed and simplicity of the Genome Analyzer workflow helped us dramatically shorten the optimization cycles and reach our goal faster.


Can you describe your lab and the type of research you can now do with the Genome Analyzer?

My lab is very small. It’s me, one assistant, and one student. With the Genome Analyzer, we have a very powerful tool. If we can dream up experiments that leverage this sequencing technology, we can try them out relatively quickly and with ease. We can do a lot of exploratory work and perform a wide range of experiments. Besides the targeted resequencing work, I am involved with comparative whole-genome studies of E. coli and Geobacter. My collaborators, Dr. Palsson's lab and Dr. Lovley's lab, have been evolving E. coli and Geobacter to use different energy sources or different electron acceptors and need to find the causative mutations. So far, we have sequenced more than 20 strains and have identified every mutation reported by a previous study, published in Nature Genetics in 2006, that used tiling arrays and mass spec. Additionally, we discovered mutations that the earlier study had missed (false negatives).

Tiling arrays have a very high false-positive rate, and even after mass spec filtering a lot of false positives remain, so verifying them individually is quite time consuming. We have essentially no false positives because our coverage is very high and the consensus accuracy is therefore very good. We even identified a few reference genome errors in the process. We are preparing a manuscript comparing the tiling array and Illumina sequencing platforms.
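
Editor's illustration: the link between coverage and consensus accuracy can be sketched with a simple majority-vote model. This is not the analysis used in the interviewee's lab. It assumes that read errors at a position are independent, that every erroneous read reports the same wrong base (a worst case), and that ties count as miscalls. The 1% per-base error rate and the function name `consensus_error` are illustrative assumptions.

```python
from math import comb


def consensus_error(depth: int, per_read_error: float) -> float:
    """Upper bound on the chance that a majority vote miscalls one position.

    Assumptions (hypothetical model, not from the interview):
    independent read errors, all errors agree on the same wrong base,
    and a tie between right and wrong bases counts as a miscall.
    """
    # Smallest number of erroneous reads that ties or outvotes the correct ones.
    threshold = (depth + 1) // 2
    return sum(
        comb(depth, k)
        * per_read_error**k
        * (1 - per_read_error) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


# Illustrative per-base error rate of 1% at increasing coverage depths.
for depth in (1, 5, 10, 20, 30):
    print(f"depth {depth:>2}: consensus error <= {consensus_error(depth, 0.01):.3e}")
```

Under these assumptions the bound shrinks rapidly as depth grows, which is consistent with the statement that high coverage leaves essentially no false positives.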

We are also sequencing various strains of Geobacter, a bacterium that can use organic waste to produce electricity. The strain of Geobacter evolved in Dr. Lovley’s lab is extremely interesting because this organism has accumulated so many mutations. My collaborators spent over a year using tiling arrays to study this organism, but it was difficult because of the high number of mutations. Using the Genome Analyzer, we discovered more than 10,000 mutations in the first run.

We are preparing manuscripts for both of these experiments after just a few months of work. Without the Genome Analyzer, it would have taken years to get this kind of work done and published. It’s really the simplicity of the workflow, the ease of library construction, the automation, and the speed and amount of data generated by the Genome Analyzer that have enabled us to obtain these amazing data. Any small lab will be able to do the same amount of work we are doing, if not more.


Can you share your thoughts on Genome Analyzer data quality, paired-end sequencing, and the future of longer reads?

Last year when you interviewed me, I projected I would be getting 3 Gb by this summer, which we are routinely generating for single-read runs. We are getting an average read length of 40 to 41 bases, and we've calculated our accuracy rate out to 32 bases to be over 99%, which we are very pleased with. Now with our paired-end module, we are going to start doing de novo sequencing of small genomes and we expect to double our output per run.

Illumina’s long-insert paired-end protocol will give us a big advantage for assembling repetitive and homologous regions. But there are small indels, maybe only 1 or 2 bp, that we are also interested in studying. Combining the current short-insert protocol with the long-insert protocol should offer some theoretical advantage, which we hope will let us move a lot of expensive work from other sequencing platforms to the Genome Analyzer.


- Read the October 2007 iCommunity article "How to Sequence DNA at 200 MPH." (PDF)

Yuan Gao, Ph.D.
Center for the Study of Biological Complexity and Department of Computer Science, Virginia Commonwealth University

"The operational cost, ease of use, and scalability of the Genome Analyzer have put the power of large-scale genomic experimentation into my hands. My small lab is now doing the kinds of experiments once only possible at the large genome centers. Its low sample input requirements, simple workflow, high-quality data, and applications flexibility distinguish the Illumina Genome Analyzer from other high-throughput sequencing technologies."





©2008 Illumina, Inc. All rights reserved.
