"With the Genome Analyzer, small labs like ours can do very big projects in a very short time. This machine is the equivalent of a genome center in a box!"
Dr. Ecker is one of the leading authorities on plant molecular biology and genetics. He was a principal investigator in the multinational project that sequenced the Arabidopsis thaliana genome and is a leading member of the Arabidopsis 2010 Project, which aims to gather information about the functions of all Arabidopsis genes by the year 2010.
The interest of the lab is to understand the genetic and epigenetic changes that result in phenotypic variation. Our main interest is in plants, particularly Arabidopsis, although we work with human cells as well. Our lab is very much discovery-driven, not hypothesis-driven. Our rationale is: if we use this technology at the whole-genome level, we will surely discover something interesting. That’s about the only hypothesis we have going into such a study, and we have never been disappointed by this approach. For example, we know that we don’t understand all the transcripts in the genome. Now we are moving discovery to the sequence level, which is something we would have wanted to do in the first place, but we couldn’t afford to do these experiments at this scale.
So, by necessity, we used array-based methods with inherently lower resolution. Ultimately, the level of resolution that we need to understand gene structure, to understand methylation, to understand DNA sequence variation, is sequence. We are formulating a plan with our colleagues here in the U.S. and around the world to carry out DNA sequencing for over 1,001 Arabidopsis strains. When completed, this sequence data will allow for whole genome association studies with an unprecedented level of resolution. We are extremely excited about this plan, which would not have been conceivable without the Genome Analyzer.
We started sequencing the methylome in March. In a matter of five months we are now very close to completing this work. This was unthinkable a year ago. You need to be in the 40-50X coverage range because you are looking at a population of cells. If you look at 50 chromosomes, some may be methylated at a particular site and others not, so if you want to quantify how much methylation there is at a specific site you need to sequence very deep. So the question is not only whether this specific cytosine is methylated or not, but also whether it is methylated 20%, 40%, or 80% of the time. If we were doing Sanger sequencing at 50-fold coverage, that would be like sequencing the entire human genome at ~2X coverage.
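The reasoning about depth can be illustrated with a small sketch. It is not the lab's pipeline: the function names, the use of a Wilson score interval, and the round genome sizes (about 120 Mb for Arabidopsis and about 3 Gb for human) are all assumptions made for illustration. It shows how the uncertainty on a per-site methylation estimate shrinks as coverage rises, and why 50X of a small genome is roughly the same amount of sequence as ~2X of the human genome.

```python
import math


def methylation_level(methylated_reads, total_reads):
    """Fraction of reads covering one cytosine that report it as methylated."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def wilson_interval(methylated_reads, total_reads, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Used here only as one reasonable way to express uncertainty;
    the source does not say which statistic the lab uses.
    """
    n = total_reads
    p = methylated_reads / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def total_bases(genome_size_bp, coverage):
    """Total bases that must be sequenced to reach a given fold coverage."""
    return genome_size_bp * coverage


# Assumed round-number genome sizes, for illustration only.
ARABIDOPSIS_BP = 120_000_000
HUMAN_BP = 3_000_000_000

if __name__ == "__main__":
    # A site that is truly methylated ~40% of the time, seen at two depths.
    for depth in (5, 50):
        methylated = round(0.4 * depth)
        low, high = wilson_interval(methylated, depth)
        level = methylation_level(methylated, depth)
        print(f"{depth}X: level={level:.2f}, interval=({low:.2f}, {high:.2f})")

    print(total_bases(ARABIDOPSIS_BP, 50), total_bases(HUMAN_BP, 2))
```

At low depth the interval is too wide to separate a site methylated 20% of the time from one methylated 40% or 80% of the time, which is the practical reason for sequencing very deep.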
The resolution and signal the Genome Analyzer provides are much higher than in our previous array-based data. We are seeing very good correlation with tiling arrays, but the Genome Analyzer picks up a lot of information that arrays missed. We’ve found that one kind of event, where the methylation isn’t clustered but spread out, is almost always completely missed in tiling arrays because there are not enough methylated cytosines there for the antibodies to pull down. So that enrichment approach fails when the level of cytosine methylation drops really low. This methylation may be important in regulating genes versus regulating transposons. There is a good correlation between silencing and CG methylation and asymmetric methylation being more involved in regulation than gene expression. So finding the sequence in these cases is going to be particularly useful because we are not seeing a lot of them on the arrays.
I’m sure there are many more applications that people will come up with now that they can look at a billion sequences as their readout. So I think we are just seeing the tip of the iceberg. The papers that have come out are about the things that are fairly obvious to do. People will think of things that I still haven’t thought of, such as interaction mapping using DNA sequence as the readout. The Cluster Station* will allow you to do things with it just by itself. You are essentially able to create a substrate you can do something to. The Cluster Station* creates dense clusters of molecules; you are essentially creating a custom array. You could, for example, use this for binding of proteins. You are creating millions of double-stranded molecules on a piece of glass, and the only question is what are you going to do with it?
* cBot carries out the same function as the Cluster Station