"Infinium HD is remarkable. Most samples have an average call rate of 99.9% off the shelf."
Kevin Shianna, Ph.D., has built the Institute for Genome Sciences and Policy Genotyping Facility at Duke University into one of the highest throughput academic genotyping facilities in the United States. An early adopter of Illumina's GoldenGate Genotyping technology, Dr. Shianna now relies on Illumina's broad portfolio of genetic analysis tools to support his institution's research.
We started using Illumina's GoldenGate Genotyping Assay for a large project we did on epilepsy, and we were pleased with the data quality. When the Infinium BeadChips became available, we moved on to genome-wide association studies. As for choosing Illumina's Infinium HD arrays over others, it's a no-brainer. You might pay a little more for Infinium arrays, but the data quality you get back is what really matters, particularly for downstream statistical analysis. You put bad data in, you get bad results out. Also, using tag SNPs rather than randomly chosen SNPs gives greater statistical coverage of the genome, which has been invaluable to us.
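To make the tag-SNP point concrete, here is a minimal sketch of greedy pairwise tagging, the general idea behind tag-SNP selection: repeatedly pick the SNP that captures (at r² ≥ 0.8) the most SNPs not yet captured. This is a generic illustration, not how Illumina designed its arrays; the SNP IDs, the r² values, and the 0.8 threshold are hypothetical assumptions chosen for the example.

```python
from typing import Dict, FrozenSet, List

# Hypothetical LD table: unordered SNP pair -> r^2. Missing pairs are treated as r^2 = 0.
R2Table = Dict[FrozenSet[str], float]


def ld(r2: R2Table, a: str, b: str) -> float:
    """r^2 between two SNPs; a SNP is in perfect LD with itself."""
    return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)


def greedy_tag_snps(snps: List[str], r2: R2Table, threshold: float = 0.8) -> List[str]:
    """Greedily choose tags until every SNP has r^2 >= threshold with at least one tag."""
    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Pick the SNP capturing the most still-untagged SNPs (ties go to the earlier SNP).
        best = max(snps, key=lambda s: sum(ld(r2, s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(r2, best, t) >= threshold}
    return tags


def coverage(snps: List[str], panel: List[str], r2: R2Table, threshold: float = 0.8) -> float:
    """Fraction of SNPs captured (r^2 >= threshold) by at least one SNP on the panel."""
    captured = sum(any(ld(r2, s, p) >= threshold for p in panel) for s in snps)
    return captured / len(snps)


if __name__ == "__main__":
    snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]
    r2 = {
        frozenset(("rs1", "rs2")): 0.95,
        frozenset(("rs1", "rs3")): 0.90,
        frozenset(("rs2", "rs3")): 0.85,
        frozenset(("rs4", "rs5")): 0.20,
    }
    tags = greedy_tag_snps(snps, r2)
    print(tags, coverage(snps, tags, r2))              # three greedily chosen tags
    print(coverage(snps, ["rs1", "rs2", "rs3"], r2))   # three SNPs from one LD block
```

In this toy data, three greedily chosen tags capture all five SNPs, while three SNPs taken from the same LD block capture only 60% of them.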
We think our data curation has to be one of the most stringent in the field. For our curation, we recluster any SNP with a call frequency below 99%. Then we go through and filter the data based on a few metrics and look at 5,000 to 10,000 SNPs to make sure that the reclustering didn't introduce any miscalls. At the end, we apply a threshold we call the "one-percent rule," which means we delete any SNP for which more than one percent of samples are uncalled or ambiguously called. Even with this very stringent curation, we lose only 1 to 3 percent of the SNPs from our Infinium BeadChip data.
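The curation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the facility's actual pipeline: genotypes are assumed to be stored per SNP as strings, with anything other than `AA`, `AB`, or `BB` (for example a hypothetical `NC` marker) counting as uncalled or ambiguous. Reclustering itself happens in the genotyping software and is not shown; the sketch only flags SNPs below 99% call frequency and then applies the one-percent rule.

```python
from typing import Dict, List

# Assumption: valid genotype calls are "AA", "AB", "BB"; anything else (e.g. "NC")
# is treated as an uncalled or ambiguously called sample.
VALID_CALLS = {"AA", "AB", "BB"}


def n_failed(calls: List[str]) -> int:
    """Number of samples that are uncalled or ambiguously called for one SNP."""
    return sum(c not in VALID_CALLS for c in calls)


def needs_recluster(calls: List[str]) -> bool:
    """Flag SNPs with call frequency below 99% (integer math avoids float rounding)."""
    called = len(calls) - n_failed(calls)
    return called * 100 < 99 * len(calls)


def passes_one_percent_rule(calls: List[str]) -> bool:
    """Keep a SNP only if at most 1% of samples are uncalled or ambiguous."""
    return n_failed(calls) * 100 <= len(calls)


def curate(genotypes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply the one-percent rule to (already reclustered) genotype data."""
    return {snp: calls for snp, calls in genotypes.items() if passes_one_percent_rule(calls)}


if __name__ == "__main__":
    # Hypothetical cohort of 200 samples.
    data = {
        "snp_a": ["AA"] * 199 + ["NC"],      # 0.5% failed: kept
        "snp_b": ["AB"] * 198 + ["NC"] * 2,  # exactly 1% failed: kept
        "snp_c": ["BB"] * 197 + ["NC"] * 3,  # 1.5% failed: flagged; removed if still failing
    }
    print([s for s, c in data.items() if needs_recluster(c)])
    print(sorted(curate(data)))
```

Using integer comparisons means a SNP with exactly 1% failed calls is kept, which matches the rule's "more than one percent" wording.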
The original Infinium II Assay was great. But Infinium HD is remarkable. Most samples have an average call rate of 99.9% off the shelf. A 99.9% average call rate makes a huge difference in downstream data curation; it has made curation a lot easier. And for those who aren’t curating their data, there will be much less downstream work, because higher sample call rates reduce the likelihood of false positives and negatives. In addition to data quality, the ability to run four samples per chip clearly gets us to the answers a lot faster.
For studying copy number variation, we’ve found that the Human1M BeadChip gives us great coverage. We are using the PennCNV algorithm to analyze CNV data. We treat deletions or duplications just like we would treat a SNP. For example, we systematically go through and assign genotypes to deletions: AB for a heterozygous or hemizygous deletion, AA for a homozygous deletion, and BB for wild-type. These deletions and duplications go into the same statistical pipeline that we've set up for testing associations with a SNP. Once we find something that's associated with whatever CNV genotype we're looking at, we go back, manually look at the data, and fine-map the breakpoints. We've run probably 1,500 to 2,000 samples on this array and can say that in most regions of the genome, we can identify breakpoints within 10–20 kb.
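Here is a minimal sketch of treating deletions like SNPs, assuming each sample already has an integer copy-number estimate for the region of interest (2 = normal, 1 = one copy deleted, 0 = both copies deleted). The genotype coding follows the text. The allelic chi-square test is one hypothetical example of a SNP-style association test and is not necessarily what the facility's pipeline runs. Duplications would need their own coding and are not handled here, and the function names are illustrative, not part of PennCNV.

```python
import math
from typing import List, Tuple

# Genotype coding from the text; "A" stands for the deleted allele.
CN_TO_GENOTYPE = {2: "BB", 1: "AB", 0: "AA"}


def deletion_genotype(copy_number: int) -> str:
    """Map a per-sample copy number (0, 1, or 2) to a SNP-style genotype."""
    if copy_number not in CN_TO_GENOTYPE:
        raise ValueError(f"copy number {copy_number} is not a deletion state")
    return CN_TO_GENOTYPE[copy_number]


def allele_counts(genotypes: List[str]) -> Tuple[int, int]:
    """Count deletion (A) and wild-type (B) alleles across samples."""
    a = sum(g.count("A") for g in genotypes)
    return a, 2 * len(genotypes) - a


def allelic_chi2(cases: List[str], controls: List[str]) -> Tuple[float, float]:
    """Pearson chi-square (1 df) on the 2x2 case/control by A/B allele table."""
    table = [allele_counts(cases), allele_counts(controls)]
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_tot or 0 in col_tot:
        return 0.0, 1.0  # no variation to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    cases = [deletion_genotype(cn) for cn in [2, 2, 1, 1, 1, 0, 2, 2, 1, 2]]
    controls = [deletion_genotype(cn) for cn in [2] * 9 + [1]]
    print(allelic_chi2(cases, controls))
```

With toy counts this small, the chi-square approximation is rough; a real association study would involve far larger samples or an exact test.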
At the end of 2007, we published a paper in Science describing results from a genome-wide association study using Infinium BeadChips. We were looking for genetic variation involved in the control of HIV viral load. We found two genetic variants involved in viral load control, and we found another genome-wide significant association with HIV disease progression. Interestingly, all three variants are in the MHC region on chromosome 6, a region full of genes involved in immune function. That's really exciting, but it makes functional follow-up very difficult because the region is so repetitive. So one thing we're doing now to try to validate some of these results is flow-sorting chromosomes from cell lines that carry the specific genotypes we're interested in. We used the Genome Analyzer to sequence flow-sorted chromosomes from a couple of cell lines, and the data looked great. We observed error rates of about 0.5% at 36 cycles, which is excellent.
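To put the 0.5% figure in per-read terms, here is a back-of-the-envelope calculation. It assumes that 0.5% is an average per-base error rate across 36-base reads and that errors occur independently at each position; in practice error rates are not uniform across cycles, so treat this as a rough illustration only.

```python
from typing import Tuple


def read_error_summary(read_length: int, per_base_error: float) -> Tuple[float, float]:
    """Expected errors per read and probability a read has no errors,
    assuming independent errors with a constant per-base rate."""
    expected_errors = read_length * per_base_error
    p_error_free = (1.0 - per_base_error) ** read_length
    return expected_errors, p_error_free


if __name__ == "__main__":
    expected, p_clean = read_error_summary(36, 0.005)
    print(f"expected errors per read: {expected:.2f}")
    print(f"fraction of error-free reads: {p_clean:.3f}")
```

Under these assumptions, a 36-base read carries about 0.18 errors on average, and roughly 83% of reads contain no errors at all.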
We've found that having both platforms to run genome-wide genotyping and sequencing from one company has been very advantageous for us. Using Illumina's technology, we can perform a genome-wide association study to cover our bases for common variants and then go back and sequence a subset of the sample population to identify rare variants with the Genome Analyzer. After that we can create a custom Infinium iSelect BeadChip with rare SNPs to screen our population of interest. We are taking this approach in one of our current studies.
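The last step, choosing rare variants from the sequenced subset for a custom array, might look roughly like the sketch below. The variant records, the 5% minor-allele-frequency cutoff, and the function names are hypothetical; real iSelect content selection also involves assay design considerations that are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class SeqVariant:
    """Hypothetical record for a variant seen in the sequenced subset."""
    variant_id: str
    alt_count: int   # alternate-allele observations in the sequenced subset
    n_alleles: int   # total called alleles at this site (2 per diploid sample)


def minor_allele_freq(v: SeqVariant) -> float:
    f = v.alt_count / v.n_alleles
    return min(f, 1.0 - f)


def custom_panel_candidates(
    variants: Iterable[SeqVariant], on_gwas_array: Set[str], max_maf: float = 0.05
) -> List[str]:
    """Rare, polymorphic variants not already genotyped by the GWAS array."""
    picks = []
    for v in variants:
        if v.n_alleles == 0 or v.variant_id in on_gwas_array:
            continue  # no data, or already covered by the genome-wide scan
        maf = minor_allele_freq(v)
        if 0.0 < maf < max_maf:  # exclude monomorphic and common variants
            picks.append(v.variant_id)
    return picks


if __name__ == "__main__":
    seen = [
        SeqVariant("var1", 1, 100),   # rare: candidate
        SeqVariant("var2", 30, 100),  # common: skip
        SeqVariant("var3", 2, 100),   # rare but already on the GWAS array: skip
        SeqVariant("var4", 0, 100),   # monomorphic in the subset: skip
    ]
    print(custom_panel_candidates(seen, on_gwas_array={"var3"}))
```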