Data Sets

A selection of sequencing data sets, generated either internally by Illumina or by external groups, can be downloaded from the relevant subsection below. The data sets were contributed from the original manuscripts or made available through the company website. Links to the online locations of the data are listed in the tables below.

For additional data sets that can be filtered by species, system, or application, visit our Data Library.


Illumina-Generated Data:
Description | Platform | File Type | Suitable Analyses [1] | Download
Sequencing Runs
Chr 21 NA18507 [4] | Illumina v3 cBOT chemistry HiSeq [3] | BAM | Q,C,V,FN | [Download]
Chr 19 NA18507 [4] | Illumina v3 cBOT chemistry HiSeq [3] | BAM | Q,C,V,FN | [Download]
Chr 4 NA18507 [4] | Illumina v3 cBOT chemistry HiSeq [3] | BAM | Q,C,V,FN | [Download]
Chr 21 NA19240 [2] | Illumina GA IIx | BAM | Q,C,V,FN | [Download]
Chr 21 NA19240 [2] | Illumina HiSeq | BAM | Q,C,V,FN | [Download]
Chr 21 NA18507 [2] | Illumina GA IIx | BAM | Q,C,V,FN | [Download]
E. coli (MG1655) Read 1 | Illumina GA IIx | BAM | Q,C | [Download]
E. coli (MG1655) Read 2 | Illumina GA IIx | BAM | Q,C | [Download]
E. coli (MG1655) Paired | Illumina GA IIx | BAM | Q,C | [Download]
Variant Calls
NA18507 SNPs | Illumina GA IIx | TXT | T,FN | [Download]
NA18506 SNPs | Illumina GA IIx | TXT | T,FN | [Download]
NA18508 SNPs | Illumina GA IIx | TXT | T,FN | [Download]
NA19240 SNPs | Illumina GA IIx | TXT | FN | [Download]
NA19240 Indels | Illumina GA IIx | TXT | FN | [Download]
Coverage Output
NA19240 coverage [2] | Illumina GA IIx | TXT | G | [Download]

[1] Q = Quality Scores; C = Coverage; V = Variant Calling; T = Trio Inheritance; FN = False Negatives; G = Gene Region Gaps
[2] Chromosome 21. These data were aligned to the b36/hg18 reference genome.
[3] v3 cBOT Kit Chemistry, available in Q2 2011.
[4] All data were generated with v3 cBOT Kit Chemistry on the HiSeq 2000 and aligned to the b37/hg19 reference genome.

 

 

Externally Generated Data:
Description | Platform | File Type | Suitable Analyses [1] | URL
Sequencing Runs
KB1 (Bushman) [2] | Illumina GA IIx | BAM | Q,C,V,FN | ftp://ftp.bx.psu.edu/data/bushman/hg18/bam/ (as of 13-10-2010)
ABT (Bantu) [2] | SBL (version 3) | BAM | Q,C,V,FN | ftp://ftp.bx.psu.edu/data/bushman/hg18/bam/ (as of 13-10-2010)
Human1 (NA18507) [3] | Illumina GA II | FASTQ | Q,C,V,FN | http://www.ncbi.nlm.nih.gov/sra/?term=SRA000271 (as of 13-10-2010)
E. coli (MG1655) | Illumina GA IIx | FASTQ | Q,C | http://www.ebi.ac.uk/ena/data/view/ERP000092 (as of 13-10-2010)
E. coli (DH10B) | SBL (version 4) | CSFASTA; QUAL | Q,C | http://solidsoftwaretools.com/gf/project/dh10bfrag/ (as of 13-10-2010)
1000 Genomes [4] | Illumina, SBL | BAM | Q,C,V,T,FN | ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data or ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data (as of 13-10-2010)
Variant Calls
NA18507 SNPs | SBL (version 2) | TXT | FN | http://solidsoftwaretools.com/gf/project/yoruban (as of 13-10-2010)
NA18507 Indels | SBL (version 2) | TXT | FN | http://solidsoftwaretools.com/gf/project/yoruban (as of 13-10-2010)

[1] Q = Quality Scores; C = Coverage; V = Variant Calling; T = Trio Inheritance; FN = False Negatives
[2] Schuster SC, et al. Complete Khoisan and Bantu genomes from southern Africa. Nature. 2010 Feb 18; 463 (7283): 943-947. These data were aligned to the b36/hg18 reference genome.
[3] Bentley DR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008 Nov 6; 456 (7218): 53-59. These data were aligned to the b36/hg18 reference genome.
[4] 1000 Genomes files may be used by individuals to assess their own scripts, but may not be published or used for competitive purposes. Data from 1000 Genomes are aligned to the b37/hg19 reference genome.

Reference Files:
Description | File Type | Download
Reference Genomes
NCBI build 36 | FASTA | http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/ (as of 13-10-2010)
NCBI build 37 | FASTA | http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/ (as of 13-10-2010)
NCBI build 36 Chr 21 | FASTA | [Download]
E. coli (MG1655) | FASTA | ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655 (as of 13-10-2010)
E. coli (DH10B) Option 1 | FASTA | ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__DH10B/ (as of 13-10-2010)
E. coli (DH10B) Option 2 | FASTA | http://solidsoftwaretools.com/gf/project/dh10bfrag/DH10B_WithDup_FinalEdit_validated.fasta.zip (as of 13-10-2010)
Variant Databases
dbSNP130 | TXT | ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/b130_archive/ (as of 13-10-2010)
NA19240 SNPs [1] | TXT | [Download]
NA19240 Indels [1] | TXT | [Download]
NA19240&Yoruba SNPs [2] | TXT | [Download]
NA19240&Yoruba Indels [2] | TXT | [Download]
NA19240&18507&Yoruba SNPs [3] | TXT | [Download]
NA19240&18507&Yoruba Indels [3] | TXT | [Download]
Gene Databases
OMIM genes [4] | BED | [Download]

[1] Consists of those dbSNP 130 variants that were also observed in a capillary sequencing study of NA19240 (Kidd JM, et al. Nature. 2008 May 1; 453 (7191): 56-64).
[2] Consists of those dbSNP 130 variants that were also observed in a capillary sequencing study of NA19240 and in at least one other Yoruban individual (NA18506, NA18507, or NA18508) (Kidd JM, et al. Nature. 2008 May 1; 453 (7191): 56-64).
[3] Consists of those dbSNP 130 variants that were also observed in a capillary sequencing study of NA19240 and NA18507, and in at least one other Yoruban individual (NA18506 or NA18508) (Kidd JM, et al. Nature. 2008 May 1; 453 (7191): 56-64).
[4] Also available from http://genome.ucsc.edu/cgi-bin/hgTables?command=start
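The three curated variant subsets described in the footnotes on dbSNP 130 are nested set intersections. The Python sketch below is one way to make those definitions concrete. The variant key (chromosome, position, alternate allele), the function name `curated_subsets`, and the toy data are all hypothetical. They are not taken from the downloadable TXT files, whose layout may differ.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternate allele).
# The real TXT files may use a different layout.
Variant = Tuple[str, int, str]


def curated_subsets(dbsnp: Set[Variant],
                    capillary: Dict[str, Set[Variant]]) -> Dict[str, Set[Variant]]:
    """Illustrate the dbSNP 130 subset definitions as set intersections.

    `dbsnp` holds dbSNP 130 variants; `capillary` (hypothetical) maps a
    sample ID to the variants observed for it in the capillary study.
    """
    # dbSNP 130 variants also observed in NA19240.
    na19240 = dbsnp & capillary["NA19240"]
    # ...and in at least one of NA18506, NA18507, or NA18508.
    any_other = capillary["NA18506"] | capillary["NA18507"] | capillary["NA18508"]
    with_yoruba = na19240 & any_other
    # ...observed in NA19240 and NA18507, plus NA18506 or NA18508.
    with_18507 = na19240 & capillary["NA18507"] & (capillary["NA18506"] | capillary["NA18508"])
    return {
        "NA19240": na19240,
        "NA19240&Yoruba": with_yoruba,
        "NA19240&18507&Yoruba": with_18507,
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    dbsnp = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 300, "T"), ("chr3", 400, "C")}
    capillary = {
        "NA19240": {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 300, "T")},
        "NA18506": {("chr1", 200, "G")},
        "NA18507": {("chr1", 100, "A"), ("chr1", 200, "G")},
        "NA18508": {("chr2", 999, "A")},
    }
    for name, variants in curated_subsets(dbsnp, capillary).items():
        print(name, sorted(variants))
```

With the toy data, the three subsets contain three, two, and one variant respectively. Each definition narrows the previous one, so the NA19240&18507&Yoruba set is always contained in the NA19240&Yoruba set, which is in turn contained in the NA19240 set.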

References

  • Evaluation of next generation sequencing platforms for population targeted sequencing studies. Harismendy O, et al. Genome Biol. 2009; 10 (3): R32.
  • Complete Khoisan and Bantu genomes from southern Africa. Schuster SC, et al. Nature. 2010 Feb 18; 463 (7283): 943-947.
  • Mapping and sequencing of structural variation from eight human genomes. Kidd JM, et al. Nature. 2008 May 1; 453 (7191): 56-64.
  • Whole exome capture in solution with 3 Gbp of data. Bainbridge MN, et al. Genome Biol. 2010; 11 (6): R62.
  • Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. McKernan KJ, et al. Genome Res. 2009 Sep; 19 (9): 1527-1541.
  • Accurate whole human genome sequencing using reversible terminator chemistry. Bentley DR, et al. Nature. 2008 Nov 6; 456 (7218): 53-59.