MGI DNBSEQ-T7 sequencer with MegaBOLT bioinformatics analysis vs Illumina NovaSeq X Series with DRAGEN secondary analysis.
Key takeaways
A head-to-head comparison evaluated whole-genome sequencing (WGS) using the MGI DNBSEQ-T7 sequencing platform with MegaBOLT v2.4.0 software against the NovaSeq X Series with DRAGEN v4.4 software. This evaluation found that the Illumina solution:
Results in 8–12× fewer SNV and indel errors than MGI
Maintains higher coverage in challenging genomic regions, including GC-rich sequences, homopolymers ≥ 10 base pairs, and dinucleotide and trinucleotide repeats, compared to MGI
Provides more insights into genes relevant to clinical research compared to MGI
Illumina sequencing vs. Complete Genomics/MGI sequencing
The NovaSeq X Series demonstrates the commitment of Illumina to innovate next-generation sequencing (NGS) capabilities and build future methods.[1] Capable of data-intensive applications at production scale, the NovaSeq X Series empowers scientists to make new discoveries. In 2019, Complete Genomics/MGI launched the production-scale DNBSEQ-T7 sequencing platform, claiming that it can sequence 60 whole human genomes in one day with “exceptionally high accuracy”.[2] Indeed, several independent studies have reported comparable levels of sequencing quality and coverage between the DNBSEQ-T7 sequencing platform and the NovaSeq 6000 System, the predecessor to the NovaSeq X Series.[3][4] To evaluate these performance and accuracy claims, Illumina conducted a comparative analysis of WGS using the NovaSeq X Series with DRAGEN v4.4 secondary analysis (the Illumina WGS solution) and the Complete Genomics/MGI DNBSEQ-T7 sequencing platform with MegaBOLT v2.4.0 bioinformatics pipeline (the MGI WGS solution). The results of this evaluation demonstrate that the Illumina WGS solution delivers more accurate variant calling, provides more comprehensive coverage in challenging regions of the genome, and enables more insights into the molecular mechanisms of disease than the MGI WGS solution.
WGS evaluation study design
We sequenced libraries prepared from NA24385 (HG002) reference samples (obtained from the Coriell Institute for Medical Research). Illumina WGS libraries were prepared at Illumina using TruSeq DNA PCR-Free following the manufacturer’s instructions. Libraries were sequenced on the NovaSeq X Plus System with NovaSeq X 25B reagents using 2 × 151 bp read length, followed by secondary analysis with DRAGEN v4.4 software. An additional WGS library for the NA12878 (HG001) reference sample was prepared at Illumina using TruSeq DNA PCR-Free following the manufacturer’s instructions. The library was sequenced on the NovaSeq X Plus System with NovaSeq X 1.5B reagents using 2 × 300 bp read length, followed by secondary analysis with DRAGEN v4.4 software (Table 1).
Separately, we submitted samples to a sequencing core lab that prepared MGI libraries using the DNBSEQ Fast PCR-Free FS Library Prep Kit v2.0 following the manufacturer’s instructions. Furthermore, the TruSeq DNA PCR-Free libraries were converted to be compatible with the DNBSEQ-T7 sequencer using the DNBSEQ Universal Library Conversion Kit, following the manufacturer’s instructions. All libraries were sequenced on the DNBSEQ-T7 platform, and analysis was performed with MegaBOLT v2.4.0 (GATK 4.1.8.1_MGI-2.8.2-hc) software or MegaBOLT v2.4.0 (DeepVariant) software (Table 1).
All data sets were downsampled to 35× coverage depth before removing duplicates to ensure a fair comparison across platforms for the same number of input bases.
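The downsampling step above can be illustrated with a minimal sketch. It assumes uniform random subsampling of reads, which the study description does not specify. The function names, the genome size, the input base count, and the seed are hypothetical, and nothing here reflects a DRAGEN or MegaBOLT interface.

```python
import random

TARGET_DEPTH = 35.0  # target mean coverage stated in the study design


def downsample_fraction(total_bases, genome_size, target_depth=TARGET_DEPTH):
    """Return the fraction of reads to keep so mean depth equals target_depth.

    Mean depth is approximated as total sequenced bases / genome size.
    Data sets already at or below the target are kept whole (fraction 1.0).
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    current_depth = total_bases / genome_size
    return min(1.0, target_depth / current_depth)


def downsample_read_ids(read_ids, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the subsample reproducible. Duplicates are not
    removed here, matching the order described in the text (downsample
    first, remove duplicates afterwards).
    """
    rng = random.Random(seed)
    return [rid for rid in read_ids if rng.random() < fraction]


# Hypothetical input: 434 Gb of sequence over an assumed 3.1 Gb genome
# is 140x mean depth, so one quarter of the reads is kept to reach 35x.
fraction = downsample_fraction(total_bases=434_000_000_000,
                               genome_size=3_100_000_000)
```

Because every data set is reduced to the same mean depth before duplicate removal, each platform starts from the same number of input bases, and a platform with a higher duplicate rate ends up with less usable coverage.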
Table 1: Experiment design for head-to-head comparison
Higher variant calling accuracy with the NovaSeq X Series and DRAGEN software
The National Institute of Standards and Technology (NIST) Genome in a Bottle (GIAB) benchmarks are widely used to assess the accuracy and performance of WGS and variant calling analysis tools. The NIST v4.2.1 benchmark provides high-confidence genotype calls for single-nucleotide variants (SNVs) and small insertions and deletions (indels) across seven human genomes.[5] In addition, the NIST Challenging Medically Relevant Genes (CMRG) benchmark was developed to address 273 clinically important genes that are largely excluded from v4.2.1 due to their location in challenging genomic contexts, including segmental duplications, low-mappability regions, and repetitive sequences.[6]
We compared the variant calling performance of the NovaSeq X Series with DRAGEN v4.4 software against the DNBSEQ-T7 sequencing platform with MegaBOLT v2.4.0 software. The MGI WGS solution resulted in 8–12× more SNV + indel errors than the Illumina WGS solution when assessed against the NIST v4.2.1 benchmark (Figure 1A) and 4–6× more SNV + indel errors against the NIST CMRG benchmark (Figure 1B). Of note, the MGI solution resulted in more variant calling errors in genes relevant to clinical research, including HLA-A,[7] TUBB8,[8] and PDE4D,[9] compared to the Illumina solution (Figure 2).
MGI DNBSEQ sequencing technology relies on making DNA Nanoballs (DNBs) from linear DNA libraries that are loaded onto MGI platforms for sequencing. For researchers who want to sequence non-DNBSEQ libraries on MGI platforms, MGI offers the DNBSEQ Universal Library Conversion Kit. DNBSEQ library conversion includes an adapter conversion PCR amplification step, which could introduce errors and genomic coverage bias. We evaluated variant calling accuracy of Illumina TruSeq DNA PCR-Free libraries that were converted and sequenced on the DNBSEQ-T7 platform. As expected, DNBSEQ library conversion resulted in a 5-fold increase in indel errors and a 2-fold increase in total errors, compared to native DNBSEQ libraries run on the DNBSEQ-T7, when assessed against the NIST v4.2.1 (Figure 1A) and NIST CMRG (Figure 1B) benchmarks.
Figure 1: Significantly more errors in variant calling using the MGI WGS solution─Compared to the NovaSeq X Series with DRAGEN v4.4 software, the DNBSEQ-T7 sequencing platform with MegaBOLT v2.4.0 software produced (A) 8–12× more SNV + indel errors when assessed against the NIST v4.2.1 benchmark and (B) 4–6× more SNV + indel errors when assessed against the NIST CMRG benchmark. Conversion of Illumina libraries to run on the DNBSEQ-T7 resulted in 5-fold more indel errors and 2-fold more total errors when assessed against (A) the NIST v4.2.1 benchmark and (B) the NIST CMRG benchmark, compared to DNBSEQ libraries run on the DNBSEQ-T7. Variant calling errors are defined as the total number of false positives (a variant is called that is not present in the benchmark) and false negatives (a variant present in the benchmark is not called). Median values across replicates are reported.
Figure 2: Increased errors in variant calling using the MGI WGS solution─Sequencing MGI native libraries with the DNBSEQ-T7 sequencing platform with MegaBOLT v2.4.0 software resulted in increased variant calling errors in genes relevant to clinical research, compared to the Illumina solution. We built this set of 4730 genes by bringing together disease-associated genes from trusted clinical databases, including ClinVar,[10] DECIPHER,[11] COSMIC,[12] and Genomics England PanelApp,[13] and incorporating additional genes highlighted by our clinical research collaborators.
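The error metric described in the Figure 1 caption (false positives plus false negatives, with the median taken across replicates) can be written out as a short sketch. The function names are hypothetical, and the counts in the example are invented for illustration only; they are not results from this evaluation.

```python
from statistics import median


def total_errors(false_positives, false_negatives):
    """Variant calling errors: calls absent from the benchmark (FP)
    plus benchmark variants that were not called (FN)."""
    return false_positives + false_negatives


def median_errors(replicates):
    """Median total error count across replicates.

    `replicates` is a sequence of (false_positives, false_negatives) pairs,
    one pair per sequencing replicate.
    """
    return median(total_errors(fp, fn) for fp, fn in replicates)


def fold_difference(errors_a, errors_b):
    """How many times more errors solution A made than solution B."""
    if errors_b == 0:
        raise ValueError("errors_b must be non-zero to compute a ratio")
    return errors_a / errors_b


# Invented counts for illustration only (NOT study data).
solution_a = [(1200, 1800), (1100, 1900), (1300, 1700)]  # 3000 errors each
solution_b = [(150, 150), (140, 160), (160, 140)]        # 300 errors each
ratio = fold_difference(median_errors(solution_a), median_errors(solution_b))
```

Reporting a ratio of median error counts, rather than precision or recall separately, treats a spurious call and a missed variant as equally costly.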
Increased genome coverage with the NovaSeq X Series solution
We evaluated the performance of both systems in challenging GC-rich regions. The results show that relative genome coverage with the MGI WGS solution dropped with GC content > 60% and dropped significantly in GC-rich regions (> 80%), compared to the Illumina WGS solution (Figure 3). Additionally, in genomic regions that are difficult to sequence, including G-quadruplexes, homopolymers ≥ 10 bp, dinucleotide repeats, and trinucleotide repeats, the DNBSEQ-T7 solution experienced coverage losses of up to 30% compared to the Illumina solution (Figure 4). Such losses may compromise variant detection accuracy, reduce confidence in calls, and increase the risk of missing important variants in genes for clinical research, especially repeats, indels, and structural variants.[14-16] Drops in genome coverage with the DNBSEQ-T7 solution were seen with both MGI libraries and converted TruSeq DNA PCR-Free libraries (Figures 3 and 4). Limited coverage in these regions could lead to the omission of disease-associated genes, reducing the reliability of their subsequent analysis and interpretation. One example is the KMT2A (lysine methyltransferase 2A) gene, which encodes a transcriptional coactivator that functions during early development and hematopoiesis. Mutations in the KMT2A gene have been associated with Wiedemann-Steiner syndrome and several forms of leukemia (Figure 5).[17-19]
Figure 3: Loss of coverage in GC-rich regions with the MGI WGS solution─WGS of MGI native libraries and converted TruSeq DNA PCR-Free libraries on the DNBSEQ-T7 solution resulted in loss of coverage in repetitive, GC-rich regions of the genome, compared to WGS on the NovaSeq X Series. Normalized coverage is plotted to ensure data is comparable across samples or conditions and is calculated by dividing the read depth at a specific position by the mean coverage of the genome.
Figure 4: Loss of coverage in challenging regions with the DNBSEQ-T7 solution─WGS of MGI native libraries and converted TruSeq DNA PCR-Free libraries on the DNBSEQ-T7 solution showed significant reduction in coverage in regions challenging to sequence compared to the NovaSeq X Series.
Figure 5: Loss of coverage in the KMT2A gene with the MGI WGS solution─Sequencing of MGI native libraries with the DNBSEQ-T7 platform resulted in loss of coverage in disease-relevant genes with high GC content, such as the KMT2A gene, compared to WGS on the NovaSeq X Series.
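The normalized coverage defined in the Figure 3 caption (read depth at a position divided by the mean coverage of the genome) can be sketched as follows, together with a simple GC-fraction helper for binning regions by GC content. The function names are hypothetical and the depth values are invented for illustration; this is not the pipeline used in the evaluation.

```python
def normalized_coverage(depths, genome_mean_depth):
    """Divide each per-position read depth by the genome-wide mean depth.

    A value of 1.0 means the position is covered at the genome average;
    values below 1.0 indicate relative coverage loss.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    return [d / genome_mean_depth for d in depths]


def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Invented depths at three positions in a genome with 35x mean depth.
example = normalized_coverage([35, 70, 0], genome_mean_depth=35.0)
```

Normalizing by the genome-wide mean removes differences in overall sequencing depth, so curves from different samples or platforms can be compared on the same axis.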
Reduced error rates with the NovaSeq X Series drive innovation
Sequencing on the NovaSeq X Series resulted in reduced error rates across Read 1 and Read 2 compared to the DNBSEQ-T7 sequencing platform (Figure 6). The reduced error rates allow for extension of the sequencing read length out to 300 cycles, enabling more applications to be run on the NovaSeq X Series.
Figure 6: Reduced error rates across sequencing reads with the NovaSeq X Series compared to the DNBSEQ-T7 sequencing platform─Sequencing TruSeq DNA PCR-Free libraries with a mean insert size of 550 bp on the NovaSeq X Series resulted in reduced error rates across Read 1 and Read 2, even out to 300 bp read length, compared to the DNBSEQ-T7 sequencing platform.
Summary
This evaluation of the NovaSeq X Series with DRAGEN software and the DNBSEQ-T7 sequencing platform with MegaBOLT software demonstrates the superior performance of the Illumina WGS solution compared to the Complete Genomics/MGI WGS solution. The Illumina solution delivers higher accuracy and more comprehensive coverage across the genome, including challenging regions, providing insights into biologically relevant genes.
Illumina is a trusted global leader in genomics with 27 years of expertise and continues to provide comprehensive support and best-in-class product consistency, setting the standard for NGS solutions. The NovaSeq X Series and DRAGEN secondary analysis solution delivers accuracy and quality for comprehensive WGS at any scale.
Related links
NovaSeq X Series
DRAGEN secondary analysis
Whole-genome sequencing
Illumina SBS chemistry
Have a question about WGS?
Have questions about how Illumina platforms compare to Complete Genomics/MGI? We can help you determine the total cost of ownership and recommend the best solution for your setup.
References
1. Illumina. 25 greatest impacts in 25 years: Illumina and the evolution of genomics. illumina.com/company/news-center/feature-articles/25-greatest-impacts-in-25-years--a-look-back-at-illumina-and-the.html. Published April 3, 2023. Accessed November 20, 2025.
2. MGI. MGI Tech Complete Genomics, part of MGI, Announces Next-Generation Sequencing Platforms at ASHG Annual Meeting. global-mgitech.com/mgi-tech-complete-genomics-part-of-mgi-announces-next-generation-sequencing-platforms-at-ashg-annual-meeting/. Published October 25, 2022. Accessed November 20, 2025.
3. Jeon SA, Park JL, Park SJ, et al. Comparison between MGI and Illumina sequencing platforms for whole genome sequencing. Genes Genomics. 2021;43(7):713-724. doi:10.1007/s13258-021-01096-x
4. Kim HM, Jeon S, Chung O, et al. Comparative analysis of 7 short-read sequencing platforms using the Korean Reference Genome: MGI and Illumina sequencing benchmark for whole-genome sequencing. Gigascience. 2021;10(3):giab014. doi:10.1093/gigascience/giab014
5. Wagner J, Olson ND, Harris L, et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2022;2(5):100128. doi:10.1016/j.xgen.2022.100128
6. Wagner J, Olson ND, Harris L, et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat Biotechnol. 2022;40(5):672-680. doi:10.1038/s41587-021-01158-1
7. Howell WM. HLA and disease: guilt by association. Int J Immunogenet. 2014;41(1):1-12. doi:10.1111/iji.12088
8. Sferra A, Petrini S, Bellacchio E, et al. TUBB Variants Underlying Different Phenotypes Result in Altered Vesicle Trafficking and Microtubule Dynamics. Int J Mol Sci. 2020;21(4):1385. Published 2020 Feb 18. doi:10.3390/ijms21041385
9. Das S, Roy S, Munshi A. Association between PDE4D gene and ischemic stroke: recent advancements. Int J Neurosci. 2016;126(7):577-583. doi:10.3109/00207454.2015.1051621
10. Landrum MJ, Chitipiralla S, Brown GR, et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 2020;48(D1):D835-D844. doi:10.1093/nar/gkz972
11. Foreman J, Brent S, Perrett D, et al. DECIPHER: Supporting the interpretation and sharing of rare disease phenotype-linked variant data to advance diagnosis and research. Hum Mutat. 2022;43(6):682-697. doi:10.1002/humu.24340
12. Sondka Z, Dhir NB, Carvalho-Silva D, et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 2024;52(D1):D1210-D1217. doi:10.1093/nar/gkad986
13. Stark Z, Foulger RE, Williams E, et al. Scaling national and international improvement in virtual gene panel curation via a collaborative approach to discordance resolution. Am J Hum Genet. 2021;108(9):1551-1557. doi:10.1016/j.ajhg.2021.06.020
14. Chen H, Wang B, Cai L, et al. The performance of homopolymer detection using dichromatic and tetrachromatic fluorogenic next-generation sequencing platforms. BMC Genomics. 2024;25(1):542. Published 2024 May 31. doi:10.1186/s12864-024-10474-0
15. Jeanjean SI, Shen Y, Hardy LM, et al. A detailed analysis of second and third-generation sequencing approaches for accurate length determination of short tandem repeats and homopolymers. Nucleic Acids Res. 2025;53(5):gkaf131. doi:10.1093/nar/gkaf131
16. Hijikata A, Suyama M, Kikugawa S, et al. Exome-wide benchmark of difficult-to-sequence regions using short-read next-generation DNA sequencing. Nucleic Acids Res. 2024;52(1):114-124. doi:10.1093/nar/gkad1140
17. Feldman HR, Dlouhy SR, Lah MD, Payne KK, Weaver DD. The progression of Wiedemann-Steiner syndrome in adulthood and two novel variants in the KMT2A gene. Am J Med Genet A. 2019;179(2):300-305. doi:10.1002/ajmg.a.60698
18. Forgione MO, McClure BJ, Eadie LN, Yeung DT, White DL. KMT2A rearranged acute lymphoblastic leukaemia: Unravelling the genomic complexity and heterogeneity of this high-risk disease. Cancer Lett. 2020;469:410-418. doi:10.1016/j.canlet.2019.11.005
19. Shimony S, Luskin MR. Unraveling KMT2A-rearranged ALL. Blood. 2023;142(21):1764-1766. doi:10.1182/blood.2023021942