Deep correction of DNA sequencing errors by data mining algorithms

Project Member(s): Li, J., Hutvagner, G., Catchpoole, D.

Funding or Partner Organisation: Australian Research Council (ARC Discovery Projects)

Start year: 2018

Summary: This project investigates the many layers of error correction problems in terabytes of genomic sequence data and aims to solve them with novel data mining algorithms. High-throughput sequencing platforms have generated massive amounts of useful raw data, but that data also contains widespread errors. The new algorithms are designed to correct errors at deeper layers and so further enhance data quality. Expected outcomes include advances in knowledge for the genomic data industry and interdisciplinary collaboration between biotechnology and data mining. The project should also provide significant benefit for genomic decisions in forensics and personalised medicine, which demand accurate genomic information.


Lan, C, Peng, H, Hutvagner, G & Li, J 2019, 'Construction of competing endogenous RNA networks from paired RNA-seq data sets by pointwise mutual information', BMC Genomics, vol. 20, no. S9, p. 943.

Liu, Y, Yu, Z, Dinger, ME & Li, J 2019, 'Index suffix–prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression', Bioinformatics, vol. 35, no. 12, pp. 2066-2074.

Smith, CM, Catchpoole, D & Hutvagner, G 2019, 'Non-Coding RNAs in Pediatric Solid Tumors', Frontiers in Genetics, vol. 10.

Peng, H, Zheng, Y, Zhao, Z, Liu, T & Li, J 2018, 'Recognition of CRISPR/Cas9 off-target sites through ensemble learning of uneven mismatch distributions', Bioinformatics, vol. 34, no. 17, pp. i757-i765.

Lan, C, Peng, H, McGowan, EM, Hutvagner, G & Li, J 2018, 'An isomIR expression panel based novel breast cancer classification approach using improved mutual information'.

FOR Codes: Application Tools and System Utilities; Pattern Recognition and Data Mining; Bioinformatics Software; Applications in life sciences; Data quality; Information systems, technologies and services not elsewhere classified