A few days ago I posted about that Bioinformatics had published our paper on the Metaxa2 Database Builder (1). Today, I am happy to report that PeerJ has published the first paper in which the database builder is used to create a new Metaxa2 (2) database! My colleagues at Ohio State University has used the software to build a database for the COI gene (3), which is commonly used in arthropod barcoding. The used region was extracted from COI sequences from arthropod whole mitochondrion genomes, and employed to create a database containing sequences from all major arthropod clades, including all insect orders, all arthropod classes and the Onychophora, Tardigrada and Mollusca outgroups.
Similar to what we did in our evaluation of taxonomic classifiers used on non-rRNA barcoding regions (4), we performed a cross-validation analysis to characterize the relationship between the Metaxa2 reliability score, an estimate of classification confidence, and classification error probability. We used this analysis to select a reliability score threshold which minimized error. We then estimated classification sensitivity, false discovery rate and overclassification, the propensity to classify sequences from taxa not represented in the reference database.
Since the database builder was still in its early inception stages when we started doing this work, the software itself saw several improvements because of this project. We believe that our work on the COI database, as well as on the recently released database builder software, will help researchers in designing and evaluating classification databases for metabarcoding on arthropods and beyond. The database is included in the new Metaxa2 2.2 release, and is also downloadable from the Metaxa2 Database Repository (1). The open access paper can be found here.
- Bengtsson-Palme J, Richardson RT, Meola M, Wurzbacher C, Tremblay ED, Thorell K, Kanger K, Eriksson KM, Bilodeau GJ, Johnson RM, Hartmann M, Nilsson RH: Metaxa2 Database Builder: Enabling taxonomic identification from metagenomic and metabarcoding data using any genetic marker. Bioinformatics, advance article (2018). doi: 10.1093/bioinformatics/bty482
- Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
- Richardson RT, Bengtsson-Palme J, Gardiner MM, Johnson RM: A reference cytochrome c oxidase subunit I database curated for hierarchical classification of arthropod metabarcoding data. PeerJ, 6, e5126 (2018). doi: 10.7717/peerj.5126
- Richardson RT, Bengtsson-Palme J, Johnson RM: Evaluating and Optimizing the Performance of Software Commonly Used for the Taxonomic Classification of DNA Sequence Data. Molecular Ecology Resources, 17, 4, 760–769 (2017). doi: 10.1111/1755-0998.12628