On Friday, Molecular Ecology Resources put online Christian Wurzbacher’s latest paper, of which I am also a coauthor. The paper presents three sets of general primers that allow for amplification of the complete ribosomal operon from the ribosomal tandem repeats, covering all the ribosomal markers (ETS, SSU, ITS1, 5.8S, ITS2, LSU, and IGS) (1). This paper is important because it introduces a technique that uses third-generation sequencing (PacBio and Nanopore) to generate high-quality reference data (equivalent to or better than Sanger sequencing) in a high-throughput manner. The paper shows that the accuracy of the Nanopore-generated sequences was 99.85%, which is comparable with the 99.78% accuracy described for Sanger sequencing.
My main contribution to this paper is the consensus sequence generation script – Consension – which is available from my software page. Importantly, there are huge gaps in the reference databases we use for taxonomic classification and this method will facilitate the integration of reference data from all of the ribosomal markers. We hope that this work will stimulate large-scale generation of ribosomal reference data covering several marker genes, linking previously spread-out information together.
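For readers unfamiliar with consensus generation, here is a minimal sketch of the general idea: given several error-prone reads of the same amplicon that have already been aligned to each other, take the most common base in each column. This is only an illustration and is not how Consension itself is implemented; the function names, the `min_fraction` threshold and the toy reads are all assumptions made for this example.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-", min_fraction=0.5):
    """Per-column majority vote over equal-length, pre-aligned reads.

    Hypothetical helper for illustration; not Consension's algorithm.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, count = Counter(column).most_common(1)[0]
        # Call an ambiguous base when no character reaches the threshold.
        if count / len(column) < min_fraction:
            base = "N"
        # Columns where a gap wins are dropped from the consensus.
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy aligned reads, each carrying at most one error (made-up data).
reads = ["ACGTTGCA", "ACGTAGCA", "ACG-TGCA", "ACGTTGCA", "TCGTTGCA"]
consensus = majority_consensus(reads)
print(consensus)
print(identity(consensus, "ACGTTGCA"))
```

Because the individual errors fall in different columns, the majority vote recovers the underlying sequence even though three of the five toy reads contain a mistake. This is the intuition behind why consensus sequences from noisy long reads can reach accuracies comparable to Sanger sequencing.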
One of the questions I have received regarding Metaxa2 is whether it is possible to use it on other DNA barcodes. My answer has been “technically, yes, but it is a very cumbersome process of creating a custom database for every additional barcode”. Not anymore: the newly introduced Metaxa2 Database Builder makes this process automatic, with the user just supplying a FASTA file of sequences from the region in question and a file containing the taxonomy information for the sequences (in GenBank, INSD XML, Metaxa2 or SILVA-style formats). The preprint (1) has been out for some time, but today Bioinformatics published the paper describing the software (2).
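Since the builder needs both a FASTA file and matching taxonomy information, it can be useful to check beforehand that every sequence actually has a taxonomy entry. The sketch below does this in Python. It assumes a simplified, hypothetical taxonomy file with one `ID<TAB>lineage` line per sequence and semicolon-separated ranks, which is not necessarily identical to any of the formats the Database Builder accepts, and the helper names are invented for this example.

```python
import io


def read_fasta_ids(handle):
    """Collect sequence identifiers (first word after '>') from a FASTA file."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def read_taxonomy(handle):
    """Parse the assumed 'ID<TAB>rank1;rank2;...' taxonomy layout."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        seq_id, lineage = line.split("\t", 1)
        taxonomy[seq_id] = lineage.split(";")
    return taxonomy


def missing_taxonomy(fasta_ids, taxonomy):
    """Return the FASTA identifiers that have no taxonomy entry."""
    return [seq_id for seq_id in fasta_ids if seq_id not in taxonomy]


# Made-up example data standing in for real input files.
fasta = io.StringIO(">seq1 matK\nACGT\n>seq2 matK\nGGCA\n>seq3 matK\nTTAG\n")
tax = io.StringIO(
    "seq1\tEukaryota;Viridiplantae;Poaceae\n"
    "seq3\tEukaryota;Viridiplantae;Fabaceae\n"
)

ids = read_fasta_ids(fasta)
taxonomy = read_taxonomy(tax)
print(missing_taxonomy(ids, taxonomy))
```

With real data, the `io.StringIO` objects would be replaced by opened files, and the parser would need to be adapted to whichever taxonomy format is actually in use.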
The paper not only details how the database builder works, but also shows that it works on a number of different barcoding regions, albeit with different results in terms of accuracy. Still, even with seemingly high misclassification rates for some DNA barcodes, the software performs better than a simple BLAST-based taxonomic assignment (76.5% vs. 41.4% correct classifications for matK, and 76.2% vs. 45.1% for trnL). The database builder has already found use in building a COI database for arthropods (3), and we envision a range of uses in the near future.
As the paper is now published, I have also moved the Metaxa2 software (4) from beta status to a full-fledged version 2.2 update. Hopefully, this release should be bug free, but my experience is that when the community gets their hands on the software, they tend to discover things our team has missed. I would like to thank the entire team working on this, particularly Rodney Richardson (who initiated this entire thing) and Henrik Nilsson. The software can be downloaded here. Happy barcoding!
1. Bengtsson-Palme J, Richardson RT, Meola M, Wurzbacher C, Tremblay ED, Thorell K, Kanger K, Eriksson KM, Bilodeau GJ, Johnson RM, Hartmann M, Nilsson RH: Taxonomic identification from metagenomic or metabarcoding data using any genetic marker. bioRxiv 253377 (2018). doi: 10.1101/253377
2. Bengtsson-Palme J, Richardson RT, Meola M, Wurzbacher C, Tremblay ED, Thorell K, Kanger K, Eriksson KM, Bilodeau GJ, Johnson RM, Hartmann M, Nilsson RH: Metaxa2 Database Builder: Enabling taxonomic identification from metagenomic and metabarcoding data using any genetic marker. Bioinformatics, advance article (2018). doi: 10.1093/bioinformatics/bty482
3. Richardson RT, Bengtsson-Palme J, Gardiner MM, Johnson RM: A reference cytochrome c oxidase subunit I database curated for hierarchical classification of arthropod metabarcoding data. PeerJ Preprints, 6, e26662v1 (2018). doi: 10.7287/peerj.preprints.26662v1
4. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399