Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery


Today marks the five-year anniversary of the initial release of the Metaxa software. Much has happened to the software since. Metaxa started off as an rRNA extraction utility for metagenomic data (1), including coarse classification to organism/organelle type. Since then, it has gained full-scale taxonomic classification ability, on par with or better than other software packages (2), much greater speed, and support for the LSU gene. It has also gained a range of related software tools (3) and spurred the development of other tools such as ITSx (4). I have also been involved in no fewer than four peer-reviewed publications directly related to the software (1-3,5).

But it does not end here; these five years were just the beginning. We are, in different constellations, working on further enhancements to Metaxa2, including support for more genes, an updated classification database, and better customization options. I remain very much devoted to keeping Metaxa2 alive and relevant as a tool for taxonomic analysis of metagenomes, applicable whenever accuracy is a key parameter. Thanks for being part of the community for these five years!

References

  1. Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet G, Abarenkov K, Petri A, Alm Rosenblad M, Nilsson RH: Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100, 3, 471–475 (2011). doi: 10.1007/s10482-011-9598-6
  2. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
  3. Bengtsson-Palme J, Thorell K, Wurzbacher C, Sjöling Å, Nilsson RH: Metaxa2 Diversity Tools: Easing microbial community analysis with Metaxa2. Ecological Informatics, 33, 45–50 (2016). doi: 10.1016/j.ecoinf.2016.04.004
  4. Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, De Wit P, Sánchez-García M, Ebersberger I, de Souza F, Amend AS, Jumpponen A, Unterseher M, Kristiansson E, Abarenkov K, Bertrand YJK, Sanli K, Eriksson KM, Vik U, Veldre V, Nilsson RH: Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for use in environmental sequencing. Methods in Ecology and Evolution, 4, 10, 914–919 (2013). doi: 10.1111/2041-210X.12073
  5. Bengtsson-Palme J, Hartmann M, Eriksson KM, Nilsson RH: Metaxa, overview. In: Nelson K. (Ed.) Encyclopedia of Metagenomics: SpringerReference (www.springerreference.com). Springer-Verlag Berlin Heidelberg (2013). doi: 10.1007/978-1-4614-6418-1_239-6

Yesterday, Ecological Informatics put our paper describing the Metaxa2 Diversity Tools online (1). The Metaxa2 Diversity Tools were introduced with Metaxa2 version 2.1 and consist of:

  • metaxa2_dc – a tool for collecting several .taxonomy.txt output files into one large abundance matrix, suitable for analysis in, e.g., R
  • metaxa2_rf – generates resampling rarefaction curves (2) based on the .taxonomy.txt output
  • metaxa2_si – species inference based on guessing species data from the other species present in the .taxonomy.txt output file
  • metaxa2_uc – a tool for determining if the community composition of a sample is significantly different from others through resampling analysis

While updating the web site, I also took the opportunity to revise the Metaxa2 FAQ to better reflect recent changes to the Metaxa2 software.

Metaxa2 Diversity Tools
One often requested feature of Metaxa2 (3) has been the ability to perform simple analyses of the data after classification. The Metaxa2 Diversity Tools included in Metaxa2 2.1 are a seed for such an effort (although not close to a full-fledged community analysis package comparable to QIIME (4) or Mothur (5)). The set currently consists of four tools.

The Metaxa2 Data Collector (metaxa2_dc) is the simplest of them (but probably the most requested), designed to merge the output of several *.level_X.txt files from the Metaxa2 Taxonomic Traversal Tool into one large abundance matrix, suitable for further analysis in, for example, R. The Metaxa2 Species Inference tool (metaxa2_si) uses a complementary algorithm to infer taxonomic information at, for example, the species level, at a lower reliability than the Metaxa2 classifier would permit. The idea is that if only a single species is present in, e.g., a family, and a read is assigned to this family but not classified to the species level, that read will be inferred to belong to the same species as the other reads, provided that it has more than 97% sequence identity to its best reference match. This can be useful if the user really needs species or genus classifications but many organisms in the studied species group have similar rRNA sequences, making it hard for the Metaxa2 classifier to classify sequences to the species level.
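To make the idea of an abundance matrix concrete, here is a minimal Python sketch of the kind of merge metaxa2_dc performs. It is not the metaxa2_dc source code: it assumes, for illustration only, that each per-sample file holds one taxon<TAB>count pair per line, and the function names (read_counts, collect, write_matrix) are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_counts(handle: TextIO) -> Dict[str, float]:
    """Parse one per-sample table, ASSUMED to hold 'taxon<TAB>count' lines."""
    counts: Dict[str, float] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed lines
        taxon = row[0].strip()
        counts[taxon] = counts.get(taxon, 0.0) + float(row[1])
    return counts


def collect(samples: Dict[str, TextIO]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Merge per-sample counts into a taxa-by-samples matrix; absent taxa get 0."""
    per_sample = {name: read_counts(h) for name, h in samples.items()}
    sample_names = list(per_sample)
    taxa = sorted({t for c in per_sample.values() for t in c})
    matrix = [[per_sample[s].get(t, 0.0) for s in sample_names] for t in taxa]
    return taxa, sample_names, matrix


def write_matrix(taxa: List[str], sample_names: List[str],
                 matrix: List[List[float]], out: TextIO) -> None:
    """Write a tab-separated matrix with taxa as rows and samples as columns."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Taxa"] + sample_names)
    for taxon, row in zip(taxa, matrix):
        writer.writerow([taxon] + row)


# Usage with in-memory stand-ins for two hypothetical sample files
s1 = io.StringIO("Bacteria;Proteobacteria\t10\nBacteria;Firmicutes\t5\n")
s2 = io.StringIO("Bacteria;Firmicutes\t7\nArchaea;Euryarchaeota\t2\n")
taxa, names, matrix = collect({"S1": s1, "S2": s2})
out = io.StringIO()
write_matrix(taxa, names, matrix, out)
print(out.getvalue())
```

Filling absent taxa with zero is what makes the result a proper matrix that can be read straight into R as a table.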

The Metaxa2 Rarefaction analysis tool (metaxa2_rf) performs a resampling rarefaction analysis (2) based on the output from the Metaxa2 classifier, also taking into account the unclassified portion of the rRNAs. Finally, the Metaxa2 Uniqueness of Community analyzer (metaxa2_uc) allows analysis of whether the community composition of two or more samples or groups differs significantly. Using resampling of the community data, it tests the null hypothesis that the taxonomic content of two communities is drawn from the same set of taxa (given certain abundances). All these tools are further described in the manual and the recent paper (1).
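The resampling idea behind a rarefaction curve can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not metaxa2_rf's implementation: reads are drawn without replacement at each depth, the mean number of distinct taxa over a number of replicates is reported, and reads labelled "Unclassified" (a hypothetical label) stay in the pool but are not counted as a taxon, which is one possible way of taking the unclassified portion into account.

```python
import random
from typing import Dict, List, Optional


def rarefaction_curve(counts: Dict[str, int], depths: List[int],
                      replicates: int = 100,
                      unclassified: Optional[str] = "Unclassified",
                      seed: int = 0) -> List[float]:
    """Mean number of distinct taxa observed when drawing `depth` reads.

    Unclassified reads remain in the pool (they use up sampling effort)
    but are not counted as a taxon; this handling is an assumption.
    """
    rng = random.Random(seed)
    # Expand the counts into one list entry per read
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    curve: List[float] = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError(f"depth {depth} exceeds {len(pool)} reads")
        total = 0
        for _ in range(replicates):
            observed = set(rng.sample(pool, depth))  # draw without replacement
            observed.discard(unclassified)
            total += len(observed)
        curve.append(total / replicates)
    return curve


example = {"Bacillus": 40, "Escherichia": 30, "Vibrio": 5, "Unclassified": 25}
print(rarefaction_curve(example, [1, 10, 50, 100]))
```

Keeping unclassified reads in the pool means that a sample with many unclassified rRNAs reaches fewer taxa at a given depth, an effect that would be hidden if those reads were simply discarded.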

The latest version of Metaxa2, including the Metaxa2 Diversity Tools, can be downloaded here.

References

  1. Bengtsson-Palme J, Thorell K, Wurzbacher C, Sjöling Å, Nilsson RH: Metaxa2 Diversity Tools: Easing microbial community analysis with Metaxa2. Ecological Informatics, 33, 45–50 (2016). doi: 10.1016/j.ecoinf.2016.04.004
  2. Gotelli NJ, Colwell RK: Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters, 4, 379–391 (2001). doi: 10.1046/j.1461-0248.2001.00230.x
  3. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
  4. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  5. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).

I have today uploaded an updated version of Metaxa2 (version 2.1.2). This update primarily improves the memory performance of the Metaxa2 Diversity Tools. The core Metaxa2 programs are unchanged from the previous Metaxa2 versions.

New features and bug fixes in this update:

  • Dramatically improved memory performance of metaxa2_uc
  • Added the 'min' option to the -s flag in metaxa2_uc, which causes the program to draw, from each sample, as many entries as are present in the smallest sample
  • Fixed a bug that caused metaxa2_si to disregard the level specified by the -l option
  • Minor updates and improvements to the manual
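The 'min' setting amounts to subsampling every sample down to the size of the smallest one. A minimal Python sketch of that step follows; it illustrates the idea only, and the function name and input layout (taxon counts per sample) are assumptions rather than metaxa2_uc internals.

```python
import random
from collections import Counter
from typing import Dict


def subsample_to_smallest(samples: Dict[str, Dict[str, int]],
                          seed: int = 0) -> Dict[str, Counter]:
    """Draw, without replacement, as many reads from every sample as the
    smallest sample contains, and return the subsampled taxon counts."""
    rng = random.Random(seed)
    depth = min(sum(c.values()) for c in samples.values())
    subsampled: Dict[str, Counter] = {}
    for name, counts in samples.items():
        pool = list(Counter(counts).elements())  # one entry per read
        subsampled[name] = Counter(rng.sample(pool, depth))
    return subsampled


data = {"S1": {"Bacillus": 60, "Vibrio": 40}, "S2": {"Bacillus": 8, "Vibrio": 2}}
print(subsample_to_smallest(data))
```

The smallest sample is always returned unchanged, since all of its reads are drawn.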

The updated version of Metaxa2 can be downloaded here.
Happy barcoding!

I got a very nice little e-mail yesterday evening, which made me realize that when I posted the Metaxa2 2.1 update, I forgot to thank and credit the wonderful Metaxa/Metaxa2 community, who have contributed input on which Metaxa2 features they would like to see implemented. In particular, I would like to thank Thomas Haverkamp, who suggested the reference option, Åsa Sjöling, who brainstormed with me on what led to the metaxa2_uc tool, and everyone who has suggested various downstream analysis tricks that have been baked into the Metaxa2 Diversity Tools.

Within the Metaxa team, I would like to specifically thank Kaisa Thorell (particularly for the --split_pairs option) and Martin Hartmann (who said that the software should obviously be able to detect which BLAST version was installed), who keep pushing for features and ideas to make the software better. Thanks a lot to all of you, and have a nice weekend!

Today I have released Metaxa2 version 2.1.1, containing a fix for an embarrassing bug in the new metaxa2_uc program (part of the Metaxa2 Diversity Tools). A late change to the names of the tool's different modes had not propagated to all parts of the code, and therefore only the “model” mode was functional in the previous version. No other changes have been made to the Metaxa2 package in this update, which can be downloaded here.

I am very happy to announce that Metaxa2 version 2.1 has been released today. This new version brings a lot of important improvements to the Metaxa2 software (1), in particular by the introduction of the Metaxa2 Diversity Tools. This is the list of new features (further elaboration follows below):

  • The Metaxa2 Diversity Tools:
    • metaxa2_dc – a tool for collecting several .taxonomy.txt output files into one large abundance matrix, suitable for analysis in, e.g., R
    • metaxa2_rf – generates rarefaction curves based on the .taxonomy.txt output
    • metaxa2_si – species inference based on guessing species data from the other species present in the .taxonomy.txt output file
    • metaxa2_uc – a tool for determining if the community composition of a sample is significantly different from others through resampling analysis
  • Added a new detection mode for detecting multiple rRNAs in the same sequence, e.g. a genome
  • Added the --reference option to improve the use of Metaxa2 as a tool to sort out host rRNA sequences from a dataset
  • Added the --split_pairs option causing Metaxa2 to output paired-end sequences into two separate files, which is nice for further analysis of rRNA reads
  • The default setting for the --align option has been changed to ‘none’
  • Automatic detection of which BLAST package is installed
  • Fixed a bug causing the last read of paired-end FASTA input to be ignored
  • Fixed an occasionally occurring BLAST+ related warning message
  • Fixed a bug that could cause the classifier to crash on highly divergent BLAST matches

The new version of Metaxa2 can be downloaded here, and for those interested I will spend the rest of this post outlining the new features.

Metaxa2 Diversity Tools
One often requested feature of Metaxa2 is the ability to perform simple analyses of the data after classification. The Metaxa2 Diversity Tools included in Metaxa2 2.1 are a seed for such an effort (although not close to a full-fledged community analysis package comparable to QIIME (2) or Mothur (3)). The set currently consists of four tools.

The Metaxa2 Data Collector (metaxa2_dc) is the simplest of them (but probably the most requested), designed to merge the output of several *.level_X.txt files from the Metaxa2 Taxonomic Traversal Tool into one large abundance matrix, suitable for further analysis in, for example, R. The Metaxa2 Species Inference tool (metaxa2_si) uses a complementary algorithm to infer taxonomic information at, for example, the species level, at a lower reliability than the Metaxa2 classifier would permit. The idea is that if only a single species is present in, e.g., a family, and a read is assigned to this family but not classified to the species level, that read will be inferred to belong to the same species as the other reads, provided that it has more than 97% sequence identity to its best reference match. This can be useful if the user really needs species or genus classifications but many organisms in the studied species group have similar rRNA sequences, making it hard for the Metaxa2 classifier to classify sequences to the species level.
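The species inference rule described above can be sketched in Python as follows. This illustrates the stated rule only and is not the metaxa2_si code: it assumes seven-rank lineages ending at species, takes each read's identity to its best reference match as given, and uses hypothetical names throughout.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Lineage = Tuple[str, ...]
SPECIES_RANK = 7  # assumed: domain, phylum, class, order, family, genus, species


def infer_species(reads: List[Tuple[str, Lineage, float]],
                  min_identity: float = 97.0) -> Dict[str, Lineage]:
    """reads: (read_id, assigned lineage, % identity to best reference hit)."""
    # For every higher-level lineage, record which species were seen under it
    species_under: Dict[Lineage, Set[Lineage]] = defaultdict(set)
    for _, lineage, _ in reads:
        if len(lineage) >= SPECIES_RANK:
            species = lineage[:SPECIES_RANK]
            for rank in range(1, SPECIES_RANK):
                species_under[species[:rank]].add(species)

    result: Dict[str, Lineage] = {}
    for read_id, lineage, identity in reads:
        candidates = species_under.get(lineage, set())
        if (len(lineage) < SPECIES_RANK and len(candidates) == 1
                and identity > min_identity):
            result[read_id] = next(iter(candidates))  # the single species present
        else:
            result[read_id] = lineage  # keep the classifier's assignment
    return result


fam = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionales", "Vibrionaceae")
reads = [
    ("r1", fam + ("Vibrio", "Vibrio cholerae"), 99.2),
    ("r2", fam, 98.5),  # family only, identity above 97% -> inferred
    ("r3", fam, 95.0),  # family only, identity too low -> unchanged
]
for rid, lin in infer_species(reads).items():
    print(rid, ";".join(lin))
```

As soon as a second species appears under the same family, the inference no longer applies and reads assigned only to the family keep their family-level classification.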

The Metaxa2 Rarefaction analysis tool (metaxa2_rf) performs a rarefaction analysis based on the output from the Metaxa2 classifier, also taking into account the unclassified portion of the rRNAs. Finally, the Metaxa2 Uniqueness of Community analyzer (metaxa2_uc) allows analysis of whether the community composition of two or more samples or groups differs significantly. Using resampling of the community data, it tests the null hypothesis that the taxonomic content of two communities is drawn from the same set of taxa (given certain abundances). All these tools are further described in the manual.
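To illustrate the kind of resampling test that metaxa2_uc is built around, here is a generic permutation test in Python. It is a swapped-in technique, not the tool's actual procedure: it uses Bray-Curtis dissimilarity as the test statistic (an assumption) and shuffles the pooled reads of two samples to simulate the null hypothesis that both were drawn from the same set of taxa.

```python
import random
from collections import Counter
from typing import Dict


def bray_curtis(a: Counter, b: Counter) -> float:
    """Bray-Curtis dissimilarity between two count profiles (0 = identical)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[t], b[t]) for t in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


def permutation_test(a_counts: Dict[str, int], b_counts: Dict[str, int],
                     permutations: int = 999, seed: int = 0) -> float:
    """P-value for H0: both samples are drawn from the same pool of reads."""
    rng = random.Random(seed)
    a, b = Counter(a_counts), Counter(b_counts)
    observed = bray_curtis(a, b)
    pool = list(a.elements()) + list(b.elements())
    n_a = sum(a.values())
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(pool)  # reassign reads to the two samples at random
        if bray_curtis(Counter(pool[:n_a]), Counter(pool[n_a:])) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (permutations + 1)


print(permutation_test({"Bacillus": 50, "Vibrio": 50}, {"Bacillus": 48, "Vibrio": 52}))
print(permutation_test({"Bacillus": 90, "Vibrio": 10}, {"Bacillus": 10, "Vibrio": 90}))
```

Adding one to both numerator and denominator keeps the p-value strictly above zero, which is the usual convention for Monte Carlo permutation tests.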

The genome mode
Metaxa2 has long been considered unsuitable for predicting rRNA in longer sequences, such as full genomes or chromosomes, since it has traditionally looked for only a single rRNA hit per sequence. With Metaxa2 2.1, it is now possible to use Metaxa2 on longer sequences to detect multiple rRNA occurrences. To do this, use the new --mode option to change the operating mode to either ‘auto’ or ‘genome’. The auto mode will treat sequences longer than 2500 bp as “genome” sequences and look for multiple matches in these.
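The auto-mode rule stated above (sequences longer than 2500 bp are searched for multiple hits) can be expressed as a small decision function. The Python sketch below is hypothetical and only mirrors that rule; the helper names and the minimal FASTA parser are not part of Metaxa2.

```python
from typing import Dict, Optional

GENOME_THRESHOLD_BP = 2500  # auto mode: longer sequences count as "genome" sequences


def scan_for_multiple_hits(seq_length: int, mode: str) -> bool:
    """Return True if a sequence should be searched for several rRNA hits."""
    if mode == "genome":
        return True
    if mode == "auto":
        return seq_length > GENOME_THRESHOLD_BP
    return False  # any other mode: the traditional single-hit behaviour


def fasta_lengths(text: str) -> Dict[str, int]:
    """Minimal FASTA parser: sequence length per record identifier."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


records = ">read1\n" + "ACGT" * 40 + "\n>contig1\n" + "ACGT" * 1000 + "\n"
for name, length in fasta_lengths(records).items():
    print(name, length, scan_for_multiple_hits(length, "auto"))
```

In auto mode, a short read and a long contig in the same input file are therefore handled differently without any extra configuration.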

The reference mode
Another feature request addressed in the new Metaxa2 version is the ability to filter out certain sequences from the data set. For example, you may want to exclude all rRNA sequences derived from the host organism but keep the analysis of all other rRNA reads. This is now possible by supplying the --reference option with a FASTA file of the reference rRNA sequences to exclude.
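Metaxa2 performs this filtering with its own search against the supplied references. As a rough, self-contained illustration of the idea only, the Python sketch below swaps in a much simpler technique, k-mer containment: a read is dropped when at least half of its k-mers (k = 15) also occur in the reference sequences. The threshold, k, and function names are all assumptions.

```python
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_reference_like(reads: Dict[str, str], references: Iterable[str],
                          k: int = 15, max_shared: float = 0.5) -> Dict[str, str]:
    """Keep reads whose fraction of k-mers found in the references is below max_shared."""
    ref_kmers: Set[str] = set()
    for ref in references:
        ref_kmers |= kmers(ref, k)
    kept: Dict[str, str] = {}
    for read_id, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:  # shorter than k: too short to judge, so keep it
            kept[read_id] = seq
            continue
        shared = len(read_kmers & ref_kmers) / len(read_kmers)
        if shared < max_shared:
            kept[read_id] = seq
    return kept


host_rrna = ["ACGT" * 10]  # toy stand-in for a host reference sequence
reads = {"host_read": "ACGT" * 5, "other_read": "G" * 10 + "C" * 10 + "A" * 10, "tiny": "ACG"}
print(sorted(remove_reference_like(reads, host_rrna)))
```

A similarity search such as the one Metaxa2 uses tolerates sequencing errors and divergence far better than exact k-mer matching, so this sketch is only meant to show the shape of the filtering step.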

Experimental Usearch support
Finally, we have toyed around with support for Usearch (4) instead of BLAST (5) as the search algorithm for the classification step. However, this is far from fine-tuned, and it is included as an experimental feature that you use at your own risk! We recommend that you do not yet use it to classify data for publication. That said, we are interested in how it works for you, so if you like, you may try running the Usearch algorithm in parallel with your BLAST-based analysis, compare the results, and send me your feedback. You can read more about using Usearch at the end of the Metaxa2 manual.
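If you do run BLAST and Usearch side by side, one simple way to compare the results is to compute, rank by rank, how often the two runs agree for reads classified by both. The Python sketch below assumes, hypothetically, that each run has already been reduced to a mapping from read ID to a semicolon-separated taxonomy string; it does not parse Metaxa2 output files directly.

```python
from typing import Dict, List


def lineage(assignment: str) -> List[str]:
    """Split a semicolon-separated taxonomy string into ranks."""
    return [part.strip() for part in assignment.split(";") if part.strip()]


def agreement_by_rank(blast: Dict[str, str], usearch: Dict[str, str],
                      max_rank: int = 7) -> List[float]:
    """Fraction of reads (classified in both runs) whose lineages are identical
    after truncation to ranks 1..max_rank."""
    shared = sorted(set(blast) & set(usearch))
    if not shared:
        return []
    fractions = []
    for rank in range(1, max_rank + 1):
        same = sum(lineage(blast[r])[:rank] == lineage(usearch[r])[:rank] for r in shared)
        fractions.append(same / len(shared))
    return fractions


blast_run = {"r1": "Bacteria;Firmicutes;Bacilli", "r2": "Bacteria;Proteobacteria"}
usearch_run = {"r1": "Bacteria;Firmicutes;Clostridia", "r2": "Bacteria;Proteobacteria",
               "r3": "Archaea"}
print(agreement_by_rank(blast_run, usearch_run, max_rank=3))
```

Agreement typically drops towards the lower ranks, so a per-rank breakdown says more than a single overall percentage.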

References

  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
  2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
  4. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  5. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25, 3389–3402 (1997).

After almost a year in different stages of review and revision, during which the paper (but not the software) was totally transformed, I am happy to announce that the paper describing Metaxa2 has been accepted in Molecular Ecology Resources and is available in a rudimentary online early form. The figures in this version are not that pretty, but those who want to read the paper as soon as possible can do so.

This means that if you have been using Metaxa2 for a publication, there is now a new preferred way of citing it, namely:

Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399

Apart from describing the new Metaxa version, the paper (1) also presents a very thorough evaluation of the software, compared with other tools for taxonomic classification implemented in QIIME (2). In short, we show that:

  • Metaxa2 can make trustworthy taxonomic classifications even with reads as short as 100 bp
  • Generally, the performance is reliable across the entire SSU rRNA gene, regardless of which V-region a read is derived from
  • Metaxa2 can reliably recapture species composition from short-read metagenomic data, comparable with results of amplicon sequencing
  • Metaxa2 outperforms other popular tools such as Mothur (3), the RDP Classifier (4), Rtax (5) and the QIIME implementation of Uclust (6) in terms of the proportion of correctly classified reads from metagenomic data
  • The false positive rate of Metaxa2 is very close to zero, far superior to many of the above-mentioned tools, several of which assume that every read derives from an rRNA gene

Metaxa2 can be downloaded here. We have already used it internally for around two years, and it forms the basis of the taxonomic classifications in, e.g., our recently published paper on antibiotic resistance in a polluted Indian lake (7).

References

  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
  2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
  4. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73, 5261–5267 (2007).
  5. Soergel DAW, Dey N, Knight R, Brenner SE: Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. The ISME Journal, 6, 1440–1444 (2012).
  6. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  7. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014).

Metaxa2 update


An update to Metaxa2 that has long remained in internal testing has been deemed bug-free (as far as we can tell) and has been uploaded to the Metaxa2 web site. The update brings a slightly improved classifier and is the first release that we declare fully stable, although we have found no problems with the previously available version (release candidate 3). This also means that we jump directly from version 2.0 release candidate 3 to version 2.0.1, without a final 2.0 release. The update is available here.

Metaxa2 is here!


The new version of Metaxa, Metaxa2, which I first started talking about more than 1.5 years ago, has finally been deemed stable enough that we can officially release it! The release comes around the same time as we submitted a paper describing the changes in it, but I will briefly go through the changes here:

  • Metaxa2 now handles extraction and classification of LSU rRNA sequences in addition to SSU rRNA
  • The classification engine has been completely redesigned, and now enables accurate taxonomic classifications down to the genus or, in some cases, species level
  • The classification database has been updated, and is now based on the SILVA 111 release
  • The Metaxa2 Taxonomic Traversal Tool – metaxa2_ttt – has been added to the package, to ease the counting of rRNA sequences in different organism groups (at various taxonomic levels)
  • Metaxa2 adds support for paired-end libraries
  • It is now possible to input sequences in FASTQ format directly to Metaxa2
  • The support for libraries with short read lengths (~100 bp) has been vastly improved (and is now assumed to be the case for default settings)
  • Metaxa2 can do quality pre-filtering of reads in FASTQ-format
  • Metaxa2 adds support for the modern BLAST+ package (although the old blastall version is still default)
  • Compatibility with the HMMER 3.1 beta
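The counting that metaxa2_ttt eases, tallying reads per organism group at a chosen taxonomic level, can be illustrated with a short Python sketch. It is not metaxa2_ttt itself: the semicolon-separated lineage strings and the catch-all "Unclassified" bucket for reads not resolved to the requested level are assumptions.

```python
from collections import Counter
from typing import Iterable


def count_at_level(lineages: Iterable[str], level: int,
                   unclassified_label: str = "Unclassified") -> Counter:
    """Count reads per taxon after truncating each lineage to `level` ranks."""
    counts: Counter = Counter()
    for lineage in lineages:
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        if len(ranks) >= level:
            counts[";".join(ranks[:level])] += 1
        else:
            counts[unclassified_label] += 1  # not resolved to this level
    return counts


assignments = ["Bacteria;Firmicutes;Bacilli", "Bacteria;Firmicutes;Clostridia",
               "Bacteria;Proteobacteria", "Archaea"]
print(count_at_level(assignments, 1))
print(count_at_level(assignments, 2))
```

Running the same function at several levels gives the per-level tables that a downstream merge into an abundance matrix would start from.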

Metaxa2 brings together a large set of features that we have been gradually incorporating since 2011, many of which depend on each other. Most of the new features and changes are thoroughly explained in the manual. While we hope Metaxa2 is bug-free, there will likely be bugs caused by usage scenarios we have not envisioned. I therefore encourage anyone who comes across unexpected behavior to send me an e-mail. In particular, I would like to know how the software performs with HMMER 3.1 and BLAST+, where testing has been limited compared to older parts of the code.

We hope that you will find Metaxa2 useful, and that it will bring taxonomic assessment of metagenomes another step forward! Metaxa2 can be downloaded here.

As you might be aware, a new version of HMMER has been out since late May. You might wonder how Metaxa (which relies on HMMER3) will work if you update to the new version of HMMER, and I have finally got around to testing it! The answer, according to my somewhat limited testing, is that Metaxa 1.1.2 seems to work fine with HMMER 3.1.

You might need to go into the database directory (“metaxa_db”, which should be located in the same directory as the Metaxa binaries) and remove all files ending with the suffixes .h3f, .h3i, .h3m and .h3p inside the “HMMs” directory. On most installations, this should not be necessary; I just plugged HMMER 3.1 in and started Metaxa. However, if you get error messages complaining that “Error: bad format, binary auxfiles, .hmm: binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again”, then you should try removing the files and re-running Metaxa. This might especially be a problem with older Metaxa versions. [Update: Note that this fix will likely not work with ITSx!]
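If you prefer to script the clean-up, a minimal Python sketch is shown below. It assumes the layout described above (an HMMs directory inside the Metaxa database directory); the function name is hypothetical, and it only lists the files unless dry_run is set to False, so you can check what would be deleted first.

```python
from pathlib import Path
from typing import List

PRESSED_SUFFIXES = {".h3f", ".h3i", ".h3m", ".h3p"}  # binary files written by hmmpress


def remove_pressed_hmm_files(db_dir: str, dry_run: bool = True) -> List[Path]:
    """List (and, unless dry_run is False, delete) pressed HMM files in <db_dir>/HMMs."""
    hmm_dir = Path(db_dir) / "HMMs"
    targets = sorted(p for p in hmm_dir.iterdir()
                     if p.is_file() and p.suffix in PRESSED_SUFFIXES)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

For example, remove_pressed_hmm_files("metaxa_db") lists the candidate files, and remove_pressed_hmm_files("metaxa_db", dry_run=False) deletes them; the plain .hmm files are left untouched. The call raises FileNotFoundError if there is no HMMs directory at the given path.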

Bear in mind that I have not run thorough testing on Metaxa and HMMER 3.1, and probably won’t for the 1.1.2 version, since there’s a 2.0 version waiting just around the corner…

Additionally, if you experience problems with Megraft, you should try the same fix as for Metaxa, but with the Megraft database directory instead. Regarding ITSx, a minor update that also addresses HMMER 3.1b compatibility will be released very soon. [Update: See this post for how to work around HMMER 3.1 problems with ITSx.]

Happy barcoding everyone!