Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

Yesterday, Ecological Informatics put our paper describing Metaxa2 Diversity Tools online (1). Metaxa2 Diversity Tools was introduced with Metaxa2 version 2.1 and consists of

  • metaxa2_dc – a tool for collecting several .taxonomy.txt output files into one large abundance matrix, suitable for analysis in, e.g., R
  • metaxa2_rf – generates resampling rarefaction curves (2) based on the .taxonomy.txt output
  • metaxa2_si – species inference based on guessing species data from the other species present in the .taxonomy.txt output file
  • metaxa2_uc – a tool for determining if the community composition of a sample is significantly different from others through resampling analysis

At the same time as I did this update to the web site, I also took the opportunity to update the Metaxa2 FAQ to better reflect recent updates to the Metaxa2 software.

Metaxa2 Diversity Tools
One often requested feature of Metaxa2 (3) has been the ability to make simple analyses from the data after classification. The Metaxa2 Diversity Tools included in Metaxa2 2.1 is a seed for such an effort (although not close to a full-fledged community analysis package comparable to QIIME (4) or Mothur (5)). It currently consists of four tools.

The Metaxa2 Data Collector (metaxa2_dc) is the simplest of them (but probably the most requested), designed to merge the output of several *.level_X.txt files from the Metaxa2 Taxonomic Traversal Tool into one large abundance matrix, suitable for further analysis in, for example, R. The Metaxa2 Species Inference tool (metaxa2_si) can be used to further infer taxon information on, for example, the species level at a lower reliability than what would be permitted by the Metaxa2 classifier, using a complementary algorithm. The idea is that if only a single species is present in, e.g., a family, and a read is assigned to this family but not classified to the species level, that sequence will be inferred to belong to the same species as the other reads, given that it has more than 97% sequence identity to its best reference match. This can be useful if the user really needs species or genus classifications but many organisms in the studied species group have similar rRNA sequences, making it hard for the Metaxa2 classifier to classify sequences to the species level.
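To make the inference rule concrete, here is a minimal Python sketch of the idea. It is not the actual metaxa2_si code: the function name, the dictionary-based input layout and the choice of family as the grouping level are assumptions made for the example, and only the 97% identity cutoff comes from the description above.

```python
from collections import defaultdict


def infer_species(reads, min_identity=97.0):
    """Infer species for reads classified only to family level.

    `reads` is a hypothetical list of dicts with the keys 'family',
    'species' (None if unclassified at species level) and 'identity'
    (percent identity to the best reference match).
    """
    # Collect the species actually observed within each family.
    species_by_family = defaultdict(set)
    for read in reads:
        if read["species"] is not None:
            species_by_family[read["family"]].add(read["species"])

    inferred = []
    for read in reads:
        species = read["species"]
        if species is None:
            candidates = species_by_family.get(read["family"], set())
            # Infer only if exactly one species was seen in the family
            # and the read is similar enough to its best reference match.
            if len(candidates) == 1 and read["identity"] > min_identity:
                species = next(iter(candidates))
        inferred.append(species)
    return inferred


example_reads = [
    {"family": "A", "species": "A one", "identity": 99.0},
    {"family": "A", "species": None, "identity": 98.5},  # inferred
    {"family": "A", "species": None, "identity": 96.0},  # identity too low
    {"family": "B", "species": "B one", "identity": 99.0},
    {"family": "B", "species": "B two", "identity": 99.0},
    {"family": "B", "species": None, "identity": 99.5},  # ambiguous family
]
example_result = infer_species(example_reads)
```

Reads in a family with two or more observed species are left unclassified, since there is no single species to infer them to.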

The Metaxa2 Rarefaction analysis tool (metaxa2_rf) performs a resampling rarefaction analysis (2) based on the output from the Metaxa2 classifier, taking into account also the unclassified portion of rRNAs. The Metaxa2 Uniqueness of Community analyzer (metaxa2_uc), finally, allows analysis of whether the community composition of two or more samples or groups is significantly different. Using resampling of the community data, the null hypothesis that the taxonomic content of two communities is drawn from the same set of taxa (given certain abundances) is tested. All these tools are further described in the manual and the recent paper (1).
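The resampling idea behind a rarefaction curve can be sketched in a few lines of Python. This is only an illustration under my own assumptions (sampling without replacement, a fixed step size, a plain dict of counts as input) and not how metaxa2_rf is implemented; unclassified rRNAs could be included simply as one more category in the input.

```python
import random


def rarefaction_curve(counts, step=10, replicates=100, seed=0):
    """Return a list of (depth, mean number of taxa observed).

    `counts` is a hypothetical dict mapping taxon name to read count.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]

    # Sampling depths: every `step` reads, always ending at the full depth.
    depths = list(range(step, len(pool) + 1, step))
    if not depths or depths[-1] != len(pool):
        depths.append(len(pool))

    curve = []
    for depth in depths:
        total = 0
        for _ in range(replicates):
            # Draw `depth` reads without replacement and count distinct taxa.
            total += len(set(rng.sample(pool, depth)))
        curve.append((depth, total / replicates))
    return curve


example_counts = {"taxon_a": 50, "taxon_b": 30, "taxon_c": 20}
example_curve = rarefaction_curve(example_counts, step=25, replicates=20)
```

At the full sampling depth every replicate contains all reads, so the last point of the curve always equals the observed number of taxa.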

The latest version of Metaxa2, including the Metaxa2 Diversity Tools, can be downloaded here.


  1. Bengtsson-Palme J, Thorell K, Wurzbacher C, Sjöling Å, Nilsson RH: Metaxa2 Diversity Tools: Easing microbial community analysis with Metaxa2. Ecological Informatics, 33, 45–50 (2016). doi: 10.1016/j.ecoinf.2016.04.004 [Paper link]
  2. Gotelli NJ, Colwell RK: Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters, 4, 379–391 (2001). doi:10.1046/j.1461-0248.2001.00230.x
  3. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  4. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  5. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).

Metaxa2 has been updated again today to version 2.1.3. This update adds a few features to the Metaxa2 Diversity Tools (metaxa2_uc and metaxa2_rf). The core Metaxa2 programs remain the same as for the previous Metaxa2 versions. The new features were suggested as part of the review process of a Metaxa2-related manuscript, and we thank the anonymous reviewers for their great suggestions!

New features and bug fixes in this update:

  • Added the Chao1, iChao1 and ACE estimators in addition to the original species abundance (“Bengtsson-Palme”) model in metaxa2_rf
  • Added the Raup-Crick dissimilarity method to the metaxa2_uc tool
  • Added a warning message when data is highly skewed for metaxa2_uc
  • Improved robustness of the ‘model’ mode of metaxa2_uc for highly skewed sample groups
  • Fixed a bug causing miscalculation of Euclidean distances on binary data in metaxa2_uc
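For readers unfamiliar with the Chao1 estimator, the sketch below shows the commonly used bias-corrected form, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of taxa observed exactly once and exactly twice. Whether metaxa2_rf uses exactly this variant is an assumption on my part; the function name and input layout are made up for the example.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from a list of taxon counts."""
    abundances = [n for n in counts if n > 0]
    s_obs = len(abundances)                      # observed number of taxa
    f1 = sum(1 for n in abundances if n == 1)    # singletons
    f2 = sum(1 for n in abundances if n == 2)    # doubletons
    # The bias-corrected form stays defined even when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Seven observed taxa, three singletons and two doubletons.
example_estimate = chao1([10, 5, 1, 1, 1, 2, 2])
```

With no singletons the estimate is simply the observed richness, reflecting that nothing in the data suggests unseen taxa.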

The updated version of Metaxa2 can be downloaded here.

Happy barcoding!

After a long wait (1), Sara Lundström’s paper establishing minimal selective concentrations (MSCs) for the antibiotic tetracycline in complex microbial communities (2), of which I am a co-author, has gone online. Personally, I think this paper is among the finest work I have been involved in; a lot of good science has gone into this publication. Risk assessment and management of antibiotic pollution is in great need of scientific data to underpin mitigation efforts (3). This paper describes a method to determine the minimal selective concentrations of antibiotics, and investigates different endpoints for measuring those MSCs. The method involves a testing system highly relevant for aquatic communities, in which bacteria are allowed to form biofilms in aquaria under controlled antibiotic exposure. Using the system, we find that 1 μg/L tetracycline selects for the resistance genes tetA and tetG, while 10 μg/L tetracycline is required to detect changes in phenotypic resistance. In short, the different endpoints studied (and their corresponding MSCs) were:

  • CFU counts on R2A plates with 20 μg/mL tetracycline – MSC = 10 μg/L
  • MIC range – MSC ~ 10-100 μg/L
  • PICT, leucine uptake after short-term TC challenge – MSC ~ 100 μg/L
  • Increased resistance gene abundances, metagenomics – MSC range: 0.1-10 μg/L
  • Increased resistance gene abundances, qPCR (tetA and tetG) – MSC ≤ 1 μg/L
  • Changes to taxonomic diversity – no significant changes detected
  • Changes to taxonomic community composition – MSC ~ 1-10 μg/L

This study confirms that the estimated PNECs we reported recently (4) correspond well to experimentally determined MSCs, at least for tetracycline. Importantly, the selective concentrations we report for tetracycline overlap with those that have been reported in sewage treatment plants (5). We also see that tetracycline not only selects for tetracycline resistance genes, but also for resistance genes against other classes of antibiotics, including sulfonamides, beta-lactams and aminoglycosides. Finally, the approach we describe can be used for improved risk assessment of other antibiotics as well, and to refine the emission limits we suggested in a recent paper based on theoretical calculations (4).

References and notes

  1. Okay, seriously: how can a journal’s production team return the proofs for a paper within 24 hours of acceptance, and then wait literally five weeks before putting the final proofs online? Nothing against STOTEN, but I honestly wonder what was going on behind the scenes here.
  2. Lundström SV, Östman M, Bengtsson-Palme J, Rutgersson C, Thoudal M, Sircar T, Blanck H, Eriksson KM, Tysklind M, Flach C-F, Larsson DGJ: Minimal selective concentrations of tetracycline in complex aquatic bacterial biofilms. Science of the Total Environment, 553, 587–595 (2016). doi: 10.1016/j.scitotenv.2016.02.103 [Paper link]
  3. Ågerstrand M, Berg C, Björlenius B, Breitholtz M, Brunstrom B, Fick J, Gunnarsson L, Larsson DGJ, Sumpter JP, Tysklind M, Rudén C: Improving environmental risk assessment of human pharmaceuticals. Environmental Science and Technology (2015). doi:10.1021/acs.est.5b00302
  4. Bengtsson-Palme J, Larsson DGJ: Concentrations of antibiotics predicted to select for resistant bacteria: Proposed limits for environmental regulation. Environment International, 86, 140-149 (2016). doi: 10.1016/j.envint.2015.10.015
  5. Michael I, Rizzo L, McArdell CS, Manaia CM, Merlin C, Schwartz T, Dagot C, Fatta-Kassinos D: Urban wastewater treatment plants as hotspots for the release of antibiotics in the environment: a review. Water Research, 47, 957–995 (2013). doi:10.1016/j.watres.2012.11.027

I have been asked to give a short talk at the conference on metals, biocides and antibiotic resistance co-selection that I mentioned in February. My presentation will take place late Tuesday afternoon, and is entitled “Elucidating biocide and metal co-selection for antibiotic resistance in sewage treatment plants using metagenomics”. I hope to see you there!

The Royal Swedish Academy of Sciences (KVA) is, together with Joakim Larsson, arranging a conference on the mechanisms and evidence for the involvement of metals and biocides in selection of antibiotic resistant bacteria. Several experts from around Europe will attend and give talks, including for example Dan Andersson, Kristian Brandt, Teresa Coque, Will Gaze, Åsa Melhus and Chris Rensing. The symposium is open to everyone and is free of charge (although registration is binding).

The conference takes place on the 15th and 16th of March, at The Royal Swedish Academy of Sciences’ facilities at Lilla Frescativägen in Stockholm. And you should join us; register here!

TriMetAss has today been updated to version 1.2. The new version addresses a number of minor issues, some of which I thought were fixed in the previous version. The update can be found here.

The main problem with the previous version of TriMetAss was that the Trinity developers had changed many options in the Trinity software, which rendered more recent versions of Trinity incompatible with TriMetAss. TriMetAss was not the only external software using Trinity that was affected by these changes. As far as my testing goes, these incompatibilities should now be fixed through improved Trinity version detection in TriMetAss. This is still no guarantee against future changes though, so to be on the safe side, use one of the Trinity versions tested with TriMetAss (versions v2.1.1 or trinityrnaseq_r2013_08_14).

This time I would like to thank Artemis Louyakis at the University of Florida and Tatsuya Unno at the Jeju National University (Korea) for their input on TriMetAss.



Even for Sweden, this was a pleasant surprise to come to work to this morning.

My workplace, in the winter.

The view from my office.

I have today uploaded an updated version of Metaxa2 (version 2.1.2). This update primarily improves the memory performance of the Metaxa2 Diversity Tools. The core Metaxa2 programs remain the same as for the previous Metaxa2 versions.

New features and bug fixes in this update:

  • Dramatically improved memory performance of metaxa2_uc
  • Added the 'min' option to the -s flag in metaxa2_uc, which will cause the program to sample the number of entries present in the smallest sample from each sample
  • Fixed a bug that disregarded the level specified by the -l option in metaxa2_si
  • Minor updates and improvements to the manual
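To illustrate what sampling every sample down to the size of the smallest one means, here is a small Python sketch. It is an assumption-laden illustration, not the metaxa2_uc implementation: the function name and the dict-of-dicts input are hypothetical, and I draw without replacement here although the tool may do this differently.

```python
import random


def subsample_to_min(samples, seed=0):
    """Subsample every sample to the read count of the smallest sample.

    `samples` is a hypothetical dict: sample name -> {taxon: read count}.
    """
    rng = random.Random(seed)
    # The target depth is the total count of the smallest sample.
    depth = min(sum(counts.values()) for counts in samples.values())

    result = {}
    for name, counts in samples.items():
        # Expand counts into one entry per read, then draw `depth` of them.
        pool = [taxon for taxon, n in counts.items() for _ in range(n)]
        drawn = rng.sample(pool, depth)
        subsampled = {}
        for taxon in drawn:
            subsampled[taxon] = subsampled.get(taxon, 0) + 1
        result[name] = subsampled
    return result


example_samples = {
    "large": {"taxon_a": 5, "taxon_b": 5},
    "small": {"taxon_a": 2, "taxon_b": 1},
}
example_subsampled = subsample_to_min(example_samples)
```

The smallest sample is returned unchanged, while all larger samples are reduced to the same total number of entries.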

The updated version of Metaxa2 can be downloaded here.
Happy barcoding!

I have made my yearly updates to the web site (changing pictures and adding the yearly summary), and I just want to take the opportunity to wish all my visitors a happy 2016! My little family has been sick (at least one of us) during most of the holidays, so we have had a very calm Christmas and a very calm New Year’s. Hope you have had more fun!

A problem with annotating contigs from genomic and metagenomic projects is that there are few tools that allow the visualization of the annotated features, particularly if those features come from different sources. To alleviate this problem, I have (with assistance from Rickard Hammarén and Chandan Pal) over the last years developed a new annotation and read coverage visualization package – FARAO – which we today introduce to the public. FARAO has been used to produce the basis for the contig annotation figures in my paper on the polluted Indian lake. Storing and visualizing annotation and coverage information in FARAO has a number of advantages. FARAO is able to:

  • Integrate annotation and coverage information for the same sequence set, enabling coverage estimates of annotated features
  • Scale across millions of sequences and annotated features
  • Filter sequences, such that only entries with annotations satisfying certain given criteria will be outputted
  • Handle annotation and coverage data produced by a range of different bioinformatics tools
  • Handle custom parsers through a flexible interface, allowing for adaption of the software to virtually any bioinformatic tool
  • Produce high-quality EPS output
  • Integrate with MySQL databases

FARAO has today been moved from a private pre-release state to a public beta state. It is still possible that this version contains bugs that we have not discovered in our testing. If you find any unexpected behavior in this version of FARAO, please send me an e-mail to make us aware of the shortcomings of our software.