Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery

Metaxa2 has been updated again today to version 2.1.3. This update adds a few features to the Metaxa2 Diversity Tools (metaxa2_uc and metaxa2_rf). The core Metaxa2 programs remain the same as for the previous Metaxa2 versions. The new features were suggested as part of the review process of a Metaxa2-related manuscript, and we thank the anonymous reviewers for their great suggestions!

New features and bug fixes in this update:

  • Added the Chao1, iChao1 and ACE estimators in addition to the original species abundance (“Bengtsson-Palme”) model in metaxa2_rf
  • Added the Raup-Crick dissimilarity method to the metaxa2_uc tool
  • Added a warning message when data is highly skewed for metaxa2_uc
  • Improved robustness of the ‘model’ mode of metaxa2_uc for highly skewed sample groups
  • Fixed a bug causing miscalculation of Euclidean distances on binary data in metaxa2_uc
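
For readers unfamiliar with richness estimators, here is a minimal sketch of the textbook bias-corrected Chao1 formula, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa seen exactly once and exactly twice. This is only an illustration in Python, not code taken from metaxa2_rf, and the function name and example counts are made up for the purpose.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                      # taxa actually observed
    f1 = sum(1 for c in observed if c == 1)    # singletons
    f2 = sum(1 for c in observed if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical counts: 7 observed taxa, 3 singletons, 2 doubletons
example_counts = [10, 4, 2, 2, 1, 1, 1, 0]
print(chao1(example_counts))  # 7 + 3*2 / (2*3) = 8.0
```

The more singletons there are relative to doubletons, the more unseen taxa the estimator assumes; with no singletons it simply returns the observed richness.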

The updated version of Metaxa2 can be downloaded here.

Happy barcoding!

After a long wait (1), Sara Lundström’s paper establishing minimal selective concentrations (MSCs) for the antibiotic tetracycline in complex microbial communities (2), of which I am a co-author, has gone online. Personally, I think this paper is among the finest work I have been involved in; a lot of good science has gone into this publication. Risk assessment and management of antibiotic pollution is in great need of scientific data to underpin mitigation efforts (3). This paper describes a method to determine the minimal selective concentrations of antibiotics, and investigates different endpoints for measuring those MSCs. The method involves a testing system highly relevant for aquatic communities, in which bacteria are allowed to form biofilms in aquaria under controlled antibiotic exposure. Using the system, we find that 1 μg/L tetracycline selects for the resistance genes tetA and tetG, while 10 μg/L tetracycline is required to detect changes in phenotypic resistance. In short, the different endpoints studied (and their corresponding MSCs) were:

  • CFU counts on R2A plates with 20 μg/mL tetracycline – MSC = 10 μg/L
  • MIC range – MSC ~ 10-100 μg/L
  • PICT, leucine uptake after short-term TC challenge – MSC ~ 100 μg/L
  • Increased resistance gene abundances, metagenomics – MSC range: 0.1-10 μg/L
  • Increased resistance gene abundances, qPCR (tetA and tetG) – MSC ≤ 1 μg/L
  • Changes to taxonomic diversity – no significant changes detected
  • Changes to taxonomic community composition – MSC ~ 1-10 μg/L

This study confirms that the estimated PNECs we reported recently (4) correspond well to experimentally determined MSCs, at least for tetracycline. Importantly, the selective concentrations we report for tetracycline overlap with those that have been reported in sewage treatment plants (5). We also see that tetracycline selects not only for tetracycline resistance genes, but also for resistance genes against other classes of antibiotics, including sulfonamides, beta-lactams and aminoglycosides. Finally, the approach we describe can be used to improve risk assessment for tetracycline as well as other antibiotics, and to refine the emission limits we suggested in a recent paper based on theoretical calculations (4).

References and notes

  1. Okay, seriously: how can a journal’s production team return the proofs for a paper within 24 hours of acceptance, and then wait literally five weeks before putting the final proofs online? Nothing against STOTEN, but I honestly wonder what was going on behind the scenes here.
  2. Lundström SV, Östman M, Bengtsson-Palme J, Rutgersson C, Thoudal M, Sircar T, Blanck H, Eriksson KM, Tysklind M, Flach C-F, Larsson DGJ: Minimal selective concentrations of tetracycline in complex aquatic bacterial biofilms. Science of the Total Environment, 553, 587–595 (2016). doi: 10.1016/j.scitotenv.2016.02.103 [Paper link]
  3. Ågerstrand M, Berg C, Björlenius B, Breitholtz M, Brunstrom B, Fick J, Gunnarsson L, Larsson DGJ, Sumpter JP, Tysklind M, Rudén C: Improving environmental risk assessment of human pharmaceuticals. Environmental Science and Technology (2015). doi:10.1021/acs.est.5b00302
  4. Bengtsson-Palme J, Larsson DGJ: Concentrations of antibiotics predicted to select for resistant bacteria: Proposed limits for environmental regulation. Environment International, 86, 140-149 (2016). doi: 10.1016/j.envint.2015.10.015
  5. Michael I, Rizzo L, McArdell CS, Manaia CM, Merlin C, Schwartz T, Dagot C, Fatta-Kassinos D: Urban wastewater treatment plants as hotspots for the release of antibiotics in the environment: a review. Water Research, 47, 957–995 (2013). doi:10.1016/j.watres.2012.11.027

I have today uploaded an updated version of Metaxa2 (version 2.1.2). This update primarily improves the memory performance of the Metaxa2 Diversity Tools. The core Metaxa2 programs remain the same as for the previous Metaxa2 versions.

New features and bug fixes in this update:

  • Dramatically improved memory performance of metaxa2_uc
  • Added the 'min' option to the -s flag in metaxa2_uc, which causes the program to draw from every sample the same number of entries as are present in the smallest sample
  • Fixed a bug that disregarded the level specified by the -l option in metaxa2_si
  • Minor updates and improvements to the manual
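
To make the idea behind the 'min' setting concrete, here is a small Python sketch of subsampling every sample down to the size of the smallest one. It is an illustration only: the function name and the toy data are hypothetical, and I am assuming sampling without replacement here, which may differ from what metaxa2_uc does internally.

```python
import random


def subsample_to_min(samples, seed=0):
    """Draw, from each sample, as many entries as the smallest sample holds.

    `samples` maps a sample name to a list of taxon labels (one per entry).
    Sampling is done without replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    depth = min(len(entries) for entries in samples.values())
    return {name: rng.sample(entries, depth) for name, entries in samples.items()}


# Hypothetical toy data: sample A has 10 entries, sample B has 5
samples = {
    "A": ["Bacillus"] * 6 + ["Vibrio"] * 4,
    "B": ["Bacillus"] * 3 + ["Vibrio"] * 2,
}
even = subsample_to_min(samples)
print({name: len(entries) for name, entries in even.items()})  # {'A': 5, 'B': 5}
```

Equalizing the depth this way keeps a deeply sequenced sample from looking more diverse simply because it contains more reads.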

The updated version of Metaxa2 can be downloaded here.
Happy barcoding!

I am very happy to announce that our paper on the metagenomes of periphyton communities (1) has been accepted in Frontiers in Microbiology (Aquatic Microbiology section). This project has been one of my longest running, as it started as my master’s thesis in 2010 and has gone through several metamorphoses before reaching its final form.

Briefly, our main findings are that:

  1. Periphyton communities harbor an extraordinary diversity of organisms, including viruses, bacteria, algae, fungi, protozoans and metazoans
  2. Bacteria are by far the most abundant
  3. Functional indicators of the biofilm form of life in periphyton involve genes coding for enzymes that catalyze the production and degradation of extracellular polymeric substances
  4. Genes encoding enzymes that participate in anaerobic pathways are found in the biofilms, suggesting that anaerobic or low-oxygen micro-zones exist within the biofilms

Most of this work has been carried out by my colleague Kemal Sanli, who has done a wonderful job pulling this together, with the help of Henrik Nilsson and Martin Eriksson. It also deserves to be noted that this work was the starting point for the Metaxa software (2,3), which recently reached version 2.1.1.


  1. Sanli K, Bengtsson-Palme J, Nilsson RH, Kristiansson E, Alm Rosenblad M, Blanck H, Eriksson KM: Metagenomic sequencing of marine periphyton: Taxonomic and functional insights into biofilm communities. Frontiers in Microbiology, 6, 1192 (2015). doi: 10.3389/fmicb.2015.01192 [Paper link]
  2. Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet G, Abarenkov K, Petri A, Alm Rosenblad M, Nilsson RH: Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100, 3, 471-475 (2011). doi:10.1007/s10482-011-9598-6. [Paper link]
  3. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]

Today I have released Metaxa2 version 2.1.1, containing a fix for an embarrassing bug in the new metaxa2_uc program (part of the Metaxa2 Diversity Tools). A late change to the names of the different modes of that tool had not propagated to all parts of the code, and therefore only the “model” mode was functional in the previous version. No other changes to the Metaxa2 package have been made in this update, which can be downloaded here.

I am very happy to announce that Metaxa2 version 2.1 has been released today. This new version brings a lot of important improvements to the Metaxa2 software (1), in particular by the introduction of the Metaxa2 Diversity Tools. This is the list of new features (further elaboration follows below):

  • The Metaxa2 Diversity Tools:
    • metaxa2_dc – a tool for collecting several .taxonomy.txt output files into one large abundance matrix, suitable for analysis in, e.g., R
    • metaxa2_rf – generates rarefaction curves based on the .taxonomy.txt output
    • metaxa2_si – species inference based on guessing species data from the other species present in the .taxonomy.txt output file
    • metaxa2_uc – a tool for determining if the community composition of a sample is significantly different from others through resampling analysis
  • Added a new detection mode for detection of multiple rRNA in the same sequence, e.g. a genome
  • Added the --reference option to improve the use of Metaxa2 as a tool to sort out host rRNA sequences from a dataset
  • Added the --split_pairs option causing Metaxa2 to output paired-end sequences into two separate files, which is nice for further analysis of rRNA reads
  • The default setting for the --align option has been changed to ‘none’
  • Automatic detection of which BLAST package is installed
  • Fixed a bug causing the last read of paired-end FASTA input to be ignored
  • Fixed an occasionally occurring BLAST+ related warning message
  • Fixed a bug that could cause the classifier to crash on highly divergent BLAST matches

The new version of Metaxa2 can be downloaded here, and for those interested I will spend the rest of this post outlining the new features.

Metaxa2 Diversity Tools
One often requested feature of Metaxa2 is the ability to perform simple further analyses of the data after classification. The Metaxa2 Diversity Tools included in Metaxa2 2.1 are a seed for such an effort (although not close to a full-fledged community analysis package such as QIIME (2) or Mothur (3)). The set currently consists of four tools.

The Metaxa2 Data Collector (metaxa2_dc) is the simplest of them (but probably the most requested), designed to merge the output of several *.level_X.txt files from the Metaxa2 Taxonomic Traversal Tool into one large abundance matrix, suitable for further analysis in, for example, R. The Metaxa2 Species Inference tool (metaxa2_si) can be used to further infer taxon information on, for example, the species level at a lower reliability than what would be permitted by the Metaxa2 classifier, using a complementary algorithm. The idea is that if only a single species is present in, e.g., a family, and a read is assigned to this family but not classified to the species level, that read will be inferred to belong to the same species as the other reads, given that it has more than 97% sequence identity to its best reference match. This can be useful if the user really needs species or genus classifications but many organisms in the studied species group have similar rRNA sequences, making it hard for the Metaxa2 classifier to classify sequences to the species level.
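
The inference rule described above can be sketched in a few lines of Python. Note that this is my simplified reading of the rule for illustration, not the actual metaxa2_si code; the record layout (dictionaries with 'family', 'species' and 'identity' keys) and the function name are hypothetical.

```python
from collections import defaultdict


def infer_species(reads, min_identity=97.0):
    """Assign a species to unclassified reads when their family holds only one.

    Each read is a dict with 'family', 'species' (None if unclassified) and
    'identity' (percent identity to the best reference match).
    """
    # Collect the species actually observed within each family
    species_by_family = defaultdict(set)
    for read in reads:
        if read["species"] is not None:
            species_by_family[read["family"]].add(read["species"])

    inferred = []
    for read in reads:
        species = read["species"]
        candidates = species_by_family.get(read["family"], set())
        # Infer only if exactly one species is present and identity exceeds 97%
        if species is None and len(candidates) == 1 and read["identity"] > min_identity:
            species = next(iter(candidates))
        inferred.append({**read, "species": species})
    return inferred


# Hypothetical example reads
reads = [
    {"family": "Vibrionaceae", "species": "Vibrio cholerae", "identity": 99.5},
    {"family": "Vibrionaceae", "species": None, "identity": 98.2},
    {"family": "Vibrionaceae", "species": None, "identity": 95.0},
    {"family": "Bacillaceae", "species": "Bacillus subtilis", "identity": 99.0},
    {"family": "Bacillaceae", "species": "Bacillus cereus", "identity": 99.1},
    {"family": "Bacillaceae", "species": None, "identity": 99.9},
]
for read in infer_species(reads):
    print(read["family"], read["species"])
```

In this toy example only the second read gets a species assigned: the third falls below the identity cutoff, and the last one belongs to a family with two candidate species, so the inference would be ambiguous.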

The Metaxa2 Rarefaction analysis tool (metaxa2_rf) performs a rarefaction analysis based on the output from the Metaxa2 classifier, also taking into account the unclassified portion of rRNAs. Finally, the Metaxa2 Uniqueness of Community analyzer (metaxa2_uc) allows analysis of whether the community composition of two or more samples or groups is significantly different. Using resampling of the community data, it tests the null hypothesis that the taxonomic content of two communities is drawn from the same set of taxa (given certain abundances). All these tools are further described in the manual.
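
For those who want a feeling for what a resampling test of this kind looks like, below is a generic permutation test in Python. To be clear, this is not the procedure implemented in metaxa2_uc (see the manual for that); it is a plain textbook-style sketch in which I have chosen Bray-Curtis dissimilarity as the statistic, and all names and the toy data are hypothetical.

```python
import random
from collections import Counter


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two lists of taxon labels."""
    ca, cb = Counter(a), Counter(b)
    shared = sum(min(ca[t], cb[t]) for t in ca.keys() | cb.keys())
    return 1 - 2 * shared / (len(a) + len(b))


def resampling_p_value(a, b, n_iter=999, seed=0):
    """Permutation p-value for H0: both samples come from the same community."""
    rng = random.Random(seed)
    observed = bray_curtis(a, b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign reads to the two samples at random
        if bray_curtis(pooled[:len(a)], pooled[len(a):]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the p-value can never be exactly zero
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Hypothetical toy communities with opposite dominance patterns
sample_a = ["x"] * 40 + ["y"] * 10
sample_b = ["x"] * 10 + ["y"] * 40
print(resampling_p_value(sample_a, sample_b))
print(resampling_p_value(sample_a, sample_a))
```

A small p-value means that randomly reshuffled reads rarely produce two samples as different as the observed ones, which speaks against the two communities being drawn from the same set of taxa.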

The genome mode
Metaxa2 has long been said not to be useful for predicting rRNA in longer sequences, such as full genomes or chromosomes, since it has traditionally only looked for a single rRNA hit. With Metaxa2 2.1, it is now possible to use Metaxa2 on longer sequences to detect multiple rRNA occurrences. To do this, you need to change the operating mode using the new --mode option to either ‘auto’ or ‘genome’. The auto mode will treat sequences longer than 2500 bp as “genome” sequences and look for multiple matches in these.
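
The decision the auto mode makes can be summarized in a few lines of Python. This is only a sketch of the rule as described above, not Metaxa2's own code; the function name is hypothetical, and I am assuming that any mode other than ‘auto’ and ‘genome’ means the traditional single-hit behaviour.

```python
GENOME_LENGTH_THRESHOLD = 2500  # bp, the cutoff described for the 'auto' mode


def search_strategy(sequence, mode="auto"):
    """Return 'multiple' if several rRNA hits should be looked for, else 'single'."""
    if mode == "genome":
        return "multiple"
    if mode == "auto" and len(sequence) > GENOME_LENGTH_THRESHOLD:
        return "multiple"
    # Assumption: every other mode keeps the single-hit behaviour
    return "single"


print(search_strategy("A" * 300))              # single
print(search_strategy("A" * 5000))             # multiple
print(search_strategy("A" * 300, "genome"))    # multiple
```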

The reference mode
Another feature request that has been addressed in the new Metaxa2 version is the ability to filter out certain sequences from the data set. For example, you may want to exclude all rRNA sequences that are derived from the host organism, but keep the analysis of all other rRNA reads. This is now possible by supplying a FASTA file of reference rRNA sequences to exclude to the --reference option.

Experimental Usearch support
Finally, we have toyed around with support for Usearch (4) instead of BLAST (5) as the search algorithm for the classification step. However, this is far from fine-tuned and it is included as an experimental feature that you may use at your own risk! We recommend that you not use it for classification of data for publication yet. That said, we are interested in how this works for you, so if you like, you can try running the Usearch algorithm in parallel with your BLAST-based analysis, compare the results, and send me your input on how it works. You can read more about using Usearch at the end of the Metaxa2 manual.


  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
  4. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  5. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25, 3389–3402 (1997).

A couple of days ago, a paper I have co-authored describing an ITS sequence dataset for chimera control in fungi went online as an advance online publication in Microbes and Environments. There are several software tools available for chimera detection (e.g. Henrik Nilsson’s fungal chimera checker (1) and UCHIME (2)), but these generally rely on the presence of a chimera-free reference dataset. Until now, no such dataset has been available for the fungal ITS region, and in this paper (3) we introduce a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database (4). This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. We estimated the dataset performance on a large set of artificial chimeras to be above 99.5%, and also used the dataset to remove nearly 1,000 chimeric fungal ITS sequences from the UNITE database. The dataset can be downloaded from the UNITE repository. Thereby, it is also possible for users to curate the dataset in the future through the UNITE interactive editing tools.


  1. Nilsson RH, Abarenkov K, Veldre V, Nylinder S, Wit P de, Brosché S, Alfredsson JF, Ryberg M, Kristiansson E: An open source chimera checker for the fungal ITS region. Molecular Ecology Resources, 10, 1076–1081 (2010).
  2. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27, 16, 2194-2200 (2011). doi:10.1093/bioinformatics/btr381
  3. Nilsson RH, Tedersoo L, Ryberg M, Kristiansson E, Hartmann M, Unterseher M, Porter TM, Bengtsson-Palme J, Walker D, de Sousa F, Gamper HA, Larsson E, Larsson K-H, Kõljalg U, Edgar R, Abarenkov K: A comprehensive, automatically updated fungal ITS sequence dataset for reference-based chimera control in environmental sequencing efforts. Microbes and Environments, Advance Online Publication (2015). doi: 10.1264/jsme2.ME14121
  4. Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Molecular Ecology, 22, 21, 5271–5277 (2013). doi: 10.1111/mec.12481

My colleague Henrik Nilsson has been interviewed by the ResearchGate news team about the recent effort to better annotate ITS data for plant pathogenic fungi. It’s an interesting read, and I think Henrik nicely underscores why large-scale efforts for improving and correcting sequence annotations are important. You can read the interview here, and the paper they talk about is referenced below.

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity, Volume 67, Issue 1 (2014), 11–19. doi: 10.1007/s13225-014-0291-8 [Paper link]

Another paper I have co-authored related to the UNITE database for fungal rDNA ITS sequences is now published as an Online Early article in Fungal Diversity. The paper describes an effort to improve the annotation of ITS sequences from fungal plant pathogens. Why is this important? Well, apart from fungal plant pathogens being responsible for great economic losses in agriculture, the paper is also conceptually important as it shows that together we can accomplish a substantial improvement to the metadata in sequence databases. In this work we have hunted down high-quality reference sequences for various plant pathogenic fungi, and re-annotated incorrectly or insufficiently annotated ITS sequences from the same fungal lineages. In total, the 59 authors have made 31,954 changes to UNITE database data, on average roughly 540 changes per author. While one person, or a few, could not feasibly have made this effort alone, this work shows that in larger consortia vast improvements can be made to the quality of databases, by distributing the work among many scientists. In many ways, this relates to proposals to “wikify” GenBank, and after Rfam and Pfam it might now be time to take the user-contribution model to, at least, the RefSeq portion of GenBank, which despite its description as being “comprehensive, integrated, non-redundant, [and] well-annotated” still contains errors and examples of non-usable annotation. More on that at a later point…

Paper reference:

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity Online early (2014). doi: 10.1007/s13225-014-0291-8 [Paper link]

I was informed by a colleague that today is Taxonomist Appreciation Day! This is a very important day; quoting from the original post:

We need active work on taxonomy and systematics if our work is going to progress, and if we are to apply our findings. Without taxonomists, entire fields wouldn’t exist. We’d be working in darkness. (…) Taxonomists and systematists often work in obscurity, and some of the most painstaking projects come to fruition after long years with only a small dose of the recognition that is required.

So, send your favorite taxonomist(s) some love today, and remember they are the foundation for much of what we bioinformaticians do!