Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg



Even for Sweden, this was a pleasant surprise to come to work to this morning.

My workplace, in the winter.

The view from my office.

I have today uploaded an updated version of Metaxa2 (version 2.1.2). This update primarily improves the memory performance of the Metaxa2 Diversity Tools. The core Metaxa2 programs remain the same as for the previous Metaxa2 versions.

New features and bug fixes in this update:

  • Dramatically improved memory performance of metaxa2_uc
  • Added the 'min' option to the -s flag in metaxa2_uc, which makes the program draw, from each sample, as many entries as are present in the smallest sample
  • Fixed a bug that disregarded the level specified by the -l option in metaxa2_si
  • Minor updates and improvements to the manual
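
To illustrate what sampling every sample down to the size of the smallest one means, here is a minimal Python sketch. It is not the metaxa2_uc code: the function name, the data layout (a Counter of taxon counts per sample) and the example counts are assumptions made purely for illustration.

```python
import random
from collections import Counter


def subsample_to_smallest(samples, seed=0):
    """Subsample every sample to the size of the smallest one.

    `samples` maps a sample name to a Counter of taxon -> entry count.
    Returns the common depth and the subsampled counts.
    (Hypothetical helper, not part of Metaxa2.)
    """
    rng = random.Random(seed)
    # The smallest sample decides how many entries are drawn from each sample.
    depth = min(sum(counts.values()) for counts in samples.values())
    subsampled = {}
    for name, counts in samples.items():
        # Expand the counts to one item per entry and draw without replacement.
        entries = list(counts.elements())
        subsampled[name] = Counter(rng.sample(entries, depth))
    return depth, subsampled


# Invented example data: sample B is the smallest, with 40 entries.
samples = {
    "A": Counter({"Bacteroides": 50, "Escherichia": 30, "Vibrio": 20}),
    "B": Counter({"Bacteroides": 10, "Vibrio": 30}),
}
depth, subsampled = subsample_to_smallest(samples)
```

After this call every sample holds the same number of entries, so differences between samples are no longer driven by sequencing depth.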

The updated version of Metaxa2 can be downloaded here.
Happy barcoding!

I have made my yearly updates to the web site (changing pictures and adding the yearly summary), and I just want to take the opportunity to wish all my visitors a happy 2016! My little family has been sick (at least one of us) during most of the holidays, so we have had a very calm Christmas and a very calm New Year’s. Hope you have had more fun!

A problem with annotating contigs from genomic and metagenomic projects is that there are few tools that allow the visualization of the annotated features, particularly if those features come from different sources. To alleviate this problem, I have (with assistance from Rickard Hammarén and Chandan Pal) over the last few years developed a new annotation and read coverage visualization package – FARAO – which we today introduce to the public. FARAO has been used to produce the basis for the contig annotation figures in my paper on the polluted Indian lake. Storing and visualizing annotation and coverage information in FARAO has a number of advantages. FARAO is able to:

  • Integrate annotation and coverage information for the same sequence set, enabling coverage estimates of annotated features
  • Scale across millions of sequences and annotated features
  • Filter sequences, such that only entries with annotations satisfying certain given criteria are output
  • Handle annotation and coverage data produced by a range of different bioinformatics tools
  • Handle custom parsers through a flexible interface, allowing for adaptation of the software to virtually any bioinformatic tool
  • Produce high-quality EPS output
  • Integrate with MySQL databases

FARAO has today moved from a private pre-release state to a public beta state. It is still possible that this version contains bugs that we have not discovered in our testing. If you find any unexpected behavior in this version of FARAO, please send me an e-mail to make us aware of the shortcomings of our software.

Yesterday was an intensive day for typesetters apparently, since they put two of my papers online on the same day. This second paper was published in Environment International, and focuses on predicting minimal selective concentrations for all antibiotics present in the EUCAST database (1).

Today (well, up until yesterday at least), we have virtually no knowledge of which environmental concentrations can exert a selection pressure for antibiotic resistant bacteria. However, experimentally determining minimal selective concentrations (MSCs) in complex ecosystems would involve immense efforts if done for all antibiotics. Therefore, efforts to theoretically determine MSCs for different antibiotics have been suggested (2,3). In this paper we estimate upper boundaries for selective concentrations for all antibiotics in the EUCAST database, based on the assumption that selective concentrations a priori must be lower than those completely inhibiting growth.

Data on Minimal Inhibitory Concentrations (MICs) were obtained for 122 antibiotics and antibiotic combinations. For each of those, the lowest observed MIC across all tested species was identified, and to compensate for limited species coverage, we adjusted the lowest MICs for the number of tested species. We finally derived Predicted No Effect Concentrations (PNECs) for resistance selection using an assessment factor of 10 to account for the differences between MICs and MSCs. Since we found that the link between taxonomic similarity between species and lowest MIC was weak, we have not compensated for the taxonomic diversity that each antibiotic was tested against, only for the limited number of species tested.

In most cases, our PNECs for selection of resistance were below available PNECs for ecotoxicological effects retrieved from FASS. Also, concentrations predicted to be selective have, for some antibiotics, been detected in regular sewage treatment plants (4), and are greatly exceeded in environments subject to pharmaceutical pollution (5-7), often with drastic consequences in terms of resistance gene enrichments (8-10).

This is a central issue since, in principle, a transfer event of a novel resistance determinant from an environmental bacterium to an (opportunistic) human pathogen only needs to occur once to become a clinical problem (11). Once established, the gene could then spread through human activities, such as trade and travel (7,12). Importantly, this paper:

The paper is available under open access here. We hope, and believe, that the data will be of great use in environmental risk assessments, in efforts by industries, regulatory agencies or purchasers of medicines to define acceptable environmental emissions of antibiotics, in the implementation of environmental monitoring programs, for directing mitigations, and for prioritizing future studies on environmental antibiotic resistance.
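
The chain of steps described above (lowest MIC across species, an adjustment for the number of tested species, then an assessment factor of 10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not spell out how the species-count adjustment is computed, so it appears here as a hypothetical multiplier `species_adjustment` that defaults to no adjustment, and the MIC values are invented.

```python
def pnec_for_resistance_selection(mics_by_species,
                                  assessment_factor=10.0,
                                  species_adjustment=1.0):
    """Estimate a PNEC for resistance selection from observed MICs.

    `mics_by_species` maps a species name to its lowest observed MIC.
    `species_adjustment` is a HYPOTHETICAL multiplier standing in for the
    correction for limited species coverage; its real form is described
    in the paper, not here.
    """
    if not mics_by_species:
        raise ValueError("at least one MIC value is required")
    # Step 1: lowest observed MIC across all tested species.
    lowest_mic = min(mics_by_species.values())
    # Step 2: compensate for limited species coverage (placeholder).
    adjusted_mic = lowest_mic * species_adjustment
    # Step 3: the assessment factor accounts for the gap between MIC and MSC.
    return adjusted_mic / assessment_factor


# Invented MIC values in arbitrary concentration units.
mics = {"species_a": 0.5, "species_b": 0.125, "species_c": 2.0}
pnec = pnec_for_resistance_selection(mics)
```

With these invented numbers the lowest MIC is 0.125, so the resulting PNEC is one tenth of that value in the same units.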


  1. Bengtsson-Palme J, Larsson DGJ: Concentrations of antibiotics predicted to select for resistant bacteria: Proposed limits for environmental regulation. Environment International, 86, 140-149 (2016). doi: 10.1016/j.envint.2015.10.015 [Paper link]
  2. Ågerstrand M, Berg C, Björlenius B, Breitholtz M, Brunstrom B, Fick J, Gunnarsson L, Larsson DGJ, Sumpter JP, Tysklind M, Rudén C: Improving environmental risk assessment of human pharmaceuticals. Environmental Science and Technology (2015). doi:10.1021/acs.est.5b00302
  3. Tello A, Austin B, Telfer TC: Selective pressure of antibiotic pollution on bacteria of importance to public health. Environmental Health Perspectives, 120, 1100–1106 (2012). doi:10.1289/ehp.1104650
  4. Michael I, Rizzo L, McArdell CS, Manaia CM, Merlin C, Schwartz T, Dagot C, Fatta-Kassinos D: Urban wastewater treatment plants as hotspots for the release of antibiotics in the environment: a review. Water Research, 47, 957–995 (2013). doi:10.1016/j.watres.2012.11.027
  5. Larsson DGJ, de Pedro C, Paxeus N: Effluent from drug manufactures contains extremely high levels of pharmaceuticals. Journal of Hazardous Materials, 148, 751–755 (2007). doi:10.1016/j.jhazmat.2007.07.008
  6. Fick J, Söderström H, Lindberg RH, Phan C, Tysklind M, Larsson DGJ: Contamination of surface, ground, and drinking water from pharmaceutical production. Environmental Toxicology and Chemistry, 28, 2522–2527 (2009). doi:10.1897/09-073.1
  7. Larsson DGJ: Pollution from drug manufacturing: review and perspectives. Philosophical Transactions of the Royal Society London, Series B Biological Sciences, 369 (2014). doi:10.1098/rstb.2013.0571
  8. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, Volume 5, Issue 648 (2014). doi: 10.3389/fmicb.2014.00648 [Paper link]
  9. Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C, Weijdegård B, Söderström H, Larsson DGJ: Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements. PLoS ONE, Volume 6, e17038 (2011). doi:10.1371/journal.pone.0017038.
  10. Marathe NP, Regina VR, Walujkar SA, Charan SS, Moore ERB, Larsson DGJ, Shouche YS: A Treatment Plant Receiving Waste Water from Multiple Bulk Drug Manufacturers Is a Reservoir for Highly Multi-Drug Resistant Integron-Bearing Bacteria. PLoS ONE, Volume 8, e77310 (2013). doi:10.1371/journal.pone.0077310
  11. Bengtsson-Palme J, Larsson DGJ: Antibiotic resistance genes in the environment: prioritizing risks. Nature Reviews Microbiology, 13, 369 (2015). doi: 10.1038/nrmicro3399-c1 [Paper link]
  12. Bengtsson-Palme J, Angelin M, Huss M, Kjellqvist S, Kristiansson E, Palmgren H, Larsson DGJ, Johansson A: The human gut microbiome as a transporter of antibiotic resistance genes between continents. Antimicrobial Agents and Chemotherapy, 59, 10, 6551-6560 (2015). doi: 10.1128/AAC.00933-15 [Paper link]

Yesterday, a paper I co-authored with my colleagues Chandan Pal, Erik Kristiansson and Joakim Larsson on the co-occurrences of resistance genes against antibiotics, biocides and metals in bacterial genomes and plasmids was published in BMC Genomics. In this paper (1) we utilize the publicly available, fully sequenced genomes and plasmids in GenBank to investigate the co-occurrence network of resistance genes, to better understand risks for co-selection for resistance against different types of compounds. In short, the findings of the paper are that:

  • Antibiotic resistance genes (ARGs) are associated with bacteria carrying biocide and metal resistance genes (BMRGs), and the co-selection potential of biocides and metals is specific towards certain antibiotics
  • Clinically important genera host the largest numbers of ARGs and BMRGs and those also have the highest co-selection potential
  • Bacteria isolated from human and domestic animal origins have the highest co-selection potential
  • Plasmids with co-selection potential tend to be conjugative and carry toxin-antitoxin systems
  • Mercury and quaternary ammonium compounds (QACs) are potential co-selectors of ARGs on plasmids; however, BMRGs are common on chromosomes and could still have indirect co-selection potential
  • 14% of the bacteria and more than 70% of the plasmids completely lacked resistance genes

This analysis was possible thanks to the BacMet database of antibacterial biocide and metal resistance genes, published about two years ago (2). The visualization of the plasmid co-occurrence network we ended up with can be seen below. Note the strong connection between the mercury resistance mer operon and the antibiotic resistance genes to the right.

On a side note, it is interesting to note that the underrepresentation of detoxification systems in marine environments we noted last year (3) still seems to hold for genomes (and particularly plasmids), supporting the genome streamlining hypothesis (4).
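
For readers who wonder what a co-occurrence network boils down to, the Python sketch below counts how often pairs of resistance genes are found on the same plasmid. These pair counts are the raw material for network edges. It is not the pipeline used in the paper: the function name is hypothetical and the gene content of the four example plasmids is invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(genes_per_replicon):
    """Count how many replicons carry each pair of resistance genes.

    `genes_per_replicon` maps a genome or plasmid name to the list of
    resistance genes detected on it. (Hypothetical helper.)
    """
    pair_counts = Counter()
    for genes in genes_per_replicon.values():
        # Sort the unique genes so each pair has one canonical order.
        for gene_a, gene_b in combinations(sorted(set(genes)), 2):
            pair_counts[(gene_a, gene_b)] += 1
    return pair_counts


# Invented example: four plasmids and the resistance genes found on each.
plasmids = {
    "plasmid_1": ["merA", "sul1", "qacEdelta1"],
    "plasmid_2": ["merA", "sul1"],
    "plasmid_3": ["tetA"],
    "plasmid_4": [],
}
pairs = cooccurrence_counts(plasmids)
# Share of plasmids that carry no resistance genes at all.
fraction_without_genes = sum(1 for g in plasmids.values() if not g) / len(plasmids)
```

In a network drawing, each gene becomes a node and each pair with a non-zero count becomes an edge, with the count as a natural edge weight.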


  1. Pal C, Bengtsson-Palme J, Kristiansson E, Larsson DGJ: Co-occurrence of resistance genes to antibiotics, biocides and metals reveals novel insights into their co-selection potential. BMC Genomics, 16, 964 (2015). doi: 10.1186/s12864-015-2153-5 [Paper link]
  2. Pal C, Bengtsson-Palme J, Rensing C, Kristiansson E, Larsson DGJ: BacMet: Antibacterial Biocide and Metal Resistance Genes Database. Nucleic Acids Research, 42, D1, D737-D743 (2014). doi: 10.1093/nar/gkt1252 [Paper link]
  3. Bengtsson-Palme J, Alm Rosenblad M, Molin M, Blomberg A: Metagenomics reveals that detoxification systems are underrepresented in marine bacterial communities. BMC Genomics, 15, 749 (2014). doi: 10.1186/1471-2164-15-749 [Paper link]
  4. Giovannoni SJ, Cameron TJ, Temperton B: Implications of streamlining theory for microbial ecology. ISME Journal, 8, 1553-1565 (2014).

I got a very nice little e-mail yesterday evening, which made me realize that when I posted the Metaxa 2.1 update, I forgot to thank and credit the wonderful Metaxa/Metaxa2 community, who have contributed input on which Metaxa2 features they would like to see implemented. Particularly, I would like to thank Thomas Haverkamp who suggested the reference option, Åsa Sjöling who brainstormed what led to the metaxa2_uc tool with me, and everyone who has suggested various downstream analysis tricks that have been baked into the Metaxa2 Diversity Tools.

Within the Metaxa team I would like to specifically thank Kaisa Thorell (particularly for the --split_pairs option) and Martin Hartmann (who said that the software should obviously be able to detect which BLAST version was installed), who keep pushing for features and ideas to make the software better. Thanks a lot to all of you, and have a nice weekend!

I am very happy to announce that our paper on the metagenomes of periphyton communities (1) has been accepted in Frontiers in Microbiology (Aquatic Microbiology section). This project has been one of my longest running, as it started as my master's thesis in 2010 and has gone through several metamorphoses before reaching its final form.

Briefly, our main findings are that:

  1. Periphyton communities harbor an extraordinary diversity of organisms, including viruses, bacteria, algae, fungi, protozoans and metazoans
  2. Bacteria are by far the most abundant of these groups
  3. Functional indicators of the biofilm form of life in periphyton include genes coding for enzymes that catalyze the production and degradation of extracellular polymeric substances
  4. Genes encoding enzymes that participate in anaerobic pathways are found in the biofilms, suggesting that anaerobic or low-oxygen micro-zones exist within them

Most of this work has been carried out by my colleague Kemal Sanli, who has been doing a wonderful job pulling this together, with the help of Henrik Nilsson and Martin Eriksson. It also deserves to be noted that this work was the starting point for the Metaxa software (2,3), which recently reached version 2.1.1.


  1. Sanli K, Bengtsson-Palme J, Nilsson RH, Kristiansson E, Alm Rosenblad M, Blanck H, Eriksson KM: Metagenomic sequencing of marine periphyton: Taxonomic and functional insights into biofilm communities. Frontiers in Microbiology, 6, 1192 (2015). doi: 10.3389/fmicb.2015.01192 [Paper link]
  2. Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet G, Abarenkov K, Petri A, Alm Rosenblad M, Nilsson RH: Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100, 3, 471-475 (2011). doi:10.1007/s10482-011-9598-6. [Paper link]
  3. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]

Today I have released Metaxa2 version 2.1.1, containing a fix to an embarrassing bug in the new metaxa2_uc program (part of the Metaxa2 Diversity Tools). A late change of the names of the different modes of that tool had not propagated to all parts of the code, and therefore only the “model” mode was functional in the previous version. No other changes to the Metaxa2 package have been made in this update, which can be downloaded here.

I am very happy to announce that Metaxa2 version 2.1 has been released today. This new version brings a lot of important improvements to the Metaxa2 software (1), in particular by the introduction of the Metaxa2 Diversity Tools. This is the list of new features (further elaboration follows below):

  • The Metaxa2 Diversity Tools:
    • metaxa2_dc – a tool for collecting several .taxonomy.txt output files into one large abundance matrix, suitable for analysis in, e.g., R
    • metaxa2_rf – generates rarefaction curves based on the .taxonomy.txt output
    • metaxa2_si – species inference based on guessing species data from the other species present in the .taxonomy.txt output file
    • metaxa2_uc – a tool for determining if the community composition of a sample is significantly different from others through resampling analysis
  • Added a new detection mode for detection of multiple rRNA in the same sequence, e.g. a genome
  • Added the --reference option to improve the use of Metaxa2 as a tool to sort out host rRNA sequences from a dataset
  • Added the --split_pairs option causing Metaxa2 to output paired-end sequences into two separate files, which is nice for further analysis of rRNA reads
  • The default setting for the --align option has been changed to ‘none’
  • Automatic detection of which BLAST package is installed
  • Fixed a bug causing the last read of paired-end FASTA input to be ignored
  • Fixed an occasionally occurring BLAST+ related warning message
  • Fixed a bug that could cause the classifier to crash on highly divergent BLAST matches

The new version of Metaxa2 can be downloaded here, and for those interested I will spend the rest of this post outlining the new features.

Metaxa2 Diversity Tools
One often requested feature of Metaxa2 is the ability to make simple further analyses of the data after classification. The Metaxa2 Diversity Tools included in Metaxa2 2.1 are a seed for such an effort (although not close to a full-fledged community analysis package such as QIIME (2) or Mothur (3)). The set currently consists of four tools.

The Metaxa2 Data Collector (metaxa2_dc) is the simplest of them (but probably the most requested), designed to merge the output of several *.level_X.txt files from the Metaxa2 Taxonomic Traversal Tool into one large abundance matrix, suitable for further analysis in, for example, R. The Metaxa2 Species Inference tool (metaxa2_si) can be used to further infer taxon information on, for example, the species level at a lower reliability than what would be permitted by the Metaxa2 classifier, using a complementary algorithm. The idea is that if only a single species is present in, e.g., a family and a read is assigned to this family, but not classified to the species level, that sequence will be inferred to belong to the same species as the other reads, given that it has more than 97% sequence identity to its best reference match. This can be useful if the user really needs species or genus classifications but many organisms in the studied species group have similar rRNA sequences, making it hard for the Metaxa2 classifier to classify sequences to the species level.
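
The species inference rule described above can be written down as a short Python sketch. This is my paraphrase of the idea for illustration, not the metaxa2_si source: the read records, their field names and the helper name `infer_species` are all assumptions, and only the 97% identity cutoff comes from the description.

```python
def infer_species(reads, min_identity=97.0):
    """Assign a species to unclassified reads when the family is unambiguous.

    Each read is a dict with the keys "id", "family", "species" (None if
    not classified to species level) and "identity" (percent identity to
    the best reference match). (Hypothetical data layout.)
    """
    # Collect the species that were actually observed within each family.
    species_in_family = {}
    for read in reads:
        if read["species"] is not None:
            species_in_family.setdefault(read["family"], set()).add(read["species"])

    inferred = []
    for read in reads:
        species = read["species"]
        candidates = species_in_family.get(read["family"], set())
        # Infer only if exactly one species was seen in the family and the
        # read is more than 97% identical to its best reference match.
        if species is None and len(candidates) == 1 and read["identity"] > min_identity:
            species = next(iter(candidates))
        inferred.append({**read, "species": species})
    return inferred


# Invented example reads.
reads = [
    {"id": "r1", "family": "Vibrionaceae", "species": "Vibrio cholerae", "identity": 99.5},
    {"id": "r2", "family": "Vibrionaceae", "species": None, "identity": 98.0},
    {"id": "r3", "family": "Vibrionaceae", "species": None, "identity": 95.0},
    {"id": "r4", "family": "Enterobacteriaceae", "species": "Escherichia coli", "identity": 99.0},
    {"id": "r5", "family": "Enterobacteriaceae", "species": "Salmonella enterica", "identity": 99.0},
    {"id": "r6", "family": "Enterobacteriaceae", "species": None, "identity": 99.0},
]
result = {read["id"]: read["species"] for read in infer_species(reads)}
```

In this example only r2 gets a species assigned: r3 falls below the identity cutoff, and r6 belongs to a family in which two different species were observed.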

The Metaxa2 Rarefaction analysis tool (metaxa2_rf) performs a rarefaction analysis based on the output from the Metaxa2 classifier, taking into account also the unclassified portion of rRNAs. The Metaxa2 Uniqueness of Community analyzer (metaxa2_uc), finally, allows analysis of whether the community composition of two or more samples or groups is significantly different. Using resampling of the community data, the null hypothesis that the taxonomic content of two communities is drawn from the same set of taxa (given certain abundances) is tested. All these tools are further described in the manual.
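
To give a feeling for what a resampling test of community composition looks like, here is a generic permutation test in Python. Note that this is a textbook-style stand-in and not the algorithm implemented in metaxa2_uc (see the manual for that): the use of Bray-Curtis dissimilarity, the function names and the example counts are my own assumptions.

```python
import random
from collections import Counter


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two Counters of taxon counts."""
    taxa = set(a) | set(b)
    shared = sum(min(a[t], b[t]) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


def permutation_test(sample_a, sample_b, n_permutations=999, seed=0):
    """Test whether two samples could be drawn from the same community.

    Entries from both samples are pooled and randomly re-split into two
    groups of the original sizes; the p-value is the share of splits that
    are at least as dissimilar as the observed samples.
    """
    rng = random.Random(seed)
    observed = bray_curtis(sample_a, sample_b)
    pooled = list(sample_a.elements()) + list(sample_b.elements())
    size_a = sum(sample_a.values())
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a = Counter(pooled[:size_a])
        perm_b = Counter(pooled[size_a:])
        if bray_curtis(perm_a, perm_b) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Invented example: two samples dominated by different taxa.
sample_a = Counter({"taxon_x": 90, "taxon_y": 10})
sample_b = Counter({"taxon_x": 10, "taxon_y": 90})
observed, p_value = permutation_test(sample_a, sample_b)
```

A small p-value means that random re-splits of the pooled data rarely look as different as the two real samples, which argues against the null hypothesis of a shared community.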

The genome mode
Metaxa2 has long been said not to be useful for predicting rRNA in longer sequences, such as full genomes or chromosomes, since it has traditionally only looked for a single rRNA hit. With Metaxa2 2.1, it is now possible to use Metaxa2 on longer sequences to detect multiple rRNA occurrences. To do this, you need to change the operating mode using the new --mode option to either ‘auto’ or ‘genome’. The auto mode will treat sequences longer than 2500 bp as “genome” sequences and look for multiple matches in these.
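
The decision made by the auto mode can be summarized in a few lines of Python. This is only a sketch of the rule as stated above, not Metaxa2 code: the function name and the label "single-hit" for the traditional behavior are illustrative assumptions.

```python
def detection_strategy(sequence_length, mode="auto", genome_threshold=2500):
    """Decide how a sequence is searched for rRNA.

    Returns "genome" (look for multiple rRNA matches) or "single-hit"
    (an illustrative label for the traditional single-match behavior).
    """
    if mode == "genome":
        return "genome"
    if mode == "auto":
        # Sequences longer than 2500 bp are treated as genome sequences.
        return "genome" if sequence_length > genome_threshold else "single-hit"
    return "single-hit"


short_read = detection_strategy(250)
long_contig = detection_strategy(48000)
```

Here `short_read` ends up with the single-match strategy, while `long_contig` is searched for multiple rRNA occurrences.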

The reference mode
Another feature request that has been addressed in the new Metaxa2 version is the ability to filter out certain sequences from the data set. For example, you may want to exclude all rRNA sequences that are derived from the host organism, but keep the analysis of all other rRNA reads. This is now possible by supplying a FASTA file of the reference rRNA sequences to exclude to the --reference option.

Experimental Usearch support
Finally, we have toyed around with support for Usearch (4) instead of BLAST (5) as the search algorithm for the classification step. However, this is far from fine-tuned and it is included as an experimental feature that you may use at your own risk! We recommend that you do not use it for classification of data for publication yet. However, we are interested in how this works for you, so if you like, you can run the Usearch algorithm in parallel with your BLAST-based analysis, compare the results, and send me your input on how it works. You can read more about using Usearch at the end of the Metaxa2 manual.


  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
  4. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  5. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25, 3389–3402 (1997).