Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery

I am very happy to announce that a first public beta version of Metaxa2 version 2.2 has been released today! This new version brings two big and a number of small improvements to the Metaxa2 software (1). The first major addition is the introduction of the Metaxa2 Database Builder, which allows the user to create custom databases for virtually any genetic barcoding region. The second addition, which is related to the first, is that the classifier has been rewritten to have a more solid mathematical foundation. I have been promising that these updates were coming “soon” for one and a half years, but finally the end-product is good enough to see some real world testing. Bear in mind though that this is still a beta version that could contain obscure bugs. Here follows a list of new features (with further elaboration on a few below):

  • The Metaxa2 Database Builder
  • Support for additional barcoding genes, virtually any genetic region can now be used for taxonomic classification in Metaxa2
  • The Metaxa2 database repository, which can be accessed through the new metaxa2_install_database tool
  • Improved classification scoring model for better clarity and sensitivity
  • A bundled COI database for arthropods, showing off the capabilities of the database builder
  • Support for compressed input files (gzip, zip, bzip, dsrc)
  • Support for auto-detection of database locations
  • Added output of probable taxonomic origin for sequences with reliability scores at each rank, made possible by the updated classifier
  • Added the -x option for running only the extraction without the classification step
  • Improved memory handling for very large rRNA datasets in the classifier (millions of sequences)
  • This update also fixes a bug in the metaxa2_rf tool that could cause bias in very skewed datasets with small numbers of taxa

The new version of Metaxa2 can be downloaded here, and for those interested I will spend the rest of this post outlining the Metaxa2 Database Builder. The information below is also available in a slightly extended version in the software manual.

The major enhancement in Metaxa2 version 2.2 is the ability to use custom databases for classification. This means that the user can now make their own database for their own barcoding region of choice, or download additional databases from the Metaxa2 Database Repository. Other databases are selected through the existing “-g” option in Metaxa2. As part of these changes, we have also updated the classification scoring model for better stringency and sensitivity across multiple databases and different genes. The old scoring system can still be used by setting the --scoring_model option to “old”.

The Metaxa2 Database Builder has two main operating modes, as well as a hybrid mode combining features of both. The divergent and conserved modes work in almost completely different ways and deal with two different types of barcoding regions. The divergent mode is designed to deal with barcoding regions that exhibit fairly large variation between taxa within the same taxonomic domain. Such regions include, e.g., the eukaryotic ITS region, or the trnL gene used for plant barcoding. In the other mode – the conserved mode – a highly conserved barcoding region is expected (at least within the different taxonomic domains). Genes that fall into this category would be, e.g., the 16S SSU rRNA, and the bacterial rpoB gene. This option would most likely also be suitable for barcoding within certain groups of e.g. plants, where similarity of the barcoding regions can be expected to be high. The third mode – the hybrid mode – is more experimental in nature, but could be useful in situations where both of the other modes perform worse than desired.

In the divergent (default) mode, the database builder will start by clustering the input sequences at 20% identity using USEARCH (2). All clusters generated from this process are then individually aligned using MAFFT (3). Those alignments are split into two regions, which are used to build two hidden Markov models for each cluster of sequences. These models will be less precise, but more sensitive than those generated in the conserved mode. In the divergent mode, the database builder will attempt to extract full-length sequences from the input data, but this may be less successful than in the conserved mode.

In the conserved mode, on the other hand, the database builder will first extract the barcoding region from the input sequences using models built from a reference sequence provided (see above) and the Metaxa2 extractor (1). It will then align all the extracted sequences using MAFFT and determine the conservation of each position in the alignment. When the criteria for degree of conservation are met, all conserved regions are extracted individually and are then re-aligned separately using MAFFT. The re-aligned sequences are used to build hidden Markov models representing the conserved regions with HMMER (4). In this mode, the classification database will only consist of the extracted full-length sequences.
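
To make the conservation step more concrete, here is a minimal Python sketch of how per-position conservation could be scored in an alignment and how runs of conserved columns could be picked out. This is not the code used in the Metaxa2 Database Builder: the scoring rule (fraction of sequences sharing the most common residue), the threshold, the minimum region length and both function names are assumptions made purely for illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column as the fraction of all sequences that
    share the most common residue. Gaps ('-') count against conservation.
    (Illustrative scoring rule, not the one used by Metaxa2.)"""
    n_seqs = len(alignment)
    scores = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores


def conserved_regions(scores, threshold=0.8, min_length=3):
    """Return half-open (start, end) intervals for runs of at least
    min_length columns scoring >= threshold (hypothetical criteria)."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy alignment (made-up sequences of equal length)
alignment = [
    "ACGTACGT--AC",
    "ACGTTCGTA-AC",
    "ACGTACGAATAC",
    "ACGTGCGT--AC",
]
scores = column_conservation(alignment)
regions = conserved_regions(scores, threshold=0.9, min_length=3)
print(regions)
```

In the real workflow, each such region would then be cut out, re-aligned and turned into a hidden Markov model, as described above.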

In the hybrid mode, finally, the database builder will cluster the input sequences at 20% identity using USEARCH, and then proceed with the conserved mode approach on each cluster separately.

The actual taxonomic classification in Metaxa2 is done using a sequence database. It was shown in the original Metaxa2 paper that replacing the built-in database with a generic non-processed one was detrimental to performance in terms of accuracy (1). In the database builder, we have tried to incorporate some of the aspects of the manual database curation we did for the built-in database that can be automated. By default, all these filtration steps are turned off, but enabling them might drastically increase the accuracy of classifications based on the database.

To assess the accuracy of the constructed database, the Metaxa2 Database Builder allows for testing the detection ability and classification accuracy of the constructed database. This is done by sub-dividing the database sequences into subsets and rebuilding the database using a smaller (by default 90%), randomly selected, set of the sequence data (5). The remaining sequences (10% by default) are then classified using Metaxa2 with the subset database. The number of detections, and the numbers of correctly or incorrectly classified entries are recorded and averaged over a number of iterations (10 by default). This gives a picture of the lower end of the accuracy of the database. However, since the evaluation only uses a subset of all sequences included in the full database, the performance of the full database actually constructed is likely to be slightly better. The evaluation can be turned on using the “--evaluate T” option.
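
The evaluation procedure can be sketched in a few lines of Python. This only illustrates the repeated random hold-out idea described above; it does not call Metaxa2. The function names and the toy classifier (which matches sequences on their first character) are hypothetical stand-ins, and only the 90%/10% split and the 10 iterations are taken from the defaults mentioned in the text.

```python
import random


def evaluate_database(entries, classify, train_fraction=0.9, iterations=10, seed=0):
    """Repeated random hold-out evaluation.

    entries:  list of (sequence, true_taxon) pairs
    classify: function(train_entries, sequence) -> predicted taxon, or None
              if the sequence is not detected (hypothetical interface)
    Returns the per-iteration averages of detected, correct and incorrect.
    """
    rng = random.Random(seed)
    totals = {"detected": 0, "correct": 0, "incorrect": 0}
    for _ in range(iterations):
        shuffled = entries[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train, held_out = shuffled[:cut], shuffled[cut:]
        for sequence, true_taxon in held_out:
            predicted = classify(train, sequence)
            if predicted is None:
                continue  # not detected with the subset database
            totals["detected"] += 1
            if predicted == true_taxon:
                totals["correct"] += 1
            else:
                totals["incorrect"] += 1
    return {key: value / iterations for key, value in totals.items()}


def toy_classify(train, sequence):
    """Toy stand-in for a classifier: return the taxon of the first
    training sequence that starts with the same character."""
    for train_sequence, taxon in train:
        if train_sequence[0] == sequence[0]:
            return taxon
    return None


# Made-up data: 10 sequences from each of two taxa
entries = [("AAAA" + str(i), "Taxon_A") for i in range(10)]
entries += [("CCCC" + str(i), "Taxon_C") for i in range(10)]

result = evaluate_database(entries, toy_classify)
print(result)
```

With 20 entries, each iteration holds out two sequences, so the averages are counts per iteration rather than proportions.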

Metaxa2 2.2 also introduces the database repository, from which the user can download additional databases for Metaxa2. To download new databases from the repository, the metaxa2_install_database command is used. This is a simple piece of software but requires internet access to function.

References

  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  2. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  3. Katoh K, Standley DM: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30, 772–780 (2013).
  4. Eddy SR: Accelerated profile HMM searches. PLoS Computational Biology, 7, e1002195 (2011).
  5. Richardson RT, Bengtsson-Palme J, Johnson RM: Evaluating and Optimizing the Performance of Software Commonly Used for the Taxonomic Classification of DNA Sequence Data. Molecular Ecology Resources, 17, 4, 760–769 (2017). doi: 10.1111/1755-0998.12628

Yesterday, BMC Microbiology published a paper which I have co-authored with Joakim Forsell and his colleagues at Umeå University. The paper (1) investigates the prevalence and subtype composition of Blastocystis – a eukaryotic microbe commonly present in the human intestine – among the 35 Swedish university students that we investigated for antibiotic resistance before and after travel to the Indian peninsula or Central Africa using shotgun metagenomics, in a study published in 2015 (2). In this paper, we used the same metagenomic data, but this time to assess the impact of travel on Blastocystis carriage and to understand the associations between Blastocystis and the bacterial gut microbiota. We found that 46% of the students carried Blastocystis before travel and 43% after. The two most commonly identified Blastocystis subtypes were ST3 and ST4, accounting for 20 of the 31 samples positive for Blastocystis. Interestingly, we detected no mixed subtype carriage in any individual, and all ten individuals with a typable subtype before and after travel maintained their initial subtype.

Furthermore, we found that the composition of the gut bacterial community was not significantly altered between Blastocystis-carriers and non-carriers. Curiously, Blastocystis was accompanied by higher abundances of the bacterial genera Sporolactobacillus and Candidatus Carsonella. As previously observed (3), Blastocystis carriage was positively associated with higher bacterial genus richness, and negatively correlated to the Bacteroides-driven enterotype. We, however, took this observation further, and could show that these associations were both largely driven by ST4 – a subtype commonly described in Europe – while the globally prevalent ST3 did not show such significant relationships.

The persistence of Blastocystis subtypes before and after travel indicates that long-term carriage of Blastocystis is common. The associations between Blastocystis and the bacterial microbiota found in this study could imply a link between Blastocystis and a healthy microbiota, as well as with diets high in vegetables. However, we cannot answer whether the associations between Blastocystis and the microbiota are resulting from the presence of Blastocystis per se, or are a prerequisite for colonization with Blastocystis, which are interesting opportunities for follow-up studies.

I think this type of data reuse for completely different questions is highly useful, and I am very happy that Joakim Forsell and his colleagues contacted me to hear if it was possible to do a Blastocystis screen of this data. The full paper can be read here.

References

  1. Forsell J, Bengtsson-Palme J, Angelin M, Johansson A, Evengård B, Granlund M: The relation between Blastocystis and the intestinal microbiota in Swedish travellers. BMC Microbiology, 17, 231 (2017). doi: 10.1186/s12866-017-1139-7 [Paper link]
  2. Bengtsson-Palme J, Angelin M, Huss M, Kjellqvist S, Kristiansson E, Palmgren H, Larsson DGJ, Johansson A: The human gut microbiome as a transporter of antibiotic resistance genes between continents. Antimicrobial Agents and Chemotherapy, 59, 10, 6551–6560 (2015). doi: 10.1128/AAC.00933-15 [Paper link]
  3. Andersen LO, Bonde I, Nielsen HB, Stensvold CR: A retrospective metagenomics approach to studying Blastocystis. FEMS Microbiology Ecology, 91, fiv072 (2015). doi: 10.1093/femsec/fiv072 [Paper link]

Today, Microbiome put online a paper lead-authored by my colleague Fanny Berglund – one of Erik Kristiansson’s brilliant PhD students – in which we identify 76 novel metallo-β-lactamases (1). This feat was made possible because of a new computational method designed by Fanny, which uses a hidden Markov model based on known B1 metallo-β-lactamases. We analyzed over 10,000 bacterial genomes and plasmids and over 5 terabases of metagenomic data and could thereby predict 76 novel genes. These genes clustered into 59 new families of metallo-β-lactamases (given a 70% identity threshold). We also verified the functionality of 21 of these genes experimentally, and found that 18 were able to hydrolyze imipenem when inserted into Escherichia coli. Two of the novel genes contained atypical zinc-binding motifs in their active sites. Finally, we show that the B1 metallo-β-lactamases can be divided into five major groups based on their phylogenetic origin. It seems that nearly all of the previously characterized mobile B1 β-lactamases we identify in this study were likely to have originated from chromosomal genes present in species within the Proteobacteria, particularly Shewanella spp.

This study more than doubles the number of known B1 metallo-β-lactamases. As with the study by Boulund et al. (2) which we published last month on computational discovery of novel fluoroquinolone resistance genes (which used a very similar approach but on a completely different type of genes), this study also supports the hypothesis that environmental bacterial communities act as sources of uncharacterized antibiotic resistance genes (3-7). Fanny has done a fantastic job on this paper, and I highly recommend reading it in its entirety (it’s open access so you have virtually no excuse not to). It can be found here.

References

  1. Berglund F, Marathe NP, Österlund T, Bengtsson-Palme J, Kotsakis S, Flach C-F, Larsson DGJ, Kristiansson E: Identification of 76 novel B1 metallo-β-lactamases through large-scale screening of genomic and metagenomic data. Microbiome, 5, 134 (2017). doi: 10.1186/s40168-017-0353-8
  2. Boulund F, Berglund F, Flach C-F, Bengtsson-Palme J, Marathe NP, Larsson DGJ, Kristiansson E: Computational discovery and functional validation of novel fluoroquinolone resistance genes in public metagenomic data sets. BMC Genomics, 18, 682 (2017). doi: 10.1186/s12864-017-4064-0
  3. Bengtsson-Palme J, Larsson DGJ: Antibiotic resistance genes in the environment: prioritizing risks. Nature Reviews Microbiology, 13, 369 (2015). doi: 10.1038/nrmicro3399-c1
  4. Allen HK, Donato J, Wang HH et al.: Call of the wild: antibiotic resistance genes in natural environments. Nature Reviews Microbiology, 8, 251–259 (2010).
  5. Berendonk TU, Manaia CM, Merlin C et al.: Tackling antibiotic resistance: the environmental framework. Nature Reviews Microbiology, 13, 310–317 (2015).
  6. Martinez JL: Bottlenecks in the transferability of antibiotic resistance from natural ecosystems to human bacterial pathogens. Frontiers in Microbiology, 2, 265 (2011).
  7. Finley RL, Collignon P, Larsson DGJ et al.: The scourge of antibiotic resistance: the important role of the environment. Clinical Infectious Diseases, 57, 704–710 (2013).

Mitochondrial DNA Part B today published a mitochondrial genome announcement paper (1) for which I was involved in doing the assemblies and annotating them. The paper describes the mitogenome of Calanus glacialis, a marine planktonic copepod, which is a keystone species in the Arctic Ocean. The mitogenome is 20,674 bp long, and includes 13 protein-coding genes, 2 rRNA genes and 22 tRNA genes. While this is of course not a huge paper, we believe that this new resource will be of interest in understanding the structure and dynamics of C. glacialis populations. The main work in this paper has been carried out by Marvin Choquet at Nord University in Bodø, Norway. So hats off to him for great work, thanks Marvin! The paper can be read here.

Reference

  1. Choquet M, Alves Monteiro HJ, Bengtsson-Palme J, Hoarau G: The complete mitochondrial genome of the copepod Calanus glacialis. Mitochondrial DNA Part B, 2, 2, 506–507 (2017). doi: 10.1080/23802359.2017.1361357 [Paper link]

Today, a review paper which I wrote together with Joakim Larsson and Erik Kristiansson was published in Journal of Antimicrobial Chemotherapy (1). We have for a long time used metagenomic DNA sequencing to study antibiotic resistance in different environments (2-6), including in the human microbiota (7). Generally, our ultimate purpose has been to assess the risks to human health associated with resistance genes in the environment. However, a multitude of methods exist for metagenomic data analysis, and over the years we have learned that not all methods are suitable for the investigation of resistance genes for this purpose. In our review paper, we describe and discuss current methods for sequence handling, mapping to databases of resistance genes, statistical analysis and metagenomic assembly. We also provide an overview of important considerations related to the analysis of resistance genes, and end by recommending some of the currently used tools, databases and methods that are best equipped to inform research and clinical practice related to antibiotic resistance (see the figure from the paper below). We hope that the paper will be useful to researchers and clinicians interested in using metagenomic sequencing to better understand the resistance genes present in environmental and human-associated microbial communities.

References

  1. Bengtsson-Palme J, Larsson DGJ, Kristiansson E: Using metagenomics to investigate human and environmental resistomes. Journal of Antimicrobial Chemotherapy, advance access (2017). doi: 10.1093/jac/dkx199 [Paper link]
  2. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014). doi: 10.3389/fmicb.2014.00648 [Paper link]
  3. Lundström S, Östman M, Bengtsson-Palme J, Rutgersson C, Thoudal M, Sircar T, Blanck H, Eriksson KM, Tysklind M, Flach C-F, Larsson DGJ: Minimal selective concentrations of tetracycline in complex aquatic bacterial biofilms. Science of the Total Environment, 553, 587–595 (2016). doi: 10.1016/j.scitotenv.2016.02.103 [Paper link]
  4. Bengtsson-Palme J, Hammarén R, Pal C, Östman M, Björlenius B, Flach C-F, Kristiansson E, Fick J, Tysklind M, Larsson DGJ: Elucidating selection processes for antibiotic resistance in sewage treatment plants using metagenomics. Science of the Total Environment, 572, 697–712 (2016). doi: 10.1016/j.scitotenv.2016.06.228 [Paper link]
  5. Pal C, Bengtsson-Palme J, Kristiansson E, Larsson DGJ: The structure and diversity of human, animal and environmental resistomes. Microbiome, 4, 54 (2016). doi: 10.1186/s40168-016-0199-5 [Paper link]
  6. Flach C-F, Pal C, Svensson CJ, Kristiansson E, Östman M, Bengtsson-Palme J, Tysklind M, Larsson DGJ: Does antifouling paint select for antibiotic resistance? Science of the Total Environment, 590–591, 461–468 (2017). doi: 10.1016/j.scitotenv.2017.01.213 [Paper link]
  7. Bengtsson-Palme J, Angelin M, Huss M, Kjellqvist S, Kristiansson E, Palmgren H, Larsson DGJ, Johansson A: The human gut microbiome as a transporter of antibiotic resistance genes between continents. Antimicrobial Agents and Chemotherapy, 59, 10, 6551–6560 (2015). doi: 10.1128/AAC.00933-15 [Paper link]

In March, I attended a workshop on the role of NGS technologies in the coordinated action plan against antimicrobial resistance, organised by JRC in Italy. I was, together with 14 other experts, invited to discuss where and how sequencing can be used to investigate and manage antibiotic resistance. The report from the workshop has just recently been published, and is available here. There will be follow-up activities on this workshop, which I also hope that I will be able to participate in, since this is an important and very interesting pet topic of mine.

Reference

  • Angers A, Petrillo P, Patak A, Querci M, Van den Eede G: The Role and Implementation of Next-Generation Sequencing Technologies in the Coordinated Action Plan against Antimicrobial Resistance. JRC Conference and Workshop Report, EUR 28619 (2017). doi: 10.2760/745099 [Link]

Sorry for the late notice, but if you have half an hour to spare later today I will discuss our findings on resistance genes in Beijing air on a webinar organised by Healthcare Without Harm on “The (un)recognised pathways of AMR: Air pollution and food”. Tune in a few minutes before 16.00 CEST!

    After the usual (1,2) long wait between acceptance and publication, Science of the Total Environment today put a paper online in which I have played a role in the bioinformatic analysis. In the paper, we investigate whether antifouling paint containing copper and zinc could co-select for antibiotic resistance, using microbiological methods and metagenomic sequencing (3).

    In this work, we have studied marine microbial biofilms allowed to grow on surfaces painted with antifouling paint submerged in sea water. Such antifouling paints often contain metals that potentially could co-select for antibiotic resistance (4). Using microbiological culturing, we found that the heavy-metal based paint co-selected for bacteria resistant to tetracycline. However, the paint enriched neither the total abundance of known mobile antibiotic resistance genes nor the abundance of tetracycline resistance genes in the biofilm communities. Rather, the communities from the painted surfaces were enriched for bacteria with genetic profiles suggesting increased capacity for extrusion of antibiotics via RND efflux systems. In addition, these communities were also enriched for genes involved in mobilization of DNA, such as ISCR transposases and integrases. Finally, the biofilm communities from painted surfaces displayed lower taxonomic diversity and were at the same time enriched for Gammaproteobacteria. The paper builds on our previous work in which we identify certain co-occurrences between genes conferring metal and antibiotic resistance (4). However, the findings of this paper do not support the idea that mobile resistance genes are co-selected for by copper and zinc in the marine environment – rather, the increase in antibiotic resistance seems to be due to taxonomic changes and cross-resistance mechanisms. The entire paper can be read here.

    References

    1. Bengtsson-Palme J: Published paper: Community MSCs for tetracycline. http://microbiology.se/2016/03/22/published-paper-community-mscs-for-tetracycline/
    2. Bengtsson-Palme J: Published paper: Antibiotic resistance in sewage treatment plants. http://microbiology.se/2016/08/17/published-paper-antibiotic-resistance-in-sewage-treatment-plants/
    3. Flach C-F, Pal C, Svensson CJ, Kristiansson E, Östman M, Bengtsson-Palme J, Tysklind M, Larsson DGJ: Does antifouling paint select for antibiotic resistance? Science of the Total Environment, in press (2017). doi: 10.1016/j.scitotenv.2017.01.213 [Paper link]
    4. Pal C, Bengtsson-Palme J, Kristiansson E, Larsson DGJ: Co-occurrence of resistance genes to antibiotics, biocides and metals reveals novel insights into their co-selection potential. BMC Genomics, 16, 964 (2015). doi: 10.1186/s12864-015-2153-5 [Paper link]

    As the 8th Next Generation Sequencing Congress in London is drawing to a close as I write this, I have a few reflections that might warrant sharing. The first thing that has been apparent this year compared to the two previous times I have visited the event (in 2012 and 2013) is that there was very little talk about where Illumina sequencing is heading next. Instead the discussion was about the applications of Illumina sequencing in the clinical setting; so apparently this is now so mainstream that we only expect slow progress towards longer reads. Apart from that, Illumina is a completed, mature technology. Instead, the spotlight is now pointing entirely towards long-read sequencing (PacBio, NanoPore) as the next big thing. However, the excitement around these technologies has also sort of faded compared to 2013, when they were soon-to-arrive. Indeed, it seems like there’s not much to be excited about in the sequencing field at the moment, or at least Oxford Global (who are hosting the conference) has failed to get these technologies here.

    What also strikes me is the vast amount of talk about RNAseq of cancer cells. The scope of this event has narrowed dramatically in the past three years, which makes me substantially less interested in returning next year. If there is not much to be excited about, and the focus is only on cancer sequencing – despite the human microbiota being a very hot topic at the moment – what is the reason for non-cancer researchers to come to the event? There will need to be a stark shift in the direction of this event if the arrangers want it to remain a broad NGS event. Otherwise, they may just as well go all in and rename the event the Next Generation Sequencing of Cancer Congress. But I hope they choose to widen the scope again; conferences discussing technology as a foundation for a variety of applications are important meeting points and spawning grounds for novel ideas.

    Yesterday, Molecular Ecology Resources put online an unedited version of a recent paper which I co-authored. This time, Rodney Richardson at Ohio State University has done tremendous work evaluating three taxonomic classification software tools – the RDP Naïve Bayesian Classifier, RTAX and UTAX – on a set of DNA barcoding regions commonly used for plants, namely the ITS2, and the matK, rbcL, trnL and trnH genes.

    In the paper (1), we discuss the results, merits and limitations of the classifiers. In brief, we found that:

    • There is a considerable trade-off between accuracy and sensitivity for the classifiers tested, which indicates a need for improved sequence classification tools (2)
    • UTAX was superior with respect to error rate, but was exceedingly stringent and thus suffered from a low assignment rate
    • The RDP Naïve Bayesian Classifier displayed high sensitivity and low error at the family and order levels, but had a genus-level error rate of 9.6 percent
    • RTAX showed high sensitivity at all taxonomic ranks, but at the same time consistently produced the highest error rates
    • The choice of locus has significant effects on the classification sensitivity of all tested tools
    • All classifiers showed strong relationships between database completeness, classification sensitivity and classification accuracy
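
To illustrate the trade-off in the first point above, here is a small Python sketch that computes sensitivity and error rate at different confidence cutoffs. The definitions used (sensitivity as the fraction of sequences that receive an assignment, error rate as the fraction of assigned sequences that are wrong) are assumptions for this sketch and may differ from those in the paper, and the data are made up.

```python
def sensitivity_and_error(predictions, min_confidence):
    """predictions: list of (true_taxon, predicted_taxon, confidence).
    A sequence counts as assigned if its confidence >= min_confidence.
    Returns (sensitivity, error_rate) under the assumed definitions."""
    assigned = [(true, pred) for true, pred, conf in predictions
                if conf >= min_confidence]
    sensitivity = len(assigned) / len(predictions)
    errors = sum(1 for true, pred in assigned if true != pred)
    error_rate = errors / len(assigned) if assigned else 0.0
    return sensitivity, error_rate


# Made-up classifier output: (true genus, predicted genus, confidence)
predictions = [
    ("Rosa", "Rosa", 0.99),
    ("Rosa", "Rosa", 0.95),
    ("Acer", "Acer", 0.90),
    ("Acer", "Salix", 0.70),
    ("Salix", "Salix", 0.65),
    ("Salix", "Acer", 0.55),
    ("Quercus", "Quercus", 0.80),
    ("Quercus", "Rosa", 0.50),
]

for cutoff in (0.5, 0.75, 0.95):
    sensitivity, error_rate = sensitivity_and_error(predictions, cutoff)
    print(cutoff, sensitivity, error_rate)
```

In this toy data, raising the cutoff removes the errors but also leaves more sequences unassigned, which is the kind of trade-off a stringent classifier runs into.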

    We believe that the methods of comparison we have used are simple and robust, and thereby provide a methodological and conceptual foundation for future software evaluations. On a personal note, I will thoroughly enjoy working with Rodney and Reed again; I had a great time discussing the ins and outs of taxonomic classification with them! The paper can be found here.

    References and notes

    1. Richardson RT, Bengtsson-Palme J, Johnson RM: Evaluating and Optimizing the Performance of Software Commonly Used for the Taxonomic Classification of DNA Sequence Data. Molecular Ecology Resources, Early view (2016). doi: 10.1111/1755-0998.12628 [Paper link]
    2. This is something that several classifiers also showed in the evaluation we did for the Metaxa2 paper (3). Interestingly enough, Metaxa2 is better at maintaining high accuracy also as sensitivity is increased.
    3. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]