Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery

On Friday, Molecular Ecology Resources put online Christian Wurzbacher’s latest paper, of which I am also a coauthor. The paper presents three sets of general primers that allow for amplification of the complete ribosomal operon from the ribosomal tandem repeats, covering all the ribosomal markers (ETS, SSU, ITS1, 5.8S, ITS2, LSU, and IGS) (1). This paper is important because it introduces a technique to utilize third generation sequencing (PacBio and Nanopore) to generate high‐quality reference data (equivalent to or better than Sanger sequencing) in a high‐throughput manner. The paper shows that the accuracy of the Nanopore-generated sequences was 99.85%, which is comparable with the 99.78% accuracy described for Sanger sequencing.

My main contribution to this paper is the consensus sequence generation script – Consension – which is available from my software page. Importantly, there are huge gaps in the reference databases we use for taxonomic classification and this method will facilitate the integration of reference data from all of the ribosomal markers. We hope that this work will stimulate large-scale generation of ribosomal reference data covering several marker genes, linking previously spread-out information together.

Reference

  1. Wurzbacher C, Larsson E, Bengtsson-Palme J, Van den Wyngaert S, Svantesson S, Kristiansson E, Kagami M, Nilsson RH: Introducing ribosomal tandem repeat barcoding for fungi. Molecular Ecology Resources, Accepted article (2018). doi: 10.1111/1755-0998.12944 [Paper link]

Last week, I uploaded a new database to the Metaxa2 Database Repository, called DAIRYdb. DAIRYdb (1) is a manually curated reference database for 16S rRNA amplicon sequences from dairy products. Significant efforts have been put into improving annotation algorithms, such as Metaxa2 (2), while less attention has been put into curation of reliable and consistent databases (3). Previous studies have shown that databases restricted to the studied environment improve unambiguous taxonomy annotation to the species level, thanks to consistent taxonomy, lack of blanks and reduced competition between different reference taxonomies (4-5). The usage of DAIRYdb in combination with different classification tools allows taxonomy annotation accuracy of over 90% at species level for microbiome samples from dairy products, where species identification is mandatory due to the affiliation to few closely related genera of most dominant lactic acid bacteria.

The database can be added to your Metaxa2 (version 2.2 or later) installation by using the following command:

metaxa2_install_database -g SSU_DAIRYdb_v1.1.2

Further adaptations of the DAIRYdb can be found on GitHub and the preprint has been deposited in BioRxiv (1). DAIRYdb was developed by Marco Meola, Etienne Rifa and their collaborators, who also provided most of the text for this post. Thanks Marco for this excellent addition to the database collection!

References

  1. Meola M, Rifa E, Shani N, Delbes C, Berthoud H, Chassard C: DAIRYdb: A manually curated gold standard reference database for improved taxonomy annotation of 16S rRNA gene sequences from dairy products. bioRxiv, 386151 (2018). doi: 10.1101/386151
  2. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
  3. Edgar RC: Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences. PeerJ, 6, e4652 (2018). doi: 10.7717/peerj.4652
  4. Ritari J, Salojärvi J, Last L, de Vos WM: Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database. BMC Genomics, 16, 1, 1056 (2015). doi: 10.1186/s12864-015-2265-y
  5. Newton ILG, Roeselers G: The effect of training set on the classification of honey bee gut microbiota using the naïve bayesian classifier. BMC Microbiology, 12, 1, 221 (2012). doi: 10.1186/1471-2180-12-221

I’m really late to this ball for a number of reasons, but last week Nature published our paper on the structure and function of the global topsoil microbiome (1). This paper has a long story, but in short I was contacted by Mohammad Bahram (the first author) about two years ago about a project using metagenomic sequencing to look at a lot of soil samples for patterns of antibiotic resistance gene abundances and diversity. The project had made the interesting discovery that resistance gene abundances were linked to the ratio of fungi to bacteria (so that more fungi were linked to more resistance genes). During the following year, we worked together on deciphering these discoveries, which are now published in Nature. The paper also deals with the taxonomic patterns linked to geography (1), but as evident from the above, my main contribution here has been on the antibiotic resistance side.

In short, we find that:

  • Bacterial diversity is highest in temperate habitats, and lower both closer to the equator and the poles
  • For bacteria, the diversity of biological functions follows the same pattern, but for fungi, the functional diversity is higher closer to the poles and the equator
  • Higher abundance of fungi is linked to higher abundance and diversity of antibiotic resistance genes. Specifically, this is related to known antibiotic-producing fungal lineages, such as Penicillium and Oidiodendron. There also seems to be a link between the Actinobacteria, encompassing the antibiotic-producing bacterial genus Streptomyces, and higher resistance gene diversity.
  • Similar relationships between the fungus-like Oomycetes and resistance genes were also found in ocean samples from the Tara Oceans project (2)

The results of this study indicate that both environmental filtering and niche differentiation determine soil microbial composition, and that the role of dispersal limitation is minor at this scale. Soil pH and precipitation seem to be the strongest drivers of community composition. Furthermore, we interpret our data to reveal that inter-kingdom antagonism is important in structuring microbial communities. This speaks against the notion put forward that antibiotic resistance genes might not have a resistance function in natural settings (3). That said, the most likely explanation here is probably a bit of both warfare and repurposing of genes. Soil seems to be the largest untapped source of resistance genes for human pathogens (4), and the finding that natural antagonism may be driving resistance gene diversification and enrichment may be important for future management of environmental antibiotic resistance (5,6).

It was really great to work with Mohammad and his team, and I sure hope that we will collaborate again in the future. The entire paper can be found in the issue of Nature coming out this week, and is already online at Nature’s website.

References

  1. Bahram M°, Hildebrand F°, Forslund SK, Anderson JL, Soudzilovskaia NA, Bodegom PM, Bengtsson-Palme J, Anslan S, Coelho LP, Harend H, Huerta-Cepas J, Medema MH, Maltz MR, Mundra S, Olsson PA, Pent M, Põlme S, Sunagawa S, Ryberg M, Tedersoo L, Bork P: Structure and function of the global topsoil microbiome. Nature, 560, 233–237 (2018). doi: 10.1038/s41586-018-0386-6
  2. Sunagawa S et al. Structure and function of the global ocean microbiome. Science 348, 6237, 1261359 (2015). doi: 10.1126/science.1261359
  3. Aminov RI: The role of antibiotics and antibiotic resistance in nature. Environmental Microbiology, 11, 12, 2970-2988 (2009). doi: 10.1111/j.1462-2920.2009.01972.x
  4. Bengtsson-Palme J: The diversity of uncharacterized antibiotic resistance genes can be predicted from known gene variants – but not always. Microbiome, 6, 125 (2018). doi: 10.1186/s40168-018-0508-2
  5. Bengtsson-Palme J, Kristiansson E, Larsson DGJ: Environmental factors influencing the development and spread of antibiotic resistance. FEMS Microbiology Reviews, 42, 1, 68–80 (2018). doi: 10.1093/femsre/fux053
  6. Larsson DGJ, Andremont A, Bengtsson-Palme J, Brandt KK, de Roda Husman AM, Fagerstedt P, Fick J, Flach C-F, Gaze WH, Kuroda M, Kvint K, Laxminarayan R, Manaia CM, Nielsen KM, Ploy M-C, Segovia C, Simonet P, Smalla K, Snape J, Topp E, van Hengel A, Verner-Jeffreys DW, Virta MPJ, Wellington EM, Wernersson A-S: Critical knowledge gaps and research needs related to the environmental dimensions of antibiotic resistance. Environment International, 117, 132–138 (2018).

A few days ago I posted that Bioinformatics had published our paper on the Metaxa2 Database Builder (1). Today, I am happy to report that PeerJ has published the first paper in which the database builder is used to create a new Metaxa2 (2) database! My colleagues at Ohio State University have used the software to build a database for the COI gene (3), which is commonly used in arthropod barcoding. The region used was extracted from COI sequences from arthropod whole mitochondrion genomes, and employed to create a database containing sequences from all major arthropod clades, including all insect orders, all arthropod classes and the Onychophora, Tardigrada and Mollusca outgroups.

Similar to what we did in our evaluation of taxonomic classifiers used on non-rRNA barcoding regions (4), we performed a cross-validation analysis to characterize the relationship between the Metaxa2 reliability score, an estimate of classification confidence, and classification error probability. We used this analysis to select a reliability score threshold which minimized error. We then estimated classification sensitivity, false discovery rate and overclassification, the propensity to classify sequences from taxa not represented in the reference database.
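The threshold selection described above can be sketched roughly as follows. This is only an illustration, not the code used in the paper: the function names, the input format (pairs of reliability score and whether the held-out classification was correct) and the selection rule (the lowest score at which the error rate among retained classifications stays within a chosen limit) are all assumptions made for the example.

```python
def error_rate_at(results, threshold):
    """Fraction of wrong calls among classifications scoring >= threshold.

    results: list of (reliability_score, was_correct) pairs (assumed format).
    Returns None if no classification reaches the threshold.
    """
    kept = [correct for score, correct in results if score >= threshold]
    if not kept:
        return None
    return sum(1 for c in kept if not c) / len(kept)


def pick_threshold(results, max_error=0.05):
    """Lowest observed score whose retained classifications stay within max_error.

    The max_error limit is a hypothetical tuning knob, not a Metaxa2 parameter.
    """
    for threshold in sorted({score for score, _ in results}):
        rate = error_rate_at(results, threshold)
        if rate is not None and rate <= max_error:
            return threshold
    return None


# Toy cross-validation output: (reliability score, was the call correct?)
toy = [(55, False), (60, True), (62, False), (70, True),
       (75, True), (80, True), (90, True)]

print(pick_threshold(toy, max_error=0.0))
print(pick_threshold(toy, max_error=0.2))
```

Raising the threshold discards more sequences, so in practice the error limit has to be weighed against the loss in sensitivity.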

Since the database builder was still in its early inception stages when we started doing this work, the software itself saw several improvements because of this project. We believe that our work on the COI database, as well as on the recently released database builder software, will help researchers in designing and evaluating classification databases for metabarcoding on arthropods and beyond. The database is included in the new Metaxa2 2.2 release, and is also downloadable from the Metaxa2 Database Repository (1). The open access paper can be found here.

References

  1. Bengtsson-Palme J, Richardson RT, Meola M, Wurzbacher C, Tremblay ED, Thorell K, Kanger K, Eriksson KM, Bilodeau GJ, Johnson RM, Hartmann M, Nilsson RH: Metaxa2 Database Builder: Enabling taxonomic identification from metagenomic and metabarcoding data using any genetic marker. Bioinformatics, advance article (2018). doi: 10.1093/bioinformatics/bty482
  2. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399
  3. Richardson RT, Bengtsson-Palme J, Gardiner MM, Johnson RM: A reference cytochrome c oxidase subunit I database curated for hierarchical classification of arthropod metabarcoding data. PeerJ, 6, e5126 (2018). doi: 10.7717/peerj.5126
  4. Richardson RT, Bengtsson-Palme J, Johnson RM: Evaluating and Optimizing the Performance of Software Commonly Used for the Taxonomic Classification of DNA Sequence Data. Molecular Ecology Resources, 17, 4, 760–769 (2017). doi: 10.1111/1755-0998.12628

One of the questions I have received regarding Metaxa2 is if it is possible to use it on other DNA barcodes. My answer has been “technically, yes, but it is a very cumbersome process of creating a custom database for every additional barcode”. Not anymore: the newly introduced Metaxa2 Database Builder makes this process automatic, with the user just supplying a FASTA file of sequences from the region in question and a file containing the taxonomy information for the sequences (in GenBank, NSD XML, Metaxa2 or SILVA-style formats). The preprint (1) has been out for some time, but today Bioinformatics published the paper describing the software (2).

The paper not only details how the database builder works, but also shows that it works on a number of different barcoding regions, albeit with different results in terms of accuracy. Still, even with seemingly high misclassification rates for some DNA barcodes, the software performs better than a simple BLAST-based taxonomic assignment (76.5% vs. 41.4% correct classifications for matK, and 76.2% vs. 45.1% for trnL). The database builder has already found use in building a COI database for arthropods (3), and we envision a range of uses in the near future.

As the paper is now published, I have also moved the Metaxa2 software (4) from beta status to a full version 2.2 update. Hopefully, this release should be bug free, but my experience is that when the community gets their hands on the software they tend to discover things our team has missed. I would like to thank the entire team working on this, particularly Rodney Richardson (who initiated this entire thing) and Henrik Nilsson. The software can be downloaded here. Happy barcoding!

References

  1. Bengtsson-Palme J, Richardson RT, Meola M, Wurzbacher C, Tremblay ED, Thorell K, Kanger K, Eriksson KM, Bilodeau GJ, Johnson RM, Hartmann M, Nilsson RH: Taxonomic identification from metagenomic or metabarcoding data using any genetic marker. bioRxiv 253377 (2018). doi: 10.1101/253377 [Link]
  2. Bengtsson-Palme J, Richardson RT, Meola M, Wurzbacher C, Tremblay ED, Thorell K, Kanger K, Eriksson KM, Bilodeau GJ, Johnson RM, Hartmann M, Nilsson RH: Metaxa2 Database Builder: Enabling taxonomic identification from metagenomic and metabarcoding data using any genetic marker. Bioinformatics, advance article (2018). doi: 10.1093/bioinformatics/bty482 [Paper link]
  3. Richardson RT, Bengtsson-Palme J, Gardiner MM, Johnson RM: A reference cytochrome c oxidase subunit I database curated for hierarchical classification of arthropod metabarcoding data. PeerJ Preprints, 6, e26662v1 (2018). doi: 10.7287/peerj.preprints.26662v1 [Link]
  4. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]

MycoKeys earlier this week published a paper describing the results of a workshop in Aberdeen in April last year, where we refined annotations for fungal ITS sequences from the built environment (1). This was a follow-up on a workshop in May 2016 (2) and the results have been implemented in the UNITE database and shared with other online resources. The paper has also been highlighted at microBEnet. I have very little time to further comment on this at this very moment, but I believe, as I wrote last time, that distributed initiatives like this (and the ones I have been involved in in the past (3,4)) serve a very important purpose for establishing better annotation of sequence data (5). The full paper can be found here.

References

  1. Nilsson RH, Taylor AFS, Adams RI, Baschien C, Bengtsson-Palme J, Cangren P, Coleine C, Daniel H-M, Glassman SI, Hirooka Y, Irinyi L, Iršenaite R, Martin-Sánchez PM, Meyer W, Oh S-O, Sampaio JP, Seifert KA, Sklenár F, Stubbe D, Suh S-O, Summerbell R, Svantesson S, Unterseher M, Visagie CM, Weiss M, Woudenberg J, Wurzbacher C, Van den Wyngaert S, Yilmaz N, Yurkov A, Kõljalg U, Abarenkov K: Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from an April 10-11, 2017 workshop (Aberdeen, UK). MycoKeys, 28, 65–82 (2018). doi: 10.3897/mycokeys.28.20887 [Paper link]
  2. Abarenkov K, Adams RI, Laszlo I, Agan A, Ambrioso E, Antonelli A, Bahram M, Bengtsson-Palme J, Bok G, Cangren P, Coimbra V, Coleine C, Gustafsson C, He J, Hofmann T, Kristiansson E, Larsson E, Larsson T, Liu Y, Martinsson S, Meyer W, Panova M, Pombubpa N, Ritter C, Ryberg M, Svantesson S, Scharn R, Svensson O, Töpel M, Unterseher M, Visagie C, Wurzbacher C, Taylor AFS, Kõljalg U, Schriml L, Nilsson RH: Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from a May 23-24, 2016 workshop (Gothenburg, Sweden). MycoKeys, 16, 1–15 (2016). doi: 10.3897/mycokeys.16.10000
  3. Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Molecular Ecology, 22, 21, 5271–5277 (2013). doi: 10.1111/mec.12481
  4. Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity, 67, 1, 11–19 (2014). doi: 10.1007/s13225-014-0291-8
  5. Bengtsson-Palme J, Boulund F, Edström R, Feizi A, Johnning A, Jonsson VA, Karlsson FH, Pal C, Pereira MB, Rehammar A, Sánchez J, Sanli K, Thorell K: Strategies to improve usability and preserve accuracy in biological sequence databases. Proteomics, Early view (2016). doi: 10.1002/pmic.201600034

Due to an extremely embarrassing for-loop error in the classifier of the most recent Metaxa2 beta (beta 8), which was released a few weeks ago, the classifier would often (on certain platforms and configurations) enter an endless loop and hang. I apologize for this mistake, which has been corrected in the new beta 9 released today, available from this download link. No other changes have been made since the previous version. Thanks for your patience (and thanks to Kaisa Thorell for first bringing the error to my attention!)

I am very happy to announce that a first public beta version of Metaxa2 version 2.2 has been released today! This new version brings two big and a number of small improvements to the Metaxa2 software (1). The first major addition is the introduction of the Metaxa2 Database Builder, which allows the user to create custom databases for virtually any genetic barcoding region. The second addition, which is related to the first, is that the classifier has been rewritten to have a more solid mathematical foundation. I have been promising that these updates were coming “soon” for one and a half years, but finally the end-product is good enough to see some real world testing. Bear in mind though that this is still a beta version that could contain obscure bugs. Here follows a list of new features (with further elaboration on a few below):

  • The Metaxa2 Database Builder
  • Support for additional barcoding genes, virtually any genetic region can now be used for taxonomic classification in Metaxa2
  • The Metaxa2 database repository, which can be accessed through the new metaxa2_install_database tool
  • Improved classification scoring model for better clarity and sensitivity
  • A bundled COI database for arthropods, showing off the capabilities of the database builder
  • Support for compressed input files (gzip, zip, bzip, dsrc)
  • Support for auto-detection of database locations
  • Added output of probable taxonomic origin for sequences with reliability scores at each rank, made possible by the updated classifier
  • Added the -x option for running only the extraction without the classification step
  • Improved memory handling for very large rRNA datasets in the classifier (millions of sequences)
  • This update also fixes a bug in the metaxa2_rf tool that could cause bias in very skewed datasets with small numbers of taxa

The new version of Metaxa2 can be downloaded here, and for those interested I will spend the rest of this post outlining the Metaxa2 Database Builder. The information below is also available in a slightly extended version in the software manual.

The major enhancement in Metaxa2 version 2.2 is the ability to use custom databases for classification. This means that the user can now make their own database for their own barcoding region of choice, or download additional databases from the Metaxa2 Database Repository. The selection of other databases is made through the “-g” option already existing in Metaxa2. As part of these changes, we have also updated the classification scoring model for better stringency and sensitivity across multiple databases and different genes. The old scoring system can still be used by setting the --scoring_model option to “old”.

There are two main operating modes of the Metaxa2 Database Builder, as well as a hybrid mode combining the features of the two other modes. The divergent and conserved modes work in almost completely different ways and deal with two different types of barcoding regions. The divergent mode is designed to deal with barcoding regions that exhibit fairly large variation between taxa within the same taxonomic domain. Such regions include, e.g., the eukaryotic ITS region, or the trnL gene used for plant barcoding. In the other mode – the conserved mode – a highly conserved barcoding region is expected (at least within the different taxonomic domains). Genes that fall into this category would be, e.g., the 16S SSU rRNA, and the bacterial rpoB gene. This option would most likely also be suitable for barcoding within certain groups of e.g. plants, where similarity of the barcoding regions can be expected to be high. There is also a third mode – the hybrid mode – that incorporates features of both the others. The hybrid mode is more experimental in nature, but could be useful in situations where both the other modes perform worse than desired.

In the divergent (default) mode, the database builder will start by clustering the input sequences at 20% identity using USEARCH (2). All clusters generated from this process are then individually aligned using MAFFT (3). Those alignments are split into two regions, which are used to build two hidden Markov models for each cluster of sequences. These models will be less precise, but more sensitive than those generated in the conserved mode. In the divergent mode, the database builder will attempt to extract full-length sequences from the input data, but this may be less successful than in the conserved mode.

In the conserved mode, on the other hand, the database builder will first extract the barcoding region from the input sequences using models built from a reference sequence provided (see above) and the Metaxa2 extractor (1). It will then align all the extracted sequences using MAFFT and determine the conservation of each position in the alignment. When the criteria for degree of conservation are met, all conserved regions are extracted individually and are then re-aligned separately using MAFFT. The re-aligned sequences are used to build hidden Markov models representing the conserved regions with HMMER (4). In this mode, the classification database will only consist of the extracted full-length sequences.
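To make the idea of "determining the conservation of each position" a bit more concrete, here is a small sketch of how conserved stretches could be located in an alignment. It is not how the database builder is implemented: the conservation measure (fraction of sequences sharing the most common residue in a column), the 0.8 cutoff and the minimum run length are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column: fraction of all sequences sharing the most common non-gap residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i].upper() for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # gap-only column counts as not conserved
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_regions(scores, min_score=0.8, min_length=3):
    """Half-open (start, end) intervals of runs of columns scoring >= min_score."""
    regions, start = [], None
    for i, s in enumerate(scores + [-1.0]):  # sentinel closes a trailing run
        if s >= min_score and start is None:
            start = i
        elif s < min_score and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


toy_alignment = ["ACGTTGCA", "ACGATGCA", "ACG-TGCA", "ACGCAGCA"]
toy_scores = column_conservation(toy_alignment)
print(toy_scores)
print(conserved_regions(toy_scores))
```

In this toy alignment the two flanking stretches come out as conserved while the variable middle columns are skipped, which is the kind of region that would then be re-aligned and turned into a hidden Markov model.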

In the hybrid mode, finally, the database builder will cluster the input sequences at 20% identity using USEARCH, and then proceed with the conserved mode approach on each cluster separately.

The actual taxonomic classification in Metaxa2 is done using a sequence database. It was shown in the original Metaxa2 paper that replacing the built-in database with a generic non-processed one was detrimental to performance in terms of accuracy (1). In the database builder, we have tried to incorporate some of the aspects of the manual database curation we did for the built-in database that can be automated. By default, all these filtration steps are turned off, but enabling them might drastically increase the accuracy of classifications based on the database.

To assess the accuracy of the constructed database, the Metaxa2 Database Builder allows for testing its detection ability and classification accuracy. This is done by sub-dividing the database sequences into subsets and rebuilding the database using a smaller (by default 90%), randomly selected, set of the sequence data (5). The remaining sequences (10% by default) are then classified using Metaxa2 with the subset database. The number of detections, and the numbers of correctly or incorrectly classified entries are recorded and averaged over a number of iterations (10 by default). This allows for obtaining a picture of the lower end of the accuracy of the database. However, since the evaluation only uses a subset of all sequences included in the full database, the performance of the full database actually constructed is likely to be slightly better. The evaluation can be turned on using the “--evaluate T” option.
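The evaluation loop described above amounts to a repeated random hold-out test, which can be sketched like this. The sketch is an assumption-laden illustration rather than the database builder's own code: `evaluate_database`, the `classify` callback and the toy classifier are hypothetical stand-ins for rebuilding the database and running Metaxa2 on the held-out sequences.

```python
import random


def evaluate_database(entries, classify, test_fraction=0.1, iterations=10, seed=0):
    """Repeated random hold-out evaluation.

    entries: list of (sequence_id, true_taxon) pairs.
    classify: hypothetical callable (training_entries, sequence_id) that
              returns a predicted taxon, or None if nothing was detected.
    Returns mean counts of detected, correct and incorrect calls per iteration.
    """
    rng = random.Random(seed)
    n_test = max(1, round(len(entries) * test_fraction))
    totals = {"detected": 0, "correct": 0, "incorrect": 0}
    for _ in range(iterations):
        shuffled = entries[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        for seq_id, true_taxon in test:
            predicted = classify(train, seq_id)
            if predicted is None:
                continue  # held-out sequence was not detected
            totals["detected"] += 1
            if predicted == true_taxon:
                totals["correct"] += 1
            else:
                totals["incorrect"] += 1
    return {key: value / iterations for key, value in totals.items()}


def toy_classify(train, seq_id):
    """Toy stand-in: read the taxon off the ID if that taxon is still in the training set."""
    taxon = seq_id.split("_")[0]
    return taxon if any(t == taxon for _, t in train) else None


toy_entries = [(f"{genus}_{i}", genus)
               for genus in ("Penicillium", "Streptomyces")
               for i in range(10)]
toy_result = evaluate_database(toy_entries, toy_classify)
print(toy_result)
```

Because each iteration trains on only 90% of the data, the averaged counts are a conservative estimate, in line with the remark above that the full database is likely to perform slightly better.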

Metaxa2 2.2 also introduces the database repository, from which the user can download additional databases for Metaxa2. To download new databases from the repository, the metaxa2_install_database command is used. This is a simple piece of software but requires internet access to function.

References

  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  2. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  3. Katoh K, Standley DM: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30, 772–780 (2013).
  4. Eddy SR: Accelerated profile HMM searches. PLoS Computational Biology, 7, e1002195 (2011).
  5. Richardson RT, Bengtsson-Palme J, Johnson RM: Evaluating and Optimizing the Performance of Software Commonly Used for the Taxonomic Classification of DNA Sequence Data. Molecular Ecology Resources, 17, 4, 760–769 (2017). doi: 10.1111/1755-0998.12628

Yesterday, BMC Microbiology published a paper which I have co-authored with Joakim Forsell and his colleagues at Umeå University. The paper (1) investigates the prevalence and subtype composition of Blastocystis – a eukaryotic microbe commonly present in the human intestine – among the 35 Swedish university students that we investigated for antibiotic resistance before and after travel to the Indian peninsula or Central Africa using shotgun metagenomics, and published in 2015 (2). In this paper, we used the same metagenomic data, but to assess the impact of travel on Blastocystis carriage and to understand the associations between Blastocystis and the bacterial gut microbiota. We found that 46% of the students carried Blastocystis before travel and 43% after. The two most commonly identified Blastocystis subtypes were ST3 and ST4, accounting for 20 of the 31 samples positive for Blastocystis. Interestingly, we detected no mixed subtype carriage in any individual, and all ten individuals with a typable subtype before and after travel maintained their initial subtype.

Furthermore, we found that the composition of the gut bacterial community did not differ significantly between Blastocystis carriers and non-carriers. Curiously, Blastocystis was accompanied by higher abundances of the bacterial genera Sporolactobacillus and Candidatus Carsonella. As previously observed (3), Blastocystis carriage was positively associated with higher bacterial genus richness, and negatively correlated to the Bacteroides-driven enterotype. We, however, took this observation further, and could show that these associations were both largely driven by ST4 – a subtype commonly described in Europe – while the globally prevalent ST3 did not show such significant relationships.

The persistence of Blastocystis subtypes before and after travel indicates that long-term carriage of Blastocystis is common. The associations between Blastocystis and the bacterial microbiota found in this study could imply a link between Blastocystis and a healthy microbiota, as well as with diets high in vegetables. However, we cannot answer whether the associations between Blastocystis and the microbiota result from the presence of Blastocystis per se, or are a prerequisite for colonization with Blastocystis. Both possibilities are interesting opportunities for follow-up studies.

I think this type of data reuse for completely different questions is highly useful, and I am very happy that Joakim Forsell and his colleagues contacted me to hear if it was possible to do a Blastocystis screen of this data. The full paper can be read here.

References

  1. Forsell J, Bengtsson-Palme J, Angelin M, Johansson A, Evengård B, Granlund M: The relation between Blastocystis and the intestinal microbiota in Swedish travellers. BMC Microbiology, 17, 231 (2017). doi: 10.1186/s12866-017-1139-7 [Paper link]
  2. Bengtsson-Palme J, Angelin M, Huss M, Kjellqvist S, Kristiansson E, Palmgren H, Larsson DGJ, Johansson A: The human gut microbiome as a transporter of antibiotic resistance genes between continents. Antimicrobial Agents and Chemotherapy, 59, 10, 6551–6560 (2015). doi: 10.1128/AAC.00933-15 [Paper link]
  3. Andersen LO, Bonde I, Nielsen HB, Stensvold CR: A retrospective metagenomics approach to studying Blastocystis. FEMS Microbiology Ecology, 91, fiv072 (2015). doi: 10.1093/femsec/fiv072 [Paper link]

Last summer, I was approached by Muniyandi Nagarajan to write a book chapter for a book on metagenomics. The book was published earlier this month, and is now available online (1). I have to admit that I have not yet read the entire book, but my own chapter deals with selecting the right tools for metagenomic analysis, and discusses different strategies to perform taxonomic classification, functional analysis, metagenomic assembly, and statistical comparisons between metagenomes (2). The chapter also considers the pros and cons of automated computational “pipelines” for analysis of metagenomic data. While I do not point to a specific set of software that obviously performs better in all situations, I do highlight some analysis strategies that clearly should be avoided. The chapter also suggests a few among the set of robust and well-functioning software tools that, in my opinion, should be used for metagenomic analyses. To some degree, this chapter overlaps with the review paper we wrote on using metagenomics to analyze antibiotic resistance genes in various environments, published earlier this year (3), but the discussion in the book chapter is far more general. I imagine that the book chapter could be used, for example, in teaching metagenomics to students in bioinformatics (that’s at least a use I envision myself). Finally, apart from my own chapter, I can also highly recommend the chapter by Boulund et al. on statistical considerations for metagenomic data analysis (4). The book is available to buy from here, and the chapter can be read here.

References

  1. Nagarajan M (Ed.) Metagenomics: Perspectives, Methods, and Applications. ISBN: 9780081022689. Academic Press, Elsevier, USA (2018). doi: 10.1016/B978-0-08-102268-9 [Link]
  2. Bengtsson-Palme J: Strategies for Taxonomic and Functional Annotation of Metagenomes. In: Nagarajan M (Ed.) Metagenomics: Perspectives, Methods, and Applications, 55–79. Academic Press, Elsevier, USA (2018). doi: 10.1016/B978-0-08-102268-9.00003-3 [Link]
  3. Bengtsson-Palme J, Larsson DGJ, Kristiansson E: Using metagenomics to investigate human and environmental resistomes. Journal of Antimicrobial Chemotherapy, 72, 2690–2703 (2017). doi: 10.1093/jac/dkx199 [Paper link]
  4. Boulund F, Pereira MB, Jonsson V, Kristiansson E: Computational and Statistical Considerations in the Analysis of Metagenomic Data. In: Nagarajan M (Ed.) Metagenomics: Perspectives, Methods, and Applications, 81–102. Academic Press, Elsevier, USA (2018). doi: 10.1016/B978-0-08-102268-9.00004-5 [Link]