Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg


I have had the pleasure of being chosen as a speaker at next week’s (ten days from now) Swedish Bioinformatics Workshop. My talk is entitled “Turn up the signal – wipe out the noise: Gaining insights into bacterial community functions using metagenomic data”, and will largely deal with the same questions as my talk at EDAR3 in May this year. As then, the talk will highlight some particular pitfalls related to the interpretation of data and exemplify how flawed analysis practices can result in misleading conclusions regarding community function. The examples will come from our studies of environments subjected to pharmaceutical pollution in India, the effect of travel on the human resistome, and modern municipal wastewater treatment processes.

The talk will take place on Thursday, September 24, 2015 at 16:30. The full program for the conference can be found here. Also, if you want a sneak peek of the talk, you can drop by Chemistry and Molecular Biology on Friday at 13.00, where I will give a seminar on the same topic in the Monthly Bioinformatic Practical Meetings series.

Earlier today, my most recent paper (1) became available online, describing resistance gene patterns in the gut microbiota of Swedes before and after travel to the Indian peninsula and central Africa. In this work, we used metagenomic sequencing of the intestinal microbiome of Swedish students returning from exchange programs to show that the abundance of antibiotic resistance genes in several classes increased after travel. This corroborates the findings of several papers describing uptake of resistant bacteria (2-8) or resistance genes (9-11) after travel to destinations with a worse resistance situation.

Our paper is important because it:

  1. Addresses the abundance of a vast range of resistance genes (more than 300).
  2. Finds evidence that the overall relative abundance of antibiotic resistance genes increased after travel, even without any intake of antibiotics.
  3. Shows that, despite very deep sequencing efforts, the sensitivity of metagenomics was not sufficient to detect acquisition of the low-abundance CTX-M resistance genes responsible for the observed ESBL phenotypes.
  4. Reveals a “core resistome” of resistance genes that are more or less omnipresent, and remain relatively stable regardless of travel, while changes seem to occur in the more variable part of the resistome.
  5. Hints at increased abundance of Proteobacteria after travel, although this increase could not specifically be linked to resistance gene increases.
  6. Uses de novo metagenomic assembly to physically link resistance genes in the same sample, giving hints of co-resistance patterns in the gut microbiome.
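The co-localization idea in point 6 can be illustrated with a minimal Python sketch. It assumes that resistance gene hits on assembled contigs have already been obtained (for example from a similarity search of the contigs against a resistance gene database) as (contig, gene) pairs. The contig IDs and gene names below are made up for illustration, and this is not the exact procedure used in the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def co_located_pairs(hits: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count the number of contigs on which each pair of resistance genes co-occurs.

    hits: (contig_id, gene_name) tuples, one per resistance gene hit.
    """
    genes_by_contig = defaultdict(set)
    for contig, gene in hits:
        genes_by_contig[contig].add(gene)

    pair_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for genes in genes_by_contig.values():
        # Sorting makes each pair order-independent, e.g. ("qacEdelta1", "sul1")
        for pair in combinations(sorted(genes), 2):
            pair_counts[pair] += 1
    return dict(pair_counts)


if __name__ == "__main__":
    # Hypothetical hits from three assembled contigs
    example_hits = [
        ("contig_1", "sul1"), ("contig_1", "qacEdelta1"), ("contig_1", "aadA2"),
        ("contig_2", "tetA"),
        ("contig_3", "sul1"), ("contig_3", "qacEdelta1"),
    ]
    for (gene_a, gene_b), n in sorted(co_located_pairs(example_hits).items()):
        print(f"{gene_a} + {gene_b}: {n} contig(s)")
```

Pairs that recur on many contigs are candidates for co-resistance, while a single shared contig is only a hint.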

The paper was a collaboration with Martin Angelin, Helena Palmgren and Anders Johansson at Umeå University, and was made possible by bioinformatics support from SciLifeLab in Stockholm. I highly recommend reading it as a complement to e.g. the Forslund et al. paper (12) describing country-specific antibiotic resistance patterns in the gut microbiota.

Taken together, this study offers a broadened perspective on how the antibiotic resistance potential of the human gut microbiome changes after travel, providing an independent complement to previous studies targeting a limited number of bacterial species or antibiotic resistance genes. Understanding how resistance genes travel the globe is hugely important, since resistance in principle only needs to appear in a pathogen once; improper hygiene and travel may then spread novel resistance genes rapidly across continents (13,14).

References

  1. Bengtsson-Palme J, Angelin M, Huss M, Kjellqvist S, Kristiansson E, Palmgren H, Larsson DGJ, Johansson A: The human gut microbiome as a transporter of antibiotic resistance genes between continents. Antimicrob Agents Chemother Accepted manuscript posted online (2015). doi: 10.1128/AAC.00933-15 [Paper link]
  2. Gaarslev K, Stenderup J: Changes during travel in the composition and antibiotic resistance pattern of the intestinal Enterobacteriaceae flora: results from a study of mecillinam prophylaxis against travellers’ diarrhoea. Curr Med Res Opin 9:384–387 (1985).
  3. Paltansing S, Vlot JA, Kraakman MEM, Mesman R, Bruijning ML, Bernards AT, Visser LG, Veldkamp KE: Extended-spectrum β-lactamase-producing enterobacteriaceae among travelers from the Netherlands. Emerg Infect Dis 19:1206–1213 (2013).
  4. Ruppé E, Armand-Lefèvre L, Estellat C, El-Mniai A, Boussadia Y, Consigny PH, Girard PM, Vittecoq D, Bouchaud O, Pialoux G, Esposito-Farèse M, Coignard B, Lucet JC, Andremont A, Matheron S: Acquisition of carbapenemase-producing Enterobacteriaceae by healthy travellers to India, France, February 2012 to March 2013. Euro Surveill 19 (2014).
  5. Kennedy K, Collignon P: Colonisation with Escherichia coli resistant to “critically important” antibiotics: a high risk for international travellers. Eur J Clin Microbiol Infect Dis 29:1501–1506 (2010).
  6. Tham J, Odenholt I, Walder M, Brolund A, Ahl J, Melander E: Extended-spectrum beta-lactamase-producing Escherichia coli in patients with travellers’ diarrhoea. Scand J Infect Dis 42:275–280 (2010).
  7. Östholm-Balkhed Å, Tärnberg M, Nilsson M, Nilsson LE, Hanberger H, Hällgren A, Travel Study Group of Southeast Sweden: Travel-associated faecal colonization with ESBL-producing Enterobacteriaceae: incidence and risk factors. J Antimicrob Chemother 68:2144–2153 (2013).
  8. Kantele A, Lääveri T, Mero S, Vilkman K, Pakkanen SH, Ollgren J, Antikainen J, Kirveskari J: Antimicrobials increase travelers’ risk of colonization by extended-spectrum betalactamase-producing enterobacteriaceae. Clin Infect Dis 60:837–846 (2015).
  9. von Wintersdorff CJH, Penders J, Stobberingh EE, Oude Lashof AML, Hoebe CJPA, Savelkoul PHM, Wolffs PFG: High rates of antimicrobial drug resistance gene acquisition after international travel, The Netherlands. Emerg Infect Dis 20:649–657 (2014).
  10. Tängdén T, Cars O, Melhus A, Löwdin E: Foreign travel is a major risk factor for colonization with Escherichia coli producing CTX-M-type extended-spectrum beta-lactamases: a prospective study with Swedish volunteers. Antimicrob Agents Chemother 54:3564–3568 (2010).
  11. Dhanji H, Patel R, Wall R, Doumith M, Patel B, Hope R, Livermore DM, Woodford N: Variation in the genetic environments of bla(CTX-M-15) in Escherichia coli from the faeces of travellers returning to the United Kingdom. J Antimicrob Chemother 66:1005–1012 (2011).
  12. Forslund K, Sunagawa S, Kultima JR, Mende DR, Arumugam M, Typas A, Bork P: Country-specific antibiotic use practices impact the human gut resistome. Genome Res 23:1163–1169 (2013).
  13. Bengtsson-Palme J, Larsson DGJ: Antibiotic resistance genes in the environment: prioritizing risks. Nat Rev Microbiol 13:396 (2015).
  14. Larsson DGJ: Antibiotics in the environment. Ups J Med Sci 119:108–112 (2014).

TriMetAss has been updated to version 1.1. The new version addresses a number of minor issues and brings two new handy features. The update can be found here.

New features:

  • Multiple input files can now be specified by adding several -1 and -2 options.
  • TriMetAss now automatically stops if the candidate reads are the same for two iterations in a row.
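The new stopping rule can be pictured as a convergence check in an iterate-recruit-assemble loop. The Python sketch below is hypothetical: it is not TriMetAss code (which drives Trinity), and the `recruit` and `assemble` callables are placeholders for the read recruitment and assembly steps.

```python
from typing import Callable, FrozenSet, List, Tuple


def iterative_assembly(
    seeds: List[str],
    recruit: Callable[[List[str]], FrozenSet[str]],
    assemble: Callable[[FrozenSet[str]], List[str]],
    max_iterations: int = 20,
) -> Tuple[List[str], int]:
    """Iterate read recruitment and assembly until the candidate reads stop changing.

    recruit:  returns IDs of reads matching the current contigs (hypothetical step).
    assemble: builds new contigs from the recruited reads (hypothetical step).
    Returns the final contigs and the number of iterations that were run.
    """
    contigs = seeds
    previous: FrozenSet[str] = frozenset()
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        candidates = recruit(contigs)
        # Same candidate reads as in the previous round: re-assembling them
        # cannot change the result, so stop early.
        if candidates == previous:
            break
        contigs = assemble(candidates)
        previous = candidates
    return contigs, iterations


if __name__ == "__main__":
    # Toy stand-ins: longer contigs recruit more reads, up to three reads in total.
    def toy_recruit(contigs: List[str]) -> FrozenSet[str]:
        return frozenset(f"read{i}" for i in range(min(len(contigs[0]), 3)))

    def toy_assemble(reads: FrozenSet[str]) -> List[str]:
        return ["A" * (len(reads) + 1)]

    final_contigs, n = iterative_assembly(["A"], toy_recruit, toy_assemble)
    print(final_contigs, n)
```

In the toy run, the fourth round recruits the same three reads as the third, so the loop stops there instead of running all 20 iterations.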

Fixed issues:

  • Added support for recent versions of Trinity that no longer include the Trinity.pl script.
  • A minor bug causing TriMetAss to use more memory than necessary has been fixed.
  • Fixed the --stop_total option so that TriMetAss actually uses it (rather than --stop_length).
  • Complicated paths can now be supplied for the output directory.

I would like to thank users Rickard Hammarén, Dr. Tatsuya Unno, Dr. Gisle Vestergaard and Dr. Joseph Nesme for providing the information that made these fixes possible. Thanks a lot!

I will be giving a talk at the Third International Symposium on the Environmental Dimension of Antibiotic Resistance (EDAR2015) next month (five weeks from now). The talk is entitled “Turn up the signal – wipe out the noise: Gaining insights into antibiotic resistance of bacterial communities using metagenomic data”, and will deal with the handling of metagenomic data in antibiotic resistance gene research. It will highlight some particular pitfalls related to the interpretation of data and exemplify how flawed analysis practices can result in misleading conclusions regarding antibiotic resistance risks. I will particularly address how taxonomic composition influences the frequencies of resistance genes, the importance of knowing the functions of the genes in the databases used, and how normalization strategies influence the results. Furthermore, we will show how the context of resistance genes can allow inference of their potential to spread from environmental or commensal bacteria to human pathogens. All these aspects will be exemplified by data from our studies of environments subjected to pharmaceutical pollution in India, the effect of travel on the human resistome, and modern municipal wastewater treatment processes.
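To illustrate why normalization matters, here is a minimal Python sketch with made-up numbers (not data from our studies). It compares two common ways of expressing resistance gene abundance: per million sequenced reads and per 16S rRNA read. For simplicity it ignores differences in gene length, which a real analysis would normally account for.

```python
from typing import Dict


def per_million_reads(gene_reads: int, total_reads: int) -> float:
    """Resistance gene reads per million sequenced reads."""
    return gene_reads / total_reads * 1e6


def per_16s_read(gene_reads: int, rrna_16s_reads: int) -> float:
    """Resistance gene reads per 16S rRNA read (a rough proxy for abundance per bacterial cell)."""
    return gene_reads / rrna_16s_reads


if __name__ == "__main__":
    # Hypothetical counts: sample B contains half as much bacterial DNA
    # (e.g. more host or other non-bacterial DNA), reflected in fewer 16S reads.
    samples: Dict[str, Dict[str, int]] = {
        "A": {"gene": 200, "total": 10_000_000, "16s": 20_000},
        "B": {"gene": 200, "total": 10_000_000, "16s": 10_000},
    }
    for name, s in samples.items():
        print(name,
              f"{per_million_reads(s['gene'], s['total']):.1f} per million reads,",
              f"{per_16s_read(s['gene'], s['16s']):.3f} per 16S read")
```

Both samples have the same abundance per million reads, yet sample B appears to carry twice as many resistance genes per bacterium when normalized to 16S, so the choice of denominator alone can change the conclusion.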

The talk will take place on Monday, May 18, 2015 at 13:20. The full scientific program for the conference can be found here. Registration for the conference is still possible, although no longer at the early-bird price. I look forward to seeing many of the people who will attend the conference, and hopefully you too!

Metaxa2 has been updated to version 2.0.2 and can be downloaded from the Metaxa2 web site. The 2.0.2 update fixes two minor bugs: one causing the “.graph” file to display incorrect or missing names for the LSU regions, and one causing misreporting of the number of sequences in single-end FASTQ files (paired-end files were reported correctly). The update also brings a slightly improved classifier. Thanks to Marco Severgnini for reporting the FASTQ file issue! The update is available here.
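Counting reads in a FASTQ file is easy to get subtly wrong. Below is a minimal Python sketch of one robust approach, assuming the common four-line record layout (header, sequence, “+” line, quality). It is not the code used in Metaxa2.

```python
import io
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count records in a FASTQ file that uses the common four-line layout.

    Assumes every record is exactly four lines and ignores blank lines,
    e.g. an empty line at the very end of the file.
    """
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} non-blank lines is not a multiple of 4; truncated file?")
    return n_lines // 4


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n\n")
    print(count_fastq_records(example))
```

Counting lines that start with “@” instead would be unreliable, since “@” is also a valid quality character and can begin a quality line.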

If you find that ITSx runs slowly despite being assigned multiple CPUs, particularly on datasets with only one kind of sequence (e.g. fungal, using the -t F option), you might be interested in trying out Andrew Krohn’s parallel ITSx implementation. The solution essentially employs a bash script that spawns multiple ITSx instances, each running on a different portion of the input file. Although the script has some limitations (e.g. you cannot select a custom name for the output, and, as far as I understand the script, you will only get the ITS1, ITS2 and full-sequence FASTA files), it may prove useful for many of you until we write a proper solution to the poor multi-thread performance of ITSx (planned for version 1.1). Until then, I recommend that you check this solution out! See also the wiki documentation.
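The chunking approach described above can be sketched in Python as well. This is not Andrew Krohn’s script, only a minimal illustration of the same idea under a few assumptions: ITSx is on the PATH, the standard -i, -o, -t and --cpu options are used, and merging the per-chunk output files afterwards is left out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def split_fasta(lines: Iterable[str], n_chunks: int) -> List[List[str]]:
    """Distribute whole FASTA records round-robin over n_chunks lists of lines."""
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    record_index = -1
    for line in lines:
        if line.startswith(">"):
            record_index += 1
        if record_index >= 0:  # skip anything before the first header
            chunks[record_index % n_chunks].append(line)
    return chunks


def itsx_command(chunk_path: Path, prefix: str, taxa: str = "F") -> List[str]:
    """One single-threaded ITSx run per chunk; parallelism comes from running chunks concurrently."""
    return ["ITSx", "-i", str(chunk_path), "-o", prefix, "-t", taxa, "--cpu", "1"]


def run_parallel(fasta_path: str, n_chunks: int, workdir: str) -> List[str]:
    """Split the input, run ITSx on every chunk concurrently and return the output prefixes."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    with open(fasta_path) as handle:
        chunks = split_fasta(handle, n_chunks)

    commands, prefixes = [], []
    for i, chunk in enumerate(chunks):
        if not chunk:
            continue  # more chunks than records
        chunk_path = out / f"chunk_{i}.fasta"
        chunk_path.write_text("".join(chunk))
        prefix = str(out / f"chunk_{i}")
        commands.append(itsx_command(chunk_path, prefix))
        prefixes.append(prefix)

    # Threads are sufficient here: each one only waits for an external process.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return prefixes
```

Round-robin distribution keeps the chunks roughly equal in size even when the input is sorted by sequence length or taxon.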

My speed tests show the following (on a quite small test set of fungal ITS sequences):

  • ITSx parallel on 16 CPUs, all ITS types (option “-t all”): 3 min 16 sec
  • ITSx parallel on 16 CPUs, only fungal ITS types (option “-t f”): 54 sec
  • ITSx native on 16 CPUs, all ITS types (options “-t all --cpu 16”): 4 min 59 sec
  • ITSx native on 16 CPUs, only fungal ITS types (options “-t f --cpu 16”): 5 min 50 sec
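Converting the timings above to seconds makes the difference easier to compare. This small Python calculation uses only the numbers reported above:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


# Wall-clock times from the test runs above, keyed by (implementation, -t value)
TIMINGS = {
    ("parallel", "all"): to_seconds(3, 16),
    ("parallel", "f"): to_seconds(0, 54),
    ("native", "all"): to_seconds(4, 59),
    ("native", "f"): to_seconds(5, 50),
}


def speedup(taxa: str) -> float:
    """How many times faster the parallel script was than the native multithreading."""
    return TIMINGS[("native", taxa)] / TIMINGS[("parallel", taxa)]


if __name__ == "__main__":
    for taxa in ("all", "f"):
        print(f"-t {taxa}: {speedup(taxa):.1f}x")
```

The parallel script was roughly 1.5 times faster with all ITS types (299 s vs 196 s) and roughly 6.5 times faster in fungal-only mode (350 s vs 54 s).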

Why the fungi-only run took longer than the all-types run in the native implementation is a mystery to me, but it probably shows why the multithreading code needs to be rewritten, as we did with Metaxa a couple of years ago. Stay tuned for ITSx updates!

A couple of days ago, a paper I have co-authored describing an ITS sequence dataset for chimera control in fungi went online as an advance online publication in Microbes and Environments. There are several software tools available for chimera detection (e.g. Henrik Nilsson’s fungal chimera checker (1) and UCHIME (2)), but these generally rely on the presence of a chimera-free reference dataset. Until now, no such dataset existed for the fungal ITS region, and in this paper (3) we introduce a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database (4). This dataset supports chimera detection throughout the fungal kingdom, for full-length ITS sequences as well as for partial (ITS1 or ITS2 only) datasets. We estimated the performance of the dataset on a large set of artificial chimeras to be above 99.5%, and also used it to remove nearly 1,000 chimeric fungal ITS sequences from the UNITE database. The dataset can be downloaded from the UNITE repository, which also makes it possible for users to curate it in the future through the UNITE interactive editing tools.
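To show why a chimera-free reference matters, here is a deliberately simplified Python sketch of the general idea behind reference-based chimera detection. It is not the UCHIME algorithm or the method from the paper: it assumes the query and references are already aligned to the same length without gaps, and `min_gain` is an arbitrary, illustrative threshold.

```python
from typing import Dict, Optional, Tuple


def matches(a: str, b: str) -> int:
    """Number of identical positions in two equal-length, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b))


def best_parent(segment: str, refs: Dict[str, str], start: int) -> Tuple[str, int]:
    """Reference that best matches the query segment beginning at position start."""
    return max(
        ((name, matches(segment, seq[start:start + len(segment)])) for name, seq in refs.items()),
        key=lambda item: item[1],
    )


def chimera_check(query: str, refs: Dict[str, str],
                  min_gain: int = 3) -> Optional[Tuple[str, str, int]]:
    """Return (left_parent, right_parent, breakpoint) if two different references
    explain the query clearly better than any single reference, otherwise None."""
    best_single = max(matches(query, seq) for seq in refs.values())
    best, best_score = None, best_single
    for k in range(1, len(query)):
        left_name, left_score = best_parent(query[:k], refs, 0)
        right_name, right_score = best_parent(query[k:], refs, k)
        if left_name != right_name and left_score + right_score > best_score:
            best, best_score = (left_name, right_name, k), left_score + right_score
    if best is not None and best_score - best_single >= min_gain:
        return best
    return None


if __name__ == "__main__":
    refs = {"A": "AAAAAAAAAA", "B": "CCCCCCCCCC"}  # toy, hypothetical references
    print(chimera_check("AAAAACCCCC", refs))
    print(chimera_check("AAAAAAAAAT", refs))
```

If the chimera “AAAAACCCCC” were itself present in the reference set, it would match a single reference perfectly and no longer be flagged, which is exactly why a chimera-free reference dataset is needed.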

References:

  1. Nilsson RH, Abarenkov K, Veldre V, Nylinder S, Wit P de, Brosché S, Alfredsson JF, Ryberg M, Kristiansson E: An open source chimera checker for the fungal ITS region. Molecular Ecology Resources, 10, 1076–1081 (2010).
  2. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R: UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27, 16, 2194–2200 (2011). doi:10.1093/bioinformatics/btr381
  3. Nilsson RH, Tedersoo L, Ryberg M, Kristiansson E, Hartmann M, Unterseher M, Porter TM, Bengtsson-Palme J, Walker D, de Sousa F, Gamper HA, Larsson E, Larsson K-H, Kõljalg U, Edgar R, Abarenkov K: A comprehensive, automatically updated fungal ITS sequence dataset for reference-based chimera control in environmental sequencing efforts. Microbes and Environments, Advance Online Publication (2015). doi: 10.1264/jsme2.ME14121
  4. Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Molecular Ecology, 22, 21, 5271–5277 (2013). doi: 10.1111/mec.12481

After almost a year in different stages of review and revision, during which the paper (but not the software) was completely transformed, I am happy to announce that the paper describing Metaxa2 has been accepted in Molecular Ecology Resources and is available in a rudimentary online early form. The figures in this version are not that pretty, but those who want to read the paper as soon as possible can already do so.

This means that if you have used Metaxa2 for a publication, there is now a new preferred way of citing it, namely:

Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399

Apart from describing the new Metaxa version, the paper (1) also presents a very thorough evaluation of the software, comparing it to other tools for taxonomic classification implemented in QIIME (2). In short, we show that:

  • Metaxa2 can make trustworthy taxonomic classifications even with reads as short as 100 bp
  • Generally, the performance is reliable across the entire SSU rRNA gene, regardless of which V-region a read is derived from
  • Metaxa2 can reliably recapture species composition from short-read metagenomic data, comparable with results of amplicon sequencing
  • Metaxa2 outperforms other popular tools such as Mothur (3), the RDP Classifier (4), Rtax (5) and the QIIME implementation of Uclust (6) in terms of proportion of correctly classified reads from metagenomic data
  • The false positive rate of Metaxa2 is very close to zero, far lower than that of many of the above-mentioned tools, several of which assume that every read derives from an rRNA gene
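The two metrics in the list above can be made concrete with a small Python sketch for scoring classifications of simulated reads of known origin. The definitions here are simplified assumptions (a single taxonomic level, exact matching of taxon names) and not necessarily those used in the paper.

```python
from typing import Dict, Optional


def evaluate(truth: Dict[str, Optional[str]],
             predicted: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Score taxonomic classifications of simulated reads.

    truth:     read ID -> true taxon, or None if the read is not from an rRNA gene.
    predicted: read ID -> assigned taxon, or None if the tool made no assignment.
    """
    rrna_reads = [r for r, taxon in truth.items() if taxon is not None]
    other_reads = [r for r, taxon in truth.items() if taxon is None]
    correct = sum(1 for r in rrna_reads if predicted.get(r) == truth[r])
    false_positives = sum(1 for r in other_reads if predicted.get(r) is not None)
    return {
        "correctly_classified": correct / len(rrna_reads) if rrna_reads else 0.0,
        "false_positive_rate": false_positives / len(other_reads) if other_reads else 0.0,
    }


if __name__ == "__main__":
    truth = {"r1": "Escherichia", "r2": "Bacillus", "r3": None, "r4": None}
    predicted = {"r1": "Escherichia", "r2": "Escherichia", "r3": None, "r4": "Bacillus"}
    print(evaluate(truth, predicted))
```

A tool that assumes every read derives from an rRNA gene will assign a taxon to reads like r4, which is what drives up its false positive rate.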

Metaxa2 can be downloaded here. We have already used it internally for around two years, and it forms the basis of the taxonomic classifications in, e.g., our recently published paper on antibiotic resistance in a polluted Indian lake (7).

References

  1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
  3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
  4. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73, 5261–5267 (2007).
  5. Soergel DAW, Dey N, Knight R, Brenner SE: Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. The ISME Journal, 6, 1440–1444 (2012).
  6. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
  7. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014).

My colleague Henrik Nilsson has been interviewed by the ResearchGate news team about the recent effort to better annotate ITS data for plant pathogenic fungi. It’s an interesting read, and I think Henrik nicely underscores why large-scale efforts for improving and correcting sequence annotations are important. You can read the interview here, and the paper they talk about is referenced below.

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity, Volume 67, Issue 1 (2014), 11–19. doi: 10.1007/s13225-014-0291-8 [Paper link]

A minor bug in the “its1.full_and_partial.fasta” file has been fixed in a minor update to ITSx (1.0.11) released today. The bug occasionally caused the newline character at the end of a sequence to be skipped, so that the next entry began on the same row. It only manifested itself when ITSx was used with the --partial option, and only in the above-mentioned FASTA file. If you have been affected by the bug, you have most likely noticed, as most bioinformatics software would consider the resulting FASTA file corrupted. The updated version of ITSx can be downloaded here.
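If you are unsure whether an existing output file was affected, a quick check is to look for sequence lines containing a “>” character. This minimal Python sketch (not part of ITSx) flags such lines and shows how they could be split apart; re-running the updated ITSx is of course the cleaner fix.

```python
from typing import Iterable, List


def find_glued_headers(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of sequence lines that contain a '>' (a header glued onto them)."""
    return [n for n, line in enumerate(lines, start=1)
            if not line.startswith(">") and ">" in line]


def repair_glued_headers(lines: Iterable[str]) -> List[str]:
    """Split 'SEQUENCE>header' lines back into a sequence line and a header line."""
    fixed: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">") and ">" in line:
            sequence, _, header = line.partition(">")
            fixed.extend([sequence, ">" + header])
        else:
            fixed.append(line)
    return fixed


if __name__ == "__main__":
    broken = [">seq1\n", "ACGTACGT>seq2\n", "TTGACCA\n"]  # hypothetical records
    print(find_glued_headers(broken))
    print("\n".join(repair_glued_headers(broken)))
```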