Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

One of the highlights of the Swedish Bioinformatics Workshop 2014 was of course the dinner entertainment, a song specially crafted for the event. It has now, fortunately, been put online. For anyone who might not catch all the words, here are the complete lyrics for the song (which is based on the song “Java Jive” in the Manhattan Transfer arrangement):

The Bioinformatics ABC

Grab your coffee
Grab your tea
Put down your spoon now and listen to me
For the bioinformatics ABC
Wake up, wake up, wake up, wake up, wake up
(Boy)

A for ABYSS
B for BLAST
And C for Clustal, though it’s not that fast
Alternatives are Muscle and MAFFT
ABYSS and BLAST and Clustal, Muscle, MAFFT
(Yeah)

D count reads with DESeq or E for EdgeR
And F for FastQC and G for GLIMMER
H for HMMER using Markov Chains
(Explain)
Hidden hidden Markov model

I for Inchworm;
Jellyfish
Add Chrysalis and Butterfly and wish
Assemble fast with a sound that goes swish
Contig, contig, contig, contig, transcript
(Girl)

KBASE
Lasergene
And the ton of tools for metagenomics
MEGAN, Megraft,
MetaPhlan, MG-RAST, Meta-GeneMark
And that’s just mentioning a few of them
(Talk it boy)

N for Newbler, old-school it is
If you’re still using 454 it’s a bliss
O for Oases, P for PyroNoise
45-45-45-454-454

Q is for Quake for that great quality
And R is for all those neat statistics
S for the Spades assembler, oh yeah

T for TopHat
U for Uclust
V for Velvet

There is Wham to align
XMatchView to review
And YASS to pursue
(But do you)
Know any tools beginning with Z?
What?
Yeah, Zorro, Zorro, Zorro
Oh yeah

With the publication of my latest paper last week (1), I would also like to highlight some of the software underpinning the findings. Extremely common resistance genes can be present in multiple contexts and variants, which causes assemblers such as Velvet (2) to perform sub-optimally. To get around this problem, we have written a software tool that utilizes Vmatch (3) and Trinity (4) to iteratively construct contigs from reads associated with resistance genes. This could of course be used in many other situations as well, whenever you want to specifically assemble a certain portion of a metagenome but suspect that the portion might be found in multiple contexts.

TriMetAss is a Perl program that employs Vmatch and Trinity to construct multi-context contigs. It uses extracted reads associated with, e.g., resistance genes as seeds for a Vmatch search against the complete set of read pairs, extracting reads that match any of the seed reads over at least 49 bp (by default). These reads are then assembled using Trinity. The resulting contigs are used as seeds for another Vmatch search against the complete set of reads, as above. All matches (including the previously matching read pairs) are then again used for a Trinity assembly. This iterative process is repeated until a stop criterion is met, e.g. when the total number of assembled nucleotides starts to drop rather than increase. The software can be downloaded here.
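To make the iteration a bit more concrete, here is a minimal Python sketch of the loop described above. It is not the TriMetAss code (which is written in Perl), and it does not call Vmatch or Trinity. The function names `recruit_reads` and `iterative_assembly`, the `max_iterations` cap and the exact-substring matching are all assumptions made for illustration; the assembler is passed in as a plain function so that any stand-in can be used.

```python
from typing import Callable, Iterable, List, Set


def recruit_reads(seeds: Iterable[str], reads: List[str], min_match: int) -> Set[int]:
    """Toy stand-in for the Vmatch step (hypothetical helper).

    Returns the indices of all reads that share an exact substring of at
    least min_match bases with any of the seed sequences.
    """
    seed_kmers = set()
    for seed in seeds:
        for i in range(len(seed) - min_match + 1):
            seed_kmers.add(seed[i:i + min_match])

    hits = set()
    for idx, read in enumerate(reads):
        for i in range(len(read) - min_match + 1):
            if read[i:i + min_match] in seed_kmers:
                hits.add(idx)
                break
    return hits


def iterative_assembly(
    seed_reads: Iterable[str],
    all_reads: List[str],
    assemble: Callable[[List[str]], List[str]],
    min_match: int = 49,
    max_iterations: int = 10,  # assumed safety cap, not from the post
) -> List[str]:
    """Seed, recruit, assemble, and re-seed with the resulting contigs.

    Stops when the total number of assembled nucleotides drops compared
    to the previous round, and returns the contigs from the best round.
    """
    seeds = list(seed_reads)
    recruited: Set[int] = set()
    best_contigs: List[str] = []
    best_total = 0

    for _ in range(max_iterations):
        # Keep previously matching reads and add the new matches.
        recruited |= recruit_reads(seeds, all_reads, min_match)
        contigs = assemble([all_reads[i] for i in sorted(recruited)])
        total = sum(len(contig) for contig in contigs)

        if total < best_total:
            break  # assembled nucleotides started to drop

        best_contigs, best_total = contigs, total
        seeds = contigs  # contigs become the seeds for the next search

    return best_contigs
```

With a trivial "assembler" that just returns the recruited reads unchanged, calling `iterative_assembly(["AAAACC"], reads, assemble=list, min_match=4)` on a handful of overlapping toy reads shows how the recruited set grows by one overlap per round. In a real setting the `assemble` argument would wrap an actual assembler run, and the matching would be done by a proper sequence search tool rather than exact substrings.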

References:

  1. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014). doi: 10.3389/fmicb.2014.00648
  2. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18, 821–829 (2008). doi:10.1101/gr.074492.107
  3. Kurtz S: The Vmatch large scale sequence analysis software (2010). http://vmatch.de/
  4. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652 (2011). doi:10.1038/nbt.1883

The first work in which I have employed metagenomics to investigate antibiotic resistance has been accepted in Frontiers in Microbiology, and is (at the time of writing) available as a provisional PDF. In the paper (1), which is co-authored by Fredrik Boulund, Jerker Fick, Erik Kristiansson and Joakim Larsson, we have used shotgun metagenomic sequencing of an Indian lake polluted by dumping of waste from pharmaceutical production. We used this data to describe the diversity of antibiotic resistance genes and the genetic context of those, to try to predict their genetic transferability. We found resistance genes against essentially every major class of antibiotics, as well as large abundances of genes responsible for mobilization of genetic material. Resistance genes were estimated to be 7000 times more abundant in the polluted lake than in a Swedish lake included for comparison, where only eight resistance genes were found. The abundances of resistance genes have previously only been matched by river sediment subject to pollution from pharmaceutical production (2). In addition, we describe twenty-six known and twenty-one putative novel plasmids from the Indian lake metagenome, indicating that there is a large potential for horizontal gene transfer through conjugation. Based on the wide range and high abundance of known resistance factors detected, we believe that it is plausible that novel resistance genes are also present in the lake. We conclude that environments polluted with waste from antibiotic manufacturing could be important reservoirs for mobile antibiotic resistance genes. This work further highlights previous findings that pharmaceutical production settings could provide sufficient selection pressure from antibiotics (3) to drive the development of multi-resistant bacteria (4,5), resistance which may ultimately end up in pathogenic species (6,7). The paper can be read in its entirety here.

References:

  1. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, Volume 5, Issue 648 (2014). doi: 10.3389/fmicb.2014.00648
  2. Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C, Weijdegård B, Söderström H, Larsson DGJ: Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements. PLoS ONE, Volume 6, e17038 (2011). doi:10.1371/journal.pone.0017038.
  3. Larsson DGJ, de Pedro C, Paxeus N: Effluent from drug manufactures contains extremely high levels of pharmaceuticals. J Hazard Mater, Volume 148, 751–755 (2007). doi:10.1016/j.jhazmat.2007.07.008
  4. Marathe NP, Regina VR, Walujkar SA, Charan SS, Moore ERB, Larsson DGJ, Shouche YS: A Treatment Plant Receiving Waste Water from Multiple Bulk Drug Manufacturers Is a Reservoir for Highly Multi-Drug Resistant Integron-Bearing Bacteria. PLoS ONE, Volume 8, e77310 (2013). doi:10.1371/journal.pone.0077310
  5. Johnning A, Moore ERB, Svensson-Stadler L, Shouche YS, Larsson DGJ, Kristiansson E: Acquired genetic mechanisms of a multiresistant bacterium isolated from a treatment plant receiving wastewater from antibiotic production. Appl Environ Microbiol, Volume 79, 7256–7263 (2013). doi:10.1128/AEM.02141-13
  6. Pruden A, Larsson DGJ, Amézquita A, Collignon P, Brandt KK, Graham DW, Lazorchak JM, Suzuki S, Silley P, Snape JR., et al.: Management options for reducing the release of antibiotics and antibiotic resistance genes to the environment. Environ Health Perspect, Volume 121, 878–885 (2013). doi:10.1289/ehp.1206446
  7. Finley RL, Collignon P, Larsson DGJ, McEwen SA, Li X-Z, Gaze WH, Reid-Smith R, Timinouni M, Graham DW, Topp E: The scourge of antibiotic resistance: the important role of the environment. Clin Infect Dis, Volume 57, 704–710 (2013). doi:10.1093/cid/cit355

Metaxa2 update

An update to Metaxa2 that has long remained in internal testing has been deemed bug-free (as far as we can tell) and has been uploaded to the Metaxa2 web site. The update brings a slightly improved classifier, and is the first release that we declare fully stable, although we have found no problems with the previously available version (release candidate 3). This also means that we jump directly from version 2.0, release candidate 3 to version 2.0.1 without passing a final 2.0 release. The update is available here.

I don’t have much time to attend to the web site these days, and there are probably other things I should/could do right now, but it’s Saturday night and my baby is sleeping so… I found this nice little story covering our little family in the latest Mistra newsletter (out a couple of weeks ago). It is a kind of cute take on the “synthesis” of two Mistra-funded programs. I guess our daughter will grow up with the pains of having two research parents in different fields…

Story in English
Story in Swedish

After a long delay in testing, ITSx version 1.0.10 has been made public. The new version patches a bug that caused the 3′ anchor not to be properly written to file when using the “--anchor hmm” option. If a number was used for the “--anchor” option, this bug did not apply. Thus, if you have not been using the “--anchor” option together with “hmm”, you have not been affected in any way by this bug. Nevertheless, I encourage updating in case you would use the “--anchor hmm” option in the future. The update can be downloaded here. Happy barcoding!

I would like to sincerely apologize for having been terrible at responding to support issues pertaining to ITSx, Metaxa, Atosh etc. lately. I am currently on 50% parental leave and at the same time I am wrapping up three first-author papers, organizing a workshop and preparing a talk. Thus, support issues have been lagging a bit behind in the last few weeks so that I could cope with everything else. I have been ticking off most (all?) of my support questions the last couple of days, but if there are any remaining issues that I have failed to reply to, please re-send them to me!

I will try to improve response times, but it is hard when I am working less than usual (also, note that I (strangely) don’t get paid for supporting software, so I have to do this in my “spare time”). My aim is to respond within a few days, so if I have not done so, please resend your e-mail with a friendly reminder that you are waiting for my response. Reminding me will very likely move your question up the priority pile.

So, my advice to dads-to-be is: Do take paternity leave. Do take a lot of it. Share responsibilities with your partner. Because what you get back is awesome. (And also you get a good reason not to answer support questions in time.) But finally, don’t plan to wrap up the last couple of years’ worth of work and arrange a conference at the same time as you take out paternity leave. That will only make you feel inadequate on all fronts.

Keep the spirit high!

Another paper I have made a contribution to has just recently been published in Molecular Ecology Resources. The paper (1), which is lead-authored by Xin-Cun Wang and Chang Liu at the Institute of Medicinal Plant Development in Beijing, investigates the usability of the ITS1 and ITS2 as separate barcodes across the Eukaryotes. The study is a large-scale meta-analysis comparing available high-quality sequence data in as many taxonomic groups as possible from three different aspects: PCR amplification, DNA sequencing efficiency and species discrimination ability. Specifically, we have looked for the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality, using bioinformatic approaches. We found that the ITS1 had significantly higher efficiencies than the ITS2 in 17 of 47 families and 20 of 49 investigated genera, which was markedly better than the performance of ITS2. We conclude that, in general, ITS1 represents a better DNA barcode than ITS2 for a majority of eukaryotic taxonomic groups. This of course doesn’t mean that using the ITS2 or the ITS region in its entirety should be dismissed, but our results can serve as a basis for making informed decisions about which region to choose for your amplicon sequencing project. The results complement what has previously been observed for e.g. fungi, where the difference between ITS1 and ITS2 was much less pronounced (2).

References:

  1. Wang X-C, Liu C, Huang L, Bengtsson-Palme J, Chen H, Zhang J-H, Cai D, Li J-Q: ITS1: A DNA barcode better than ITS2 in eukaryotes? Molecular Ecology Resources. Early view. doi: 10.1111/1755-0998.12325 [Paper link]
  2. Blaalid R, Kumar S, Nilsson RH, Abarenkov K, Kirk PM, Kauserud H: ITS1 versus ITS2 as DNA metabarcodes for fungi. Molecular Ecology Resources. Volume 13, Issue 2, Pages 218–224. doi: 10.1111/1755-0998.12065 [Paper link]

I would like to draw your attention to the fact that the abstract deadline for the Swedish Bioinformatics Workshop held in Gothenburg in October has been extended to September 15. So hurry up and contribute your latest research; we look forward to learning what you’re doing!

I just got word from BMC Genomics that my most recent paper has just been published (in provisional form; we still have not seen the edited proofs). In this paper (1), which I have co-authored with Anders Blomberg, Magnus Alm Rosenblad and Mikael Molin, we utilize metagenomic data from the GOS-expedition (2) together with fully sequenced bacterial genomes to show that:

  1. Detoxification genes in general are underrepresented in marine planktonic bacteria
  2. Surprisingly, the detoxification genes that show a differential distribution are more abundant in open ocean water than closer to the coast
  3. Peroxidases and peroxiredoxins seem to be the main line of defense against oxidative stress for bacteria in the marine milieu, rather than e.g. catalases
  4. The abundance of detoxification genes does not seem to increase with estimated pollution.

From this we conclude that other selective pressures than pollution likely play the largest role in shaping marine planktonic bacterial communities, such as for example nutrient limitations. This suggests substantial streamlining of gene copy number and genome sizes, in line with observations made in previous studies (3). Along the same lines, our findings indicate that the majority of marine bacteria would have a low capacity to adapt to increased pollution, which is relevant as large amounts of human pollutants and waste end up in the oceans every year. The study exemplifies the use of metagenomics data in ecotoxicology, and how we can examine anthropogenic consequences on life in the sea using approaches derived from genomics. You can read the paper in its entirety here.

References:

  1. Bengtsson-Palme J, Alm Rosenblad M, Molin M, Blomberg A: Metagenomics reveals that detoxification systems are underrepresented in marine bacterial communities. BMC Genomics. Volume 15, Issue 749 (2014). doi: 10.1186/1471-2164-15-749 [Paper link]
  2. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, Van Belle C, Chandonia J-M, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biology. 5:e16 (2007).
  3. Yooseph S, Nealson KH, Rusch DB, McCrow JP, Dupont CL, Kim M, Johnson J, Montgomery R, Ferriera S, Beeson KY, Williamson SJ, Tovchigrechko A, Allen AE, Zeigler LA, Sutton G, Eisenstadt E, Rogers Y-H, Friedman R, Frazier M, Venter JC: Genomic and functional adaptation in surface ocean planktonic prokaryotes. Nature. 468:60–66 (2010).