Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

Another paper I have contributed to has just been published in Molecular Ecology Resources. The paper (1), which is lead-authored by Xin-Cun Wang and Chang Liu at the Institute of Medicinal Plant Development in Beijing, investigates the usefulness of ITS1 and ITS2 as separate barcodes across the eukaryotes. The study is a large-scale meta-analysis that compares available high-quality sequence data in as many taxonomic groups as possible from three different aspects: PCR amplification, DNA sequencing efficiency and species discrimination ability. Specifically, we used bioinformatic approaches to look for the presence of DNA barcoding gaps and to assess species discrimination efficiency, sequence length distribution, GC content distribution and primer universality. We found that ITS1 had significantly higher efficiencies than ITS2 in 17 of 47 families and 20 of 49 investigated genera, a markedly better overall performance than that of ITS2. We conclude that, in general, ITS1 represents a better DNA barcode than ITS2 for a majority of eukaryotic taxonomic groups. This of course doesn’t mean that using ITS2, or the ITS region in its entirety, should be dismissed, but our results can serve as a basis for making informed decisions about which region to choose for your amplicon sequencing project. The results complement what has previously been observed for, e.g., fungi, where the difference between ITS1 and ITS2 was much less pronounced (2).
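To make two of the criteria above a bit more concrete, here is a minimal Python sketch of a barcoding-gap check and a GC content calculation. This is not the pipeline used in the paper; the function names, the uncorrected p-distance, and the rule of skipping alignment columns with gaps are all assumptions for illustration. A barcoding gap is taken to be present when the largest within-species distance is smaller than the smallest between-species distance, and the input sequences are assumed to be already aligned.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length.

    Assumption for this sketch: columns where either sequence has a gap ('-')
    are skipped rather than counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def barcoding_gap(seqs_by_species: dict[str, list[str]]) -> tuple[float, float, bool]:
    """Return (max intraspecific distance, min interspecific distance, gap present)."""
    # Flatten to (species, sequence) pairs so every sequence pair is compared once.
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra: list[float] = []
    inter: list[float] = []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        d = p_distance(s1, s2)
        (intra if sp1 == sp2 else inter).append(d)
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    return max_intra, min_inter, max_intra < min_inter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); other characters are ignored."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    return sum(c in "GC" for c in bases) / len(bases) if bases else 0.0


if __name__ == "__main__":
    # Hypothetical toy alignment with two species, two sequences each.
    toy = {
        "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
        "species_B": ["TCGTTCGAAC", "TCGTTCGAAA"],
    }
    print(barcoding_gap(toy))  # (0.1, 0.3, True)
    print(gc_content("ACGT-N"))  # 0.5
```

In a real comparison one would typically use a model-corrected distance and far more sequences per species; the point here is only the logic of comparing the intraspecific maximum against the interspecific minimum.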

References:

  1. Wang X-C, Liu C, Huang L, Bengtsson-Palme J, Chen H, Zhang J-H, Cai D, Li J-Q: ITS1: A DNA barcode better than ITS2 in eukaryotes? Molecular Ecology Resources. Early view. doi: 10.1111/1755-0998.12325 [Paper link]
  2. Blaalid R, Kumar S, Nilsson RH, Abarenkov K, Kirk PM, Kauserud H: ITS1 versus ITS2 as DNA metabarcodes for fungi. Molecular Ecology Resources. Volume 13, Issue 2, Pages 218–224. doi: 10.1111/1755-0998.12065 [Paper link]

I would like to draw your attention to the fact that the abstract deadline for the Swedish Bioinformatics Workshop, held in Gothenburg in October, has been extended to September 15. So hurry up and submit your latest research; we look forward to getting to know what you’re doing!

I just got word from BMC Genomics that my most recent paper has been published (in provisional form; we have not yet seen the edited proofs). In this paper (1), which I co-authored with Anders Blomberg, Magnus Alm Rosenblad and Mikael Molin, we use metagenomic data from the GOS expedition (2) together with fully sequenced bacterial genomes to show that:

  1. Detoxification genes in general are underrepresented in marine planktonic bacteria
  2. Surprisingly, the detoxification genes that do show a differential distribution are more abundant in open-ocean water than closer to the coast
  3. Peroxidases and peroxiredoxins seem to be the main line of defense against oxidative stress for bacteria in the marine milieu, rather than, e.g., catalases
  4. The abundance of detoxification genes does not seem to increase with estimated pollution

From this we conclude that selective pressures other than pollution, such as nutrient limitation, likely play the largest role in shaping marine planktonic bacterial communities. This suggests substantial streamlining of gene copy numbers and genome sizes, in line with observations made in previous studies (3). Along the same lines, our findings indicate that the majority of marine bacteria would have a low capacity to adapt to increased pollution, which is relevant because large amounts of human pollutants and waste end up in the oceans every year. The study exemplifies the use of metagenomic data in ecotoxicology, and how we can examine the consequences of human activity on life in the sea using approaches derived from genomics. You can read the paper in its entirety here.

References:

  1. Bengtsson-Palme J, Alm Rosenblad M, Molin M, Blomberg A: Metagenomics reveals that detoxification systems are underrepresented in marine bacterial communities. BMC Genomics. Volume 15, Article 749 (2014). doi: 10.1186/1471-2164-15-749 [Paper link]
  2. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, Van Belle C, Chandonia J-M, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biology. 5:e16 (2007).
  3. Yooseph S, Nealson KH, Rusch DB, McCrow JP, Dupont CL, Kim M, Johnson J, Montgomery R, Ferriera S, Beeson KY, Williamson SJ, Tovchigrechko A, Allen AE, Zeigler LA, Sutton G, Eisenstadt E, Rogers Y-H, Friedman R, Frazier M, Venter JC: Genomic and functional adaptation in surface ocean planktonic prokaryotes. Nature. 468:60–66 (2010).

Finally, I am at the airport waiting for the plane to Beijing and, tomorrow, the one to Seoul. I will spend next week at the ISME15 conference, which will be awesome. I hope to meet as many of you there as possible!

(On a side note, if I don’t answer emails, it might be because the conference wifi is overcrowded. They warned us about this potential shortcoming of the conference center…)

A while ago, one of the other ITSx developers and I discussed that the --anchor option should output the “anchor sequences” flanking the ITS regions for the full-length output file as well (given that the --truncate option is activated). Today I changed ITSx to behave this way, updating it to version 1.0.9. The update also slightly improves sensitivity when using the --anchor HMM option, and can be downloaded here. Happy barcoding!

I am part of the organizing committee for the Swedish Bioinformatics Workshop (#SBW2014) that will be held October 23-24 this year in Gothenburg. I would like to invite you all, especially master/PhD students and PostDocs in Sweden, to come and share the event with us!

SBW is an annual event that has been organized by different universities in Sweden. This year it will take place at the Wallenberg Conference Centre in Gothenburg and is arranged jointly by the University of Gothenburg and Chalmers University of Technology. As tradition dictates, SBW2014 will be a meeting point for PhD students and postdocs working with any kind of bioinformatics within Sweden, and is therefore free of charge for these groups. We are proud to announce a program including both invited speakers (such as Mick Watson from the Roslin Institute, Dawn Field from the University of Oxford, and Joakim Lundeberg from KTH) and participant presentations and poster sessions. This year, the program will also contain a number of workshop sessions where hands-on problems will be used as starting points for discussions on new bioinformatics approaches to these problems. This will give attendees with different methodological backgrounds opportunities to interact, work together to find synergies between fields, and come up with creative solutions.

More information about the event including registration and abstract submission can be found at www.sbw2014.se.

I, and the rest of the organizers, look forward to meeting you in Gothenburg in October!

Webpage: http://www.sbw2014.se

Facebook: https://www.facebook.com/events/1450513325188910/

Google+: https://plus.google.com/events/cuhlpovcc275stut854dk5ussnk

If you want, you can spread the word, for example using this flyer!

In an interesting development, Nature Publishing Group has launched a new initiative: Scientific Data, an online-only open-access journal that publishes data sets without requiring that scientific hypotheses be tested in connection with the data. That is, the data itself is seen as the valuable product, not any findings that might result from it. There is an immediate upside to this: large scientific data sets can be made accessible to the research community in a way that gives proper credit for the sample collection effort. Since no full analysis of the data is required, the data might become useful to others more quickly, without the producers having to worry that someone else will steal the thunder of the data per se. I also see a possible downside, though. It would be easy to hold on to the data until you have analyzed it yourself, and then release it separately at about the time you submit the paper on the analysis, generating extra papers and citations. I don’t know if this is necessarily bad, but it could contribute to “publishing unit dilution”. Nevertheless, I believe that this is overall a good initiative, although how well it actually works will be up to us, the scientific community. Some info copied from the journal website:

Scientific Data’s main article-type is the Data Descriptor: peer-reviewed, scientific publications that provide an in-depth look at research datasets. Data Descriptors are a combination of traditional scientific publication content and structured information curated in-house, and are designed to maximize reuse and enable searching, linking and data mining. (…) Scientific Data aims to address the increasing need to make research data more available, citable, discoverable, interpretable, reusable and reproducible. We understand that wider data-sharing requires credit mechanisms that reward scientists for releasing their data, and peer evaluation mechanisms that account for data quality and ensure alignment with community standards.

ITSx has today been updated to version 1.0.8. This update adds the “--only_full” option, which restricts the output in the ITS1, 5.8S and ITS2 files to only those sequences that contain the full region, i.e. where both surrounding domains have been detected. The update also fixes a bug with the --anchor option, and can be downloaded here. Happy barcoding!

Another paper I have co-authored related to the UNITE database for fungal rDNA ITS sequences has now been published as an Online Early article in Fungal Diversity. The paper describes an effort to improve the annotation of ITS sequences from fungal plant pathogens. Why is this important? Well, apart from fungal plant pathogens being responsible for great economic losses in agriculture, the paper is also conceptually important as it shows that together we can substantially improve the metadata in sequence databases. In this work we have hunted down high-quality reference sequences for various plant pathogenic fungi, and re-annotated incorrectly or insufficiently annotated ITS sequences from the same fungal lineages. In total, the 59 authors made 31,954 changes to UNITE database data, on average about 540 changes per author. While one person, or a few, could not feasibly have made this effort alone, this work shows that large consortia can vastly improve the quality of databases by distributing the work among many scientists. In many ways, this relates to proposals to “wikify” GenBank, and after Rfam and Pfam it might now be time to extend the user-contribution model to, at least, the RefSeq portion of GenBank, which despite its description as being “comprehensive, integrated, non-redundant, [and] well-annotated” still contains errors and examples of unusable annotation. More on that at a later point…

Paper reference:

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity. Online early (2014). doi: 10.1007/s13225-014-0291-8 [Paper link]

Because of my previous involvement in a Swedish report on toxicological monitoring using (meta)genomics tools [1], I also became involved in a related EU report on effect-based tools for use in toxicology in the aquatic environment. This report has recently been officially published [2] and can be found here, with the annex available on the European Commission document website. My contribution to this report has been in the genomics and metagenomics section (Chapter 7: OMICS techniques), in which I wrote the metagenomics part and contributed to the rest. I personally think this is quite a forward-thinking report, which is nice to see from a large institution such as the EU.

  1. Länsstyrelsen i Västra Götalands län. (2012). Swedish monitoring of hazardous substances in the aquatic environment (No. 2012:23). (A.-S. Wernersson, Ed.) Current vs required monitoring and potential developments (pp. 1–291). Länsstyrelsen i Västra Götalands län, vattenvårdsenheten.
  2. Wernersson A-S, Carere M, Maggi C, Tusil P, Soldan P, James A, Sanchez W, Broeg K, Kammann U, Reifferscheid G, Buchinger S, Maas H, Van Der Grinten E, Ausili A, Manfra L, Marziali L, Polesello S, Lacchetti I, Mancini L, Lilja K, Linderoth M, Lundeberg T, Fjällborg B, Porsbring T, Larsson DGJ, Bengtsson-Palme J, Förlin L, Kase R, Kienle C, Kunz P, Vermeirssen E, Werner I, Robinson CD, Lyons B, Katsiadaki I, Whalley C, den Haan K, Messiaen M, Clayton H, Lettieri T, Negrão Carvalho R, Gawlik BM, Dulio V, Hollert H, Di Paolo C, Brack W (2014). Technical Report on Aquatic Effect-Based Monitoring Tools. European Commission. Technical Report 2014-077, Office for Official Publications of European Communities, ISBN: 978-92-79-35787-9. doi:10.2779/7260