Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg


MycoKeys today put a paper online which I was involved in. The paper describes the results of a workshop in May, when we added and refined annotations for fungal ITS sequences according to the MIxS-Built Environment annotation standard (1). Fungi have been associated with a range of unwanted effects in the built environment, including asthma, decay of building materials, and food spoilage. However, the metadata annotation of fungal DNA sequences from the built environment is very incomplete in public databases. The workshop aimed to ease a small part of this problem by distributing the re-annotation of public fungal ITS sequences across 36 persons. In total, we added or changed 45,488 data points drawing from published literature, including 8,430 instances of countries of collection, 5,801 instances of building types, and 3,876 instances of surface-air contaminants. The results have been implemented in the UNITE database and shared with other online resources. I believe that distributed initiatives like this (and the ones I have been involved in previously (2,3)) serve a very important purpose for establishing better annotation of sequence data, an issue I have also brought up for sequences outside of barcoding genes (4). The full paper can be found here.

References

  1. Abarenkov K, Adams RI, Laszlo I, Agan A, Ambrioso E, Antonelli A, Bahram M, Bengtsson-Palme J, Bok G, Cangren P, Coimbra V, Coleine C, Gustafsson C, He J, Hofmann T, Kristiansson E, Larsson E, Larsson T, Liu Y, Martinsson S, Meyer W, Panova M, Pombubpa N, Ritter C, Ryberg M, Svantesson S, Scharn R, Svensson O, Töpel M, Unterseher M, Visagie C, Wurzbacher C, Taylor AFS, Kõljalg U, Schriml L, Nilsson RH: Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from a May 23-24, 2016 workshop (Gothenburg, Sweden). MycoKeys, 16, 1–15 (2016). doi: 10.3897/mycokeys.16.10000
  2. Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Molecular Ecology, 22, 21, 5271–5277 (2013). doi: 10.1111/mec.12481
  3. Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity, 67, 1, 11–19 (2014). doi: 10.1007/s13225-014-0291-8
  4. Bengtsson-Palme J, Boulund F, Edström R, Feizi A, Johnning A, Jonsson VA, Karlsson FH, Pal C, Pereira MB, Rehammar A, Sánchez J, Sanli K, Thorell K: Strategies to improve usability and preserve accuracy in biological sequence databases. Proteomics, Early view (2016). doi: 10.1002/pmic.201600034

My colleague Henrik Nilsson has been interviewed by the ResearchGate news team about the recent effort to better annotate ITS data for plant pathogenic fungi. It’s an interesting read, and I think Henrik nicely underscores why large-scale efforts for improving and correcting sequence annotations are important. You can read the interview here, and the paper they talk about is referenced below.

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity, Volume 67, Issue 1 (2014), 11–19. doi: 10.1007/s13225-014-0291-8 [Paper link]

Another paper I have co-authored related to the UNITE database for fungal rDNA ITS sequences is now published as an Online Early article in Fungal Diversity. The paper describes an effort to improve the annotation of ITS sequences from fungal plant pathogens. Why is this important? Well, apart from fungal plant pathogens being responsible for great economic losses in agriculture, the paper is also conceptually important as it shows that together we can accomplish a substantial improvement to the metadata in sequence databases. In this work we have hunted down high-quality reference sequences for various plant pathogenic fungi, and re-annotated incorrectly or insufficiently annotated ITS sequences from the same fungal lineages. In total, the 59 authors have made 31,954 changes to UNITE database data, roughly 540 changes per author on average. While one person, or even a few, could not feasibly have made this effort alone, this work shows that larger consortia can vastly improve the quality of databases by distributing the work among many scientists. In many ways, this relates to proposals to “wikify” GenBank, and after Rfam and Pfam it might now be time to take the user-contribution model to, at least, the RefSeq portion of GenBank, which despite its description as being “comprehensive, integrated, non-redundant, [and] well-annotated” still contains errors and examples of non-usable annotation. More on that at a later point…

Paper reference:

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity Online early (2014). doi: 10.1007/s13225-014-0291-8 [Paper link]

Our paper on the most recent developments of the UNITE database for fungal rDNA ITS sequences has just been published as an Early view article in Molecular Ecology. In this paper, we aim to ease two of the major problems facing the identification of newly generated fungal ITS sequences: the lack of a sufficiently good reference dataset, and the lack of a way to refer to fungal species without a Latin name. As part of a solution, we have introduced the term species hypothesis for all fungal species represented by at least two ITS sequences. The UNITE database has an easy-to-use web-based sequence management system, and we encourage everybody who can improve on the annotations or metadata of a fungal lineage to do so.

My main contribution to this paper has been to tailor ITSx functionality for the UNITE database, so that ITS data could be more easily processed for the Species Hypotheses.

Paper reference:
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Accepted in Molecular Ecology. doi: 10.1111/mec.12481 [Paper link]

For a couple of years, I have been working with microbial ecology and diversity, and how such features can be assessed using molecular barcodes, such as the SSU (16S/18S) rRNA sequence (the Metaxa and Megraft packages). However, I have also been aiming at the ITS region, and how that can be used in barcoding (see e.g. the guidelines we published last year). It is therefore a great pleasure to introduce my next gem for community analysis: a software tool for detection and extraction of the ITS1 and ITS2 regions of ITS sequences from environmental communities. The tool is dubbed ITSx, and supersedes the more specific fungal ITS extractor written by Henrik Nilsson and colleagues. Henrik is once more the mastermind behind this completely rewritten version, in which I have done the lion’s share of the programming. Among the new features in ITSx are:

  • Robust support for the Cantharellus, Craterellus, and Tulasnella genera of fungi
  • Support for nineteen additional eukaryotic groups on top of the already present support for fungi (specifically these groups: Tracheophyta (vascular plants), Bryophyta (bryophytes), Marchantiophyta (liverworts), Chlorophyta (green algae), Rhodophyta (red algae), Phaeophyceae (brown algae), Metazoa (metazoans), Oomycota (oomycetes), Alveolata (alveolates), Amoebozoa (amoebozoans), Euglenozoa, Rhizaria, Bacillariophyta (diatoms), Eustigmatophyceae (eustigmatophytes), Raphidophyceae (raphidophytes), Synurophyceae (synurids), Haptophyceae (haptophytes), Apusozoa, and Parabasalia (parabasalids))
  • Multi-processor support
  • Extensive output options
  • Virtually zero false-positive extractions
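
For those who want to call ITSx from a script, here is a minimal sketch of how the multi-processor support and profile selection might be driven from Python. It rests on assumptions: the flags `-i`, `-o`, `--cpu` and `-t` are taken from the ITSx documentation as I know it and may differ in the release you have installed, and the helper names, file names and output prefix are made up for illustration. Check `ITSx --help` before relying on any of it.

```python
import shlex
import shutil
import subprocess


def build_itsx_command(fasta_path, output_prefix, cpus=1, profiles="all"):
    """Assemble an ITSx command line as a list of arguments.

    Assumed flags (verify against your ITSx version):
      -i     input FASTA file
      -o     base name for the output files
      --cpu  number of CPU threads to use
      -t     profile set(s) to search with, "all" for every group
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return [
        "ITSx",
        "-i", str(fasta_path),
        "-o", str(output_prefix),
        "--cpu", str(cpus),
        "-t", profiles,
    ]


def run_itsx(fasta_path, output_prefix, cpus=1, profiles="all"):
    """Run ITSx if it is on PATH; raise a clear error otherwise."""
    command = build_itsx_command(fasta_path, output_prefix, cpus, profiles)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("ITSx executable not found on PATH")
    # check=True raises CalledProcessError if ITSx exits with an error code
    subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run ITSx
    print(shlex.join(build_itsx_command("sequences.fasta", "itsx_out", cpus=4)))
```

Keeping the command construction separate from the actual run makes it easy to inspect what would be executed before starting a long job on a large dataset.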

Today, ITSx moves from a private pre-release state to a public beta state. No code changes have been made since February, which indicates that the last pre-release candidate is now ready to fly on its own. As far as our testing has revealed, this version seems to be bug-free. In reality though, researchers tend to find the most unexpected usage scenarios. So please, if you find any unexpected behavior in this version of ITSx, send me an e-mail and make us aware of the potential shortcomings of our software.

We expect this open-source software to boost research in microbial ecology based on barcoding of the ITS region, and hope that the research community will evaluate its performance also among the eukaryote groups that we have less experience with.