Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

I am part of the organizing committee for the Swedish Bioinformatics Workshop (#SBW2014) that will be held October 23-24 this year in Gothenburg. I would like to invite you all, especially master/PhD students and PostDocs in Sweden, to come and share the event with us!

SBW is an annual event organized by different universities in Sweden. This year it will take place at the Wallenberg Conference Centre in Gothenburg and is arranged jointly by the University of Gothenburg and Chalmers University of Technology. As tradition dictates, SBW2014 will be a meeting point for PhD students and postdocs working with any kind of bioinformatics in Sweden, and is therefore free of charge for these groups. We are proud to announce a program that includes invited speakers (such as Mick Watson from the Roslin Institute, Dawn Field from the University of Oxford, and Joakim Lundeberg from KTH) along with participant presentations and poster sessions. This year, the program will also contain a number of workshop sessions where hands-on problems will be used as starting points for discussions on new bioinformatics approaches to these problems. This will give attendees with different methodological backgrounds the opportunity to interact, find synergies between fields and come up with creative solutions together.

More information about the event including registration and abstract submission can be found at www.sbw2014.se.

I, and the rest of the organizers, look forward to meeting you in Gothenburg in October!

Webpage: http://www.sbw2014.se

Facebook: https://www.facebook.com/events/1450513325188910/

Google+: https://plus.google.com/events/cuhlpovcc275stut854dk5ussnk

If you want, you can spread the word, for example using this flyer!

ITSx has been updated today, bringing it to version 1.0.8. This update adds the “--only_full” option, which restricts the output in the ITS1, 5.8S and ITS2 files to only the sequences that contain the full region, i.e. those for which both surrounding domains have been detected. The update also fixes a bug with the --anchor option, and can be downloaded here. Happy barcoding!
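To illustrate the idea behind the option, here is a minimal Python sketch of the filtering step. It is not ITSx's actual code: the RegionHit record and its left_found/right_found flags are hypothetical names made up for this example, and simply stand for "the domain on each side of the region was detected".

```python
from dataclasses import dataclass


@dataclass
class RegionHit:
    """Hypothetical record for one extracted region (not an ITSx data structure)."""
    seq_id: str
    region: str        # e.g. "ITS1", "5.8S" or "ITS2"
    left_found: bool   # was the domain preceding the region detected?
    right_found: bool  # was the domain following the region detected?


def keep_full_only(hits):
    """Keep only regions for which both surrounding domains were detected."""
    return [hit for hit in hits if hit.left_found and hit.right_found]


hits = [
    RegionHit("seq1", "ITS1", True, True),
    RegionHit("seq2", "ITS1", True, False),  # partial: right-hand domain missing
    RegionHit("seq3", "ITS2", True, True),
]

full_ids = [hit.seq_id for hit in keep_full_only(hits)]
print(full_ids)  # prints ['seq1', 'seq3']
```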

Another paper I have co-authored related to the UNITE database for fungal rDNA ITS sequences is now published as an Online Early article in Fungal Diversity. The paper describes an effort to improve the annotation of ITS sequences from fungal plant pathogens. Why is this important? Well, apart from fungal plant pathogens being responsible for great economic losses in agriculture, the paper is also conceptually important as it shows that together we can accomplish a substantial improvement to the metadata in sequence databases. In this work we have hunted down high-quality reference sequences for various plant pathogenic fungi, and re-annotated incorrectly or insufficiently annotated ITS sequences from the same fungal lineages. In total, the 59 authors have made 31,954 changes to UNITE database data, on average roughly 540 changes per author. While one person, or even a few, could not feasibly have made this effort alone, this work shows that larger consortia can vastly improve the quality of databases by distributing the work among many scientists. In many ways, this relates to proposals to “wikify” GenBank, and after Rfam and Pfam it might now be time to take the user-contribution model to, at least, the RefSeq portion of GenBank, which despite its description as being “comprehensive, integrated, non-redundant, [and] well-annotated” still contains errors and examples of non-usable annotation. More on that at a later point…

Paper reference:

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity Online early (2014). doi: 10.1007/s13225-014-0291-8 [Paper link]

Last week, I was informed by an ITSx user that the software behaved strangely when input files containing extremely long sequence identifiers were used. The bug is not likely to have affected a majority of users, but in any case it is now fixed, and ITSx can now handle sequence identifiers of any length. The new update brings ITSx to version 1.0.7, and it can be downloaded here. Happy barcoding!

Science for Life Laboratory (SciLifeLab) in Stockholm will host a metagenome data analysis workshop on May 21-23, in which I will participate as a tutorial assistant. Additionally, our group leader Joakim Larsson will be giving a lecture about how we use metagenomics to assess the environmental reservoir of antibiotic resistance genes (much of my recent work will likely go into that). I hope to meet you there, so don’t forget to register!

Confirmed speakers:
Lex Nederbragt, Oslo University, Norway
Saskia Smits, Erasmus University Rotterdam, Netherlands
Joakim Larsson, Göteborg University, Sweden
Paul Wilmes, University of Luxembourg, Luxembourg
Anders Andersson, SciLifeLab, Sweden
Noan Le Bescot, UPMC (Tara expedition), France

The workshop is part of the AllBio Bioinformatics initiative.

If you are thinking about doing a PhD and think that bioinformatics and antibiotic resistance are cool subjects, then now is your chance to come and join us for the next four years! There is a PhD position open in Joakim Larsson’s group, which means that if you get the job you will work with me, Joakim Larsson, Erik Kristiansson, Ørjan Samuelsen and Carl-Fredrik Flach on a super-interesting project relating to discovery of novel beta-lactamase genes (NoCURE). The project aims to better understand where, how and under what circumstances these genetic transfer events take place, in order to provide opportunities to limit or delay resistance development and thus increase the functional lifespan of precious antibiotics. The lion’s share of the work will be related to interpreting large-scale sequencing data generated by collaborators within the project, both genome sequencing and metagenomic data.

This is a great opportunity to prove your bioinformatics skills and use them for something urgently important. Full details about the position can be found here.

It’s been a while since the PETKit got any attention from me. Partially, that has been due to a nasty bug that could cause Pefcon to produce no output for one of the read files when using FASTA input files, but mostly it has simply been due to a lack of time to continue development of the package. Now, I have finally tied all the threads together (bug fixes, new features, documentation) and today version 1.1 is released! The new features are:

  • A new tool has been added – peacat – that can be used to e.g. stitch contigs together that have been separated for one reason or another in an assembly
  • Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
  • The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
  • The documentation has been vastly improved (but still leaves a lot to be desired)
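For those wondering what the offset value means: each character on a FASTQ quality line encodes a PHRED score as its ASCII code minus the offset. The short Python sketch below shows the conversion with an offset of 33. It is only an illustration and not PETKit code; the function names are made up for the example. Older Illumina formats (1.3 to 1.7) used an offset of 64 instead, which is why the value is kept as a parameter.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into a list of PHRED scores."""
    return [ord(char) - offset for char in quality_line]


def error_probability(score):
    """A PHRED score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-score / 10)


scores = phred_scores("II5+!")
print(scores)  # prints [40, 40, 20, 10, 0]
```

A score of 20 thus corresponds to a 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.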

I was informed by a colleague that today is Taxonomist Appreciation Day! This is a very important day; quoting from the original post:

We need active work on taxonomy and systematics if our work is going to progress, and if we are to apply our findings. Without taxonomists, entire fields wouldn’t exist. We’d be working in darkness. (…) Taxonomists and systematists often work in obscurity, and some of the most painstaking projects come to fruition after long years with only a small dose of the recognition that is required.

So, send your favorite taxonomist(s) some love today, and remember they are the foundation for much of what we bioinformaticians do!

A user informed me of unexpected behavior regarding potentially chimeric sequences in ITSx, and indeed it turned out to contain a bug that over-reported potential chimeras. This bug is totally unrelated to the new version released this week, and exists in all prior ITSx versions. I strongly encourage everyone to update to ITSx 1.0.6.

I would also like to underscore that ITSx is not a chimera-checker. It detects when sequences look unusual, but all such cases should be further investigated. If you follow this practice, you will be largely unaffected by this bug: in some cases ITSx will have over-reported chimeras, and in others it will have been correct in its suspicions.

I am on a roll pushing out new software these days, and here’s the latest addition. This version of ITSx was finished last month and seems to be stable enough for general use. Version 1.0.5 adds a new option, “--anchor”, which enables extraction of the regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of a fixed number of bases at each end, e.g. “--anchor 30” to get 30 bp before and after each ITS region, or of all bases matching the corresponding HMM, by specifying “--anchor HMM”. The update can be downloaded here.
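Conceptually, the numeric form of the option just widens the extracted window by a fixed number of bases on each side. The Python sketch below illustrates that idea only; it is not how ITSx is implemented. The function name, the 0-based end-exclusive coordinates, and the choice to clamp the window at the sequence ends are all assumptions made for the example.

```python
def extract_with_anchor(sequence, start, end, anchor=0):
    """Return sequence[start:end] extended by up to `anchor` bases on each side.

    `start` and `end` are hypothetical 0-based, end-exclusive region
    coordinates. The window is clamped so that it never runs past either
    end of the sequence (an assumption of this sketch).
    """
    left = max(0, start - anchor)
    right = min(len(sequence), end + anchor)
    return sequence[left:right]


seq = "AAAAACCCCCGGGGGTTTTT"
region = extract_with_anchor(seq, 5, 15)             # no flanking bases
flanked = extract_with_anchor(seq, 5, 15, anchor=3)  # 3 extra bases per side
print(region)   # prints CCCCCGGGGG
print(flanked)  # prints AAACCCCCGGGGGTTT
```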