Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

A new year has begun, and it brings with it a few updates on the website. I have added a summary of the year 2013 from my perspective, and (as you may notice) updated my picture on the front page. In short, this year will bring lots of exciting stuff. Personally, I am quite excited to finally be able to share the new version of Metaxa, Metaxa2, which will be released to the public late this winter (or early spring). Additionally, I look forward to wrapping up a manuscript on metagenomics and antibiotic resistance that I have been working on for more than 2.5 years now. We can also look forward to some super-interesting technology developments in DNA sequencing, with PacBio finally finding proper usage scenarios, nanopore sequencing around the corner, and super-multiplexing on the Illumina instruments. We're in for a treat with DNA sequencing in 2014!

It seems that our paper on the recently launched database of resistance genes against antibacterial biocides and metals (BacMet) went online today as an advance access paper in Nucleic Acids Research. Chandan Pal, the first author of the paper and one of my close colleagues as well as my roommate at work, has done a tremendous job taking the database from a list of genes and references to a full-fledged, browsable and searchable database with a really nice interface. I have contributed along the way, and wrote the lion's share of the code for the BacMet-Scan tool that can be downloaded along with the database files.

BacMet is a curated source of bacterial resistance genes against antibacterial biocides and metals. All included gene entries have at least one experimentally confirmed resistance function, supported by references in the scientific literature. In addition, we have made a homology-based prediction of genes that are likely to share the same resistance function (the BacMet predicted dataset). We believe that the BacMet database will make it possible to better understand co- and cross-resistance of biocides and metals to antibiotics, both within bacterial genomes and in complex microbial communities from different environments.

The database can easily be accessed at http://bacmet.biomedicine.gu.se, and scientific work that uses the database can cite the following paper, which recently appeared in Nucleic Acids Research:

Pal C, Bengtsson-Palme J, Rensing C, Kristiansson E, Larsson DGJ: BacMet: Antibacterial Biocide and Metal Resistance Genes Database. Nucleic Acids Research. Database issue, advance access. doi: 10.1093/nar/gkt1252

Over the weekend, I was able to finish off some things that had been stuck on my to-do list. Among these were the remaining pieces of the ITSx update that we put in the hands of our users today. This update brings three requested features and a fix for an extremely rare bug:

  1. If the “--not_found T” option is used, ITSx now outputs both a list and a FASTA file of the entries in the input file that did not have any ITS regions detected in them. This was a user-requested feature, and a very nice and easily implemented one.
  2. As mentioned in a previous blog post, ITSx has until now not been able to preserve the sequence headers of the input file. In hindsight, such an option should obviously have been included, and as of version 1.0.4 ITSx comes with a “--preserve” option that allows headers to be carried over to all the output files.
  3. ITSx is now better at handling certain chimeric sequences.

In addition, there was a minor bug that, very rarely (I have only seen one such example), could cause the ITS region to be reported with a negative length. This issue has now been fixed.
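For those who want to do similar post-processing on their own data, the idea behind the “--not_found” output can be sketched in a few lines of Python: given the input records and the set of sequence IDs in which ITS regions were detected, collect the IDs and FASTA records of everything else. This is only an illustration, not the ITSx code itself; the function names are hypothetical, and treating the first whitespace-delimited word of a FASTA header as the sequence ID is an assumption.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_not_found(records: Iterable[Tuple[str, str]],
                    detected_ids: Set[str]) -> Tuple[List[str], str]:
    """Return (list of IDs, FASTA text) for records with no detected ITS region.

    Assumption (hypothetical helper): the ID is the first word of the header.
    """
    missing_ids: List[str] = []
    fasta_lines: List[str] = []
    for header, seq in records:
        seq_id = (header.split() or [""])[0]
        if seq_id not in detected_ids:
            missing_ids.append(seq_id)
            fasta_lines.append(f">{header}\n{seq}")
    fasta_text = "\n".join(fasta_lines) + ("\n" if fasta_lines else "")
    return missing_ids, fasta_text
```

For example, with three input sequences where only seqA and seqC had ITS regions detected, `split_not_found(read_fasta(lines), {"seqA", "seqC"})` returns `["seqB"]` together with the FASTA record for seqB.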

This update brings ITSx to version 1.0.4, and it can be downloaded here.

Those of you attending the Swedish Bioinformatics Workshop, this year held in Skövde, will have a chance to see me talk about how sequencing depth influences the picture we get of environmental resistance gene diversity. I think the topic is very urgent and interesting, and I will likely come back to it in a more thorough blog post later. There are also a few other very interesting talks, for example on metagenomic gene quantification and on en masse sequencing of E. coli and H. pylori isolates. I think all attendees are in for a treat! See you there!

I am happy to inform you that our paper on ITSx is now out online in Methods in Ecology and Evolution, volume 4, issue 10. Meanwhile, I am slowly getting my stuff together on an update that will bring some minor requested features. With the publication, the proper citation of the ITSx paper is:

Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre, V., Nilsson, R. H. (2013), Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution, 4: 914–919. doi: 10.1111/2041-210X.12073

An ITSx user (thanks a lot, you know who you are) brought my attention to an issue that might be important to many others as well. Because I overlooked a potentially very useful feature, ITSx does not have an option to preserve the sequence headers in the output files. This can of course be very inconvenient in some applications, and will be addressed in an upcoming version. However, I don't have the time at this very moment to implement and test such a feature for a new ITSx release. Instead, until I get time to do that, I have provided a little Perl script that takes the headers of the original input file and copies them to the output file. The script can be downloaded here. (You might have to right-click the link to get the script.)

The syntax of the script is:

perl restore_headers.pl <original file> <ITSx output file in FASTA format> [<additional ITSx output files in FASTA format>]

The script saves the files with the old headers in the same directory as the ITSx output files, with the additional suffix “.restored.fasta”. I hope those of you who are missing this ITSx feature will find it useful.
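The script itself is written in Perl, but the idea behind it can be sketched in Python. This sketch assumes that each ITSx output header starts with the original sequence ID followed by a “|” (for instance a hypothetical “>seqA|F|ITS1 ...”), and that the ID is the first whitespace-delimited word of the original header. The function names are hypothetical, and this is not a line-by-line port of restore_headers.pl.

```python
from typing import Dict, Iterable, List


def original_headers(lines: Iterable[str]) -> Dict[str, str]:
    """Map each original sequence ID (first word of the header) to its full header."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            words = header.split()
            if words:
                mapping[words[0]] = header
    return mapping


def restore_lines(itsx_lines: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Swap ITSx headers for the original ones; unmatched headers are kept as-is.

    Assumption: the original ID is the text before the first '|' in the ITSx header.
    """
    restored: List[str] = []
    for line in itsx_lines:
        if line.startswith(">"):
            itsx_id = (line[1:].split("|", 1)[0].split() or [""])[0]
            if itsx_id in mapping:
                restored.append(">" + mapping[itsx_id] + "\n")
                continue
        restored.append(line)  # sequence lines and unmatched headers pass through
    return restored


def restore_files(original_path: str, itsx_paths: List[str]) -> List[str]:
    """Write <file>.restored.fasta next to each ITSx output file; return the new paths."""
    with open(original_path) as handle:
        mapping = original_headers(handle)
    written: List[str] = []
    for path in itsx_paths:
        with open(path) as handle:
            restored = restore_lines(handle, mapping)
        out_path = path + ".restored.fasta"  # same directory, extra suffix
        with open(out_path, "w") as handle:
            handle.writelines(restored)
        written.append(out_path)
    return written
```

A call such as `restore_files("input.fasta", ["out.ITS1.fasta", "out.ITS2.fasta"])` (file names made up for illustration) mirrors the command-line usage of the Perl script. Headers whose ID cannot be matched are left unchanged rather than dropped, so no sequences are lost if the assumed header format does not hold.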

A poor excuse…


I feel very sorry that I have been a bit unresponsive for the last couple of weeks. I have received several questions regarding the PETKit and ITSx that I have not yet gotten around to answering, and I apologize for the inconvenience. The reason (not a good excuse, but still) is that I have been overloaded with grant applications. This will continue through the rest of September, so please be patient until October if I don't reply to your e-mails. If you need a quick response, please state so very clearly, and I might be able to squeeze you in before the start of October. Otherwise, see you at the other end of the tunnel! Thanks for your understanding.

Our new home


Last Friday, our research group moved into our new facilities at the Department of Infectious Diseases. I am very happy with my new room and my new view, both depicted below.

Our new affiliation is:

Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg

Postal address:
Avd. för Klinisk bakteriologi/Virologi
Guldhedsgatan 10
SE-413 46 Göteborg

I have recently started to receive requests on ResearchGate for full-text versions of my publications. That's great, but I have yet to figure out how to send them over without breaking any agreements. As I am in a somewhat intensive work period at the moment, please forgive me for not spending time on ResearchGate right now. If you would like full-text versions of my publications, please send me an e-mail instead! I'll be glad to help!

Our paper on the most recent developments of the UNITE database for fungal rDNA ITS sequences has just been published as an Early View article in Molecular Ecology. In this paper, we aim to ease two of the major problems facing the identification of newly generated fungal ITS sequences: the lack of a sufficiently good reference dataset, and the lack of a way to refer to fungal species without a Latin name. As part of a solution, we have introduced the term species hypothesis for all fungal species represented by at least two ITS sequences. The UNITE database has an easy-to-use, web-based sequence management system, and we encourage everybody who can improve the annotations or metadata of a fungal lineage to do so.

My main contribution to this paper has been to tailor ITSx functionality for the UNITE database, so that ITS data could be more easily processed for the species hypotheses.

Paper reference:
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Accepted in Molecular Ecology. doi: 10.1111/mec.12481