Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

Browsing Posts in Bioinformatics

If you are thinking about doing a PhD and think that bioinformatics and antibiotic resistance are cool subjects, then now is your chance to come and join us for the next four years! There is a PhD position open in Joakim Larsson’s group, which means that if you get the job you will work with me, Joakim Larsson, Erik Kristiansson, Ørjan Samuelsen and Carl-Fredrik Flach on a super-interesting project relating to discovery of novel beta-lactamase genes (NoCURE). The project aims to better understand where, how and under what circumstances these genetic transfer events take place, in order to provide opportunities to limit or delay resistance development and thus increase the functional lifespan of precious antibiotics. The lion’s share of the work will be related to interpreting large-scale sequencing data generated by collaborators within the project, both genome sequencing and metagenomic data.

This is a great opportunity to prove your bioinformatics skills and use them for something urgently important. Full details about the position can be found here.

It’s been a while since the PETKit got any attention from me. Partially, that has been due to a nasty bug that could produce no output for one of the read files in Pefcon when using FASTA input files, but mostly it has simply been due to lack of time to continue development on the package. Now, I have finally put all threads together (bug fixes, new features, documentation) and today the 1.1 version is released! The new features are:

  • A new tool has been added – peacat – that can be used to e.g. stitch contigs together that have been separated for one reason or another in an assembly
  • Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
  • The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
  • The documentation has been vastly improved (but is still rather inferior)

I got informed by a colleague that today is Taxonomist Appreciation Day! This is a very important day; quoting from the original post:

We need active work on taxonomy and systematics if our work is going to progress, and if we are to apply our findings. Without taxonomists, entire fields wouldn’t exist. We’d be working in darkness. (…) Taxonomists and systematists often work in obscurity, and some of the most painstaking projects come to fruition after long years with only a small dose of the recognition that is required.

So, send your favorite taxonomist(s) some love today, and remember they are the foundation for much of what we bioinformaticians do!

A user informed me of unexpected behavior regarding potentially chimeric sequences in ITSx, and indeed it turned out to contain a bug that over-reported potential chimeras. This bug is totally unrelated to the new version released this week, and exists in all prior ITSx versions. I strongly encourage everyone to update to ITSx 1.0.6.

I would also like to underscore that ITSx is not a chimera-checker. It detects when sequences look unusual, but all such cases should be further investigated. If you follow this practice, you will see that in some cases ITSx might have over-reported chimeras, and in some instances it will have been correct in its suspicions (and thereby you would be largely unaffected by this bug).

I am on a roll pushing out new software these days, and here’s the latest addition. This version of ITSx was finished up last month and seems to be stable enough for consumption by the users. Version 1.0.5 adds a new option: “--anchor” which enables extraction of regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of a number of bases at each end, e.g. “--anchor 30” to get 30 bp before and after each ITS region, or all bases matching the corresponding HMM, by specifying “--anchor HMM”. The update can be downloaded here.
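
For those who call ITSx from scripts, here is a minimal Python sketch of how the two forms of the option could be put on a command line. The helper name and the file names are made up for this example, and the -i/-o flags for input and output are an assumption based on the usual ITSx usage; check the manual for your version. The helper only builds the argument list and does not run anything.

```python
def itsx_command(input_fasta, output_prefix, anchor=None):
    """Build an ITSx argument list (hypothetical helper, runs nothing).

    anchor may be None (option omitted), a positive number of bases,
    or the string "HMM" for all bases matching the corresponding HMM.
    """
    # -i and -o are assumed here; verify against the ITSx manual.
    cmd = ["ITSx", "-i", input_fasta, "-o", output_prefix]
    if anchor is not None:
        is_count = (
            isinstance(anchor, int)
            and not isinstance(anchor, bool)
            and anchor > 0
        )
        if not (anchor == "HMM" or is_count):
            raise ValueError("anchor must be a positive integer or 'HMM'")
        cmd += ["--anchor", str(anchor)]
    return cmd


print(" ".join(itsx_command("reads.fasta", "itsx_out", anchor=30)))
# ITSx -i reads.fasta -o itsx_out --anchor 30
print(" ".join(itsx_command("reads.fasta", "itsx_out", anchor="HMM")))
# ITSx -i reads.fasta -o itsx_out --anchor HMM
```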

If you’re looking for a PhD position in bioinformatics, working with antibiotic resistance, there’s an opening in Erik Kristiansson’s (best bioinformatician in Gothenburg? I think so) group. To apply you need to have a master’s level degree in bioinformatics, mathematical statistics, mathematics, computer science, physics, molecular biology or any equivalent topic, obtained by June 2014 at the latest. If you’re a master’s student and want to join us, this is your chance! You can read more and apply for the position here.

Metaxa2 is here!

The new version of Metaxa, Metaxa2, which I first started talking about more than 1.5 years ago, has finally been deemed stable enough for us to officially release it! The release comes around the same time as we submitted a paper describing the changes in it, but I will briefly go through the changes here:

  • Metaxa2 now handles extraction and classification of LSU rRNA sequences in addition to SSU rRNA
  • The classification engine has been completely redesigned, and now enables accurate taxonomic classifications down to the genus – or in some cases – species level
  • The classification database has been updated, and is now based on the SILVA 111 release
  • The Metaxa2 Taxonomic Traversal Tool – metaxa2_ttt – has been added to the package, to ease the counting of rRNA sequences in different organism groups (at various taxonomic levels)
  • Metaxa2 adds support for paired-end libraries
  • It is now possible to directly input sequences in FASTQ format to Metaxa2
  • The support for libraries with short read lengths (~100 bp) has been vastly improved (and is now assumed to be the case for default settings)
  • Metaxa2 can do quality pre-filtering of reads in FASTQ-format
  • Metaxa2 adds support for the modern BLAST+ package (although the old blastall version is still default)
  • Compatibility with the HMMER 3.1 beta
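
To give a feel for what counting sequences at various taxonomic levels means, here is a minimal Python sketch of the general idea. It is not how metaxa2_ttt is implemented and it does not read Metaxa2's output format; the semicolon-separated lineage strings and the function name are assumptions made up for this illustration.

```python
from collections import Counter


def count_by_level(lineages, level):
    """Count lineages after truncating each one to `level` ranks.

    Lineages classified to fewer than `level` ranks are kept separate
    by appending "unclassified" to what is known about them.
    """
    counts = Counter()
    for lineage in lineages:
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        if len(ranks) >= level:
            key = ";".join(ranks[:level])
        else:
            key = ";".join(ranks + ["unclassified"])
        counts[key] += 1
    return counts


# Hypothetical classifications, one lineage per rRNA read
example = [
    "Bacteria;Proteobacteria;Gammaproteobacteria",
    "Bacteria;Proteobacteria;Alphaproteobacteria",
    "Bacteria;Firmicutes;Bacilli",
    "Archaea",
]

for taxon, n in count_by_level(example, 2).most_common():
    print(taxon, n)
```

Calling the same function with level 1 or 3 gives the coarser or finer view of the same reads.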

Metaxa2 brings together a large set of features that we have been gradually incorporating since 2011, many of which have been dependent on each other. Most of the new features and changes are thoroughly explained in the manual. While we hope Metaxa2 is bug free, there will likely be bugs caused by usage scenarios we have not envisioned. I therefore encourage anyone who comes across unexpected behavior to send me an e-mail. In particular, I would like to know how the software performs using HMMER 3.1 and BLAST+, where testing has been limited compared to older parts of the code.

We hope that you will find Metaxa2 useful, and that it will bring taxonomic assessment of metagenomes another step forward! Metaxa2 can be downloaded here.

I have fixed a long-standing bug in the Bloutminer script, which has thereby been pushed to version 0.9.6. The new version fixes an issue when using the -o blast option without the -n option. The new version can be downloaded here.

An ITSx user yesterday made me aware of an information-problem (thanks Suzanne!) regarding the use of ITSx in combination with the HMMER 3.1 beta. I have not been entirely clear on why you might get the “Error: bad format, binary auxfiles, (…) binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again” error message when running ITSx with HMMER 3.1 installed. You might think that following the instructions for Metaxa might do the trick. As you will notice, however, it will not. Instead you will be presented with the following error message: “Error: Failed to open binary auxfiles”. This is because while Metaxa 1.1.2 will re-create the HMM-files if needed, ITSx does not. Instead, ITSx has the option "--reset T" which can be added to the command line to recreate the HMM-files for the current HMMER version installed (regardless of which 3.x version).

Thus, the solution for the “bad format, binary auxfiles” error is to simply add "--reset T" (without quotes) to the ITSx command line and run the software again. You only need to do this once, unless you update HMMER and/or get the same error message again for some other reason. The Metaxa-post has been updated to clarify this as well.
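
If ITSx is run from a pipeline, the retry can be automated. The following is a minimal Python sketch under a few assumptions: that the HMMER error text ends up in the captured standard output or standard error of the ITSx process, and that matching on the phrase “binary auxfiles” is specific enough. The function names are made up for this example; only the "--reset T" option comes from ITSx itself.

```python
import subprocess


def needs_reset(output_text):
    """True if the output contains the HMMER 'binary auxfiles' error."""
    return "binary auxfiles" in output_text


def with_reset(cmd):
    """Return a copy of the argument list with '--reset T' appended once."""
    cmd = list(cmd)
    if "--reset" not in cmd:
        cmd += ["--reset", "T"]
    return cmd


def run_itsx(cmd, runner=subprocess.run):
    """Run ITSx; retry once with '--reset T' if the auxfiles error shows up.

    `runner` is injectable so the logic can be tested without ITSx installed.
    """
    result = runner(cmd, capture_output=True, text=True)
    combined = (result.stdout or "") + (result.stderr or "")
    if needs_reset(combined) and "--reset" not in cmd:
        # Assumption: rebuilding the HMM files is safe to do automatically.
        result = runner(with_reset(cmd), capture_output=True, text=True)
    return result
```

A call would then pass the same argument list that would otherwise be typed on the command line to run_itsx. Since the reset is only needed once per HMMER installation, the retry should trigger on the first run at most.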

It seems like our paper on the recently launched database on resistance genes against antibacterial biocides and metals (BacMet) has gone online as an advance access paper in Nucleic Acids Research today. Chandan Pal, the first author of the paper and one of my close colleagues as well as my roommate at work, has done a tremendous job taking the database from a list of genes and references to a full-fledged browsable and searchable database with a really nice interface. I have contributed along the way, and wrote the lion’s share of the code for the BacMet-Scan tool that can be downloaded along with the database files.

BacMet is a curated source of bacterial resistance genes against antibacterial biocides and metals. Every gene entry included has an experimentally confirmed resistance function, with references in the scientific literature. However, we have also made a homology-based prediction of genes that are likely to share the same resistance function (the BacMet predicted dataset). We believe that the BacMet database will make it possible to better understand co- and cross-resistance of biocides and metals to antibiotics within bacterial genomes and in complex microbial communities from different environments.

The database can be easily accessed here: http://bacmet.biomedicine.gu.se. If you use the database in scientific work, please cite the following paper, which recently appeared in Nucleic Acids Research:

Pal C, Bengtsson-Palme J, Rensing C, Kristiansson E, Larsson DGJ: BacMet: Antibacterial Biocide and Metal Resistance Genes Database. Nucleic Acids Research. Database issue, advance access. doi: 10.1093/nar/gkt1252 [Paper link]