A long time ago, we (Martin Eriksson, Martin Hartmann, Henrik Nilsson and me) were invited to write an overview on Metaxa for the Encyclopedia of Metagenomics. I guess the workload for pulling such a project off is huge, so there’s no surprise that it has taken a while for it to be accepted, but now it is available for consumption.
Meanwhile, Metaxa have been getting regular updates, and I hope to soon be able to show you a new major update to the software, bringing it up to the next generation of metagenomics. More on that soon.
Those attending the Metagenomics lab (part of the basic NGS course for PhD students given at GU this week), can find the material for the lab on this page:
Of course, the page is open for anyone else as well, although you won’t get the support that the GU students are given.
Some users have asked me to fix a table output bug in Metaxa, and I have finally got around to do so. The fix is released today in the 1.1.2 Metaxa package (download here). This version also brings an updated manual (finally), as the User’s Guide has lagged behind since version 1.0. Please continue to report bugs to metaxa [at sign] microbiology [dot] se
I am on my way to Copenhagen for the ISME14 conference that begins today. I’m myself quite excited about this event, and will present three posters (two as first author), and give a short talk on antibiotic resistance gene identification and metagenomics. My talk will be in the Bioinformatics in Microbial Ecology session on Thursday afternoon (at 13.30).
If you’d like to talk about Metaxa and Megraft, I will present an SSU-oriented poster in the Monday afternoon poster section (board number 267A). My antibiotic resistance gene poster will be presented on Thursday afternoon (board number 002A), and I really encourage everyone interested in metagenomics (especially metagenomic assembly) to come talk to me then! Finally, I am also partially responsible for a poster on periphyton metagenomics with Martin Eriksson as its main author. This poster is also presented on Monday, in the Microbial Dispersion and Biogeography session (board number 021A).
I hope to be able to make another post later tonight on what are the “essential” sessions for me on this conference. Hope to see you there soon!
Yesterday, our paper on Megraft – a software tool to graft ribosomal small subunit (16S/18S) fragments onto full-length SSU sequences – became available as an accepted online early article in Research in Microbiology. Megraft is built upon the notion that when examining the depth of a community sequencing effort, researchers often use rarefaction analysis of the ribosomal small subunit (SSU/16S/18S) gene in a metagenome. However, the SSU sequences in metagenomic libraries generally are present as fragmentary, non-overlapping entries, which poses a great problem for this analysis. Megraft aims to remedy this problem by grafting the input SSU fragments from the metagenome (obtained by e.g. Metaxa) onto full-length SSU sequences. The software also uses a variability model which accounts for observed and unobserved variability. This way, Megraft enables accurate assessment of species richness and sequencing depth in metagenomic datasets.
The algorithm, efficiency and accuracy of Megraft is thoroughly described in the paper. It should be noted that this is not a panacea for species richness estimates in metagenomics, but it is a huge step forward over existing approaches. Megraft shares some similarities with EMIRGE (Miller et al., 2011), which is a software package for reconstruction of full-length ribosomal genes from paired-end Illumina sequences. Megraft, however, is set apart in that it has a strong focus on rarefaction, and functions also when the number of sequences is small, which is often the case in 454 and Sanger-based metagenomics studies. Thus, EMIRGE and Megraft seek to solve a roughly similar problem, but for different sequencing technologies and sequencing scales.
Bengtsson, J., Hartmann, M., Unterseher, M., Vaishampayan, P., Abarenkov, K., Durso, L., Bik, E.M., Garey, J.R., Eriksson, K.M., Nilsson R.H. (2012). Megraft: A software package to graftribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes and similar environmental datasets. Research in Microbiology, doi: 10.1016/j.resmic.2012.07.001.
- Miller, C. S., Baker, B. J., Thomas, B. C., Singer, S. W., & Banfield, J. F. (2011). EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biology, 12(5), R44. doi:10.1186/gb-2011-12-5-r44
I realized that I have been using a newer version of Metaxa than most of you for the last couple of months. This bug fix was written sometime in February or March, and we have kept it internal to make sure it works as it should. Then other things came across and we never got around to actually release it. But with testing passed and upcoming versions of Metaxa in the pipeline, I think it is about time that everyone gets their hands on the latest Metaxa version.
It’s only two small things this time:
- Slight tweaks to the new HMM scoring system, making Metaxa just a little bit faster
- Fixed a rarely occurring bug causing the –heuristics options to be ignored in certain circumstances
For the last months I have been (part time) struggling with getting Metaxa to eat Illumina paired-end data. This is a pretty tricky task, mainly due to the fact that Illumina reads are so much shorter than those obtained by Sanger and 454 sequencing. Therefore, I am more than happy to inform the community that today (the day before I go on vacation) I have a working prototype up and running. In fact, calling it a prototype is unfair, it is a quite far gone piece of software by now. Currently, I am running it on test data sets, and I will try to keep it running over the next couple of weeks. Thereafter, I hope to be able to release it sometime this autumn (but don’t expect a September release!), harnessing the power of Illumina sequencing for SSU identification. Stayed tuned, and have a great summer!
The guys at Pfam recently introduced a new database, called AntiFam, which will provide HMM profiles for some groups of sequences that seemingly formed larger protein families, although they were not actually real proteins. For example, rRNA sequences could contain putative ORFs, that seems to be conserved over broad lineages; with the only problem being that they are not translated into proteins in real life, as they are part of an rRNA .
With this initiative the Xfam team wants to “reduce the number of spurious proteins that make their way into the protein sequence databases.” I have run into this problem myself at some occasions with suspicious sequences in GenBank, and I highly encourage this development towards consistency and correctness in sequence databases. It is of extreme importance that databases remain reliable if we want bioinformatics to tell us anything about organismal or community functions. The Antifam database is a first step towards such a cleanup of the databases, and as such I would like to applaud Pfam for taking actions in this direction.
To my knowledge, GenBank are doing what they can with e.g. barcoding data (SSU, LSU, ITS sequences), but for bioinformatics and metagenomics (and even genomics) to remain viable, these initiatives needs to come quickly; and automated (but still very sensitive) tools for this needs to get our focus immediately. For example, Metaxa  could be used as a tool to clean up SSU sequences of misclassified origin. More such tools are needed, and a lot of work remains to be done in the area of keeping databases trustworthy in the age of large-scale sequencing.
- Tripp, H. J., Hewson, I., Boyarsky, S., Stuart, J. M., & Zehr, J. P. (2011). Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies. Nucleic Acids Research, 39(20), 8792–8802. doi:10.1093/nar/gkr576
- Bengtsson, J., Eriksson, K. M., Hartmann, M., Wang, Z., Shenoy, B. D., Grelet, G.-A., Abarenkov, K., et al. (2011). Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100(3), 471–475. doi:10.1007/s10482-011-9598-6
I am extremely happy to announce that Metaxa 1.1 (first announced back in July) has finally left the beta stage, and is now designated as a feature complete 1.1 update. We consider this update stable for production use. The 1.1 update utilize hmmsearch instead of hmmscan for higher extraction speeds and better accuracy. This clever trick was inspired by a blog post by HMMER’s creator Sean Eddy on hmmscan vs hmmsearch (http://selab.janelia.org/people/eddys/blog/?p=424). As the speedup comes from the extraction step, the speed increase will be largest for huge data sets with only a small proportion of actual SSU sequences (typically large 454 metagenomes).
What took so long, you might ask, as I promised an imminent release already in August. Well, during testing a difference in scoring was discovered. This difference did not have any implications for long sequences (> ~350 bp), but caused Metaxa to have problems on short reads (most evident on ~150 bp and shorter). Therefore, the scoring system had to be redesigned, which in turn required more extensive testing. Now, however, Metaxa 1.1 has a fine-tuned scoring system, which by default is based on scores instead of E-values, and in some instances have even better detection accuracy than the old Metaxa version. We encourage everyone to try out this new version of Metaxa (although the 1.0.2 version will remain available for download). It should be bug free, but we cannot ensure 100% compatibility in all usage scenarios. Therefore, we are happy if you report any bugs or inconsistencies to the e-mail address: metaxa (at] microbiology [dot) se.
The new version of Metaxa can be downloaded here: https://microbiology.se/software/metaxa/ Please note that the manual has not yet been updated yet, so use the help feature for the up-to-date options. Happy SSU detecting!
Finally, the Metaxa FAQ is ready! If you have any other questions, please mail them to metaxa [at] microbiology [dot] se, and I will include them in the FAQ at some later point. I would like to thank anyone who has contributed with questions, suggestions, comments and other types of feedback so far. It really helps improving this software. The FAQ is found here.
You may also wonder what has happened to the stable version of the 1.1 Metaxa speedup I promised in July. It is still on the way, but due to a minor computer failure and other CPU-heavy tasks being of higher priority the software still has not been fully tested. As we want to release a truly stable and functional update, we need to hold back on the package for some more time. Be patient, or try out the beta that is already available.