This week is the first of my long summer break, and I will be on vacation until mid-August. This means that I will only read mail sporadically, if at all. For very urgent issues, please give me a call or send me an sms, and I will attend to your message as soon as possible (this of course only applies to those of you who have my number in the first place).
For support questions, there are a few options:
- For questions regarding Metaxa or Metaxa2, please add “METAXA” to the beginning of the subject line of the e-mail.
- For questions regarding ITSx, please add “ITSX” to the beginning of the subject line.
- For other support questions, please add “SUPPORT” to the beginning of the subject line.
This way I can easily assess which mails are urgent to reply to. Don’t add “IMPORTANT” or “URGENT”, since that will just trigger the spam filter.
I wish you all a really great summer!
Metaxa2 has been updated to version 2.0.2 and can be downloaded from the Metaxa2 web site. The 2.0.2 update fixes two minor bugs: one causing the “.graph” file to display incorrect or no names for the LSU regions, and one causing misreporting of the number of sequences in single-end FASTQ files (paired-end files were reported correctly). The update also brings a slightly improved classifier. Thanks to Marco Severgnini for reporting the FASTQ file issue! The update is available here.
Some of you who think ITSx is running slowly despite being assigned multiple CPUs, particularly on datasets with only one kind of sequence (e.g. fungal) using the -t F option, might be interested in trying out Andrew Krohn’s parallel ITSx implementation. The solution essentially employs a bash script spawning multiple ITSx instances running on different portions of the input file. Although there are some limitations to the script (e.g. you cannot select a custom name for the output and you will only get the ITS1 and ITS2 + full sequences FASTA files, as far as I understand the script), it may prove useful for many of you until we write up a proper solution to the poor multi-thread performance of ITSx (planned for version 1.1). In the coming months, I recommend that you check this solution out! See also the wiki documentation.
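The idea of running separate instances on different portions of the input can be illustrated with a minimal Python sketch. This is not Andrew Krohn’s script (which is written in bash), and it does not call ITSx at all; it only shows one way the input could be divided. The function names `read_fasta` and `split_records` are made up for this illustration, and the round-robin distribution is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def split_records(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute records round-robin over n_chunks lists (hypothetical helper)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Small in-memory demonstration; a real run would read the input file instead.
demo = [">seq1", "ACGT", "AC", ">seq2", "GG", ">seq3", "TT"]
for number, chunk in enumerate(split_records(read_fasta(demo), 2), start=1):
    print(f"chunk {number}: {[header for header, _ in chunk]}")
```

Each chunk would then be written to its own file and handed to a separate ITSx process, with the per-chunk outputs concatenated afterwards. Because each process only sees its own chunk, this presumably also explains why the output naming is constrained in such a setup.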
My speed tests show the following (on a quite small test set of fungal ITS sequences):
ITSx parallel on 16 CPUs, all ITS types (option “
3 min, 16 sec
ITSx parallel on 16 CPUs, only fungal ITS types (option “
ITSx native on 16 CPUs, all ITS types (options “-t all --cpu 16”):
4 min, 59 sec
ITSx native on 16 CPUs, only fungal types (options “-t f --cpu 16”):
5 min, 50 sec
Why the fungal-only run took longer in the native implementation is a mystery to me, but it probably shows why the multithreading code needs to be rewritten, as we did with Metaxa a couple of years ago. Stay tuned for ITSx updates!
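To put the timings above on a common scale, here is a small Python calculation. It uses only the three timings listed in full, and assumes that the 3 min 16 sec figure belongs to the parallel run on all ITS types. The helper name `to_seconds` is made up for this illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min, sec' timing to seconds."""
    return minutes * 60 + seconds


parallel_all = to_seconds(3, 16)   # parallel script, all ITS types (assumed pairing)
native_all = to_seconds(4, 59)     # native ITSx, -t all --cpu 16
native_fungal = to_seconds(5, 50)  # native ITSx, -t f --cpu 16

# How much faster the parallel script was than the native run on all ITS types.
speedup_all = native_all / parallel_all

# How much slower the native fungal-only run was than the native all-types run.
fungal_penalty = native_fungal / native_all

print(f"parallel vs native (all types): {speedup_all:.2f}x faster")
print(f"native fungal-only vs native all types: {fungal_penalty:.2f}x slower")
```

On this small test set, that works out to roughly a 1.5-fold speedup for the parallel script, while restricting the native run to fungal types made it about 17% slower rather than faster.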
After almost a year in different stages of review and revision, in which the paper (but not the software) saw a total transformation, I am happy to announce that the paper describing Metaxa2 has been accepted in Molecular Ecology Resources and is available in a rudimentary online early form. The figures in this version are not that pretty, but those who want to read the paper as soon as possible now have the possibility to do so.
This means that if you have been using Metaxa2 for a publication, there is now a new preferred way of citing this, namely:
Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399
The paper (1), apart from describing the new Metaxa version, also brings a very thorough evaluation of the software, compared to other tools for taxonomic classification implemented in QIIME (2). In short, we show that:
- Metaxa2 can make trustworthy taxonomic classifications even with reads as short as 100 bp
- Generally, the performance is reliable across the entire SSU rRNA gene, regardless of which V-region a read is derived from
- Metaxa2 can reliably recapture species composition from short-read metagenomic data, comparable with results of amplicon sequencing
- Metaxa2 outperforms other popular tools such as Mothur (3), the RDP Classifier (4), Rtax (5) and the QIIME implementation of Uclust (6) in terms of proportion of correctly classified reads from metagenomic data
- The false positive rate of Metaxa2 is very close to zero; far superior to many of the above-mentioned tools, many of which assume that reads must derive from the rRNA gene
Metaxa2 can be downloaded here. We have already used it for around two years internally, and it forms the base of the taxonomic classifications in e.g. our recently published paper on antibiotic resistance in a polluted Indian lake (7).
1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399
2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
4. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73, 5261–5267 (2007).
5. Soergel DAW, Dey N, Knight R, Brenner SE: Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. The ISME Journal, 6, 1440–1444 (2012).
6. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
7. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014).
We’re approaching Christmas, and this year I will try to spend lots of time with my family and less time at the computer. We’ll see how that goes, but all in all it means that I will most likely not respond promptly to e-mails until after New Year’s, maybe not until January 8 or 9. If, for example, you have asked a support question and have not received a response before January 12, 2015, then please feel free to re-send your e-mail, as by then I should at least have replied that I cannot solve your issue quickly.
A further note for the future is that I will be on parental leave with my lovely nine-month-old during the entire spring, so answering e-mails will not be my highest priority, and might be neglected entirely at times. I apologize for any inconvenience this might cause, especially for Metaxa, ITSx and Megraft users.
Merry Christmas and a Happy New Year!
An update to Metaxa2 that has long remained in internal testing has been deemed bug-free (as far as we can tell) and has been uploaded to the Metaxa2 web site. The update brings a slightly improved classifier, and is the first release that we declare fully stable, although we have found no problems with the previously available version (release candidate 3). This also means that we take a jump directly from version 2.0, release candidate 3 to version 2.0.1 without passing a final 2.0 release. The update is available here.
I would like to sincerely apologize for having been terrible at responding to support issues pertaining to ITSx, Metaxa, Atosh etc. lately. I am currently on 50% parental leave and at the same time I am wrapping up three first-author papers, organizing a workshop and preparing a talk. Thus, support issues have been lagging a bit behind over the last few weeks so that I could cope with everything else. I have been ticking off most (all?) of my support questions the last couple of days, but if there are any remaining issues that I have failed to reply to, please re-send them to me!
I will try to improve response times, but it is hard when I am working less than usual (also, note that I (strangely) don’t get paid for supporting software, so I have to do this in my “spare time”). My aim is to respond within a few days, so if I have not done so, please resend your e-mail with a friendly reminder that you are waiting for my response. Reminding me will very likely move your question up the priority pile.
So, my advice to dads-to-be is: Do take parental leave. Do take a lot of it. Share responsibilities with your partner. Because what you get back is awesome. (And also you get a good reason not to answer support questions in time.) But finally, don’t plan to wrap up the last couple of years’ worth of work and arrange a conference at the same time as you take parental leave. That will only make you feel inadequate on all fronts.
Keep the spirit high!
The new version of Metaxa – Metaxa2 – which I first started talking about more than 1.5 years ago, has finally been determined to be so stable that we can officially release it! The release comes around the same time as we submitted a paper describing the changes in it, but I will briefly go through the changes here:
- Metaxa2 now handles extraction and classification of LSU rRNA sequences in addition to SSU rRNA
- The classification engine has been completely redesigned, and now enables accurate taxonomic classifications down to the genus – or in some cases – species level
- The classification database has been updated, and is now based on the SILVA 111 release
- The Metaxa2 Taxonomic Traversal Tool – metaxa2_ttt – has been added to the package, to ease the counting of rRNA sequences in different organism groups (at various taxonomic levels)
- Metaxa2 adds support for paired-end libraries
- It is now possible to directly input sequences in FASTQ format to Metaxa2
- The support for libraries with short read lengths (~100 bp) has been vastly improved (and short reads are now assumed by the default settings)
- Metaxa2 can do quality pre-filtering of reads in FASTQ-format
- Metaxa2 adds support for the modern BLAST+ package (although the old blastall version is still default)
- Metaxa2 is compatible with the HMMER 3.1 beta
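To give a feel for what counting rRNA sequences in organism groups at various taxonomic levels means, here is a minimal Python sketch. It is not how metaxa2_ttt is implemented, and it does not read Metaxa2 output files. It assumes, purely for illustration, that each classified sequence comes with a semicolon-separated lineage string; the function name `count_at_level` and the “Unclassified” label are made up.

```python
from collections import Counter
from typing import Iterable


def count_at_level(lineages: Iterable[str], level: int) -> Counter:
    """Count sequences per group, truncating each lineage to `level` ranks.

    Level 1 is the top rank. Lineages that stop before the requested level
    are counted under their known ranks followed by "Unclassified".
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    counts: Counter = Counter()
    for lineage in lineages:
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        if len(ranks) < level:
            ranks = ranks + ["Unclassified"]
        counts[";".join(ranks[:level])] += 1
    return counts


# Hypothetical lineage strings, one per classified sequence.
example = [
    "Bacteria;Proteobacteria;Gammaproteobacteria",
    "Bacteria;Proteobacteria;Alphaproteobacteria",
    "Bacteria;Firmicutes",
    "Archaea",
]
for level in (1, 2):
    print(f"level {level}: {dict(count_at_level(example, level))}")
```

The same list of sequences gives a coarse picture at level 1 and a finer one at level 2, which is the kind of summary that is useful when comparing samples.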
Metaxa2 brings together a large set of features that we have been gradually incorporating since 2011, many of which have been dependent on each other. Most of the new features and changes are thoroughly explained in the manual. While we hope Metaxa2 is bug free, there will likely be bugs caused by usage scenarios we have not envisioned. I therefore encourage anyone who comes across some unexpected behavior to send me an e-mail. In particular, I would like to know how the software performs using HMMER 3.1 and BLAST+, where testing has been limited compared to older parts of the code.
We hope that you will find Metaxa2 useful, and that it will bring taxonomic assessment of metagenomes another step forward! Metaxa2 can be downloaded here.
A new year has begun, and it brings with it a few updates on the website. I have added a summary of the year 2013 from my perspective, and (as you may recognize) updated my picture on the front page. Briefly, this year will bring lots of exciting stuff. Personally, I am quite excited to finally be able to share the new version of Metaxa – Metaxa2 – which will be released to the public late this winter (or early spring). Additionally, I look forward to wrapping up some manuscripts on metagenomics and antibiotic resistance, which I have been working on for more than 2.5 years now. Also, we look forward to some super-interesting technology developments in DNA sequencing, with PacBio finally finding proper usage scenarios, nanopore sequencing around the corner, and super-multiplexing on the Illumina instruments. We’re in for a treat with DNA sequencing in 2014!
As you might be aware, a new version of HMMER has been out since late May. You might wonder how Metaxa (relying on HMMER3) will work if you update to the new version of HMMER, and I have finally got around to testing it! The answer, according to my somewhat limited testing, is that Metaxa 1.1.2 seems to be working fine with HMMER 3.1.
You might need to go into the database directory (“metaxa_db”; should be located in the same directory as the Metaxa binaries), and remove all the files ending with the suffixes .h3f, .h3i, .h3m and .h3p inside the “HMMs” directory. On most installations, this should not be necessary. Myself, I just plugged HMMER 3.1 in and started Metaxa, but if you get error messages complaining that “Error: bad format, binary auxfiles, binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again”, then you should try removing the files and re-running Metaxa. This might especially be a problem on older Metaxa versions. [Update: Note that this fix will likely not work with ITSx!]
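For those who prefer not to delete the files by hand, the clean-up described above could be sketched in Python roughly as follows. This is only an illustration and not part of Metaxa; the function name `remove_pressed_hmm_files` is made up, and the path in the usage note is a placeholder for wherever your “metaxa_db” directory is located. By default the function only reports which files it would remove.

```python
from pathlib import Path
from typing import List, Union

# The four suffixes of the binary auxiliary files named in the text above.
PRESSED_SUFFIXES = {".h3f", ".h3i", ".h3m", ".h3p"}


def remove_pressed_hmm_files(hmm_dir: Union[str, Path], dry_run: bool = True) -> List[Path]:
    """Find (and optionally delete) binary auxiliary HMM files in hmm_dir.

    With dry_run=True nothing is deleted; the matching paths are only returned.
    """
    hmm_dir = Path(hmm_dir)
    targets = sorted(
        path
        for path in hmm_dir.iterdir()
        if path.is_file() and path.suffix in PRESSED_SUFFIXES
    )
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

A cautious way to use it would be to first call `remove_pressed_hmm_files("/path/to/metaxa_db/HMMs")` and inspect the returned list, and only then call it again with `dry_run=False`. The HMM files themselves are left untouched, since only the four listed suffixes are matched.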
Bear in mind that I have not run thorough testing on Metaxa and HMMER 3.1, and probably won’t for the 1.1.2 version, since there’s a 2.0 version waiting just around the corner…
Additionally, if you experience problems with Megraft, you should try the same fix as for Metaxa, but with the Megraft database directory instead. Regarding ITSx, a minor update will be released very soon, which will also address HMMER 3.1b compatibility. [Update: See this post for how to work around HMMER 3.1 problems with ITSx.]
Happy barcoding everyone!