I am on a roll pushing out new software these days, an here’s the latest addition. This version of ITSx was finished up last month and seems to be stable enough for consumption by the users. Version 1.0.5 adds a new option: “
--anchor” which enables extraction of regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of a number of bases at each end, e.g. “
--anchor 30” to get 30 bp before and after each ITS region, or all bases matching the corresponding HMM, by specifying “
--anchor HMM“. The update can be downloaded here.
An ITSx user yesterday made me aware of an information-problem (thanks Suzanne!) regarding the use of ITSx in combination with the HMMER 3.1 beta. I have not been entirely clear on why you might get the “Error: bad format, binary auxfiles, (…) binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again” error message when running ITSx with HMMER 3.1 installed. You might think that following the instructions for Metaxa might do the trick. As you will notice, however, it will not. Instead you will be presented with the following error message: “Error: Failed to open binary auxfiles”. This is because while Metaxa 1.1.2 will re-create the HMM-files if needed, ITSx does not. Instead, ITSx has the option
"--reset T" which can be added to the command line to recreate the HMM-files for the current HMMER version installed (regardless of which 3.x version).
Thus, the solution for the “bad format, binary auxfiles” error is to simply add
"--reset T" (without quotes) to the ITSx command line and run the software again. You only need to do this once, unless you update HMMER and/or get the same error message again for some other reason. The Metaxa-post has been updated to clarify this as well.
Over the weekend, I’ve been able to finish off some stuff that has been stuck on my todo-list. Among these was to finish up the pieces of the ITSx update we put in the hands of our users today. This update brings three requested features, and a fix for an extremely rarely occurring bug:
- If the “–not_found T” option is used, ITSx now outputs both a list and a FASTA file of entries in the input file that did not have any ITS regions detected in them. This was a user requested feature, and a very nice an easily implemented one.
- As mentioned in a previous blog post, ITSx has up until now not been able to preserve the sequence headers of the input file. In hindsight, such an option would have been obvious to include, and as of version 1.0.4 ITSx comes with a “‘–preserve” option that allows headers to be carried over to all the output files.
- ITSx is now better at handling certain chimeric sequences.
In addition, there was a minor bug that very rarely (I have only seen one such example) that could cause the ITS region to be reported with negative lengths. This issue has now been fixed.
This update brings ITSx to version 1.0.4, and it can be downloaded here.
I am happy to inform you that our paper on ITSx now is out online in Methods in Ecology and Evolution issue 4.10. Meanwhile, I am slowly getting my stuff together on an update that will bring some minor requested features. The publication brings the proper citation of the ITSx paper to be:
Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre, V., Nilsson, R. H. (2013), Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution, 4: 914–919. doi: 10.1111/2041-210X.12073
An ITSx user (thanks a lot, you know who you are), brought my attention to an issue that might be important to many others as well. For the reason of me overlooking a potentially very useful feature, ITSx does not have an option to preserve the sequence header in the output files. This can of course be very inconvenient in some applications, and will be addressed in an upcoming version. However, I don’t have the time at this very moment to implement and test such a feature for a new ITSx release. Instead, until I get time to do that, I have provided a little Perl-script that can take the headers of the original input file, and copy them to the output file. The script can be downloaded here. (You might have to right click the link to get the script.)
The syntax of the script is:
perl restore_headers.pl <original file> <ITSx output file in FASTA format> [<additional ITSx output files in FASTA format>]
The script saves the files with the old headers in the same directory as the ITSx output files, with the additional suffix “.restored.fasta”. I hope those of you who are missing this ITSx feature will find it useful.
Our paper on the most recent developments of the UNITE database for fungal rDNA ITS sequences has just been published as an Early view article in Molecular Ecology. In this paper, we aim to ease two of the major problems facing the identification of newly generated fungal ITS sequences: the lack of a sufficiently goof reference dataset, and the lack of a way to refer to fungal species without a Latin name. As part of a solution, we have introduced the term species hypothesis for all fungal species represented by at least two ITS sequences. The UNITE database has an easy-to-use web-based sequence management system, and we encourage everybody that can improve on the annotations or metadata of a fungal lineage to do so.
My main contribution on this paper has been to tailor ITSx functionality for the UNITE database, so that ITS data could be more easily processed for the Species Hypotheses.
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Accepted in Molecular Ecology. doi: 10.1111/mec.12481 [Paper link]
An ITSx user informed me a couple of days ago of an issue that caused ITSx to sometimes accidentally remove the HMM-files in the database when multiple ITSx jobs were run in parallel. Although this issue should be relatively rare, it was also very easy to fix. Therefore, we already push out a new version of ITSx (1.0.3), which is available for download here.
In short, the bug was introduced because I overlooked this usage scenario when fixing another bug related to the HMM-files in an earlier pre-release. Let’s keep our fingers crossed that version 1.0.3 will be more long-lived than 1.0.2!
The paper describing our software tool ITSx has now gone online as an Early View paper on the Methods in Ecology and Evolution website. The software just recently left its beta-status behind, and with the paper out as well, we hope that as many people as possible will find use for the software in barcoding efforts of the ITS region. If you’re not familiar with the software – or its predecessor; the fungal ITS Extractor – here is a brief description of what it does:
ITSx is a Perl-based software tool that extracts the ITS1, 5.8S and ITS2 sequences – as well as full-length ITS sequences – from high-throughput sequencing data sets. To achieve this, we use carefully crafted hidden Markov models (HMMs), computed from large alignments of a total of 20 groups of eukaryotes. Testing has shown that ITSx has close to 100% detection accuracy, and virtually zero false-positive extractions. Additionally, it supports multiple processor cores, and is therefore suitable for running also on very large datasets. It is also able to eliminate non-ITS sequences from a given input dataset.
While ITSx supports extractions of ITS sequences from at least 20 different eukaryotic lineages, we ourselves have considerably less experience with many of the eukaryote groups outside of the fungi. We therefore release ITSx with the intent that the research community will evaluate its performance also in other parts of the eukaryote tree, and if necessary contribute data required to address also those lineages in a thorough way.
The ITSx paper can at the moment be cited as:
Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre, V., Nilsson, R. H. (2013), Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution. doi: 10.1111/2041-210X.12073
First of all, ITSx is now taken out of beta and is now considered ready for production use. We do no longer find any bugs in it, and since there’s now a wide range of people already using it for various purposes, we feel confident that any significant bugs would have been unraveled by now.
Secondly, I have also added support for the new HMMER version (3.1b) released in May in this version of ITSx. So you can now go ahead and install HMMER 3.1 if you want to try out the new HMMER beta and still be able to use ITSx.
Finally, we have also updated the manual somewhat, hopefully making it a little easier to use ITSx for a first-time user.
Version 1.0.2 of ITSx can be downloaded from here. As previously, you may still report any bugs, strange behaviors, ideas for new features, or inconsistencies with certain lineages, by mailing to “itsx” at this domain name.
As you might be aware, a new version of HMMER is out since late May. You might wonder how Metaxa (relying on HMMER3) will work if you update to the new version of HMMER, and I have finally got around to test it! The answer, according to my somewhat limited testing, is that Metaxa 1.1.2 seems to be working fine with HMMER 3.1.
You might need to go into the database directory (“metaxa_db”; should be located in the same directory as the Metaxa binaries), and remove all the files ending with suffixes .h3f .h3i .h3m and .h3p inside the “HMMs” directory. On most installation, this should not be necessary. Myself, I just plugged HMMER 3.1 in and started Metaxa, but if you get error messages complaining that “Error: bad format, binary auxfiles,
binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again”, then you should try removing the files and re-running Metaxa. This might especially be a problem on older Metaxa versions. [Update: Note that this fix will likely not work with ITSx!]
Bear in mind that I have not run thorough testing on Metaxa and HMMER 3.1, and probably won’t for the 1.1.2 version, since there’s a 2.0 version waiting just around the corner…
Additionally, if you experience problems with Megraft, you should try the same fix as for Metaxa, but with the Megraft database directory instead. Regarding ITSx, a minor update will be released very soon, which also will address HMMER 3.1b compatibility. [Update: See this post for how to work around HMMER 3.1 problems with ITSx.]
Happy barcoding everyone!