Last week, I was informed by an ITSx user that the software behaved strangely when input files containing extremely long sequence identifiers were used. The bug is not likely to have affected a majority of users, but in any case it is now fixed, and ITSx can now handle sequence identifiers of any length. The new update brings ITSx to version 1.0.7, and it can be downloaded here. Happy barcoding!
A user informed me of unexpected behavior regarding potentially chimeric sequences in ITSx, and indeed it turned out to contain a bug that over-reported potential chimeras. This bug is totally unrelated to the new version released this week, and exists in all prior ITSx versions. I strongly encourage everyone to update to ITSx 1.0.6.
I would also like to underscore that ITSx is not a chimera-checker. It detects when sequences look unusual, but all such cases should be further investigated. If you follow this practice, you will see that in some cases ITSx might have over-reported chimeras, and in some instances it will have been correct in its suspicions (and thereby you would be largely unaffected by this bug).
I am on a roll pushing out new software these days, an here’s the latest addition. This version of ITSx was finished up last month and seems to be stable enough for consumption by the users. Version 1.0.5 adds a new option: “
--anchor” which enables extraction of regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of a number of bases at each end, e.g. “
--anchor 30” to get 30 bp before and after each ITS region, or all bases matching the corresponding HMM, by specifying “
--anchor HMM“. The update can be downloaded here.
An ITSx user yesterday made me aware of an information-problem (thanks Suzanne!) regarding the use of ITSx in combination with the HMMER 3.1 beta. I have not been entirely clear on why you might get the “Error: bad format, binary auxfiles, (…) binary auxfiles are in an outdated HMMER format (3/b); please hmmpress your HMM file again” error message when running ITSx with HMMER 3.1 installed. You might think that following the instructions for Metaxa might do the trick. As you will notice, however, it will not. Instead you will be presented with the following error message: “Error: Failed to open binary auxfiles”. This is because while Metaxa 1.1.2 will re-create the HMM-files if needed, ITSx does not. Instead, ITSx has the option
"--reset T" which can be added to the command line to recreate the HMM-files for the current HMMER version installed (regardless of which 3.x version).
Thus, the solution for the “bad format, binary auxfiles” error is to simply add
"--reset T" (without quotes) to the ITSx command line and run the software again. You only need to do this once, unless you update HMMER and/or get the same error message again for some other reason. The Metaxa-post has been updated to clarify this as well.
Over the weekend, I’ve been able to finish off some stuff that has been stuck on my todo-list. Among these was to finish up the pieces of the ITSx update we put in the hands of our users today. This update brings three requested features, and a fix for an extremely rarely occurring bug:
- If the “–not_found T” option is used, ITSx now outputs both a list and a FASTA file of entries in the input file that did not have any ITS regions detected in them. This was a user requested feature, and a very nice an easily implemented one.
- As mentioned in a previous blog post, ITSx has up until now not been able to preserve the sequence headers of the input file. In hindsight, such an option would have been obvious to include, and as of version 1.0.4 ITSx comes with a “‘–preserve” option that allows headers to be carried over to all the output files.
- ITSx is now better at handling certain chimeric sequences.
In addition, there was a minor bug that very rarely (I have only seen one such example) that could cause the ITS region to be reported with negative lengths. This issue has now been fixed.
This update brings ITSx to version 1.0.4, and it can be downloaded here.
I am happy to inform you that our paper on ITSx now is out online in Methods in Ecology and Evolution issue 4.10. Meanwhile, I am slowly getting my stuff together on an update that will bring some minor requested features. The publication brings the proper citation of the ITSx paper to be:
Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre, V., Nilsson, R. H. (2013), Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution, 4: 914–919. doi: 10.1111/2041-210X.12073
An ITSx user (thanks a lot, you know who you are), brought my attention to an issue that might be important to many others as well. For the reason of me overlooking a potentially very useful feature, ITSx does not have an option to preserve the sequence header in the output files. This can of course be very inconvenient in some applications, and will be addressed in an upcoming version. However, I don’t have the time at this very moment to implement and test such a feature for a new ITSx release. Instead, until I get time to do that, I have provided a little Perl-script that can take the headers of the original input file, and copy them to the output file. The script can be downloaded here. (You might have to right click the link to get the script.)
The syntax of the script is:
perl restore_headers.pl <original file> <ITSx output file in FASTA format> [<additional ITSx output files in FASTA format>]
The script saves the files with the old headers in the same directory as the ITSx output files, with the additional suffix “.restored.fasta”. I hope those of you who are missing this ITSx feature will find it useful.
Our paper on the most recent developments of the UNITE database for fungal rDNA ITS sequences has just been published as an Early view article in Molecular Ecology. In this paper, we aim to ease two of the major problems facing the identification of newly generated fungal ITS sequences: the lack of a sufficiently goof reference dataset, and the lack of a way to refer to fungal species without a Latin name. As part of a solution, we have introduced the term species hypothesis for all fungal species represented by at least two ITS sequences. The UNITE database has an easy-to-use web-based sequence management system, and we encourage everybody that can improve on the annotations or metadata of a fungal lineage to do so.
My main contribution on this paper has been to tailor ITSx functionality for the UNITE database, so that ITS data could be more easily processed for the Species Hypotheses.
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Accepted in Molecular Ecology. doi: 10.1111/mec.12481 [Paper link]
An ITSx user informed me a couple of days ago of an issue that caused ITSx to sometimes accidentally remove the HMM-files in the database when multiple ITSx jobs were run in parallel. Although this issue should be relatively rare, it was also very easy to fix. Therefore, we already push out a new version of ITSx (1.0.3), which is available for download here.
In short, the bug was introduced because I overlooked this usage scenario when fixing another bug related to the HMM-files in an earlier pre-release. Let’s keep our fingers crossed that version 1.0.3 will be more long-lived than 1.0.2!
The paper describing our software tool ITSx has now gone online as an Early View paper on the Methods in Ecology and Evolution website. The software just recently left its beta-status behind, and with the paper out as well, we hope that as many people as possible will find use for the software in barcoding efforts of the ITS region. If you’re not familiar with the software – or its predecessor; the fungal ITS Extractor – here is a brief description of what it does:
ITSx is a Perl-based software tool that extracts the ITS1, 5.8S and ITS2 sequences – as well as full-length ITS sequences – from high-throughput sequencing data sets. To achieve this, we use carefully crafted hidden Markov models (HMMs), computed from large alignments of a total of 20 groups of eukaryotes. Testing has shown that ITSx has close to 100% detection accuracy, and virtually zero false-positive extractions. Additionally, it supports multiple processor cores, and is therefore suitable for running also on very large datasets. It is also able to eliminate non-ITS sequences from a given input dataset.
While ITSx supports extractions of ITS sequences from at least 20 different eukaryotic lineages, we ourselves have considerably less experience with many of the eukaryote groups outside of the fungi. We therefore release ITSx with the intent that the research community will evaluate its performance also in other parts of the eukaryote tree, and if necessary contribute data required to address also those lineages in a thorough way.
The ITSx paper can at the moment be cited as:
Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre, V., Nilsson, R. H. (2013), Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution. doi: 10.1111/2041-210X.12073