I just uploaded a mini update to ITSx, fixing a bug that caused the
--truncate option not to be accepted by the software in ITSx 1.1. This bug fix brings the software to version 1.1.1. No other changes have been introduced in this version. Download the update here. Happy barcoding!
In the 2019 database issue, Nucleic Acids Research will include a new paper on the UNITE database for molecular identification of fungi (1). I have been involved in the development of UNITE in different ways since 2012, most prominently via the ITSx (2) and Atosh software which are ticking under the hood of the database.
In this update paper, we introduce a redesigned handling of unclassifiable species hypotheses, integration with the taxonomic backbone of the Global Biodiversity Information Facility, and support for an unlimited number of parallel taxonomic classification systems. The database now contains around one million fungal ITS sequences that can be used for reference, which are clustered into roughly 459,000 species hypotheses (3). Each species hypothesis is assigned a digital object identifier (DOI), which enables unambiguous reference across studies. The paper is available as open access and the UNITE database is available open source from here.
- Nilsson RH, Larsson K-H, Taylor AFS, Bengtsson-Palme J, Jeppesen TS, Schigel D, Kennedy P, Picard K, Glöckner FO, Tedersoo L, Saar I, Kõljalg U, Abarenkov K: The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Research, Advance article, gky1022 (2018). doi: 10.1093/nar/gky1022
- Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, De Wit P, Sánchez-García M, Ebersberger I, de Souza F, Amend AS, Jumpponen A, Unterseher M, Kristiansson E, Abarenkov K, Bertrand YJK, Sanli K, Eriksson KM, Vik U, Veldre V, Nilsson RH: Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for use in environmental sequencing. Methods in Ecology and Evolution, 4, 10, 914–919 (2013). doi: 10.1111/2041-210X.12073
- Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Molecular Ecology, 22, 21, 5271–5277 (2013). doi: 10.1111/mec.12481
Mattias de Hollander at the Netherlands Institute of Ecology kindly informed me that they recently added the ITSx 1.1b version to the Bioconda package manager. This will make it easy for Conda users to install ITSx automatically into their systems and pipelines and also for others who are using conda. The Bioconda version can be found here. I would like to thank Mattias for this initiative and hope that the Bioconda version of ITSx will find much use!
Today, I am very happy to announce that after years in the making and months in testing, the next generation of ITSx, version 1.1, is ready to step into the public light and scrutiny. I have today uploaded a public beta version of the ITSx 1.1 release, which I encourage everyone that have enjoyed using ITSx to try out.
The 1.1 release of ITSx includes a wide range of new feature, including:
- A 2-10x performance increase (depending on the dataset), since ITSx now utilizes hmmsearch instead of hmmscan to detect the ITS regions and distributes the CPU cores better
- Improved ITS detection among fungi and chlorophyta, by addition of new HMM-profiles
- The HMM profile format for ITSx has been updated to HMMER3/f (thus ITSx now requires HMMER version 3.1 or later)
- Better handling of interrupted HMMER searches
- Added the
--require_anchoroption to only include sequences where the complete anchor is found in the output
- Added the possibility for partial sequence output for the SSU, LSU and 5.8S regions
- Fixed a bug causing problems when reading sequence data from standard input
A lot of the code has changed in this version, which means that there might still be bugs lingering in the program. Since I will be on vacation throughout July, I encourage everyone to submit bug reports and questions, but I will not promise to respond to them until in August.
I hope that you will enjoy this new ITSx release, which you can download here. Happy barcoding!
From today, I will shut down my activities on the website and over mail to spend the Christmas holidays with my family. I will likely not read e-mails until the first week of January, and as I might then have a large pile of mail to go through, please re-send any messages to me after January 3.
I apologise to everyone who might have outstanding support issues for Metaxa2 or ITSx. If you feel that I have neglected your e-mail, please re-send it after January 3 as well, to make sure that I have not missed it.
I wish you all a very Merry Christmas and a Happy new year!
Today marks the five year anniversary for the Metaxa software’s initial release. Much has happened to the software since; Metaxa started off as an rRNA extraction utility for metagenomic data (1), including coarse classification to organism/organelle type. Since it has gained full-scale taxonomic classification ability better or on par with other software packages (2), much greater speed, support for the LSU gene, gained a range of related software tools (3), and spurred development of other tools such as ITSx (4). I have also been involved in no less than four peer-reviewed publications directly related to the software (1-3,5).
But it does not end here; these five years were just the beginning. We are – in different constellations – working on further enhancements to Metaxa2, including support for more genes, an updated classification database, and better customization options. I am very much still devoted to keep Metaxa2 alive and relevant as a tool for taxonomic analysis of metagenomes, applicable whenever accuracy is a key parameter. Thanks for being part of the community for these five years!
- Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet G, Abarenkov K, Petri A, Alm Rosenblad M, Nilsson RH: Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100, 3, 471–475 (2011). doi:10.1007/s10482-011-9598-6. [Paper link]
- Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]
- Bengtsson-Palme J, Thorell K, Wurzbacher C, Sjöling Å, Nilsson RH: Metaxa2 Diversity Tools: Easing microbial community analysis with Metaxa2. Ecological Informatics, 33, 45–50 (2016). doi: 10.1016/j.ecoinf.2016.04.004 [Paper link]
- Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, De Wit P, Sánchez-García M, Ebersberger I, de Souza F, Amend AS, Jumpponen A, Unterseher M, Kristiansson E, Abarenkov K, Bertrand YJK, Sanli K, Eriksson KM, Vik U, Veldre V, Nilsson RH: Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for use in environmental sequencing. Methods in Ecology and Evolution, 4, 10, 914–919 (2013). doi: 10.1111/2041-210X.12073 [Paper link]
- Bengtsson-Palme J, Hartmann M, Eriksson KM, Nilsson RH: Metaxa, overview. In:Nelson K. (Ed.) Encyclopedia of Metagenomics: SpringerReference (www.springerreference.com). Springer-Verlag Berlin Heidelberg (2013). doi: 10.1007/978-1-4614-6418-1_239-6 [Link]
This week is the first of my long summer break, and I will be on vacation until mid-August. This means that I will only read mail sporadically, if at all. For very urgent issues, please give me a call or send me an sms, and I will attend to your message as soon as possible (this of course only applies to those of you who have my number in the first place).
For support questions, there are a few options:
- For questions regarding Metaxa or Metaxa2, please add “METAXA” to the beginning of the subject line of the e-mail.
- For questions regarding ITSx, please add “ITSX” to the beginning of the subject line.
- For other support questions, please add “SUPPORT” to the beginning of the subject line.
This way I can easily assess which mails that are urgent to reply to. Don’t add “IMPORANT” or “URGENT” since that will just invoke the spam filter.
I wish you all a very very great summer!
Some of you who think ITSx is running slowly despite being assigned multiple CPUs, particularly on datasets with only one kind of sequences (e.g. fungal) using the
-t F option might be interested in trying out Andrew Krohn’s parallel ITSx implementation. The solution essentially employs a bash script spawning multiple ITSx instances running on different portions of the input file. Although there are some limitations to the script (e.g. you cannot select a custom name for the output and you will only get the ITS1 and ITS2 + full sequences FASTA files, as far as I understand the script), it may prove useful for many of you until we write up a proper solution to the poor multi-thread performance of ITSx (planned for version 1.1). In the coming months, I recommend that you check this solution out! See also the wiki documentation.
My speed tests shows the following (on a quite small test set of fungal ITS sequences):
ITSx parallel on 16 CPUs, all ITS types (option “
3 min, 16 sec
ITSx parallel on 16 CPUs, only fungal ITS types (option “
ITSx native on 16 CPUs, all ITS types (options “
-t all --cpu 16“):
4 min, 59 sec
ITSx native on 16 CPUs, only fungal types (options “
-t f --cpu 16“):
5 min, 50 sec
Why fungal only took longer time in the native implementation is a mystery to me, but probably shows why there is a need to rewrite the multithreading code, as we did with Metaxa a couple of years ago. Stay tuned for ITSx updates!
We’re approaching Christmas, and this year I will try to spend lots of time with my family and less time at the computer. We’ll see how that goes, but all in all it means that I will most likely not respond promptly to e-mails until after New Year’s, maybe not until January 8 or 9. If, for example, you have asked a support question and have not received a response before January 12 2015, then please feel free to re-send your e-mail as I should then at least have replied that I cannot solve your issue quickly.
A further note for the future is that I will be on parental leave with my lovely nine-month-old during the entire spring, so answering e-mails will not be my highest priority, and might be neglected entirely in periods. I apologize for all kinds of inconveniences that this might cause, especially for Metaxa, ITSx and Megraft users.
Merry Christmas and a Happy New Year!
A minor bug in the “its1.full_and_partial.fasta” file has been fixed in a minor update to ITSx (1.0.11) released to day. The bug occasionally caused newline characters at the end of a sequence to be skipped and the next entry to begin at the same row. The bug only manifested itself when ITSx was used with the
--partial option and only in the above mentioned FASTA file. If you have been affected by the bug, you should have noticed as the resulting FASTA file would be considered corrupted by most bioinformatics software. The updated version of ITSx can be downloaded here.