A third-party parallel ITSx implementation
Some of you who think ITSx is running slowly despite being assigned multiple CPUs, particularly on datasets with only one kind of sequences (e.g. fungal) using the
-t F option might be interested in trying out Andrew Krohn’s parallel ITSx implementation. The solution essentially employs a bash script spawning multiple ITSx instances running on different portions of the input file. Although there are some limitations to the script (e.g. you cannot select a custom name for the output and you will only get the ITS1 and ITS2 + full sequences FASTA files, as far as I understand the script), it may prove useful for many of you until we write up a proper solution to the poor multi-thread performance of ITSx (planned for version 1.1). In the coming months, I recommend that you check this solution out! See also the wiki documentation.
My speed tests shows the following (on a quite small test set of fungal ITS sequences):
ITSx parallel on 16 CPUs, all ITS types (option “
3 min, 16 sec
ITSx parallel on 16 CPUs, only fungal ITS types (option “
ITSx native on 16 CPUs, all ITS types (options “
-t all --cpu 16“):
4 min, 59 sec
ITSx native on 16 CPUs, only fungal types (options “
-t f --cpu 16“):
5 min, 50 sec
Why fungal only took longer time in the native implementation is a mystery to me, but probably shows why there is a need to rewrite the multithreading code, as we did with Metaxa a couple of years ago. Stay tuned for ITSx updates!