Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery

Browsing Posts tagged ITSx

ITSx in Bioconda

Comments off

Mattias de Hollander at the Netherlands Institute of Ecology kindly informed me that they recently added the ITSx 1.1b version to the Bioconda package manager. This will make it easy for Conda users to install ITSx automatically into their systems and pipelines and also for others who are using conda. The Bioconda version can be found here. I would like to thank Mattias for this initiative and hope that the Bioconda version of ITSx will find much use!

Today, I am very happy to announce that after years in the making and months in testing, the next generation of ITSx, version 1.1, is ready to step into the public light and scrutiny. I have today uploaded a public beta version of the ITSx 1.1 release, which I encourage everyone that have enjoyed using ITSx to try out.

The 1.1 release of ITSx includes a wide range of new feature, including:

  • A 2-10x performance increase (depending on the dataset), since ITSx now utilizes hmmsearch instead of hmmscan to detect the ITS regions and distributes the CPU cores better
  • Improved ITS detection among fungi and chlorophyta, by addition of new HMM-profiles
  • The HMM profile format for ITSx has been updated to HMMER3/f (thus ITSx now requires HMMER version 3.1 or later)
  • Better handling of interrupted HMMER searches
  • Added the --require_anchor option to only include sequences where the complete anchor is found in the output
  • Added the possibility for partial sequence output for the SSU, LSU and 5.8S regions
  • Fixed a bug causing problems when reading sequence data from standard input

A lot of the code has changed in this version, which means that there might still be bugs lingering in the program. Since I will be on vacation throughout July, I encourage everyone to submit bug reports and questions, but I will not promise to respond to them until in August.

I hope that you will enjoy this new ITSx release, which you can download here. Happy barcoding!

Merry Christmas

Comments off

From today, I will shut down my activities on the website and over mail to spend the Christmas holidays with my family. I will likely not read e-mails until the first week of January, and as I might then have a large pile of mail to go through, please re-send any messages to me after January 3.

I apologise to everyone who might have outstanding support issues for Metaxa2 or ITSx. If you feel that I have neglected your e-mail, please re-send it after January 3 as well, to make sure that I have not missed it.

I wish you all a very Merry Christmas and a Happy new year!

Today marks the five year anniversary for the Metaxa software’s initial release. Much has happened to the software since; Metaxa started off as an rRNA extraction utility for metagenomic data (1), including coarse classification to organism/organelle type. Since it has gained full-scale taxonomic classification ability better or on par with other software packages (2), much greater speed, support for the LSU gene, gained a range of related software tools (3), and spurred development of other tools such as ITSx (4). I have also been involved in no less than four peer-reviewed publications directly related to the software (1-3,5).

But it does not end here; these five years were just the beginning. We are – in different constellations – working on further enhancements to Metaxa2, including support for more genes, an updated classification database, and better customization options. I am very much still devoted to keep Metaxa2 alive and relevant as a tool for taxonomic analysis of metagenomes, applicable whenever accuracy is a key parameter. Thanks for being part of the community for these five years!

References

  1. Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet G, Abarenkov K, Petri A, Alm Rosenblad M, Nilsson RH: Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100, 3, 471–475 (2011). doi:10.1007/s10482-011-9598-6. [Paper link]
  2. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]
  3. Bengtsson-Palme J, Thorell K, Wurzbacher C, Sjöling Å, Nilsson RH: Metaxa2 Diversity Tools: Easing microbial community analysis with Metaxa2. Ecological Informatics, 33, 45–50 (2016). doi: 10.1016/j.ecoinf.2016.04.004 [Paper link]
  4. Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, De Wit P, Sánchez-García M, Ebersberger I, de Souza F, Amend AS, Jumpponen A, Unterseher M, Kristiansson E, Abarenkov K, Bertrand YJK, Sanli K, Eriksson KM, Vik U, Veldre V, Nilsson RH: Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for use in environmental sequencing. Methods in Ecology and Evolution, 4, 10, 914–919 (2013). doi: 10.1111/2041-210X.12073 [Paper link]
  5. Bengtsson-Palme J, Hartmann M, Eriksson KM, Nilsson RH: Metaxa, overview. In:Nelson K. (Ed.) Encyclopedia of Metagenomics: SpringerReference (www.springerreference.com). Springer-Verlag Berlin Heidelberg (2013). doi: 10.1007/978-1-4614-6418-1_239-6 [Link]

Vacation

Comments off

This week is the first of my long summer break, and I will be on vacation until mid-August. This means that I will only read mail sporadically, if at all. For very urgent issues, please give me a call or send me an sms, and I will attend to your message as soon as possible (this of course only applies to those of you who have my number in the first place).

For support questions, there are a few options:

  • For questions regarding Metaxa or Metaxa2, please add “METAXA” to the beginning of the subject line of the e-mail.
  • For questions regarding ITSx, please add “ITSX” to the beginning of the subject line.
  • For other support questions, please add “SUPPORT” to the beginning of the subject line.

This way I can easily assess which mails that are urgent to reply to. Don’t add “IMPORANT” or “URGENT” since that will just invoke the spam filter.

I wish you all a very very great summer!

Some of you who think ITSx is running slowly despite being assigned multiple CPUs, particularly on datasets with only one kind of sequences (e.g. fungal) using the -t F option might be interested in trying out Andrew Krohn’s parallel ITSx implementation. The solution essentially employs a bash script spawning multiple ITSx instances running on different portions of the input file. Although there are some limitations to the script (e.g. you cannot select a custom name for the output and you will only get the ITS1 and ITS2 + full sequences FASTA files, as far as I understand the script), it may prove useful for many of you until we write up a proper solution to the poor multi-thread performance of ITSx (planned for version 1.1). In the coming months, I recommend that you check this solution out! See also the wiki documentation.

My speed tests shows the following (on a quite small test set of fungal ITS sequences):
ITSx parallel on 16 CPUs, all ITS types (option “-t all“):
3 min, 16 sec
ITSx parallel on 16 CPUs, only fungal ITS types (option “-t f“):
54 sec
ITSx native on 16 CPUs, all ITS types (options “-t all --cpu 16“):
4 min, 59 sec
ITSx native on 16 CPUs, only fungal types (options “-t f --cpu 16“):
5 min, 50 sec

Why fungal only took longer time in the native implementation is a mystery to me, but probably shows why there is a need to rewrite the multithreading code, as we did with Metaxa a couple of years ago. Stay tuned for ITSx updates!

We’re approaching Christmas, and this year I will try to spend lots of time with my family and less time at the computer. We’ll see how that goes, but all in all it means that I will most likely not respond promptly to e-mails until after New Year’s, maybe not until January 8 or 9. If, for example, you have asked a support question and have not received a response before January 12 2015, then please feel free to re-send your e-mail as I should then at least have replied that I cannot solve your issue quickly.

A further note for the future is that I will be on parental leave with my lovely nine-month-old during the entire spring, so answering e-mails will not be my highest priority, and might be neglected entirely in periods. I apologize for all kinds of inconveniences that this might cause, especially for Metaxa, ITSx and Megraft users.

Merry Christmas and a Happy New Year!

A minor bug in the “its1.full_and_partial.fasta” file has been fixed in a minor update to ITSx (1.0.11) released to day. The bug occasionally caused newline characters at the end of a sequence to be skipped and the next entry to begin at the same row. The bug only manifested itself when ITSx was used with the --partial option and only in the above mentioned FASTA file. If you have been affected by the bug, you should have noticed as the resulting FASTA file would be considered corrupted by most bioinformatics software. The updated version of ITSx can be downloaded here.

After a long delay-time in testing ITSx version 1.0.10 has been made public. The new version patches a bug causing the 3′ anchor not being properly written to file when using the “--anchor hmm” option. If a number was used for the “--anchor” option, this bug did not apply. Thus, if you have not been using the “--anchor” option together with “hmm”, you have not been affected in any way by this bug. Nevertheless, I encourage updating in case you would use the “--anchor hmm” option in the future. The update can be downloaded here. Happy barcoding!

I would like to sincerely apologize for that I have been terrible at responding to support issues pertaining to ITSx, Metaxa, Atosh etc. lately. I am currently on 50% parental leave and at the same time I am wrapping up three first-author papers, organizing a workshop and preparing a talk. Thus, support issues has been lagging a bit behind the last weeks to be able to cope with everything else. I have been ticking off most (all?) of my support questions the last couple of days, but if I have any remaining issues that I have missed to reply to, please re-send them to me!

I will try to improve response times, but it is hard when I am working less than usual (also, note that I (strangely) don’t get paid for supporting software, so I have to do this on my “sparetime”). My aim is to respond within a few days, so if I have not done so, please resend your e-mail with a friendly reminder that you are waiting for my response. Reminding me will very likely put your question up the priority pile.

So, my advice to becoming dads is: Do take paternal leave. Do take a lot of it. Share responsibilities with your partner. Because what you get back is awesome. (And also you get a good reason not to answer support questions in time.) But finally, don’t plan to wrap up the last couple of year’s worth of work and arrange a conference at the same time as you take out paternal leave. That will only make you feel insufficient at all fronts.

Keep the spirit high!