Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

Browsing Posts tagged Bugs

After a long delay in testing, ITSx version 1.0.10 has been made public. The new version patches a bug that caused the 3′ anchor not to be properly written to file when using the “--anchor hmm” option. If a number was used for the “--anchor” option, this bug did not apply. Thus, if you have not been using the “--anchor” option together with “hmm”, you have not been affected in any way by this bug. Nevertheless, I encourage updating in case you want to use the “--anchor hmm” option in the future. The update can be downloaded here. Happy barcoding!

ITSx has been updated today, bringing it to version 1.0.8. This update adds the “--only_full” option, which restricts output in the ITS1, 5.8S and ITS2 files to only the sequences that contain the full region, i.e. those for which both surrounding domains have been detected. The update also fixes a bug with the “--anchor” option, and can be downloaded here. Happy barcoding!

Last week, I was informed by an ITSx user that the software behaved strangely when input files containing extremely long sequence identifiers were used. The bug is not likely to have affected a majority of users, but in any case it is now fixed, and ITSx can now handle sequence identifiers of any length. The new update brings ITSx to version 1.0.7, and it can be downloaded here. Happy barcoding!

It’s been a while since the PETKit got any attention from me. Partly, that has been due to a nasty bug that could cause Pefcon to produce no output for one of the read files when using FASTA input files, but mostly it has simply been due to lack of time to continue development on the package. Now, I have finally pulled all the threads together (bug fixes, new features, documentation) and today version 1.1 is released! The new features are:

  • A new tool has been added – peacat – that can be used to e.g. stitch contigs together that have been separated for one reason or another in an assembly
  • Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
  • The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
  • The documentation has been vastly improved (but is still rather inferior)
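
For those wondering what the offset actually does: each character in a FASTQ quality string encodes a PHRED score as its ASCII code minus the offset. Below is a minimal Python sketch of that arithmetic. It is not PETKit code, and the function name `decode_qualities` is a hypothetical one chosen for illustration.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of PHRED scores.

    Hypothetical helper for illustration only; not part of the PETKit.
    """
    # Each score is the character's ASCII code minus the offset
    return [ord(char) - offset for char in quality_string]


# With offset 33: 'I' (ASCII 73) -> 40, '5' (ASCII 53) -> 20, '!' (ASCII 33) -> 0
print(decode_qualities("II5!"))

# Older Illumina files use offset 64: 'h' (ASCII 104) -> 40
print(decode_qualities("h", offset=64))
```

Reading a file with the wrong offset shifts every score by the same amount, which is why the correct value matters for any quality-based filtering.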

A user informed me of unexpected behavior regarding potentially chimeric sequences in ITSx, and indeed it turned out to contain a bug that over-reported potential chimeras. This bug is totally unrelated to the new version released this week, and exists in all prior ITSx versions. I strongly encourage everyone to update to ITSx 1.0.6.

I would also like to underscore that ITSx is not a chimera-checker. It detects when sequences look unusual, but all such cases should be further investigated. If you follow this practice, you will see that in some cases ITSx might have over-reported chimeras, and in some instances it will have been correct in its suspicions (and thereby you would be largely unaffected by this bug).

I have fixed a long-standing bug in the Bloutminer script, which has thereby been pushed to version 0.9.6. The new version fixes an issue when using the -o blast option without the -n option. The new version can be downloaded here.

Over the weekend, I’ve been able to finish off some stuff that has been stuck on my todo-list. Among these was to finish up the pieces of the ITSx update we put in the hands of our users today. This update brings three requested features, and a fix for an extremely rarely occurring bug:

  1. If the “--not_found T” option is used, ITSx now outputs both a list and a FASTA file of entries in the input file that did not have any ITS regions detected in them. This was a user-requested feature, and a very nice and easily implemented one.
  2. As mentioned in a previous blog post, ITSx has up until now not been able to preserve the sequence headers of the input file. In hindsight, such an option would have been obvious to include, and as of version 1.0.4 ITSx comes with a “--preserve” option that allows headers to be carried over to all the output files.
  3. ITSx is now better at handling certain chimeric sequences.

In addition, there was a minor bug that very rarely (I have only seen one such example) could cause the ITS region to be reported with a negative length. This issue has now been fixed.

This update brings ITSx to version 1.0.4, and it can be downloaded here.

An ITSx user informed me a couple of days ago of an issue that caused ITSx to sometimes accidentally remove the HMM files in the database when multiple ITSx jobs were run in parallel. Although this issue should be relatively rare, it was also very easy to fix. Therefore, we are already pushing out a new version of ITSx (1.0.3), which is available for download here.

In short, the bug was introduced because I overlooked this usage scenario when fixing another bug related to the HMM-files in an earlier pre-release. Let’s keep our fingers crossed that version 1.0.3 will be more long-lived than 1.0.2!

Some good and some bad news regarding the PETKit. Good news first: I have written a fourth tool for the PETKit, which is included in the latest release (version 1.0.2b, download here). The new tool is called Pesort, and sorts input read pairs (or single reads) so that the read pairs occur in the same order. It also identifies the reads that don’t have a pair and outputs them to a separate file. All this is useful if you for some reason have ended up with a scrambled read file (pair). This can happen, for example, if you want to further process the reads after running Khmer or investigate the reads remaining after mapping to a genome.
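
To give an idea of what this kind of pair sorting involves, here is a minimal Python sketch. It is not how Pesort itself is implemented. The function name `split_pairs` is hypothetical, and the sketch assumes the read identifiers have already been extracted from the two files and stripped of any pair suffix.

```python
def split_pairs(ids_1, ids_2):
    """Separate properly paired reads from reads lacking a mate.

    Hypothetical illustration only; not the Pesort implementation.
    Returns (paired, orphans_1, orphans_2), where the paired IDs
    follow the order in which they appear in the first file.
    """
    in_1 = set(ids_1)
    in_2 = set(ids_2)
    # Reads present in both files form pairs, ordered as in file 1
    paired = [read_id for read_id in ids_1 if read_id in in_2]
    # Reads present in only one of the files have no mate
    orphans_1 = [read_id for read_id in ids_1 if read_id not in in_2]
    orphans_2 = [read_id for read_id in ids_2 if read_id not in in_1]
    return paired, orphans_1, orphans_2


paired, orphans_1, orphans_2 = split_pairs(
    ["r3", "r1", "r2"],  # scrambled first read file
    ["r1", "r4", "r3"],  # scrambled second read file
)
print(paired)     # IDs to write, in this order, to both sorted files
print(orphans_1)  # unpaired reads from the first file
print(orphans_2)  # unpaired reads from the second file
```

Writing both output files in the order given by `paired` puts the mates on matching lines, while the two orphan lists go to their own files.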

Then the bad news. There’s a critical bug in PETKit version 1.0.1b. This bug manifests itself when using custom offsets for quality scores (using the --offset option), and makes the Pearf and Pepp tools too strict, causing them to discard reads that actually are of good quality. This does not affect the Pefcon program. If you use the PETKit for read filtering or ORF prediction, and have used custom offset values, I recommend that you re-run your data with the newly released PETKit version (1.0.2b), in which this bug has been fixed. If you have only used the default offset setting, you’re safe. I sincerely apologize for any inconvenience that this might have caused.

So Metaxa has gone into the wild, which means that I start to get feedback from users using it in ways I have not foreseen. This is the best and the worst thing about having your software exposed to real-world usage; it makes it possible to improve it in a variety of ways, but it also gives you severe headaches at times. Luckily, I could fix a minor bug in the Metaxa code within a matter of hours and issue an update to version 1.0.2. The interesting thing here was that I would never have discovered the bug myself, as I never would have called the Metaxa program in the way required for the bug to happen. But once I saw the command given, and the output, which the user kindly sent me, I pretty quickly realized what was wrong, and how to fix it. Therefore, I would like to ask all of you who use Metaxa to send me your questions, problems and bug reports. The feedback is highly appreciated, and I can (at least currently) promise to issue fixes as fast as possible. We are really committed to making Metaxa work for everyone.

If you have suggestions for improvements, those are welcome as well (though it will take significantly more time to implement new features than to fix bugs). I am currently compiling a FAQ, and all questions are welcome. Finally, I would like to thank everybody who has downloaded and tried the Metaxa package. I can see in the server logs that there are quite a few of you, which of course makes us happy.