Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery

A minor bug affecting the “its1.full_and_partial.fasta” file has been fixed in a minor update to ITSx (1.0.11) released today. The bug occasionally caused the newline character at the end of a sequence to be skipped, so that the next entry began on the same line. The bug only manifested itself when ITSx was used with the --partial option, and only in the above-mentioned FASTA file. If you have been affected by the bug, you should have noticed, as most bioinformatics software would consider the resulting FASTA file corrupted. The updated version of ITSx can be downloaded here.
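If you want to check an older output file for this problem, something along the following lines might help. This is only a minimal sketch in Python: the function name is hypothetical and not part of ITSx, and it assumes that a ">" character never legitimately occurs inside a sequence line.

```python
def find_fused_headers(lines):
    """Return 1-based numbers of sequence lines that contain a '>' character.

    Hypothetical helper, not part of ITSx: a '>' inside a sequence line
    suggests that the newline before the next header went missing.
    """
    suspicious = []
    for number, line in enumerate(lines, start=1):
        # Header lines legitimately start with '>'; only inspect the others.
        if not line.startswith(">") and ">" in line:
            suspicious.append(number)
    return suspicious


# Made-up records: the second header is fused to the end of the first sequence.
example = [">seq1", "ACGTACGT>seq2", "GGCCTTAA"]
print(find_fused_headers(example))  # prints [2]
```

The function accepts any iterable of lines, so an open file handle can be passed in directly.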

With the publication of my latest paper last week (1), I would also like to highlight some of the software underpinning the findings. To get around the problem that extremely common resistance genes could be present in multiple contexts and variants, causing assemblers such as Velvet (2) to perform sub-optimally, we have written a software tool that uses Vmatch (3) and Trinity (4) to iteratively construct contigs from reads associated with resistance genes. This could of course be used in many other situations as well, whenever you want to specifically assemble a certain portion of a metagenome but suspect that the portion might be found in multiple contexts.

TriMetAss is a Perl program that employs Vmatch and Trinity to construct multi-context contigs. TriMetAss uses extracted reads associated with, e.g., resistance genes as seeds for a Vmatch search against the complete set of read pairs, extracting reads that match any of the seed reads over at least 49 bp (by default). These reads are then assembled using Trinity. The resulting contigs are then used as seeds for another Vmatch search against the complete set of reads, as above. All matches (including the previously matching read pairs) are then used for a new Trinity assembly. This iterative process is repeated until a stop criterion is met, e.g. when the total number of assembled nucleotides starts to drop rather than increase. The software can be downloaded here.
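The iterative loop described above can be sketched roughly as follows. This is not the TriMetAss code (which is written in Perl), just an illustration in Python under stated assumptions: the search and assembly steps are replaced by toy stand-ins rather than real Vmatch and Trinity calls, the reads are made up, the minimum match length is 4 instead of 49 bp, and the loop stops as soon as the total assembled length no longer increases.

```python
K = 4  # toy minimum match length; the post states a default of 49 bp

# Made-up reads standing in for the complete read set.
READS = ["AAAACCCC", "CCCCGGGG", "GGGGTTTT", "ACACACAC"]


def shares_kmer(seed, read, k=K):
    """True if any k-length substring of seed occurs in read."""
    return any(seed[i:i + k] in read for i in range(len(seed) - k + 1))


def toy_search(seeds):
    """Stand-in for the Vmatch step: reads matching any of the seeds."""
    return [r for r in READS if any(shares_kmer(s, r) for s in seeds)]


def toy_assemble(reads):
    """Stand-in for the Trinity step: returns the reads unchanged as 'contigs'."""
    return list(reads)


def iterative_assembly(seeds, search, assemble, max_iterations=10):
    """Alternate search and assembly until the assembled length stops growing."""
    matched = set()
    best_contigs, best_total = [], 0
    for _ in range(max_iterations):
        matched |= set(search(seeds))      # keep earlier matches as well
        contigs = assemble(sorted(matched))
        total = sum(len(c) for c in contigs)
        if total <= best_total:            # assumed stop rule: no further growth
            break
        best_contigs, best_total = contigs, total
        seeds = contigs                    # contigs seed the next round
    return best_contigs


result = iterative_assembly(["AAAACC"], toy_search, toy_assemble)
print(result)  # prints ['AAAACCCC', 'CCCCGGGG', 'GGGGTTTT']
```

In the toy run, each round recruits one more overlapping read, and the unrelated read "ACACACAC" is never pulled in. The loop ends in the fourth round, when no new reads are found.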

References:

  1. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Front Microbiol 5, 648 (2014). doi:10.3389/fmicb.2014.00648
  2. Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18, 821–829 (2008). doi:10.1101/gr.074492.107
  3. Kurtz S: The Vmatch large scale sequence analysis software (2010). http://vmatch.de/
  4. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652 (2011). doi:10.1038/nbt.1883

Metaxa2 update

An update to Metaxa2 that has long remained in internal testing has been deemed bug-free (as far as we can tell) and has been uploaded to the Metaxa2 web site. The update brings a slightly improved classifier, and is the first release that we declare fully stable, although we have found no problems with the previously available version (release candidate 3). This also means that we jump directly from version 2.0, release candidate 3 to version 2.0.1 without passing through a final 2.0 release. The update is available here.

After a long delay in testing, ITSx version 1.0.10 has been made public. The new version patches a bug that caused the 3′ anchor not to be properly written to file when using the “--anchor hmm” option. If a number was given to the “--anchor” option, the bug did not apply. Thus, if you have not been using the “--anchor” option together with “hmm”, you have not been affected in any way by this bug. Nevertheless, I encourage updating in case you want to use the “--anchor hmm” option in the future. The update can be downloaded here. Happy barcoding!

One of the other developers of ITSx and I had a discussion a while ago about having the --anchor option also output the “anchor sequences” around the ITS regions in the full-length output file (given that the --truncate option is activated). I have today changed ITSx to employ this behaviour, updating it to version 1.0.9. The update also slightly improves sensitivity when using the --anchor HMM option, and can be downloaded here. Happy barcoding!

ITSx has today been updated, bringing it to version 1.0.8. This update adds the “--only_full” option, which restricts output in the ITS1, 5.8S and ITS2 files to only the entries that contain the full region, i.e. those for which both surrounding domains have been detected. The update also fixes a bug with the --anchor option, and can be downloaded here. Happy barcoding!

Last week, I was informed by an ITSx user that the software behaved strangely when input files containing extremely long sequence identifiers were used. The bug is not likely to have affected a majority of users, but in any case it is now fixed, and ITSx can now handle sequence identifiers of any length. The new update brings ITSx to version 1.0.7, and it can be downloaded here. Happy barcoding!

It’s been a while since the PETKit got any attention from me. Partially, that has been due to a nasty bug that could result in no output being produced for one of the read files in Pefcon when using FASTA input files, but mostly it has simply been due to lack of time to continue development on the package. Now, I have finally pulled all the threads together (bug fixes, new features, documentation), and today version 1.1 is released! The new features are:

  • A new tool has been added – peacat – that can be used to e.g. stitch contigs together that have been separated for one reason or another in an assembly
  • Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
  • The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
  • The documentation has been vastly improved (but is still rather inferior)
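
For readers unfamiliar with the FASTQ offset mentioned in the list above: with an offset of 33, each quality character encodes a PHRED score equal to its ASCII code minus 33. A minimal illustration in Python follows. The function name is hypothetical and this is not PETKit code.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of integer PHRED scores."""
    return [ord(char) - offset for char in quality_line]


# '!' has ASCII code 33, so it encodes the lowest score, 0.
print(phred_scores("!+5?I"))  # prints [0, 10, 20, 30, 40]
```

Older Illumina formats used an offset of 64, which is why reading a file with the wrong offset shifts every score by 31.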

A user informed me of unexpected behavior regarding potentially chimeric sequences in ITSx, and indeed the software turned out to contain a bug that over-reported potential chimeras. This bug is totally unrelated to the new version released this week, and exists in all prior ITSx versions. I strongly encourage everyone to update to ITSx 1.0.6.

I would also like to underscore that ITSx is not a chimera-checker. It detects when sequences look unusual, but all such cases should be further investigated. If you have followed this practice, you will have seen that ITSx over-reported chimeras in some cases and was correct in its suspicions in others, and you would thereby be largely unaffected by this bug.

I am on a roll pushing out new software these days, and here’s the latest addition. This version of ITSx was finished up last month and seems to be stable enough for consumption by the users. Version 1.0.5 adds a new option, “--anchor”, which enables extraction of regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of a number of bases at each end, e.g. “--anchor 30” to get 30 bp before and after each ITS region, or all bases matching the corresponding HMM, by specifying “--anchor HMM”. The update can be downloaded here.
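
To make the numeric form of the option concrete, here is a small conceptual sketch in Python of what extracting a region with 30 flanking bases amounts to. It is not how ITSx is implemented: the function name is hypothetical, the coordinates are assumed to be 0-based and half-open, and clipping the flanks at the sequence ends is my assumption for the illustration.

```python
def extract_with_anchor(sequence, start, end, anchor=0):
    """Return sequence[start:end] plus up to `anchor` flanking bases per side."""
    left = max(0, start - anchor)               # clip at the sequence start
    right = min(len(sequence), end + anchor)    # clip at the sequence end
    return sequence[left:right]


# Made-up sequence: a 20 bp region of C's flanked by 40 bp on each side.
sequence = "A" * 40 + "C" * 20 + "G" * 40
with_anchor = extract_with_anchor(sequence, 40, 60, anchor=30)
print(len(with_anchor))  # prints 80
```

Here the 20 bp region gains 30 bp on each side, giving 80 bp in total. With an anchor larger than the available flank, the sketch simply returns everything up to the sequence end.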