With the publication of my latest paper last week (1), I would also like to highlight some of the software underpinning its findings. Extremely common resistance genes can be present in multiple contexts and variants, which causes assemblers such as Velvet (2) to perform sub-optimally. To get around this problem, we have written a software tool that uses Vmatch (3) and Trinity (4) to iteratively construct contigs from reads associated with resistance genes. The tool could of course be used in many other situations as well, whenever you want to specifically assemble a certain portion of a metagenome but suspect that the portion might be found in multiple contexts.
TriMetAss is a Perl program that employs Vmatch and Trinity to construct multi-context contigs. It uses extracted reads associated with, e.g., resistance genes as seeds for a Vmatch search against the complete set of read pairs, extracting reads that match any of the seed reads over at least 49 bp (by default). These reads are then assembled using Trinity. The resulting contigs are in turn used as seeds for another Vmatch search against the complete set of reads, as above. All matches (including the previously matching read pairs) are then used for a new Trinity assembly. This iterative process is repeated until a stopping criterion is met, e.g. when the total number of assembled nucleotides starts to drop rather than increase. The software can be downloaded here.
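To make the iteration concrete, here is a minimal Python sketch of the seed, search, assemble and re-seed loop described above. It is not the TriMetAss code (which is written in Perl): `find_matching_reads` is a toy stand-in for the Vmatch search, `toy_assemble` is a toy stand-in for Trinity, and reads are treated as single sequences rather than pairs. All function names, the `max_iterations` safeguard and the extra "no new reads recruited" stop rule are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_matching_reads(seeds: Iterable[str], reads: Iterable[str],
                        min_match: int = 49) -> Set[str]:
    """Toy stand-in for the Vmatch search: keep reads sharing an exact
    substring of at least min_match bases with any seed."""
    seed_kmers: Set[str] = set()
    for seed in seeds:
        seed_kmers |= kmers(seed, min_match)
    return {read for read in reads if kmers(read, min_match) & seed_kmers}


def merge_once(contigs: List[str], min_overlap: int) -> Optional[List[str]]:
    """Merge the first pair with a suffix/prefix overlap, or return None."""
    for i, a in enumerate(contigs):
        for j, b in enumerate(contigs):
            if i == j:
                continue
            for k in range(min(len(a), len(b)), min_overlap - 1, -1):
                if a.endswith(b[:k]):
                    rest = [c for n, c in enumerate(contigs) if n not in (i, j)]
                    return rest + [a + b[k:]]
    return None


def toy_assemble(reads: Iterable[str], min_overlap: int = 4) -> List[str]:
    """Toy stand-in for Trinity: greedy overlap merging until nothing merges."""
    contigs = sorted(set(reads))
    while True:
        merged = merge_once(contigs, min_overlap)
        if merged is None:
            return contigs
        contigs = merged


def iterative_assembly(seed_reads: Iterable[str], all_reads: Iterable[str],
                       assemble: Callable[[Iterable[str]], List[str]] = toy_assemble,
                       min_match: int = 49, max_iterations: int = 20) -> List[str]:
    """Re-seed with the contigs of each round until the assembly stops growing."""
    all_reads = list(all_reads)
    seeds = list(seed_reads)
    recruited: Set[str] = set()
    best_contigs: List[str] = []
    best_total = 0
    for _ in range(max_iterations):  # safeguard, an assumption of this sketch
        hits = find_matching_reads(seeds, all_reads, min_match)
        if hits <= recruited:  # nothing new recruited (extra stop rule, assumption)
            break
        recruited |= hits  # previously matching reads are kept
        contigs = assemble(recruited)
        total = sum(len(c) for c in contigs)
        if total < best_total:  # assembled nucleotides started to drop
            break
        best_contigs, best_total = contigs, total
        seeds = contigs  # contigs seed the next search
    return best_contigs
```

With short toy reads, the 49 bp default has to be lowered, e.g. `iterative_assembly(seeds, reads, min_match=4)`. In a real run, the two stand-ins would be replaced by calls to the actual search and assembly programs.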
- Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014). doi: 10.3389/fmicb.2014.00648
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18, 821–829 (2008). doi:10.1101/gr.074492.107
- Kurtz S: The Vmatch large scale sequence analysis software (2010). http://vmatch.de/
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652 (2011). doi:10.1038/nbt.1883
After a long delay in testing, ITSx version 1.0.10 has been made public. The new version patches a bug that caused the 3′ anchor not to be written properly to file when using the “--anchor hmm” option. If a number was used for the “--anchor” option, this bug did not apply. Thus, if you have not been using the “--anchor” option together with “hmm”, you have not been affected in any way by this bug. Nevertheless, I encourage updating in case you want to use the “--anchor hmm” option in the future. The update can be downloaded here. Happy barcoding!
I would like to sincerely apologize for having been terrible at responding to support issues pertaining to ITSx, Metaxa, Atosh etc. lately. I am currently on 50% parental leave, and at the same time I am wrapping up three first-author papers, organizing a workshop and preparing a talk. Thus, support issues have been lagging a bit behind over the last few weeks so that I could cope with everything else. I have been ticking off most (all?) of my support questions over the last couple of days, but if there are any remaining issues that I have failed to reply to, please re-send them to me!
I will try to improve response times, but it is hard when I am working less than usual (also, note that I (strangely) don’t get paid for supporting software, so I have to do this in my “spare time”). My aim is to respond within a few days, so if I have not done so, please resend your e-mail with a friendly reminder that you are waiting for my response. Reminding me will very likely move your question up the priority pile.
So, my advice to dads-to-be is: Do take paternity leave. Do take a lot of it. Share responsibilities with your partner. Because what you get back is awesome. (And also you get a good reason not to answer support questions in time.) But finally, don’t plan to wrap up the last couple of years’ worth of work and arrange a conference at the same time as you take paternity leave. That will only make you feel inadequate on all fronts.
Keep the spirit high!
One of the other developers of ITSx and I had a discussion a while ago about having the --anchor option also output the “anchor sequences” around the ITS regions in the full-length output file (given that the --truncate option is activated). I have today changed ITSx to employ this behaviour, updating it to version 1.0.9. The update also slightly improves sensitivity when using the --anchor HMM option, and can be downloaded here. Happy barcoding!
ITSx has today been updated, bringing it to version 1.0.8. This update adds the “--only_full” option, which restricts output in the ITS1, 5.8S and ITS2 files to only the sequences that contain the full region, i.e. those for which both surrounding domains have been detected. The update also fixes a bug with the --anchor option, and can be downloaded here. Happy barcoding!
Last week, I was informed by an ITSx user that the software behaved strangely when input files containing extremely long sequence identifiers were used. The bug is not likely to have affected a majority of users, but in any case it is now fixed, and ITSx can now handle sequence identifiers of any length. The new update brings ITSx to version 1.0.7, and it can be downloaded here. Happy barcoding!
It’s been a while since the PETKit got any attention from me. Partly, that has been due to a nasty bug that could produce no output for one of the read files in Pefcon when using FASTA input files, but mostly it has simply been due to a lack of time to continue development of the package. Now, I have finally pulled all the threads together (bug fixes, new features, documentation), and today version 1.1 is released! The new features are:
- A new tool has been added – peacat – that can be used to e.g. stitch contigs together that have been separated for one reason or another in an assembly
- Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
- The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
- The documentation has been vastly improved (but is still rather inferior)
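For readers unsure what the FASTQ offset in the list above refers to: each character on a FASTQ quality line encodes a PHRED score as its ASCII code minus the offset. The sketch below illustrates that conversion in Python. It is not PETKit code, and the function names are made up for this example.

```python
from typing import List


def decode_qualities(quality_line: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to PHRED scores (ASCII code minus offset)."""
    scores = [ord(ch) - offset for ch in quality_line]
    if any(score < 0 for score in scores):
        # A negative score means the file uses a lower offset than the one given.
        raise ValueError("negative PHRED score; the offset is too high for this data")
    return scores


def error_probability(score: int) -> float:
    """PHRED definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)
```

With the offset of 33, the character `I` decodes to a score of 40 and `!` to 0. Older Illumina files written with an offset of 64 would need `offset=64` instead.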
A user informed me of unexpected behavior regarding potentially chimeric sequences in ITSx, and indeed it turned out to contain a bug that over-reported potential chimeras. This bug is totally unrelated to the new version released this week, and exists in all prior ITSx versions. I strongly encourage everyone to update to ITSx 1.0.6.
I would also like to underscore that ITSx is not a chimera-checker. It detects when sequences look unusual, but all such cases should be further investigated. If you follow this practice, you will see that in some cases ITSx might have over-reported chimeras, and in some instances it will have been correct in its suspicions (and thereby you would be largely unaffected by this bug).
I am on a roll pushing out new software these days, and here’s the latest addition. This version of ITSx was finished last month and seems to be stable enough for consumption by users. Version 1.0.5 adds a new option, “--anchor”, which enables extraction of regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of either a fixed number of bases at each end, e.g. “--anchor 30” to get 30 bp before and after each ITS region, or all bases matching the corresponding HMM, by specifying “--anchor HMM”. The update can be downloaded here.
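As an illustration of what the numeric form of the option means, here is a small Python sketch that cuts out a region together with a fixed number of flanking bases. It is not taken from ITSx: the function name and the 0-based, end-exclusive coordinates are assumptions of the example, and so is the choice to shorten the flank when the region lies close to the end of the sequence. The HMM variant is not covered.

```python
def extract_with_anchor(sequence: str, start: int, end: int, anchor: int) -> str:
    """Return sequence[start:end] plus up to `anchor` bases on each side.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("region coordinates fall outside the sequence")
    if anchor < 0:
        raise ValueError("anchor must be non-negative")
    left = max(0, start - anchor)               # do not run past the 5' end
    right = min(len(sequence), end + anchor)    # do not run past the 3' end
    return sequence[left:right]
```

For example, `extract_with_anchor(seq, 5, 10, 3)` returns the five bases of the region with three extra bases on each side.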
The new version of Metaxa – Metaxa2 – which I first started talking about more than 1.5 years ago, has finally been deemed stable enough for us to officially release it! The release comes around the same time as we submitted a paper describing the changes in it, but I will briefly go through the changes here:
- Metaxa2 now handles extraction and classification of LSU rRNA sequences in addition to SSU rRNA
- The classification engine has been completely redesigned, and now enables accurate taxonomic classifications down to the genus – or in some cases – species level
- The classification database has been updated, and is now based on the SILVA 111 release
- The Metaxa2 Taxonomic Traversal Tool – metaxa2_ttt – has been added to the package, to ease the counting of rRNA sequences in different organism groups (at various taxonomic levels)
- Metaxa2 adds support for paired-end libraries
- It is now possible to directly input sequences in FASTQ format to Metaxa2
- The support for libraries with short read lengths (~100 bp) has been vastly improved (and is now assumed to be the case for default settings)
- Metaxa2 can do quality pre-filtering of reads in FASTQ-format
- Metaxa2 adds support for the modern BLAST+ package (although the old blastall version is still default)
- Compatibility with the HMMER 3.1 beta
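To give a feel for what counting rRNA sequences at various taxonomic levels involves, here is a small Python sketch. It is not the metaxa2_ttt implementation and does not reproduce its input or output formats. The semicolon-separated lineage strings and the function name are purely hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_by_level(lineages: Iterable[str], separator: str = ";") -> List[Counter]:
    """Count sequences per taxon at each depth of their lineage.

    counts[0] holds the top-level counts, counts[1] the next level, and so on.
    Keys are full lineage prefixes, so identical names under different parents
    are kept apart.
    """
    counts: List[Counter] = []
    for lineage in lineages:
        taxa = [t.strip() for t in lineage.split(separator) if t.strip()]
        for depth in range(len(taxa)):
            if depth == len(counts):
                counts.append(Counter())
            counts[depth][separator.join(taxa[:depth + 1])] += 1
    return counts
```

A sequence classified only to a shallow level simply contributes to fewer levels, so the totals at deeper levels can be smaller than at the top.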
Metaxa2 brings together a large set of features that we have been gradually incorporating since 2011, many of which have been dependent on each other. Most of the new features and changes are thoroughly explained in the manual. While we hope Metaxa2 is bug free, there will likely be bugs caused by usage scenarios we have not envisioned. I therefore encourage anyone who comes across some unexpected behavior to send me an e-mail. In particular, I would like to know how the software performs with HMMER 3.1 and BLAST+, where testing has been limited compared to older parts of the code.
We hope that you will find Metaxa2 useful, and that it will bring taxonomic assessment of metagenomes another step forward! Metaxa2 can be downloaded here.