You know the feeling when your assembler supports paired-end sequences, but your FASTQ quality filterer doesn’t care which reads belong together as pairs? You end up with a mess of sequences that you somehow have to script back together. Gosh, that feeling is way too common. It is for situations like that I have put together the Paired-End ToolKit (PETKit), a collection of FASTQ/FASTA sequence handling programs written in Perl. Currently, the toolkit contains three command-line tools that do sequence conversion, quality filtering, and ORF prediction, all adapted specifically for paired-end sequences. You can read more about the programs, which are released as open source software, on the PETKit page. At the moment they lack proper documentation, but running the software with the “--help” option should bring up a useful set of options for each tool. This is still considered beta software, so any bug reports, and especially suggestions, are welcome.
Also, if you know of another problem that is unsolved or poorly handled for paired-end sequences, let me know, and I will see if I can implement a solution in PETKit.
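To make the pairing problem concrete, here is a minimal sketch of what pair-aware quality filtering can look like. This is not PETKit's code (PETKit is written in Perl, and its actual filtering criteria are not described here); it is a Python illustration under a few assumptions: plain four-line FASTQ records, Phred+33 quality encoding, both files listing mates in the same order, and a made-up mean-quality cutoff. The function names `parse_fastq`, `mean_quality` and `filter_pairs` are hypothetical.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) from simple 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 is an assumption)."""
    if not quality:
        return 0.0
    return mean(ord(ch) - offset for ch in quality)


def filter_pairs(reads1, reads2, min_mean_q=20):
    """Keep a pair only if BOTH mates pass, so the two outputs stay in sync.

    The threshold of 20 is an illustrative value, not a PETKit default.
    """
    kept = []
    for r1, r2 in zip(reads1, reads2):
        if mean_quality(r1[2]) >= min_mean_q and mean_quality(r2[2]) >= min_mean_q:
            kept.append((r1, r2))
    return kept


# Toy data: the second mate of read2 has very low quality ('#' = Phred 2).
fq1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "ACGT", "+", "IIII"]
fq2 = ["@read1/2", "ACGT", "+", "IIII", "@read2/2", "ACGT", "+", "####"]

for r1, r2 in filter_pairs(parse_fastq(fq1), parse_fastq(fq2)):
    print(r1[0], r2[0])  # prints "@read1/1 @read1/2"
```

A filter that looked at each file independently would keep `@read2/1` but drop `@read2/2`, leaving the two files out of step. Deciding per pair avoids that, at the cost of discarding some good reads whose mates are bad.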
For the last few months I have been struggling (part time) to get Metaxa to eat Illumina paired-end data. This is a pretty tricky task, mainly because Illumina reads are so much shorter than those obtained by Sanger and 454 sequencing. Therefore, I am more than happy to inform the community that today (the day before I go on vacation) I have a working prototype up and running. In fact, calling it a prototype is unfair; it is quite far along as a piece of software by now. Currently, I am running it on test data sets, and I will try to keep it running over the next couple of weeks. After that, I hope to be able to release it sometime this autumn (but don’t expect a September release!), so that the power of Illumina sequencing can be harnessed for SSU identification. Stay tuned, and have a great summer!