Update on the PhD candidate evaluation
Many thanks to everyone who applied for the PhD position, which closed a week ago. In total we received 59 applications, the vast majority of which were of high quality; I am sure that at least half of the candidates would have done a great job in the position. However, we have to make a selection among these 59 candidates, so after reading and evaluating all of the applications, we have now settled on ten top candidates that we will initially move forward with. Those ten candidates should have received an e-mail today about how the process will continue.
If you have not received an e-mail from us, the most likely explanation is that you were not among these top ten candidates (but remember to check your spam folder as well!). In that case, you will get a follow-up once the position is filled.
Again, thanks a lot for your interest. I have been overwhelmed by the high quality and relevance of the applications.
Published paper: Evaluating taxonomic classification software
Yesterday, Molecular Ecology Resources published online an unedited version of a recent paper that I co-authored. This time, Rodney Richardson at Ohio State University has done a tremendous job of evaluating three taxonomic classification tools (the RDP Naïve Bayesian Classifier, RTAX and UTAX) on a set of DNA barcoding regions commonly used for plants, namely ITS2 and the matK, rbcL, trnL and trnH genes.
In the paper (1), we discuss the results, merits and limitations of the classifiers. In brief, we found that:
- There is a considerable trade-off between accuracy and sensitivity for the classifiers tested, which indicates a need for improved sequence classification tools (2)
- UTAX was superior with respect to error rate, but was exceedingly stringent and thus suffered from a low assignment rate
- The RDP Naïve Bayesian Classifier displayed high sensitivity and low error at the family and order levels, but had a genus-level error rate of 9.6 percent
- RTAX showed high sensitivity at all taxonomic ranks, but at the same time consistently produced the highest error rates
- The choice of locus has significant effects on the classification sensitivity of all tested tools
- All classifiers showed strong relationships between database completeness, classification sensitivity and classification accuracy
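To make the trade-off in the list above concrete, here is a minimal Python sketch of how such metrics could be computed for one classifier at one taxonomic rank. The function name `evaluate` and the exact definitions (sensitivity as correct assignments over all sequences, error rate as wrong assignments over assigned sequences) are assumptions for illustration and are not necessarily the definitions used in the paper; the taxon names are made-up toy data.

```python
from typing import Dict, Optional, Sequence


def evaluate(predicted: Sequence[Optional[str]], truth: Sequence[str]) -> Dict[str, float]:
    """Summarize classifier output at a single taxonomic rank.

    predicted[i] is the taxon assigned to sequence i, or None if the
    classifier declined to assign it. The metric definitions below are
    illustrative assumptions, not taken from the paper.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    total = len(truth)
    assigned = sum(p is not None for p in predicted)
    correct = sum(p is not None and p == t for p, t in zip(predicted, truth))
    wrong = assigned - correct

    return {
        # Fraction of all sequences that received any assignment
        "assignment_rate": assigned / total if total else 0.0,
        # Fraction of all sequences that were assigned correctly
        "sensitivity": correct / total if total else 0.0,
        # Fraction of assigned sequences that were assigned incorrectly
        "error_rate": wrong / assigned if assigned else 0.0,
    }


# Toy data: a stringent classifier versus a permissive one
truth = ["Rosa", "Acer", "Salix", "Quercus"]
stringent = ["Rosa", None, None, None]
permissive = ["Rosa", "Acer", "Betula", "Fagus"]

print(evaluate(stringent, truth))
print(evaluate(permissive, truth))
```

In this toy case the stringent classifier makes no errors but assigns only a quarter of the sequences, while the permissive one assigns everything at the cost of a much higher error rate.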
We believe that the methods of comparison we have used are simple and robust, and thereby provide a methodological and conceptual foundation for future software evaluations. On a personal note, I will thoroughly enjoy working with Rodney and Reed again; I had a great time discussing the ins and outs of taxonomic classification with them! The paper can be found here.
References and notes
1. Richardson RT, Bengtsson-Palme J, Johnson RM: Evaluating and Optimizing the Performance of Software Commonly Used for the Taxonomic Classification of DNA Sequence Data. Molecular Ecology Resources, Early view (2016). doi: 10.1111/1755-0998.12628 [Paper link]
2. This is something that several classifiers also showed in the evaluation we did for the Metaxa2 paper (3). Interestingly enough, Metaxa2 is better at maintaining high accuracy even as sensitivity is increased.
3. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Molecular Ecology Resources, 15, 6, 1403–1414 (2015). doi: 10.1111/1755-0998.12399 [Paper link]