After almost a year in different stages of review and revision, in which the paper (but not the software) saw a total transformation, I am happy to announce that the paper describing Metaxa2 has been accepted in Molecular Ecology Resources and is available in a rudimentary online early form. The figures in this version are not that pretty, but those who want to read the paper as soon as possible now have the possibility to do so.
This means that if you have been using Metaxa2 for a publication, there is now a new preferred way of citing it, namely:
Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399
The paper (1), apart from describing the new Metaxa version, also brings a very thorough evaluation of the software, compared to other tools for taxonomic classification implemented in QIIME (2). In short, we show that:
- Metaxa2 can make trustworthy taxonomic classifications even with reads as short as 100 bp
- Generally, the performance is reliable across the entire SSU rRNA gene, regardless of which V-region a read is derived from
- Metaxa2 can reliably recapture species composition from short-read metagenomic data, comparable with results of amplicon sequencing
- Metaxa2 outperforms other popular tools such as Mothur (3), the RDP Classifier (4), Rtax (5) and the QIIME implementation of Uclust (6) in terms of proportion of correctly classified reads from metagenomic data
- The false positive rate of Metaxa2 is very close to zero, which is far better than for many of the above-mentioned tools, many of which assume that reads must derive from the rRNA gene
Metaxa2 can be downloaded here. We have already used it for around two years internally, and it forms the base of the taxonomic classifications in e.g. our recently published paper on antibiotic resistance in a polluted Indian lake (7).
1. Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH: Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data. Molecular Ecology Resources (2015). doi: 10.1111/1755-0998.12399 [Paper link]
2. Caporaso JG, Kuczynski J, Stombaugh J et al.: QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335–336 (2010).
3. Schloss PD, Westcott SL, Ryabin T et al.: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537–7541 (2009).
4. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73, 5261–5267 (2007).
5. Soergel DAW, Dey N, Knight R, Brenner SE: Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. The ISME Journal, 6, 1440–1444 (2012).
6. Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26, 2460–2461 (2010).
7. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014).
My colleague Henrik Nilsson has been interviewed by the ResearchGate news team about the recent effort to better annotate ITS data for plant pathogenic fungi. It’s an interesting read, and I think Henrik nicely underscores why large-scale efforts for improving and correcting sequence annotations are important. You can read the interview here, and the paper they talk about is referenced below.
Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity, Volume 67, Issue 1 (2014), 11–19. doi: 10.1007/s13225-014-0291-8 [Paper link]
In a recent paper in Nature, a completely new antibiotic, teixobactin, is described (1). The really cool thing about this antibiotic is that it was discovered in a screen of uncultured bacteria, grown using new technology that enables controlled growth of single colonies in situ. I really like this idea, and I think the prospect of a novel antibiotic using a previously unexploited mechanism is super-promising, particularly in the light of alarming resistance development in clinically important pathogens (2,3).
What really annoys me about the paper is the claim (already in the abstract) that since “we did not obtain any mutants of Staphylococcus aureus or Mycobacterium tuberculosis resistant to teixobactin (…) the properties of this compound suggest a path towards developing antibiotics that are likely to avoid development of resistance.” To me, this sounds pretty much like a bogus statement; in essence, it tells me that we apparently have not learned anything from 70 years of antibiotic usage and resistance development. After working with antibiotic resistance for a couple of years, particularly from the environmental perspective, I have a very disturbing feeling that there are already resistance mechanisms against teixobactin waiting out in the wild (4,5). Pretending that lack of mutation-associated resistance development means that there could not be resistance development did not help vancomycin (6,7), and we now see VRE (vancomycin-resistant Enterococcus) showing up as a major problem in clinics.
The “avoid development of resistance” claim is downright irresponsible, and the cynic in me cannot help thinking that NovoBiotic Pharmaceuticals (the affiliation of almost half of the authors) has a monetary finger in this jar. In the end, time will tell how “resistance-resilient” teixobactin is and how well we can handle the gift of a novel antibiotic.
1. Ling LL, Schneider T, Peoples AJ, Spoering AL, Engels I, Conlon BP, Mueller A, Schäberle TF, Hughes DE, Epstein S, Jones M, Lazarides L, Steadman VA, Cohen DR, Felix CR, Fetterman KA, Millett WP, Nitti AG, Zullo AM, Chen C, Lewis K: A new antibiotic kills pathogens without detectable resistance. Nature (2015). doi: 10.1038/nature14098
2. Finley RL, Collignon P, Larsson DGJ, McEwen SA, Li X-Z, Gaze WH, Reid-Smith R, Timinouni M, Graham DW, Topp E: The scourge of antibiotic resistance: the important role of the environment. Clin Infect Dis, 57: 704–710 (2013).
3. French GL: The continuing crisis in antibiotic resistance. Int J Antimicrob Agents, 36 Suppl 3: S3–7 (2010).
4. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5: 648 (2014).
5. Larsson DGJ: Antibiotics in the environment. Ups J Med Sci, 119: 108–112 (2014).
6. Wright GD: Mechanisms of resistance to antibiotics. Curr Opin Chem Biol, 7: 563–569 (2003).
7. Werner G, Strommenger B, Witte W: Acquired vancomycin resistance in clinically relevant pathogens. Future Microbiol, 3: 547–562 (2008).
A minor bug affecting the “its1.full_and_partial.fasta” file has been fixed in a minor update to ITSx (1.0.11) released today. The bug occasionally caused newline characters at the end of a sequence to be skipped, so that the next entry began on the same row. The bug only manifested itself when ITSx was used with the --partial option, and only in the above-mentioned FASTA file. If you have been affected by the bug, you should have noticed, as the resulting FASTA file would be considered corrupted by most bioinformatics software. The updated version of ITSx can be downloaded here.
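For anyone who wants to double-check an older output file, the symptom described above (a new entry starting in the middle of a sequence row) can be detected with a few lines of Python. This is only a minimal sketch and not part of ITSx; the function name find_merged_entries is made up for illustration, and it assumes that a “>” character never legitimately occurs inside a sequence line.

```python
def find_merged_entries(lines):
    """Return 1-based numbers of sequence lines that contain a '>' character.

    Assumption for this sketch: '>' is only valid at the start of a header
    line, so finding it inside a sequence line means that a newline was
    lost and the next entry began on the same row.
    """
    suspicious = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Header lines start with '>' and may contain anything after it.
        if text.startswith(">"):
            continue
        if ">" in text:
            suspicious.append(number)
    return suspicious


# Tiny in-memory example; the second line shows the symptom.
example = [">seq1\n", "ACGTACGT>seq2\n", "GGCCGGCC\n", ">seq3\n", "TTAATTAA\n"]
print(find_merged_entries(example))  # prints [2]
```

To check a real file, pass an open file handle instead of the list, since iterating over a file yields its lines one at a time.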
An update to Metaxa2 that has long remained in internal testing has been deemed bug-free (as far as we can tell) and has been uploaded to the Metaxa2 web site. The update brings a slightly improved classifier, and is the first release that we declare fully stable, although we found no problems with the previously available version (release candidate 3). This also means that we jump directly from version 2.0, release candidate 3 to version 2.0.1 without passing a final 2.0 release. The update is available here.
Another paper I have contributed to has just been published in Molecular Ecology Resources. The paper (1), which is lead-authored by Xin-Cun Wang and Chang Liu at the Institute of Medicinal Plant Development in Beijing, investigates the usability of ITS1 and ITS2 as separate barcodes across the eukaryotes. The study is a large-scale meta-analysis comparing available high-quality sequence data in as many taxonomic groups as possible from three different aspects: PCR amplification, DNA sequencing efficiency and species discrimination ability. Specifically, we have looked for the presence of DNA barcoding gaps, species discrimination efficiency, sequence length distribution, GC content distribution and primer universality, using bioinformatic approaches.
We found that ITS1 had significantly higher efficiencies than ITS2 in 17 of 47 families and 20 of 49 investigated genera, a markedly better showing than that of ITS2. We conclude that, in general, ITS1 represents a better DNA barcode than ITS2 for a majority of eukaryotic taxonomic groups. This of course doesn’t mean that using ITS2 or the ITS region in its entirety should be dismissed, but our results can serve as a basis for making informed decisions about which region to choose for your amplicon sequencing project. The results complement what has previously been observed for e.g. fungi, where the difference between ITS1 and ITS2 was much less pronounced (2).
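To make the notion of a DNA barcoding gap a bit more concrete: a gap is usually said to exist when the smallest distance between species is larger than the largest distance within a species. The Python sketch below illustrates only that general definition; it is not the procedure used in the paper, the function names are made up, and the distance values are hypothetical.

```python
def barcoding_gap(intraspecific, interspecific):
    """Return min(interspecific) - max(intraspecific).

    A positive value means that every between-species distance exceeds
    every within-species distance, i.e. a barcoding gap is present.
    """
    return min(interspecific) - max(intraspecific)


def has_barcoding_gap(intraspecific, interspecific):
    """True if the two distance distributions do not overlap."""
    return barcoding_gap(intraspecific, interspecific) > 0


# Hypothetical pairwise distances, for illustration only.
within_species = [0.00, 0.01, 0.02]
between_species = [0.05, 0.08, 0.11]
print(has_barcoding_gap(within_species, between_species))  # prints True

# Overlapping distributions: no gap.
print(has_barcoding_gap([0.00, 0.04], [0.03, 0.09]))  # prints False
```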
1. Wang X-C, Liu C, Huang L, Bengtsson-Palme J, Chen H, Zhang J-H, Cai D, Li J-Q: ITS1: A DNA barcode better than ITS2 in eukaryotes? Molecular Ecology Resources. Early view. doi: 10.1111/1755-0998.12325 [Paper link]
2. Blaalid R, Kumar S, Nilsson RH, Abarenkov K, Kirk PM, Kauserud H: ITS1 versus ITS2 as DNA metabarcodes for fungi. Molecular Ecology Resources. Volume 13, Issue 2, Pages 218–224. doi: 10.1111/1755-0998.12065 [Paper link]
I just got word from BMC Genomics that my most recent paper has just been published (in provisional form; we still have not seen the edited proofs). In this paper (1), which I have co-authored with Anders Blomberg, Magnus Alm Rosenblad and Mikael Molin, we utilize metagenomic data from the GOS-expedition (2) together with fully sequenced bacterial genomes to show that:
- Detoxification genes in general are underrepresented in marine planktonic bacteria
- Surprisingly, the detoxification genes that show a differential distribution are more abundant in open ocean water than closer to the coast
- Peroxidases and peroxiredoxins seem to be the main line of defense against oxidative stress for bacteria in the marine milieu, rather than e.g. catalases
- The abundance of detoxification genes does not seem to increase with estimated pollution.
From this we conclude that selective pressures other than pollution, such as nutrient limitation, likely play the largest role in shaping marine planktonic bacterial communities. This suggests substantial streamlining of gene copy number and genome sizes, in line with observations made in previous studies (3). Along the same lines, our findings indicate that the majority of marine bacteria would have a low capacity to adapt to increased pollution, which is relevant as large amounts of human pollutants and waste end up in the oceans every year. The study exemplifies the use of metagenomic data in ecotoxicology, and how we can examine anthropogenic consequences for life in the sea using approaches derived from genomics. You can read the paper in its entirety here.
1. Bengtsson-Palme J, Alm Rosenblad M, Molin M, Blomberg A: Metagenomics reveals that detoxification systems are underrepresented in marine bacterial communities. BMC Genomics. Volume 15, Issue 749 (2014). doi: 10.1186/1471-2164-15-749 [Paper link]
2. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, Van Belle C, Chandonia J-M, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biology. 5:e16 (2007).
3. Yooseph S, Nealson KH, Rusch DB, McCrow JP, Dupont CL, Kim M, Johnson J, Montgomery R, Ferriera S, Beeson KY, Williamson SJ, Tovchigrechko A, Allen AE, Zeigler LA, Sutton G, Eisenstadt E, Rogers Y-H, Friedman R, Frazier M, Venter JC: Genomic and functional adaptation in surface ocean planktonic prokaryotes. Nature. 468:60–66 (2010).
Another paper I have co-authored related to the UNITE database for fungal rDNA ITS sequences is now published as an Online Early article in Fungal Diversity. The paper describes an effort to improve the annotation of ITS sequences from fungal plant pathogens. Why is this important? Well, apart from fungal plant pathogens being responsible for great economic losses in agriculture, the paper is also conceptually important as it shows that together we can accomplish a substantial improvement to the metadata in sequence databases. In this work we have hunted down high-quality reference sequences for various plant pathogenic fungi, and re-annotated incorrectly or insufficiently annotated ITS sequences from the same fungal lineages. In total, the 59 authors have made 31,954 changes to UNITE database data, on average roughly 540 changes per author. While one person, or a few, could not feasibly have made this effort alone, this work shows that larger consortia can make vast improvements to the quality of databases by distributing the work among many scientists. In many ways, this relates to proposals to “wikify” GenBank, and after Rfam and Pfam it might now be time to take the user-contribution model to, at least, the RefSeq portion of GenBank, which despite its description as being “comprehensive, integrated, non-redundant, [and] well-annotated” still contains errors and examples of non-usable annotation. More on that at a later point…
Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity Online early (2014). doi: 10.1007/s13225-014-0291-8 [Paper link]
I was informed by a colleague that today is Taxonomist Appreciation Day! This is a very important day; quoting from the original post:
We need active work on taxonomy and systematics if our work is going to progress, and if we are to apply our findings. Without taxonomists, entire fields wouldn’t exist. We’d be working in darkness. (…) Taxonomists and systematists often work in obscurity, and some of the most painstaking projects come to fruition after long years with only a small dose of the recognition that is required.
So, send your favorite taxonomist(s) some love today, and remember they are the foundation for much of what we bioinformaticians do!
I am on a roll pushing out new software these days, and here’s the latest addition. This version of ITSx was finished up last month and seems to be stable enough for consumption by the users. Version 1.0.5 adds a new option, “--anchor”, which enables extraction of regions flanking the ITS sequences (and the 5.8S, LSU and SSU, if desired). The option allows for extraction of a number of bases at each end, e.g. “--anchor 30” to get 30 bp before and after each ITS region, or all bases matching the corresponding HMM, by specifying “--anchor HMM”. The update can be downloaded here.
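As a rough illustration of what a numeric anchor means, the Python sketch below extends an already located region by a fixed number of bases on each side, clipped at the sequence ends. It is a conceptual sketch only and not how ITSx is implemented; the function name extract_with_anchor and the 0-based, end-exclusive coordinates are assumptions made for the example, and the “HMM” mode is not covered.

```python
def extract_with_anchor(sequence, start, end, anchor):
    """Return sequence[start:end] plus up to `anchor` flanking bases per side.

    Assumptions for this sketch: `start` and `end` are 0-based and
    end-exclusive, and the flanks are clipped at the sequence ends.
    """
    left = max(0, start - anchor)
    right = min(len(sequence), end + anchor)
    return sequence[left:right]


# Toy sequence: 5 flanking bases, an 8-base "region", 5 flanking bases.
toy = "AAAAA" + "CCCCCCCC" + "GGGGG"
print(extract_with_anchor(toy, 5, 13, 0))   # prints CCCCCCCC
print(extract_with_anchor(toy, 5, 13, 3))   # prints AAACCCCCCCCGGG
print(extract_with_anchor(toy, 5, 13, 30))  # prints AAAAACCCCCCCCGGGGG
```

The last call shows the clipping: asking for 30 flanking bases when only 5 are available simply returns everything up to the sequence ends.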