Our paper on the most recent developments of the UNITE database for fungal rDNA ITS sequences has just been published as an Early View article in Molecular Ecology. In this paper, we aim to ease two of the major problems facing the identification of newly generated fungal ITS sequences: the lack of a sufficiently good reference dataset, and the lack of a way to refer to fungal species that have no Latin name. As part of a solution, we have introduced the term species hypothesis for all fungal species represented by at least two ITS sequences. The UNITE database has an easy-to-use web-based sequence management system, and we encourage everyone who can improve the annotations or metadata of a fungal lineage to do so.
My main contribution to this paper has been to tailor the ITSx functionality to the UNITE database, so that ITS data can be more easily processed for the species hypotheses.
Kõljalg U, Nilsson RH, Abarenkov K, Tedersoo L, Taylor AFS, Bahram M, Bates ST, Bruns TT, Bengtsson-Palme J, Callaghan TM, Douglas B, Drenkhan T, Eberhardt U, Dueñas M, Grebenc T, Griffith GW, Hartmann M, Kirk PM, Kohout P, Larsson E, Lindahl BD, Lücking R, Martín MP, Matheny PB, Nguyen NH, Niskanen T, Oja J, Peay KG, Peintner U, Peterson M, Põldmaa K, Saag L, Saar I, Schüßler A, Senés C, Smith ME, Suija A, Taylor DE, Telleria MT, Weiß M, Larsson KH: Towards a unified paradigm for sequence-based identification of Fungi. Accepted in Molecular Ecology. doi: 10.1111/mec.12481 [Paper link]
The paper describing our software tool ITSx has now gone online as an Early View paper on the Methods in Ecology and Evolution website. The software recently left its beta status behind, and now that the paper is out as well, we hope that as many people as possible will find the software useful in barcoding efforts targeting the ITS region. If you're not familiar with the software, or with its predecessor, the fungal ITS Extractor, here is a brief description of what it does:
ITSx is a Perl-based software tool that extracts the ITS1, 5.8S and ITS2 sequences, as well as full-length ITS sequences, from high-throughput sequencing datasets. To achieve this, we use carefully crafted hidden Markov models (HMMs), computed from large alignments covering a total of 20 groups of eukaryotes. Testing has shown that ITSx has close to 100% detection accuracy and virtually zero false-positive extractions. Additionally, it supports multiple processor cores, which makes it suitable also for very large datasets. It is also able to eliminate non-ITS sequences from a given input dataset.
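To make the extraction step a bit more concrete: once the conserved flanking regions (the end of the SSU, the 5.8S gene and the start of the LSU) have been located in a read, cutting out ITS1, 5.8S and ITS2 is just a matter of slicing between those anchors. The Python sketch below assumes the anchor coordinates are already known, e.g. from a profile-HMM search. The `Anchors` class and `split_its` function are hypothetical illustrations, not part of ITSx, which does the hard part (finding the anchors reliably) with its own HMMs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Anchors:
    """Hypothetical 0-based coordinates of the conserved flanking regions in a read,
    e.g. derived from profile-HMM hits (not ITSx's actual output format)."""
    ssu_end: int    # first base after the end of the SSU (18S) region
    s58_start: int  # first base of the 5.8S gene
    s58_end: int    # first base after the 5.8S gene
    lsu_start: int  # first base of the LSU (28S) region


def split_its(seq: str, a: Anchors) -> Dict[str, str]:
    """Cut a read spanning SSU-ITS1-5.8S-ITS2-LSU into its ITS parts."""
    if not 0 <= a.ssu_end <= a.s58_start <= a.s58_end <= a.lsu_start <= len(seq):
        raise ValueError("anchor coordinates are out of order or out of range")
    return {
        "ITS1": seq[a.ssu_end:a.s58_start],
        "5.8S": seq[a.s58_start:a.s58_end],
        "ITS2": seq[a.s58_end:a.lsu_start],
        "full": seq[a.ssu_end:a.lsu_start],  # ITS1 + 5.8S + ITS2
    }


# Toy read: SSU tail, ITS1, 5.8S, ITS2, LSU head.
read = "AAAA" + "CCC" + "GGGGG" + "TT" + "AAAA"
print(split_its(read, Anchors(ssu_end=4, s58_start=7, s58_end=12, lsu_start=14)))
# {'ITS1': 'CCC', '5.8S': 'GGGGG', 'ITS2': 'TT', 'full': 'CCCGGGGGTT'}
```

The sketch only rejects inconsistent coordinates; it does nothing to find the anchors in the first place, which is where the HMMs come in.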
While ITSx supports extraction of ITS sequences from at least 20 different eukaryotic lineages, we ourselves have considerably less experience with many of the eukaryote groups outside the fungi. We therefore release ITSx with the intent that the research community will evaluate its performance in other parts of the eukaryote tree as well and, if necessary, contribute the data required to cover those lineages thoroughly.
The ITSx paper can at the moment be cited as:
Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A. S., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y. J. K., Sanli, K., Eriksson, K. M., Vik, U., Veldre, V., Nilsson, R. H. (2013), Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods in Ecology and Evolution. doi: 10.1111/2041-210X.12073
A long time ago, we (Martin Eriksson, Martin Hartmann, Henrik Nilsson and I) were invited to write an overview of Metaxa for the Encyclopedia of Metagenomics. I guess the workload of pulling off such a project is huge, so it's no surprise that it has taken a while for it to be accepted, but now it is available for consumption.
Meanwhile, Metaxa has been getting regular updates, and I hope to soon be able to show you a major new update to the software, bringing it up to the next generation of metagenomics. More on that soon.
I was creating the diagram below for an upcoming presentation when I realized that the exponential growth in published metagenomics papers might be coming to an end. Interestingly, the slowdown in growth over the last few years (701 -> 983 -> 1148) reminds me of the Hype Cycle: if my projection holds, we would have reached the "Peak of Inflated Expectations", which would mean a rapid drop in the number of metagenomics publications over the next few years as the field moves on.
The thought is interesting, but it seems a bit early to draw any conclusions from the publication counts yet. It is still somewhat strange to note, though, that more than 20% of all metagenomics publications (740 of 3547) are review papers. Come on, let's do some science first and then review it… Anyway, it'll be interesting to see what 2013 has in store for us.
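The "drop in pace" is easier to see as growth rates than as raw counts. A tiny Python sketch using only the figures quoted above (the helper name `growth_rates` is just for illustration):

```python
# Yearly publication counts quoted above (the three most recent years in the diagram).
counts = [701, 983, 1148]


def growth_rates(series):
    """Relative year-on-year change, e.g. 0.40 means +40 %."""
    return [(later - earlier) / earlier for earlier, later in zip(series, series[1:])]


print([f"{r:.1%}" for r in growth_rates(counts)])  # ['40.2%', '16.8%']

# Share of review papers among all metagenomics publications.
reviews, total = 740, 3547
print(f"{reviews / total:.1%}")  # 20.9%
```

Exponential growth would mean a roughly constant relative growth rate; going from about 40% to about 17% while the absolute numbers still increase is what a slowdown looks like. With only three data points, though, that is a hint rather than a trend.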
I have co-authored a paper, together with Henrik Nilsson among others, that was published today in MycoKeys. The paper deals with checking the quality of DNA sequences before using them for research purposes. In our opinion, much of the software available for sequence quality management is rather complex and resource-intensive. Not everyone has the skills to master such software, and computational resources may also be scarce. Luckily, a lot of DNA sequence quality control can be done using only manual means and a web browser. This paper puts these means together into one comprehensible and easy-to-digest document. Our target audience is primarily biologists who do not have a strong background in computer science but still have a dataset requiring DNA sequence quality control.
We have chosen to focus on the fungal ITS barcoding region, but the guidelines should be fairly general and applicable to most groups of organisms. In short, our five guidelines are:
- Establish that the sequences come from the intended gene or marker
This can be done by making a multiple alignment of the sequences and verifying that they all feature some suitable, conserved sub-region (the 5.8S gene in the ITS case)
- Establish that all sequences are given in the correct (5’ to 3’) orientation
Examine the alignment for any sequences that do not align at all to the others; re-orient these; re-run the alignment step; and examine them again
- Establish that there are no (at least bad cases of) chimeras in the dataset
Run the sequences through BLAST against one of the large sequence databases, e.g. at NCBI (or, in the ITS case, the UNITE database), to verify that the best match covers more or less the full length of each query sequence
- Establish that there are no other major technical errors in the sequences
Examine the BLAST results carefully, particularly the graphical overview and the pairwise alignment, for anomalies (there are some nice figures in the paper showing what it should and should not look like)
- Establish that any taxonomic annotations given to the sequences make sense
Examine the BLAST hit list to see that the species names produced make sense
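The whole point of the paper is that all of this can be done by hand in a web browser. For those who would rather script parts of it, here is a minimal Python sketch of two of the checks: flipping reverse-complemented sequences (guideline 2) and flagging queries whose best BLAST hit covers only part of their length (guideline 3). The k-mer orientation heuristic, the 90% coverage cut-off and all function names are my own assumptions for illustration, not recommendations from the paper; `qstart`, `qend` and `qlen` correspond to the fields of the same names in BLAST+ tabular output.

```python
# Hypothetical helpers for guidelines 2 and 3; not code from the paper.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (ambiguity codes other than N are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Set of all overlapping k-mers in seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(query: str, reference: str, k: int = 8) -> str:
    """Return query in whichever orientation shares more k-mers with a reference
    sequence known to be 5'->3' (ties keep the query as given)."""
    ref = kmers(reference, k)
    forward = len(kmers(query, k) & ref)
    reverse = len(kmers(reverse_complement(query), k) & ref)
    return query if forward >= reverse else reverse_complement(query)


def query_coverage(qstart: int, qend: int, qlen: int) -> float:
    """Fraction of the query spanned by one BLAST hit (1-based, inclusive coordinates)."""
    return (abs(qend - qstart) + 1) / qlen


def suspicious_hit(qstart: int, qend: int, qlen: int, min_cov: float = 0.9) -> bool:
    """Flag a query whose best hit covers less than min_cov of it (assumed cut-off):
    a possible chimera or technical artifact worth inspecting by hand."""
    return query_coverage(qstart, qend, qlen) < min_cov


print(reverse_complement("AACG"))  # CGTT
print(suspicious_hit(1, 250, 500))  # True
```

`suspicious_hit` only tells you where to look; the graphical BLAST overview is still the place to judge what is actually going on.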
A much more thorough description of these guidelines can be found in the paper itself, which is available under open access from MycoKeys. There’s simply no reason not to go there and at least take a look at it. Happy quality control!
Nilsson RH, Tedersoo L, Abarenkov K, Ryberg M, Kristiansson E, Hartmann M, Schoch CL, Nylander JAA, Bergsten J, Porter TM, Jumpponen A, Vaishampayan P, Ovaskainen O, Hallenberg N, Bengtsson-Palme J, Eriksson KM, Larsson K-H, Larsson E, Kõljalg U: Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences. MycoKeys. Issue 4 (2012), 37–63. doi: 10.3897/mycokeys.4.3606 [Paper link]
Bengtsson J, Hartmann M, Unterseher M, Vaishampayan P, Abarenkov K, Durso L, Bik EM, Garey JR, Eriksson KM, Nilsson RH: Megraft: A software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes. Research in Microbiology. Volume 163, Issues 6–7 (2012), 407–412, doi: 10.1016/j.resmic.2012.07.001. [Paper link]
Megraft is currently at version 1.0.1, but I have a slightly updated version in the pipeline which will be made available later this fall.
Yesterday, our paper on Megraft, a software tool that grafts ribosomal small subunit (16S/18S) fragments onto full-length SSU sequences, became available as an accepted online early article in Research in Microbiology. Megraft builds on the observation that when examining the depth of a community sequencing effort, researchers often use rarefaction analysis of the ribosomal small subunit (SSU/16S/18S) gene in a metagenome. However, the SSU sequences in metagenomic libraries are generally present as fragmentary, non-overlapping entries, which poses a great problem for this analysis. Megraft aims to remedy this problem by grafting the input SSU fragments from the metagenome (obtained, e.g., with Metaxa) onto full-length SSU sequences. The software also uses a variability model that accounts for both observed and unobserved variability. In this way, Megraft enables accurate assessment of species richness and sequencing depth in metagenomic datasets.
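Megraft's grafting algorithm is described in the paper and I won't try to reproduce it here, but the rarefaction analysis it feeds into is easy to sketch. Assuming the SSU reads have already been assigned to taxa (the `counts` below are hypothetical read counts per taxon), the expected number of taxa in a random subsample of n out of N reads is the sum over all taxa i of 1 − C(N − N_i, n) / C(N, n), where N_i is the number of reads belonging to taxon i. A minimal Python version of this classic formula:

```python
from math import comb
from typing import List, Sequence, Tuple


def expected_richness(counts: Sequence[int], n: int) -> float:
    """Expected number of taxa in a random subsample of n reads drawn without
    replacement from N = sum(counts) reads (classic rarefaction formula)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of reads")
    all_subsamples = comb(total, n)
    # For each taxon: 1 - P(none of its reads ends up in the subsample).
    return sum(1 - comb(total - c, n) / all_subsamples for c in counts if c > 0)


def rarefaction_curve(counts: Sequence[int], step: int = 1) -> List[Tuple[int, float]]:
    """(subsample size, expected richness) pairs at every `step` reads from 0 up to N."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]


# Hypothetical read counts per taxon for a small SSU dataset.
counts = [2, 1, 1]
print(rarefaction_curve(counts))
```

A curve that flattens out suggests that the sequencing depth is sufficient; what Megraft adds is making fragmentary, non-overlapping SSU reads usable as input for this kind of analysis in the first place.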
The algorithm, efficiency and accuracy of Megraft are thoroughly described in the paper. It should be noted that Megraft is not a panacea for species richness estimates in metagenomics, but it is a huge step forward from existing approaches. Megraft shares some similarities with EMIRGE (Miller et al., 2011), a software package for reconstructing full-length ribosomal genes from paired-end Illumina sequences. Megraft, however, sets itself apart by having a strong focus on rarefaction and by working also when the number of sequences is small, which is often the case in 454- and Sanger-based metagenomics studies. Thus, EMIRGE and Megraft seek to solve roughly similar problems, but for different sequencing technologies and sequencing scales.
Bengtsson, J., Hartmann, M., Unterseher, M., Vaishampayan, P., Abarenkov, K., Durso, L., Bik, E.M., Garey, J.R., Eriksson, K.M., Nilsson, R.H. (2012). Megraft: A software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes and similar environmental datasets. Research in Microbiology, doi: 10.1016/j.resmic.2012.07.001.
- Miller, C. S., Baker, B. J., Thomas, B. C., Singer, S. W., & Banfield, J. F. (2011). EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biology, 12(5), R44. doi:10.1186/gb-2011-12-5-r44
It seriously worries me that several recent indications suggest that heavy use of antibiotics drives not only the development of antibiotic resistance, but also the development of more virulent and aggressive strains of pathogenic bacteria. First, genome sequencing of the E. coli strain that caused the EHEC outbreak in Germany in May revealed not only antibiotic resistance genes, but also that the strain is able to make Shiga toxin, which causes the severe diarrhoea and kidney damage associated with haemolytic uremic syndrome (HUS). The genes encoding the Shiga toxin are not originally bacterial genes, but instead seem to originate from phages. When E. coli gets infected with a Shiga toxin-producing phage, it becomes a human pathogen. David Acheson, managing director for food safety at the consulting firm Leavitt Partners, says that exposure to antibiotics might be enhancing the spread of Shiga toxin-producing phages. Some antibiotics trigger what is referred to as the SOS response, which induces the phage to start replicating. The replication of the phage causes the bacterium to burst, releasing the phages, and with them the toxin.
Second, there is apparently an ongoing outbreak of scarlet fever in Hong Kong. Kwok-Yung Yuen, a microbiologist at the University of Hong Kong, has analyzed a draft sequence of the bacterium's genome and suggests that the bacteria acquired greater virulence and drug resistance by picking up one or more genes from bacteria in the human oral and urogenital tracts. He believes that the overuse of antibiotics is driving the emergence of drug resistance in these bacteria.
Now, both of these cases are just indications, but if they hold true, this would be an alarming development, in which the use of antibiotics promotes not only the spread of resistance genes, impairing our ability to treat bacterial infections, but also the development of far more virulent and aggressive strains. Combining increasing untreatability with increasing aggressiveness seems to me like the ultimate weapon against our relatively high standards of treatment of common infections. Good thing hand hygiene still seems to help.
- Phage on the rampage (http://www.nature.com/news/2011/110609/full/news.2011.360.html), Published online 9 June 2011, Nature, doi:10.1038/news.2011.360
- Mutated Bacteria Drives Scarlet Fever Outbreak (http://news.sciencemag.org/scienceinsider/2011/06/mutated-bacteria-drives-scarlet.html?etoc&elq=cd94aa347dca45b3a82f144b8213e82b), Published online 27 June 2011.
- Luby SP, Halder AK, Huda T, Unicomb L, Johnston RB (2011) The Effect of Handwashing at Recommended Times with Water Alone and With Soap on Child Diarrhea in Rural Bangladesh: An Observational Study. PLoS Med 8(6): e1001052. doi:10.1371/journal.pmed.1001052 (http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.1001052)
It is a pleasure to announce that the paper on Metaxa is now available as an Online Early article in Antonie van Leeuwenhoek. In short, the paper describes a software tool that is able to extract small subunit (SSU) rRNA sequences from large datasets, such as metagenomes and environmental PCR libraries, and classify them according to bacterial, archaeal, eukaryote, chloroplast or mitochondrial origin. The program makes it easy to distinguish between, e.g., the bacterial SSU sequences you would like to analyze and the SSU sequences you would like to remove prior to the analysis (e.g. mitochondrial and chloroplast sequences). This task is particularly important in metagenomics, where sequences can derive from a variety of origins, but bacterial diversity is often the desired target of the analysis. The software can be downloaded here, and the article can be read here. I would like to thank all the co-authors on this paper for a brilliant collaboration, and I hope to work with them again.
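The discrimination step boils down to assigning each extracted SSU sequence to the origin whose model fits it best, and then keeping only the origins you care about. The Python sketch below assumes you already have one score per origin for each sequence (for instance, bit scores from searching it against origin-specific profile HMMs); the score threshold, record layout and function names are hypothetical and greatly simplify what Metaxa actually does.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical record: (sequence id, {origin: score}); a higher score means a better fit.
Record = Tuple[str, Dict[str, float]]


def classify(scores: Dict[str, float], min_score: float = 20.0) -> Optional[str]:
    """Return the best-scoring origin, or None if nothing reaches min_score."""
    if not scores:
        return None
    origin, best = max(scores.items(), key=lambda item: item[1])
    return origin if best >= min_score else None


def keep_only(records: Iterable[Record],
              wanted: Sequence[str] = ("bacteria",),
              min_score: float = 20.0) -> List[str]:
    """Ids of the sequences whose best-fitting origin is one of `wanted`."""
    return [seq_id for seq_id, scores in records
            if classify(scores, min_score) in wanted]


reads = [
    ("r1", {"bacteria": 150.0, "chloroplast": 90.0}),
    ("r2", {"bacteria": 80.0, "chloroplast": 160.0}),
    ("r3", {"mitochondria": 3.0}),
]
print(keep_only(reads))  # ['r1']
```

Here the chloroplast-like read and the unclassifiable one simply drop out before any bacterial diversity analysis.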
- Bengtsson J, Eriksson KM, Hartmann M, Wang Z, Shenoy BD, Grelet G, Abarenkov K, Petri A, Alm Rosenblad M, Nilsson RH: Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek Journal of Microbiology, 2011, doi:10.1007/s10482-011-9598-6.
In a recent Nature article (1), Craig Venter and his co-workers at JCVI have sequenced not just one marine bacterium, but 137 different isolates. The main goal of the study was to better understand the ecology of marine picoplankton in the context of the Global Ocean Sampling (GOS) data (2,3). As I see it, there are at least two really interesting things going on here:
First, this is a milestone in sequencing. We're not talking one genome – one article anymore. We're talking one article – 137 new genomes. This vastly raises the bar for future sequencing efforts, but even more importantly, it shifts the focus even further from the actual sequencing to the purpose of the sequencing. One sequenced genome might be interesting enough if it fills a biological knowledge gap, but just sequencing a bacterial strain isn't worth that much anymore. With the arrival of second- and third-generation sequencing techniques, this development was pretty obvious, but this article is (to my knowledge) the first real proof that it has finally happened. I expect that five to ten years from now, not sequencing an organism of interest for your research will be viewed as very strange and backward-looking. “Why didn't you sequence this?” will be a highly relevant review question for many publications. The days when you could write “we here publish for the first time the complete genome sequence of <insert organism name here>” and have that as the central theme of an article will also soon be over. Sequencing will simply be reduced to the (valuable) tool it actually is. This is probably a good thing, as it brings us back to biology. Articles like this one, where you look at ~200 genomes to investigate ecological questions, simply provide a more relevant biological perspective than staring at the sequence of a single genome at a time when DNA data is flooding over us.
Second, this is the first (again, to my knowledge) publication where questions arising from metagenomics (2,3,4) have initiated a huge sequencing effort to understand the ecology of the environment with which the metagenome is associated. This highlights a new use of metagenomics as a prospective technique: mining various environments for interesting features, then selecting a few of their inhabitants and looking closer at who is responsible for what. With a number of emerging single-cell sequencing and visualisation techniques (5,6,7,8), as well as the application of cell sorting approaches to environmental communities (5,9), we can expect metagenomics to play a huge role in organism, strain and protein discovery, but also in determining microbial ecosystem services. Though Venter's latest article (1) is just a first step towards this new role for metagenomics, it's a nice example of what (meta)genomics could look like towards the end of this decade, if not even sooner.
- (1) Yooseph et al. Genomic and functional adaptation in surface ocean planktonic prokaryotes. Nature (2010) vol. 468 (7320) pp. 60-6
- (2) Yooseph et al. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol (2007) vol. 5 (3) pp. e16
- (3) Rusch et al. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol (2007) vol. 5 (3) pp. e77
- (4) Rusch et al. Characterization of Prochlorococcus clades from iron-depleted oceanic regions. Proceedings of the National Academy of Sciences of the United States of America (2010)
- (5) Woyke et al. Assembling the marine metagenome, one cell at a time. PLoS ONE (2009) vol. 4 (4) pp. e5299
- (6) Woyke et al. One bacterial cell, one complete genome. PLoS ONE (2010) vol. 5 (4) pp. e10314
- (7) Moraru et al. GeneFISH – an in situ technique for linking gene presence and cell identity in environmental microorganisms. Environ Microbiol (2010)
- (8) Lasken. Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochem Soc Trans (2009) vol. 37 (Pt 2) pp. 450-3
- (9) Mary et al. Metaproteomic and metagenomic analyses of defined oceanic microbial populations using microwave cell fixation and flow cytometric sorting. FEMS Microbiology Ecology (2010)