New paper accepted: Megraft
Yesterday, our paper on Megraft – a software tool to graft ribosomal small subunit (16S/18S) fragments onto full-length SSU sequences – became available as an accepted online early article in Research in Microbiology. Megraft is built upon the notion that when examining the depth of a community sequencing effort, researchers often use rarefaction analysis of the ribosomal small subunit (SSU/16S/18S) gene in a metagenome. However, the SSU sequences in metagenomic libraries generally are present as fragmentary, non-overlapping entries, which poses a great problem for this analysis. Megraft aims to remedy this problem by grafting the input SSU fragments from the metagenome (obtained by e.g. Metaxa) onto full-length SSU sequences. The software also uses a variability model which accounts for observed and unobserved variability. This way, Megraft enables accurate assessment of species richness and sequencing depth in metagenomic datasets.
The algorithm, efficiency and accuracy of Megraft are thoroughly described in the paper. It should be noted that this is not a panacea for species richness estimates in metagenomics, but it is a huge step forward over existing approaches. Megraft shares some similarities with EMIRGE (Miller et al., 2011), which is a software package for reconstruction of full-length ribosomal genes from paired-end Illumina sequences. Megraft, however, is set apart in that it has a strong focus on rarefaction, and also works when the number of sequences is small, which is often the case in 454 and Sanger-based metagenomics studies. Thus, EMIRGE and Megraft seek to solve a roughly similar problem, but for different sequencing technologies and sequencing scales.
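To make the rarefaction idea concrete, here is a minimal sketch of a subsampling-based rarefaction curve in Python. It is not Megraft's algorithm and it does no grafting; it only illustrates the downstream analysis that needs every sequence to be assigned to a cluster (OTU) first, which is the step that fragmentary, non-overlapping SSU entries make difficult. The function name, parameters and toy labels are all made up for illustration.

```python
import random


def rarefaction_curve(labels, depths, replicates=100, seed=0):
    """Estimate the mean number of distinct OTUs seen at each subsampling depth.

    labels     -- list with one OTU label per sequence (hypothetical input)
    depths     -- subsample sizes to evaluate, each <= len(labels)
    replicates -- number of random subsamples drawn per depth
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = []
    for depth in depths:
        if depth > len(labels):
            raise ValueError("depth exceeds the number of sequences")
        total = 0
        for _ in range(replicates):
            # Sample without replacement and count the distinct OTUs observed
            total += len(set(rng.sample(labels, depth)))
        curve.append((depth, total / replicates))
    return curve


# Toy community: ten sequences already assigned to three OTUs
toy_labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
for depth, richness in rarefaction_curve(toy_labels, [1, 2, 5, 10]):
    print(depth, round(richness, 2))
```

A curve that flattens out as the depth increases suggests that the sequencing effort has captured most of the richness; a curve that is still climbing suggests that it has not.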
Megraft is available for download here, and the paper can be read here.
- Bengtsson, J., Hartmann, M., Unterseher, M., Vaishampayan, P., Abarenkov, K., Durso, L., Bik, E.M., Garey, J.R., Eriksson, K.M., Nilsson R.H. (2012). Megraft: A software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes and similar environmental datasets. Research in Microbiology, doi: 10.1016/j.resmic.2012.07.001.
- Miller, C. S., Baker, B. J., Thomas, B. C., Singer, S. W., & Banfield, J. F. (2011). EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biology, 12(5), R44. doi:10.1186/gb-2011-12-5-r44
One more thing…
I realized that I have been using a newer version of Metaxa than most of you for the last couple of months. This bug fix was written sometime in February or March, and we have kept it internal to make sure it works as it should. Then other things came up and we never got around to actually releasing it. But with testing passed and upcoming versions of Metaxa in the pipeline, I think it is about time that everyone gets their hands on the latest Metaxa version.
It’s only two small things this time:
- Slight tweaks to the new HMM scoring system, making Metaxa just a little bit faster
- Fixed a rarely occurring bug causing the --heuristics option to be ignored in certain circumstances
Metaxa and Illumina data
For the last few months I have been struggling (part time) with getting Metaxa to eat Illumina paired-end data. This is a pretty tricky task, mainly due to the fact that Illumina reads are so much shorter than those obtained by Sanger and 454 sequencing. Therefore, I am more than happy to inform the community that today (the day before I go on vacation) I have a working prototype up and running. In fact, calling it a prototype is unfair; it is a piece of software that has come quite far by now. Currently, I am running it on test data sets, and I will try to keep it running over the next couple of weeks. Thereafter, I hope to be able to release it sometime this autumn (but don’t expect a September release!), harnessing the power of Illumina sequencing for SSU identification. Stay tuned, and have a great summer!
Presentation at SocBiN 2012
For those of you who like to listen to (or look at) me, I will be giving a presentation at this year’s SocBiN conference in Stockholm. My presentation has the long and quite informative title: Comprehensive Analysis of Antibiotic Resistance Genes in River Sediment, Well Water and Soil Microbial Communities Using Metagenomic DNA Sequencing. The talk is scheduled in the Using Next generation sequence data session, right after Jeroen Raes and Christopher Quince… It’s a short talk, so I will probably need to keep it simple, but it will be the first time I present results generated in connection with my present position, which I personally feel is very nice. We’re moving forward!
Swedish monitoring of hazardous substances
I was recently involved as an adviser in a report by the County Administrative Board in Västra Götaland (Länsstyrelsen) which has now been published [1]. [UPDATE: The PDF link at Länsstyrelsen’s page does not seem to work, but leads to another report in Swedish. I have reported this error to the web admin; we’ll see what happens. Once again, the PDF seems to work.] The report aims to identify gaps in the current monitoring system of hazardous substances in the Swedish environment. The report deals with effect-based monitoring tools and their usefulness for predicting and/or observing effects of hazardous substances in the environment. The overall conclusion of the report is that there are several gaps in both knowledge and techniques, and a need for developing new resources. However, Sweden still has good potential to adapt the monitoring system to meet the needs. I have been involved in one of the last chapters, describing the use of metagenomics to study ecosystem function (chapter 30.3). For people with an interest in environmental monitoring, the report is an interesting read in its entirety. For those more interested in applications for metagenomics I recommend turning to page 285 and continuing to the end of the report (it’s only five pages on metagenomics, so you’ll manage).
- Länsstyrelsen i Västra Götalands län. (2012). Swedish monitoring of hazardous substances in the aquatic environment (No. 2012:23). (A.-S. Wernersson, Ed.) Current vs required monitoring and potential developments (pp. 1–291). Länsstyrelsen i Västra Götalands län, vattenvårdsenheten.
Pfam team aims at cleaning erroneous protein families
The guys at Pfam recently introduced a new database, called AntiFam, which will provide HMM profiles for some groups of sequences that seemingly formed larger protein families, although they were not actually real proteins. For example, rRNA sequences could contain putative ORFs that seem to be conserved over broad lineages; the only problem is that they are not translated into proteins in real life, as they are part of an rRNA [1].
With this initiative the Xfam team wants to “reduce the number of spurious proteins that make their way into the protein sequence databases.” I have run into this problem myself on some occasions with suspicious sequences in GenBank, and I highly encourage this development towards consistency and correctness in sequence databases. It is of extreme importance that databases remain reliable if we want bioinformatics to tell us anything about organismal or community functions. The AntiFam database is a first step towards such a cleanup of the databases, and as such I would like to applaud Pfam for taking action in this direction.
To my knowledge, GenBank is doing what it can with e.g. barcoding data (SSU, LSU, ITS sequences), but for bioinformatics and metagenomics (and even genomics) to remain viable, these initiatives need to come quickly, and automated (but still very sensitive) tools for this need to get our focus immediately. For example, Metaxa [2] could be used as a tool to clean up SSU sequences of misclassified origin. More such tools are needed, and a lot of work remains to be done in the area of keeping databases trustworthy in the age of large-scale sequencing.
References
- Tripp, H. J., Hewson, I., Boyarsky, S., Stuart, J. M., & Zehr, J. P. (2011). Misannotations of rRNA can now generate 90% false positive protein matches in metatranscriptomic studies. Nucleic Acids Research, 39(20), 8792–8802. doi:10.1093/nar/gkr576
- Bengtsson, J., Eriksson, K. M., Hartmann, M., Wang, Z., Shenoy, B. D., Grelet, G.-A., Abarenkov, K., et al. (2011). Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie van Leeuwenhoek, 100(3), 471–475. doi:10.1007/s10482-011-9598-6
GoBiG Introduction Meeting
The newly formed bioinformatics network for PhD students in Gothenburg (GoBiG) will have an introductory meeting next week, on Thursday the 26th at Chalmers. See this page for more info.
Blurring the line between cause and effect
Finally I have gotten around to finishing my reply to Amy Pruden, who gave me some highly relevant and well-balanced critique of my previous post on antibiotic resistance genes as pollutants, back in early March. Too much came in between, but now I am more or less content with my answer.
First of all I would like to thank Amy for her response to my post on antibiotic resistance genes as pollutants. Her reply is very well thought-through, and her criticism of some of my claims is highly appropriate. For example, I have to agree that the extracellular DNA pool is vastly uncharacterized, and that my statement on this likely not being a source of resistance transmission is a bit of a stretch. The role of “free-floating” DNA in gene transfer must be further elucidated, and currently we do not really know whether it is important or not; and if so, to what extent it contributes.
However, I still maintain my view that there are problems with considering resistance genes pollutants, mainly because this blurs the line between cause and effect. If we for example consider photosynthetic microbial communities exposed to the photosynthesis inhibitor Irgarol, the communities develop (or acquire) tolerance towards the compound over time (Blanck et al. 2009). The tolerance mechanism has been attributed to changes in the psbA gene sequence (Eriksson et al. 2009). If we address this issue from a “resistance-genes-as-pollutants” perspective, would these tolerance-conveying psbA genes be considered pollutants? It would make sense to do so as they are unwanted in weed control circumstances; much like antibiotic resistance genes are unwanted in clinical contexts. It could be argued here that in these microbes such tolerance-associated psbA genes do not cause any harm. But consider for a moment that they did not occur in microbes, but in weeds; would they then be considered pollutants? In weeds they would certainly cause (at least economic) harm. Furthermore, say that the tolerance-conveying psbA genes have the ability to spread (which is possible at least in marine settings assisted by phages (Lindell et al. 2005)), would that make these tolerance genes pollutants? It is quite a stretch, but as plants can take up genetic material from bacteria (cf. Clough & Bent 1998, although this is not my area of expertise), there could be a potential for these tolerance-conveying psbA genes to spread to weeds.
What I am trying to say is that if we start viewing antibiotic resistance genes as pollutants per se, instead of looking at the chemicals (likely) causing resistance development, we start blurring the line between cause and effect. Resistance genes in the environment provide resilience to communities (at least to some species – the issue of ecosystem function responses to toxicants is a highly interesting area as well). However, in this case the resilience itself is the problem, because we think it can spread into human and animal pathogens. But from my point of view, the causes are still use, overuse, misuse and inappropriate release of antibiotics. Therefore, I maintain that we should be careful with pointing out resistance genes by themselves as pollutants – if we do not have very good reasons to do so.
Nevertheless, that does not mean that I think Pruden, and many other prominent authors, are wrong when they refer to resistance genes as pollutants. All I want to point out is that the statement in itself is a bit dangerous, as it might draw attention towards mitigating the effect of pollution, instead of mitigating the source of pollution itself. The persistence of resistance genes in bacterial genomes is alarming (Andersson & Hughes 2011), as it means that removal of selection pressures may have less effect on resistance gene abundance than anticipated. However, the only way I see out of this darkening scenario is to:
- Minimize the selection pressure for resistance genes in the clinical setting
- Immediately reduce environmental release of antibiotics, both from manufacturing and use. This primarily has to be done using better treatment technologies
- Find the routes that enable environmental bacteria to disseminate resistance genes to clinically relevant species and strains – and close them
- Develop antibiotics exploiting new mechanisms to eliminate bacteria
Lastly, I would like to thank Amy for taking my critique seriously – I think we agree on a lot more than we differ on, and I look forward to having this discussion in person at some point. I think we both agree that regardless of our standpoint, the terminology used in this context deserves to be discussed. Nevertheless, the terminology is quite unimportant compared to the values that are at stake – our fundamental ability to treat diseases and perform modern health care.
References
- Andersson, D.I. & Hughes, D., 2011. Persistence of antibiotic resistance in bacterial populations. FEMS Microbiology Reviews, 35(5), pp.901–911.
- Blanck, H., Eriksson, K. M., Grönvall, F., Dahl, B., Guijarro, K. M., Birgersson, G., & Kylin, H. (2009). A retrospective analysis of contamination and periphyton PICT patterns for the antifoulant irgarol 1051, around a small marina on the Swedish west coast. Marine pollution bulletin, 58(2), 230–237. doi:10.1016/j.marpolbul.2008.09.021
- Clough, S. J., & Bent, A. F. (1998). Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. The Plant journal : for cell and molecular biology, 16(6), 735–743.
- Eriksson, K. M., Clarke, A. K., Franzen, L.-G., Kuylenstierna, M., Martinez, K., & Blanck, H. (2009). Community-level analysis of psbA gene sequences and irgarol tolerance in marine periphyton. Applied and Environmental Microbiology, 75(4), 897–906. doi:10.1128/AEM.01830-08
- Lindell, D., Jaffe, J. D., Johnson, Z. I., Church, G. M., & Chisholm, S. W. (2005). Photosynthesis genes in marine viruses yield proteins during host infection. Nature, 438(7064), 86–89. doi:10.1038/nature04111
More on antibiotic resistance genes as pollutants
I received some well-formulated and very much relevant critique on my post Why viewing antibiotic resistance genes as a pollutant is a problem, which I wrote in January. To encourage the debate on this issue, I have asked the author – Amy Pruden – for her permission to republish it here, to give it the visibility it deserves. I intend to follow up on her comments in a forthcoming post, but I have not had time to formulate my answer yet. Until then, please read and contemplate both the original post by me, and Amy’s highly relevant answer below. I hope that we can continue this discussion in the same fruitful manner!
First of all I thank Johan Bengtsson for initiating a lively and much needed discussion on which pollutant we should precisely be targeting, antibiotics or antibiotic resistance genes (ARGs), in our important war against the spread of antibiotic resistance. As Bengtsson correctly alludes, my perspective comes from that of environmental science and engineering. At the core of these disciplines is defining and predicting the fate of pollutants in the environment, as well as designing appropriate means for their control. For these purposes, the definition of the pollutant of interest is of central importance. In general they may be defined as “undesired or harmful constituents within an environmental matrix, usually of human origin.” Pollutants may be classified in all shapes and sizes, including conservative (i.e., not subject to degradation or growth), non-conservative, biotic, abiotic, dissolved, and suspended (i.e., not dissolved). Thus, the first point, regarding the nature by which ARGs are spread disqualifying them from being considered as pollutants, is inaccurate.
At the same time, I recognize and agree that ARGs are indeed a natural and important aspect of the natural ecosystem. I commend recent work revealing the vast “antibiotic-resistome” in ancient environments (D’Costa et al. 2011; Allen et al. 2009), as it provides an essential understanding of the baseline antibiotic resistance in the pre-antibiotic era, which may serve as contrast for observations in the current antibiotic era. Thus, I agree that not all ARGs are pollutants, rather, anthropogenic sources of ARGs are the agents of interest. Perhaps I and others are guilty of not making this distinction more clear. It should also be pointed out that likewise, the vast majority of antibiotics in use today are derived from natural compounds, yet I agree that they can also serve as important environmental pollutants of concern. Thus, it is not necessarily whether the constituent is naturally occurring that defines the pollutant, rather its magnitude and distribution, as influenced by human activities.
It is agreed that viewing ARGs as contaminants does pose technical challenges. They may amplify within a host, or attenuate due to degradation or diminished selection pressure. However, with appropriate understanding of the mechanisms of transport and persistence, accurate models may be developed. I do contend that the jury is still out regarding the relative importance of extracellular and intracellular ARGs. The pool of extracellular DNA remains vastly uncharacterized, and some studies suggest that it is more extensive than previously thought (Wu et al. 2009; Corinaldesi et al. 2005). Other studies have specifically demonstrated the capability of extracellular ARGs to persist under certain environmental conditions and maintain their integrity for host uptake (Cai et al. 2007). While focusing attention on individual resistant strains of bacteria has merit in some instances, this approach is also greatly limited by the unculturability of the vast majority of environmental microbes. As we have now entered the metagenomic era, we have the tools to tackle the complexity of resistance elements in the environment and precisely define the human influence. Distribution of ARGs may also be considered in parallel with key genetic elements driving their horizontal gene transfer, such as plasmids, transposons, and integrons.
Regarding the antibiotics themselves, clearly they are important. The direct relationship between clinical use and increasing rates of antibiotic resistance is well-documented and certainly continued vigilance in promoting their appropriate use and disposal is called for. What remains much foggier is the exact role of environmental antibiotics in enabling selection once released into the environment. There is good evidence that even sub-inhibitory levels of antibiotics can stimulate various functions in the cell, especially horizontal gene transfer, as reviewed recently by Aminov (2011). However, environmentally-relevant concentrations driving selection of resistant strains are largely unknown. Further, at what point along a discharge pathway from wastewater treatment plant or livestock lagoon do ARGs persist independently of ambient antibiotic conditions? Indeed, some studies have noted correlations between antibiotics and ARGs in environmental matrices while others have noted an absence of such a correlation. In either case, it appears that ARGs persist and are transported further along pathways than antibiotics, suggesting distinct factors governing transport (McKinney et al. 2010; Peak et al. 2007). Research is needed to better understand the mechanisms at play, such as antibiotics and other selectors (e.g. metals and other toxins), in leaving a human foot-print on environmental reservoirs of resistance. Nonetheless, a reasonable approach for mitigating risk seems to be focusing attention on developing appropriate technologies for eliminating both antibiotics and genetic material from wastestreams.
Thanks again for opening this discussion – I hope to meet you at a conference sometime in the future!
References
1. Allen, H.K., Moe, L.A., Rodbumrer, J., Gaarder, A., & Handelsman, J., 2009. Functional metagenomics reveals diverse β-lactamases in a remote Alaskan soil. ISME J. 3, pp. 243-251.
2. Aminov, R.I., 2011. Horizontal gene exchange in environmental microbiota. Front. Microbiol. 2,158 doi:10.3389/fmicb.2011.00158.
3. Corinaldesi, C., Danovaro, R. & Dell’Anno, A., 2005. Simultaneous recovery of intracellular and extracellular DNA suitable for molecular studies from marine sediments. Appl. Environ. Microbiol. 71, pp. 46-50.
4. D’Costa, V.M., McGrann, K.M., Hughes, D.W., & Wright, G.D., 2006. Sampling the antibiotic resistome. Science 311, pp. 374-377.
5. McKinney, C.W., Loftin, K.A., Meyer, M.T., Davis, J.G., & Pruden, A., 2010. tet and sul antibiotic resistance genes in livestock lagoons of various operation type, configuration, and antibiotic occurrence. Environ. Sci. Technol. 44 (16), pp. 6102-6109.
6. Peak, N., Knapp, C.W., Yang, R.K., Hanfelt, M.M., Smith, M.S., Aga, D.S., & Graham, D.W., 2007. Abundance of six tetracycline resistance genes in wastewater lagoons at cattle feedlots with different antibiotic use strategies. Environ. Microbiol. 9 (1), pp. 143–151.
7. Wu, J. F. & Xi, C. W., 2009. Evaluation of different methods for extracting extracellular DNA from the biofilm matrix. Appl. Environ. Microbiol. 75, pp. 5390-5395.
What is it like, being a bioinformatician 2012?
Michael Barton, Pierre Lindenbaum, and Rob Syme are currently running a survey on what it is like to be a bioinformatician today. The survey has a history going back to 2008, and I think everyone who’s doing bioinformatics should take it. It aims “to understand the field of bioinformatics by surveying the people whom work in it,” which I think is a nice objective for running a survey. It will be interesting to see what comes out of it. Take the survey, and read more about it at: http://bioinfsurvey.org/