Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg | Wisconsin Institute for Discovery

Michael Barton, Pierre Lindenbaum, and Rob Syme are currently running a survey on what it is like to be a bioinformatician today. The survey has a history going back to 2008, and I think everyone who does bioinformatics should take it. It aims “to understand the field of bioinformatics by surveying the people whom work in it,” which I think is a nice objective for a survey. It will be interesting to see what comes out of it. Take the survey, and read more about it at: http://bioinfsurvey.org/

It is not uncommon for scientists, especially researchers active within the environmental field, to view antibiotic resistance genes (ARGs) as pollutants (e.g. Pruden et al. 2006). While there are practical benefits of doing so, especially when explaining the threat of antibiotic resistance to politicians and the public, this generalization is somewhat problematic from a scientific point of view. There are several reasons why this view is not as straightforward as one might think.

The first is that ARGs do not spread the same way as pollutants do. ARGs are carried in bacteria. This means that ARGs cannot readily be transferred into, e.g., the human body by themselves; they need to be carried by a bacterial host (transfer of ARGs on free DNA is of course possible, but likely not a major source of ARG transmission into new systems). Therefore, when we find resistance genes in an environment, that is a very strong indication that resistant bacteria are present as well. Also, finding ARGs is not necessarily an indication of high levels of antibiotics, as the resistance genes can remain in the bacterial genome for extended periods of time after exposure (Andersson & Hughes 2011).

The second reason why ARGs should not be viewed as pollutants is that they are not. If anything, the ARGs contribute to the resilience of the ecosystem towards the actual toxicants, which are the antibiotics themselves. Having a resistance gene is an insurance that you will survive antibiotic perturbations. Calling ARGs pollutants just deflects attention from the real problem to nature’s response to our contaminant.

What we have to do is not to try to defeat the resistance itself, but to try to minimize its spread. This means that we need to constantly monitor our usage and possible emissions of antibiotics and try to reduce risk environments as much as possible. Emissions from sewage treatment plants (Karthikeyan & Meyer 2006; Lindberg et al. 2007), hospitals (Lindberg et al. 2004), production facilities (Larsson et al. 2007; Fick et al. 2009) and food production (Davis et al. 2011) are obvious starting points, but we need to continuously monitor all sources of antibiotic pollution. Of course, this is only my view of the problem, but I believe that while the problem for our society lies within the resistance genes, the cause lies within the actual pollutants: the antibiotics we use and abuse.

References

  1. Andersson, D.I. & Hughes, D., 2011. Persistence of antibiotic resistance in bacterial populations. FEMS Microbiology Reviews, 35(5), pp.901–911.
  2. Davis, M.F. et al., 2011. An ecological perspective on U.S. industrial poultry production: the role of anthropogenic ecosystems on the emergence of drug-resistant bacteria from agricultural environments. Current Opinion in Microbiology, 14(3), pp.244–250.
  3. Fick, J. et al., 2009. Contamination of surface, ground, and drinking water from pharmaceutical production. Environmental Toxicology and Chemistry, 28(12), pp.2522–2527.
  4. Karthikeyan, K.G. & Meyer, M.T., 2006. Occurrence of antibiotics in wastewater treatment facilities in Wisconsin, USA. Science of the Total Environment, 361(1-3), pp.196–207.
  5. Larsson, D.G.J., de Pedro, C. & Paxeus, N., 2007. Effluent from drug manufactures contains extremely high levels of pharmaceuticals. Journal of Hazardous Materials, 148(3), pp.751–755.
  6. Lindberg, R. et al., 2004. Determination of antibiotic substances in hospital sewage water using solid phase extraction and liquid chromatography/mass spectrometry and group analogue internal standards. Chemosphere, 57(10), pp.1479–1488.
  7. Lindberg, R.H. et al., 2007. Environmental risk assessment of antibiotics in the Swedish environment with emphasis on sewage treatment plants. Water Research, 41(3), pp.613–619.
  8. Pruden, A. et al., 2006. Antibiotic resistance genes as emerging contaminants: studies in northern Colorado. Environmental Science & Technology, 40(23), pp.7445–7450.

Merry Christmas

I just want to wish everybody a merry Christmas and a happy new year, from the sunny town of Stellenbosch in South Africa. I will have the pleasure of spending Christmas here this year. As some holiday reading I have provided a longer piece on metagenomics, and I hope to be able to provide a shorter one on resistance genes as well. Happy holidays!

One thing that I find slightly annoying is when people do not get the basic concepts right, or when debatable concepts are used without discussion of their implications. This annoys me even more when it is done by senior scientists, who should know better. Sometimes, I guess, this happens out of ignorance, and sometimes in order to tie your subject to a certain buzzword concept. Neither is good, even though the former reason is a little more forgivable than the latter. One area where this problem becomes agonizingly evident is when molecular biologists or medical scientists move into ecology, as has happened with the advent of metagenomics. When the study of the human gut microflora turned into a large-scale sequencing effort, people who had previously studied bacteria grown on plates started facing a world of community ecology. However, I get the impression that way too often these people do not ask ecologists for advice, or even read up on the ecological literature. That, I suppose, is the reason why medical scientists can talk about how the human gut microflora can “evolve” into a stable community a couple of years after birth, even though words such as “development” or “succession” would describe this change much more accurately.

The marker gene flaw

To make clear what I mean, let us compare the human gut to a forest. If an open field is left to itself, larger plants will slowly inhabit it, and gradually different species will replace each other, until we have a fully developed forest. Similarly, the human gut microflora is rather unstable at birth, but stabilizes relatively quickly, and within a few years we have a microbial community with “adult-like” characteristics. To arrive at this conclusion, scientists generally use the 16S (small sub-unit) genetic marker to study the bacterial species diversity. This works in pretty much the same way as going out into the forest and counting trees of different kinds.

Now, if I went out into the forest once and counted the tree species, waited for 50 years and then did the same thing again, I would presumably see that the forest species composition had changed. However, if I called this “evolution”, fellow scientists would laugh at me. Raspberry bushes do not evolve into birches, and birches do not evolve into firs. Instead, ecologists talk about “succession”: a progressive transformation of a community, going on until a stable community is formed. The concept of succession seems well suited to describe what is happening in the human gut as well, and should of course be used in that setting too. The most likely driver of the functional community changes is not that some bacterial species have evolved new functions, but rather that bacterial species performing these new functions have outcompeted the ones previously present.

In fact, I would argue that it is impossible to study evolution through a genetic marker such as the 16S gene (except in the rare case when you study evolution of the 16S gene itself). Instead, the only thing we can assess using a marker gene is how the copy number of the different gene variants changes over time (or space, or conditions). The copy number tells us about the species composition of the community at a given time, which can be used to measure successional changes. However, evolutionary changes would require heritable changes in the characteristics of biological populations, i.e. that their genetic material changes in some way. Unless that change happens in the marker gene of choice, we cannot measure it, and the alterations of composition we measure will only reflect differences in species abundances. These differences might have arisen from genetic (i.e. evolutionary) changes, but we cannot assess that.
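
As a minimal sketch of what marker gene counts can and cannot tell us, the following Python snippet turns 16S read counts from two time points into relative abundances and compares them using the Bray-Curtis dissimilarity. The taxon names and counts are invented for illustration and do not come from any real data set, and Bray-Curtis is just one of several possible measures. Note that the resulting number only describes a shift in species composition; it says nothing about whether any genome has changed.

```python
def relative_abundance(counts):
    """Convert raw marker gene counts per taxon into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no counts in sample")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0.0), sample_b.get(t, 0.0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical 16S counts from the same gut at two time points (made-up numbers)
infant = {"Bifidobacterium": 700, "Escherichia": 250, "Bacteroides": 50}
adult = {
    "Bifidobacterium": 100,
    "Escherichia": 50,
    "Bacteroides": 600,
    "Faecalibacterium": 250,
}

dissimilarity = bray_curtis(relative_abundance(infant), relative_abundance(adult))
print(f"Bray-Curtis dissimilarity: {dissimilarity:.2f}")
```

A high dissimilarity here is fully explained by taxa replacing each other, which is exactly what succession means; detecting evolution would require looking at the genomes themselves.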

What are we studying with metagenomics?

This brings us to the next problem, which is not only a problem of semantics and me getting annoyed, but a problem with real implications. What are we really studying using metagenomics? When we apply an environmental sequencing approach to a microbial community, we get a snapshot of the genetic material at a given time and site, under specific conditions. Usually, we aim to characterize the community from a taxonomic or functional perspective, and we often have some other community that we want to compare it to. However, if we only collect data from different communities at one time point, or if we only study a community before and after exposure, we have no way of telling whether differences stem from selective pressures or from a more random succession process. As most microbial habitats are not as well studied as the human gut, we know little about microbial community assembly and succession.

Also, in ecology a disturbance to a particular community is generally considered the starting point of a new succession process. This process may, or may not, return the community to the same stable state. However, if the disturbance was of a permanent nature, the new community will have to adapt to the new conditions, and the stable state will likely not have the same species distribution. Such an adaptation could be caused by genetic changes (which would clearly be an evolutionary process), or by simple replacement of sensitive species with tolerant ones. The latter would be a selective process, but not necessarily an evolutionary one. If the selection does not alter the genetic material within the species, but only the species composition, I would argue that this is also a case of succession.

Complications with resistance

This complicates the work with metagenomic data. If we study antibiotic resistance genes, and say that bacteria in an environment have evolved antibiotic resistance, that assertion rests on the genes responsible for resistance having either evolved within the bacteria present or (more likely) been transferred into their genomes via horizontal gene transfer. However, if the resistance profile we see is simply caused by a replacement of sensitive species with resistant ones, we have not really discovered something new evolving, but are only witnessing the spread of already resistant bacteria. In the gut, this would be a problem in itself, but say that we do the same study in the open environment. We already know that environmental bacteria have contained resistance genes for ages, so the real threat to human health here would be a spread from naturally resistant bacteria to human pathogens. However, as mentioned earlier, without extremely well thought-through methodology we cannot really see such transmissions of resistance genes. Here, the search for mobile elements, and large-scale comparisons of community composition and resistance profiles in contaminated and non-polluted areas, can play a huge role in shedding light on the question of spread. However, this will require larger and better planned metagenomic experiments than those generally performed at the moment. The questions of microbial community assembly, dispersal, succession and adaptation are still largely unanswered, and our metagenomic and environmental sequencing approaches have only just started to tinker with the lid of the jar.

I am extremely happy to announce that Metaxa 1.1 (first announced back in July) has finally left the beta stage, and is now designated as a feature-complete 1.1 update. We consider this update stable for production use. The 1.1 update utilizes hmmsearch instead of hmmscan for higher extraction speeds and better accuracy. This clever trick was inspired by a blog post by HMMER’s creator Sean Eddy on hmmscan vs hmmsearch (http://selab.janelia.org/people/eddys/blog/?p=424). As the speedup comes from the extraction step, the speed increase will be largest for huge data sets with only a small proportion of actual SSU sequences (typically large 454 metagenomes).

What took so long, you might ask, as I promised an imminent release already in August. Well, during testing a difference in scoring was discovered. This difference did not have any implications for long sequences (> ~350 bp), but caused Metaxa to have problems on short reads (most evident on ~150 bp and shorter). Therefore, the scoring system had to be redesigned, which in turn required more extensive testing. Now, however, Metaxa 1.1 has a fine-tuned scoring system, which by default is based on scores instead of E-values, and in some instances has even better detection accuracy than the old Metaxa version. We encourage everyone to try out this new version of Metaxa (although the 1.0.2 version will remain available for download). It should be bug free, but we cannot guarantee 100% compatibility in all usage scenarios. Therefore, we would be happy if you reported any bugs or inconsistencies to the e-mail address: metaxa [at] microbiology [dot] se.

The new version of Metaxa can be downloaded here: http://microbiology.se/software/metaxa/ Please note that the manual has not yet been updated, so use the help feature for the up-to-date options. Happy SSU detecting!

I usually try not to be too personal on this site, but to stay more scientific in my tone and scope. However, today’s news that Steve Jobs has passed away inclines me to make a short comment. My dad brought home a Macintosh Plus in 1989. Since then, I have been a Mac user without any breaks. I made my first programming attempts around the age of nine in HyperCard. In short, without the Mac I probably wouldn’t have ended up where I am today. And without Steve, there would have been no Mac. Steve’s vision of easy-to-use computers helped me get on the programming bandwagon in the early nineties. And even though my computer use has changed a lot since, I still feel hugely indebted to Steve Jobs for bringing the computing revolution into my living room. Thanks for the ride, Steve.

The 11th annual meeting of PhD students and Postdoc researchers in Bioinformatics in Sweden will take place in Lund on 29-30 September. The workshop is an opportunity for young researchers to meet, exchange ideas, and keep up to date with the growing body of knowledge. I will go there, and you should be there too! Besides, it’s free for PhD students and Postdocs! All info can be found at the workshop website. The last meeting I attended (2010) really fueled some interesting discussions, and I am really looking forward to the event this year. Hope to meet you there, fellow Swedish bioinformaticians!

Metaxa FAQ

Finally, the Metaxa FAQ is ready! If you have any other questions, please mail them to metaxa [at] microbiology [dot] se, and I will include them in the FAQ at some later point. I would like to thank everyone who has contributed questions, suggestions, comments and other types of feedback so far. It really helps improve this software. The FAQ is found here.

You may also wonder what has happened to the stable version of the 1.1 Metaxa speedup I promised in July. It is still on the way, but due to a minor computer failure and other CPU-heavy tasks being of higher priority the software still has not been fully tested. As we want to release a truly stable and functional update, we need to hold back on the package for some more time. Be patient, or try out the beta that is already available.

Phil Goetz at JCVI recently posted his reflections from the Summit of Systems Biology. I was not there, but I read his summary with interest. Now, what strikes me as interesting is the notion that “there were no talks on metagenomics. This also struck me as odd; bacterial communities seem like a natural systems biology problem.” Having worked with microbial communities for a while, I am surprised that the modeling perspective that is so prevalent in macro-organism ecosystems ecology has not yet really come to fruition in microbial ecology. With the tremendous amounts of sequences that are pouring over us from microbial communities, and with the plethora of functional metagenomics annotation that is made, how come there has been so little research on the actual interactions between microorganisms within e.g. biofilms?

The problem is also connected to the lack of time-series data from community research. To be able to understand how a system behaves under changing conditions, we need to measure its reactions to various parameter changes over time. Instead of pooling metagenomes to reduce temporal “noise” we need to be better at identifying the changing parameters and then use the temporal differences to look for responses to the parameter changes. By applying a functional metagenomics perspective at each sample point, combining this with measured changes in community species structure (as measured e.g. by 16S or some other marker gene), and correlating this with changes in the parameters, we should be able to build a model of how the ecosystem responds to changing environments. With the large-scale sequencing technologies available today, and the possibilities given by metatranscriptomics, these ideas should be challenging but not impossible.
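
A minimal sketch of the simplest version of this idea: given a time series of one measured parameter and the relative abundance of one taxon (or gene function) at the same sample points, correlate the changes between consecutive time points. All values and variable names below are hypothetical, and the Pearson correlation is only an assumed stand-in for a real model, which would need many more parameters, replicates and a proper statistical treatment.

```python
from math import sqrt


def differences(series):
    """Change between consecutive time points."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements at six consecutive sample points (made-up values)
antibiotic_conc = [0.0, 0.5, 2.0, 1.5, 0.2, 0.1]        # arbitrary units
resistant_taxon = [0.02, 0.05, 0.30, 0.28, 0.10, 0.06]  # relative abundance

r = pearson(differences(antibiotic_conc), differences(resistant_taxon))
print(f"correlation of changes: {r:.2f}")
```

Working on the differences rather than the raw values is a deliberate choice here: it uses the temporal variation as signal instead of averaging it away, which is the point made above about not pooling metagenomes.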

I am not saying that none of these things have been done. But they have been done to a surprisingly small extent. I would highly appreciate reading a paper trying to build a mathematical model of how the ecosystem functions in bacterial communities shift in response to an environmental stressor. Because when someone builds such a model we suddenly have a tool to take microbial community research from an explorative perspective to an applied one. The applied perspective will be useful for actually protecting environments and ecosystem services, as well as for understanding how to manipulate microbial ecosystems to maximize the output beneficial to society. Also, an understanding of the ecosystem dynamics of microbial systems could be carried over to macro-ecosystems and provide a small-scale ecosystem laboratory for all ecosystem research. Such a shift towards applied microbial community systems biology will be more or less necessary to be able to argue for more resources and time being spent on e.g. metagenomics. And I believe that we will soon be there, because the step is shorter than we might imagine.

I’m working on an update to Metaxa that will at least double the speed of the program (and even more when run on really large data sets on many cores). While there is still no real release version of this update (version 1.1), I have today posted a public “beta”, which you can use for testing purposes. Do not use this version for anything important (e.g. research) as it contains at least one known bug (and maybe more that I haven’t discovered yet). I would appreciate it if those of you who are interested would download this version and e-mail any bugs or inconsistencies you find to me (firstname.lastname[at]microbiology.se).

Note that to install this version, you first need to download and install the current version of Metaxa (1.0.2). The new version can then be used with the old version’s databases.

Download the Metaxa 1.1 beta here