Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

Just a short note: Metaxa has been updated to version 1.0.1. This incremental version brings two small new features and a minor bug fix.

New features:

  • Added an option to select whether HMMER’s heuristic filtering should be used. This is configured using the --heuristics option:
    --heuristics {T or F} : Selects whether to use HMMER’s heuristic filtering, off (F) by default
  • Removed some redundant information from the screen output, which had become a bit cluttered.

Bug fix:

  • Fixed a rare bug affecting the detection sensitivity for some SSU sequences.

Of course, I recommend the update to every Metaxa user since it fixes a bug, but it is not in any way critical for normal use. The updated version can be downloaded using this link.

I am proud to announce that Metaxa has been officially released today. Metaxa is a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria and chloroplasts in metagenomes and environmental sequence datasets. We have been working on Metaxa for quite some time, and it has been in beta for about two months; it now seems stable enough for public consumption. In addition, the software package is being presented today in a talk at the SocBiN conference in Helsinki.

A more thorough post on the rationale behind Metaxa and how it works will follow once I am no longer occupied with the SocBiN conference. A paper on Metaxa will be published in the journal Antonie van Leeuwenhoek. The software can be downloaded from here.

For those of you who are not already fed up with my writings about biology stuff on this web site, two opportunities to hear me talk in real life have popped up in May. The first is as early as May 2nd, at the Open Day in Life Sciences, arranged by the Science Faculty at the University of Gothenburg. I will talk about the search for detoxification systems in metagenomic sequence data (from a collections point of view, as that is the theme of the day). There will also be an opportunity to be guided through the herbarium and the botanical garden, as well as lunch and an optional after-work drink at Botaniska Paviljongen. But hurry, the last day of registration is tomorrow! Register here.

The second opportunity will be at the SocBiN-2011 bioinformatics conference in Helsinki, on the 12th of May. I will present in the session called “Bioinformatics of Metagenomics” and talk about a software tool for rRNA classification. I really look forward to this bioinformatics conference; there are a number of highly prominent and interesting speakers, and I have heard that Helsinki in May is very beautiful. Besides, I am going there with extremely nice people, which could well make it the best biology event I attend this spring.

So, last week I started my Ph.D. in Joakim Larsson’s group at the Sahlgrenska Academy. While I am very happy about how things have turned out, I will also miss the ecotox group and the functional genomics group a lot (though both do their research within ten minutes’ walking distance of my new office…). I spent last week getting through the usual administrative hassle: getting keys and cards, signing papers, installing bioinformatics software on my new monster of a computer, etc. Slowly, the new room is starting to feel like mine (after I pinned phylogenetic trees, my favorite map of the amino acids, and my remember-why-Cytoscape-visualisation-might-not-be-a-good-idea-for-all-network-like-structures poster to the notice board).

So what will this change of position mean? Will I quit doing research on microbial communities? Of course not! In my new position, my subject of investigation will be bacterial communities exposed to antibiotics. We will look for resistance genes in such communities and try to answer questions like: How does a high antibiotic selection pressure affect the abundance of resistance genes and of mobile elements that could facilitate their transfer between bacteria? Can resistance genes found in environmental bacteria be transferred to the microbes of the human gut? Can environmental bacteria tell us which resistance genes will be present in clinical settings in the near future? All these questions could, at least partially, be answered by metagenomic approaches and good bioinformatics tools, and my role will be to come up with solutions that provide answers to them.

I am excited about this new project, which involves my favorite subject (metagenomics and community analysis) as well as other important aspects, such as the clinical connections, the possibility to add pieces to the antibiotic resistance puzzle, and the role of gene and species transfer in resistance development. I also like the fact that I will need to handle high-throughput sequence data, meaning that there will be many opportunities to develop tools, a task I greatly enjoy. I think the next couple of years will be an exciting time.

Browsing the Pfam web site today, I discovered that the database has finally launched its Wikipedia co-ordination efforts.

This coincides with release 25 of the Pfam database (released on the 1st of April), and basically means that Wikipedia articles will be linked to Pfam families. Gradually, this will (hopefully) improve the annotation of Pfam families, which has in many cases been rather poor. The Xfam blog post on Pfam release 25 says the change will happen gradually, which might actually be a good thing, given the quirks that might pop up.

(…) a major change is that Pfam annotation is now beginning to be co-ordinated via Wikipedia. Unlike Rfam, where every entry has a Wikipedia entry, we expect this to be a more gradual transition for Pfam, so not all entries currently have a corresponding Wikipedia article. For a more detailed discussion, check the help page.  We actively encourage the addition of new/updated annotations via Wikipedia as they will appear far quicker than waiting for a Pfam release.  If there are articles in Wikipedia that you think correspond to a family, then please mail us!

I have awaited this change for a long time, and am very happy that Pfam has finally taken this step. Congratulations and my sincerest thanks to the Pfam team! Now, let’s go editing!

I have reorganised my Software page a little, putting the smaller scripts on a separate page to make the main software page tidier. The content of the pages is the same, and you will still find bloutminer and metaorf on the main software page.

I have put some “new” software online. I have had this piece of code lying around for some time but never got around to uploading it, as I didn’t consider it “finished”. It is still not finished, but I would nevertheless like to share it with a wider audience. So, today I introduce bloutminer, the BLAST output mining script I have been using lately. bloutminer allows you to specify, e.g., an E-value cutoff, a length cutoff and a percent identity cutoff, and extracts a list of the hits satisfying these cutoffs. It takes tabular output (blastall option -m 8) as input. This is the software I used for the BLAST visualisation I have discussed earlier.
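bloutminer’s own code is not reproduced here; the following is only a minimal Python sketch of the same kind of cutoff filtering, assuming the standard 12-column, tab-separated layout of blastall -m 8 output (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score). The names BlastHit, parse_m8 and filter_hits are hypothetical and not part of bloutminer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class BlastHit:
    """One row of BLAST tabular output (blastall -m 8): 12 tab-separated columns."""
    query: str
    subject: str
    pident: float    # percent identity
    length: int      # alignment length
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bitscore: float


def parse_m8(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Parse tabular BLAST lines, skipping blank lines and '#' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield BlastHit(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[10]), float(f[11]),
        )


def filter_hits(hits: Iterable[BlastHit], max_evalue: float = 10.0,
                min_length: int = 0, min_identity: float = 0.0) -> List[BlastHit]:
    """Keep only hits that pass all three cutoffs (E-value, length, % identity)."""
    return [
        h for h in hits
        if h.evalue <= max_evalue and h.length >= min_length and h.pident >= min_identity
    ]


rows = [
    "read1\tgeneA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380",
    "read2\tgeneB\t60.00\t45\t18\t1\t3\t47\t10\t54\t0.5\t30.1",
]
kept = filter_hits(parse_m8(rows), max_evalue=1e-10, min_length=100, min_identity=90.0)
print([h.query for h in kept])  # ['read1']
```

The default E-value of 10 simply mirrors a permissive search cutoff, so calling filter_hits without arguments keeps everything; the stricter thresholds are applied afterwards.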

I normally use a permissive E-value cutoff of 10 for my BLAST searches and then extract hits with bloutminer, which allows me to change the cutoffs at a later stage without redoing the whole BLAST search. You can also “pool” sequences into groups based on their sequence tags. bloutminer is a work in progress and may contain nasty bugs. It can be found on the Software page. Please improve it at will.
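The post does not say how bloutminer derives sequence tags, so the sketch below assumes, hypothetically, that the tag is the part of each query ID before the first underscore (for example a sample name). It also illustrates the workflow described above: output produced once with an E-value cutoff of 10 can be re-pooled at a stricter cutoff without re-running BLAST. The functions tag_from_id and pool_by_tag are illustrative names, not bloutminer’s.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def tag_from_id(seq_id: str, sep: str = "_") -> str:
    """Hypothetical tag convention: the tag is the part of the ID before the first separator."""
    return seq_id.split(sep, 1)[0]


def pool_by_tag(m8_lines: Iterable[str], max_evalue: float = 10.0,
                tag: Callable[[str], str] = tag_from_id) -> Dict[str, List[str]]:
    """Group the subject IDs of hits passing the E-value cutoff by the tag of their query."""
    pools: Dict[str, List[str]] = defaultdict(list)
    for line in m8_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1, 2 and 11 of -m 8 output: query ID, subject ID, E-value.
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            pools[tag(query)].append(subject)
    return dict(pools)


rows = [
    "siteA_read1\tgeneX\t97.0\t120\t3\t0\t1\t120\t1\t120\t1e-40\t200",
    "siteA_read2\tgeneY\t88.0\t110\t10\t1\t1\t110\t1\t110\t2e-5\t90",
    "siteB_read1\tgeneX\t99.0\t130\t1\t0\t1\t130\t1\t130\t1e-60\t250",
]
print(pool_by_tag(rows))                    # {'siteA': ['geneX', 'geneY'], 'siteB': ['geneX']}
print(pool_by_tag(rows, max_evalue=1e-10))  # {'siteA': ['geneX'], 'siteB': ['geneX']}
```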

I will present my master’s thesis, “Metagenomic Analysis of Marine Periphyton Communities”, on Tuesday the 22nd of March at 13.00. The presentation will take place in the room Folke Andreasson at Medicinaregatan 11 in Gothenburg. The presentation is open to everyone, but the number of seats is limited.

In December, Alex Bateman, whose opinions on open science I support and have touched upon earlier, wrote a short correspondence letter to Nature [1] in which he repeated the points of his talk at FEBS last summer. He concludes with the following paragraph:

Many in the scientific community will admit to using Wikipedia occasionally, yet few have contributed content. For society’s sake, scientists must overcome their reluctance to embrace this resource.

I agree with this statement. However, as I have also touched upon earlier but would like to repeat: bold statements don’t make dreams come true; action does. Rfam’s collaboration with RNA Biology and Wikipedia is a great example of such action. So what other actions may be necessary to get researchers to contribute to the Wikipedian wisdom?

First of all, I do not think that the main obstacle to getting researchers to edit Wikipedia articles is a reluctance to do so because Wikipedia is “inconsistent with traditional academic scholarship”, though that might be part of the explanation. What I think is the major problem is the time-reward tradeoff. Given the focus on publishing peer-reviewed articles, the race for higher impact factors, and the general tendency to evaluate science by statistical measures, it should come as no surprise that Wikipedia editing is far down most scientists’ to-do lists, mine included. The reward for editing a Wikipedia article is a good feeling in your stomach that you have benefited society. Good stomach feelings will, however, feed my children just as little as freedom of speech does. Still, both Wikipedia editing and freedom of speech are extremely important, especially for a scientist.

Thus, there is a great need for a system that:

  • Provides a reward or acknowledgement for Wikipedia editing.
  • Makes Wikipedia editing economically sustainable.
  • Encourages publishing Wikipedia articles, or contributing to existing ones, as part of the scientific publishing process.

Such a system could include a “contribution factor”, similar to the impact factor, in which contributions to Wikipedia and other open-access forums were weighted, with or without a usefulness measure. Such a usefulness measure could be determined fairly easily, e.g. by the number of links from other Wikipedia articles. I realise that such a system would have severe drawbacks, similar to those of the impact factor system. I am not a huge fan of impact factors (read, e.g., Per Seglen’s 1997 BMJ article [2] for some reasons why), but I do not see that system changing any time soon, and thus some kind of contribution factor could provide an additional statistical measure for evaluators to consider when examining scientists’ work.
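To make the idea a little more concrete, here is a minimal Python sketch of how such a contribution factor might be computed. Everything in it is hypothetical: the forum weights, the Contribution type, and the choice to scale each contribution by one plus its number of inbound links (as the usefulness measure) are illustrative assumptions, not an existing metric.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights per forum; nothing here is an established metric.
FORUM_WEIGHTS: Dict[str, float] = {"wikipedia": 1.0, "other_open_forum": 0.5}


@dataclass
class Contribution:
    forum: str          # where the contribution was made
    inbound_links: int  # links from other articles pointing to it (usefulness proxy)


def contribution_factor(contribs: List[Contribution], use_usefulness: bool = True) -> float:
    """Sum forum-weighted contributions, optionally scaled by (1 + inbound links).

    Contributions to forums without a weight count as zero.
    """
    total = 0.0
    for c in contribs:
        weight = FORUM_WEIGHTS.get(c.forum, 0.0)
        usefulness = 1 + c.inbound_links if use_usefulness else 1
        total += weight * usefulness
    return total


contribs = [Contribution("wikipedia", 4), Contribution("other_open_forum", 0)]
print(contribution_factor(contribs))                        # 5.5
print(contribution_factor(contribs, use_usefulness=False))  # 1.5
```

The "with or without a usefulness measure" choice from the text maps onto the use_usefulness flag; any real scheme would need far more thought about weights and gaming than this sketch shows.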

While a contribution factor would be an incentive for researchers to contribute to the common knowledge, it would still not provide an economic reason to do so. This could easily be changed by allowing, and maybe even requiring, scientists to contribute to Wikipedia and other public fora of scientific information as part of their outreach duties. In fact, this public outreach duty (“tredje uppgiften”, literally “the third task”, in Swedish) is regulated by Swedish law. In 2009, Swedish universities were assigned the duty to “collaborate with society and inform about their operations, and act such that scientific results produced at the university benefit society” (my translation). It seems rational that Wikipedia editing would be part of that duty, as that is where many (most?) people find information online today. Consequently, it is simply up to the universities to demand 30 minutes of Wikipedia editing per week or month from their employees. Note that I am referring to paid editing here.

Another way of increasing the economic appeal of writing Wikipedia articles would be to encourage funding agencies and foundations to demand Wikipedia articles or similar as part of project reports. This would require researchers to make their findings public in order to get further funding, a move that would greatly increase the importance of adding to the common treasure of knowledge. However, I suspect that many funding agencies, as well as researchers, would be reluctant to accept such a solution.

Lastly, as shown by the Rfam/RNA Biology/Wikipedia relationship, scientific publishing itself could be tied to Wikipedia editing. This process could be started by, e.g., open-access journals such as PLoS ONE, either by demanding short Wikipedia notes for an article to be published, or simply by prioritising the publication of articles that also have an accompanying Wikipedia article. As mentioned previously, these short Wikipedia notes would also go through a peer-review process along with the full article. By tying this to the contribution factor, further incentives could be provided to get scientific progress into the hands of the general public.

Now, all these ideas put a huge burden on already hard-working scientists, and I realise that they cannot all be introduced simultaneously. Opening up publishing requires time and thought, and should be done in small steps. But doing so is in the interest of scientists, the general public and the funders, as well as politicians. In the long run, it will be hard to argue that society should pay for science when scientists are reluctant to even provide the public with an understandable version of the results. Instead of digging such a hole for ourselves, we should adapt the reward, evaluation, funding and publishing systems so that they benefit both researchers and the society we often say we serve.

  1. Bateman and Logan. Time to underpin Wikipedia wisdom. Nature (2010) vol. 468 (7325) pp. 765
  2. Seglen. Why the impact factor of journals should not be used for evaluating research. BMJ (1997) vol. 314 (7079) pp. 498-502

The Swedish Foundation for Strategic Research (SSF) has announced its grants to the research leaders of the future (link in Swedish), which aim to help and promote young researchers with a lot of potential and ambition to build their own research groups within their fields. Eighteen researchers received 10 million SEK each (roughly 1.5 million USD), as well as leadership training. However, SSF obviously believes that men are superior at building and leading research groups, as 14 of the researchers were men (that’s 78%).

It is often argued that the reason men get more and larger grants than women [1] is that they are more abundant in academia, and that the over-representation of men will resolve itself given sufficient time. This makes the SSF decisions particularly saddening. These 18 researchers represent the future of Swedish research, and SSF thinks that the research of the future is better off being led by… men. Alarmingly, the foundation’s statement on gender equality (in Swedish) says (my translation):

The foundation for strategic research views gender equality as something self-evident, that should permeate not only the operations of the foundation, but also all activities that the foundation supports. Thus, the foundation strives to ensure that all treatment is gender neutral, and that the under-represented gender is given priority when other merits are similar. In an equal nation, the research resources of both men and women should always be made use of, within all areas.

Still, only four of the 18 chosen researchers (22%) are women. You may think this is a one-time event, but no, no, no, it’s much worse than that. In 2002, six of the 23 chosen researchers were women (26%), in 2005 six of 18 (33%), and in 2008 six of 20 (30%). It seems that SSF takes equality to mean 70% men, 30% women. That’s pretty bad for a foundation that says it “views gender equality as something self-evident, that should permeate not only the operations of the foundation, but also all activities that the foundation supports.” Obviously, the words on equality are just words, and women still have a long way to go before they are treated equally by foundations supporting research.
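As a quick check of the figures above, this short Python sketch recomputes each round’s share of women from the quoted counts and also pools all four rounds. The label "latest" stands for the round discussed in this post, whose year is not given here. Pooled, 22 of 79 grantees (about 28%) were women, which is consistent with the rough 70/30 split.

```python
from typing import Dict, Tuple

# (women, total grantees) per round, as quoted in the text;
# "latest" is the round announced in this post (its year is not stated here).
CALLS: Dict[str, Tuple[int, int]] = {
    "2002": (6, 23),
    "2005": (6, 18),
    "2008": (6, 20),
    "latest": (4, 18),
}


def share_percent(women: int, total: int) -> float:
    """Percentage of grantees who are women."""
    return 100 * women / total


def pooled_share_percent(calls: Dict[str, Tuple[int, int]]) -> float:
    """Share of women across all rounds combined (not the mean of per-round shares)."""
    women = sum(w for w, _ in calls.values())
    total = sum(t for _, t in calls.values())
    return share_percent(women, total)


for call, (w, t) in CALLS.items():
    print(f"{call}: {w}/{t} women = {share_percent(w, t):.0f}%")
print(f"all rounds: {pooled_share_percent(CALLS):.0f}% women")
```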

In the long run, this inequality only cements the established norm, with men at the top of the research departments. Wennerås and Wold wrote in 2000 that “junior scientists’ frustration at the pace of their scientific productivity is normal at the beginning of their careers, when they do most of the benchwork by themselves. But female scientists tend to remain at this level their entire working lives” [2]. Maybe it would be a good idea for the directors of SSF to read this, think about what their actions actually mean for the future of strategic research, and contemplate why women are leaving academia to a much larger extent than men [3]. After all, research funders have a huge responsibility for the future of the scientific community.


  1. Wennerås and Wold. Nepotism and sexism in peer-review. Nature (1997) vol. 387 (6631) pp. 341-3
  2. Wennerås and Wold. A chair of one’s own. Nature (2000) vol. 408 (6813) pp. 647
  3. Handelsman et al. Careers in science. More women in science. Science (2005) vol. 309 (5738) pp. 1190-1