Microbiology, Metagenomics and Bioinformatics

Johan Bengtsson-Palme, University of Gothenburg

Browsing Posts tagged Illumina

As the 8th Next Generation Sequencing Congress in London draws to a close as I write this, I have a few reflections that might warrant sharing. The first thing that has been apparent this year, compared to the two previous times I have visited the event (in 2012 and 2013), is that there was very little talk about where Illumina sequencing is heading next. Instead, the discussion was about the applications of Illumina sequencing in the clinical setting; apparently it is now so mainstream that we only expect slow progress towards longer reads. Apart from that, Illumina is treated as a completed, mature technology. The spotlight is now pointing entirely towards long-read sequencing (PacBio, NanoPore) as the next big thing. However, the excitement around these technologies has also faded somewhat compared to 2013, when they were soon to arrive. Indeed, it seems like there’s not much to be excited about in the sequencing field at the moment, or at least Oxford Global (who are hosting the conference) has failed to get these technologies here.

What also strikes me is the sheer number of talks about RNAseq of cancer cells. The scope of this event has narrowed dramatically in the past three years, which makes me substantially less interested in returning next year. If there is not much to be excited about, and the focus is only on cancer sequencing – despite the human microbiota being a very hot topic at the moment – what reason do non-cancer researchers have to come to the event? The event will need a stark shift in direction if the organizers want it to remain a broad NGS event. Otherwise, they may just as well go all in and rename the event the Next Generation Sequencing of Cancer Congress. But I hope they choose to widen the scope again; conferences discussing technology as a foundation for a variety of applications are important meeting points and spawning grounds for novel ideas.

Late yesterday, Microbiome put online our most recent work, covering the resistomes to antibiotics, biocides and metals across a vast range of environments. In the paper (1), we perform the largest characterization of resistance genes, mobile genetic elements (MGEs) and bacterial taxonomic compositions to date, covering 864 different metagenomes from humans (350), animals (145) and external environments such as soil, water, sewage, and air (369 in total). All the investigated metagenomes were sequenced to at least 10 million reads each, using Illumina technology, making the results more comparable across environments than in previous studies (2-4).

We found that the environment types had clear differences both in terms of resistance profiles and bacterial community composition. Humans and animals hosted microbial communities with limited taxonomic diversity as well as low abundance and diversity of biocide/metal resistance genes and MGEs. In contrast, the abundance of antibiotic resistance genes (ARGs) was relatively high in humans and animals. External environments, on the other hand, showed high taxonomic diversity and high diversity of biocide/metal resistance genes and MGEs. Water, sediment and soil generally carried low relative abundance and few varieties of known ARGs, whereas wastewater and sludge were on par with the human gut. The environments with the largest relative abundance and diversity of ARGs, including genes encoding resistance to last-resort antibiotics, were those subjected to industrial antibiotic pollution and air samples from a Beijing smog event.

A paper investigating this vast amount of data is of course hard to describe in a blog post, so I strongly suggest that the interested reader head over to Microbiome’s page and read the full paper (1). However, here’s a very short summary of the findings:

  • The median relative abundance of ARGs across all environments was 0.035 copies per bacterial 16S rRNA
  • Antibiotic-polluted environments have (by far) the highest abundances of ARGs
  • Urban air samples carried high abundance and diversity of ARGs
  • Human microbiota has high abundance and diversity of known ARGs, but low taxonomic diversity compared to the external environment
  • The human and animal resistomes are dominated by tetracycline resistance genes
  • Over half of the ARGs were only detected in external environments, while 20.5 % were found in human, animal and at least one of the external environments
  • Biocide and metal resistance genes are more common in external environments than in the human microbiota
  • Human microbiota carries low abundance and richness of MGEs compared to most external environments
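To make the unit "copies per bacterial 16S rRNA" more concrete, here is a minimal sketch of one common way such a ratio can be computed from read counts. It is not taken from the paper: the function name, the length normalization and the assumed 16S gene length of roughly 1500 bp are all assumptions made for illustration, and the actual pipeline is described in the paper (1).

```python
def copies_per_16s(arg_reads, arg_gene_length, ssu_reads, ssu_gene_length=1500):
    """Length-normalized ARG abundance relative to bacterial 16S rRNA.

    All parameters are hypothetical inputs for illustration:
    arg_reads       -- reads matching a resistance gene
    arg_gene_length -- length of that gene in bp
    ssu_reads       -- reads matching bacterial 16S rRNA
    ssu_gene_length -- assumed 16S rRNA gene length in bp
    """
    if arg_gene_length <= 0 or ssu_gene_length <= 0:
        raise ValueError("gene lengths must be positive")
    if ssu_reads <= 0:
        raise ValueError("need at least one 16S rRNA read to normalize against")
    # Reads per base pair, so that long genes are not over-counted
    arg_coverage = arg_reads / arg_gene_length
    ssu_coverage = ssu_reads / ssu_gene_length
    return arg_coverage / ssu_coverage


# Example with made-up counts
ratio = copies_per_16s(arg_reads=35, arg_gene_length=1000, ssu_reads=1500)
```

Dividing by gene length before taking the ratio is the design choice that matters here: without it, a long resistance gene would appear more abundant than a short one simply because it attracts more reads.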

Importantly, less than 1.5 % of all detected ARGs were found exclusively in the human microbiome. At the same time, 57.5 % of the known ARGs were only detected in metagenomes from environmental samples, even though the majority of the investigated ARGs were initially encountered in pathogens. Still, our analysis suggests that most of these genes are relatively rare in the human microbiota. Environmental samples generally contained a wider distribution of resistance genes to a more diverse set of antibiotic classes. For example, the relative abundance of beta-lactam resistance genes was much larger in external environments than in human and animal microbiomes. This suggests that the external environment harbours many more varieties of resistance genes than the ones currently known from the clinic. Indeed, functional metagenomics has resulted in the discovery of many novel ARGs in external environments (e.g. 5). This all fits well with the overall much higher taxonomic diversity of environmental microbial communities. In terms of consequences associated with the potential transfer of ARGs to human pathogens, we argue that unknown resistance genes are of greater concern than those already known to circulate among human-associated bacteria (6).

This study describes the potential for many external environments, including those subjected to pharmaceutical pollution, air and wastewater/sludge, to serve as hotspots for resistance development and/or transmission of ARGs. In addition, our results indicate that these environments may play important roles in the mobilization of yet unknown ARGs and their further transmission to human pathogens. To provide guidance for risk-reducing actions, we suggest, based on this study, strict regulation of waste discharges from pharmaceutical industries and encourage more attention to the role of air in the transmission of antibiotic resistance (1).


  1. Pal C, Bengtsson-Palme J, Kristiansson E, Larsson DGJ: The structure and diversity of human, animal and environmental resistomes. Microbiome, 4, 54 (2016). doi: 10.1186/s40168-016-0199-5
  2. Durso LM, Miller DN, Wienhold BJ. Distribution and quantification of antibiotic resistant genes and bacteria across agricultural and non-agricultural metagenomes. PLoS One. 2012;7:e48325.
  3. Nesme J, Delmont TO, Monier J, Vogel TM. Large-scale metagenomic-based study of antibiotic resistance in the environment. Curr Biol. 2014;24:1096–100.
  4. Fitzpatrick D, Walsh F. Antibiotic resistance genes across a wide variety of metagenomes. FEMS Microbiol Ecol. 2016. doi:10.1093/femsec/fiv168.
  5. Allen HK, Moe LA, Rodbumrer J, Gaarder A, Handelsman J. Functional metagenomics reveals diverse β-lactamases in a remote Alaskan soil. ISME J. 2009;3:243–51.
  6. Bengtsson-Palme J, Larsson DGJ: Antibiotic resistance genes in the environment: prioritizing risks. Nature Reviews Microbiology, 13, 369 (2015). doi: 10.1038/nrmicro3399-c1

I have had the pleasure of being chosen as a speaker at the Swedish Bioinformatics Workshop, which takes place ten days from now. My talk is entitled “Turn up the signal – wipe out the noise: Gaining insights into bacterial community functions using metagenomic data”, and will largely deal with the same questions as my talk at EDAR3 in May this year. As then, the talk will highlight some particular pitfalls related to the interpretation of data and exemplify how flawed analysis practices can result in misleading conclusions regarding community function. It will use examples from our studies of environments subjected to pharmaceutical pollution in India, the effect of travel on the human resistome, and modern municipal wastewater treatment processes.

The talk will take place on Thursday, September 24, 2015 at 16:30. The full program for the conference can be found here. Also, if you want a sneak peek of the talk, you can drop by on Friday at 13.00 at Chemistry and Molecular Biology, where I will give a seminar on the same topic in the Monthly Bioinformatic Practical Meetings series.

I will be giving a talk at the Third International symposium on the environmental dimension of antibiotic resistance (EDAR2015) next month (five weeks from now). The talk is entitled “Turn up the signal – wipe out the noise: Gaining insights into antibiotic resistance of bacterial communities using metagenomic data”, and will deal with the handling of metagenomic data in antibiotic resistance gene research. The talk will highlight some particular pitfalls related to the interpretation of data, and exemplify how flawed analysis practices can result in misleading conclusions regarding antibiotic resistance risks. I will particularly address how taxonomic composition influences the frequencies of resistance genes, the importance of knowing the functions of the genes in the databases used, and how normalization strategies influence the results. Furthermore, I will show how the context of resistance genes can allow inference of their potential to spread to human pathogens from environmental or commensal bacteria. All these aspects will be exemplified by data from our studies of environments subjected to pharmaceutical pollution in India, the effect of travel on the human resistome, and modern municipal wastewater treatment processes.

The talk will take place on Monday, May 18, 2015 at 13:20. The full scientific program for the conference can be found here. Registration for the conference is still possible, although not at the early-bird price. I look forward to seeing a lot of the people who will attend the conference, and hopefully also you!

The first work in which I have employed metagenomics to investigate antibiotic resistance has been accepted in Frontiers in Microbiology, and is (at the time of writing) available as a provisional PDF. In the paper (1), which is co-authored by Fredrik Boulund, Jerker Fick, Erik Kristiansson and Joakim Larsson, we have used shotgun metagenomic sequencing of an Indian lake polluted by dumping of waste from pharmaceutical production. We used this data to describe the diversity of antibiotic resistance genes and their genetic context, to try to predict their genetic transferability. We found resistance genes against essentially every major class of antibiotics, as well as large abundances of genes responsible for mobilization of genetic material. Resistance genes were estimated to be 7000 times more abundant in the polluted lake than in a Swedish lake included for comparison, where only eight resistance genes were found. Such abundances of resistance genes have previously only been matched by river sediment subject to pollution from pharmaceutical production (2). In addition, we describe twenty-six known and twenty-one putative novel plasmids from the Indian lake metagenome, indicating that there is a large potential for horizontal gene transfer through conjugation. Based on the wide range and high abundance of known resistance factors detected, we believe that it is plausible that novel resistance genes are also present in the lake. We conclude that environments polluted with waste from antibiotic manufacturing could be important reservoirs for mobile antibiotic resistance genes. This work further supports previous findings that pharmaceutical production settings could provide sufficient selection pressure from antibiotics (3) to drive the development of multi-resistant bacteria (4,5), resistance which may ultimately end up in pathogenic species (6,7). The paper can be read in its entirety here.


  1. Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, Volume 5, Issue 648 (2014). doi: 10.3389/fmicb.2014.00648
  2. Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C, Weijdegård B, Söderström H, Larsson DGJ: Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements. PLoS ONE, Volume 6, e17038 (2011). doi:10.1371/journal.pone.0017038.
  3. Larsson DGJ, de Pedro C, Paxeus N: Effluent from drug manufactures contains extremely high levels of pharmaceuticals. J Hazard Mater, Volume 148, 751–755 (2007). doi:10.1016/j.jhazmat.2007.07.008
  4. Marathe NP, Regina VR, Walujkar SA, Charan SS, Moore ERB, Larsson DGJ, Shouche YS: A Treatment Plant Receiving Waste Water from Multiple Bulk Drug Manufacturers Is a Reservoir for Highly Multi-Drug Resistant Integron-Bearing Bacteria. PLoS ONE, Volume 8, e77310 (2013). doi:10.1371/journal.pone.0077310
  5. Johnning A, Moore ERB, Svensson-Stadler L, Shouche YS, Larsson DGJ, Kristiansson E: Acquired genetic mechanisms of a multiresistant bacterium isolated from a treatment plant receiving wastewater from antibiotic production. Appl Environ Microbiol, Volume 79, 7256–7263 (2013). doi:10.1128/AEM.02141-13
  6. Pruden A, Larsson DGJ, Amézquita A, Collignon P, Brandt KK, Graham DW, Lazorchak JM, Suzuki S, Silley P, Snape JR., et al.: Management options for reducing the release of antibiotics and antibiotic resistance genes to the environment. Environ Health Perspect, Volume 121, 878–885 (2013). doi:10.1289/ehp.1206446
  7. Finley RL, Collignon P, Larsson DGJ, McEwen SA, Li X-Z, Gaze WH, Reid-Smith R, Timinouni M, Graham DW, Topp E: The scourge of antibiotic resistance: the important role of the environment. Clin Infect Dis, Volume 57, 704–710 (2013). doi:10.1093/cid/cit355

It’s been a while since the PETKit got any attention from me. Partially, that has been due to a nasty bug that could cause Pefcon to produce no output for one of the read files when using FASTA input files, but mostly it has simply been due to lack of time to continue development on the package. Now, I have finally tied all the threads together (bug fixes, new features, documentation) and today version 1.1 is released! The new features are:

  • A new tool – peacat – has been added that can be used to, e.g., stitch together contigs that have been separated for one reason or another in an assembly
  • Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
  • The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
  • The documentation has been vastly improved (but is still rather lacking)
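For readers unfamiliar with what the FASTQ offset means in practice, here is a minimal sketch of how a quality line is decoded. It is written in Python for illustration only and is not PETKit code (the PETKit is written in Perl); the function name `phred_scores` is hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into a list of PHRED scores.

    offset=33 corresponds to the Sanger / Illumina 1.8+ encoding;
    older Illumina files used an offset of 64.
    """
    scores = [ord(char) - offset for char in quality_line]
    if any(score < 0 for score in scores):
        # A negative score means the characters cannot come from this encoding
        raise ValueError("negative quality score; is the offset too high?")
    return scores


# 'I' is ASCII 73, '5' is ASCII 53 and '!' is ASCII 33
print(phred_scores("I5!"))  # prints [40, 20, 0]
```

Decoding a file with the wrong offset shifts every score by 31, which is why the choice of default matters so much for any quality filter built on top of it.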

Those attending the Metagenomics lab (part of the basic NGS course for PhD students given at GU this week) can find the material for the lab on this page:
Of course, the page is open for anyone else as well, although you won’t get the support that the GU students are given.

Some good and some bad news regarding the PETKit. Good news first: I have written a fourth tool for the PETKit, which is included in the latest release (version 1.0.2b, download here). The new tool is called Pesort, and it sorts input read pairs (or single reads) so that the read pairs occur in the same order in both files. It also identifies the reads that don’t have a mate and outputs them to a separate file. All this is useful if you for some reason have ended up with a scrambled read file (pair). This can, e.g., happen if you want to further process the reads after running Khmer or investigate the reads remaining after mapping to a genome.
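The idea behind this kind of pair sorting can be sketched in a few lines. The following is an illustration in Python, not Pesort's actual implementation (Pesort is written in Perl, and its output order and ID handling may differ). It assumes that reads are held in memory as (id, sequence) tuples and that mates are named with a trailing /1 or /2; the helper names are hypothetical.

```python
def pair_key(read_id):
    """Return the part of a read ID shared by both mates.

    Assumes the older Illumina naming scheme where mates end in /1 and /2.
    """
    name = read_id.split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def sort_pairs(reads1, reads2):
    """Put two scrambled read lists in the same order and split off orphans."""
    index1 = {pair_key(read[0]): read for read in reads1}
    index2 = {pair_key(read[0]): read for read in reads2}
    shared = sorted(index1.keys() & index2.keys())
    paired1 = [index1[key] for key in shared]
    paired2 = [index2[key] for key in shared]
    # Reads whose mate is missing from the other file
    orphans = [index1[key] for key in sorted(index1.keys() - index2.keys())]
    orphans += [index2[key] for key in sorted(index2.keys() - index1.keys())]
    return paired1, paired2, orphans


forward = [("b/1", "AC"), ("a/1", "GG"), ("c/1", "TT")]
reverse = [("a/2", "CC"), ("b/2", "GT")]
paired1, paired2, orphans = sort_pairs(forward, reverse)
```

Here `paired1` and `paired2` end up in the same order (a, then b), and the unpaired read c/1 lands in `orphans`. Holding everything in dictionaries is fine for a sketch, but for large read files a disk-backed index would likely be needed.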

Then the bad news. There’s a critical bug in PETKit version 1.0.1b. This bug manifests itself when using custom offsets for quality scores (using the –offset option), and makes the Pearf and Pepp tools too strict, causing them to discard reads that actually are of good quality. This does not affect the Pefcon program. If you use the PETKit for read filtering or ORF prediction, and have used custom offset values, I recommend that you re-run your data with the newly released PETKit version (1.0.2b), in which this bug has been fixed. If you have only used the default offset setting, you’re safe. I sincerely apologize for any inconvenience that this might have caused.

You know the feeling when your assembler supports paired-end sequences, but your FASTQ quality filter doesn’t care about which reads belong together? Meaning that you end up with a mess of sequences that you have to script together in some way. Gosh, that feeling is way too common. It is for situations like that I have put together the Paired-End ToolKit (PETKit), a collection of FASTQ/FASTA sequence handling programs written in Perl. Currently the toolkit contains three command-line tools that do sequence conversion, quality filtering, and ORF prediction, all adapted specifically for paired-end sequences. You can read more about the programs, which are released as open source software, on the PETKit page. At the moment they lack proper documentation, but running the software with the “–help” option should bring up a useful set of options for each tool. This is still considered beta software, so any bug reports, and especially suggestions, are welcome.
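To illustrate what pair-aware quality filtering means, here is a minimal sketch in Python. It is not how the PETKit tools are implemented; the mean-quality criterion, the threshold of 20, the (id, sequence, quality) tuple layout and the function names are all assumptions chosen for the example.

```python
def mean_quality(quality_line, offset=33):
    """Mean PHRED score of a FASTQ quality string (0.0 for an empty string)."""
    if not quality_line:
        return 0.0
    return sum(ord(char) - offset for char in quality_line) / len(quality_line)


def filter_pairs(pairs, min_mean_quality=20, offset=33):
    """Filter read pairs without breaking the pairing.

    pairs is a list of (read1, read2) tuples, where each read is an
    (id, sequence, quality) tuple. Pairs where both mates pass are kept
    together; a passing read whose mate failed is reported as a single.
    """
    kept_pairs, singles = [], []
    for read1, read2 in pairs:
        ok1 = mean_quality(read1[2], offset) >= min_mean_quality
        ok2 = mean_quality(read2[2], offset) >= min_mean_quality
        if ok1 and ok2:
            kept_pairs.append((read1, read2))
        elif ok1:
            singles.append(read1)
        elif ok2:
            singles.append(read2)
    return kept_pairs, singles


pairs = [
    (("a/1", "ACGT", "IIII"), ("a/2", "ACGT", "IIII")),  # both mates good
    (("b/1", "ACGT", "IIII"), ("b/2", "ACGT", "!!!!")),  # second mate bad
    (("c/1", "ACGT", "!!!!"), ("c/2", "ACGT", "!!!!")),  # both mates bad
]
kept_pairs, singles = filter_pairs(pairs)
```

The point of the design is that the decision is made per pair rather than per read, so the two output files can never drift out of sync.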

Also, if you have an idea of another problem that is unsolved or badly executed for paired-end sequences, let me know, and I will see if I can implement it in PETKit.

For the last few months I have been (part time) struggling with getting Metaxa to eat Illumina paired-end data. This is a pretty tricky task, mainly due to the fact that Illumina reads are so much shorter than those obtained by Sanger and 454 sequencing. Therefore, I am more than happy to inform the community that today (the day before I go on vacation) I have a working prototype up and running. In fact, calling it a prototype is unfair; it is quite a mature piece of software by now. Currently, I am running it on test data sets, and I will try to keep it running over the next couple of weeks. Thereafter, I hope to be able to release it sometime this autumn (but don’t expect a September release!), harnessing the power of Illumina sequencing for SSU identification. Stay tuned, and have a great summer!