PhD position with Erik Kristiansson (and me)
Erik Kristiansson, who was co-supervisor for my PhD thesis, has an opening for a PhD student funded by the DDLS program. The project is combining bioinformatics and artificial intelligence with a focus on large-scale data analysis to better understand antibiotic resistance and the emergence of novel resistance genes. The research will be centered on DNA sequence analysis, inference in biological networks, and modelling of evolution. The primary applications will be related to antibiotic resistance and bacterial genomics.
I am particularly excited about this position because I will have the benefit of co-supervising the student. The student will also be part of the DDLS research school, which is now being launched and is super-exciting for Swedish data-driven life science.
The candidate is expected to have a degree in bioinformatics, mathematical statistics, mathematics, computer science, physics, molecular biology, or any equivalent topic. Previous experience in analysis of large-scale biological data is desirable. It is important to have good computing and programming skills (e.g. in Python and R), experience with the Linux/UNIX computer environment, and, to the extent possible, previous experience in working with machine learning and/or artificial intelligence.
I had such a good time with Erik as my co-supervisor, and he has put together a truly amazing supervision team with Joakim Larsson, Anna Johnning and myself. I could not imagine a better place to apply bioinformatics and ML/AI on antibiotic resistance! Deadline is June 7! Application link here: https://www.chalmers.se/om-chalmers/arbeta-hos-oss/lediga-tjanster/?rmpage=job&rmjob=12840&rmlang=SE
Published paper: Improving mosquito barcoding
I have had the fortune to be involved in a study on the quality of reference material for mosquito barcoding for biodiversity studies. The study, which was led by Maurício Moraes Zenker at the Universidade Federal de São Carlos in Brazil, looked at the availability of public data for mosquitoes in online databases for two widely used DNA barcoding markers in Culicidae: the COI and ITS2 regions (1). Last week, this study was published in Scientific Reports.
The paper shows that around 30% of known species were covered for the COI gene in BOLD and GenBank, and 12% of species for ITS2 in GenBank. The Afrotropical, Australian and Oriental biogeographic regions had the lowest coverages, while the Nearctic, Palearctic and Oceanian regions had the highest. Countries with a higher diversity of mosquitoes tended to have lower coverage, which was surprisingly also the case for countries with higher numbers of medically important species. At the same time, countries with a higher number of endemic species tended to have a higher species coverage in the databases.
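To make the notion of species coverage concrete, here is a minimal Python sketch. It assumes coverage simply means the fraction of known species that have at least one reference sequence in a database. The species lists and the function name are hypothetical toy examples, not data or code from the paper.

```python
def species_coverage(known_species, barcoded_species):
    """Fraction of known species with at least one reference barcode.

    Assumption: a species counts as covered if its name appears at least
    once among the database records. Records that do not match a known
    species name (e.g. "Culex sp.") are ignored.
    """
    known = set(known_species)
    if not known:
        raise ValueError("known_species must not be empty")
    return len(known & set(barcoded_species)) / len(known)


# Hypothetical toy data: five known species, four database records.
known = {
    "Aedes aegypti",
    "Aedes albopictus",
    "Culex pipiens",
    "Anopheles gambiae",
    "Anopheles darlingi",
}
coi_records = ["Aedes aegypti", "Aedes aegypti", "Culex pipiens", "Culex sp."]

coverage = species_coverage(known, coi_records)
print(f"COI species coverage: {coverage:.0%}")
```

In this toy case two of the five known species have a record, so several records for the same species do not increase the coverage.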
With this study, we would like to advocate for better curatorship of voucher specimens representing sequences in the databases. Also, an integrative taxonomic approach that combines various genetic markers with morphological analyses is important to allow a better use of DNA barcoding and metabarcoding in a diverse array of applications, including vector species detection and biodiversity monitoring.
Importantly, this work underscores how reliant DNA barcoding is on proper taxonomic foundations, including morphological characterisations. Molecular identification of species cannot happen in a vacuum! I would like to extend a big thanks to Maurício, who invited me to take part in this study and who has done an excellent job putting it all together!
Reference
- Moraes Zenker M, Pineda Portella T, Costa Pessoa FA, Bengtsson-Palme J, Galetti PM: Low coverage of species constrains the use of DNA barcoding to assess mosquito biodiversity. Scientific Reports, 14, 7432 (2024). doi: 10.1038/s41598-024-58071-1 [Paper link]
Published paper: The latent resistome
What is the latent resistome? This is a term we coin in a new paper published yesterday in Microbiome. In the paper, we distinguish between the small number of antibiotic resistance genes (ARGs) that are established, well-characterized, and available in existing resistance gene databases (what we refer to as “established ARGs”) and all the rest. The established ARGs are typically encountered in clinical pathogens and are often already causing problems in human and animal infections. The remaining latently present ARGs, which we denote “latent ARGs”, are little studied or not studied at all, and are therefore much harder to detect (1). These latent ARGs are typically unknown and generally overlooked in most studies of resistance. They are also seldom accounted for in risk assessments of antibiotic resistance (2-4). This means that our view of the resistome and its diversity is incomplete, which hampers our ability to assess the risk of promotion and spread of yet undiscovered resistance determinants.
In our new study, we try to alleviate this issue by analyzing more than 10,000 metagenomic samples. We show that the latent ARGs are more abundant and diverse than established ARGs in all studied environments, including the human- and animal-associated microbiomes. The total pan-resistomes, i.e., all ARGs present in an environment (including the latent ARGs), are heavily dominated by these latent ARGs. In contrast, the core resistome (the ARGs that are commonly encountered) comprises both latent and established ARGs.
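The difference between a pan-resistome and a core resistome can be sketched in a few lines of Python. This is only an illustration: the sample names, the `latent_*` gene labels, and the `core_fraction` cutoff are assumptions made for the example, and the threshold used in the actual paper may well be different.

```python
from collections import Counter


def pan_and_core(samples, core_fraction=0.75):
    """Return (pan_resistome, core_resistome) for one environment.

    samples: dict mapping sample name -> set of ARG identifiers detected.
    core_fraction: assumed cutoff; a gene is called "core" here if it is
    found in at least this fraction of the samples.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    # Count in how many samples each gene occurs.
    counts = Counter(gene for genes in samples.values() for gene in set(genes))
    pan = set(counts)
    threshold = core_fraction * len(samples)
    core = {gene for gene, n in counts.items() if n >= threshold}
    return pan, core


# Hypothetical toy environment with one established and four latent ARGs.
toy_samples = {
    "s1": {"tetM", "latent_1", "latent_2"},
    "s2": {"tetM", "latent_1"},
    "s3": {"latent_1", "latent_3"},
    "s4": {"tetM", "latent_1", "latent_4"},
}

pan, core = pan_and_core(toy_samples)
print("pan-resistome:", sorted(pan))
print("core resistome:", sorted(core))
```

With these toy samples the pan-resistome is mostly made up of rarely seen latent genes, while the core contains one established and one latent gene, which mirrors the pattern described above.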
In the study, we identified several latent ARGs that were shared between environments or that are already present in human pathogens. These are often located on mobile genetic elements that can be transferred between bacteria. Finally, we also show that wastewater microbiomes have surprisingly large pan- and core-resistomes, which makes this environment a potent high-risk environment for mobilization and promotion of latent ARGs, which may make it into pathogens in the future.
It is also interesting to note that this new study echoes the results of my own study from 2018, showing that soil and water environments contain a high diversity of latent ARGs (or ARGs not found in pathogens as I put it in the 2018 study), despite being almost devoid of established ARGs (5).
This project has been a collaboration with Erik Kristiansson’s research group, and particularly with Juan Inda-Díaz. It has been great fun to work with them and I hope that we will keep this collaboration going into the future! The study can be read in its entirety here.
References
- Inda-Díaz JS, Lund D, Parras-Moltó M, Johnning A, Bengtsson-Palme J, Kristiansson E: Latent antibiotic resistance genes are abundant, diverse, and mobile in human, animal, and environmental microbiomes. Microbiome, 11, 44 (2023). doi: 10.1186/s40168-023-01479-0 [Paper link]
- Martinez JL, Coque TM, Baquero F: What is a resistance gene? Ranking risk in resistomes. Nature Reviews Microbiology, 13, 116–123 (2015). doi: 10.1038/nrmicro3399
- Bengtsson-Palme J, Larsson DGJ: Antibiotic resistance genes in the environment: prioritizing risks. Nature Reviews Microbiology, 13, 369 (2015). doi: 10.1038/nrmicro3399-c1
- Bengtsson-Palme J: Assessment and management of risks associated with antibiotic resistance in the environment. In: Roig B, Weiss K, Thoreau V (Eds.) Management of Emerging Public Health Issues and Risks: Multidisciplinary Approaches to the Changing Environment, 243–263. Elsevier, UK (2019). doi: 10.1016/B978-0-12-813290-6.00010-X
- Bengtsson-Palme J: The diversity of uncharacterized antibiotic resistance genes can be predicted from known gene variants – but not always. Microbiome, 6, 125 (2018). doi: 10.1186/s40168-018-0508-2
Welcome Vi and Marcus
I am very happy to share with you that our two doctoral students funded by the Wallenberg DDLS initiative have now started. One of them – Marcus Wenne – is already a well-known figure in the lab, as he has been with us as a master's student and then as a bioinformatician for more than a year. The other student – Vi Varga – is a completely new face in the lab and just started yesterday.
Marcus will work in a project on global environmental AMR. He will also continue his work on large-scale metagenomics to understand community dynamics and antibiotic resistance selection in microbial communities subjected to antibiotic selection. Marcus will work very closely with EMBARK and continue the important work we have done in that project over the next four years.
Vi will study responses of microbial communities to change, with a particular focus on comparative genomics and transcriptional approaches. We will link this to community stability, pathogenesis and resistance to antibiotics, so this project involves a little bit of everything in terms of the lab’s research interests. Vi’s background is in comparative genomics and pathogenesis, so this seems to be the perfect mix to be able to carry out this project successfully!
Very welcome to the lab Marcus and Vi! We look forward to working with you for the next four years or so!
Published paper: Mumame
I am happy to share the news that the paper describing our software tool Mumame is now out in its final form! (1) The paper got published today in the journal Metabarcoding and Metagenomics after being available as a preprint (2) since last autumn. This version has not changed a whole lot since the preprint, but it is more polished and better argued (thanks to a great review process). The software is virtually the same, but is now also available via Conda.
In the paper, we describe the Mumame software, which can be used to distinguish between wildtype and mutated sequences in shotgun metagenomic sequencing data and quantify their relative abundances. We further demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets (3-6), and find that the tool is useful but that sequencing depth is a key factor to detect rare mutations. Therefore, much larger numbers of sequences may be required for reliable detection of mutations than is needed for most other applications of shotgun metagenomics. Since the preprint was published, Mumame has also found use in our recently published paper on selection for antibiotic resistance in a Croatian macrolide production wastewater treatment plant, unfortunately with inconclusive results (7). Mumame is freely available here.
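To illustrate what quantifying the relative abundance of mutated versus wildtype sequences means in practice, here is a small Python sketch. It is not the Mumame code or its statistical model. It only assumes that, for one gene in each sample, we already have counts of reads matching the mutated and the wildtype variants. The sample names and counts are made up.

```python
def mutation_proportion(mutated_reads, wildtype_reads):
    """Proportion of reads carrying the mutation among all informative reads.

    Returns None when no reads cover the position, since the proportion is
    undefined in that case (an assumption made for this sketch).
    """
    if mutated_reads < 0 or wildtype_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = mutated_reads + wildtype_reads
    if total == 0:
        return None
    return mutated_reads / total


# Hypothetical read counts for one gene: (mutated, wildtype) per sample.
toy_counts = {
    "control": (2, 198),
    "exposed": (30, 170),
    "shallow": (0, 0),
}

proportions = {
    sample: mutation_proportion(mut, wt)
    for sample, (mut, wt) in toy_counts.items()
}
for sample, value in proportions.items():
    print(sample, "no coverage" if value is None else f"{value:.1%}")
```

The "shallow" sample shows why sequencing depth matters: with no reads covering the mutated position, nothing can be said about the mutation frequency at all.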
I again want to stress the fantastic work that Shruthi Magesh did last year as a summer student at WID in the evaluation of this tool. As I have pointed out earlier, I did write the code for the software (with a lot of input from Viktor Jonsson), but Shruthi did the software testing and evaluations. Thanks and congratulations Shruthi, and good luck in pursuing your PhD program!
References
- Magesh S, Jonsson V, Bengtsson-Palme J: Mumame: A software tool for quantifying gene-specific point-mutations in shotgun metagenomic data. Metabarcoding and Metagenomics, 3: 59–67 (2019). doi: 10.3897/mbmg.3.36236
- Magesh S, Jonsson V, Bengtsson-Palme J: Quantifying point-mutations in metagenomic data. bioRxiv, 438572 (2018). doi: 10.1101/438572
- Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014). doi: 10.3389/fmicb.2014.00648
- Lundström S, Östman M, Bengtsson-Palme J, Rutgersson C, Thoudal M, Sircar T, Blanck H, Eriksson KM, Tysklind M, Flach C-F, Larsson DGJ: Minimal selective concentrations of tetracycline in complex aquatic bacterial biofilms. Science of the Total Environment, 553, 587–595 (2016). doi: 10.1016/j.scitotenv.2016.02.103
- Pal C, Bengtsson-Palme J, Kristiansson E, Larsson DGJ: The structure and diversity of human, animal and environmental resistomes. Microbiome, 4, 54 (2016). doi: 10.1186/s40168-016-0199-5
- Kraupner N, Ebmeyer S, Bengtsson-Palme J, Fick J, Kristiansson E, Flach C-F, Larsson DGJ: Selective concentration for ciprofloxacin in Escherichia coli grown in complex aquatic bacterial biofilms. Environment International, 116, 255–268 (2018). doi: 10.1016/j.envint.2018.04.029
- Bengtsson-Palme J, Milakovic M, Švecová H, Ganjto M, Jonsson V, Grabic R, Udiković Kolić N: Pharmaceutical wastewater treatment plant enriches resistance genes and alter the structure of microbial communities. Water Research, 162, 437-445 (2019). doi: 10.1016/j.watres.2019.06.073
Published paper: NGS and antibiotic resistance
AMR Control just released (some of) the articles of their 2019-20 issue, and among the papers hot off the press is one that I have co-authored with Etienne Ruppé, Yannick Charretier and Jacques Schrenzel on how next-generation sequencing can be used to address antibiotic resistance problems (1).
The paper contains a brief overview of next-generation sequencing platforms and tools, the resources that can be used to detect and quantify resistance from sequencing data, and descriptions of applications in clinical genomics, clinical/human metagenomics as well as in environmental settings (the latter being the part where I contributed the most). Compared to much of the writing on antibiotic resistance and sequencing applications, I think this paper is pretty easily accessible to a general audience.
I first met Etienne at the JRC workshop on how next-generation sequencing could be implemented in the EU’s Coordinated Action Plan against Antimicrobial Resistance (2,3), and it seems quite fitting that we have now ended up writing a paper on such implementations together.
References
- Ruppé E, Bengtsson-Palme J, Charretier Y, Schrenzel J: How next-generation sequencing can address the antimicrobial resistance challenge. AMR Control, 2019-20, 60-65 (2019). [Paper link]
- Angers A, Petrillo P, Patak A, Querci M, Van den Eede G: The Role and Implementation of Next-Generation Sequencing Technologies in the Coordinated Action Plan against Antimicrobial Resistance. JRC Conference and Workshop Report, EUR 28619 (2017). doi: 10.2760/745099 [Link]
- Angers-Loustau A, Petrillo M, Bengtsson-Palme J, Berendonk T, Blais B, Chan KG, Coque TM, Hammer P, Heß S, Kagkli DM, Krumbiegel C, Lanza VF, Madec J-Y, Naas T, O’Grady J, Paracchini V, Rossen JWA, Ruppé E, Vamathevan J, Venturi V, Van den Eede G: The challenges of designing a benchmark strategy for bioinformatics pipelines in the identification of antimicrobial resistance determinants using next generation sequencing technologies. F1000Research, 7, 459 (2018). doi: 10.12688/f1000research.14509.2 [Paper link]
Published paper: benchmarking resistance gene identification
Since F1000Research uses a somewhat different publication scheme than most journals, I still haven’t understood if this paper is formally published after peer review, but I am starting to assume it is. There have been very few changes since the last version, so I will be lazy and basically repost what I wrote in April when the first version (the “preprint”) was posted online. The paper (1) is the result of a workshop arranged by the JRC in Italy in 2017. It describes various challenges arising from the process of designing a benchmark strategy for bioinformatics pipelines in the identification of antimicrobial resistance genes in next generation sequencing data.
The paper discusses issues about the benchmarking datasets used, testing samples, evaluation criteria for the performance of different tools, and how the benchmarking dataset should be created and distributed. Specifically, we address the following questions:
- How should a benchmark strategy handle the current and expanding universe of NGS platforms?
- What should be the quality profile (in terms of read length, error rate, etc.) of in silico reference materials?
- Should different sets of reference materials be produced for each platform? In that case, how to ensure no bias is introduced in the process?
- Should in silico reference material be composed of the output of real experiments, or simulated read sets? If a combination is used, what is the optimal ratio?
- How is it possible to ensure that the simulated output has been simulated “correctly”?
- For real experiment datasets, how to avoid the presence of sensitive information?
- Regarding the quality metrics in the benchmark datasets (e.g. error rate, read quality), should these values be fixed for all datasets, or fall within specific ranges? How wide can/should these ranges be?
- How should the benchmark manage the different mechanisms by which bacteria acquire resistance?
- What is the set of resistance genes/mechanisms that need to be included in the benchmark? How should this set be agreed upon?
- Should datasets representing different sample types (e.g. isolated clones, environmental samples) be included in the same benchmark?
- Is a correct representation of different bacterial species (host genomes) important?
- How can the “true” value of the samples, against which the pipelines will be evaluated, be guaranteed?
- What is needed to demonstrate that the original sample has been correctly characterised, in case real experiments are used?
- How should the target performance thresholds (e.g. specificity, sensitivity, accuracy) for the benchmark suite be set?
- What is the impact of these performance thresholds on the required size of the sample set?
- How can the benchmark stay relevant when new resistance mechanisms are regularly characterized?
- How is the continued quality of the benchmark dataset ensured?
- Who should generate the benchmark resource?
- How can the benchmark resource be efficiently shared?
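As a concrete illustration of the performance thresholds mentioned in the list above, the following Python sketch computes sensitivity, specificity and accuracy from the outcome of a hypothetical benchmark run. The counts are invented, and the paper itself does not prescribe this code or any particular threshold values.

```python
def benchmark_metrics(tp, fp, tn, fn):
    """Standard classification metrics for a resistance gene detection pipeline.

    tp: resistance determinants present in the sample and reported
    fp: determinants reported but not actually present
    tn: determinants absent and correctly not reported
    fn: determinants present but missed by the pipeline
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical outcome of running one pipeline on a benchmark dataset.
metrics = benchmark_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

All three numbers depend on knowing the "true" content of the benchmark samples, which is exactly why several of the questions above concern how that truth can be guaranteed.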
Of course, we have not answered all these questions, but I think we have come down to a decent description of the problems, which we see as an important foundation for solving these issues and implementing the benchmarking standard. Some of these issues were tackled in our review paper from last year on using metagenomics to study resistance genes in microbial communities (2). The paper also somewhat connects to the database curation paper we published in 2016 (3), although this time the strategies deal with the testing datasets rather than the actual databases. The paper is the first outcome of the workshop arranged by the JRC on “Next-generation sequencing technologies and antimicrobial resistance” held October 4-5 2017 in Ispra, Italy. You can find the paper here (it’s open access).
On another note, the new paper describing the UNITE database (4) has now got a formal issue assigned to it, as has the paper on tandem repeat barcoding in fungi published in Molecular Ecology Resources last year (5).
References and notes
- Angers-Loustau A, Petrillo M, Bengtsson-Palme J, Berendonk T, Blais B, Chan KG, Coque TM, Hammer P, Heß S, Kagkli DM, Krumbiegel C, Lanza VF, Madec J-Y, Naas T, O’Grady J, Paracchini V, Rossen JWA, Ruppé E, Vamathevan J, Venturi V, Van den Eede G: The challenges of designing a benchmark strategy for bioinformatics pipelines in the identification of antimicrobial resistance determinants using next generation sequencing technologies. F1000Research, 7, 459 (2018). doi: 10.12688/f1000research.14509.1
- Bengtsson-Palme J, Larsson DGJ, Kristiansson E: Using metagenomics to investigate human and environmental resistomes. Journal of Antimicrobial Chemotherapy, 72, 2690–2703 (2017). doi: 10.1093/jac/dkx199
- Bengtsson-Palme J, Boulund F, Edström R, Feizi A, Johnning A, Jonsson VA, Karlsson FH, Pal C, Pereira MB, Rehammar A, Sánchez J, Sanli K, Thorell K: Strategies to improve usability and preserve accuracy in biological sequence databases. Proteomics, 16, 18, 2454–2460 (2016). doi: 10.1002/pmic.201600034
- Nilsson RH, Larsson K-H, Taylor AFS, Bengtsson-Palme J, Jeppesen TS, Schigel D, Kennedy P, Picard K, Glöckner FO, Tedersoo L, Saar I, Kõljalg U, Abarenkov K: The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Research, 47, D1, D259–D264 (2019). doi: 10.1093/nar/gky1022
- Wurzbacher C, Larsson E, Bengtsson-Palme J, Van den Wyngaert S, Svantesson S, Kristiansson E, Kagami M, Nilsson RH: Introducing ribosomal tandem repeat barcoding for fungi. Molecular Ecology Resources, 19, 1, 118–127 (2019). doi: 10.1111/1755-0998.12944
Mumame – Quantifying mutations in metagenomes
Let me get straight to something somewhat besides the point here: summer students can achieve amazing things! One such student I had the pleasure to work with this summer is Shruthi Magesh, and a preprint based on work she did with me at the Wisconsin Institute for Discovery just got published on bioRxiv (1). The preprint describes a software tool called Mumame, which uses database information on mutations in DNA or protein sequences to search metagenomic datasets and quantify the relative proportion of resistance mutations over wild type sequences.
In the preprint (1), we first of all show that Mumame works on amplicon data where we already knew the true outcome (2). Second, we show that we can detect differences in mutation frequencies in controlled experiments (2,3). Lastly, we use the tool to gain some further information about resistance patterns in sediments from polluted environments in India (4,5). Together these analyses show that one of the most central aspects for Mumame to be able to find mutations is having a very high number of sequenced reads in all libraries (preferably more than 50 million per library), because these mutations are generally rare – even in polluted environments and microcosms exposed to antibiotics. We expect Mumame to be a useful addition to metagenomic studies of e.g. antibiotic resistance, and to increase the detail by which metagenomes can be screened for phenotypically important differences.
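The dependence on sequencing depth can be made tangible with a back-of-the-envelope calculation. The Python sketch below assumes a deliberately simple model that is not taken from the preprint: every read independently has the same small probability of covering the mutated position and carrying the mutation, and seeing a single such read counts as detection. The value of `read_fraction` is a made-up number chosen only for illustration.

```python
import math


def detection_probability(n_reads, read_fraction):
    """Probability that at least one read carries the mutation.

    Simplifying assumption: reads are independent, and each one carries the
    mutation with probability read_fraction, so the chance of seeing none
    is (1 - read_fraction) ** n_reads.
    """
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    if not 0.0 <= read_fraction < 1.0:
        raise ValueError("read_fraction must be in [0, 1)")
    # log1p keeps the computation numerically stable for tiny fractions.
    return 1.0 - math.exp(n_reads * math.log1p(-read_fraction))


# Hypothetical case: one in ten million reads carries the mutation.
read_fraction = 1e-7
for n_reads in (1_000_000, 5_000_000, 50_000_000):
    p = detection_probability(n_reads, read_fraction)
    print(f"{n_reads:>11,} reads: P(detect) = {p:.3f}")
```

Under these assumptions a library of a few million reads will often miss such a rare mutation entirely, while a library of 50 million reads is very likely to contain at least one read with it. Reliable quantification would of course need many more than one read.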
While I did write the code for the software (with a lot of input from Viktor Jonsson, who also is a coauthor on the preprint, on the statistical analysis), Shruthi did the software testing and evaluations, and the paper would not have been possible had she not wanted a bioinformatic summer project related to metagenomics, aside from her laboratory work. The resulting preprint is available from bioRxiv and the Mumame software is freely available from this site.
References
- Magesh S, Jonsson V, Bengtsson-Palme J: Quantifying point-mutations in metagenomic data. bioRxiv, 438572 (2018). doi: 10.1101/438572 [Link]
- Kraupner N, Ebmeyer S, Bengtsson-Palme J, Fick J, Kristiansson E, Flach C-F, Larsson DGJ: Selective concentration for ciprofloxacin in Escherichia coli grown in complex aquatic bacterial biofilms. Environment International, 116, 255–268 (2018). doi: 10.1016/j.envint.2018.04.029 [Paper link]
- Lundström S, Östman M, Bengtsson-Palme J, Rutgersson C, Thoudal M, Sircar T, Blanck H, Eriksson KM, Tysklind M, Flach C-F, Larsson DGJ: Minimal selective concentrations of tetracycline in complex aquatic bacterial biofilms. Science of the Total Environment, 553, 587–595 (2016). doi: 10.1016/j.scitotenv.2016.02.103 [Paper link]
- Bengtsson-Palme J, Boulund F, Fick J, Kristiansson E, Larsson DGJ: Shotgun metagenomics reveals a wide array of antibiotic resistance genes and mobile elements in a polluted lake in India. Frontiers in Microbiology, 5, 648 (2014). doi: 10.3389/fmicb.2014.00648 [Paper link]
- Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C, Weijdegård B, Söderström H, Larsson DGJ: Pyrosequencing of antibiotic-contaminated river sediments reveals high levels of resistance and gene transfer elements. PLoS ONE, 6, e17038 (2011). doi: 10.1371/journal.pone.0017038
Published paper: Breast milk and the infant gut resistome
This week, a paper by my former roommate Katariina Pärnänen was published by Nature Communications. In the paper (1), we use shotgun metagenomics to show that infants carry more resistant bacteria in their gut than adults do, irrespective of whether they themselves have been treated with antibiotics or not. We also found that the antibiotic resistance gene and mobile genetic element profiles of infant feces are more similar to those of their own mothers than to those of unrelated mothers. This is suggestive of a pathway of transmission of resistance genes from the mothers, and importantly we find that the mobile genetic elements in breastmilk are shared with those of the infant feces, despite vast differences in their microbiota composition. Finally, we find that termination of breastfeeding and intrapartum antibiotic prophylaxis of mothers are associated with higher abundances of specific ARGs in the infant gut. Our results suggest that infants inherit the legacy of past antibiotic consumption of their mothers via transmission of genes, but that the taxonomic composition of the microbiota still strongly dictates the overall load of resistance genes.
I am not going to dwell on the details of the study here, but I instead encourage you to read the paper (hey, it’s open access!) or the excellent popular summary that Katariina has already written. Finally, I want to emphasize the great work Katariina has put into this (I would know, since I shared a room with her) and congratulate her on her own little infant!
Reference
- Pärnänen K, Karkman A, Hultman J, Lyra C, Bengtsson-Palme J, Larsson DGJ, Rautava S, Isolauri E, Salminen S, Kumar H, Satokari R, Virta M: Maternal gut and breast milk microbiota affect infant gut antibiotic resistome and mobile genetic elements. Nature Communications, 9, 3891 (2018). doi: 10.1038/s41467-018-06393-w [Paper link]
Published paper: Ribosomal tandem repeat barcoding for fungi
On Friday, Molecular Ecology Resources put online Christian Wurzbacher’s latest paper, of which I am also a coauthor. The paper presents three sets of general primers that allow for amplification of the complete ribosomal operon from the ribosomal tandem repeats, covering all the ribosomal markers (ETS, SSU, ITS1, 5.8S, ITS2, LSU, and IGS) (1). This paper is important because it introduces a technique to utilize third generation sequencing (PacBio and Nanopore) to generate high‐quality reference data (equivalent to or better than Sanger sequencing) in a high‐throughput manner. The paper shows that the quality of the Nanopore generated sequences was 99.85%, which is comparable with the 99.78% accuracy described for Sanger sequencing.
My main contribution to this paper is the consensus sequence generation script – Consension – which is available from my software page. Importantly, there are huge gaps in the reference databases we use for taxonomic classification and this method will facilitate the integration of reference data from all of the ribosomal markers. We hope that this work will stimulate large-scale generation of ribosomal reference data covering several marker genes, linking previously spread-out information together.
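For readers curious about how a consensus can lift the accuracy of noisy long reads, here is a minimal Python sketch of plain majority-vote consensus calling. This is not how Consension is implemented. It is a simplified stand-in that assumes the reads have already been aligned to the same length with `-` as the gap character, and the reads and reference below are invented.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Call the most common character in each alignment column.

    Assumes all reads are pre-aligned and equally long. Columns where the
    gap character wins the vote are dropped from the consensus.
    """
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


def identity(seq_a, seq_b):
    """Fraction of matching positions, relative to the longer sequence."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / max(len(seq_a), len(seq_b))


# Hypothetical reference and five noisy aligned reads with scattered errors.
reference = "ACGTACGTAC"
reads = [
    "ACGTACGTAC",
    "ACGAACGTAC",  # substitution at position 4
    "ACGTACGTTC",  # substitution at position 9
    "ACGTAC-TAC",  # deletion at position 7
    "ACGTACGTAC",
]

consensus = majority_consensus(reads)
print(consensus, f"identity to reference: {identity(consensus, reference):.2%}")
```

Because the errors in the toy reads fall in different columns, each one is outvoted and the consensus matches the reference, even though three of the five reads contain a mistake.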
Reference
- Wurzbacher C, Larsson E, Bengtsson-Palme J, Van den Wyngaert S, Svantesson S, Kristiansson E, Kagami M, Nilsson RH: Introducing ribosomal tandem repeat barcoding for fungi. Molecular Ecology Resources, Accepted article (2018). doi: 10.1111/1755-0998.12944 [Paper link]