ITSx updated to version 1.0.9

One of the other ITSx developers and I had a discussion a while ago about whether the --anchor option should also output the “anchor sequences” around the ITS regions in the full-length output file (given that the --truncate option is activated). Today I have changed ITSx to behave this way, updating it to version 1.0.9. The update also slightly improves sensitivity when using the --anchor HMM option, and can be downloaded here. Happy barcoding!

Swedish Bioinformatics Workshop 2014

I am part of the organizing committee for the Swedish Bioinformatics Workshop (#SBW2014) that will be held October 23-24 this year in Gothenburg. I would like to invite you all, especially master/PhD students and PostDocs in Sweden, to come and share the event with us!

SBW is an annual event that has been organized by the different universities in Sweden. This year it will take place at the Wallenberg Conference Centre in Gothenburg and is arranged jointly by the University of Gothenburg and Chalmers University of Technology. SBW2014 will, as tradition dictates, be a meeting point for PhD students and postdocs working with any kind of bioinformatics within Sweden and is therefore free of charge for these groups. We are proud to announce a program including both invited speakers – such as Mick Watson from the Roslin Institute, Dawn Field from the University of Oxford, and Joakim Lundeberg from KTH – along with participant presentations and poster sessions. This year, the program will also contain a number of workshop sessions where hands-on problems will be used as starting points for discussions on new bioinformatics approaches to these problems. This will provide opportunities for attendees with different methodological backgrounds to interact and work together to find synergies between fields and come up with creative solutions.

More information about the event including registration and abstract submission can be found at www.sbw2014.se.

I, and the rest of the organizers, look forward to meeting you in Gothenburg in October!

Webpage: http://www.sbw2014.se

Facebook: https://www.facebook.com/events/1450513325188910/

Google+: https://plus.google.com/events/cuhlpovcc275stut854dk5ussnk

If you want, you can spread the word, for example using this flyer!

Scientific Data – a way of getting credit for data

In an interesting development, Nature Publishing Group has launched a new initiative: Scientific Data – an online-only open access journal that publishes data sets without requiring that scientific hypotheses be tested in connection with the data. That is, the data itself is seen as the valuable product, not any findings that might result from it. There is an immediate upside to this: large scientific data sets might become accessible to the research community in a way that enables proper credit for the sample collection effort. Since there is no demand for a full analysis of the data, the data itself might be of use to others more quickly, without the authors having to worry that someone else might steal the thunder of the data per se. I also see a possible downside, though. It would be easy to hold on to the data until you have analyzed it yourself, and then release it separately just about when you submit the paper on the analysis, generating extra papers and citation counts. I don’t know if this is necessarily bad, but it seems it could contribute to “publishing unit dilution”. Nevertheless, I believe that this is overall a good initiative, although how well it actually works will be up to us – the scientific community. Some info copied from the journal website:

Scientific Data’s main article-type is the Data Descriptor: peer-reviewed, scientific publications that provide an in-depth look at research datasets. Data Descriptors are a combination of traditional scientific publication content and structured information curated in-house, and are designed to maximize reuse and enable searching, linking and data mining. (…) Scientific Data aims to address the increasing need to make research data more available, citable, discoverable, interpretable, reusable and reproducible. We understand that wider data-sharing requires credit mechanisms that reward scientists for releasing their data, and peer evaluation mechanisms that account for data quality and ensure alignment with community standards.

New ITSx update – added feature plus bug fix

ITSx has been updated today, bringing it to version 1.0.8. This update adds the “--only_full” option, which restricts output in the ITS1, 5.8S and ITS2 files to only the sequences that contain the full region, i.e. those for which both surrounding domains have been detected. The update also fixes a bug with the --anchor option, and can be downloaded here. Happy barcoding!

Published paper: Distributed annotation of plant pathogenic fungi

Another paper I have co-authored related to the UNITE database for fungal rDNA ITS sequences is now published as an Online Early article in Fungal Diversity. The paper describes an effort to improve the annotation of ITS sequences from fungal plant pathogens. Why is this important? Well, apart from fungal plant pathogens being responsible for great economic losses in agriculture, the paper is also conceptually important as it shows that together we can accomplish a substantial improvement to the metadata in sequence databases. In this work we have hunted down high-quality reference sequences for various plant pathogenic fungi, and re-annotated incorrectly or insufficiently annotated ITS sequences from the same fungal lineages. In total, the 59 authors have made 31,954 changes to UNITE database data, on average 540 changes per author. While one, or a few, persons could not feasibly have made this effort alone, this work shows that in larger consortia vast improvements can be made to the quality of databases, by distributing the work among many scientists. In many ways, this relates to proposals to “wikify” GenBank, and after Rfam and Pfam it might now be time to take the user-contribution model to, at least, the RefSeq portion of GenBank, which despite its description as being “comprehensive, integrated, non-redundant, [and] well-annotated” still contains errors and examples of non-usable annotation. More on that at a later point…

Paper reference:

Nilsson RH, Hyde KD, Pawlowska J, Ryberg M, Tedersoo L, Aas AB, Alias SA, Alves A, Anderson CL, Antonelli A, Arnold AE, Bahnmann B, Bahram M, Bengtsson-Palme J, Berlin A, Branco S, Chomnunti P, Dissanayake A, Drenkhan R, Friberg H, Frøslev TG, Halwachs B, Hartmann M, Henricot B, Jayawardena R, Jumpponen A, Kauserud H, Koskela S, Kulik T, Liimatainen K, Lindahl B, Lindner D, Liu J-K, Maharachchikumbura S, Manamgoda D, Martinsson S, Neves MA, Niskanen T, Nylinder S, Pereira OL, Pinho DB, Porter TM, Queloz V, Riit T, Sanchez-García M, de Sousa F, Stefaczyk E, Tadych M, Takamatsu S, Tian Q, Udayanga D, Unterseher M, Wang Z, Wikee S, Yan J, Larsson E, Larsson K-H, Kõljalg U, Abarenkov K: Improving ITS sequence data for identification of plant pathogenic fungi. Fungal Diversity Online early (2014). doi: 10.1007/s13225-014-0291-8 [Paper link]

EU report on effect-based tools for ecotoxicology

Because of my previous involvement in a Swedish report on toxicological monitoring using (meta)genomics tools [1], I also became involved in a related EU report on effect-based tools for use in toxicology in the aquatic environment. This report has recently been officially published [2], and can be found here, with the annex available on the European Commission document website. My contribution to this report has been in the genomics and metagenomics section (Chapter 7: OMICS techniques), in which I wrote the metagenomics part and contributed to the rest. I personally think this is quite a forward-thinking report, which is nice for a large institution such as the EU.

  1. Länsstyrelsen i Västra Götalands län. (2012). Swedish monitoring of hazardous substances in the aquatic environment (No. 2012:23). (A.-S. Wernersson, Ed.) Current vs required monitoring and potential developments (pp. 1–291). Länsstyrelsen i Västra Götalands län, vattenvårdsenheten.
  2. Wernersson A-S, Carere M, Maggi C, Tusil P, Soldan P, James A, Sanchez W, Broeg K, Kammann U, Reifferscheid G, Buchinger S, Maas H, Van Der Grinten E, Ausili A, Manfra L, Marziali L, Polesello S, Lacchetti I, Mancini L, Lilja K, Linderoth M, Lundeberg T, Fjällborg B, Porsbring T, Larsson DGJ, Bengtsson-Palme J, Förlin L, Kase R, Kienle C, Kunz P, Vermeirssen E, Werner I, Robinson CD, Lyons B, Katsiadaki I, Whalley C, den Haan K, Messiaen M, Clayton H, Lettieri T, Negrão Carvalho R, Gawlik BM, Dulio V, Hollert H, Di Paolo C, Brack W (2014). Technical Report on Aquatic Effect-Based Monitoring Tools. European Commission. Technical Report 2014-077, Office for Official Publications of European Communities, ISBN: 978-92-79-35787-9. doi:10.2779/7260

ITSx updated to version 1.0.7 – Minor bugfix

Last week, I was informed by an ITSx user that the software behaved strangely when input files containing extremely long sequence identifiers were used. The bug is not likely to have affected a majority of users, but in any case it is now fixed, and ITSx can now handle sequence identifiers of any length. The new update brings ITSx to version 1.0.7, and it can be downloaded here. Happy barcoding!

Metagenomics workshop at SciLifeLab

Science for Life Laboratory (SciLifeLab) in Stockholm will host a metagenome data analysis workshop on May 21-23, in which I will participate as a tutorial assistant. Additionally, our group leader Joakim Larsson will be giving a lecture about how we use metagenomics to assess the environmental reservoir of antibiotic resistance genes (much of my recent work will likely go into that). I hope to meet you there, so don’t forget to register!

Confirmed speakers:
Lex Nederbragt, Oslo University, Norway
Saskia Smits, Erasmus University Rotterdam, Netherlands
Joakim Larsson, Göteborg University, Sweden
Paul Wilmes, University of Luxembourg, Luxembourg
Anders Andersson, SciLifeLab, Sweden
Noan Le Bescot, UPMC (Tara expedition), France

The workshop is part of the AllBio Bioinformatics initiative.

PhD position: Come and work with us!

If you are thinking about doing a PhD and think that bioinformatics and antibiotic resistance are cool subjects, then now is your chance to come and join us for the next four years! There is a PhD position open in Joakim Larsson’s group, which means that if you get the job you will work with me, Joakim Larsson, Erik Kristiansson, Ørjan Samuelsen and Carl-Fredrik Flach on a super-interesting project relating to discovery of novel beta-lactamase genes (NoCURE). The project aims to better understand where, how and under what circumstances these genetic transfer events take place, in order to provide opportunities to limit or delay resistance development and thus increase the functional lifespan of precious antibiotics. The lion’s share of the work will be related to interpreting large-scale sequencing data generated by collaborators within the project, both genome sequencing and metagenomic data.

This is a great opportunity to prove your bioinformatics skills and use them for something urgently important. Full details about the position can be found here.

PETKit updated to version 1.1

It’s been a while since the PETKit got any attention from me. Partly, that has been due to a nasty bug that could result in no output for one of the read files in Pefcon when using FASTA input files, but mostly it has simply been due to lack of time to continue development on the package. Now, I have finally pulled all the threads together (bug fixes, new features, documentation) and today version 1.1 is released! The new features are:

  • A new tool has been added – peacat – that can be used to e.g. stitch contigs together that have been separated for one reason or another in an assembly
  • Another tool – pemap – has been added that can be used to determine whether an assembled contig is from a circular DNA element
  • The default offset value for FASTQ files has been set to 33 (as in Sanger and Illumina 1.8+ PHRED format)
  • The documentation has been vastly improved (but is still rather inferior)
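
For readers unsure what the FASTQ offset means in practice, here is a minimal Python sketch of how a quality string is decoded with an offset of 33. This is a generic illustration of the PHRED+33 encoding, not code from PETKit; the function names are made up for the example.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string into integer PHRED scores.

    With offset 33 (Sanger / Illumina 1.8+), the character '!' (ASCII 33)
    encodes quality 0 and 'I' (ASCII 73) encodes quality 40.
    """
    scores = [ord(ch) - offset for ch in quality_line]
    if any(q < 0 for q in scores):
        # A negative score means the file probably uses another offset
        raise ValueError("quality character below the chosen offset")
    return scores


def error_probability(q):
    """Probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


print(phred_scores("II5+!"))   # [40, 40, 20, 10, 0]
print(error_probability(20))   # a score of 20 is a 1 in 100 error rate
```

Files written by older Illumina pipelines use an offset of 64 instead, in which case decoding with 33 gives scores that are inflated by 31.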