In this lab, we will perform a complete (although somewhat simplified) analysis of a metagenomic data set in less than two hours. We will learn about metagenomic read assembly, annotation of metagenomic contigs, and some aspects of taxonomic and functional analysis of metagenomic data. We will also briefly touch upon how this kind of data can be visualized.

The aim of this lab is to work with real next-generation sequencing data, of the kind that is generated at the Genomics facility today. In other words, we will be working with paired-end Illumina sequences. An important aspect of this is that such data sets are too large to be analyzed within two hours. In fact, they would often require weeks of computational time, not to mention human working hours. To solve this issue, we will be working with subsets of the samples that fit the short time frame we have. This reduces our ability to draw firm conclusions from the data, but still conveys how metagenomic data analysis works in a realistic way (apart from the time it takes, of course). With this in mind, let’s get started with the exercise!

Lab Instructions

Command Cheat Sheet

Command Cheat Sheet in Text-format

You are provided with the following:

  • Three metagenomic data sets of quality-filtered, paired-end DNA reads
  • A database of hidden Markov models representing helix-turn-helix (HTH) domains, extracted from Pfam

To be able to complete this in time, I would like you to team up with two other people, so that each of you takes responsibility for one of the data sets. The data sets can be downloaded from the following addresses:

The results from the lab can be downloaded here:
All files needed for visualization part
Visualization PDF with questions