This FAQ answers some of the questions we have received about the Metaxa2 software package and its predecessor, Metaxa. Please look through this document before contacting me with a question. If the FAQ does not answer your question, you are very welcome to e-mail it to metaxa [at] microbiology [dot] se.

How should I cite Metaxa2?

Since there are now a number of Metaxa-related publications, deciding which one to cite, and when, has become more complicated. This is how I would prefer Metaxa2 to be cited in various contexts:

How complete is the rRNA database of Metaxa2?

The Metaxa2 database is mainly based on SILVA release 111. We are aware that this release is getting outdated and plan to update the classification database in a forthcoming release of Metaxa2.

Where do the chloroplast rRNA sequences in the Metaxa2 database come from?

The chloroplast sequences in the Metaxa2 database come from SILVA release 111.

Where do the mitochondrial rRNA sequences in the Metaxa2 database come from?

The mitochondrial sequences in the Metaxa2 database come from SILVA release 111 and the MitoZoa database (version 2).

I have paired-end sequence data, but in this case only provide Metaxa2 with the forward reads using the following command:
metaxa2 -i read1.fastq -o test
However, it seems Metaxa2 is still treating it as paired-end data. What is going on here?

Probably, your pair file (xxxxx_2.fastq) is present in the same directory as the xxxxx_1.fastq file. In that case, the Metaxa2 format auto-detection will guess the name of the pair file, and if that file exists, Metaxa2 will switch into paired-end mode and use both files. This behavior is intended to save users some typing, but it can be confusing in some situations. Adding “-f fastq” to the command solves this issue, since it explicitly tells Metaxa2 which format is used.

Can I concatenate my two paired-end read files into one file, and process it as if it all came from non-paired data?

Yes, but bear in mind that this kind of concatenation introduces a bias towards the detected reads, so results on community composition will be complicated to interpret. Generally, I would advise against doing this. Furthermore, depending on how your reads were named, there may be duplicate read IDs when the files are concatenated this way. Check this first (that is, whether the read ID includes a designation for _1 or _2). Finally, use the “-f fastq” option to make sure that Metaxa2 doesn’t treat the data as paired-end.
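A quick way to check for clashing read IDs before concatenating is sketched below. This helper is not part of Metaxa2. The function name duplicate_read_ids and the file names in the usage comment are made up for illustration, and the sketch assumes strict four-line FASTQ records and treats the first whitespace-delimited token of each header line as the read ID.

```shell
# Print every read ID that occurs more than once across the given FASTQ files.
# Assumption: each record is exactly four lines (header, sequence, '+', quality),
# so header lines are lines 1, 5, 9, ... of each file.
duplicate_read_ids() {
    awk 'FNR % 4 == 1 {
             id = substr($1, 2)            # drop the leading "@"
             if (++seen[id] == 2) print id # report each duplicated ID once
         }' "$@"
}

# Hypothetical usage:
#   duplicate_read_ids reads_1.fastq reads_2.fastq
```

If nothing is printed, no read ID occurs twice and the concatenated file will not contain duplicate IDs.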

Does the different orientation of the paired reads matter?

No, as Metaxa2 detects rRNA on the reverse complements as well (by default).

Is there any script or tool to merge the output of different samples into a single table?

Yes, as of version 2.1, Metaxa2 includes metaxa2_dc, which does this neatly. metaxa2_dc is part of the Metaxa2 Diversity Tools.

When extracting the Metaxa2 archive I get a lot of error messages of this type:
tar: Ignoring unknown extended header keyword `LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword `SCHILY.dev'
tar: Ignoring unknown extended header keyword `SCHILY.ino'
tar: Ignoring unknown extended header keyword `SCHILY.nlink'

Is there a problem with the Metaxa2 archive?

No. Some versions of Metaxa2 have been packaged on Mac OS X using a distribution of tar that creates these strange “extended headers”. You can safely ignore these messages if you see them.

I analyze data from microbial communities that live on/in host organism X. Is there a way to remove all rRNA sequences from organism X and just keep all the other ones?

Yes, you can use the --reference option to do this. Provide a FASTA file with the rRNA sequences of organism X to the --reference option and those will be sorted out separately.

Which version of MAFFT should I use for Metaxa2?

We recommend using any 7.X version of MAFFT.

Does Metaxa2 work on Windows?

The short answer is no. I do not know of anyone who has gotten Metaxa2 to work under Cygwin, so that is not a good option. However, Metaxa2 works very well under virtualization software such as VirtualBox running a virtual Linux environment. Another good option is to find a colleague with a Mac and kindly ask him or her to run your sequences for you.

There are no people running either Mac or Linux in my department [or: I am afraid of the Linux people/Mac fanboys], what do I do?

I would recommend downloading VirtualBox (http://www.virtualbox.org/) and Bio-Linux (http://envgen.nox.ac.uk/tools/bio-linux). Once you have VirtualBox running with Bio-Linux, it should be relatively straightforward to install HMMER3 and Metaxa2 on the virtual machine by following the Metaxa2 manual.

Metaxa2 runs with my specified input file but the output file says that no sequences are detected. What is going on?

Probably you have specified a file that does not exist. Currently, Metaxa2 does not warn the user that the input file is non-existent (which of course it will in a later version). It may also be that you are executing Metaxa2 in a different directory than you intended. Check that the input file you want to use is located in the current directory by typing “ls” on the command line.
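Since Metaxa2 itself does not currently complain about a missing input file, a small pre-flight check can catch the problem before the run starts. The sketch below is not part of Metaxa2. The function name check_input and the file name reads.fasta are hypothetical, and only the -i and -o options mentioned in this FAQ are used.

```shell
# Return 0 if the file named in $1 exists and is non-empty, otherwise print a
# message to standard error and return 1.
check_input() {
    if [ ! -f "$1" ]; then
        echo "Input file not found: $1 (current directory: $(pwd))" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "Input file is empty: $1" >&2
        return 1
    fi
    return 0
}

# Hypothetical usage: only start Metaxa2 if the input file is really there.
#   check_input reads.fasta && metaxa2 -i reads.fasta -o test
```

The message includes the current directory, which helps spot the case where Metaxa2 is being started from the wrong place.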

Does Metaxa2 support BLAST+?

Yes. Use the “--plus T” option to specify use of BLAST+ instead of the “old-school” BLAST.

The Metaxa2 installer script (install_metaxa2) does not work. What can I do?

Under some operating system environments (primarily some Linux distributions and Mac OS X versions prior to 10.4), the installer script that comes with Metaxa2 might not work. Usually this happens because of a lack of permissions, but it can also be caused by differing login file structures on various Linux distributions. If Metaxa2 refuses to run after installation, the first thing to try is to log out of the system completely and then log in again. Sometimes this solves the issue.

If this does not help, the installer script has actually failed, and Metaxa2 must be installed manually. This can be done by moving into the Metaxa2 directory and copying all files starting with “metaxa2” into the preferred bin directory, e.g. by typing “cp -r metaxa2* ~/bin/”. After this, you may need to edit your login script so that the bin directory is added to your PATH. If your bin directory is already in your PATH, no editing is needed, and you should be able to run Metaxa2 immediately after copying, e.g. by typing “metaxa2 -h”.
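Whether the bin directory is already in your PATH can be checked with a few lines of shell. This is a minimal sketch, assuming ~/bin is the directory you copied Metaxa2 into. The function name dir_in_path is made up for illustration, and the name of the login script (for example ~/.profile or ~/.bash_profile) depends on your shell and system.

```shell
# Return 0 if directory $1 is one of the colon-separated entries of the path
# list given in $2 (defaults to the current $PATH), otherwise return 1.
dir_in_path() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_in_path "$HOME/bin"; then
    echo "$HOME/bin is already in your PATH"
else
    # The single quotes keep $HOME and $PATH unexpanded in the printed line.
    echo 'Add this line to your login script and log in again:'
    echo 'export PATH="$HOME/bin:$PATH"'
fi
```

After the PATH has been updated, typing “metaxa2 -h” should print the help text.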

I have a multicore CPU, why is Metaxa2 so slow?

Most likely you have not told Metaxa2 to use more than one CPU. With the “--cpu [number]” option, the speed of Metaxa2 improves dramatically.
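If you are unsure how many CPUs are available, the number can be looked up on most Linux and Mac systems before choosing a value for --cpu. The sketch below is an illustration only. The function name cpu_count and the file names input.fasta and output are hypothetical, and it assumes that “getconf _NPROCESSORS_ONLN” is available, falling back to 1 otherwise.

```shell
# Print the number of online CPUs, or 1 if it cannot be determined.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # fall back if the answer is not a plain number
    esac
    echo "$n"
}

# Print (but do not run) a command line that uses all available CPUs.
echo "metaxa2 -i input.fasta -o output --cpu $(cpu_count)"
```

On a shared machine you may want to pass a smaller number than the one reported here.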

Metaxa2 spawns a lot of HMMER processes that use up more CPUs than I specified using the --cpu option. What is going on?

Metaxa2’s multi-threading system only partially takes into account the number of CPUs you specified using the --cpu option. If it is critical for you that Metaxa2 does not eat more CPU power than specified you should use the “--multi_thread F” option, which forces Metaxa2 to run the HMMER searches sequentially within the CPU limit.

I started Metaxa2 and it stopped early on in the progress with the following message: “Checking and handling input sequence data (should not take long)…” Why isn’t the software continuing?

You have not specified any input file (using the -i option). Metaxa2 has therefore stopped and is waiting for data on standard input. If this was unintended, press control-C and try again with the input file option added.

Can I use Metaxa2 to find SSU/LSU sequences in my newly sequenced and assembled genome?

Yes, as of Metaxa2 version 2.1 this is possible. Make sure that you have set the “--mode” option to either “auto” or “genome”.

Can I use Metaxa2 to determine the number of copies of SSU sequences in a certain species?

Yes, as of Metaxa2 version 2.1 this is possible. Make sure that you have set the “--mode” option to either “auto” or “genome”. That said, multiple SSU copies within a genome are tricky for e.g. assembly software, so copy number detection can still be uncertain.