MetaxaQR FAQ

This FAQ intends to answer some of the questions we have received about the MetaxaQR software package and its predecessors, Metaxa and Metaxa2. Please look through this document before contacting me with a question. If the FAQ does not answer your question, you are very welcome to e-mail it to metaxa [at] microbiology [dot] se.

How should I cite MetaxaQR?

Right now, there is no specific MetaxaQR publication. Please cite it as Metaxa2 until this changes. Since there are now several Metaxa-related publications, deciding which one to cite in which context has become a bit complicated. This is how I would prefer Metaxa2 to be cited in various contexts:

How complete is the rRNA database of MetaxaQR?

The MetaxaQR database is (at the time of writing, April 2022) mainly based on SILVA release 138. Newer SILVA releases exist, and once the code has stabilized we will update the database to the most recent one.

Where do the chloroplast rRNA sequences in the MetaxaQR database come from?

The chloroplast sequences in the MetaxaQR database come from SILVA release 138.

Where do the mitochondrial rRNA sequences in the MetaxaQR database come from?

The mitochondrial sequences in the MetaxaQR database come from SILVA release 138 and the MitoZoa database (version 2).

Can I use MetaxaQR on barcoding regions other than the rRNA genes?

Yes. Use the MetaxaQR Database Builder (metaxaQR_dbb, included with the software package) to build a custom database, and then classify your sequences using this new database.

I have paired-end sequence data, but in this case I only want to provide MetaxaQR with the forward reads, so I use the following command:
metaxaQR -i read1.fastq -o test
However, it seems MetaxaQR is still treating it as paired-end data. What is going on here?

Probably your pair file (xxxxx_2.fastq) is also present in the same directory as the xxxxx_1.fastq file. In that case, the MetaxaQR format auto-detection will guess the name of the pair file, and if that file exists, MetaxaQR will switch into paired-end mode using both files. This behavior is intended to save users some typing, but it can be confusing in some situations. Adding -f fastq solves this issue, since it explicitly tells MetaxaQR which input format to use (e.g. metaxaQR -i read1.fastq -f fastq -o test).
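If you want to check in advance whether a forward-read file will pull in a mate file, a small script can look for a file with the guessed pair name next to it. The sketch below assumes the <name>_1.<ext> / <name>_2.<ext> naming described in the answer above; it is an illustration, not MetaxaQR's own auto-detection code, and the helper names guess_pair_file and would_trigger_paired_mode are hypothetical. Names without the underscore, such as read1.fastq in the question, are not covered by this sketch.

```python
import os
import re
import sys


def guess_pair_file(path):
    """Return the guessed _2 mate for a <name>_1.<ext> file, or None.

    Illustrative only: assumes the _1/_2 naming convention and is not
    MetaxaQR's own auto-detection code.
    """
    directory, filename = os.path.split(path)
    match = re.match(r"^(.*)_1(\..+)$", filename)
    if match is None:
        return None  # name does not follow the <name>_1.<ext> pattern
    return os.path.join(directory, match.group(1) + "_2" + match.group(2))


def would_trigger_paired_mode(path):
    """True if a mate file with the guessed name exists next to `path`."""
    mate = guess_pair_file(path)
    return mate is not None and os.path.isfile(mate)


if __name__ == "__main__":
    # Hypothetical usage: python check_pair.py sample_1.fastq
    for forward in sys.argv[1:]:
        mate = guess_pair_file(forward)
        found = would_trigger_paired_mode(forward)
        print(f"{forward}: guessed mate {mate}, {'present' if found else 'absent'}")
```

If the mate is reported as present and you only want single-end processing, add -f fastq as described above.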

Can I concatenate my two paired-end read files into one file and process it as if it were all non-paired data?

Yes, but bear in mind that this kind of concatenation introduces a bias, since a DNA fragment can be counted twice if rRNA is detected in both of its reads, so results on community composition will be complicated to interpret. Generally, I would advise against doing this. Furthermore, depending on how your reads were named, there may be duplicate read IDs when the files are concatenated this way. Check this first (that is, whether the read IDs include a _1 or _2 designation that tells the two reads of a pair apart). Finally, use the “-f fastq” option to make sure that MetaxaQR doesn’t treat the data as paired-end.
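To check for duplicate read IDs before concatenating, you could count the IDs across both files. The Python sketch below is one way to do this; it assumes uncompressed four-line FASTQ records and treats the first word of each header line as the read ID, a common convention that may not match exactly how MetaxaQR parses IDs. The script name check_ids.py and the function names are hypothetical.

```python
import sys
from collections import Counter
from contextlib import ExitStack


def read_ids(lines):
    """Yield read IDs from FASTQ lines (e.g. an open file handle).

    Assumes plain four-line records and takes the ID to be the first
    whitespace-separated word of the header, without the leading '@'.
    """
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:  # every fourth line is a header
            fields = line[1:].split()
            if fields:
                yield fields[0]


def duplicate_ids(id_streams):
    """Return, sorted, the IDs occurring more than once across all streams."""
    counts = Counter()
    for ids in id_streams:
        counts.update(ids)
    return sorted(read_id for read_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical usage: python check_ids.py reads_1.fastq reads_2.fastq
    with ExitStack() as stack:
        handles = [stack.enter_context(open(path)) for path in sys.argv[1:]]
        duplicates = duplicate_ids(read_ids(handle) for handle in handles)
    print(f"{len(duplicates)} duplicate read IDs")
    for read_id in duplicates[:10]:
        print(read_id)
```

With Illumina-style headers such as @frag1 1:N:0:1 and @frag1 2:N:0:1, both mates share the ID frag1 and are reported as duplicates; IDs that end in _1 and _2 are not.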

Does the different orientation of the paired reads matter?

No. By default, MetaxaQR also searches the reverse complement of each sequence for rRNA.

Is there any script or tool to merge the output of different samples into a single table?

Yes. MetaxaQR includes metaxaQR_dc, part of the MetaxaQR Diversity Tools, which does this neatly.

When extracting the MetaxaQR archive I get a lot of error messages of this type:
tar: Ignoring unknown extended header keyword `LIBARCHIVE.creationtime'
tar: Ignoring unknown extended header keyword `SCHILY.dev'
tar: Ignoring unknown extended header keyword `SCHILY.ino'
tar: Ignoring unknown extended header keyword `SCHILY.nlink'

Is there a problem with the MetaxaQR archive?

No. Some versions of MetaxaQR have been packaged on macOS using a version of tar that writes these vendor-specific “extended headers”, which other tar implementations do not recognize. The messages are harmless, and you can safely ignore them.

I analyze data from microbial communities that live on/in host organism X. Is there a way to remove all rRNA sequences from organism X and just keep all the other ones?

Yes, you can use the --reference option for this. Provide a FASTA file with the rRNA sequences of organism X to the --reference option, and sequences matching those will be sorted out separately from the rest.

Which version of MAFFT should I use for MetaxaQR?

We recommend using any 7.X version of MAFFT.

Does MetaxaQR work on Windows?

The short answer is no. I do not know anyone who has gotten any version of Metaxa to work under Cygwin, so that is not a good option. However, MetaxaQR works fine under virtualization software such as VirtualBox running a virtual Linux environment. Another good option is to find a colleague with a Mac and kindly ask him or her to run your sequences for you.

There is no one running either Mac or Linux in my department [or: I am afraid of the Linux people/Mac fanboys]. What do I do?

I would recommend downloading VirtualBox (http://www.virtualbox.org/) and Bio-Linux (http://envgen.nox.ac.uk/tools/bio-linux). Once you have Bio-Linux running in VirtualBox, it should be relatively straightforward to install HMMER3 and MetaxaQR on the virtual machine by following the MetaxaQR manual.

MetaxaQR runs with my specified input file, but the output file says that no sequences were detected. What is going on?

Most likely you have specified a file that does not exist. Currently, MetaxaQR does not warn the user when the input file is missing (a later version will). It may also be that you are running MetaxaQR in a different directory than you intended. Check that the input file you want to use is located in the current directory by typing “ls” on the command line, or give the full path to the file.
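Until MetaxaQR warns about missing input files itself, a small wrapper can do the check before the run starts. This is a minimal sketch, assuming metaxaQR is on your PATH; the function name run_metaxaqr is hypothetical, and by default it only passes the -i and -o options used elsewhere in this FAQ.

```python
import os
import subprocess


def run_metaxaqr(input_file, output_prefix, extra_args=()):
    """Hypothetical wrapper: refuse to start metaxaQR if the input file is missing."""
    if not os.path.isfile(input_file):
        # MetaxaQR itself does not (yet) warn about this, so fail loudly here.
        raise FileNotFoundError(
            f"input file {input_file!r} not found (current directory: {os.getcwd()})"
        )
    command = ["metaxaQR", "-i", input_file, "-o", output_prefix, *extra_args]
    # check=True raises CalledProcessError if metaxaQR exits with an error code.
    return subprocess.run(command, check=True)
```

For example, run_metaxaqr("read1.fastq", "test", ["-f", "fastq", "--cpu", "4"]) stops immediately with a clear message if read1.fastq is not where you think it is.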

The MetaxaQR installer script (install_metaxaQR) does not work. What can I do?

On some operating systems (primarily some Linux distributions), the installer script that comes with MetaxaQR might not work. Usually this happens because of a lack of permissions, but it can also be caused by the differing login file structures of various Linux distributions. If MetaxaQR refuses to run after installation, the first thing to try is to log out of the system completely and then log back in, since changes to your login files only take effect in a new session. Sometimes this solves the issue.

If this does not help, the installer script has failed, and MetaxaQR must be installed manually. To do this, move into the MetaxaQR directory and copy all files starting with “metaxaQR” into your preferred bin directory, e.g. by typing “cp -r metaxaQR* ~/bin/”. You also need to copy the directory called “src” and the file “get_fasta” to the bin directory (e.g. “cp -r src get_fasta ~/bin/”). After this, you may need to edit your login script so that the bin directory is added to your PATH. If your bin directory is already in your PATH, no changes are needed, and you should be able to run MetaxaQR immediately after copying, e.g. by typing “metaxaQR -h”.
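The manual installation steps above can also be scripted. The following is a minimal Python sketch, assuming Python 3.8 or later (for dirs_exist_ok) and the file layout described above; manual_install is a hypothetical helper, not part of the MetaxaQR package. It copies the same items as “cp -r metaxaQR* src get_fasta ~/bin/” and reports whether the bin directory is already on your PATH.

```python
import glob
import os
import shutil


def manual_install(metaxa_dir, bin_dir=None):
    """Copy metaxaQR*, src and get_fasta from metaxa_dir into bin_dir.

    Hypothetical helper mirroring "cp -r metaxaQR* src get_fasta ~/bin/".
    Returns True if bin_dir is already listed in PATH.
    """
    if bin_dir is None:
        bin_dir = os.path.expanduser("~/bin")
    os.makedirs(bin_dir, exist_ok=True)
    items = sorted(glob.glob(os.path.join(metaxa_dir, "metaxaQR*")))
    items += [os.path.join(metaxa_dir, "src"), os.path.join(metaxa_dir, "get_fasta")]
    for item in items:
        target = os.path.join(bin_dir, os.path.basename(item))
        if os.path.isdir(item):
            shutil.copytree(item, target, dirs_exist_ok=True)  # recursive, like cp -r
        else:
            # copy2 keeps permission bits, so scripts stay executable.
            # A missing src or get_fasta raises FileNotFoundError here.
            shutil.copy2(item, target)
    on_path = [os.path.realpath(p) for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return os.path.realpath(bin_dir) in on_path
```

Run from inside the MetaxaQR directory, manual_install(".") copies into ~/bin; if it returns False, add ~/bin to the PATH in your login script.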

I have a multicore CPU. Why is MetaxaQR so slow?

Most likely you have not told MetaxaQR to use more than one CPU. Specifying the number of CPUs with the “--cpu [number]” option improves the speed of MetaxaQR dramatically.

MetaxaQR spawns a lot of HMMER processes that use up more CPUs than I specified with the --cpu option. What is going on?

MetaxaQR’s multi-threading system only partially takes into account the number of CPUs you specified with the --cpu option. If it is critical for you that MetaxaQR does not use more CPU power than specified, use the “--multi_thread F” option, which forces MetaxaQR to run the HMMER searches sequentially within the CPU limit.

I started MetaxaQR and it stopped early in the process with the following message: “Checking and handling input sequence data (should not take long)…” Why isn’t the software continuing?

You have not specified an input file (using the -i option), so MetaxaQR is waiting for data on standard input. If this was unintended, press Control-C and try again with the input file option added.

Can I use MetaxaQR to find SSU/LSU sequences in my newly sequenced and assembled genome?

Yes. Just set the “--mode” option to either “auto” or “genome”.

Can I use MetaxaQR to determine the number of copies of SSU sequences in a certain species?

Yes. Set the “--mode” option to either “auto” or “genome”. That said, multiple SSU copies within a genome are tricky for software such as genome assemblers, which often collapse near-identical copies into one, so copy number detection can still be uncertain.