Performance Tips

Following the golang philosophy, there are few knobs to turn for tweaking performance.

Processes

The simplest performance tweak is to use -p PROCS to specify the number of processes to use when running vcfanno. It scales well to ~15 processes.

Garbage Collection

On machines with even modest amounts of memory, it can be good to allow go to use more memory for the benefit of spending less time in garbage collection. Users can do this by preceding their vcfanno command with GOGC=1000. Where higher values allow go to use more memory and the default value is 100. For example:

GOGC=2000 vcfanno -p 12 a.conf a.vcf

CSI

For very dense files such as CADD, or even gnomAD or ExAC, it is recommended to index with csi, this allows finer resolution in the index. When a .csi file is present, vcfanno will prefer it over a .tbi. For example, using:

tabix -m 12 --csi $file

will work for most cases. When a csi is present, it seems to be best to lower the IRELATE_MAX_GAP (see below) to 1000 or lower. Doing this, we can see a 50 % speed improvement when using a csi-index ExAC file to annotate a clinvar file.

Experiment with what works best for each scenario.

Max Gap / Chunk Size

The parallel chrom-sweep algorithm has a gap size parameter that determines when a chunk of records from the the query file is sent to be annotated and a maximum chunk size with the same function. If a gap of a certain size is encountered or a number of records equally the requested chunk size, a new chunk is sent off. Given a (number of) dense annotation file(s), it might be good to reduce the gap size so that vcfanno will need to parse fewer unneeded records. However, given sparse annotation sets, it is best to have this value be large so that each annotation worker gets enough work to keep it busy.

The default gap size is 5000 bases. Users can alter this using the environment variable IRELATE_MAX_GAP. When using a csi index this can be much lower, for example 1000

The default chunk size is 8000 query intervals. Users can alter this using the environment variable IRELATE_MAX_CHUNK.

IRELATE_MAX_CHUNK=12000 IRELATE_MAX_GAP=5000 vcfanno -p 12 a.conf a.vcf