hts-nim is a library that allows one to use htslib via the nim programming language. Nim is a garbage-collected language that compiles to C and often has similar performance. I have become very productive in nim and especially in hts-nim and there are by now, at least a few other users of hts-nim. This post is to show how one particular feature of nim enables users to write their own functions that will be used no differently than hts-nim’s provided functionality.

Currently, in order to access the format (sample) fields in a VCF, one must first allocate a seq (something like a typed python list) that gets filled. The example below shows a code stub to extract the DP (depth) and GQ (genotype quality) fields for each sample into those pre-allocated seqs.

var dps = newSeq[int32]()
var gqs = newSeq[int32]()
for variant in vcf:
    doAssert variant.format.get("DP", dps) == Status.OK
    doAssert variant.format.get("GQ", gqs) == Status.OK
    # do stuff with dps and gqs ...

This lets hts-nim fill the dps and gqs sequences without allocating memory to provides maximal performance. It’s not much of a burden, but it might be nice to have a function that just returns the values and raises an error if the field is not found so that pre-allocation is not necesary.

Since nim has universal function call syntax (UFCS), it’s easy to write some sugar that can be used to get values without pre-allocating if that’s what’s required. Here’s how I’d write that (though see note at end of post).

proc ints*(f:FORMAT, field: string): seq[int32] =
  var vals = newSeq[int32]()
  if variant.format.get(field, vals) != Status.OK:
    raise newException(KeyError, "error getting field:" & field & ". " & $Status.OK
  return vals

This will throw an exception if the requested field does not exist. Note that the * after ints means the function is exported for use outside the currrent module, which is what we want here.

This function could be in a file named sugar.nim and then could be used as:

import hts/vcf
import ./sugar

for variant in vcf:
    var dps = variant.format.ints("DP") # note these could raise exceptions.
    var gqs = variant.format.ints("GQ")
    # do stuff with dps and gqs ...

This shows the UFCS as ints(variant.format, "DP") is equivalent to variant.format.ints("DP").

This means that users are not limited to the syntax and functionality that hts-nim provides, they can write their own functions and use them as ergonomically as those that are part of hts-nim. Since hts-nim errs on the side of efficiency rather than ergonomics, functions like these could make some nice syntactic sugar for using hts-nim as a sort of scripting language for processing genomic data.

side note.

The above sugar function ints could be written more succinctly using nimsresultvariable along with the fact thatseqs are nevernil` as:

proc ints*(f:FORMAT, field: string): seq[int32] =
  if variant.format.get(field, result) != Status.OK:
    raise newException(KeyError, "error getting field:" & field & ". " & $Status.OK

where result will be initalized by nim and set to the proper size (and filled) by hts-nim.