Crate htsvcf

VCF/BCF processing with embedded V8 JavaScript.

This crate exposes HTSlib VCF/BCF records to JavaScript through the V8 engine, so records can be filtered, transformed, and analyzed with JS expressions.

§Overview

The recommended entry point is the Evaluator struct, which compiles a JavaScript expression once and evaluates it against many VCF records. Results are extracted into a Rust type chosen by the caller (for example, a bool for a filter expression), so the conversion from the JS value is type-checked.

For simpler use cases, runner::run_vcf_expr_with provides a callback-based API that handles file iteration for you.

§CLI Example

# Print chrom:pos for each variant
htsvcf input.vcf.gz "variant.chrom + ':' + variant.pos"

# Filter by INFO field
htsvcf input.vcf.gz "variant.info('DP') > 20 ? variant.toString() : ''"

# Access sample genotypes
htsvcf input.vcf.gz "variant.sample('NA12878').GT"
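To try expressions like the ones above without a VCF file, they can be evaluated against a plain object in any JavaScript runtime. This is only a sketch: mockVariant and evalExpr are hypothetical helpers, the mock implements just the members these three expressions touch, and the mocked toString() returns a shortened line rather than a full VCF record.

```javascript
// Hypothetical stand-in for the `variant` global; NOT the crate's object.
function mockVariant({ chrom, pos, info, samples }) {
  return {
    chrom,
    pos,
    info: (key) => info[key],          // undefined when the key is absent
    sample: (name) => samples[name],
    toString: () => `${chrom}\t${pos}`, // shortened; the real method returns the full line
  };
}

// Evaluate an expression string with `variant` in scope (illustration only).
function evalExpr(expr, variant) {
  return new Function('variant', `return (${expr});`)(variant);
}

const v = mockVariant({
  chrom: 'chr1',
  pos: 12345,
  info: { DP: 42 },
  samples: { NA12878: { GT: '0/1' } },
});

console.log(evalExpr("variant.chrom + ':' + variant.pos", v));      // => 'chr1:12345'
console.log(evalExpr("variant.sample('NA12878').GT", v));           // => '0/1'
console.log(evalExpr("variant.info('DP') > 20 ? variant.toString() : ''", v) !== ''); // => true
```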

§Library Example

This example demonstrates the core API: reading VCF records, modifying the header to add a new INFO field, translating records to the updated header, computing and setting INFO values, and writing the modified output.

use htsvcf::Evaluator;
use htsvcf_core::{open_writer, Header as CoreHeader, WriterOptions};
use rust_htslib::bcf::{self, Read};

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let mut reader = bcf::Reader::from_path("input.vcf.gz")?;
    let mut eval = Evaluator::new(reader.header())?;

    // Add a new INFO field to the header via JavaScript
    eval.run("header.addInfo('VARIANT_LENGTH', '1', 'Integer', 'Length of variant (REF - ALT)')")?;

    // Get the updated header for the writer
    let updated_header = eval.header()?;

    // Open writer with the modified header using htsvcf_core
    let core_header = unsafe { CoreHeader::new(updated_header.inner) };
    let mut writer = open_writer("output.vcf.gz", &core_header, WriterOptions::default())?;

    // Define a filter function
    eval.run("function passes(v) { return v.info('DP') >= 10 && v.qual >= 20 }")?;

    let mut count = 0usize;
    for result in reader.records() {
        let mut record = result?;

        // Translate the record to the updated header (required after adding INFO fields)
        record.translate(&mut eval.header()?)?;
        eval.set_record(record);

        // Compute variant length and set the new INFO field via JavaScript
        eval.run("variant.set_info('VARIANT_LENGTH', variant.ref.length - (variant.alt[0]?.length || 0))")?;

        let passes: bool = eval.eval("passes(variant)")?;
        if passes {
            count += 1;
            let mut record = eval.take().unwrap();
            writer.write_record(&mut record)?;
        }
    }

    eprintln!("Wrote {} variants", count);
    Ok(())
}
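The two JavaScript snippets embedded in the Rust example can be checked in isolation. The sketch below runs the same passes function and the same length expression against hypothetical mock records (the mock helper and its infoFields property are illustration only, not part of the crate). It shows two edge cases worth knowing: a record with no DP or a null QUAL fails the filter, and a record with no ALT allele falls back to the REF length.

```javascript
// Same filter as in the Rust example.
function passes(v) { return v.info('DP') >= 10 && v.qual >= 20 }

// Same expression that the Rust example passes to set_info, wrapped in a function.
function variantLength(v) { return v.ref.length - (v.alt[0]?.length || 0) }

// Hypothetical mock record: plain fields plus an info() lookup.
function mock(fields) {
  return { ...fields, info: (key) => fields.infoFields[key] };
}

const snv = mock({ ref: 'A', alt: ['T'], qual: 50, infoFields: { DP: 30 } });
const del = mock({ ref: 'ACGT', alt: ['A'], qual: null, infoFields: {} });
const noAlt = mock({ ref: 'AC', alt: [], qual: 99, infoFields: { DP: 12 } });

console.log(passes(snv), variantLength(snv));     // => true 0
console.log(passes(del), variantLength(del));     // => false 3  (DP is undefined, QUAL is null)
console.log(passes(noAlt), variantLength(noAlt)); // => true 2   (no ALT: length falls back to REF)
```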

§JavaScript API

The following globals are available in JS expressions:

§variant - The current VCF record

Read-only fields:

  • variant.chrom - Chromosome name (string)
  • variant.pos - 1-based position (integer)
  • variant.start - 0-based start position
  • variant.stop - End position
  • variant.ref - Reference allele (string)
  • variant.alt - Alternate alleles (array of strings)

Read/write fields:

  • variant.id - Variant ID (string, e.g., “rs12345”)
  • variant.qual - Quality score (number or null)
  • variant.filter - Filter status (array of strings)

INFO field access:

variant.info('DP')           // => 42 (integer)
variant.info('AF')           // => [0.25, 0.75] (array)
variant.info('SOMATIC')      // => true (flag)
variant.info('MISSING')      // => undefined (absent)

// Modify INFO (value type must match header definition)
variant.set_info('DP', 100)
variant.set_info('AF', [0.1, 0.9])
variant.set_info('SOMATIC', true)
variant.set_info('DP', null) // Clear the field
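Because variant.info() can return a scalar, an array, or undefined depending on the field, filters that work across fields are easier to write after normalizing the result. A minimal sketch, assuming only the return shapes shown above; infoValues is a hypothetical helper and the variant object here is a mock:

```javascript
// Normalize an INFO lookup to an array: absent -> [], scalar -> [x], array -> as is.
function infoValues(variant, key) {
  const value = variant.info(key);
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical mock exposing the same return shapes as the examples above.
const fields = { DP: 42, AF: [0.25, 0.75], SOMATIC: true };
const variant = { info: (key) => fields[key] };

console.log(infoValues(variant, 'DP'));              // => [ 42 ]
console.log(Math.max(...infoValues(variant, 'AF'))); // => 0.75
console.log(infoValues(variant, 'MISSING').length);  // => 0
```

Without such a guard, a comparison like variant.info('MISSING') > 20 evaluates to false rather than raising an error, which can silently drop records.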

FORMAT field access (per-sample):

variant.format('GT')         // => ["0/1", "0/0", "1/1"] (one per sample)
variant.format('DP')         // => [30, 25, null] (null for missing)
variant.format('AD')         // => [[10, 20], [25, 0], [0, 30]] (arrays)
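Since variant.format() returns one entry per sample with null for missing values, aggregate statistics need to skip the nulls. The sketch below operates on plain arrays shaped like the results above; meanDepth and altFraction are hypothetical helper names, and altFraction assumes AD holds [ref, alt] counts for a biallelic site.

```javascript
// Mean of the non-missing per-sample values, or null if every sample is missing.
function meanDepth(depths) {
  const present = depths.filter((d) => d !== null && d !== undefined);
  if (present.length === 0) return null;
  return present.reduce((sum, d) => sum + d, 0) / present.length;
}

// Fraction of reads supporting the ALT allele; null when there is no coverage.
function altFraction(ad) {
  if (!Array.isArray(ad) || ad.length < 2) return null;
  const total = ad[0] + ad[1];
  return total === 0 ? null : ad[1] / total;
}

console.log(meanDepth([30, 25, null])); // => 27.5
console.log(altFraction([25, 0]));      // => 0
console.log(altFraction([0, 30]));      // => 1
console.log(altFraction([0, 0]));       // => null
```

In an expression these would be called as meanDepth(variant.format('DP')) or variant.format('AD').map(altFraction).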

Sample access:

// Get all FORMAT fields for one sample
const s = variant.sample('NA12878')
s.GT          // => "0/1"
s.DP          // => 30
s.AD          // => [10, 20]
s.sample_name // => "NA12878"

// Get all samples at once (array of objects)
const all = variant.samples()
all[0].GT     // First sample's genotype

// Get a subset of samples
const subset = variant.samples(['NA12878', 'NA12879'])
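The array returned by variant.samples() lends itself to ordinary array methods. As a sketch, the following picks out heterozygous samples from objects shaped like the ones above. isHet and hetSampleNames are hypothetical helpers; the sketch assumes GT strings use "/" or "|" as the allele separator and "." for a missing allele, as in the VCF text format.

```javascript
// True when the genotype has at least two distinct, non-missing alleles.
function isHet(gt) {
  const alleles = gt.split(/[\/|]/);
  if (alleles.includes('.')) return false;
  return new Set(alleles).size > 1;
}

// Names of the heterozygous samples in a samples()-style array.
function hetSampleNames(samples) {
  return samples
    .filter((s) => typeof s.GT === 'string' && isHet(s.GT))
    .map((s) => s.sample_name);
}

// Mock data in the shape shown above (sample names are illustrative).
const all = [
  { sample_name: 'NA12878', GT: '0/1', DP: 30 },
  { sample_name: 'NA12879', GT: '0/0', DP: 25 },
  { sample_name: 'NA12880', GT: '1|0', DP: 18 },
  { sample_name: 'NA12881', GT: './.', DP: null },
];

console.log(hetSampleNames(all)); // => [ 'NA12878', 'NA12880' ]
```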

Output:

variant.toString()  // => Full VCF line (without newline)

§Writer - Write VCF/BCF files

const w = new Writer('out.vcf', header)
for (const v of new Reader('in.vcf.gz')) {
  // NOTE: write() consumes v
  w.write(v)
}
w.close()

§header - VCF header metadata

// List all samples
header.samples()  // => ["NA12878", "NA12879", ...]

// Get INFO/FORMAT field definitions
header.get('INFO', 'DP')
  // => { id: 'DP', type: 'Integer', number: '1', description: 'Read depth' }

header.get('FORMAT', 'GT')
  // => { id: 'GT', type: 'String', number: '1', description: 'Genotype' }

// List all header records
header.records()  // => [{ type: 'INFO', ID: 'DP', ... }, ...]

// Add new fields (for use with set_info)
header.addInfo('CUSTOM', '1', 'Integer', 'My custom field')
header.addFormat('CUSTOM', '1', 'Float', 'Per-sample value')

// Get full header text
header.toString()
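The definition object returned by header.get() can be used to check a value before handing it to set_info, since the value type must match the header definition. The sketch below is one possible interpretation of "match", not the crate's actual validation: valueMatches is a hypothetical helper, and it assumes that number '1' and '0' mean a scalar while any other number means an array.

```javascript
// Per-type predicates for a single element (assumed mapping from VCF types to JS types).
const typeChecks = {
  Integer: (x) => Number.isInteger(x),
  Float: (x) => typeof x === 'number',
  Flag: (x) => typeof x === 'boolean',
  String: (x) => typeof x === 'string',
};

// Check a JS value against a definition shaped like header.get('INFO', id).
function valueMatches(def, value) {
  if (value === null) return true;              // null clears the field
  const check = typeChecks[def.type];
  if (!check) return false;                     // unknown type
  const scalarOnly = def.number === '1' || def.number === '0';
  if (Array.isArray(value)) return !scalarOnly && value.every(check);
  return scalarOnly && check(value);
}

const dp = { id: 'DP', type: 'Integer', number: '1', description: 'Read depth' };
const af = { id: 'AF', type: 'Float', number: 'A', description: 'Allele frequency' };

console.log(valueMatches(dp, 100));        // => true
console.log(valueMatches(dp, 1.5));        // => false
console.log(valueMatches(af, [0.1, 0.9])); // => true
console.log(valueMatches(dp, null));       // => true
```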

§Reader - Iterate VCF files from JS

const r = new Reader('input.vcf.gz')

// Iterate all records
for (const v of r) {
    if (v.info('DP') > 20) {
        print(v.toString())
    }
}

// Query a region (requires index)
if (r.hasIndex()) {
    r.query('chr1:1000-2000')  // Region string
    // or: r.query('chr1', 999, 2000)  // 0-based coords
    for (const v of r) {
        // ... variants in region
    }
}

// Access header
const samples = r.header().samples()
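The two query forms above differ in coordinate convention: the region string 'chr1:1000-2000' is 1-based, while the three-argument form takes 0-based coordinates (999, 2000). A small sketch of that conversion, using a hypothetical parseRegion helper that handles only the plain contig:start-end form:

```javascript
// Convert a 1-based 'contig:start-end' string to 0-based query arguments.
// Returns null when the string does not have that form.
function parseRegion(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region);
  if (!match) return null;
  const start1 = Number(match[2]);
  const end1 = Number(match[3]);
  if (start1 < 1 || end1 < start1) return null;
  // The 0-based start is one less; the end value is unchanged.
  return { chrom: match[1], start: start1 - 1, end: end1 };
}

console.log(parseRegion('chr1:1000-2000')); // => { chrom: 'chr1', start: 999, end: 2000 }
console.log(parseRegion('chr1'));           // => null
```

With a real Reader, the result could then be passed as r.query(p.chrom, p.start, p.end).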

Re-exports§

pub use evaluator::EvalError;
pub use evaluator::Evaluator;
pub use fromjs::FromJsValue;
pub use fromjs::ToJsValue;
pub use header::Header;
pub use variant::Variant;

Modules§

evaluator
Standalone evaluator for applying JS expressions to VCF records.
fromjs
Traits and implementations for converting between Rust types and V8 JavaScript values.
header
V8-based Header object exposed to JavaScript.
reader
V8-based Reader class exposed to JavaScript.
runner
High-level API for running JavaScript expressions over VCF/BCF files.
runtime
V8 runtime initialization and global locking.
variant
V8-based Variant object representing a single VCF/BCF record.
writer