VCF/BCF processing with embedded V8 JavaScript.
This crate exposes HTSlib VCF/BCF records to JavaScript via the V8 engine, so that records can be filtered, transformed, and analyzed with JS expressions.
§Overview
The recommended way to use this library is via the Evaluator struct, which
compiles a JavaScript expression once and efficiently evaluates it against
multiple VCF records. It supports generic return types for type-safe extraction.
For simpler use cases, runner::run_vcf_expr_with provides a callback-based
API that handles file iteration for you.
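The compile-once, evaluate-many pattern described above can be illustrated in plain JavaScript. This is only an analogy and not how Evaluator is implemented: `compileExpr` and the record objects are hypothetical, and the standard `Function` constructor stands in for whatever compilation step the crate performs inside V8.

```javascript
// Hypothetical helper: parse the expression once and return a reusable function.
function compileExpr(expr) {
  return new Function('variant', `return (${expr});`);
}

// Invented records standing in for VCF variants.
const records = [
  { chrom: 'chr1', pos: 100 },
  { chrom: 'chr2', pos: 200 },
];

// Compile once, outside the loop...
const fn = compileExpr("variant.chrom + ':' + variant.pos");

// ...then evaluate the same compiled function against every record.
const results = records.map((r) => fn(r));
console.log(results);
```

The point of the sketch is that parsing happens once, so the per-record cost is only the function call.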
§CLI Example
# Print chrom:pos for each variant
htsvcf input.vcf.gz "variant.chrom + ':' + variant.pos"
# Filter by INFO field
htsvcf input.vcf.gz "variant.info('DP') > 20 ? variant.toString() : ''"
# Access sample genotypes
htsvcf input.vcf.gz "variant.sample('NA12878').GT"
§Library Example
This example demonstrates the core API: reading VCF records, modifying the header to add a new INFO field, translating records to the updated header, computing and setting INFO values, and writing the modified output.
use htsvcf::Evaluator;
use htsvcf_core::{open_writer, Header as CoreHeader, WriterOptions};
use rust_htslib::bcf::{self, Read};

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let mut reader = bcf::Reader::from_path("input.vcf.gz")?;
    let mut eval = Evaluator::new(reader.header())?;

    // Add a new INFO field to the header via JavaScript
    eval.run("header.addInfo('VARIANT_LENGTH', '1', 'Integer', 'Length of variant (REF - ALT)')")?;

    // Get the updated header for the writer
    let updated_header = eval.header()?;

    // Open writer with the modified header using htsvcf_core
    let core_header = unsafe { CoreHeader::new(updated_header.inner) };
    let mut writer = open_writer("output.vcf.gz", &core_header, WriterOptions::default())?;

    // Define a filter function
    eval.run("function passes(v) { return v.info('DP') >= 10 && v.qual >= 20 }")?;

    let mut count = 0usize;
    for result in reader.records() {
        let mut record = result?;

        // Translate the record to the updated header (required after adding INFO fields)
        record.translate(&mut eval.header()?)?;
        eval.set_record(record);

        // Compute variant length and set the new INFO field via JavaScript
        eval.run("variant.set_info('VARIANT_LENGTH', variant.ref.length - (variant.alt[0]?.length || 0))")?;

        let passes: bool = eval.eval("passes(variant)")?;
        if passes {
            count += 1;
            let mut record = eval.take().unwrap();
            writer.write_record(&mut record)?;
        }
    }

    eprintln!("Wrote {} variants", count);
    Ok(())
}
§JavaScript API
The following globals are available in JS expressions:
§variant - The current VCF record
Read-only fields:
- variant.chrom - Chromosome name (string)
- variant.pos - 1-based position (integer)
- variant.start - 0-based start position
- variant.stop - End position
- variant.ref - Reference allele (string)
- variant.alt - Alternate alleles (array of strings)
Read/write fields:
- variant.id - Variant ID (string, e.g., “rs12345”)
- variant.qual - Quality score (number or null)
- variant.filter - Filter status (array of strings)
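The fields listed above can be combined in ordinary JS. The sketch below uses a hypothetical stand-in object (`exampleVariant`) with invented values so that it runs outside the embedded engine; inside htsvcf the `variant` global would take its place. It also guards against a null `qual`, because in JavaScript `null >= 0` evaluates to `true`, so a bare threshold comparison can let records with a missing quality through.

```javascript
// Hypothetical stand-in for the `variant` global; all values are invented.
const exampleVariant = {
  chrom: 'chr1',
  pos: 1000,   // 1-based position
  start: 999,  // 0-based start, i.e. pos - 1
  ref: 'AT',
  alt: ['A'],
  qual: null,  // qual is documented as "number or null"
};

// Build a compact "chrom:pos REF>ALT" label from the read-only fields.
function label(v) {
  return `${v.chrom}:${v.pos} ${v.ref}>${v.alt.join(',')}`;
}

// Compare QUAL against a threshold, treating a null QUAL as failing.
// Without the explicit null check, `null >= 0` would be true.
function qualAtLeast(v, min) {
  return v.qual !== null && v.qual >= min;
}

console.log(label(exampleVariant));          // chr1:1000 AT>A
console.log(qualAtLeast(exampleVariant, 0)); // false
```

Whether a missing quality should pass or fail is a policy choice; the helper simply makes that choice explicit instead of leaving it to JavaScript's coercion rules.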
INFO field access:
variant.info('DP') // => 42 (integer)
variant.info('AF') // => [0.25, 0.75] (array)
variant.info('SOMATIC') // => true (flag)
variant.info('MISSING') // => undefined (absent)
// Modify INFO (value type must match header definition)
variant.set_info('DP', 100)
variant.set_info('AF', [0.1, 0.9])
variant.set_info('SOMATIC', true)
variant.set_info('DP', null) // Clear the field
FORMAT field access (per-sample):
variant.format('GT') // => ["0/1", "0/0", "1/1"] (one per sample)
variant.format('DP') // => [30, 25, null] (null for missing)
variant.format('AD') // => [[10, 20], [25, 0], [0, 30]] (arrays)
Sample access:
// Get all FORMAT fields for one sample
const s = variant.sample('NA12878')
s.GT // => "0/1"
s.DP // => 30
s.AD // => [10, 20]
s.sample_name // => "NA12878"
// Get all samples at once (array of objects)
const all = variant.samples()
all[0].GT // First sample's genotype
// Get a subset of samples
const subset = variant.samples(['NA12878', 'NA12879'])
Output:
variant.toString() // => Full VCF line (without newline)
§Writer - Write VCF/BCF files
const w = new Writer('out.vcf', header)
for (const v of new Reader('in.vcf.gz')) {
// NOTE: write() consumes v
w.write(v)
}
w.close()
§header - VCF header metadata
// List all samples
header.samples() // => ["NA12878", "NA12879", ...]
// Get INFO/FORMAT field definitions
header.get('INFO', 'DP')
// => { id: 'DP', type: 'Integer', number: '1', description: 'Read depth' }
header.get('FORMAT', 'GT')
// => { id: 'GT', type: 'String', number: '1', description: 'Genotype' }
// List all header records
header.records() // => [{ type: 'INFO', ID: 'DP', ... }, ...]
// Add new fields (for use with set_info)
header.addInfo('CUSTOM', '1', 'Integer', 'My custom field')
header.addFormat('CUSTOM', '1', 'Float', 'Per-sample value')
// Get full header text
header.toString()
§Reader - Iterate VCF files from JS
const r = new Reader('input.vcf.gz')
// Iterate all records
for (const v of r) {
if (v.info('DP') > 20) {
print(v.toString())
}
}
// Query a region (requires index)
if (r.hasIndex()) {
r.query('chr1:1000-2000') // Region string
// or: r.query('chr1', 999, 2000) // 0-based coords
for (const v of r) {
// ... variants in region
}
}
// Access header
const samples = r.header().samples()
Re-exports§
pub use evaluator::EvalError;
pub use evaluator::Evaluator;
pub use fromjs::FromJsValue;
pub use fromjs::ToJsValue;
pub use header::Header;
pub use variant::Variant;
Modules§
- evaluator
- Standalone evaluator for applying JS expressions to VCF records.
- fromjs
- Traits and implementations for converting between Rust types and V8 JavaScript values.
- header
- V8-based Header object exposed to JavaScript.
- reader
- V8-based Reader class exposed to JavaScript.
- runner
- High-level API for running JavaScript expressions over VCF/BCF files.
- runtime
- V8 runtime initialization and global locking.
- variant
- V8-based Variant object representing a single VCF/BCF record.
- writer