This post introduces indexcov and answers common questions I get about interpreting its output.
I have found that, as indexcov is applied to cohorts approaching a size of 100 or so, the probability that it will reveal something very interesting (either a data artefact or a large chromosomal anomaly) approaches 1.0. For example, who knew that there were trisomies in the Simons diversity panel?
indexcov quickly estimates coverage for whole genome BAM and CRAM files.
mosdepth ideas
I’m working on a new project, and part of it is made possible by an observation that we stumbled on with mosdepth. It’s something that’s obvious in retrospect but wasn’t fully apparent to me until after mosdepth was mostly written. In short, that observation is that computers can do stuff with arrays quickly.
The longer story behind that obvious and simple observation is as follows. mosdepth is a tool to calculate depth from BAM/CRAM files.
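To make the array observation concrete, here is a minimal sketch of one array-based way to compute per-base depth. It is an illustration under stated assumptions, not necessarily how mosdepth is implemented: the function name `depth_from_intervals`, the half-open `(start, end)` interval format, and the toy reads are all hypothetical. The idea is to add 1 at each alignment start, subtract 1 at each alignment end, and then take a running sum over the array.

```python
from itertools import accumulate


def depth_from_intervals(intervals, chrom_length):
    """Per-base depth from half-open [start, end) alignment intervals.

    Hypothetical helper for illustration; assumes 0 <= start < end <= chrom_length.
    """
    # One extra slot so an interval ending at chrom_length has a place to decrement.
    deltas = [0] * (chrom_length + 1)
    for start, end in intervals:
        deltas[start] += 1  # coverage begins at this position
        deltas[end] -= 1    # coverage stops just before this position
    # The running sum of the deltas is the depth at each base.
    return list(accumulate(deltas))[:chrom_length]


# Toy example: three overlapping "reads" on a 10-base chromosome.
reads = [(0, 5), (2, 8), (4, 6)]
depth = depth_from_intervals(reads, 10)
print(depth)
```

Each read costs two array updates regardless of its length, and the depth for the whole chromosome falls out of a single pass over the array at the end.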
Smoove
smoove wraps existing software and adds some internal read-filtering to simplify calling and genotyping structural variants. It parallelizes each step where it can; for example, it streams lumpy output directly to multiple svtyper processes for genotyping. It contains several sub-commands, but users with cohorts of fewer than about 40 samples can get a joint-called, genotyped VCF in a single command:
smoove call -x --genotype --name $name --outdir . \
    -f $fasta --processes 12 --exclude $bed *.
Noise
This afternoon, I did a quick analysis to attempt to find variants on the X chromosome that are under recessive constraint; that is, variants that appear at some non-zero frequency in females but never occur in males. Without an extra copy, such a variant might be embryonic lethal in males, but could be seen in females thanks to a backup copy. I thought that these might be occurring at a relatively high allele frequency (greater than 0.