crystal

crystal.crystal contains the main functions for calling differentially methylated regions (DMRs), along with example cluster-modeling functions.

Model clustered, correlated data.

class crystal.crystal.CountFeature(chrom, pos, methylated, counts, rho_min=0.5)

Feature Class that supports count data.

class crystal.crystal.Feature(chrom, pos, values, ovalues=None, rho_min=0.5)

A feature object that can and likely should be used by all programs that call crystal. Takes a chromosome, a position and a list of float values that are the methylation measurements (should be logit transformed).

Attributes

chrom : str

position : int

values : list

spos : str

string position (chr1:12354)

rho_min : float

minimum Spearman’s R to be considered correlated

ovalues : list

other values potentially used by modeling functions

distance(other)

Distance between this feature and another.

is_correlated(other)

Return boolean indicating correlation with other.

crystal.crystal.bump_cluster(formula, cluster, covs, coef, nsims=20000, value_fn=coef_sum, method=statsmodels.regression.linear_model.OLS)

Model clusters by fitting a model at each site and then comparing some metric to the same metric from models fit to simulated data. Uses sequential Monte Carlo to stop early once the simulated p-value is known to be high (since only low p-values are of interest).

Same signature as gee_cluster()
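The early-stopping idea can be sketched independently of any model fitting. The function below is a minimal illustration, not bump_cluster's actual code: the name sequential_mc_pvalue, the max_exceed threshold, and the add-one p-value estimate are assumptions chosen for this example. It draws simulated statistics one at a time and stops as soon as enough of them match or exceed the observed value, since the p-value can no longer be small at that point.

```python
import random


def sequential_mc_pvalue(observed, simulate, nsims=20000, max_exceed=20):
    """Estimate a simulated p-value, stopping early when it is clearly high.

    observed   -- the statistic computed on the real data
    simulate   -- zero-argument callable returning one simulated statistic
    nsims      -- maximum number of simulations to run
    max_exceed -- stop once this many simulations reach `observed`
                  (hypothetical threshold chosen for this sketch)
    Returns (p_value, number_of_simulations_run).
    """
    exceed = 0
    n = 0
    for n in range(1, nsims + 1):
        if simulate() >= observed:
            exceed += 1
            if exceed >= max_exceed:
                break  # p is already known to be large; stop simulating
    # Add-one estimate so the p-value is never exactly zero.
    return (exceed + 1) / (n + 1), n


if __name__ == "__main__":
    rng = random.Random(0)
    null_draw = lambda: rng.gauss(0.0, 1.0)
    print(sequential_mc_pvalue(0.1, null_draw))  # stops after a few dozen draws
    print(sequential_mc_pvalue(4.5, null_draw))  # typically runs all nsims draws
```

An unremarkable observed value costs only a handful of simulations, while the full nsims budget is spent only on clusters whose p-value might actually be small.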

crystal.crystal.gee_cluster(formula, cluster, covs, coef, cov_struct=Exchangeable(), family=Gaussian())

An example of a model_fn; any function with a similar signature can be used.

Parameters:

formula : str

R (patsy) style formula. Must contain ‘methylation’: e.g.: methylation ~ age + gender + race

cluster : list of Features

Cluster of features from clustering or from a region. Most functions will create a methylation matrix with: >>> meth = np.array([f.values for f in features])

covs : pandas.DataFrame

Contains covariates from formula

coef : str

coefficient of interest, e.g. ‘age’

cov_struct : object

One of the covariance structures provided by statsmodels. Likely either Exchangeable() or Independence().

family : object

One of the families provided by statsmodels. If Gaussian(), then methylation is assumed to be count-based (clusters of CountFeatures).

Returns:

result : dict

dict containing at least the keys ‘p’ (p-value) and ‘coef’ (coefficient estimate), plus any other information desired.
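To make the model_fn contract concrete, here is a minimal sketch of a custom function with the same (formula, cluster, covs, coef) signature. Everything in it is hypothetical: ToyFeature stands in for crystal's Feature, mean_ols_cluster is an invented name, the formula argument is accepted but ignored, and the p-value uses a normal approximation rather than a proper t distribution. It only shows the shape of the inputs and the required ‘p’ and ‘coef’ keys in the result.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for crystal's Feature; only `values` is used here.
ToyFeature = namedtuple("ToyFeature", ["chrom", "position", "values"])


def mean_ols_cluster(formula, cluster, covs, coef):
    """Toy model_fn: regress the per-sample cluster mean on one covariate.

    `formula` is accepted for signature compatibility but ignored.
    `covs` may be a pandas.DataFrame or a plain dict of columns.
    """
    meth = [f.values for f in cluster]               # rows: sites, cols: samples
    y = [sum(col) / len(col) for col in zip(*meth)]  # mean methylation per sample
    x = [float(v) for v in covs[coef]]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Two-sided p-value from a normal approximation; a t distribution
    # would be more appropriate for small samples.
    p = math.erfc(abs(slope / se) / math.sqrt(2))
    return {"p": p, "coef": slope, "se": se}


if __name__ == "__main__":
    covs = {"age": [20, 30, 40, 50, 60, 70]}
    cluster = [
        ToyFeature("chr1", 100, [0.22, 0.29, 0.41, 0.48, 0.61, 0.69]),
        ToyFeature("chr1", 150, [0.19, 0.32, 0.38, 0.51, 0.59, 0.71]),
    ]
    print(mean_ols_cluster("methylation ~ age", cluster, covs, "age"))
```

Extra keys such as ‘se’ are allowed, since the result only needs to contain at least ‘p’ and ‘coef’.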

crystal.crystal.mixed_model_cluster(formula, cluster, covs, coef)

Model clusters with a mixed-model, same signature as gee_cluster()

crystal.crystal.model_clusters(clust_iter, clin_df, formula, coef, model_fn=gee_cluster, pool=None, transform=None, n_cpu=None, **kwargs)

For each cluster in an iterable, evaluate the chosen model and yield a dictionary of information

Parameters:

clust_iter : iterable

iterable of clusters

clin_df : pandas.DataFrame

Contains covariates from formula

formula : str

R (patsy) style formula. Must contain ‘methylation’: e.g.: methylation ~ age + gender + race

coef : str

The coefficient of interest in the model, e.g. ‘age’

model_fn : fn

A function with signature fn(formula, methylation, covs, coef, kwargs) that returns a dictionary with at least a p-value (‘p’) and a coefficient estimate (‘coef’)

transform : fn

A function that modifies the data before modeling.

n_cpu : int

kwargs : dict

arguments sent to model_fn
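The overall control flow described above (iterate over clusters, apply model_fn, yield one dictionary each) can be sketched as a plain generator. This is a simplified, serial illustration under stated assumptions: model_clusters_sketch and constant_model are hypothetical names, and the pool, n_cpu and transform options are left out because their exact behaviour is not described here.

```python
def model_clusters_sketch(clust_iter, clin_df, formula, coef, model_fn, **kwargs):
    """Lazily yield one result dict per cluster (serial sketch only)."""
    for cluster in clust_iter:
        # Extra keyword arguments are forwarded unchanged to model_fn.
        yield dict(model_fn(formula, cluster, clin_df, coef, **kwargs))


def constant_model(formula, cluster, covs, coef, p=0.5):
    """Hypothetical model_fn that ignores the data; used only for the demo."""
    return {"p": p, "coef": 0.0, "n_features": len(cluster)}


if __name__ == "__main__":
    clusters = [["site1", "site2"], ["site3"]]  # placeholders for Feature lists
    for result in model_clusters_sketch(
        clusters, None, "methylation ~ age", "age", constant_model, p=0.01
    ):
        print(result)
```

Because the function is a generator, nothing is modeled until the caller starts iterating, which keeps memory use flat when clust_iter is itself a lazy stream of clusters.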

crystal.crystal.nb_cluster(formula, cluster, covs, coef)

Model a cluster of correlated features with the negative binomial

crystal.crystal.ols_cluster_robust(formula, cluster, covs, coef)

Model clusters with cluster-robust OLS, same signature as gee_cluster()

crystal.crystal.one_cluster(formula, feature, covs, coef, method=statsmodels.regression.linear_model.OLS, _pat=<_sre.SRE_Pattern object>)

Used when we have a “cluster” with 1 probe.

crystal.crystal.wrapper(model_fn, formula, cluster, clin_df, coef, kwargs=None)

Wrap the user-defined functions to return everything we expect and to call just OLS when there is a single probe.
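The dispatch this describes might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's code: wrapper_sketch, its single_fn parameter, the added ‘n_sites’ field, and the KeyError check are all hypothetical. The single-probe function is called with one feature, matching the (formula, feature, covs, coef) shape documented for one_cluster.

```python
def wrapper_sketch(model_fn, formula, cluster, clin_df, coef,
                   kwargs=None, single_fn=None):
    """Call model_fn on a cluster, falling back to single_fn for one probe."""
    kwargs = kwargs or {}
    if len(cluster) == 1 and single_fn is not None:
        # A one-probe "cluster" has no within-cluster correlation to model.
        result = dict(single_fn(formula, cluster[0], clin_df, coef))
    else:
        result = dict(model_fn(formula, cluster, clin_df, coef, **kwargs))
    missing = {"p", "coef"} - set(result)
    if missing:
        raise KeyError("model result lacks keys: %s" % sorted(missing))
    # Illustrative extra field; the fields crystal actually adds may differ.
    result["n_sites"] = len(cluster)
    return result
```

Validating the returned keys in one place means a user-supplied model_fn that forgets ‘p’ or ‘coef’ fails immediately instead of producing incomplete rows downstream.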

crystal.crystal.zscore_cluster(formula, cluster, covs, coef, method=statsmodels.regression.linear_model.OLS, method_kwargs=None, mid=mean)

Model clusters by fitting model at each site and then combining using the z-score method. Same signature as gee_cluster()
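For readers unfamiliar with z-score combination, the sketch below shows a generic version of Stouffer's method for combining per-site p-values, with an optional correction for correlated sites. It is not taken from zscore_cluster: the function name stouffer_combine, the use of one-sided p-values, and the corr argument are assumptions for illustration, and the role of the mid parameter is not covered.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combine(pvalues, corr=None):
    """Combine one-sided p-values (each strictly between 0 and 1).

    corr, if given, is a square list-of-lists correlation matrix for the
    per-site z-scores; None treats the sites as independent.
    """
    zs = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    k = len(zs)
    if corr is None:
        var_sum = float(k)  # variance of a sum of k independent z-scores
    else:
        # Variance of the sum of correlated standard normals is the sum
        # of all entries in their correlation matrix.
        var_sum = sum(corr[i][j] for i in range(k) for j in range(k))
    z = sum(zs) / math.sqrt(var_sum)
    return 1.0 - _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    print(stouffer_combine([0.05, 0.05]))                    # about 0.01
    print(stouffer_combine([0.05, 0.05], [[1, 1], [1, 1]]))  # about 0.05
```

The second call illustrates why correlation matters: two perfectly correlated sites carry no more evidence than one, so the combined p-value stays at 0.05 instead of shrinking.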