simulate

crystal.simulate contains functions to simulate clusters.

crystal.simulate.rr_cluster(cluster, covs, formula)

Set cluster values to reduced-residuals.

crystal.simulate.simulate_cluster(cluster, w=0, class_order=None, get_reduced_residuals=None, grr_args=())

Modify the data in existing clusters to create or remove an effect.

Parameters:

cluster : list of clusters

should include clusters of length 1.

w : float

w = 0 generates random data. Higher values are more likely to separate the groups.

class_order : np.array of 0/1

list of same length as cluster[*].values indicating which group (0 or 1) each sample belongs to.

get_reduced_residuals : function

optional, see simulate_regions()
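As a minimal sketch of the class_order argument described above, the following builds a 0/1 array from per-sample group labels. The sample labels and the helper name labels_to_class_order are hypothetical and are not part of crystal. The only requirement taken from the text is one 0/1 entry per sample, matching the length of cluster[*].values.

```python
import numpy as np


def labels_to_class_order(labels, case_label):
    """Return a 0/1 integer array: 1 where the label equals case_label.

    Hypothetical helper for illustration; not part of crystal.
    """
    labels = np.asarray(labels)
    return (labels == case_label).astype(int)


# Hypothetical group labels, one per sample.
sample_labels = ["control", "case", "case", "control", "case", "control"]
class_order = labels_to_class_order(sample_labels, "case")
```

The resulting array could then be passed as class_order, provided its length equals the number of samples in each cluster.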

crystal.simulate.simulate_regions(clust_list, region_fh, sizes={1: 400, 2: 400, 3: 200, 4: 100, 6: 100, 7: 80, 8: 60, 9: 40, 10: 10}, class_order=None, seed=42, get_reduced_residuals=None, get_reduced_residuals_args=())

Simulate regions and randomize others.

Parameters:

clust_list : list of clusters

should include clusters of length 1.

region_fh : filehandle

a BED file of all positions will be written to this file handle. The fourth column is true/false, indicating whether the position was simulated to have a difference. The fifth column gives the size of the cluster the position was in.

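sizes : dict

keys are cluster sizes and values are the number of clusters of that size to create. The default, as given in the signature, is 400 clusters each of sizes 1 and 2, 200 of size 3, 100 each of sizes 4 and 6, 80 of size 7, 60 of size 8, 40 of size 9 and 10 of size 10. All other clusters are randomized.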

class_order : np.array

same length as cluster[i].values, indicating which group each sample belongs to.

seed : int

get_reduced_residuals : function

A function that accepts a cluster and returns the residuals of the reduced model. If this parameter is None, the values are shuffled as they are received. For example, if the full model of interest is:

methylation ~ disease + age + gender

the reduced model would be:

methylation ~ age + gender

so that only the residuals of the reduced model are shuffled and the other effects remain. This implements the bootstrap for linear models from Efron and Tibshirani. An example function is rr_cluster().
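The idea of shuffling only the reduced-model residuals can be sketched with plain NumPy. This is an illustration under stated assumptions, not crystal's implementation. The function name shuffle_reduced_residuals, the toy age and gender covariates, and the use of an ordinary least-squares fit are all assumptions made for the example.

```python
import numpy as np


def shuffle_reduced_residuals(y, covariates, rng):
    """Shuffle the residuals of the reduced model y ~ covariates.

    Hypothetical helper for illustration; not part of crystal.
    y          : 1-D array of observed values (e.g. methylation).
    covariates : 2-D array, one column per reduced-model term.
    rng        : numpy.random.Generator used for the permutation.
    """
    y = np.asarray(y, dtype=float)
    # Design matrix of the reduced model, with an intercept column.
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    # Ordinary least-squares fit of the reduced model.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    residuals = y - fitted
    # Permute only the residuals; the fitted covariate effects stay in place.
    simulated = fitted + rng.permutation(residuals)
    return simulated, fitted, residuals


rng = np.random.default_rng(42)
# Toy data (assumed): eight samples with age and gender covariates.
age = np.array([30.0, 45.0, 52.0, 61.0, 38.0, 47.0, 55.0, 66.0])
gender = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
methylation = 0.2 + 0.005 * age + 0.05 * gender + rng.normal(0, 0.02, size=8)

simulated, fitted, residuals = shuffle_reduced_residuals(
    methylation, np.column_stack([age, gender]), rng
)
```

Because the fitted values are added back after the permutation, the age and gender effects are preserved, while any association between the residuals and the variable of interest (disease in the example above) is broken.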

Returns:

generator of clusters in the same order as clust_list.
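The BED file written to region_fh can be read back to recover which positions were simulated to have a difference. The sketch below assumes tab-separated lines with the columns chrom, start, end, a fourth column spelled "true" or "false", and an integer cluster size in the fifth column. The exact spelling of the fourth column is not stated above, so treat that as an assumption. The function name read_simulated_bed and the example records are hypothetical.

```python
import io


def read_simulated_bed(handle):
    """Yield (chrom, start, end, simulated, cluster_size) for each BED line.

    Hypothetical helper for illustration; assumes the fourth column is
    the text "true" or "false" (case-insensitive).
    """
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        chrom, start, end, truth, size = line.split("\t")[:5]
        yield chrom, int(start), int(end), truth.lower() == "true", int(size)


# Made-up records standing in for the file written to region_fh.
example = io.StringIO(
    "chr1\t100\t101\ttrue\t3\n"
    "chr1\t150\t151\ttrue\t3\n"
    "chr2\t900\t901\tfalse\t1\n"
)
records = list(read_simulated_bed(example))
n_simulated = sum(1 for record in records if record[3])
```

The parsed truth column can serve as the ground truth when evaluating a method on the clusters returned by the generator.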