hidimstat.clustered_inference

hidimstat.clustered_inference(X_init, y, ward, n_clusters, scaler_sampling=None, train_size=1.0, groups=None, seed=0, n_jobs=1, memory=None, verbose=1, **kwargs)

Clustered inference algorithm for statistical analysis of high-dimensional data.

This algorithm implements the method described in [Chevalier et al., 2022] for performing statistical inference on high-dimensional linear models using feature clustering to reduce dimensionality.

Parameters:
X_init : ndarray, shape (n_samples, n_features)

Original high-dimensional input data matrix.

y : ndarray, shape (n_samples,) or (n_samples, n_times)

Target variable(s). Can be univariate or multivariate (temporal) data.

ward : sklearn.cluster.FeatureAgglomeration

Hierarchical clustering object that implements Ward’s method for feature agglomeration.

n_clusters : int

Number of clusters to use for dimensionality reduction.

scaler_sampling : sklearn.preprocessing object, optional (default=None)

Scaler to standardize the clustered features.

train_size : float, optional (default=1.0)

Fraction of samples to use for computing the clustering. When train_size=1.0, all samples are used.

groups : ndarray, shape (n_samples,), optional (default=None)

Sample group labels for stratified subsampling.

seed : int, optional (default=0)

Random seed for reproducible subsampling.

n_jobs : int, optional (default=1)

Number of parallel jobs for computation.

memory : str or joblib.Memory object, optional (default=None)

Used to cache the results of the clustering and inference computations. By default, no caching is done. If a string is given, it is the path to the caching directory.
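As an illustration of the two accepted forms of `memory`, the sketch below prepares both a directory path and a `joblib.Memory` object, then shows what caching means on a stand-in function. The cache directory and `slow_square` are hypothetical names introduced here; they are not part of hidimstat.

```python
import tempfile

from joblib import Memory

# Hypothetical cache directory; any writable path would do.
cache_dir = tempfile.mkdtemp(prefix="hidimstat_cache_")

# Both forms match the documented type of `memory`.
memory_as_path = cache_dir
memory_as_object = Memory(location=cache_dir, verbose=0)


def slow_square(x):
    # Stand-in for an expensive computation such as a clustering step.
    return x * x


# Wrap the function so that results are stored on disk under cache_dir.
cached_square = memory_as_object.cache(slow_square)

first = cached_square(4)   # computed and written to the cache
second = cached_square(4)  # same arguments, so the stored result is reused
```

Either `memory_as_path` or `memory_as_object` could then be passed as the `memory` argument.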

verbose : int, optional (default=1)

Verbosity level for progress messages.

**kwargs : dict

Additional arguments passed to the statistical inference function.
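A minimal sketch of how the parameters above might be assembled into a call, assuming a small simulated 2D problem. The helper names `make_inputs` and `run_clustered_inference`, the grid shape, and the simulated data are assumptions made for illustration; only the call signature and the four return values are taken from this page.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph


def make_inputs(n_samples=50, shape=(10, 10), n_clusters=20, seed=0):
    """Simulate X, y and an unfitted Ward clustering object (illustrative)."""
    rng = np.random.default_rng(seed)
    n_features = shape[0] * shape[1]
    X = rng.standard_normal((n_samples, n_features))
    beta = np.zeros(n_features)
    beta[:10] = 1.0  # the first 10 features (one row of the grid) carry signal
    y = X @ beta + 0.5 * rng.standard_normal(n_samples)
    # Spatial connectivity, so that only neighbouring pixels can be merged.
    connectivity = grid_to_graph(*shape)
    ward = FeatureAgglomeration(
        n_clusters=n_clusters, connectivity=connectivity, linkage="ward"
    )
    return X, y, ward


def run_clustered_inference(n_clusters=20):
    # Imported lazily so that make_inputs can be tried without hidimstat.
    from hidimstat import clustered_inference

    X, y, ward = make_inputs(n_clusters=n_clusters)
    # Argument order and return values follow the signature documented here.
    ward_, beta_hat, theta_hat, precision_diag = clustered_inference(
        X, y, ward, n_clusters, train_size=0.7, seed=0, n_jobs=1, verbose=0
    )
    return ward_, beta_hat, theta_hat, precision_diag
```

Passing a connectivity graph to `FeatureAgglomeration` is a design choice for image-like data; it is not required by the signature.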

Returns:
ward_ : FeatureAgglomeration

Fitted clustering object.

beta_hat : ndarray, shape (n_clusters,) or (n_clusters, n_times)

Estimated coefficients at cluster level.

theta_hat : ndarray

Estimated precision matrix.

precision_diag : ndarray

Diagonal of the precision matrix.

Notes

The algorithm follows these main steps:

1. Subsample the data (if train_size < 1)
2. Cluster features using Ward hierarchical clustering
3. Transform data to cluster space
4. Perform statistical inference using desparsified lasso
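The first three steps can be sketched with scikit-learn alone. This is an illustration under stated assumptions, not hidimstat's implementation: the subsampling via `sklearn.utils.resample`, the `StandardScaler`, the grid shape, and the random data are all choices made here for the example.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

rng = np.random.default_rng(0)
n_samples, shape, n_clusters = 60, (8, 8), 16
X_init = rng.standard_normal((n_samples, shape[0] * shape[1]))

# Step 1: subsample the rows used to learn the clustering (train_size < 1).
train_size = 0.7
train_index = resample(
    np.arange(n_samples),
    n_samples=int(round(n_samples * train_size)),
    replace=False,
    random_state=0,
)

# Step 2: Ward clustering of the features, fitted on the subsample only.
ward = FeatureAgglomeration(
    n_clusters=n_clusters,
    connectivity=grid_to_graph(*shape),
    linkage="ward",
)
ward.fit(X_init[train_index])

# Step 3: map every sample to cluster space (mean of each cluster's
# features), then standardize the clustered features.
X_reduced = ward.transform(X_init)
X_reduced = StandardScaler().fit_transform(X_reduced)

# Step 4 (desparsified lasso on X_reduced and y) is left to hidimstat.
print(X_reduced.shape)  # (60, 16)
```

Fitting the clustering on a subsample while transforming all samples mirrors the description of `train_size` as the fraction of samples used to compute the clustering.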

References

Examples using hidimstat.clustered_inference

Support recovery on fMRI data

Support recovery on simulated data (2D)