hidimstat.ensemble_clustered_inference

hidimstat.ensemble_clustered_inference(X_init, y, ward, n_clusters, scaler_sampling=None, train_size=0.3, groups=None, seed=0, n_bootstraps=25, n_jobs=None, verbose=1, memory=None, **kwargs)

Ensemble clustered inference algorithm for high-dimensional statistical inference, as described in [Chevalier et al., 2022].

This algorithm combines multiple runs of clustered inference with different random subsamples to provide more robust statistical estimates. It uses the desparsified lasso method for inference.

Parameters:
X_init : ndarray, shape (n_samples, n_features)

Original high-dimensional input data matrix.

y : ndarray, shape (n_samples,) or (n_samples, n_times)

Target variable(s). Can be univariate or multivariate (temporal) data.

ward : sklearn.cluster.FeatureAgglomeration

Feature agglomeration object implementing Ward hierarchical clustering.
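A minimal sketch of how the `ward` argument might be built with scikit-learn. The toy 12 x 12 feature grid, the random data, and the spatial connectivity constraint are assumptions made for illustration; the description above only requires a `FeatureAgglomeration` object that implements Ward clustering.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph

# Assumed toy setting: features laid out on a 12 x 12 grid (144 features).
n_samples, shape = 50, (12, 12)
n_features = shape[0] * shape[1]
n_clusters = 20

rng = np.random.default_rng(0)
X_init = rng.standard_normal((n_samples, n_features))

# Optional: only allow neighbouring grid cells to be merged.
connectivity = grid_to_graph(n_x=shape[0], n_y=shape[1])

ward = FeatureAgglomeration(
    n_clusters=n_clusters, connectivity=connectivity, linkage="ward"
)

# Fitting here is only a sanity check of the configuration.
ward.fit(X_init)
X_reduced = ward.transform(X_init)  # one averaged column per cluster
```

The connectivity matrix is optional. It is typically used when features have a spatial layout, so that each cluster is a contiguous region.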

n_clusters : int

Number of clusters for dimensionality reduction.

scaler_sampling : sklearn.preprocessing object, optional (default=None)

Scaler to standardize the clustered features.

train_size : float, optional (default=0.3)

Fraction of samples used for clustering. Using train_size < 1 enables random subsampling for better generalization.

groups : ndarray, shape (n_samples,), optional (default=None)

Sample group labels for stratified subsampling. Ensures balanced representation of groups in subsamples.
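To illustrate how `train_size` and `groups` could interact, here is a sketch of group-wise subsampling written with NumPy only. The helper `subsample_indices` is hypothetical and is not part of hidimstat; the library's actual sampling scheme may differ in its details.

```python
import numpy as np

def subsample_indices(n_samples, train_size=0.3, groups=None, seed=0):
    """Hypothetical helper: sorted indices of a subsample drawn without replacement."""
    rng = np.random.default_rng(seed)
    if groups is None:
        n_train = int(n_samples * train_size)
        return np.sort(rng.choice(n_samples, size=n_train, replace=False))
    groups = np.asarray(groups)
    picked = []
    for label in np.unique(groups):
        # Draw the same fraction inside every group to keep groups balanced.
        members = np.flatnonzero(groups == label)
        n_train = int(len(members) * train_size)
        picked.append(rng.choice(members, size=n_train, replace=False))
    return np.sort(np.concatenate(picked))

# Three groups of 40, 40 and 20 samples (made-up sizes).
groups = np.repeat([0, 1, 2], [40, 40, 20])
idx = subsample_indices(len(groups), train_size=0.5, groups=groups, seed=0)
counts = np.bincount(groups[idx])  # samples kept per group: 20, 20, 10
```

Each group contributes the same fraction of its samples, so the subsample keeps the group proportions of the full data set.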

inference_method : str, optional (default='desparsified-lasso')

Method used for inference. Currently, the two available methods are 'desparsified-lasso' and 'group-desparsified-lasso'. Use 'desparsified-lasso' for non-temporal data and 'group-desparsified-lasso' for temporal data.

seed : int, optional (default=0)

Seed used for generating the first random subsample of the data. This seed controls the clustering randomness.

ensembling_method : str, optional (default='quantiles')

Method used for ensembling. Currently, the two available methods are 'quantiles' and 'median'.

gamma_min : float, optional (default=0.2)

Lowest gamma-quantile considered to compute the adaptive quantile aggregation formula. This parameter is used only if ensembling_method is 'quantiles'.
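The role of `gamma_min` can be seen in a common form of adaptive quantile aggregation, sketched below with NumPy. The function name `quantile_aggregation`, the grid of 50 gamma values, and the toy p-values are assumptions for illustration; this is not taken from the hidimstat source and the library's implementation may differ.

```python
import numpy as np

def quantile_aggregation(pvals, gamma_min=0.2, n_gammas=50):
    """Aggregate p-values of shape (n_iterations, n_features) into (n_features,).

    Hypothetical sketch: for each gamma in [gamma_min, 1], take the
    gamma-quantile of pvals / gamma, keep the smallest value over gamma,
    and correct for the search with the factor (1 - log(gamma_min)).
    """
    pvals = np.asarray(pvals, dtype=float)
    gammas = np.linspace(gamma_min, 1.0, n_gammas)  # assumed grid of gammas
    per_gamma = np.stack(
        [np.minimum(1.0, np.quantile(pvals / g, g, axis=0)) for g in gammas]
    )
    factor = 1.0 - np.log(gamma_min)
    return np.minimum(1.0, factor * per_gamma.min(axis=0))

# Made-up p-values from 25 iterations: one strong feature, one null feature.
pvals = np.column_stack([np.full(25, 1e-6), np.full(25, 1.0)])
aggregated = quantile_aggregation(pvals, gamma_min=0.2)
```

A smaller `gamma_min` lets the aggregation look at lower quantiles, at the price of a larger correction factor.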

n_bootstraps : int, optional (default=25)

Number of bootstrap iterations for ensemble inference.

n_jobs : int or None, optional (default=None)

Number of parallel jobs. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.

verbose : int, optional (default=1)

The verbosity level. If verbose > 0, a message is printed before running the clustered inference.

memory : joblib.Memory or str, optional (default=None)

Used to cache the output of the clustering and inference computation. By default, no caching is done. If provided, it should be the path to the caching directory or a joblib.Memory object.

**kwargs : dict

Additional keyword arguments passed to statistical inference functions.

Returns:
list_ward : list of FeatureAgglomeration

List of fitted clustering objects from each bootstrap.

list_beta_hat : list of ndarray

List of estimated coefficients from each bootstrap.

pval : ndarray, shape (n_features,)

p-value, with numerically accurate values for positive effects (i.e., for p-values close to zero).

list_theta_hat : list of ndarray

List of estimated precision matrices.

list_precision_diag : list of ndarray

List of diagonal elements of covariance matrices.

one_minus_pval : ndarray, shape (n_features,)

One minus the p-value, with numerically accurate values for negative effects (i.e., for p-values close to one).
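The short sketch below shows why both `pval` and `one_minus_pval` are useful: in double precision a p-value very close to one is indistinguishable from 1.0, so small values are only preserved in the array where they sit near zero. The arrays and the two-sided rule with threshold `alpha / 2` are made-up illustrations of one common choice, not hidimstat output or a prescribed procedure.

```python
import numpy as np

# In float64, subtracting a tiny number from 1.0 loses it entirely.
print(1.0 - 1e-20 == 1.0)  # True

# Made-up values mimicking the two returned arrays for three features.
pval = np.array([1e-20, 0.40, 1.0])
one_minus_pval = np.array([1.0, 0.60, 1e-20])

alpha = 0.05
positive = pval < alpha / 2            # evidence for a positive effect
negative = one_minus_pval < alpha / 2  # evidence for a negative effect
selected = positive | negative
```

Here the first and third features are selected, one for each sign of effect, while the second is not.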

Notes

The algorithm performs steps 1 to 4 for each bootstrap iteration, then step 5 once:

1. Subsample the data, using stratified sampling if groups are provided
2. Cluster features using Ward's hierarchical clustering
3. Transform data to reduced cluster space
4. Perform statistical inference using desparsified lasso
5. Aggregate results across all iterations
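The first three steps can be sketched with NumPy and scikit-learn as follows. This is only an illustration under stated assumptions: the data are random, one seed per iteration is assumed (`seed + i`), the clustering is assumed to be fitted on the subsample and then applied to all samples, and the inference and aggregation steps are left out.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features, n_clusters = 60, 100, 10
X_init = rng.standard_normal((n_samples, n_features))  # made-up data

n_bootstraps, train_size, seed = 3, 0.5, 0
list_labels, list_X_reduced = [], []
for i in range(n_bootstraps):
    # Step 1: draw a random subsample (assumed: a different seed per iteration).
    rng_i = np.random.default_rng(seed + i)
    n_train = int(n_samples * train_size)
    train = rng_i.choice(n_samples, size=n_train, replace=False)

    # Step 2: Ward clustering of the features, fitted on the subsample only.
    ward = FeatureAgglomeration(n_clusters=n_clusters, linkage="ward")
    ward.fit(X_init[train])

    # Step 3: average the features of each cluster, then standardize.
    X_reduced = StandardScaler().fit_transform(ward.transform(X_init))

    list_labels.append(ward.labels_)
    list_X_reduced.append(X_reduced)
    # Step 4 (desparsified lasso on X_reduced) is not shown here.
```

Because each iteration sees a different subsample, the cluster assignments in `list_labels` generally differ between iterations, which is what the final aggregation step averages over.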

References

Examples using hidimstat.ensemble_clustered_inference

Support recovery on fMRI data

Support recovery on simulated data (2D)
