hidimstat.clustered_inference
- hidimstat.clustered_inference(X_init, y, ward, n_clusters, scaler_sampling=None, train_size=1.0, groups=None, seed=0, n_jobs=1, memory=None, verbose=1, **kwargs)
Clustered inference algorithm for statistical analysis of high-dimensional data.
This algorithm implements the method described in [Chevalier et al., 2022] for performing statistical inference on high-dimensional linear models using feature clustering to reduce dimensionality.
- Parameters:
- X_init : ndarray, shape (n_samples, n_features)
Original high-dimensional input data matrix.
- y : ndarray, shape (n_samples,) or (n_samples, n_times)
Target variable(s). Can be univariate or multivariate (temporal) data.
- ward : sklearn.cluster.FeatureAgglomeration
Hierarchical clustering object that implements Ward’s method for feature agglomeration.
- n_clusters : int
Number of clusters to use for dimensionality reduction.
- scaler_sampling : sklearn.preprocessing object, optional (default=None)
Scaler to standardize the clustered features.
- train_size : float, optional (default=1.0)
Fraction of samples to use for computing the clustering. When train_size=1.0, all samples are used.
- groups : ndarray, shape (n_samples,), optional (default=None)
Sample group labels for stratified subsampling.
- seed : int, optional (default=0)
Random seed for reproducible subsampling.
- n_jobs : int, optional (default=1)
Number of parallel jobs for computation.
- memory : str or joblib.Memory object, optional (default=None)
Caches the output of the clustering and inference computations. By default, no caching is done. If a string is given, it is the path to the caching directory.
- verbose : int, optional (default=1)
Verbosity level for progress messages.
- **kwargs : dict
Additional arguments passed to the statistical inference function.
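As an illustration of how the inputs listed above might be prepared, the following sketch builds toy data, a `FeatureAgglomeration` object with a spatial connectivity constraint, a scaler, and a `joblib.Memory` cache. The data, the 12 x 12 image layout, and the choice of 20 clusters are assumptions made for the example. The call to `clustered_inference` is left commented out so that the block runs without hidimstat installed.

```python
import tempfile

import numpy as np
from joblib import Memory
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy data (assumption): 50 samples of a 12 x 12 image, flattened to 144 features.
n_samples, side = 50, 12
n_features = side * side
X_init = rng.standard_normal((n_samples, n_features))

# Sparse ground-truth weights, used only to simulate a target.
beta = np.zeros(n_features)
beta[:10] = 1.0
y = X_init @ beta + 0.5 * rng.standard_normal(n_samples)

# Spatial connectivity so that only neighbouring pixels can be merged.
connectivity = grid_to_graph(n_x=side, n_y=side)

n_clusters = 20
ward = FeatureAgglomeration(
    n_clusters=n_clusters, connectivity=connectivity, linkage="ward"
)

# Optional pieces matching the scaler_sampling and memory parameters.
scaler_sampling = StandardScaler()
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

# With hidimstat installed, the call would follow the signature documented above:
# from hidimstat import clustered_inference
# ward_, beta_hat, theta_hat, precision_diag = clustered_inference(
#     X_init, y, ward, n_clusters, scaler_sampling=scaler_sampling, memory=memory
# )
```

The connectivity graph is optional for Ward clustering in general; it is included here because spatially structured data is a common reason to cluster features before inference.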
- Returns:
- ward_ : FeatureAgglomeration
Fitted clustering object.
- beta_hat : ndarray, shape (n_clusters,) or (n_clusters, n_times)
Estimated coefficients at cluster level.
- theta_hat : ndarray
Estimated precision matrix.
- precision_diag : ndarray
Diagonal of the covariance matrix.
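Because `beta_hat` is estimated at cluster level, it has one entry per cluster rather than one per original feature. The sketch below shows one way to broadcast cluster-level values back to feature space with the fitted clustering object. The `beta_hat` array here is a made-up stand-in, not an output of `clustered_inference`, and any rescaling that hidimstat may apply when mapping results back to features is not reproduced.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 30))

n_clusters = 5
ward_ = FeatureAgglomeration(n_clusters=n_clusters, linkage="ward").fit(X)

# Stand-in (assumption) for a cluster-level estimate of shape (n_clusters,).
beta_hat = np.arange(n_clusters, dtype=float)

# labels_[j] is the cluster index of feature j, so fancy indexing copies
# each cluster coefficient to all of its member features.
beta_features = beta_hat[ward_.labels_]

# inverse_transform performs the same broadcast on a 2D array.
beta_features_alt = ward_.inverse_transform(beta_hat.reshape(1, -1)).ravel()
```

Both routes give an array of length `n_features` in which features from the same cluster share one value.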
Notes
The algorithm follows these main steps:
1. Subsample the data (if train_size < 1)
2. Cluster features using Ward hierarchical clustering
3. Transform data to cluster space
4. Perform statistical inference using desparsified lasso
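The first three steps can be sketched with scikit-learn alone. This is a minimal illustration under stated assumptions, not hidimstat's implementation: the toy data, the plain random subsampling (no `groups`), and the use of `StandardScaler` on the reduced data are all choices made for the example. Step 4, the desparsified lasso, is deliberately not reproduced.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features, n_clusters = 60, 40, 8
X_init = rng.standard_normal((n_samples, n_features))

# Step 1: subsample rows without replacement (train_size < 1).
train_size = 0.75
n_train = int(n_samples * train_size)
train_index = rng.choice(n_samples, size=n_train, replace=False)

# Step 2: fit the Ward clustering of the features on the subsample only.
ward = FeatureAgglomeration(n_clusters=n_clusters, linkage="ward")
ward.fit(X_init[train_index])

# Step 3: map all samples to cluster space (default pooling is the mean
# over the features of each cluster), then standardize the reduced data.
X_reduced = ward.transform(X_init)
X_reduced = StandardScaler().fit_transform(X_reduced)

# Step 4 (not shown): inference on (X_reduced, y) with the desparsified lasso.
```

Fitting the clustering on a subsample while transforming all samples mirrors the description of `train_size` as the fraction of samples used for computing the clustering.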
References