hidimstat.model_x_knockoff
- hidimstat.model_x_knockoff(X, y, estimator=LassoCV(cv=KFold(n_splits=5, random_state=0, shuffle=True), max_iter=200000, tol=1e-06, verbose=0), preconfigure_estimator=<function preconfigure_estimator_LassoCV>, fdr=0.1, centered=True, cov_estimator=LedoitWolf(assume_centered=True), joblib_verbose=0, n_bootstraps=1, n_jobs=1, random_state=None, tol_gauss=1e-14, memory=None)
Model-X Knockoff
This function implements the Model-X knockoff inference procedure, an approach to controlling the False Discovery Rate (FDR) based on Candes et al. [1]. The original implementation can be found at msesia/knockoff-filter. The knockoff (noise) variables are generated as second-order Gaussian knockoffs using the equi-correlated method.
In addition, this function generates multiple sets of Gaussian knockoff variables and calculates the test statistics for each set. It then aggregates the test statistics across the sets to improve stability and power.
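The second-order, equi-correlated construction mentioned above can be sketched in plain NumPy. This is an illustration under stated assumptions, not hidimstat's code: the function name `sample_equi_knockoffs` is hypothetical, the covariance is the plain empirical estimate rather than the documented LedoitWolf default, and the square root of the conditional covariance is taken by eigendecomposition instead of the Cholesky factorization that `tol_gauss` refers to.

```python
import numpy as np


def sample_equi_knockoffs(X, rng):
    """Sample second-order Gaussian knockoffs (hypothetical helper, not hidimstat's)."""
    n_samples, n_features = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)  # assumption: plain empirical covariance

    # Equi-correlated choice: s_j = min(2 * lambda_min(corr), 1) on the
    # correlation scale, mapped back to the covariance scale.
    std = np.sqrt(np.diag(sigma))
    corr = sigma / np.outer(std, std)
    lambda_min = np.linalg.eigvalsh(corr)[0]
    s = min(2.0 * lambda_min, 1.0) * std**2
    D = np.diag(s)

    # Conditional law of the knockoffs given X:
    #   mean = X - (X - mu) Sigma^{-1} D,  cov = 2 D - D Sigma^{-1} D
    sigma_inv_D = np.linalg.solve(sigma, D)
    mu_tilde = X - (X - mu) @ sigma_inv_D
    sigma_tilde = 2.0 * D - D @ sigma_inv_D

    # Symmetric square root, clipping tiny negative eigenvalues that come
    # from rounding (the conditional covariance can be nearly singular).
    eigvals, eigvecs = np.linalg.eigh((sigma_tilde + sigma_tilde.T) / 2.0)
    root = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    noise = rng.standard_normal((n_samples, n_features))
    return mu_tilde + noise @ root.T


rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 4))
X_tilde = sample_equi_knockoffs(X, rng)
```

By construction the knockoffs have approximately the same covariance as `X`, while each knockoff column is only weakly correlated with the original column it mimics.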
- Parameters:
- X : 2D ndarray (n_samples, n_features)
The design matrix.
- y : 1D ndarray (n_samples,)
The target vector.
- estimator : sklearn estimator instance or a cross validation instance, optional
The estimator used for fitting the data and computing the test statistics. This can be any estimator with a fit method that accepts a 2D array and a 1D array, and a coef_ attribute that returns a 1D array of coefficients. Examples include LassoCV, LogisticRegressionCV, and LinearRegression.
Configuration example:
LassoCV(alphas=alphas, n_jobs=None, verbose=0, max_iter=1000, cv=KFold(n_splits=5, shuffle=True, random_state=0), tol=1e-8)
LogisticRegressionCV(penalty="l1", max_iter=1000, solver="liblinear", cv=KFold(n_splits=5, shuffle=True, random_state=0), n_jobs=None, tol=1e-8)
LogisticRegressionCV(penalty="l2", max_iter=1000, n_jobs=None, verbose=0, cv=KFold(n_splits=5, shuffle=True, random_state=0), tol=1e-8)
- preconfigure_estimator : callable, default=preconfigure_estimator_LassoCV
A function that configures the estimator for the Model-X knockoff procedure. If provided, this function will be called with the estimator, X, X_tilde, and y as arguments, and should modify the estimator in-place.
- fdr : float, default=0.1
The desired controlled False Discovery Rate (FDR) level.
- centered : bool, default=True
Whether to standardize the data before performing the inference procedure.
- cov_estimator : estimator object, default=LedoitWolf(assume_centered=True)
Estimator for the empirical covariance matrix.
- joblib_verbose : int, default=0
Verbosity level for parallel jobs.
- n_bootstraps : int, default=1
Number of bootstrap samples for aggregation.
- n_jobs : int, default=1
Number of parallel jobs.
- random_state : int or None, default=None
The random seed used to generate the Gaussian knockoff variables.
- tol_gauss : float, default=1e-14
Tolerance used for numerical stability when computing the Cholesky decomposition in the Gaussian knockoff generation function.
- memory : str or Memory object, default=None
Used to cache the output of the clustering and inference computation. By default, no caching is done. If provided, it should be the path to the caching directory or a joblib.Memory object.
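As an illustration of the `preconfigure_estimator` contract described above (called with the estimator, X, X_tilde, and y, and expected to modify the estimator in place), here is a minimal sketch of a custom callback. The name `preconfigure_alpha_grid`, its `n_alphas` argument, and the grid it builds are assumptions made for this example; they are not the behaviour of `preconfigure_estimator_LassoCV`. A `SimpleNamespace` stands in for a real estimator so the snippet runs with NumPy alone.

```python
from types import SimpleNamespace

import numpy as np


def preconfigure_alpha_grid(estimator, X, X_tilde, y, n_alphas=20):
    """Hypothetical callback: set `estimator.alphas` in place."""
    # The estimator is later fitted on the originals and knockoffs side by side.
    X_aug = np.column_stack([X, X_tilde])
    n_samples = X_aug.shape[0]
    # Assumption: use max |X_aug^T (y - mean(y))| / n as the top of the
    # penalty grid, a common upper end for a Lasso regularization path.
    alpha_max = np.max(np.abs(X_aug.T @ (y - y.mean()))) / n_samples
    estimator.alphas = np.geomspace(alpha_max * 1e-3, alpha_max, n_alphas)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X_tilde = rng.standard_normal((50, 3))  # placeholder knockoffs for the demo
y = X[:, 0] + 0.1 * rng.standard_normal(50)

stand_in = SimpleNamespace(alphas=None)  # stand-in for e.g. a LassoCV instance
preconfigure_alpha_grid(stand_in, X, X_tilde, y)
```

Going by the signature at the top of this page, such a callback would be passed as `preconfigure_estimator=preconfigure_alpha_grid`, together with an estimator that exposes an `alphas` attribute.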
- Returns:
- selected : ndarray or list of ndarrays
Selected feature indices. List if n_bootstraps>1.
- test_scores : ndarray or list of ndarrays
Test statistics. List if n_bootstraps>1.
- threshold : float or list of floats
Knockoff thresholds. List if n_bootstraps>1.
- X_tildes : ndarray or list of ndarrays
Generated knockoff variables. List if n_bootstraps>1.
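To make the relationship between `test_scores`, `threshold`, and `selected` concrete, the sketch below computes a Lasso coefficient-difference statistic and the knockoff+ threshold from the knockoff literature. Both helper names are hypothetical, and this page does not confirm that hidimstat uses exactly this statistic or an offset of 1; treat it as a reading aid rather than the library's implementation.

```python
import numpy as np


def lasso_coef_difference(coef):
    """W_j = |coef_j| - |coef_{j+p}| for coefficients fitted on [X, X_tilde]."""
    p = coef.size // 2
    return np.abs(coef[:p]) - np.abs(coef[p:])


def knockoff_threshold(test_scores, fdr=0.1, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= fdr."""
    candidates = np.sort(np.abs(test_scores[test_scores != 0]))
    for t in candidates:
        estimated_false = offset + np.sum(test_scores <= -t)
        ratio = estimated_false / max(np.sum(test_scores >= t), 1)
        if ratio <= fdr:
            return t
    return np.inf  # no threshold reaches the target level: select nothing


# Toy statistics: large positive values suggest a relevant feature.
W = np.array([3.0, 2.5, 2.0, -0.5, 0.4, 1.5, 0.0, -0.2])
threshold = knockoff_threshold(W, fdr=0.5)
selected = np.flatnonzero(W >= threshold)
```

With `offset=1` and a small target such as `fdr=0.1`, at least ten statistics must exceed the threshold before anything is selected, which is why the toy example uses a looser level.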
References