This function implements derandomized knockoffs and returns a rejection set.

base_filter(
  X,
  y,
  v0 = 1,
  M = 30,
  tau = 0.5,
  knockoff_method = "gaussian",
  knockoff_stat = stat.glmnet_coefdiff,
  seed = 24601,
  mu = NULL,
  Sigma = NULL,
  pInit = NULL,
  Q = NULL,
  pEmit = NULL
)

Arguments

X

an n-by-p matrix of covariates.

y

the response vector of length n (continuous or binary).

v0

a positive number specifying the parameter of the base procedure (default: 1).

M

an integer specifying the number of knockoff copies computed (default: 30).

tau

a number between 0 and 1 specifying the selection-frequency threshold (default: 0.5).

knockoff_method

either "gaussian" or "hmm" (default: "gaussian").

knockoff_stat

a feature importance statistic from the knockoff package (e.g., stat.glmnet_coefdiff).

seed

an integer specifying the random seed used in the procedure (default: 24601).

mu

a length-p mean vector of X if it follows a Gaussian distribution.

Sigma

a p-by-p covariance matrix of X if it follows a Gaussian distribution.

pInit

an array of length K, containing the marginal distribution of the states for the first variable, if X is sampled from an HMM.

Q

an array of size (p-1,K,K), containing a list of p-1 transition matrices between the K states of the Markov chain, if X is sampled from an HMM.

pEmit

an array of size (p,M,K), containing the emission probabilities for each of the M possible emission states, from each of the K hidden states and the p variables, if X is sampled from an HMM. Here M denotes the number of emission states, not the argument M above.
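To make the HMM argument shapes concrete, the sketch below builds a small set of parameters with the dimensions listed above and checks them. The helper `check_hmm_params` is hypothetical (it is not part of this package or of knockoff), and it assumes that Q[j, k, ] is the row of transition probabilities out of state k at variable j, and that pEmit[j, , k] is a distribution over emission states. The number of emission states is called `nEmit` here to avoid confusion with the argument M.

```r
# Hypothetical helper: checks the shapes described above and that every
# probability vector sums to 1 (assumed conventions, see lead-in).
check_hmm_params <- function(pInit, Q, pEmit, p, tol = 1e-8) {
  K <- length(pInit)
  # pInit: marginal distribution of the K states for the first variable
  stopifnot(abs(sum(pInit) - 1) < tol)
  # Q: p-1 transition matrices, each K-by-K, rows summing to 1
  stopifnot(length(dim(Q)) == 3, all(dim(Q) == c(p - 1, K, K)))
  stopifnot(all(abs(apply(Q, c(1, 2), sum) - 1) < tol))
  # pEmit: for each variable j and hidden state k, a distribution over emissions
  stopifnot(length(dim(pEmit)) == 3, dim(pEmit)[1] == p, dim(pEmit)[3] == K)
  stopifnot(all(abs(apply(pEmit, c(1, 3), sum) - 1) < tol))
  invisible(TRUE)
}

# Toy parameters: p = 5 variables, K = 2 hidden states, 3 emission states
p <- 5; K <- 2; nEmit <- 3
pInit <- c(0.5, 0.5)
Q <- array(0, dim = c(p - 1, K, K))
for (j in 1:(p - 1)) Q[j, , ] <- matrix(c(0.9, 0.1, 0.1, 0.9), K, K, byrow = TRUE)
pEmit <- array(0, dim = c(p, nEmit, K))
for (j in 1:p) {
  pEmit[j, , 1] <- c(0.7, 0.2, 0.1)
  pEmit[j, , 2] <- c(0.1, 0.2, 0.7)
}
check_hmm_params(pInit, Q, pEmit, p)
```

A call that stops with an error points at the array whose shape or normalization does not match the description of pInit, Q or pEmit.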

Value

S

the selection set.

pi

the selection frequency of all selected variables.

W

an M-by-p matrix of the computed knockoff feature importance statistics.
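The relationship between S, pi and tau can be sketched as follows: each of the M runs yields a selected set, the selection frequency of a variable is the fraction of runs that selected it, and variables whose frequency reaches tau are kept. This is a minimal sketch assuming the rule frequency >= tau; the helper `aggregate_selections` is hypothetical and takes per-run selections as input rather than reproducing the base procedure that base_filter applies to each row of W.

```r
# Hypothetical helper (not part of the package): aggregates per-run selections.
# sel is an M-by-p logical matrix; sel[m, j] is TRUE if run m selected variable j.
aggregate_selections <- function(sel, tau = 0.5) {
  stopifnot(is.matrix(sel), tau > 0, tau <= 1)
  freq <- colMeans(sel)      # selection frequency of each variable over the M runs
  S <- which(freq >= tau)    # keep variables selected in at least a fraction tau of runs
  list(S = S, pi = freq[S])  # frequencies reported for the selected variables only
}

# Toy input: M = 4 runs, p = 4 variables
sel <- rbind(c(TRUE,  TRUE,  FALSE, FALSE),
             c(TRUE,  FALSE, FALSE, TRUE),
             c(TRUE,  TRUE,  FALSE, FALSE),
             c(FALSE, TRUE,  TRUE,  FALSE))
agg <- aggregate_selections(sel, tau = 0.5)
agg$S   # 1 2
agg$pi  # 0.75 0.75
```

Raising tau makes the final set more conservative, since a variable must then be selected in a larger share of the M runs.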

Examples

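library(knockoff)  # provides stat.glmnet_coefdiff

# Generate data
n <- 100; p <- 50; s <- 10
rho <- 0.5
Sigma <- toeplitz(rho^(0:(p - 1)))  # Sigma[i, j] = rho^|i - j|
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
beta <- rep(0, p)
beta[1:s] <- 5 / sqrt(n)
y <- drop(X %*% beta + rnorm(n))  # response as a plain vector of length n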

# Control PFER at level v=1
res <- base_filter(X,y,v0=1, knockoff_method = "gaussian",
                 knockoff_stat = stat.glmnet_coefdiff,
                 mu = rep(0,p),Sigma = Sigma)
#> Warning: doParallel is not installed. Without parallelization, the statistics will be slower to compute
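Because the example simulates data with known support 1:s, the selected set can be checked directly. The PFER is the expected number of false discoveries, so counting false discoveries over repeated simulations gives an empirical estimate of it. The helper `count_discoveries` below is hypothetical, and the commented call assumes the result is a list with an element S, as described under Value.

```r
# Hypothetical helper: counts selected variables outside the true support
# (false discoveries) and inside it (true discoveries).
count_discoveries <- function(S, true_support) {
  c(false = sum(!(S %in% true_support)),
    true  = sum(S %in% true_support))
}

# Made-up selection set, not output of base_filter:
count_discoveries(S = c(1, 2, 3, 17), true_support = 1:10)
# false = 1, true = 3

# With the example above (assuming res$S holds the selection set):
# count_discoveries(res$S, 1:s)
```

Averaging the `false` count over many independent draws of X and y would approximate the PFER that v0 is meant to control.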