R/simData.R
simulate_data_step.Rd
Simulates data from a confounded non-linear model in which the non-linear function is a random regression tree. The data-generating process is given by: $$Y = f(X) + \delta^T H + \nu$$ $$X = \Gamma^T H + E$$ where \(f(X)\) is a random regression tree with \(m\) random splits of the data, resulting in a random step function with \(K = m + 1\) levels, i.e. leaf levels: $$f(x_i) = \sum_{k = 1}^K 1_{\{x_i \in R_k\}} c_k$$ Here \(R_k\) are the regions (leaves) of the tree and \(c_k\) the corresponding leaf levels. \(E\) and \(\nu\) are random error terms, \(H \in \mathbb{R}^{n \times q}\) is a matrix of random confounding covariates, \(\Gamma \in \mathbb{R}^{q \times p}\) is a random coefficient matrix, and \(\delta \in \mathbb{R}^{q}\) is a random coefficient vector. For the simulation, all of the above quantities are drawn from a standard normal distribution, except for \(\delta\), which is drawn from a normal distribution with standard deviation 10. For each split, a covariate is sampled uniformly at random and split at a random point drawn from a beta distribution (with both shape parameters equal to 2) on the support of the chosen covariate. The leaf levels \(c_k\) are drawn from a uniform distribution between \(cl\) and \(cu\).
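The data-generating process described above can be sketched in base R. This is a minimal illustration, not the package's implementation: all variable names are hypothetical, and two details the description leaves open are filled with assumptions, namely that the leaf to split is chosen uniformly among the current leaves and that the beta-distributed split point is scaled to the range of the chosen covariate within that leaf.

```r
# Illustrative base-R sketch of the described process (not the package code).
set.seed(1)
n <- 200; p <- 5; q <- 2; m <- 4
cl <- -50; cu <- 50

H     <- matrix(rnorm(n * q), n, q)   # confounders, n x q
Gamma <- matrix(rnorm(q * p), q, p)   # coefficient matrix, q x p
E     <- matrix(rnorm(n * p), n, p)   # error term of X
X     <- H %*% Gamma + E              # row-wise version of X = Gamma^T H + E
delta <- rnorm(q, sd = 10)            # confounding effect on Y
nu    <- rnorm(n)                     # error term of Y

# Grow a random partition: m splits give m + 1 regions (leaves).
region <- rep(1L, n)
for (s in seq_len(m)) {
  repeat {
    ids  <- unique(region)
    r    <- ids[sample.int(length(ids), 1)]  # assumption: uniform over leaves
    j    <- sample.int(p, 1)                 # covariate sampled uniformly
    in_r <- region == r
    rng  <- range(X[in_r, j])
    # Beta(2, 2) draw scaled to the covariate's range inside the leaf
    split_point <- rng[1] + rbeta(1, 2, 2) * diff(rng)
    right <- in_r & X[, j] > split_point
    left  <- in_r & !right
    if (any(left) && any(right)) break       # retry if a side would be empty
  }
  region[right] <- s + 1L
}

c_k <- runif(m + 1, cl, cu)                  # leaf levels
f_X <- c_k[region]                           # step function evaluated at X
Y   <- f_X + as.vector(H %*% delta) + nu     # confounded response
```

Because \(H\) enters both \(X\) and \(Y\), a naive regression of `Y` on `X` in this sketch is biased by the confounding term, which is the setting the simulation is meant to reproduce.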
simulate_data_step(q, p, n, m, make_tree = FALSE, cl = -50, cu = 50)
q: number of confounding covariates in H
p: number of covariates in X
n: number of observations
m: number of splits, each done using a random covariate
make_tree: whether the random regression tree should be returned
cl: lower limit of the uniform distribution of the step levels
cu: upper limit of the uniform distribution of the step levels
a list containing the simulated data:
a matrix of covariates
a vector of responses
a vector of the values of the true function f(X)
the indices of the causal covariates in X
If make_tree is TRUE, the random regression tree of class SDTree
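A short usage sketch follows. It assumes the function is exported by the SDModels package, which this page does not name explicitly, so the call is guarded and only runs if that package is installed. The argument values are arbitrary, and `str()` is used to inspect the returned list rather than assuming the names of its elements.

```r
# Assumption: simulate_data_step() is provided by the SDModels package.
if (requireNamespace("SDModels", quietly = TRUE)) {
  set.seed(42)
  sim <- SDModels::simulate_data_step(q = 2, p = 15, n = 100, m = 5,
                                      make_tree = TRUE)
  # Show the top-level structure of the returned list
  str(sim, max.level = 1)
}
```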