Simulates data from a grouped instrument design that includes discrete covariates and a binary endogenous regressor generated by a threshold-crossing model. The design features essential heterogeneity: the random treatment coefficient is correlated with the first-stage selection error.
GenData_cov(
  S = 5,
  p1 = 7/8,
  Het = 3,
  sigeps = 0.5,
  sigev = 0.1,
  beta = 0,
  beta0 = 0,
  omega = 0.1,
  K = 20,
  c = 5
)
S: Numeric. Concentration parameter \(\mu^2\), which scales the instrument strength.
p1: Numeric (0 to 1). Probability parameter controlling the correlation between the first-stage error \(v\) and the treatment heterogeneity \(\xi\).
Het: Numeric. Heterogeneity parameter, which scales the magnitude of \(\xi\).
sigeps: Numeric. Standard deviation of the structural error \(\varepsilon\).
sigev: Numeric. Coefficient governing the correlation between \(\varepsilon\) and \(v\).
beta: Numeric. Average Treatment Effect (ATE).
beta0: Numeric. Value of \(\beta\) under the null hypothesis.
omega: Numeric. Scaling parameter for the covariate effects \(\gamma\).
K: Integer. Number of instrument groups minus one.
c: Integer. Number of observations per group.
A data frame containing:
Overall observation index.
Instrument group identifier (the effective Z).
Covariate stratum identifier.
Instrument effect \(\pi\).
Covariate effect \(\gamma\).
Latent first-stage variable (Uniform).
Random treatment effect heterogeneity.
Binary endogenous regressor (-1 or 1).
Outcome variable.
Residual under the null.
Variables projected onto the annihilator \(M = I - P\).
The Data Generating Process (DGP) is structured as follows:
Design Matrix:
Covariates (W): Defined by pairs of instrument groups (e.g., strata).
Instruments (Z): Nested within covariates (e.g., judges within years).
Weights: The UJIVE weighting matrix \(G\) is computed as \(U(P_{[Z,W]}) - U(P_W)\).
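The nested design and the weighting matrix can be illustrated on a toy example. The sketch below is not the package's implementation: the group and stratum counts are hypothetical, and it assumes that \(U(P)\) denotes the standard UJIVE leave-one-out transform \((I - D_P)^{-1}(P - D_P)\) with \(D_P = \mathrm{diag}(P)\), which the text above does not spell out.

```r
# Hypothetical toy design: 4 instrument groups nested in 2 strata, 5 obs each.
n_groups <- 4
n_per    <- 5
group    <- rep(seq_len(n_groups), each = n_per)  # instrument group id
stratum  <- (group + 1) %/% 2                     # groups 1-2 -> stratum 1, 3-4 -> stratum 2

# Dummy matrices. Because Z is nested within W, the full set of group
# dummies spans the same column space as [Z, W].
Q <- model.matrix(~ 0 + factor(group))    # basis for [Z, W]
W <- model.matrix(~ 0 + factor(stratum))  # stratum dummies

# Projection matrix onto the column space of A.
proj <- function(A) A %*% solve(crossprod(A), t(A))

# ASSUMPTION: U(P) = (I - D_P)^{-1} (P - D_P), where D_P = diag(P).
U <- function(P) {
  d <- diag(P)
  (P - diag(d)) / (1 - d)  # recycling divides row i by (1 - P_ii)
}

G <- U(proj(Q)) - U(proj(W))
```

Under these assumptions, `G` has a zero diagonal (each observation's own outcome is left out) and satisfies `G %*% W = 0`, so the stratum effects are partialled out.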
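Structural Equations:
$$X_i = 2 \cdot \mathbb{I}(v_i < \pi_{g(i)} + \gamma_{d(i)}) - 1$$
$$Y_i = X_i (\beta + \xi_i) + \gamma_{d(i)} + \varepsilon_i$$
Here \(g(i)\) and \(d(i)\) denote the instrument group and the covariate stratum of observation \(i\).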
Heterogeneity: The variable \(X\) is binary, taking the values -1 or 1. The random slope \(\xi_i\) is drawn conditionally on the first-stage latent variable \(v_i\), which induces correlation between selection into treatment and treatment gains (essential heterogeneity).
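The structural equations can be simulated directly, as in the rough sketch below. It is an illustration only, not the code inside `GenData_cov`. All parameter values are hypothetical. The conditional law of \(\xi_i\) given \(v_i\) is an assumption (a linear function of \(v_i\) plus noise), and \(\varepsilon_i\) is drawn independently of \(v_i\), which ignores the correlation governed by `sigev`.

```r
set.seed(1)

# Hypothetical toy design: 4 instrument groups nested in 2 strata.
n_groups <- 4
n_per    <- 250
group    <- rep(seq_len(n_groups), each = n_per)
stratum  <- (group + 1) %/% 2
n        <- length(group)

# Hypothetical parameter values (not the function's defaults).
pi_g    <- c(0.3, 0.5, 0.4, 0.6)  # instrument effects, one per group
gamma_d <- c(0.0, 0.1)            # covariate effects, one per stratum
beta    <- 0                      # average treatment effect
het     <- 3                      # scale of the heterogeneity
sigeps  <- 0.5                    # sd of the structural error

# First stage: threshold crossing with a uniform latent variable.
v <- runif(n)
X <- 2 * (v < pi_g[group] + gamma_d[stratum]) - 1  # values in {-1, 1}

# ASSUMPTION: xi depends linearly on v plus noise, so that units with low v
# (those more likely to select X = 1) have larger treatment gains.
xi <- het * (0.5 - v) + rnorm(n, sd = 0.1)

# ASSUMPTION: eps is independent of v here.
eps <- rnorm(n, sd = sigeps)

# Outcome equation.
Y <- X * (beta + xi) + gamma_d[stratum] + eps
```

In this sketch, comparing `mean(xi[X == 1])` with `mean(xi[X == -1])` shows the selection on gains: units with \(X = 1\) have systematically larger \(\xi_i\).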