Simulates data from a grouped-instrument design with discrete covariates and a binary endogenous regressor generated by a threshold-crossing model. The design features essential heterogeneity: the random treatment coefficient is correlated with the first-stage selection error.

GenData_cov(
  S = 5,
  p1 = 7/8,
  Het = 3,
  sigeps = 0.5,
  sigev = 0.1,
  beta = 0,
  beta0 = 0,
  omega = 0.1,
  K = 20,
  c = 5
)

Arguments

S

Numeric. Concentration parameter \(\mu^2\), scaling the instrument strength.

p1

Numeric (0 to 1). Probability parameter controlling the correlation between the first-stage error \(v\) and the treatment heterogeneity \(\xi\).

Het

Numeric. Heterogeneity parameter, scaling the magnitude of \(\xi\).

sigeps

Numeric. Standard deviation of the structural error \(\varepsilon\).

sigev

Numeric. Coefficient governing the correlation between \(\varepsilon\) and \(v\).

beta

Numeric. Average Treatment Effect (ATE).

beta0

Numeric. Null hypothesis value for \(\beta\).

omega

Numeric. Scaling parameter for the covariate effects \(\gamma\).

K

Integer. Number of instrument groups minus one.

c

Integer. Number of observations per group.

Value

A data frame containing:

group

Overall observation index.

groupZ

Instrument group identifier (the effective Z).

groupW

Covariate stratum identifier.

pi

Instrument effect \(\pi\).

gammad

Covariate effect \(\gamma\).

v

Latent first-stage variable (Uniform).

xi

Random treatment effect heterogeneity.

X

Binary endogenous regressor (-1 or 1).

Y

Outcome variable.

e

Residual under the null.

MX, Me, MY

\(X\), \(e\), and \(Y\) residualized by the annihilator matrix \(M = I - P\).
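To illustrate what the annihilator does, the sketch below applies \(M = I - P\) to a toy vector. It assumes, because this page does not define \(P\), that \(P\) is the projection onto stratum indicators. The variable names and the toy data are hypothetical and are not taken from GenData_cov.

```r
# Toy illustration of M = I - P; assumes P projects onto stratum dummies.
set.seed(42)
groupW <- rep(1:3, each = 4)             # hypothetical strata, 4 obs each
y <- rnorm(length(groupW))               # stand-in for X, e, or Y

D <- model.matrix(~ factor(groupW) - 1)  # n x 3 matrix of stratum indicators
P <- D %*% solve(crossprod(D), t(D))     # projection onto the indicators
M <- diag(length(y)) - P                 # annihilator matrix

My_matrix <- as.vector(M %*% y)          # residualize via the matrix
My_demean <- y - ave(y, groupW)          # equivalent: subtract stratum means

max(abs(My_matrix - My_demean))          # numerically zero
```

Under this assumption, applying \(M\) is the same as demeaning within strata, which avoids forming an \(n \times n\) matrix in larger samples.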

Details

The Data Generating Process (DGP) is structured as follows:

Design Matrix:

  • Covariates (W): Strata, each defined by a pair of instrument groups.

  • Instruments (Z): Nested within covariates (e.g., judges within years).

  • Weights: The UJIVE weighting matrix \(G\) is computed as \(U(P_{[Z,W]}) - U(P_W)\).
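The nested design and the weighting matrix can be sketched on a toy example. This page does not define \(U(\cdot)\); the code assumes the usual leave-one-out rescaling \(U(P) = (I - D_P)^{-1}(P - D_P)\), with \(D_P\) the diagonal of \(P\). The group sizes and the helper names `proj` and `U` are hypothetical, and this is not necessarily how GenData_cov builds \(G\) internally.

```r
# Toy nested design; all sizes are hypothetical.
n_strata <- 3; per_stratum <- 2; c_obs <- 5
groupZ <- rep(seq_len(n_strata * per_stratum), each = c_obs)
groupW <- ceiling(groupZ / per_stratum)  # groups {1,2} -> stratum 1, etc.

# Orthogonal projection onto the column space of A (QR handles collinearity).
proj <- function(A) {
  qrA <- qr(A)
  Q <- qr.Q(qrA)[, seq_len(qrA$rank), drop = FALSE]
  tcrossprod(Q)
}

# Assumed meaning of U(): drop the diagonal, rescale row i by 1 / (1 - P_ii).
U <- function(P) {
  d <- diag(P)
  (P - diag(d)) / (1 - d)
}

Zd <- model.matrix(~ factor(groupZ) - 1)  # instrument-group indicators
Wd <- model.matrix(~ factor(groupW) - 1)  # stratum indicators
G <- U(proj(cbind(Zd, Wd))) - U(proj(Wd))

range(diag(G))     # all zero: no own-observation weight
range(rowSums(G))  # numerically zero: each row of U(.) sums to one
```

Because the instrument groups are nested within strata, the columns of `cbind(Zd, Wd)` are collinear, which is why the projection is built from a rank-revealing QR decomposition rather than a direct inverse.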

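Structural Equations:

$$X_i = 2 \cdot \mathbb{I}(v_i < \pi_{g(i)} + \gamma_{d(i)}) - 1$$

$$Y_i = X_i (\beta + \xi_i) + \gamma_{d(i)} + \varepsilon_i$$

Here \(g(i)\) and \(d(i)\) denote the instrument group and the covariate stratum of observation \(i\).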
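As a worked consequence of the first equation, suppose \(v_i\) is uniform on \((0, 1)\) and independent of group assignment, and that \(\pi_{g(i)} + \gamma_{d(i)}\) lies in \([0, 1]\). These are assumptions: this page describes \(v\) only as "Uniform". Then the treatment propensity is linear in the instrument and covariate effects:

$$P(X_i = 1 \mid g(i), d(i)) = \pi_{g(i)} + \gamma_{d(i)}, \qquad E[X_i \mid g(i), d(i)] = 2\,(\pi_{g(i)} + \gamma_{d(i)}) - 1$$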

Heterogeneity: \(X\) takes values in \(\{-1, 1\}\). The random slope \(\xi_i\) is drawn conditionally on the first-stage latent variable \(v_i\), which induces correlation between selection into treatment and treatment gains (essential heterogeneity).
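A small simulation can make the selection-on-gains mechanism concrete. The conditional rule for \(\xi_i\) below is a hypothetical stand-in, since this page does not state the exact distribution; it is not the rule used by GenData_cov. The sketch also fixes a single threshold in place of \(\pi\) and sets \(\gamma = 0\).

```r
# Hypothetical illustration of essential heterogeneity.
set.seed(1)
n <- 10000
p1 <- 7/8; Het <- 3; beta <- 0; sigeps <- 0.5
threshold <- 0.5                      # stands in for pi, with gamma = 0 (assumed)

v <- runif(n)                         # latent first-stage variable
X <- 2 * (v < threshold) - 1          # binary regressor in {-1, 1}

# Assumed rule: with probability p1 the sign of xi is +1 when v is low
# and -1 when v is high; otherwise the sign is flipped.
agree <- rbinom(n, 1, p1) == 1
low <- v < 0.5
xi <- Het * ifelse(agree == low, 1, -1)

eps <- rnorm(n, sd = sigeps)          # structural error, independent of v here
Y <- X * (beta + xi) + eps            # outcome equation with gamma = 0

tapply(xi, X, mean)                   # average gains differ by treatment status
```

Under this rule, units with low \(v_i\) are both more likely to have \(X_i = 1\) and more likely to have a positive \(\xi_i\), so the average of \(\xi_i\) differs between the two treatment arms even though its overall mean is close to zero.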