This vignette shows how to plug
ebalance() weights into common outcome models. The pattern
is almost always the same:
ebalance(), then weights(fit), then weights = w in the outcome model.
set.seed(20260505)
n <- 1000
X <- data.frame(
x1 = rnorm(n),
x2 = rbinom(n, 1, 0.4),
x3 = rnorm(n)
)
# Selection on x1, x2: treatment more likely when x1 > 0 or x2 = 1
ps <- plogis(0.6 * X$x1 + 1.2 * X$x2 - 0.5)
treat <- rbinom(n, 1, ps)
# True ATT = +2; outcome also depends on x1 and x2
y <- 1 + 0.7 * X$x1 + 0.5 * X$x2 + 2 * treat + rnorm(n, sd = 1)
df <- data.frame(treat = treat, y = y, X)
# Naive ATT
mean(df$y[df$treat == 1]) - mean(df$y[df$treat == 0])
#> [1] 2.499398
The naive ATT is biased because covariates are unbalanced:
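To make the imbalance and its repair concrete, here is a minimal base-R sketch of the entropy-balancing idea for the ATT. It is not the ebal package's implementation: the data are re-simulated with a different seed and only two covariates so the block runs on its own, and the names `dual`, `grad`, `w_control`, `raw_gap`, and `bal_gap` are hypothetical. Control weights are chosen proportional to `exp(z' lambda)`, with `lambda` found by minimizing a log-sum-exp dual, so that the weighted control means match the treated means.

```r
# Sketch only: re-simulates data similar to the example above.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
treat <- rbinom(n, 1, plogis(0.6 * x1 + 1.2 * x2 - 0.5))
X <- cbind(x1 = x1, x2 = x2)

# Raw difference in covariate means, treated minus control
raw_gap <- colMeans(X[treat == 1, ]) - colMeans(X[treat == 0, ])

# Control covariates centered at the treated means (the balance targets)
target <- colMeans(X[treat == 1, ])
Z <- sweep(X[treat == 0, ], 2, target)

# Dual objective and its gradient; at the minimum, sum_i w_i * z_i = 0
dual <- function(lambda) log(sum(exp(Z %*% lambda)))
grad <- function(lambda) {
  p <- exp(Z %*% lambda)
  as.vector(crossprod(Z, p / sum(p)))
}
opt <- optim(c(0, 0), dual, grad, method = "BFGS",
             control = list(reltol = 1e-12))

# Exponential-tilting weights, normalized to sum to 1 for simplicity
# (a package's own normalization may differ)
w_control <- as.vector(exp(Z %*% opt$par))
w_control <- w_control / sum(w_control)

# Gap after weighting: treated means minus weighted control means
bal_gap <- target - colSums(X[treat == 0, ] * w_control)
print(round(rbind(raw = raw_gap, weighted = bal_gap), 4))
```

The `raw` row shows the covariate gap that drives the bias in the naive estimate; the `weighted` row should be numerically close to zero once the optimizer has converged.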
lm() with robust SEs
The point estimate comes straight from a weighted regression:
Default lm() standard errors are unreliable here: lm() interprets
weights as inverse-variance (precision) weights, which balancing
weights are not. Use sandwich::vcovHC()
(or vcovCL() if you have a clustering variable):
library(sandwich); library(lmtest)
#>
#> Attaching package: 'sandwich'
#> The following object is masked from 'package:generics':
#>
#> estfun
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
coeftest(mod, vcov = vcovHC(mod, type = "HC1"))
#>
#> t test of coefficients:
#>
#>             Estimate Std. Error t value  Pr(>|t|)
#> (Intercept) 1.395679   0.065177  21.414 < 2.2e-16 ***
#> treat       1.993894   0.084440  23.613 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The treat coefficient should be near the true ATT of 2.
The ebal-balanced control group is a much better counterfactual than the
raw control group.
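For readers who want to see what the HC1 correction does to a weighted fit, the following base-R sketch computes the sandwich estimator by hand. It rests on loud assumptions: the data are freshly simulated, and the weights are stand-in ATT odds weights from a logistic propensity model, not ebalance() output. The result should agree with sandwich::vcovHC(mod, type = "HC1") up to numerical error, and like that function it treats the weights as fixed.

```r
# Sketch only: simulated data and stand-in weights, NOT ebalance() output.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.6 * x1))
y  <- 1 + 0.7 * x1 + 2 * treat + rnorm(n)

# Stand-in ATT weights: 1 for treated, propensity odds for controls
ps <- fitted(glm(treat ~ x1, family = binomial))
w  <- ifelse(treat == 1, 1, ps / (1 - ps))

mod <- lm(y ~ treat, weights = w)

# HC1 sandwich for weighted least squares:
#   (X'WX)^{-1} [ X' diag(w^2 e^2) X ] (X'WX)^{-1} * n / (n - k)
Xm    <- model.matrix(mod)
e     <- residuals(mod)                 # raw (unweighted) residuals
bread <- solve(crossprod(Xm, w * Xm))   # (X'WX)^{-1}
meat  <- crossprod(Xm * (w * e))        # sum of (w_i e_i)^2 x_i x_i'
V_hc1 <- bread %*% meat %*% bread * n / (n - ncol(Xm))

se_robust  <- sqrt(diag(V_hc1))
se_default <- sqrt(diag(vcov(mod)))
print(cbind(default = se_default, HC1 = se_robust))
```

Comparing the two columns shows how far the default standard errors can drift from the robust ones when the weights vary a lot across control units.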
Including covariates on the right-hand side gives a doubly-robust estimator: the ATT coefficient is consistent if either the weighting or the outcome model is correctly specified.
mod_dr <- lm(y ~ treat + x1 + x2 + x3, data = df, weights = w)
coef(mod_dr)["treat"]
#> treat
#> 1.993644
For the simulated DGP both are correct, so this should match
mod’s coefficient closely. In real data, regression
adjustment is a useful hedge.
If you’re already in the survey package world, ebalance
weights slot in as the weights = argument of
svydesign():
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
des <- svydesign(ids = ~1, weights = ~w, data = df)
svymod <- svyglm(y ~ treat, design = des)
summary(svymod)$coefficients["treat", , drop = FALSE]
#>       Estimate Std. Error  t value     Pr(>|t|)
#> treat 1.993894  0.0843981 23.62487 2.284447e-98
The survey-package SEs are also robust to the weighting; they
typically agree with sandwich::vcovHC(..., "HC1") to a few
percent on cross-sectional data.
Sometimes the entropy-balancing weights have a heavy right tail
(max(w) / mean(w) in the dozens). Two options:
library(generics)
glance(fit)[, c("ess_control", "max_weight_ratio_control")]
#>   ess_control max_weight_ratio_control
#> 1    297.1991                 7.044764
trimmed <- ebalance.trim(fit) # automatic minimization
glance(trimmed)[, c("ess_control", "max_weight_ratio_control")]
#>   ess_control max_weight_ratio_control
#> 1    282.0851                 2.100974
ebalance.trim() returns an object with the same shape,
so weights(trimmed) and the downstream regression code are
unchanged. The trimmed fit relaxes balance slightly to keep the max
weight ratio low.
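The two diagnostics used above can also be computed directly from any weight vector. The sketch below assumes that the effective sample size is the Kish formula, (sum w)^2 / sum(w^2), and that the weight ratio is max(w) / mean(w) as described earlier; whether glance() uses exactly these definitions is an assumption. `weight_diagnostics` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper for illustration; not part of any package.
weight_diagnostics <- function(w) {
  stopifnot(is.numeric(w), all(w >= 0), sum(w) > 0)
  c(ess       = sum(w)^2 / sum(w^2),   # Kish effective sample size
    max_ratio = max(w) / mean(w))      # largest weight relative to the mean
}

# Equal weights: ESS equals the number of units and the ratio is 1
d_equal <- weight_diagnostics(rep(1, 100))
print(d_equal)

# One dominant weight: ESS collapses and the ratio grows
d_heavy <- weight_diagnostics(c(rep(1, 99), 50))
print(d_heavy)
```

In practice you would pass the control-group weights only, for example the control subset of weights(fit), since treated units all carry the same weight under the ATT.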
Everything above runs under the default
estimand = "ATT". For ATE or ATC, the only thing that
changes is weights(fit) (it returns nontrivial weights for
both groups under ATE). The lm() /
svyglm() syntax is identical. See
vignette("estimands", package = "ebal") for the
comparison.