Outcome models with ebalance weights

This vignette shows how to plug ebalance() weights into common outcome models. The pattern is almost always the same:

  1. Build the weights with ebalance().
  2. Attach them to the data via weights(fit).
  3. Pass them to a downstream regression with weights = w.
  4. Use a heteroskedasticity-robust variance estimator for inference.

library(ebal)

A simulated example

set.seed(20260505)
n <- 1000
X <- data.frame(
  x1 = rnorm(n),
  x2 = rbinom(n, 1, 0.4),
  x3 = rnorm(n)
)
# Selection on x1, x2: treatment more likely when x1 > 0 or x2 = 1
ps <- plogis(0.6 * X$x1 + 1.2 * X$x2 - 0.5)
treat <- rbinom(n, 1, ps)
# True ATT = +2; outcome also depends on x1 and x2
y <- 1 + 0.7 * X$x1 + 0.5 * X$x2 + 2 * treat + rnorm(n, sd = 1)
df <- data.frame(treat = treat, y = y, X)

# Naive estimate: raw difference in means
mean(df$y[df$treat == 1]) - mean(df$y[df$treat == 0])
#> [1] 2.499398

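The naive difference in means (2.50) overstates the true ATT of 2 because the covariates are unbalanced: treated units have higher x1 and x2 on average, and both raise y.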

rbind(
  treated = colMeans(X[treat == 1, ]),
  control = colMeans(X[treat == 0, ])
)
#>                 x1        x2           x3
#> treated  0.2685523 0.5085106  0.046975573
#> control -0.2786596 0.2886792 -0.003859718

Fit

fit <- ebalance(treat ~ x1 + x2 + x3, data = df)
df$w <- weights(fit)
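
Before moving on to outcome models, it is worth confirming that the weighted control means actually match the treated means. The sketch below is self-contained and uses base R only. To avoid depending on a fitted object, it builds stand-in weights by solving the entropy-balancing dual with optim(); that solver is an illustrative assumption and is not claimed to be what ebalance() runs internally. The scaling of the control weights (summing to the number of treated units) is also an assumption.

```r
# Simulate data along the lines of the example above (smaller n for speed).
set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4), x3 = rnorm(n))
treat <- rbinom(n, 1, plogis(0.6 * X[, "x1"] + 1.2 * X[, "x2"] - 0.5))

# Stand-in for weights(fit). ASSUMPTION: illustrative solver, not ebalance().
target <- colMeans(X[treat == 1, ])        # treated covariate means
Z <- sweep(X[treat == 0, ], 2, target)     # controls, centred at the target

softmax <- function(eta) {
  e <- exp(eta - max(eta))                 # subtract max for numerical stability
  e / sum(e)
}
# Dual objective: log-sum-exp of the linear index; its gradient is the
# weighted mean of Z, which is zero exactly when balance holds.
dual <- function(lam) {
  eta <- drop(Z %*% lam)
  max(eta) + log(sum(exp(eta - max(eta))))
}
grad <- function(lam) drop(crossprod(Z, softmax(drop(Z %*% lam))))

opt <- optim(rep(0, ncol(Z)), dual, grad, method = "BFGS",
             control = list(reltol = 1e-14, maxit = 1000))

w <- rep(1, n)                             # treated units keep weight 1
w[treat == 0] <- softmax(drop(Z %*% opt$par)) * sum(treat == 1)

# Balance check: weighted control means should equal treated means.
balance <- rbind(
  treated = target,
  control_weighted = apply(X[treat == 0, ], 2, weighted.mean, w = w[treat == 0])
)
print(round(balance, 4))
max_gap <- max(abs(balance["treated", ] - balance["control_weighted", ]))
```

With a real fit, the same rbind() check applies with w replaced by df$w; a max_gap that is not close to zero suggests the fit did not converge.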

Weighted lm() with robust SEs

The point estimate comes straight from a weighted regression:

mod <- lm(y ~ treat, data = df, weights = w)
coef(mod)
#> (Intercept)       treat 
#>    1.395679    1.993894

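Default lm() standard errors are unreliable here: lm() interprets weights as inverse-variance (precision) weights, but balancing weights say nothing about the error variance, so the model-based SEs are miscalibrated. Use sandwich::vcovHC() (or vcovCL() if you have a clustering variable):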

library(sandwich); library(lmtest)
#> 
#> Attaching package: 'sandwich'
#> The following object is masked from 'package:generics':
#> 
#>     estfun
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
coeftest(mod, vcov = vcovHC(mod, type = "HC1"))
#> 
#> t test of coefficients:
#> 
#>             Estimate Std. Error t value  Pr(>|t|)    
#> (Intercept) 1.395679   0.065177  21.414 < 2.2e-16 ***
#> treat       1.993894   0.084440  23.613 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The treat coefficient (1.99) is close to the true ATT of 2, whereas the naive difference in means was 2.50. The entropy-balanced control group is a much better counterfactual than the raw control group.
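
To make the robust variance less of a black box, here is a minimal base-R sketch of the HC1 sandwich for a weighted regression, treating the weights as fixed. The data and the weights are hypothetical (simulated for illustration, not produced by ebalance()), and the formula is the standard weighted-least-squares sandwich; it is expected to agree with vcovHC(mod, type = "HC1"), though that comparison is not run here.

```r
set.seed(2)
n <- 200
treat <- rbinom(n, 1, 0.5)
y <- 1 + 2 * treat + rnorm(n)
# Hypothetical balancing weights: treated = 1, controls arbitrary positive.
w <- ifelse(treat == 1, 1, runif(n, 0.2, 3))

mod <- lm(y ~ treat, weights = w)
X <- model.matrix(mod)
e <- residuals(mod)                  # response-scale residuals (not weighted)
k <- ncol(X)

bread <- solve(crossprod(X, w * X))              # (X'WX)^{-1}
meat  <- crossprod(X, (w * e)^2 * X)             # X' diag(w^2 e^2) X
V_hc1 <- n / (n - k) * bread %*% meat %*% bread  # HC1 small-sample scaling
se_hc1 <- sqrt(diag(V_hc1))

# Model-based SEs from lm(), for comparison.
se_lm <- sqrt(diag(vcov(mod)))
print(cbind(se_lm, se_hc1))
```

Neither this sketch nor vcovHC() accounts for the fact that the weights were themselves estimated; both condition on w.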

Adding regression adjustment (doubly-robust)

Including covariates on the right-hand side gives a doubly-robust estimator: the ATT coefficient is consistent if either the weighting or the outcome model is correctly specified.

mod_dr <- lm(y ~ treat + x1 + x2 + x3, data = df, weights = w)
coef(mod_dr)["treat"]
#>    treat 
#> 1.993644

For the simulated DGP both are correct, so the adjusted estimate (1.9936) matches mod’s coefficient (1.9939) closely. In real data, regression adjustment is a useful hedge.
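
There is also a mechanical reason the two estimates are so close. When the weighted control mean of a covariate equals the treated mean exactly, that covariate is uncorrelated with treat in the weighted sample, so adding it linearly leaves the treat coefficient unchanged. The toy data below are hypothetical and hand-picked so that balance on x is exact; any remaining gap in a real fit would reflect the balance tolerance.

```r
# Toy data: 3 treated units (weight 1) and 4 controls with hand-picked weights.
d <- data.frame(
  treat = c(1, 1, 1, 0, 0, 0, 0),
  x     = c(0, 1, 2, -1, 0, 1, 2),
  y     = c(3.0, 5.5, 6.0, 1.0, 2.5, 2.0, 4.0),
  w     = c(1, 1, 1, 0.1, 0.2, 0.3, 0.4)
)

# Exact balance by construction: treated mean and weighted control mean of x.
with(d, c(treated = mean(x[treat == 1]),
          control_weighted = weighted.mean(x[treat == 0], w[treat == 0])))

# Treat coefficient with and without the balanced covariate.
b_plain <- coef(lm(y ~ treat,     data = d, weights = w))["treat"]
b_adj   <- coef(lm(y ~ treat + x, data = d, weights = w))["treat"]
c(plain = unname(b_plain), adjusted = unname(b_adj))
```

The standard errors can still differ between the two models, since the covariate absorbs residual variance.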

Survey-style inference

If you’re already in the survey package world, pass the ebalance weights through the weights argument of svydesign():

library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#> 
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
des <- svydesign(ids = ~1, weights = ~w, data = df)
svymod <- svyglm(y ~ treat, design = des)
summary(svymod)$coefficients["treat", , drop = FALSE]
#>       Estimate Std. Error  t value     Pr(>|t|)
#> treat 1.993894  0.0843981 23.62487 2.284447e-98

The survey-package SEs are also robust to the weighting; they typically agree with sandwich::vcovHC(..., "HC1") to within a few percent on cross-sectional data (here, 0.08440 versus 0.08444).

Trimming if weights blow up

Sometimes the entropy-balancing weights have a heavy right tail (max(w) / mean(w) in the dozens). Check the control-group diagnostics first, then trim if needed:

library(generics)
glance(fit)[, c("ess_control", "max_weight_ratio_control")]
#>   ess_control max_weight_ratio_control
#> 1    297.1991                 7.044764

trimmed <- ebalance.trim(fit)        # automatic minimization
glance(trimmed)[, c("ess_control", "max_weight_ratio_control")]
#>   ess_control max_weight_ratio_control
#> 1    282.0851                 2.100974

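ebalance.trim() returns an object with the same structure, so weights(trimmed) and the downstream regression code are unchanged. The trimmed fit relaxes balance slightly to keep the max weight ratio low: here the ratio falls from 7.04 to 2.10, while the effective sample size moves from 297.2 to 282.1.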
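
For reference, the two diagnostics can be computed by hand from any weight vector. The sketch below assumes that ess_control is the Kish effective sample size and that max_weight_ratio_control is max(w) / mean(w) over control units; those definitions are assumptions about glance(), and the helper names are hypothetical.

```r
# Kish effective sample size: equals length(w) when all weights are equal.
ess <- function(w) sum(w)^2 / sum(w^2)

# Largest weight relative to the average weight.
max_weight_ratio <- function(w) max(w) / mean(w)

w_flat  <- rep(1, 4)        # uniform weights
w_heavy <- c(1, 1, 1, 9)    # one dominant unit

c(ess = ess(w_flat),  ratio = max_weight_ratio(w_flat))
c(ess = ess(w_heavy), ratio = max_weight_ratio(w_heavy))
```

Applied to the example, these would be called on df$w[df$treat == 0]. A low effective sample size relative to the number of controls means a few units dominate the counterfactual.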

Choice of estimand for inference

Everything above uses the default estimand = "ATT". For ATE or ATC, the only thing that changes downstream is what weights(fit) returns (under ATE, both groups get nontrivial weights). The lm() / svyglm() syntax is identical. See vignette("estimands", package = "ebal") for the comparison.