Quickstart: entropy balancing with ebal

What is entropy balancing?

Entropy balancing (Hainmueller 2012) reweights a control sample so that its covariate moments match those of the treated group, while keeping the weights as close as possible, in a maximum-entropy sense, to a set of base weights (uniform by default). The resulting weights drop directly into an lm(..., weights = w) call to estimate the average treatment effect on the treated (ATT).
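For readers who prefer to see the optimization problem written out, the following is a sketch of the formulation in Hainmueller (2012). The symbols are notation chosen here for illustration: q_i are the base weights, c_r(X_i) are the moment functions of the covariates, and m_r are the corresponding treated-group moments. Normalizing the weights to sum to 1 is one convention; software may rescale them to another total.

```latex
\begin{aligned}
\min_{w}\quad & \sum_{i:\,D_i=0} w_i \log\frac{w_i}{q_i} \\
\text{subject to}\quad & \sum_{i:\,D_i=0} w_i\, c_r(X_i) = m_r, \qquad r = 1,\dots,R, \\
& \sum_{i:\,D_i=0} w_i = 1, \qquad w_i \ge 0 .
\end{aligned}
```

With uniform q_i the objective is, up to a constant, the negative entropy of the weights, which is where the method gets its name.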

This vignette shows the package’s user-facing API on a small toy dataset.

library(ebal)
#> ##
#> ## ebal Package: Implements Entropy Balancing.
#> ## See https://web.stanford.edu/~jhain/ for additional information.

A minimal example

set.seed(20260504)
n0 <- 200; n1 <- 100
X <- rbind(
  replicate(3, rnorm(n0, mean = 0)),       # controls
  replicate(3, rnorm(n1, mean = 0.5))      # treated, shifted
)
colnames(X) <- c("x1", "x2", "x3")
treatment <- c(rep(0, n0), rep(1, n1))

Before weighting, the control means differ markedly from the treated means:

treated_means <- colMeans(X[treatment == 1, ])
control_means <- colMeans(X[treatment == 0, ])
rbind(treated = treated_means, control = control_means)
#>                  x1         x2          x3
#> treated  0.49121834  0.4640827  0.54795272
#> control -0.00708219 -0.0848847 -0.07201758

Fit

fit <- ebalance(treat ~ x1 + x2 + x3,
                data = data.frame(treat = treatment, X))
#> Warning: ebalance() fit converged but is concentrated on a small number of units:
#>   - control max/mean weight ratio = 10.3 > 10
#> Consider ebalance.trim(), tighter constraint.tolerance, or fewer moment constraints. See ?diagnostics. Suppress with options(ebal.warn_weak_fit = FALSE).
fit
#> Entropy balancing  (estimand: ATT)
#> ---------------------------------
#> Treated:    100
#> Controls:   200 (reweighted; sum of weights = 100.514)
#> Moments:    3 covariate moment(s) balanced
#> Converged:  TRUE   (max moment deviation = 0.514)
#> 
#> Use summary() for a balance table, weights() for the per-unit
#> weight vector, and plot() for a Love plot of standardized differences.

Both the formula interface above and the matrix interface ebalance(Treatment = treatment, X = X) work. Either one produces an ebalance object with print, summary, plot, and weights methods.

Tidy output

# tidy() / glance() / augment() are registered against the generics in
# the `generics` package, which `broom` re-exports. Loading either
# makes the methods discoverable.
library(generics)
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
tidy(fit)
#>   term mean_treated_pre mean_treated_post mean_control_pre mean_control_post
#> 1   x1        0.4912183         0.4912183      -0.00708219         0.4897936
#> 2   x2        0.4640827         0.4640827      -0.08488470         0.4609244
#> 3   x3        0.5479527         0.5479527      -0.07201758         0.5451661
#>    diff_pre   diff_post std_diff_pre std_diff_post pct_reduction
#> 1 0.4983005 0.001424760    0.5032732   0.001438978      99.71408
#> 2 0.5489674 0.003158291    0.5676139   0.003265567      99.42469
#> 3 0.6199703 0.002786572    0.5544364   0.002492018      99.55053
glance(fit)
#>   estimand n_treated n_control n_moments sum_weights_control
#> 1      ATT       100       200         3            100.5143
#>   sum_weights_treated ess_control ess_treated max_weight_control
#> 1                 100     76.6416         100           5.189638
#>   max_weight_treated max_weight_ratio_control max_weight_ratio_treated
#> 1                  1                 10.32617                        1
#>   max_abs_std_diff_pre max_abs_std_diff_post   maxdiff converged
#> 1            0.5676139           0.003265567 0.5142846      TRUE

tidy() is a per-covariate balance table. glance() is a one-row summary including the Kish effective sample size and convergence flag.
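The two weight diagnostics in glance() are easy to reproduce by hand. The sketch below uses a small hypothetical weight vector, and the helper names kish_ess() and max_mean_ratio() are made up for illustration; they are not part of ebal. The formulas are the standard Kish definition and a plain max/mean ratio, and whether the package computes them in exactly this way is an assumption, although 5.189638 / (100.5143 / 200) ≈ 10.33 does agree with the max_weight_ratio_control reported above.

```r
# Hypothetical weights for five control units (illustration only).
w <- c(0.2, 0.5, 1.0, 1.3, 2.0)

# Kish effective sample size: (sum of weights)^2 / sum of squared weights.
kish_ess <- function(w) sum(w)^2 / sum(w^2)

# Ratio of the largest weight to the average weight.
max_mean_ratio <- function(w) max(w) / mean(w)

kish_ess(w)          # smaller than 5 because the weights are unequal
max_mean_ratio(w)    # largest weight relative to the mean weight
kish_ess(rep(1, 5))  # equal weights: ESS equals the number of units
```

An effective sample size well below the number of controls, or a large max/mean ratio, indicates that a few units carry most of the weight.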

weights(fit) returns a length-n vector aligned to the original data: treated units get weight 1, controls get the entropy-balancing weight.

length(weights(fit))
#> [1] 300
range(weights(fit)[treatment == 0])
#> [1] 0.01118102 5.18963792

Plotting

The base-graphics plot(fit) and the ggplot2 autoplot(fit) both produce a Love plot of standardized differences before vs. after weighting:

library(ggplot2)
autoplot(fit)

Using the weights downstream

The weights drop directly into a weighted regression:

# Simulated outcome: true treatment effect of 2, confounded through x1.
df <- data.frame(treat = treatment, X,
                 y = X[, 1] + 2 * treatment + rnorm(n0 + n1))
df$w <- weights(fit)
lm(y ~ treat, data = df, weights = w)
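It can help to see what the weighted regression is computing. With a single binary regressor, the weighted least-squares coefficient on treat equals the difference between the weighted outcome means of the two groups. The sketch below checks this on a small hypothetical dataset in base R; the data frame d and its weights are made up for illustration and are not produced by ebal.

```r
set.seed(1)

# Hypothetical data: 6 controls, 4 treated, arbitrary positive control weights.
d <- data.frame(treat = rep(c(0, 1), times = c(6, 4)), y = rnorm(10))
d$w <- ifelse(d$treat == 1, 1, runif(10, 0.1, 2))

# Weighted regression of the outcome on the treatment indicator.
fit_lm <- lm(y ~ treat, data = d, weights = w)

# The same quantity by hand: difference in weighted means.
by_hand <- with(d,
  weighted.mean(y[treat == 1], w[treat == 1]) -
  weighted.mean(y[treat == 0], w[treat == 0]))

all.equal(unname(coef(fit_lm)["treat"]), by_hand)
```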

Two solver methods

By default ebalance() uses Newton-Raphson on the dual problem (fast, and exact when the Hessian is well-conditioned). As of 0.3-0 you can also use a torch-based autodiff solver (BFGS on gradients computed via automatic differentiation), contributed by Apoorva Lal:

fit_ad <- ebalance(treat ~ x1 + x2 + x3,
                   data = data.frame(treat = treatment, X),
                   method = "autodiff")

The two methods produce equivalent weights (within solver tolerance). Newton is faster on the small problems most users have; the autodiff path is more stable when the optimization landscape is poorly conditioned and scales better at large covariate counts. torch is listed under Suggests:, so it is not installed automatically with ebal; the first call may require torch::install_torch() to download libtorch.
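To make "the dual problem" concrete, here is a minimal base-R sketch, assuming uniform base weights and mean constraints only. It is not the package's implementation: it minimizes a log-sum-exp form of the dual with optim()'s BFGS and a hand-written gradient, rather than Newton-Raphson or torch. The names X0, X1, Z, dual, dual_grad, and softmax_w are hypothetical and chosen for illustration.

```r
set.seed(1)
n0 <- 200; n1 <- 100
X0 <- matrix(rnorm(n0 * 3, mean = 0),   ncol = 3)  # controls
X1 <- matrix(rnorm(n1 * 3, mean = 0.5), ncol = 3)  # treated, shifted
target <- colMeans(X1)                              # moments to match

# Control covariates centered at the treated means.
Z <- sweep(X0, 2, target)

# Weights implied by a dual parameter lambda (normalized to sum to 1).
softmax_w <- function(lambda) {
  eta <- -drop(Z %*% lambda)
  w <- exp(eta - max(eta))  # subtract the max for numerical stability
  w / sum(w)
}

# Dual objective: log of the sum of exp(-Z %*% lambda), computed stably.
dual <- function(lambda) {
  eta <- -drop(Z %*% lambda)
  max(eta) + log(sum(exp(eta - max(eta))))
}

# Gradient: minus the weighted mean of the centered covariates.
dual_grad <- function(lambda) -drop(crossprod(Z, softmax_w(lambda)))

opt <- optim(rep(0, 3), dual, dual_grad, method = "BFGS",
             control = list(reltol = 1e-12))

w <- softmax_w(opt$par)
max_dev <- max(abs(colSums(w * X0) - target))
max_dev  # largest remaining gap between weighted control and treated means
```

The gradient is zero exactly when the weighted control means equal the treated means, so any solver that drives it to zero, Newton or quasi-Newton, arrives at the same weights up to tolerance.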

References

Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1), 25–46.
