---
title: "Inference for Synthetic Control Estimators"
author: "Jens Hainmueller and Alexis Diamond"
date: "`r format(Sys.Date(), '%B %Y')`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Inference for Synthetic Control Estimators}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  dpi = 96
)
set.seed(1)
```

## Overview

This vignette walks through inference for the synthetic control method as implemented in the `Synth` package, using the canonical example from Abadie, Diamond, and Hainmueller (2010): California's Proposition 99 cigarette tax of 1988. For the basic `dataprep()` → `synth()` → `path.plot()` flow, see `vignette("synth-quickstart", package = "Synth")`.

### Which inference method should I use?

| Question | Method | Function |
|---|---|---|
| How surprising is the effect vs. other units? | placebo MSPE-ratio rank | `mspe_test()` |
| Lightweight prediction band around the counterfactual | split-conformal | `synth_inference(method = "conformal")` |
| Same, assuming i.i.d. Gaussian residuals | parametric | `synth_inference(method = "parametric")` |
| Period-varying intervals decomposing in/out-of-sample uncertainty | CFPT | `scpi::scpi()` (separate package) |

This vignette covers:

1. Building the synthetic control with the new `synth_data()` wrapper.
2. Computing prediction intervals with `synth_inference()` (split-conformal and Gaussian-residual methods).
3. Running an in-space placebo test with `generate_placebos()`, `mspe_test()`, `mspe_plot()`, and `plot_placebos()`.

For background on the method itself, see `?synth` and the references at the end. For methods that further decompose in-sample and out-of-sample uncertainty (period-varying prediction intervals), see the **scpi** package on CRAN.

```{r setup}
library(Synth)
data(smoking)
```

## 1. Build the synthetic control

The `smoking` dataset (new in `Synth 1.2-0`) is the canonical state-level panel from Abadie, Diamond, and Hainmueller (2010): 39 US states from 1970 through 2000, with annual per-capita cigarette sales as the outcome and four covariates (log income, beer consumption, the share of the population aged 15-24, and the retail price of cigarettes). California raised its cigarette tax via Proposition 99 in late 1988, so 1989 is the first post-treatment year and the donor pool is the 38 other states.

```{r dataprep}
# Look up California's numeric unit id from its name
ca_id <- unique(smoking$state_id[smoking$state_name == "California"])

dp <- synth_data(
  panel          = smoking,
  outcome        = "cigsale",
  unit_col       = "state_id",
  time_col       = "year",
  treated        = ca_id,
  treatment_time = 1989,
  predictors     = c("lnincome", "age15to24", "retprice", "beer"),
  # Lagged outcomes used as additional predictors
  special_predictors = list(
    list("cigsale", 1988, "mean"),
    list("cigsale", 1980, "mean"),
    list("cigsale", 1975, "mean")
  ),
  unit_names_col = "state_name"
)

fit <- synth(dp, verbose = FALSE)
```

The fitted weights match the published Proposition 99 paper closely:

```{r weights}
# Five largest donor weights
top <- sort(fit$solution.w[, 1], decreasing = TRUE)
head(top, 5)
```

Synthetic California is roughly Utah + Nevada + Montana + Colorado + Connecticut: a convex combination of states that, in the pre-period, smoked at similar rates and had similar demographics.

```{r path-plot}
path.plot(synth.res = fit, dataprep.res = dp,
          Ylab = "Cigarette sales per capita (packs)",
          Xlab = "Year",
          Legend = c("California", "Synthetic California"),
          tr.intake = 1989)
```

## 2. Prediction intervals: `synth_inference()`

`synth_inference()` returns a prediction band around the synthetic counterfactual. Two methods are available.

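
To make the two half-width rules described in the subsections below concrete, here is a minimal base-R sketch on a made-up vector of pre-treatment gaps. All `toy_` names and numbers are illustrative assumptions, not output of `synth_inference()`, and the package's internals may treat edge cases (such as $k > n$) differently.

```{r sketch-half-widths}
# Hypothetical pre-treatment gaps (treated minus synthetic); made-up numbers.
toy_gap_pre <- c(1.2, -0.8, 0.5, -1.9, 2.4, -0.3, 0.9, -1.1, 1.6, -2.0)
toy_alpha   <- 0.10
toy_n       <- length(toy_gap_pre)

# Split-conformal: order statistic of |gap| at rank k = ceiling((n + 1) * (1 - alpha)).
# If k exceeds n, no finite order statistic exists, so the half-width is infinite.
toy_k <- ceiling((toy_n + 1) * (1 - toy_alpha))
toy_half_conf <- if (toy_k <= toy_n) sort(abs(toy_gap_pre))[toy_k] else Inf

# Parametric: Gaussian quantile times the pre-period standard deviation.
toy_half_par <- qnorm(1 - toy_alpha / 2) * sd(toy_gap_pre)

# A constant-width band is counterfactual +/- half-width in every period.
toy_counterfactual <- c(90, 88, 85)  # hypothetical post-period synthetic path
toy_band <- cbind(
  lower = toy_counterfactual - toy_half_conf,
  fit   = toy_counterfactual,
  upper = toy_counterfactual + toy_half_conf
)

c(k = toy_k, conformal = toy_half_conf, parametric = toy_half_par)
toy_band
```

With ten residuals and `alpha = 0.10` the rank is $k = 10$, so the conformal half-width in this toy example is simply the largest absolute residual (2.4).
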

### Split-conformal (default)

The half-width is the order statistic at rank $k = \lceil (n+1)(1-\alpha) \rceil$ of the absolute pre-treatment residuals, where $n$ is the number of pre-treatment periods (Chernozhukov, Wuthrich, and Zhu 2021):

```{r conformal}
inf_conf <- synth_inference(fit, dp, method = "conformal", alpha = 0.10)
inf_conf
```

```{r conformal-plot, fig.cap = "Split-conformal 90% band around the synthetic counterfactual."}
plot(inf_conf,
     Ylab = "Cigarette sales per capita (packs)",
     Xlab = "Year",
     Main = "California: 90% conformal band")
```

The band is constant in width by construction. It is finite-sample valid when the pre- and post-treatment residuals are exchangeable; with autocorrelated outcomes the nominal coverage holds only approximately.

### Parametric (Gaussian residuals)

The half-width is `qnorm(1 - alpha/2) * sd(gap_pre)`:

```{r parametric}
inf_par <- synth_inference(fit, dp, method = "parametric", alpha = 0.10)
inf_par$sigma_pre
```

```{r parametric-plot, fig.cap = "Parametric Gaussian 90% band."}
plot(inf_par,
     Ylab = "Cigarette sales per capita (packs)",
     Xlab = "Year",
     Main = "California: 90% parametric band")
```

### When to use which

| Method | Validity | Half-width |
|---|---|---|
| `conformal` | Finite-sample coverage of at least $1-\alpha$ under exchangeability of the residuals | order statistic of $\lvert \text{gap}_{\text{pre}} \rvert$ |
| `parametric` | Nominal coverage under i.i.d. Gaussian residuals, ignoring estimation error in `sd(gap_pre)` | `qnorm(1 - alpha/2) * sd(gap_pre)` |

Both produce constant-width bands. They do *not* separately quantify uncertainty about the synthetic weights. For period-varying intervals that decompose in-sample and out-of-sample uncertainty, see the **scpi** package.

## 3. Placebo inference

The classical Abadie-Diamond-Hainmueller approach treats each donor state as if *it* had received the intervention, refits `synth()`, and ranks the treated unit's post/pre MSPE ratio against the placebo distribution.

### Generate the placebos

`generate_placebos()` runs one refit per donor. With 38 donor states this takes about a minute on a laptop.

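
Before running the refits, it may help to see the ranking step in isolation. The base-R sketch below applies it to a made-up gap matrix. Everything named `toy_` is hypothetical, and the p-value convention used here (the treated unit counts toward its own rank, as in Abadie, Diamond, and Hainmueller 2010) is an assumption about what `mspe_test()` reports rather than a description of its internals.

```{r sketch-mspe-rank}
# Hypothetical gaps (actual minus synthetic): 4 pre-periods, then 2 post-periods.
# Column 1 plays the treated unit; the others stand in for placebo runs.
toy_gaps <- cbind(
  treated = c(1.0, -1.0,  1.0, -1.0, -10, -12),
  donor_a = c(2.0, -2.0,  2.0, -2.0,   3,  -3),
  donor_b = c(1.0,  1.0, -1.0, -1.0,   2,  -2),
  donor_c = c(3.0, -3.0,  3.0, -3.0,  -3,   3),
  donor_d = c(0.5, -0.5,  0.5, -0.5,   1,  -1)
)
toy_pre  <- 1:4
toy_post <- 5:6

# Mean squared prediction error per unit, before and after the intervention.
toy_mspe_pre  <- colMeans(toy_gaps[toy_pre,  , drop = FALSE]^2)
toy_mspe_post <- colMeans(toy_gaps[toy_post, , drop = FALSE]^2)
toy_ratio     <- toy_mspe_post / toy_mspe_pre

# One-sided rank p-value: share of units (treated included) whose ratio is
# at least as large as the treated unit's.
toy_pvalue <- mean(toy_ratio >= toy_ratio["treated"])

round(toy_ratio, 2)
toy_pvalue
```

Here the treated ratio (122) is the largest of the five, so the p-value is 1/5 = 0.2. Under the same convention, the smallest attainable p-value with 38 placebos is 1/39.
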

```{r placebos}
placebos <- generate_placebos(fit, dp, verbose = FALSE)
placebos
```

### Test

`mspe_test()` returns a one-sided p-value via the empirical rank of the treated unit's post/pre MSPE ratio:

```{r test}
test <- mspe_test(placebos)
test$mspe_ratio_treated
test$pvalue
test$n_valid_placebos
```

California's post/pre MSPE ratio is far in the right tail of the placebo distribution: only one or two of the 38 placebo states have ratios as extreme as California's, so the one-sided p-value is small.

### Plot the placebo distribution

```{r mspe-plot, fig.cap = "Distribution of post/pre MSPE ratios across placebos. California is highlighted."}
mspe_plot(placebos)
```

### Plot the placebo gaps

```{r placebo-gaps, fig.cap = "Treated gap (black) overlaid on placebo gaps (grey)."}
plot_placebos(placebos,
              Ylab = "Gap in cigarette sales per capita (packs)",
              Xlab = "Year",
              Main = "California and placebo gaps")
abline(v = 1989, lty = 3)
```

For interpretability, ADH 2010 recommend dropping placebos whose pre-period MSPE is much worse than the treated unit's (their synthetic controls couldn't fit them well in the first place):

```{r placebo-gaps-trimmed, fig.cap = "Same plot, restricted to placebos with reasonable pre-period fit."}
plot_placebos(placebos,
              mspe_threshold = 5,
              Ylab = "Gap in cigarette sales per capita (packs)",
              Xlab = "Year",
              Main = "California and placebo gaps (pre-MSPE <= 5x California)")
abline(v = 1989, lty = 3)
```

## 4. Combining prediction intervals and placebos

The two inference modes are orthogonal. A typical Proposition 99 results section would include both: a prediction band on the path plot to show the size of the post-period drop, and a placebo p-value to show that the drop is unusual relative to other states.


```{r combined}
data.frame(
  method = c("conformal 90%", "parametric 90%", "placebo p-value"),
  width_or_pvalue = c(2 * inf_conf$conformal_q,
                      2 * stats::qnorm(0.95) * inf_par$sigma_pre,
                      test$pvalue),
  notes = c("constant-width band",
            "constant-width Gaussian band",
            "one-sided MSPE-ratio rank")
)
```

## Alternative QP backends

By default, `synth()` solves the inner simplex-constrained QP for the unit weights via `kernlab::ipop`. As of `Synth 1.2-0`, two opt-in backends are also available:

```{r alt-backends, eval = FALSE}
# Convex-optimization backend (CVXR; default solver is OSQP)
fit_cvxr <- synth(dp, quadopt = "cvxr", verbose = FALSE)

# Autodiff/GPU backend (Frank-Wolfe simplex LS via torch)
fit_torch <- synth(dp, quadopt = "torch", verbose = FALSE)
```

Both are in `Suggests:` and only loaded when requested. The CVXR backend is a drop-in alternative useful when ipop converges slowly or you want a different solver pool (OSQP by default; SCS, ECOS, or MOSEK selectable via `cvxr_pars = list(solver = ...)`). The torch backend supports CPU, CUDA, and Apple MPS via `torch_pars = list(device = "cuda")` and is most useful on large panels. The first torch call in a session may require `torch::install_torch()` to download libtorch (~600MB).

For Python users with very large panels or who want autodiff at the estimator level, see [Apoorva Lal's `trex` package](https://github.com/apoorvalal/trex), a PyTorch-native implementation of the synthetic control family (SC, synthdid, matrix completion).

## See also

- `?synth_inference`, `?generate_placebos`, `?mspe_test`, `?mspe_plot`, `?plot_placebos`.
- The **SCtools** package provides additional placebo-based machinery (the function names in `Synth` match `SCtools` by design).
- The **scpi** package implements Cattaneo-Feng-Palomba-Titiunik prediction intervals, which decompose in-sample uncertainty about the synthetic weights and out-of-sample residual uncertainty.

## References

- Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies. *Journal of the American Statistical Association*, 105(490), 493-505.
- Chernozhukov, V., Wuthrich, K., and Zhu, Y. (2021). An exact and robust conformal inference method for counterfactual and synthetic controls. *Journal of the American Statistical Association*, 116(536), 1849-1864.
- Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2025). Uncertainty quantification in synthetic controls with staggered treatment adoption. *Review of Economics and Statistics*.