---
title: "Inference for Synthetic Control Estimators"
author: "Jens Hainmueller and Alexis Diamond"
date: "`r format(Sys.Date(), '%B %Y')`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Inference for Synthetic Control Estimators}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  dpi = 96
)
set.seed(1)
```

## Overview

This vignette walks through inference for the synthetic control method
as implemented in the `Synth` package, using the canonical example
from Abadie, Diamond, and Hainmueller (2010): California's
Proposition 99 cigarette tax of 1988. For the basic `dataprep()` →
`synth()` → `path.plot()` flow, see `vignette("synth-quickstart",
package = "Synth")`.

### Which inference method should I use?

| Goal | Method | Function |
|---|---|---|
| Gauge how unusual the effect is relative to other units | placebo MSPE-ratio rank | `mspe_test()` |
| Lightweight prediction band around the counterfactual | split-conformal | `synth_inference(method = "conformal")` |
| Same band, assuming i.i.d. Gaussian residuals | parametric | `synth_inference(method = "parametric")` |
| Period-varying intervals that decompose in-sample and out-of-sample uncertainty | Cattaneo-Feng-Palomba-Titiunik | `scpi::scpi()` (separate package) |

This vignette covers:

1. Building the synthetic control with the new `synth_data()` wrapper.
2. Computing prediction intervals with `synth_inference()`
   (split-conformal and Gaussian-residual methods).
3. Running an in-space placebo test with `generate_placebos()`,
   `mspe_test()`, `mspe_plot()`, and `plot_placebos()`.

For background on the method itself, see `?synth` and the references at
the end. For methods that further decompose in-sample and out-of-sample
uncertainty (period-varying prediction intervals), see the **scpi**
package on CRAN.

```{r setup}
library(Synth)
data(smoking)
```

## 1. Build the synthetic control

The `smoking` dataset (new in `Synth 1.2-0`) is the canonical
state-level panel from Abadie, Diamond, and Hainmueller (2010): 39 US
states from 1970 through 2000, with annual per-capita cigarette
sales as the outcome and four covariates (log income, beer
consumption, the share of the population aged 15-24, and the retail
price of cigarettes). California raised its cigarette tax via
Proposition 99 in late 1988, so 1989 is the first post-treatment
year and the donor pool is the 38 other states.

```{r dataprep}
ca_id <- unique(smoking$state_id[smoking$state_name == "California"])

dp <- synth_data(
  panel              = smoking,
  outcome            = "cigsale",
  unit_col           = "state_id",
  time_col           = "year",
  treated            = ca_id,
  treatment_time     = 1989,
  predictors         = c("lnincome", "age15to24", "retprice", "beer"),
  special_predictors = list(
    list("cigsale", 1988, "mean"),
    list("cigsale", 1980, "mean"),
    list("cigsale", 1975, "mean")
  ),
  unit_names_col     = "state_name"
)

fit <- synth(dp, verbose = FALSE)
```

The fitted donor weights closely match those reported in the published Proposition 99 paper:

```{r weights}
top <- sort(fit$solution.w[, 1], decreasing = TRUE)
head(top, 5)
```

Synthetic California is a weighted average of a handful of states; in
ADH (2010) these are Utah, Nevada, Montana, Colorado, and Connecticut.
It is a convex combination of states that, in the pre-period, had
per-capita cigarette sales and predictor values similar to California's.

```{r path-plot}
path.plot(synth.res = fit, dataprep.res = dp,
          Ylab = "Cigarette sales per capita (packs)",
          Xlab = "Year",
          Legend = c("California", "Synthetic California"),
          tr.intake = 1989)
```
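The synthetic path in the plot is a weighted average of the donors' outcomes, and the gap is the treated outcome minus that average. A minimal base-R sketch of this arithmetic, using made-up numbers: the donor matrix `Y0`, the treated vector `Y1`, and the weights `w` are hypothetical and are not extracted from `fit` or `dp`.

```{r sketch-synthetic-path}
# Toy numbers, NOT the smoking panel: 3 hypothetical donors over 6 periods,
# where periods 1-4 are pre-treatment and 5-6 are post-treatment.
Y0 <- cbind(d1 = c(120, 118, 115, 113, 110, 108),
            d2 = c(100,  99,  98,  96,  95,  93),
            d3 = c( 90,  91,  89,  88,  86,  85))
Y1 <- c(110, 108, 106, 104, 95, 90)    # treated unit's observed outcome
w  <- c(d1 = 0.5, d2 = 0.3, d3 = 0.2)  # illustrative convex weights

# Weights must lie on the simplex: nonnegative and summing to one.
stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))

y_synth <- drop(Y0 %*% w)  # synthetic counterfactual: weighted donor average
gap     <- Y1 - y_synth    # treated minus synthetic, period by period
round(gap, 2)
```

Here the gap stays within about 2 packs in periods 1-4 and turns clearly negative in periods 5-6, which is the pattern a treatment effect would produce.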

## 2. Prediction intervals: `synth_inference()`

`synth_inference()` returns a prediction band around the synthetic
counterfactual. Two methods are available.

### Split-conformal (default)

The half-width is the $k$-th smallest absolute pre-treatment residual,
with $k = \lceil (n+1)(1-\alpha) \rceil$ and $n$ the number of
pre-treatment periods (Chernozhukov, Wuthrich, and Zhu 2021):

```{r conformal}
inf_conf <- synth_inference(fit, dp,
                            method = "conformal", alpha = 0.10)
inf_conf
```

```{r conformal-plot, fig.cap = "Split-conformal 90% band around the synthetic counterfactual."}
plot(inf_conf,
     Ylab = "Cigarette sales per capita (packs)",
     Xlab = "Year",
     Main = "California: 90% conformal band")
```

The band has constant width by construction. It is valid in finite
samples when the post-treatment residuals are exchangeable with the
pre-treatment ones; with autocorrelated outcomes, the nominal coverage
is only approximate.
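The order-statistic rule can be written in a few lines of base R. This is a minimal sketch on made-up gaps, with $n$ the number of pre-treatment gaps; `conformal_halfwidth()` is a hypothetical helper rather than the code inside `synth_inference()`, and returning `Inf` when $k > n$ is our own assumption about how to handle too few pre-periods.

```{r sketch-conformal-q}
# Made-up pre-treatment gaps (treated minus synthetic); not taken from `fit`.
gap_pre <- c(1.2, -0.8, 2.5, -1.9, 0.4, -3.1, 1.7, 0.9, -0.2, 2.2)
alpha   <- 0.10

# Hypothetical helper: k-th smallest |gap|, k = ceiling((n + 1) * (1 - alpha)).
conformal_halfwidth <- function(gap_pre, alpha) {
  n <- length(gap_pre)
  k <- ceiling((n + 1) * (1 - alpha))
  if (k > n) return(Inf)  # assumption: too few pre-periods -> unbounded band
  sort(abs(gap_pre))[k]
}

q_conf <- conformal_halfwidth(gap_pre, alpha)

# Band around two hypothetical post-period counterfactual values.
y_synth_post <- c(95.0, 92.5)
cbind(lower = y_synth_post - q_conf, upper = y_synth_post + q_conf)
```

With $n = 10$ and $\alpha = 0.10$, $k = \lceil 9.9 \rceil = 10$, so the half-width is the largest absolute gap, 3.1. Short pre-periods therefore push the conformal band toward the worst pre-treatment fit.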

### Parametric (Gaussian residuals)

The half-width is `qnorm(1 - alpha/2) * sd(gap_pre)`, where `gap_pre` is
the vector of pre-treatment gaps (treated minus synthetic):

```{r parametric}
inf_par <- synth_inference(fit, dp,
                           method = "parametric", alpha = 0.10)
inf_par$sigma_pre
```

```{r parametric-plot, fig.cap = "Parametric Gaussian 90% band."}
plot(inf_par,
     Ylab = "Cigarette sales per capita (packs)",
     Xlab = "Year",
     Main = "California: 90% parametric band")
```
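A minimal sketch of the Gaussian formula on made-up gaps, with the conformal order statistic computed on the same values for comparison. All numbers are hypothetical, not output from `synth_inference()`.

```{r sketch-parametric}
# Made-up pre-treatment gaps (treated minus synthetic).
gap_pre <- c(1.2, -0.8, 2.5, -1.9, 0.4, -3.1, 1.7, 0.9, -0.2, 2.2)
alpha   <- 0.10

sigma_pre <- sd(gap_pre)                       # sample SD of the pre-period gaps
halfwidth <- qnorm(1 - alpha / 2) * sigma_pre  # Gaussian half-width

# Conformal half-width on the same gaps, for comparison.
n <- length(gap_pre)
k <- ceiling((n + 1) * (1 - alpha))
c(parametric = halfwidth, conformal = sort(abs(gap_pre))[k])
```

On these toy gaps the two half-widths are similar, about 2.98 for the Gaussian rule versus 3.1 for the conformal rule.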

### When to use which

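| Method | Coverage guarantee | Half-width |
|---|---|---|
| `conformal` | Finite-sample valid if post-period residuals are exchangeable with pre-period ones | $k$-th order statistic of $\lvert \text{gap}_{\text{pre}} \rvert$ |
| `parametric` | Approximate under i.i.d. Gaussian residuals ($\sigma$ is estimated, not known) | `qnorm(1 - alpha/2) * sd(gap_pre)` |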

Both produce constant-width bands. They do *not* separately quantify
uncertainty about the synthetic weights. For period-varying intervals
that decompose in-sample and out-of-sample uncertainty, see the
**scpi** package.

## 3. Placebo inference

The classical Abadie-Diamond-Hainmueller approach treats each donor
state in turn as if *it* had received the intervention, refits
`synth()`, and ranks the treated unit's ratio of post- to
pre-treatment mean squared prediction error (MSPE) against the
resulting placebo distribution.

### Generate the placebos

`generate_placebos()` runs one refit per donor. With 38 donor states
this takes about a minute on a laptop.

```{r placebos}
placebos <- generate_placebos(fit, dp, verbose = FALSE)
placebos
```

### Test

`mspe_test()` returns a one-sided p-value via the empirical rank of
the treated unit's post/pre MSPE ratio:

```{r test}
test <- mspe_test(placebos)
test$mspe_ratio_treated
test$pvalue
test$n_valid_placebos
```

California's post/pre MSPE ratio lies far in the right tail of the
placebo distribution. In ADH (2010) it is the largest of all 39
states, which gives the smallest p-value attainable with 38 placebos,
1/39 (about 0.026).
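The ratio and rank behind `mspe_test()` can be sketched on toy gaps. This assumes the usual ADH convention, in which the p-value is the share of all units, treated unit included, whose post/pre MSPE ratio is at least the treated unit's; `mspe_test()` may differ in details such as tie handling or excluded placebos, so treat this as an illustration rather than its implementation.

```{r sketch-mspe-rank}
# Hypothetical gaps for one treated unit and three placebos over
# 4 pre-periods and 2 post-periods (not the Proposition 99 results).
pre  <- 1:4
post <- 5:6
gaps <- rbind(
  treated = c( 1.0, -1.0,  0.5, -0.5, -10.0, -12.0),
  p1      = c( 2.0, -1.5,  1.0, -2.0,   3.0,  -2.0),
  p2      = c(-0.5,  0.5, -1.0,  1.0,   1.5,   1.0),
  p3      = c( 3.0, -3.0,  2.5, -2.0,   4.0,   5.0)
)

mspe_of <- function(x) mean(x^2)  # mean squared prediction error of a gap path
ratio   <- apply(gaps, 1, function(g) mspe_of(g[post]) / mspe_of(g[pre]))

# One-sided rank p-value: share of all units (treated included) whose
# ratio is at least as large as the treated unit's.
pval <- mean(ratio >= ratio["treated"])
pval
```

With four units and the treated ratio the largest, the p-value is 1/4; with 39 states the floor is 1/39.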

### Plot the placebo distribution

```{r mspe-plot, fig.cap = "Distribution of post/pre MSPE ratios across placebos. California is highlighted."}
mspe_plot(placebos)
```

### Plot the placebo gaps

```{r placebo-gaps, fig.cap = "Treated gap (black) overlaid on placebo gaps (grey)."}
plot_placebos(placebos,
              Ylab = "Gap in cigarette sales per capita (packs)",
              Xlab = "Year",
              Main = "California and placebo gaps")
abline(v = 1989, lty = 3)
```

ADH (2010) also report versions of this plot that drop placebos whose
pre-period MSPE is several times larger than California's: a synthetic
control that cannot fit a state before the intervention says little
about that state's post-period gap. Here we use a 5x cutoff:

```{r placebo-gaps-trimmed, fig.cap = "Same plot, restricted to placebos with reasonable pre-period fit."}
plot_placebos(placebos,
              mspe_threshold = 5,
              Ylab = "Gap in cigarette sales per capita (packs)",
              Xlab = "Year",
              Main = "California and placebo gaps (pre-MSPE <= 5x California)")
abline(v = 1989, lty = 3)
```
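The trimming rule can be sketched as a simple filter on pre-period MSPE. This assumes that `mspe_threshold` is a multiple of the treated unit's pre-period MSPE, as the plot title suggests; the MSPE values below are made up.

```{r sketch-mspe-trim}
# Hypothetical pre-period MSPEs for the treated unit and four placebos.
pre_mspe  <- c(treated = 2.0, p1 = 1.5, p2 = 9.0, p3 = 12.0, p4 = 40.0)
threshold <- 5  # keep placebos with pre-MSPE at most 5x the treated unit's

placebo_mspe <- pre_mspe[names(pre_mspe) != "treated"]
keep <- placebo_mspe <= threshold * pre_mspe[["treated"]]
names(placebo_mspe)[keep]  # placebos that survive the cutoff
```

With a treated pre-MSPE of 2, the cutoff is 10, so `p1` and `p2` are kept and the poorly fitted `p3` and `p4` are dropped.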

## 4. Combining prediction intervals and placebos

The two inference modes are complementary. A typical Proposition 99
results section would report both: a prediction band on the path plot
to convey the size of the post-period drop, and a placebo p-value to
show that the drop is unusual relative to other states.

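```{r combined}
# Full widths of the two 90% bands (2 x half-width) next to the placebo
# p-value. The last entry is a probability, not a width.
data.frame(
  method          = c("conformal 90%", "parametric 90%", "placebo p-value"),
  width_or_pvalue = c(2 * inf_conf$conformal_q,
                      # alpha = 0.10, so the Gaussian quantile is qnorm(0.95)
                      2 * stats::qnorm(0.95) * inf_par$sigma_pre,
                      test$pvalue),
  notes           = c("constant-width band",
                      "constant-width Gaussian band",
                      "one-sided MSPE-ratio rank")
)
```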

## Alternative QP backends

By default, `synth()` solves the inner simplex-constrained QP for the
unit weights via `kernlab::ipop`. As of `Synth 1.2-0`, two opt-in
backends are also available:

```{r alt-backends, eval = FALSE}
# Convex-optimization backend (CVXR; default solver is OSQP)
fit_cvxr  <- synth(dp, quadopt = "cvxr",  verbose = FALSE)

# Autodiff/GPU backend (Frank-Wolfe simplex LS via torch)
fit_torch <- synth(dp, quadopt = "torch", verbose = FALSE)
```

Both backends are listed in `Suggests:` and are loaded only when
requested. The CVXR backend is a drop-in alternative, useful when
`ipop` converges slowly or when you want a different solver pool (OSQP
by default; SCS, ECOS, or MOSEK can be selected via
`cvxr_pars = list(solver = ...)`). The torch backend supports CPU,
CUDA, and Apple MPS devices, selected with, for example,
`torch_pars = list(device = "cuda")`, and is most useful on large
panels. Before the first torch call on a new machine, you may need to
run `torch::install_torch()` to download libtorch (about 600 MB).
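The inner problem that every backend solves is a least-squares fit of the treated unit's predictors over the simplex of donor weights. As an illustration of the Frank-Wolfe idea mentioned for the torch backend, here is a minimal base-R sketch, not the package's code: `frank_wolfe_simplex()` is a hypothetical helper, the predictor weights are taken as the identity $V$ (a diagonal $V$ with entries `v` could be handled by multiplying the rows of `X0` and `x1` by `sqrt(v)`), and the data are simulated.

```{r sketch-frank-wolfe}
# Frank-Wolfe for simplex-constrained least squares:
#   minimize ||x1 - X0 %*% w||^2  subject to  w >= 0, sum(w) = 1
frank_wolfe_simplex <- function(X0, x1, iters = 500, tol = 1e-10) {
  J <- ncol(X0)
  w <- rep(1 / J, J)                        # start at the simplex barycentre
  for (iter in seq_len(iters)) {
    r <- drop(X0 %*% w) - x1                # current residual
    g <- 2 * drop(crossprod(X0, r))         # gradient of the objective
    j <- which.min(g)                       # best vertex e_j of the simplex
    d <- -w
    d[j] <- d[j] + 1                        # search direction e_j - w
    fw_gap <- -sum(g * d)                   # Frank-Wolfe duality gap (>= 0)
    if (fw_gap < tol) break
    Xd <- drop(X0 %*% d)
    gamma <- min(1, fw_gap / (2 * sum(Xd^2)))  # exact line search on a quadratic
    w <- w + gamma * d                      # convex step keeps w on the simplex
  }
  w
}

# Simulated example: 8 predictors, 5 donors; the treated unit is an exact
# convex combination of the first two donors.
set.seed(42)
X0     <- matrix(rnorm(8 * 5), nrow = 8)
w_true <- c(0.6, 0.4, 0, 0, 0)
x1     <- drop(X0 %*% w_true)

w_hat <- frank_wolfe_simplex(X0, x1, iters = 20000)
round(w_hat, 3)
sum((x1 - X0 %*% w_hat)^2)  # residual sum of squares, close to zero
```

Each step moves toward a single simplex vertex, so the iterates stay nonnegative and sum to one without any projection. Plain Frank-Wolfe converges sublinearly, which is why the example runs many iterations.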

Python users who work with very large panels, or who want autodiff at
the estimator level, may prefer [Apoorva Lal's `trex`
package](https://github.com/apoorvalal/trex), a PyTorch-native
implementation of the synthetic control family (SC, synthdid, matrix
completion).

## See also

- `?synth_inference`, `?generate_placebos`, `?mspe_test`,
  `?mspe_plot`, `?plot_placebos`.
- The **SCtools** package provides additional placebo-based machinery
  (the function names in `Synth` match `SCtools` by design).
- The **scpi** package implements Cattaneo-Feng-Palomba-Titiunik
  prediction intervals, which decompose in-sample uncertainty about
  the synthetic weights and out-of-sample residual uncertainty.

## References

- Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic
  control methods for comparative case studies. *Journal of the
  American Statistical Association*, 105(490), 493-505.
- Chernozhukov, V., Wuthrich, K., and Zhu, Y. (2021). An exact and
  robust conformal inference method for counterfactual and synthetic
  controls. *Journal of the American Statistical Association*,
  116(536), 1849-1864.
- Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2025).
  Uncertainty quantification in synthetic controls with staggered
  treatment adoption. *Review of Economics and Statistics*.