| Title: | Synthetic Control Group Method for Comparative Case Studies |
|---|---|
| Description: | Implements the synthetic control group method for comparative case studies as described in Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010, 2011, 2014). The synthetic control method allows for effect estimation in settings where a single unit (a state, country, firm, etc.) is exposed to an event or intervention. It provides a data-driven procedure to construct synthetic control units based on a weighted combination of comparison units that approximates the characteristics of the unit that is exposed to the intervention. A combination of comparison units often provides a better comparison for the unit exposed to the intervention than any comparison unit alone. |
| Authors: | Jens Hainmueller [aut, cre], Alexis Diamond [aut] |
| Maintainer: | Jens Hainmueller <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.2-0 |
| Built: | 2026-05-26 22:11:03 UTC |
| Source: | https://github.com/j-hai/synth |
The dataset contains information from 1955–1997 on 17 Spanish regions. It was used by Abadie and Gardeazabal (2003), which studied the economic effects of conflict, using the terrorist conflict in the Basque Country as a case study. This paper used a combination of other Spanish regions to construct a synthetic control region resembling many relevant economic characteristics of the Basque Country before the onset of political terrorism in the 1970s. The data contains per-capita GDP (the outcome variable), as well as population density, sectoral production, investment, and human capital (the predictor variables) for the relevant years, and is used here to demonstrate the implementation of the synthetic control method with the synth library.
data(basque)
A panel data frame made up of 18 units: 1 treated unit (no. 17, the Basque Country) and 16 control regions (no. 2-16, 18). Region no. 1 is the average for the whole of Spain. There is 1 outcome variable (gdpcap) and 13 predictor variables (6 sectoral production shares, 5 educational attainment categories, population density, and the investment rate). Region numbers and names are stored in regionno and regionname. The data cover 43 annual time periods (1955-1997). All columns have self-explanatory names. For reference, the variables are:
regionno
: Region Number.
regionname
: Region Name.
year
: Year.
gdpcap
: real GDP per capita (in 1986 USD, thousands).
sec.agriculture
: production in agriculture, forestry, and fishing sector as a percentage of total production.
sec.energy
: production in energy and water sector as a percentage of total production.
sec.industry
: production in industrial sector as a percentage of total production.
sec.construction
: production in construction and engineering sector as a percentage of total production.
sec.services.venta
: production in marketable services sector as a percentage of total production.
sec.services.nonventa
: production in nonmarketable services sector as a percentage of total production.
school.illit
: number of illiterate persons.
school.prim
: number of persons with primary education or without studies.
school.med
: number of persons with some high school education.
school.high
: number of persons with high school degree.
school.post.high
: number of persons with tertiary education.
popdens
: population density (persons per square kilometer).
invest
: gross total investment as a share of GDP.
Abadie, A. and Gardeazabal, J. (2003) Economic Costs of Conflict: A Case Study of the Basque Country American Economic Review 93 (1) 113–132.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
An internal function that collects the results from the different optimization methods run by optimx. It stores the parameter and function values and extracts the results for the best performing method (minimum or maximum).
collect.optimx(res, opt = "min")
| Argument | Description |
|---|---|
| res | Output from a call to optimx(). |
| opt | Either "min" or "max", to extract the results for the method that obtained the minimum or maximum function value across the methods. |

| Component | Description |
|---|---|
| out.list | Data frame with results from the different methods. |
| par | Parameter values from the method that attained the minimum/maximum across the methods. |
| value | Function value from the method that attained the minimum/maximum across the methods. |
Jens Hainmueller
Also see optimx.
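As an illustration of the selection step described above, the following base R sketch picks the best-performing row from a small results table. The data frame `res_toy`, its column names (`p1`, `p2`, `value`), and the helper `pick_best()` are hypothetical stand-ins; they are not the actual structure returned by optimx() or the internals of collect.optimx.

```r
# Hypothetical results table: one row per optimization method.
res_toy <- data.frame(
  p1    = c(0.20, 0.25, 0.40),
  p2    = c(0.80, 0.75, 0.60),
  value = c(1.35, 1.10, 2.05),
  row.names = c("Nelder-Mead", "BFGS", "CG")
)

# Hypothetical helper: return the parameters and function value of the
# method with the smallest (opt = "min") or largest (opt = "max") value.
pick_best <- function(res, opt = "min") {
  stopifnot(opt %in% c("min", "max"))
  idx <- if (opt == "min") which.min(res$value) else which.max(res$value)
  par_cols <- setdiff(names(res), "value")
  list(
    out.list = res,                       # all methods, for inspection
    par      = unlist(res[idx, par_cols]),
    value    = res$value[idx]
  )
}

best <- pick_best(res_toy, opt = "min")
best$par
best$value
```

The returned list mirrors the three components documented above (out.list, par, value), which is a design choice of this sketch rather than a statement about the package code.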
The dataprep function takes a standard panel dataset and produces a list of data objects necessary for running synth and other Synth package functions to construct synthetic control groups according to the methods outlined in Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010, 2011, 2014) (see references and example).
The user supplies a data frame ("foo") and chooses the predictors, the special predictors (explained below), the operators that act upon these predictors, and the dependent variable. The user also identifies the columns holding unit numbers and time periods (and unit names, when available), the treated unit, the control units, the time period over which to select the predictors, the time period over which to optimize, and the time period over which outcome data should be plotted.
The output of dataprep contains a list of matrices. This list object can be directly loaded into synth.
dataprep(foo = NULL, predictors = NULL, predictors.op = "mean", special.predictors = NULL, dependent = NULL, unit.variable = NULL, time.variable = NULL, treatment.identifier = NULL, controls.identifier = NULL, time.predictors.prior = NULL, time.optimize.ssr = NULL, time.plot = time.optimize.ssr, unit.names.variable = NA)
| Argument | Description |
|---|---|
| foo | The data frame with the panel data. |
| predictors | A vector of column numbers or column-name character strings that identifies the predictors' columns. All predictors have to be numeric. |
| predictors.op | A character string identifying the method (operator) to be used on the predictors. Default is "mean". na.rm = TRUE is hardwired into the code. See *Details*. |
| special.predictors | A list object identifying additional numeric predictors and their associated pre-treatment years and operators (analogous to “predictors.op” above). See *Details*. |
| dependent | A scalar identifying the column number or column-name character string that corresponds to the numeric dependent (outcome) variable. |
| unit.variable | A scalar identifying the column number or column-name character string associated with unit numbers. The unit.variable has to be numeric. |
| time.variable | A scalar identifying the column number or column-name character string associated with period (time) data. The time.variable has to be numeric. |
| treatment.identifier | A scalar identifying the “unit.variable” number or a character string giving the “unit.name” of the treated unit. If a character is supplied, a unit.names.variable also has to be supplied to identify the treated unit. |
| controls.identifier | A vector identifying the “unit.variable” numbers or a vector of character strings giving the “unit.name”s of the control units. If characters are supplied, a unit.names.variable also has to be supplied to identify the control units. |
| time.predictors.prior | A numeric vector identifying the pre-treatment periods over which the values for the outcome predictors should be averaged. |
| time.optimize.ssr | A numeric vector identifying the periods of the dependent variable over which the loss function should be minimized (i.e., the periods over which the mean squared prediction error (MSPE), which is proportional to the sum of squared residuals between the treated unit and the synthetic control unit, is minimized). |
| time.plot | A vector identifying the periods over which results are to be plotted with |
| unit.names.variable | A scalar or column-name character string identifying the column with the names of the units. This variable has to be of mode character. |
The predictors.op argument is a character string that names the function (e.g., "mean", "median", etc.) to be applied to the predictors over the given time period.
The special.predictors argument is a list object that contains one or more lists of length 3. The required components of each of these lists are:
(a) a scalar column number associated with that predictor
(b) a vector of the desired time-period number(s) (e.g., 1998:2003)
(c) a character string identifying the name of the operation to be applied (e.g., "mean", "median", etc.)
For example, special.predictors <- list(list(x1, 1990:2000, "mean"), list(x2, 1980:1983, "median"), list(x3, 1980, "mean"))
indicates that predictor x1 should be used with its values averaged over periods 1990:2000; predictor x2 should be used with its median value over periods 1980:1983; and x3 should be used with its value from period 1980 only.
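To make the (column, periods, operator) triple concrete, here is a minimal base R sketch of how one such specification could be evaluated on panel data. The toy data frame `panel` and the helper `apply_special()` are hypothetical and only illustrate the idea; they are not the code dataprep runs internally.

```r
# Toy panel for one unit: a hypothetical predictor x1 observed 1990-1994.
panel <- data.frame(
  year = 1990:1994,
  x1   = c(2, 4, NA, 8, 10)
)

# Hypothetical helper: evaluate one special predictor, given as
# list(column, periods, operator), on the rows in the requested periods.
apply_special <- function(data, spec, time.variable = "year") {
  column  <- spec[[1]]
  periods <- spec[[2]]
  op      <- spec[[3]]
  values  <- data[data[[time.variable]] %in% periods, column]
  # Call the operator by name, dropping missing values.
  do.call(op, list(values, na.rm = TRUE))
}

specs <- list(
  list("x1", 1990:1993, "mean"),    # mean of 2, 4, 8 (the NA is dropped)
  list("x1", 1990:1994, "median"),  # median of 2, 4, 8, 10
  list("x1", 1990, "mean")          # single period: the value itself
)
special_values <- sapply(specs, apply_special, data = panel)
special_values
```

Passing the operator as a character string keeps the specification purely declarative, which is why `do.call()` is used here to look the function up by name.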
| Component | Description |
|---|---|
| X1 | matrix of treated predictor data. nrows = number of predictors and (possibly) special predictors. ncols = one. |
| X0 | matrix of controls' predictor data. nrows = number of predictors and (possibly) special predictors. ncols = number of control units. |
| Z1 | matrix of treated outcome data for the pre-treatment periods over which MSPE is to be minimized. nrows = number of pre-treatment periods. ncols = one. |
| Z0 | matrix of controls' outcome data for the pre-treatment periods over which MSPE is to be minimized. nrows = number of pre-treatment periods. ncols = number of control units. |
| Y1plot | matrix of outcome data for treated unit to be used for results plotting. nrows = number of periods. ncols = one. |
| Y0plot | matrix of outcome data for control units to be used for results plotting. nrows = number of periods. ncols = number of control units. |
| names.and.numbers | dataframe with two columns showing all unit numbers and corresponding unit names. |
| tag | a list of all arguments in initial function call. |
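The shapes of the outcome matrices listed above can be illustrated with a small base R sketch. The toy data frame `toy`, its column names, and the unit numbering are assumptions made for this example; the sketch only mimics the documented dimensions of Z1 and Z0 and does not reproduce dataprep itself.

```r
# Toy long-format panel: 3 units (1 = treated, 2 and 3 = controls), 4 years.
toy <- data.frame(
  unit = rep(1:3, each = 4),
  year = rep(2001:2004, times = 3),
  Y    = c(10, 11, 12, 13,   # unit 1
            9, 10, 11, 12,   # unit 2
           12, 12, 13, 15)   # unit 3
)

treated  <- 1
controls <- c(2, 3)
pre      <- 2001:2003        # periods over which the fit would be optimized

# Reshape the outcome into a periods-by-units matrix
# (each cell holds exactly one observation, so mean() just returns it).
Y_wide <- with(toy, tapply(Y, list(year, unit), mean))

Z1 <- Y_wide[as.character(pre), as.character(treated),  drop = FALSE]
Z0 <- Y_wide[as.character(pre), as.character(controls), drop = FALSE]

dim(Z1)  # pre-treatment periods x 1
dim(Z0)  # pre-treatment periods x number of control units
```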
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
Abadie A, Diamond A, Hainmueller J (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association 105 (490) 493–505.
Abadie, A. and Gardeazabal, J. (2003) Economic Costs of Conflict: A Case Study of the Basque Country American Economic Review 93 (1) 113–132.
synth, gaps.plot, path.plot, synth.tab
    ## The usual sequence of commands is:
    ## 1. dataprep() for matrix-extraction
    ## 2. synth() for the construction of the synthetic control group
    ## 3. synth.tab(), gaps.plot(), and path.plot() to summarize the results
    ## Below we provide two examples.

    ## First Example: Toy panel dataset

    # load data
    data(synth.data)

    # create matrices from panel data that provide inputs for synth()
    dataprep.out <- dataprep(
      foo = synth.data,
      predictors = c("X1", "X2", "X3"),
      predictors.op = "mean",
      dependent = "Y",
      unit.variable = "unit.num",
      time.variable = "year",
      special.predictors = list(
        list("Y", 1991, "mean"),
        list("Y", 1985, "mean"),
        list("Y", 1980, "mean")
      ),
      treatment.identifier = 7,
      controls.identifier = c(29, 2, 13, 17, 32, 38),
      time.predictors.prior = c(1984:1989),
      time.optimize.ssr = c(1984:1990),
      unit.names.variable = "name",
      time.plot = 1984:1996
    )

    ## run the synth command to identify the weights
    ## that create the best possible synthetic
    ## control unit for the treated.
    synth.out <- synth(dataprep.out)

    ## there are two ways to summarize the results
    ## we can either access the output from synth.out directly
    round(synth.out$solution.w, 2)
    # contains the unit weights or
    synth.out$solution.v
    ## contains the predictor weights.

    ## the output from synth opt
    ## can be flexibly combined with
    ## the output from dataprep to
    ## compute other quantities of interest
    ## for example, the period by period
    ## discrepancies between the
    ## treated unit and its synthetic control unit
    ## can be computed by typing
    gaps <- dataprep.out$Y1plot - (
      dataprep.out$Y0plot %*% synth.out$solution.w
    )
    gaps

    ## also there are three convenience functions to summarize results.
    ## to get summary tables for all information
    ## (V and W weights plus balance btw.
    ## treated and synthetic control) use the
    ## synth.tab() command
    synth.tables <- synth.tab(
      dataprep.res = dataprep.out,
      synth.res = synth.out
    )
    print(synth.tables)

    ## to get summary plots for outcome trajectories
    ## of the treated and the synthetic control unit use the
    ## path.plot() and the gaps.plot() commands

    ## plot in levels (treated and synthetic)
    path.plot(dataprep.res = dataprep.out, synth.res = synth.out)

    ## plot the gaps (treated - synthetic)
    gaps.plot(dataprep.res = dataprep.out, synth.res = synth.out)

    ## Second example: The economic impact of terrorism in the
    ## Basque country using data from Abadie and Gardeazabal (2003)
    ## see JSS paper in the references details
    data(basque)

    # dataprep: prepare data for synth
    dataprep.out <- dataprep(
      foo = basque,
      predictors = c("school.illit", "school.prim", "school.med",
                     "school.high", "school.post.high", "invest"),
      predictors.op = c("mean"),
      dependent = c("gdpcap"),
      unit.variable = c("regionno"),
      time.variable = c("year"),
      special.predictors = list(
        list("gdpcap", 1960:1969, c("mean")),
        list("sec.agriculture", seq(1961, 1969, 2), c("mean")),
        list("sec.energy", seq(1961, 1969, 2), c("mean")),
        list("sec.industry", seq(1961, 1969, 2), c("mean")),
        list("sec.construction", seq(1961, 1969, 2), c("mean")),
        list("sec.services.venta", seq(1961, 1969, 2), c("mean")),
        list("sec.services.nonventa", seq(1961, 1969, 2), c("mean")),
        list("popdens", 1969, c("mean"))
      ),
      treatment.identifier = 17,
      controls.identifier = c(2:16, 18),
      time.predictors.prior = c(1964:1969),
      time.optimize.ssr = c(1960:1969),
      unit.names.variable = c("regionname"),
      time.plot = c(1955:1997)
    )

    # 1. combine highest and second highest
    # schooling category and eliminate highest category
    dataprep.out$X1["school.high", ] <-
      dataprep.out$X1["school.high", ] + dataprep.out$X1["school.post.high", ]
    dataprep.out$X1 <- as.matrix(
      dataprep.out$X1[-which(rownames(dataprep.out$X1) == "school.post.high"), ]
    )
    dataprep.out$X0["school.high", ] <-
      dataprep.out$X0["school.high", ] + dataprep.out$X0["school.post.high", ]
    dataprep.out$X0 <-
      dataprep.out$X0[-which(rownames(dataprep.out$X0) == "school.post.high"), ]

    # 2. make total and compute shares for the schooling categories
    lowest <- which(rownames(dataprep.out$X0) == "school.illit")
    highest <- which(rownames(dataprep.out$X0) == "school.high")
    dataprep.out$X1[lowest:highest, ] <-
      (100 * dataprep.out$X1[lowest:highest, ]) /
      sum(dataprep.out$X1[lowest:highest, ])
    dataprep.out$X0[lowest:highest, ] <-
      100 * scale(dataprep.out$X0[lowest:highest, ],
                  center = FALSE,
                  scale = colSums(dataprep.out$X0[lowest:highest, ]))

    # run synth
    synth.out <- synth(data.prep.obj = dataprep.out)

    # Get result tables
    synth.tables <- synth.tab(
      dataprep.res = dataprep.out,
      synth.res = synth.out
    )

    # results tables:
    print(synth.tables)

    # plot results:
    # path
    path.plot(synth.res = synth.out,
              dataprep.res = dataprep.out,
              Ylab = c("real per-capita GDP (1986 USD, thousand)"),
              Xlab = c("year"),
              Ylim = c(0, 13),
              Legend = c("Basque country", "synthetic Basque country"))

    ## gaps
    gaps.plot(synth.res = synth.out,
              dataprep.res = dataprep.out,
              Ylab = c("gap in real per-capita GDP (1986 USD, thousand)"),
              Xlab = c("year"),
              Ylim = c(-1.5, 1.5))

    ## To create the placebo studies simply reassign
    ## the intervention to other units or times (see references for details)
Loss function for the nested optimization of W and V weights used for constructing synthetic control groups according to the methods outlined in Abadie and Gardeazabal (2003) and Abadie, Diamond, Hainmueller (2010, 2011, 2014) (see references). This function is called by synth internally, and should not be called manually by a normal user.
fn.V(variables.v = stop("variables.v missing"), X0.scaled = stop("X0.scaled missing"), X1.scaled = stop("X1.scaled missing"), Z0 = stop("Z0 missing"), Z1 = stop("Z1 missing"), margin.ipop = 5e-04, sigf.ipop = 5, bound.ipop = 10, quadopt = "ipop", cvxr_pars = list(), torch_pars = list())
| Argument | Description |
|---|---|
| variables.v | A 1 by k vector of v weights. |
| X0.scaled | matrix of controls' predictor data. nrows = number of predictors and (possibly) special predictors. ncols = number of control units. |
| X1.scaled | matrix of treated predictor data. nrows = number of predictors and (possibly) special predictors. ncols = one. |
| Z0 | matrix of controls' outcome data for the pre-treatment periods over which MSPE is to be minimized. nrows = number of pre-treatment periods. ncols = number of control units. |
| Z1 | matrix of treated outcome data for the pre-treatment periods over which MSPE is to be minimized. nrows = number of pre-treatment periods. ncols = one. |
| margin.ipop | setting for ipop optimization routine: how close we get to the constraints (see |
| sigf.ipop | setting for ipop optimization routine: Precision (default: 7 significant figures) (see |
| bound.ipop | setting for ipop optimization routine: Clipping bound for the variables (see |
| quadopt | string vector that specifies the routine for quadratic optimization over w weights. One of |
| cvxr_pars | Optional named list of CVXR backend tuning. Recognized fields: |
| torch_pars | Optional named list of torch backend tuning. Recognized fields: |
A scalar that contains the function value.
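The nested structure of this loss can be sketched in base R: for a given vector of v weights, an inner step finds unit weights w that match the predictors, and the outer value is the pre-treatment MSPE of the outcome at those w. This is only a sketch under stated assumptions. The inner step below uses a softmax reparametrization with optim() instead of the quadratic programming routines named in the arguments (ipop and the other backends), the normalization of v is assumed, and `softmax()` and `nested_loss()` are hypothetical names.

```r
# Map unconstrained parameters to weights that are non-negative and sum to 1.
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Hypothetical nested loss: inner fit of w for a given v, then the
# outcome MSPE evaluated at the fitted w.
nested_loss <- function(v, X0, X1, Z0, Z1) {
  V <- diag(abs(v) / sum(abs(v)), nrow = length(v))  # assumed normalization
  inner <- function(theta) {
    d <- X1 - X0 %*% softmax(theta)
    as.numeric(t(d) %*% V %*% d)        # V-weighted predictor discrepancy
  }
  fit <- optim(rep(0, ncol(X0)), inner, method = "BFGS")
  w <- softmax(fit$par)
  list(loss = mean((Z1 - Z0 %*% w)^2), w = w)
}

# Toy data: 2 predictors, 3 control units, 4 pre-treatment periods.
# Matrices are filled column by column, so each line is one control unit.
X0 <- matrix(c(1, 2,
               3, 1,
               2, 4), nrow = 2)
X1 <- matrix(c(2, 2.5), ncol = 1)
Z0 <- matrix(c(5, 6, 7, 8,
               4, 5, 5, 6,
               7, 8, 9, 9), nrow = 4)
Z1 <- matrix(c(5.5, 6.5, 7, 7.5), ncol = 1)

out <- nested_loss(v = c(1, 1), X0, X1, Z0, Z1)
out$w     # unit weights on the simplex
out$loss  # scalar function value
```

An outer optimizer would then search over v to make the returned scalar as small as possible, which is the role this loss plays inside synth.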
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
Abadie A, Diamond A, Hainmueller J (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association 105 (490) 493–505.
Abadie, A. and Gardeazabal, J. (2003) Economic Costs of Conflict: A Case Study of the Basque Country American Economic Review 93 (1) 113–132.
synth, dataprep, gaps.plot, path.plot, synth.tab
This function plots the gaps in the trajectories of the outcome variable for the treated unit and the
synthetic control group constructed by synth and dataprep. The user can specify whether
the whole time period or only the pre-treatment period should be plotted.
gaps.plot(synth.res = NA, dataprep.res = NA, Ylab = c("Title"), Xlab = c("Time"), Main = c("Gaps: Treated - Synthetic"), tr.intake = NA, Ylim = NA, Z.plot = FALSE)
| Argument | Description |
|---|---|
| synth.res | Output list created by |
| dataprep.res | Output list created by |
| tr.intake | Optional scalar to indicate the time of treatment intake with a vertical line. |
| Ylab | Optional label for Y axis. |
| Xlab | Optional label for X axis. |
| Ylim | Optional Ylim. |
| Main | Optional main title. |
| Z.plot | Flag. If TRUE, only the pre-treatment period is plotted. |
The trajectory of the outcome for the synthetic control group is calculated as: dataprep.res$Y0plot %*% synth.res$solution.w. You can use this calculation to construct custom made plots.
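The calculation above can be tried on toy matrices without fitting anything. In this sketch `Y1plot`, `Y0plot`, and `w` are small hypothetical stand-ins for dataprep.res$Y1plot, dataprep.res$Y0plot, and synth.res$solution.w.

```r
# Toy plotting matrices: 4 periods, 1 treated unit, 2 control units.
Y1plot <- matrix(c(10, 11, 12, 15), ncol = 1,
                 dimnames = list(2001:2004, "treated"))
Y0plot <- matrix(c( 9, 10, 11, 12,
                   11, 12, 13, 14), ncol = 2,
                 dimnames = list(2001:2004, c("A", "B")))
w <- matrix(c(0.5, 0.5), ncol = 1)   # hypothetical unit weights

synthetic <- Y0plot %*% w            # synthetic control trajectory
gaps      <- Y1plot - synthetic      # treated minus synthetic, per period
gaps
```

The resulting one-column matrix can be passed to any plotting routine to build a custom version of the gaps plot.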
The plot of trajectories.
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
Abadie A, Diamond A, Hainmueller J (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association 105 (490) 493–505.
Abadie, A. and Gardeazabal, J. (2003) Economic Costs of Conflict: A Case Study of the Basque Country American Economic Review 93 (1) 113–132.
dataprep, synth, path.plot, synth.tab
For each donor in the original control pool, swap that donor into the treated slot, refit synth, and record the resulting gap series. The returned object feeds mspe_test, mspe_plot, and plot_placebos.
generate_placebos(synth.res = NULL, dataprep.res = NULL, Sigf.ipop = 5, Margin.ipop = 0.0005, Bound.ipop = 10, optimxmethod = c("Nelder-Mead", "BFGS"), genoud = FALSE, custom.v = NULL, verbose = FALSE, parallel = FALSE, n_cores = NULL, quadopt = "ipop", quadopt_inner = NULL, quadopt_outer = NULL, cvxr_pars = list(), cvxr_pars_inner = NULL, cvxr_pars_outer = NULL, torch_pars = list(), torch_pars_inner = NULL, torch_pars_outer = NULL, treatment_time = NULL, keep_fits = FALSE)
| Argument | Description |
|---|---|
| synth.res | Output list from |
| dataprep.res | Output list from |
| Sigf.ipop, Margin.ipop, Bound.ipop, optimxmethod, custom.v, verbose | Passed through to each |
| parallel | Parallelization mode. One of |
| n_cores | Number of cores when |
| quadopt, quadopt_inner, quadopt_outer, cvxr_pars, cvxr_pars_inner, cvxr_pars_outer, torch_pars, torch_pars_inner, torch_pars_outer | Forwarded to each placebo refit's |
| genoud | Forwarded to each placebo refit's |
| treatment_time | Optional first post-treatment period. Defaults to |
| keep_fits | If |
For donor $j$, the swap moves donor $j$'s columns from X0, Z0, and Y0plot into the treated slots, and the original treated unit takes donor $j$'s former column in the control pool. The donor pool size is preserved.
Refits that error are caught with tryCatch; the corresponding failed flag is set and downstream functions exclude them from denominators.
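The column swap described above can be sketched on a pair of toy outcome matrices. The helper `swap_donor()` is hypothetical and only illustrates the bookkeeping for one matrix pair; the same exchange would be applied to each of the X, Z, and plotting matrices before refitting.

```r
# Hypothetical helper: exchange the treated column with donor j's column.
swap_donor <- function(treated, controls, j) {
  new_treated <- controls[, j, drop = FALSE]   # donor j becomes "treated"
  controls[, j] <- treated[, 1]                # real treated unit joins the pool
  colnames(controls)[j] <- colnames(treated)
  list(treated = new_treated, controls = controls)
}

Z1 <- matrix(c(10, 11, 12), ncol = 1, dimnames = list(NULL, "treated"))
Z0 <- matrix(c( 9, 10, 11,
               12, 12, 13), ncol = 2, dimnames = list(NULL, c("A", "B")))

swapped <- swap_donor(Z1, Z0, j = 2)
colnames(swapped$treated)    # donor "B" now sits in the treated slot
colnames(swapped$controls)   # "A" and the original treated unit
dim(swapped$controls)        # same dimensions as Z0: pool size is unchanged
```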
An object of class synth_placebos, a list with components:
| Component | Description |
|---|---|
| treated | Named list with the real fit's gap series and MSPE summaries ( |
| placebos | Named list of length |
| time | Plot horizon, equal to |
| pre_idx, post_idx | Integer indices into |
| donor_names | Character vector of donor labels. |
| failed | Logical vector aligned with |
The function names generate_placebos, mspe_test, mspe_plot, and plot_placebos match those in the SCtools package by design. If both packages are loaded, namespace-qualify (e.g., Synth::generate_placebos).
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association 105 (490) 493–505.
mspe_test, mspe_plot, plot_placebos, synth_inference.
    ## Not run:
    data(synth.data)
    d <- dataprep(
      foo = synth.data,
      predictors = c("X1", "X2", "X3"),
      predictors.op = "mean",
      dependent = "Y",
      unit.variable = "unit.num",
      time.variable = "year",
      treatment.identifier = 7,
      controls.identifier = c(29, 2, 13, 17, 32, 38),
      time.predictors.prior = 1984:1989,
      time.optimize.ssr = 1984:1990,
      unit.names.variable = "name",
      time.plot = 1984:1996
    )
    fit <- synth(d)
    pl <- generate_placebos(fit, d)
    print(pl)
    mspe_test(pl)$pvalue
    plot_placebos(pl)
    ## End(Not run)
Plots the distribution of post/pre MSPE ratios across placebos with the treated unit highlighted. Operates on the output of generate_placebos.
mspe_plot(placebos, Main = "Post/Pre MSPE Ratio", Xlab = "MSPE ratio", Ylab = "")
| Argument | Description |
|---|---|
| placebos | Output of |
| Main | Optional main title. |
| Xlab | X axis label. |
| Ylab | Y axis label. |
Invisibly returns NULL.
generate_placebos, mspe_test, plot_placebos.
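A plot in the spirit of the one described for mspe_plot can be sketched with base graphics. The ratios below are made-up numbers and `plot_ratio_sketch()` is a hypothetical helper; the actual appearance of mspe_plot may differ.

```r
# Hypothetical post/pre MSPE ratios: one treated unit and five placebos.
ratios <- c(treated = 12.4, A = 1.1, B = 0.7, C = 3.2, D = 2.0, E = 5.6)

# Hypothetical helper: dot chart of sorted ratios, treated unit in black.
plot_ratio_sketch <- function(ratios, treated = "treated") {
  ord  <- sort(ratios)
  cols <- ifelse(names(ord) == treated, "black", "grey60")
  dotchart(ord, color = cols, pch = 19,
           main = "Post/Pre MSPE Ratio", xlab = "MSPE ratio")
  invisible(NULL)
}

pdf(NULL)                  # null device; drop this line to see the plot
plot_ratio_sketch(ratios)
dev.off()
```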
Computes a one-sided p-value for the treated unit's post/pre mean squared prediction error (MSPE) ratio against the empirical distribution of placebo ratios. Operates on the output of generate_placebos.
mspe_test(placebos)
| Argument | Description |
|---|---|
| placebos | Output of |
The p-value is the empirical rank of the treated unit's post/pre MSPE ratio within the distribution of the corresponding ratios for the placebo donors. Failed refits are excluded from the denominator.
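A rank-based p-value of this kind can be sketched in base R. The gap series are made-up numbers, `mspe()` is a hypothetical helper, and the counting convention (the treated unit is included in both the numerator and the denominator) is an assumption of this sketch, not a statement of the exact formula used by mspe_test.

```r
# Mean squared prediction error over a set of periods.
mspe <- function(gap, idx) mean(gap[idx]^2)

# Hypothetical gap series (rows = periods, columns = units); the first
# column plays the role of the treated unit, the rest are placebos.
gaps <- cbind(
  treated = c(0.1, -0.1,  0.2, 2.0,  2.5),
  A       = c(0.3, -0.2,  0.1, 0.4, -0.3),
  B       = c(0.2,  0.2, -0.3, 0.1,  0.5),
  C       = c(0.1, -0.1,  0.1, 1.0,  1.2)
)
pre_idx  <- 1:3
post_idx <- 4:5

# Post/pre MSPE ratio for every unit.
ratios <- apply(gaps, 2, function(g) mspe(g, post_idx) / mspe(g, pre_idx))

# Assumed convention: count units whose ratio is at least as large as the
# treated ratio (treated included), over all units with a finite ratio.
ok     <- is.finite(ratios)
pvalue <- sum(ratios[ok] >= ratios["treated"]) / sum(ok)
pvalue
```

Under this convention the smallest attainable p-value is one over the number of units with a finite ratio, so a small donor pool limits how extreme the result can look.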
A list with elements
| Component | Description |
|---|---|
| mspe_ratio_treated | The treated unit's post/pre MSPE ratio. |
| mspe_ratios_placebos | Numeric vector of placebo ratios, with |
| pvalue | One-sided empirical p-value. |
| n_valid_placebos | Count of placebos with finite ratios. |
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association 105 (490) 493–505.
generate_placebos, mspe_plot, plot_placebos.
This function plots the trajectories of the outcome variable for the treated unit and the synthetic control group constructed by synth and dataprep. The user can specify whether the whole
time period or only the pretreatment period should be plotted.
path.plot(synth.res = NA, dataprep.res = NA, tr.intake = NA, Ylab = c("Y Axis"), Xlab = c("Time"), Ylim = NA, Legend=c("Treated","Synthetic"), Legend.position=c("topright"), Main = NA, Z.plot = FALSE)
| synth.res | Output list created by synth. |
|---|---|
| dataprep.res | Output list created by dataprep. |
| tr.intake | Optional scalar to indicate the time of treatment intake with a vertical line. |
| Ylab | Optional label for Y axis. |
| Xlab | Optional label for X axis. |
| Ylim | Optional Ylim. |
| Main | Optional main title. |
| Legend | Optional legend text (e.g. c("Treated","Synthetic")); see ?legend for details. |
| Legend.position | Optional legend position (e.g. "bottomright"); see ?legend for details. |
| Z.plot | Logical flag. If TRUE, only the pretreatment period is plotted. |
The trajectory of the outcome for the synthetic control group is calculated as dataprep.res$Y0plot %*% synth.res$solution.w. You can use this calculation to construct custom plots.
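The matrix product can be tried on toy inputs without fitting anything. In the sketch below, `Y0plot`, `Y1plot`, and `solution.w` are small made-up stand-ins for the corresponding elements of the dataprep and synth output lists; the numbers are purely illustrative.

```r
# Stand-in for dataprep.res$Y0plot: rows are periods, columns are controls.
Y0plot <- matrix(c(10, 11, 12,    # control "unit2"
                   20, 19, 18),   # control "unit13"
                 nrow = 3, ncol = 2,
                 dimnames = list(c("1990", "1991", "1992"),
                                 c("unit2", "unit13")))

# Stand-in for synth.res$solution.w: one weight per control, summing to 1.
solution.w <- matrix(c(0.25, 0.75), ncol = 1,
                     dimnames = list(c("unit2", "unit13"), "w.weight"))

# Synthetic control trajectory: weighted combination of control outcomes.
synthetic <- Y0plot %*% solution.w

# Stand-in for dataprep.res$Y1plot (treated outcome) and the implied gap.
Y1plot <- matrix(c(17, 17.5, 18), ncol = 1)
gap <- Y1plot - synthetic

print(synthetic)
print(gap)
```

The resulting `synthetic` and `gap` vectors can be passed to any base plotting call, for example `plot()` followed by `lines()`, to build a custom figure.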
The plot of trajectories.
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
Abadie, A., Diamond, A., Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association 105 (490) 493–505.
Abadie, A. and Gardeazabal, J. (2003). Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review 93 (1) 113–132.
dataprep, gaps.plot, synth, synth.tab
Overlay plot of placebo gap series (in grey) with the treated gap highlighted in black. Operates on the output of generate_placebos.
plot_placebos(placebos, mspe_threshold = NULL, Ylab = "Gap", Xlab = "Time",
              Main = "Placebo Gaps", Ylim = NA, tr.intake = NA,
              treated_col = "black", placebo_col = "grey60")
| placebos | Output of generate_placebos. |
|---|---|
| mspe_threshold | Optional scalar; if supplied, placebos whose pre-period MSPE exceeds |
| Ylab | Y axis label. |
| Xlab | X axis label. |
| Main | Optional main title. |
| Ylim | Optional Ylim. If |
| tr.intake | Optional scalar locating the treatment date with a vertical dashed line. If |
| treated_col | Color for the treated gap line. |
| placebo_col | Color for placebo gap lines. |
Invisibly returns NULL.
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association 105 (490) 493–505.
generate_placebos, mspe_test, mspe_plot.
Plots the treated and synthetic series with a shaded prediction band on the same axes. The band comes from synth_inference.
## S3 method for class 'synth_inference'
plot(x, Ylab = "Y", Xlab = "Time", Main = NA, Ylim = NA,
     Legend = c("Treated", "Synthetic", paste0(100 * (1 - x$alpha), "% band")),
     Legend.position = "topright", tr.intake = NA,
     band.col = grDevices::rgb(0, 0, 0, 0.15), ...)
| x | Object of class synth_inference. |
|---|---|
| Ylab | Y axis label. |
| Xlab | X axis label. |
| Main | Optional main title. |
| Ylim | Optional Ylim. If |
| Legend | Legend entries. Set to |
| Legend.position | Legend position; passed to |
| tr.intake | Optional scalar locating the treatment date with a vertical dashed line. If |
| band.col | Fill color for the prediction band; default a translucent grey. |
| ... | Currently unused. |
Invisibly returns x.
A panel of per-capita cigarette sales for 39 US states from 1970 through 2000, plus four covariates (per-capita log income, beer consumption, share of population aged 15-24, and the retail price of cigarettes). This is the dataset used in the canonical synthetic control application of Abadie, Diamond, and Hainmueller (2010), which estimated the effect of California's 1988 Proposition 99 on cigarette consumption.
data(smoking)
A data.frame with 1209 rows and 8 columns:
| state_id | Numeric state identifier (3 = California). |
|---|---|
| state_name | State name as a character string. |
| year | Calendar year (1970-2000). |
| cigsale | Per-capita cigarette sales in packs. |
| lnincome | Log per-capita state income. |
| beer | Per-capita beer consumption. |
| age15to24 | Share of population aged 15-24. |
| retprice | Retail price of cigarettes. |
lnincome and beer are missing for some early years; the canonical Proposition 99 analysis matches on covariate averages over windows where the data are observed.
Compiled from the data files distributed with Abadie, Diamond, and Hainmueller (2010) and shipped with the Stata synth package.
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493-505.
basque, synth, synth_inference.
## Not run:
data(smoking)

# California is state 3; treatment is Proposition 99 in late 1988,
# so 1989 is the first post-treatment year.
ca_id <- unique(smoking$state_id[smoking$state_name == "California"])

dp <- synth_data(
  panel = smoking,
  outcome = "cigsale",
  unit_col = "state_id",
  time_col = "year",
  treated = ca_id,
  treatment_time = 1989,
  predictors = c("lnincome", "age15to24", "retprice", "beer"),
  special_predictors = list(
    list("cigsale", 1988, "mean"),
    list("cigsale", 1980, "mean"),
    list("cigsale", 1975, "mean")
  ),
  unit_names_col = "state_name"
)

fit <- synth(dp)
inf <- synth_inference(fit, dp, method = "conformal", alpha = 0.10)
plot(inf)
## End(Not run)
This function is called by dataprep to handle special predictors in the process of setting up the dataset to be loaded into synth. It should not be called manually by the normal user.
spec.pred.func(list.object = NULL, tr.numb = NULL, co.numb = NULL,
               unit.var = NULL, time.var = NULL, foo.object = NULL,
               X0.inner = NULL, X1.inner = NULL)
| list.object | NA |
|---|---|
| tr.numb | NA |
| co.numb | NA |
| unit.var | NA |
| time.var | NA |
| foo.object | NA |
| X0.inner | NA |
| X1.inner | NA |
NA
NA
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
Abadie, A., Diamond, A., Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association 105 (490) 493–505.
Abadie, A. and Gardeazabal, J. (2003). Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review 93 (1) 113–132.
synth, dataprep, gaps.plot, path.plot, synth.tab
Implements the synthetic control method for causal inference in comparative case studies as developed in Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010, 2011, 2014). synth estimates the effect of an intervention by comparing the evolution of an aggregate outcome for a unit affected by the intervention to the evolution of the same aggregate outcome for a synthetic control group.
synth constructs this synthetic control group by searching for a weighted combination of control units chosen to approximate the unit affected by the intervention in terms of characteristics that are predictive of the outcome. The evolution of the outcome for the resulting synthetic control group is an estimate of the counterfactual of what would have been observed for the affected unit in the absence of the intervention.
synth can also be used to conduct a variety of placebo and permutation tests that produce informative inference regardless of the number of available comparison units and the number of available time periods. See Abadie and Gardeazabal (2003), Abadie, Diamond, and Hainmueller (2010, 2011, 2014) for details.
synth requires the user to supply four matrices as its main arguments. These matrices are named X0, X1, Z1, and Z0 accordingly. X1 and X0 contain the predictor values for the treated unit and the control units respectively. Z1 and Z0 contain the outcome variable for the pre-intervention period for the treated unit and the control units respectively. The pre-intervention period refers to the time period prior to the intervention, over which the mean squared prediction error (MSPE) should be minimized. The MSPE refers to the squared deviations between the outcome for the treated unit and the synthetic control unit summed over all pre-intervention periods specified in Z1 and Z0.
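To make the fit criterion concrete, the base R sketch below computes the squared pre-intervention deviations for toy matrices and a fixed, hypothetical weight vector `w`. It reports both the sum and the mean of the squared deviations; which normalization synth uses internally for its reported loss is not asserted here.

```r
# Toy pre-intervention outcomes: rows are periods, columns are units.
Z1 <- matrix(c(5.0, 5.5, 6.0), ncol = 1)   # treated unit
Z0 <- matrix(c(4, 5, 6,                    # first control unit
               6, 6, 6),                   # second control unit
             nrow = 3, ncol = 2)
w  <- c(0.25, 0.75)                        # hypothetical control weights

# Deviations between the treated outcome and the weighted controls.
resid <- Z1 - Z0 %*% w

ssr  <- sum(resid^2)    # squared deviations summed over periods
mspe <- mean(resid^2)   # the same quantity averaged over periods

print(ssr)
print(mspe)
```

Because the two quantities differ only by the number of pre-intervention periods, they are minimized by the same weights.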
Creating the matrices X1, X0, Z1, and Z0 from a (panel) dataset can be tedious. Therefore, the Synth package offers a preparatory function called dataprep that allows the user to easily create all inputs required for synth. By first calling dataprep, the user creates a single list object called data.prep.obj that contains all essential data elements to run synth.
A usual sequence of commands to implement the synthetic control method is to first call dataprep to prepare the data, then call synth to construct the synthetic control group, and finally summarize results using the functions synth.tab, path.plot, or gaps.plot.
An example of this sequence is provided in the documentation to dataprep. This procedure is strongly recommended. Alternatively, the user may provide their own preprocessed data matrices and load them into synth via the X0, X1, Z1, and Z0 arguments. In this case, no data.prep.obj should be specified.
The output from synth is a list object that contains the weights on predictors (solution.v) and the weights on control units (solution.w) that define contributions to the synthetic control unit.
synth(data.prep.obj = NULL, X1 = NULL, X0 = NULL, Z0 = NULL, Z1 = NULL,
      custom.v = NULL, optimxmethod = c("Nelder-Mead", "BFGS"),
      genoud = FALSE, quadopt = "ipop", quadopt_inner = NULL,
      quadopt_outer = NULL, cvxr_pars = list(), cvxr_pars_inner = NULL,
      cvxr_pars_outer = NULL, torch_pars = list(), torch_pars_inner = NULL,
      torch_pars_outer = NULL, Margin.ipop = 5e-04, Sigf.ipop = 5,
      Bound.ipop = 10, verbose = FALSE, ...)
| data.prep.obj | The object produced by dataprep. |
|---|---|
| X1 | Matrix of treated predictor data. Rows correspond to predictors, columns to a single treated unit. |
| X0 | Matrix of control units’ predictor data. Rows correspond to predictors, columns to control units (>=2). |
| Z1 | Matrix of treated outcome data for the pre-treatment periods over which MSPE is minimized. |
| Z0 | Matrix of control units’ outcome data for the pre-treatment periods over which MSPE is minimized. |
| custom.v | Vector of weights for predictors supplied by the user. Uses |
| optimxmethod | Character vector specifying optimization algorithms to be used. Permissible values are all optimization algorithms currently implemented in the function |
| genoud | Logical flag. If |
| quadopt | Character specifying the routine for quadratic optimization over W weights. One of When |
| quadopt_inner | Optional character. If non- |
| quadopt_outer | Optional character. If non- |
| cvxr_pars | Optional named list of tuning parameters forwarded to the CVXR backend when |
| torch_pars | Optional named list of tuning parameters forwarded to the torch backend when |
| cvxr_pars_inner, cvxr_pars_outer, torch_pars_inner, torch_pars_outer | Optional per-stage overrides for backend tuning. Each defaults to |
| Margin.ipop | Setting for the |
| Sigf.ipop | Setting for the |
| Bound.ipop | Setting for the |
| verbose | Logical flag. If |
| ... | |
As proposed in Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010), synth searches for the set of weights that generate the best-fitting convex combination of control units. The predictor weight matrix V is chosen among positive definite diagonal matrices such that MSPE is minimized for the pre-intervention period.
Alternatively, the user may supply a vector of V weights based on a subjective assessment of the predictive power of the variables in X1 and X0. In this case, specify custom.v in synth, and the optimization over V matrices is bypassed.
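The nested search described in this section can be written compactly as follows. This is a sketch of the standard formulation in the cited papers rather than a description of the package's numerical routine; the symbols J (number of control units), T_0 (number of pre-intervention periods), and w_j (the j-th element of W) are introduced here for illustration.

```latex
\begin{aligned}
W^{*}(V) &= \arg\min_{W}\; (X_1 - X_0 W)^{\top} V \,(X_1 - X_0 W)
  \quad \text{s.t. } w_j \ge 0,\ \sum_{j=1}^{J} w_j = 1, \\
V^{*} &= \arg\min_{V}\; \frac{1}{T_0}\,
  \bigl(Z_1 - Z_0 W^{*}(V)\bigr)^{\top}\bigl(Z_1 - Z_0 W^{*}(V)\bigr),
\end{aligned}
```

where V ranges over positive definite diagonal matrices. Supplying custom.v fixes V, so only the first (inner) problem is solved.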
| solution.v | Vector of predictor weights. |
|---|---|
| solution.w | Vector of weights across control units. |
| loss.v | MSPE from optimization over V and W weights. |
| loss.w | Loss from optimization over W weights. |
| custom.v | If specified, returns the user-supplied weight vector. |
| rgV.optim | Results from |
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software, 42(13), 1–17.
Abadie, A., Diamond, A., Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.
Abadie, A., and Gardeazabal, J. (2003). Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review, 93(1), 113–132.
dataprep, gaps.plot, path.plot, synth.tab
data(synth.data)

dataprep.out <- dataprep(
  foo = synth.data,
  predictors = c("X1", "X2", "X3"),
  predictors.op = "mean",
  dependent = "Y",
  unit.variable = "unit.num",
  time.variable = "year",
  special.predictors = list(
    list("Y", 1991, "mean"),
    list("Y", 1985, "mean"),
    list("Y", 1980, "mean")
  ),
  treatment.identifier = 7,
  controls.identifier = c(29, 2, 13, 17, 32, 38),
  time.predictors.prior = 1984:1989,
  time.optimize.ssr = 1984:1990,
  unit.names.variable = "name",
  time.plot = 1984:1996
)

synth.out <- synth(dataprep.out)
path.plot(dataprep.res = dataprep.out, synth.res = synth.out)
gaps.plot(dataprep.res = dataprep.out, synth.res = synth.out)

## Not run:
## Alternative quadopt backends (Synth 1.2-0+; both in Suggests:).
## Defaults to ipop. CVXR uses OSQP by default (a CVXR Imports, so
## always available); pass cvxr_pars = list(solver = "SCS") or
## "ECOS" (the latter needs ECOSolveR). Torch uses Frank-Wolfe
## simplex least squares with exact line search and can run on
## GPU/MPS via the device argument.
##
## On the Basque example all three backends produce essentially the
## same synthetic control: identical pre-MSPE (0.0089), post-MSPE
## within 0.001, and ATT estimates within 0.001 of each other. The
## CVXR and torch backends are slower because they are invoked on
## every fn.V() evaluation inside optimx's V-search.

# install.packages("CVXR")
synth.cvxr <- synth(dataprep.out, quadopt = "cvxr")

# install.packages("torch"); torch::install_torch()
synth.torch <- synth(dataprep.out, quadopt = "torch")

## To keep V-search fast (ipop) and use a modern solver only for
## the final W solve, set quadopt_outer alone:
synth.fast.cvxr <- synth(dataprep.out, quadopt_outer = "cvxr")
## End(Not run)
A friendlier wrapper around dataprep. Takes a long-format panel data frame plus the names of the unit, time, and outcome columns, and returns a dataprep-shaped list ready to pass to synth and the inference functions.
Defaults:
controls = NULL uses every panel unit other than the treated one.
predictors = NULL fits the synthetic control on the outcome alone (via special_predictors or the implicit pre-period match).
plot_periods = NULL uses the full panel range.
pre_periods = NULL uses every panel time strictly before treatment_time.
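The defaults listed above can be mimicked in a few lines of base R. The sketch below is only an illustration of the stated rules on a made-up panel; the column names `unit`, `year`, and `y` are hypothetical, and the code is not taken from the synth_data implementation.

```r
# Toy long-format panel: 3 units observed over 1985-1990 (made-up values).
panel <- expand.grid(unit = c(2, 7, 13), year = 1985:1990)
panel$y <- seq_len(nrow(panel))

treated        <- 7
treatment_time <- 1988   # first post-treatment period

# controls = NULL: every panel unit other than the treated one.
controls <- setdiff(sort(unique(panel$unit)), treated)

# pre_periods = NULL: every panel time strictly before treatment_time.
times       <- sort(unique(panel$year))
pre_periods <- times[times < treatment_time]

# plot_periods = NULL: the full panel range.
plot_periods <- times

print(controls)
print(pre_periods)
print(plot_periods)
```

Note that the treatment period itself is excluded from `pre_periods`, which matches the description of treatment_time as the first post-treatment period.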
synth_data(panel, outcome, treated, controls = NULL, unit_col, time_col,
           treatment_time, predictors = NULL, predictors.op = "mean",
           special_predictors = list(), pre_periods = NULL,
           plot_periods = NULL, unit_names_col = NULL)
| panel | Long-format |
|---|---|
| outcome | Name of the outcome column (character). |
| treated | The treated unit. Either a numeric id (matched against |
| controls | Optional vector of control units. Same id-or-name convention as |
| unit_col | Name of the unit-id column (character). Must be numeric. |
| time_col | Name of the time column (character). Must be numeric. |
| treatment_time | Single numeric value: the first post-treatment period. Used as the SSR-window cutoff and as the default treatment date for downstream inference. |
| predictors | Character vector of predictor column names. |
| predictors.op | Aggregation op for |
| special_predictors | List of |
| pre_periods | Optional integer vector of pre-treatment time values. Defaults to every |
| plot_periods | Optional integer vector of times for the full plot horizon. Defaults to the full panel range. |
| unit_names_col | Optional column name carrying readable unit labels. Required when |
A dataprep-shaped list (the same structure returned by dataprep). The list also carries tag$synth_data_treatment_time and an attr(., "synth_data_call") for debugging.
dataprep for the long-form constructor; synth, synth_inference, generate_placebos for what to do next.
## Not run:
data(basque)

# Equivalent to the long dataprep() example, in one call:
dp <- synth_data(
  panel = basque,
  outcome = "gdpcap",
  unit_col = "regionno",
  time_col = "year",
  treated = 17,  # Basque country
  controls = c(2:16, 18),
  treatment_time = 1970,
  predictors = c("school.illit", "school.prim", "school.med",
                 "school.high", "school.post.high", "invest"),
  special_predictors = list(
    list("gdpcap", 1960:1969, "mean"),
    list("sec.agriculture", seq(1961, 1969, 2), "mean")
  ),
  unit_names_col = "regionname"
)

fit <- synth(dp)
inf <- synth_inference(fit, dp, method = "conformal", alpha = 0.10)
## End(Not run)
Computes a prediction band around the synthetic counterfactual produced by synth and dataprep. Two methods are supported: split-conformal intervals (the default) and parametric Gaussian intervals.
synth_inference(synth.res = NULL, dataprep.res = NULL,
                method = c("conformal", "parametric"), alpha = 0.05,
                treatment_time = NULL)
| synth.res | Output list from synth. |
|---|---|
| dataprep.res | Output list from dataprep. |
| method | One of |
| alpha | Miscoverage level. The band targets nominal |
| treatment_time | Optional first post-treatment period (a value of |
With method = "conformal", the half-width of the band is the order statistic at rank ceiling((1 - alpha) * (n + 1)) of the absolute pre-treatment residuals, where n is the number of pre-treatment residuals (Chernozhukov, Wuthrich, and Zhu 2021). The (n + 1) adjustment delivers exact finite-sample coverage under exchangeability. When the requested level is infeasible at this calibration sample size (the rank exceeds n), the function emits a warning, and conformal_q is Inf.
With method = "parametric", the half-width is qnorm(1 - alpha/2) times the standard deviation of pre-period residuals. This assumes residuals are i.i.d. Gaussian.
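Both half-widths can be reproduced by hand from a vector of pre-treatment residuals. The base R sketch below uses made-up residuals and assumes the usual split-conformal rank `ceiling((1 - alpha) * (n + 1))` and the sample standard deviation from `sd()`; whether synth_inference uses exactly these conventions is an assumption, and the variable names are hypothetical.

```r
# Hypothetical pre-treatment residuals (treated minus synthetic).
resid_pre <- c(0.3, -0.1, 0.4, -0.2, 0.1, -0.5, 0.2, 0.0, -0.3, 0.6)
alpha <- 0.10
n <- length(resid_pre)

# Conformal half-width: order statistic of |residuals| at the assumed rank.
# If the rank exceeds n, the level is infeasible and the half-width is Inf.
k <- ceiling((1 - alpha) * (n + 1))
conformal_q <- if (k > n) Inf else sort(abs(resid_pre))[k]

# Parametric half-width: Gaussian quantile times the residual SD.
sigma_pre     <- sd(resid_pre)
parametric_hw <- qnorm(1 - alpha / 2) * sigma_pre

# A stricter level with the same 10 residuals is infeasible.
k_strict <- ceiling((1 - 0.05) * (n + 1))
conformal_q_strict <- if (k_strict > n) Inf else sort(abs(resid_pre))[k_strict]

print(conformal_q)
print(parametric_hw)
print(conformal_q_strict)
```

The last case shows why short pre-treatment windows limit the attainable coverage: with ten residuals the rank for alpha = 0.05 is 11, so no finite order statistic exists.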
Both methods produce constant-width bands. They do not separately quantify uncertainty about the synthetic weights or decompose in-sample versus out-of-sample uncertainty. Users who need period-varying intervals or that decomposition should see the scpi package.
Validity in both methods is approximate when outcomes are autocorrelated.
An object of S3 class c("synth_<method>", "synth_inference"), a list with components:
| method | The method used (echoed). |
|---|---|
| alpha | Miscoverage level (echoed). |
| time | The full plot horizon ( |
| pre_idx, post_idx | Integer indices into |
| treated, synthetic | Numeric vectors over |
| effect | |
| intervals | Numeric matrix with |
| conformal_q | (conformal only) The |
| sigma_pre | (parametric only) Standard deviation of pre-period residuals. |
| pre_mspe, post_mspe, mspe_ratio | Mean squared prediction error in the pre- and post-treatment periods, and their ratio. |
Jens Hainmueller and Alexis Diamond
Chernozhukov, V., Wuthrich, K., and Zhu, Y. (2021). An exact and robust conformal inference method for counterfactual and synthetic controls. Journal of the American Statistical Association 116 (536) 1849–1864.
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies. Journal of the American Statistical Association 105 (490) 493–505.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2025). Uncertainty quantification in synthetic controls with staggered treatment adoption. Review of Economics and Statistics.
synth, dataprep, plot.synth_inference, generate_placebos.
The SCtools package provides additional placebo-based inference; the scpi package provides period-varying CFPT prediction intervals that decompose in-sample and out-of-sample uncertainty.
## Not run:
data(synth.data)

d <- dataprep(
  foo = synth.data,
  predictors = c("X1", "X2", "X3"),
  predictors.op = "mean",
  dependent = "Y",
  unit.variable = "unit.num",
  time.variable = "year",
  treatment.identifier = 7,
  controls.identifier = c(29, 2, 13, 17, 32, 38),
  time.predictors.prior = 1984:1989,
  time.optimize.ssr = 1984:1990,
  unit.names.variable = "name",
  time.plot = 1984:1996
)

fit <- synth(d)
inf <- synth_inference(fit, d, method = "conformal", alpha = 0.10)
print(inf)
plot(inf)
## End(Not run)
This artificial panel data set is used to demonstrate the use of the Synthetic Control Method.
data(synth.data)
A dataframe made up of 8 units: 1 treated (no. 7) and 7 control (no. 2, 13, 17, 29, 32, 36, 38), 3 predictors (X1, X2, X3), 21 time periods (1980 - 2000), a unit.names.variable column ("name") and an outcome variable column (Y). All columns have column names.
This function is called after dataprep and synth to create tables summarizing the results of a run of the synthetic control method. The resulting tables can be typeset in LaTeX directly.
synth.tab(synth.res = NA, dataprep.res = NA, round.digit = 3)
| synth.res | The list resulting from the call to synth. |
|---|---|
| dataprep.res | The list resulting from the call to dataprep. |
| round.digit | Integer for rounding in tables. |
NA
| tab.v | The matrix that contains the table of V-weights and respective variable names. |
|---|---|
| tab.w | The matrix that contains the table of W-weights and respective unit numbers and possibly names. |
| tab.loss | The matrix that contains the table of W-loss and V-loss. |
Jens Hainmueller and Alexis Diamond
Abadie, A., Diamond, A., Hainmueller, J. (2014). Comparative Politics and the Synthetic Control Method. American Journal of Political Science 59(2): 495-510.
Abadie, A., Diamond, A., Hainmueller, J. (2011). Synth: An R Package for Synthetic Control Methods in Comparative Case Studies. Journal of Statistical Software 42 (13) 1–17.
Abadie, A., Diamond, A., Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program. Journal of the American Statistical Association 105 (490) 493–505.
Abadie, A. and Gardeazabal, J. (2003). Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review 93 (1) 113–132.
synth, dataprep, gaps.plot, path.plot