| Title: | Kernel-Based Regularized Least Squares |
|---|---|
| Description: | Implements Kernel-based Regularized Least Squares (KRLS), a machine learning method to fit multidimensional functions y = f(x) for regression and classification problems without relying on linearity or additivity assumptions. KRLS finds the best fitting function by minimizing the squared loss of a Tikhonov regularization problem, using Gaussian kernels as radial basis functions. For further details see Hainmueller and Hazlett (2014, <doi:10.1093/pan/mpt019>). |
| Authors: | Jens Hainmueller [aut, cre], Chad Hazlett [aut] |
| Maintainer: | Jens Hainmueller <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.7-0 |
| Built: | 2026-06-05 19:54:05 UTC |
| Source: | https://github.com/j-hai/krls |
Pick the Gaussian kernel bandwidth that maximizes
the variance of the off-diagonal entries of the kernel matrix K.
The idea is that this choice makes the columns of K most
informative: at very small or very large bandwidths, the
off-diagonal entries collapse toward 0 or 1 respectively and carry
little discriminating signal.
This is the bandwidth-selection convention used in
kbal::b_maxvarK and the GPSS package; in KRLS 1.7+ it is also
the default when sigma = NULL in krls.
b_maxvarK_nystrom is the Nystrom-aware variant: it evaluates
the variance of the entries of the n by m cross-kernel matrix
between observations and landmarks
instead of forming the full n by n kernel.
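The selection rule can be sketched in base R as follows. This is a minimal illustration, not the package implementation: the helper names `offdiag_var` and `maxvar_bandwidth`, the kernel parameterization `exp(-d^2 / b)`, the search on the log scale, and the fallback upper bound are all assumptions made for the example.

```r
# Variance of the off-diagonal kernel entries at bandwidth b.
# Assumed kernel form: k(x_i, x_j) = exp(-||x_i - x_j||^2 / b).
offdiag_var <- function(b, D2) {
  K <- exp(-D2 / b)
  var(K[upper.tri(K)])            # K is symmetric, so the upper triangle suffices
}

# Hypothetical helper: maximize the off-diagonal variance over the bandwidth.
# Searching over log(b) lets the optimizer cover many orders of magnitude.
maxvar_bandwidth <- function(X, search_lower = 1e-6, search_upper = NULL,
                             tol = .Machine$double.eps^0.25) {
  D2 <- as.matrix(dist(X))^2      # squared Euclidean distances between rows
  if (is.null(search_upper)) {
    search_upper <- 10 * max(D2)  # assumed fallback, not the package default
  }
  opt <- optimize(function(log_b) offdiag_var(exp(log_b), D2),
                  interval = log(c(search_lower, search_upper)),
                  maximum = TRUE, tol = tol)
  list(sigma = exp(opt$maximum), var_K = opt$objective,
       search_lower = search_lower, search_upper = search_upper)
}

set.seed(1)
X <- scale(matrix(rnorm(200), ncol = 2))
sel <- maxvar_bandwidth(X)
sel$sigma
```

At both extremes of the search interval the off-diagonal variance is close to zero, which is why an interior maximum exists. A Nystrom-style variant would apply the same objective to all entries of the n by m cross-kernel instead of the upper triangle of K.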
b_maxvarK(X_proc, search_lower = 1e-6, search_upper = NULL,
          tol = .Machine$double.eps^0.25)

b_maxvarK_nystrom(X_proc, Z_proc, search_lower = 1e-6, search_upper = NULL,
                  tol = .Machine$double.eps^0.25)
X_proc |
A numeric matrix in the same kernel-ready form that |
Z_proc |
Landmark matrix for the Nystrom variant. Same column structure as
|
search_lower, search_upper
|
Lower and upper bounds of the bandwidth search interval passed to
|
tol |
Tolerance forwarded to |
A list with components
sigma |
the selected bandwidth. |
var_K (or var_C) |
the value of the off-diagonal variance at the selected bandwidth. |
search_lower, search_upper
|
the bounds used for the search. |
Hazlett, C. (2020). Kernel Balancing: A Flexible Non-Parametric Weighting Procedure for Estimating Causal Effects. Statistica Sinica.
Internal function that is called by krls to compute first differences for binary predictors in the X matrix. It would normally not be called by the user directly.
fdskrls(object, ...)
object |
Object from call to |
... |
additional arguments to be passed to lower level functions |
An object of class krls where the derivatives, average derivatives, and the variances of the average derivatives are
replaced with the first differences for binary predictors. The binaryindicator is also updated and set to TRUE for binary predictors.
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
Given an N by D numeric data matrix, this function computes the N by N distance matrix with the pairwise distances between the rows of the data matrix as measured by a Gaussian kernel.
gausskernel(X = NULL, sigma = NULL)
X |
N by D numeric data matrix. |
sigma |
Positive scalar that specifies the bandwidth of the Gaussian kernel (see details). |
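Given two D-dimensional vectors $x_i$ and $x_j$, the Gaussian kernel is defined as
$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{\sigma^2}\right)$$
where $\lVert x_i - x_j \rVert$ is the Euclidean distance given by
$$\lVert x_i - x_j \rVert = \left((x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \dots + (x_{iD} - x_{jD})^2\right)^{1/2}$$
and $\sigma^2$ is the bandwidth of the kernel.
Note that the Gaussian kernel is a measure of similarity between $x_i$ and $x_j$. It evaluates to 1 if $x_i$ and $x_j$ are identical, and approaches 0 as $x_i$ and $x_j$ move further apart.
The function relies on the dist function in the stats package for an initial estimate of the Euclidean distance.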
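As a rough illustration of the computation described above, the kernel matrix can be built from `dist` in a few lines of base R. The helper name `gauss_kernel_sketch` is hypothetical, and the sketch assumes the kernel form exp(-||x_i - x_j||^2 / bandwidth), with `bandwidth` standing for the quantity the text calls the bandwidth.

```r
# Hypothetical helper, not the package function.
gauss_kernel_sketch <- function(X, bandwidth) {
  D2 <- as.matrix(dist(X, method = "euclidean"))^2  # squared pairwise distances
  exp(-D2 / bandwidth)
}

# Three points in two dimensions.
X <- matrix(c(0, 0,
              1, 0,
              0, 2), ncol = 2, byrow = TRUE)
K <- gauss_kernel_sketch(X, bandwidth = 2)
K
```

The diagonal is exactly 1 because each row has zero distance to itself, and the entries shrink toward 0 as rows move apart, so K is better read as a similarity matrix than as a distance matrix.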
An N by N numeric distance matrix that contains the pairwise distances between the rows in X.
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
dist function in the stats package.
X <- matrix(rnorm(6), ncol = 2)
gausskernel(X = X, sigma = 1)
Returns the landmark coordinate matrix used by an approx = "nystrom"
fit. Landmarks are stored internally in the standardized X-space the model
operates in; by default this accessor un-standardizes them so the returned
matrix is in the original X units and can be passed back through
krls(..., landmarks = ...) on a comparable dataset without a
standardize-twice bug.
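The un-standardizing step can be pictured with the following base R sketch. It assumes that standardization means centering each column at its mean and dividing by its standard deviation; the object names (`ctr`, `sds`, `Z_std`, `Z_orig`) are illustrative and do not refer to fields of a fitted object.

```r
set.seed(2)
X <- cbind(age = rnorm(50, 40, 10), income = rnorm(50, 5e4, 1e4))

# Column-wise standardization (assumed convention: mean 0, sd 1).
ctr <- colMeans(X)
sds <- apply(X, 2, sd)
X_std <- scale(X, center = ctr, scale = sds)

# Pick 5 rows as landmarks in the standardized space.
idx <- sample(nrow(X), 5)
Z_std <- X_std[idx, , drop = FALSE]

# Map the landmarks back to the original units: multiply by sd, add the mean.
Z_orig <- sweep(sweep(Z_std, 2, sds, "*"), 2, ctr, "+")
Z_orig
```

Passing `Z_std` where original-scale coordinates are expected would standardize the landmarks a second time, which is the bug the accessor's default is meant to avoid.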
get_landmarks(fit, scale = c("original", "standardized"))
fit |
A fitted |
scale |
String selecting the coordinate system of the returned matrix.
|
An m by D numeric matrix of landmark coordinates with column
names inherited from fit$X.
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
## Not run:
fit <- krls(X, y, approx = "nystrom", nystrom_m = 30)
Z <- get_landmarks(fit)   # original-scale m x D matrix
# Reuse the same landmarks on a sensitivity-check refit with a new y:
fit2 <- krls(X, y_alt, approx = "nystrom", landmarks = Z)
## End(Not run)
Function implements Kernel-Based Regularized Least Squares (KRLS), a machine learning method described in Hainmueller and Hazlett (2014) that allows users to solve regression and classification problems without manual specification search and strong functional form assumptions. KRLS finds the best fitting function by minimizing a Tikhonov regularization problem with a squared loss, using Gaussian kernels as radial basis functions. KRLS reduces misspecification bias since it learns the functional form from the data. Yet it still allows for interpretability and inference in ways similar to ordinary regression models. In particular, KRLS provides closed-form estimates for the predicted values, variances, and the pointwise partial derivatives that characterize the marginal effects of each independent variable at each data point in the covariate space. The distribution of pointwise marginal effects can be used to examine effect heterogeneity and/or interactions.
krls(X = NULL, y = NULL, whichkernel = "gaussian", lambda = NULL,
     sigma = NULL, derivative = TRUE, binary = TRUE, vcov = TRUE,
     print.level = 1, L = NULL, U = NULL, tol = NULL, eigtrunc = NULL,
     data = NULL, approx = c("auto", "none", "nystrom"), nystrom_m = NULL,
     landmarks = NULL, landmark_method = c("random", "kmeans"),
     nystrom_eps = sqrt(.Machine$double.eps), landmark_seed = NULL,
     lambda_method = c("loo", "gcv"), cat_columns = NULL)
X |
For the matrix interface (the original): an N by D numeric data matrix that contains the values of D predictor variables for all N observations. For the formula interface (added in 1.2-0): a two-sided formula of the form |
y |
N by 1 data numeric matrix or vector that contains the values of the response variable for all observations. This vector may not contain missing values. Ignored when |
data |
For the formula interface: a |
whichkernel |
String vector that specifies which kernel should be used. Must be one of |
lambda |
A positive scalar that specifies the |
sigma |
A positive scalar that specifies the bandwidth of the Gaussian kernel (see Default changed in 1.7-0. When |
derivative |
Logical that specifies whether pointwise partial derivatives should be computed. Currently, derivatives are only implemented for the Gaussian Kernel. |
binary |
Logical that specifies whether first-differences instead of pointwise partial derivatives should be computed for binary predictors. Ignored unless |
vcov |
Logical that specifies whether variance-covariance matrix for the choice coefficients c and fitted values should be computed. Note that |
print.level |
Positive integer that determines the level of printing. Set to 0 for no printing and 2 for more printing. |
L |
Non-negative scalar that determines the lower bound of the search window for the leave-one-out optimization to find |
U |
Positive scalar that determines the upper bound of the search window for the leave-one-out optimization to find |
tol |
Positive scalar that determines the tolerance used in the optimization routine used to find |
eigtrunc |
Positive scalar that determines how much eigenvalues should be truncated for finding the upper bound of the search window in the algorithm outlined in |
approx |
String selecting the approximation regime.
|
nystrom_m |
Integer number of landmark points to use when |
landmarks |
Optional landmark specification for |
landmark_method |
String selecting the auto-selection rule for landmarks when
|
nystrom_eps |
Positive scalar controlling the relative-ridge stabilization of the
landmark kernel W. Eigenvalues of W below
|
landmark_seed |
Optional integer seed used only for landmark selection when
|
cat_columns |
Optional categorical-column declaration (new in 1.7-0). One of:
Declared columns are one-hot encoded with all levels (no reference
cell) and multiplied by There is no autodetection. Pass |
lambda_method |
String selecting the objective minimized to choose
|
krls implements the Kernel-based Regularized Least Squares (KRLS) estimator as described in Hainmueller and Hazlett (2014). Please consult this reference for any details.
Kernel-based Regularized Least Squares (KRLS) arises as a Tikhonov minimization problem with a squared loss. Assume we have data of the form $(y_i, x_i)$, where i indexes observations, $y_i \in \mathbb{R}$ is the outcome and $x_i \in \mathbb{R}^D$ is a D-dimensional vector of predictor values. Then KRLS searches over a space of functions $H$ and chooses the best fitting function $f$ according to the rule:
$$\underset{f \in H}{\arg\min} \; \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \lVert f \rVert_H^2$$
where $(y_i - f(x_i))^2$ is a loss function that computes how 'wrong' the function
is at each observation i and $\lVert f \rVert_H^2$ is the regularizer that measures the complexity of the function according to the $L_2$ norm $\lVert f \rVert^2 = \int f(x)^2 \, dx$. $\lambda$ is the scalar regularization parameter that governs the tradeoff between model fit and complexity. By default, $\lambda$ is chosen by minimizing the sum of the squared leave-one-out errors, but it can also be specified by the user in the lambda argument to implement other approaches.
Under fairly general conditions, the function that minimizes the regularized loss
within the hypothesis space established by the choice of a (positive semidefinite) kernel function $k(x_i, x_j)$ is of the form
$$f(x_j) = \sum_{i=1}^{N} c_i \, k(x_i, x_j)$$
where the kernel function $k(x_i, x_j)$ measures the distance
between two observations $x_i$ and $x_j$ and $c_i$ is the choice coefficient for each observation $i$. Let $K$ be the $N$ by $N$ kernel matrix with all pairwise distances $K_{ij} = k(x_i, x_j)$ and $c$ be the $N$ by $1$ vector of choice coefficients for all observations; then in matrix notation the space is $y = Kc$.
Accordingly, the krls function solves the following minimization problem
$$\underset{c \in \mathbb{R}^N}{\arg\min} \; (y - Kc)'(y - Kc) + \lambda \, c'Kc$$
which is convex in $c$ and solved by $c = (K + \lambda I)^{-1} y$ where $I$ is the identity matrix. Note that this linear solution provides a flexible fitted response surface that typically reduces misspecification bias because it can learn a wide range of nonlinear and/or nonadditive functions of the predictors.
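The closed-form solution can be checked numerically with a short base R sketch. It assumes the Gaussian kernel exp(-||x_i - x_j||^2 / D), fixes the regularization parameter by hand, and omits any preprocessing the package may apply; the names `coeffs`, `fitted_vals`, and `objective` are illustrative.

```r
set.seed(42)
N <- 50
X <- scale(cbind(rnorm(N), rnorm(N)))
y <- as.numeric(scale(sin(X[, 1]) + 0.5 * X[, 2] + rnorm(N, sd = 0.1)))

# Gaussian kernel matrix with bandwidth D = ncol(X) (assumed for the sketch).
K <- exp(-as.matrix(dist(X))^2 / ncol(X))
lambda <- 0.5                       # fixed by hand for illustration

# Choice coefficients: solve (K + lambda I) c = y rather than inverting.
coeffs <- solve(K + lambda * diag(N), y)
fitted_vals <- as.numeric(K %*% coeffs)

# Regularized objective: squared loss plus lambda * c'Kc.
objective <- function(cvec) {
  sum((y - K %*% cvec)^2) + lambda * as.numeric(t(cvec) %*% K %*% cvec)
}

# Any perturbation of the solution should not lower the objective.
objective(coeffs) <= objective(coeffs + rnorm(N, sd = 0.1))
```

Using `solve(A, b)` instead of forming the inverse explicitly is the usual choice for numerical stability; it yields the same coefficients as the formula in the text.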
If vcov=TRUE is specified, exact KRLS also computes the variance-covariance matrix for the choice coefficients and fitted values based on a variance estimator developed in Hainmueller and Hazlett (2014). Note that both exact matrices are N by N and therefore this results in increased memory and computing time. Under approx = "nystrom", vcov=TRUE computes conditional approximate variances in the m-landmark feature space and does not store the full fitted-value covariance matrix.
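By default, krls uses the Gaussian kernel (whichkernel = "gaussian") given by
$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{\sigma^2}\right)$$
where $\lVert x_i - x_j \rVert$ is the Euclidean distance. The kernel bandwidth $\sigma^2$ is set to $D$, the number of dimensions, by default (but see the sigma argument, whose default changed in 1.7-0), and the user can also specify other values using the sigma argument to implement other approaches.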
If derivative=TRUE is specified, krls also computes the pointwise partial derivatives of the fitted function with respect to each predictor using the estimators developed in Hainmueller and Hazlett (2014). These can be used to examine the marginal effects of each predictor and how the marginal effects vary across the covariate space. Average derivatives are also computed with variances when vcov=TRUE. Note that the derivative=TRUE option results in increased computing time and is only supported for the Gaussian kernel, i.e. when whichkernel = "gaussian". For the exact path, derivative=TRUE requires vcov=TRUE; under approx = "nystrom", derivative=TRUE, vcov=FALSE returns derivative point estimates without standard errors.
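For intuition, the pointwise derivatives of a Gaussian-kernel fit have a simple closed form that follows from differentiating f(x) = sum_i c_i exp(-||x - x_i||^2 / s2): the derivative with respect to dimension d at x_j is (-2 / s2) * sum_i c_i K_ji (x_jd - x_id). The sketch below implements that expression in base R and is not the package code; the bandwidth `s2`, the fixed `lambda`, and the helper `fhat` are assumptions, and no rescaling for standardized data is applied.

```r
set.seed(7)
N <- 60
X <- cbind(x1 = runif(N, -2, 2), x2 = runif(N, -2, 2))
y <- sin(X[, 1]) + 0.5 * X[, 2]

s2 <- ncol(X)                        # assumed bandwidth
lambda <- 0.1                        # fixed by hand for illustration
K <- exp(-as.matrix(dist(X))^2 / s2)
coeffs <- solve(K + lambda * diag(N), y)

# Fitted function at an arbitrary point x (vector of length D).
fhat <- function(x) {
  k <- exp(-colSums((t(X) - x)^2) / s2)
  sum(coeffs * k)
}

# N by D matrix of pointwise partial derivatives at the training points.
deriv_mat <- sapply(seq_len(ncol(X)), function(d) {
  diffs <- outer(X[, d], X[, d], "-")   # diffs[j, i] = x_jd - x_id
  as.numeric((-2 / s2) * (K * diffs) %*% coeffs)
})
colnames(deriv_mat) <- colnames(X)

# Average derivative per predictor.
avg_deriv <- colMeans(deriv_mat)
avg_deriv
```

A finite-difference check such as `(fhat(X[1, ] + c(1e-5, 0)) - fhat(X[1, ] - c(1e-5, 0))) / 2e-5` should agree closely with `deriv_mat[1, "x1"]`.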
If binary=TRUE is also specified, the function will identify binary predictors and return first differences for these predictors instead of partial derivatives. First differences are computed going from the minimum to the maximum value of each binary predictor. Note that first differences are more appropriate to summarize the effects for binary predictors (see Hainmueller and Hazlett (2014) for details).
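A first difference for a binary predictor can be sketched as the change in the fitted value when that predictor is moved from its minimum to its maximum while the other predictors stay at their observed values. The base R code below is an illustration under assumptions (Gaussian kernel with bandwidth D, hand-picked lambda, hypothetical helper `predict_sketch`), not the package's internal routine.

```r
set.seed(3)
N <- 80
X <- cbind(x1 = rnorm(N), x2 = rbinom(N, size = 1, prob = 0.3))
y <- X[, 1] + 0.5 * X[, 2] + rnorm(N, sd = 0.1)

s2 <- ncol(X)                        # assumed bandwidth
lambda <- 0.1                        # fixed by hand for illustration
K <- exp(-as.matrix(dist(X))^2 / s2)
coeffs <- solve(K + lambda * diag(N), y)

# Hypothetical helper: predictions at new rows via the cross-kernel.
predict_sketch <- function(Xnew) {
  D2 <- outer(rowSums(Xnew^2), rowSums(X^2), "+") - 2 * Xnew %*% t(X)
  as.numeric(exp(-pmax(D2, 0) / s2) %*% coeffs)   # pmax guards tiny negatives
}

# Counterfactual copies of X with the binary predictor at its max and min.
X_hi <- X_lo <- X
X_hi[, "x2"] <- max(X[, "x2"])
X_lo[, "x2"] <- min(X[, "x2"])

fd_pointwise <- predict_sketch(X_hi) - predict_sketch(X_lo)
fd_avg <- mean(fd_pointwise)
fd_avg
```

The pointwise first differences play the role of the derivative column for the binary predictor, and their mean is the analogue of the average derivative.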
A few other kernels are also implemented, but derivatives are currently not supported for these: "linear": $k(x_i, x_j) = x_i'x_j$; "poly1", "poly2", "poly3", "poly4" are polynomial kernels based on $k(x_i, x_j) = (x_i'x_j + 1)^p$ where $p$ is the order.
A list object of class krls with the following elements:
K |
N by N matrix of pairwise kernel distances between observations. |
coeffs |
N by 1 vector of choice coefficients c. |
Le |
scalar with sum of squared leave-one-out errors. |
fitted |
N by 1 vector of fitted values. |
X |
original N by D predictor data matrix. |
y |
original N by 1 matrix of values of the outcome variable. |
sigma |
scalar with value of bandwidth, |
lambda |
scalar with value of regularization parameter, |
R2 |
scalar with value of R-squared |
vcov.c |
Variance covariance matrix for choice coefficients (N by N for exact KRLS; m by m conditional approximate covariance under |
vcov.fitted |
N by N variance covariance matrix for fitted values under exact KRLS ( |
derivatives |
N by D matrix of pointwise partial derivatives based on the Gaussian kernel ( |
avgderivatives |
1 by D matrix of average derivative based on the Gaussian kernel ( |
var.avgderivatives |
1 by D matrix of variances for average derivative based on gaussian kernel ( |
binaryindicator |
1 by D matrix that indicates for each predictor if it is treated as binary or not (evaluates to FALSE unless |
The function requires the storage of an N by N kernel matrix and can therefore exceed the memory limits for very large datasets.
Setting derivative=FALSE and vcov=FALSE is useful to reduce computing time if pointwise partial derivatives and/or variance-covariance matrices are not needed.
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
Jeremy Ferwerda, Jens Hainmueller, Chad J. Hazlett (2017). Kernel-Based Regularized Least Squares in R (KRLS) and Stata (krls). Journal of Statistical Software, 79(3), 1-26. doi:10.18637/jss.v079.i03
Hainmueller, J. and Hazlett, C. (2014). Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach. Political Analysis, 22(2)
Rifkin, R. 2002. Everything Old is New Again: A fresh look at historical approaches in machine learning. Thesis, MIT. September, 2002.
Evgeniou, T., Pontil, M., and Poggio, T. (2000). Regularization networks and support vector machines. Advances In Computational Mathematics, 13(1):1-50.
Schoelkopf, B., Herbrich, R. and Smola, A.J. (2001) A generalized representer theorem. In 14th Annual Conference on Computational Learning Theory, pages 416-426.
Kimeldorf, G.S. Wahba, G. 1971. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33:82-95.
predict.krls for fitted values and predictions. summary.krls for summary of the fit. plot.krls for plots of the fit.
# Linear example
# set up data
N <- 200
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1 + .5 * x2 + rnorm(N, 0, .15)
X <- cbind(x1, x2)
# fit model
krlsout <- krls(X = X, y = y)
# summarize marginal effects and contribution of each variable
summary(krlsout)
# plot marginal effects and conditional expectation plots
plot(krlsout)

# non-linear example
# set up data
N <- 200
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 + .5 * x2 + rnorm(N, 0, .15)
X <- cbind(x1, x2)
# fit model
krlsout <- krls(X = X, y = y)
# summarize marginal effects and contribution of each variable
summary(krlsout)
# plot marginal effects and conditional expectation plots
plot(krlsout)

## 2D example:
# predictor data
X <- matrix(seq(-3, 3, .1))
# true function
Ytrue <- sin(X)
# add noise
Y <- sin(X) + rnorm(length(X), sd = .3)
# approximate function using KRLS
out <- krls(y = Y, X = X)
# get fitted values and ses
fit <- predict(out, newdata = X, se.fit = TRUE)
# results
par(mfrow = c(2, 1))
plot(y = Ytrue, x = X, type = "l", col = "red", ylim = c(-1.2, 1.2), lwd = 2, main = "f(x)")
points(y = fit$fit, X, col = "blue", pch = 19)
arrows(y1 = fit$fit + 1.96 * fit$se.fit,
       y0 = fit$fit - 1.96 * fit$se.fit,
       x1 = X, x0 = X, col = "blue", length = 0)
legend("bottomright", legend = c("true f(x)=sin(x)", "KRLS fitted f(x)"),
       lty = c(1, NA), pch = c(NA, 19), lwd = c(2, NA), col = c("red", "blue"), cex = .8)
plot(y = cos(X), x = X, type = "l", col = "red", ylim = c(-1.2, 1.2), lwd = 2, main = "df(x)/dx")
points(y = out$derivatives, X, col = "blue", pch = 19)
legend("bottomright", legend = c("true df(x)/dx=cos(x)", "KRLS fitted df(x)/dx"),
       lty = c(1, NA), pch = c(NA, 19), lwd = c(2, NA), col = c("red", "blue"), cex = .8)

## 3D example
# plot true function
par(mfrow = c(1, 2))
f <- function(x1, x2) { sin(x1) * cos(x2) }
x1 <- x2 <- seq(0, 2 * pi, .2)
z <- outer(x1, x2, f)
persp(x1, x2, z, theta = 30, main = "true f(x1,x2)=sin(x1)cos(x2)")
# approximate function with KRLS
# data and outcomes
X <- cbind(sample(x1, 200, replace = TRUE), sample(x2, 200, replace = TRUE))
y <- f(X[, 1], X[, 2]) + runif(nrow(X))
# fit surface
krlsout <- krls(X = X, y = y)
# plot fitted surface
ff <- function(x1i, x2i, krlsout) { predict(object = krlsout, newdata = cbind(x1i, x2i))$fit }
z <- outer(x1, x2, ff, krlsout = krlsout)
persp(x1, x2, z, theta = 30, main = "KRLS fitted f(x1,x2)")
Function conducts leave-one-out optimization to find $\lambda$ using a golden section search with caching. This function is called internally by krls. It would normally not be called by the user directly.
lambdasearch(L = NULL, U = NULL, y = NULL, Eigenobject = NULL, tol = NULL,
             noisy = FALSE, eigtrunc = NULL, lambda_method = c("loo", "gcv"))
L |
Non-negative scalar that determines the lower bound of the search window. Default is |
U |
Positive scalar that determines the upper bound of the search window. Default is |
y |
N by 1 matrix of outcomes. |
Eigenobject |
List that contains the eigenvalues and eigenvectors of the kernel matrix K. |
tol |
Positive scalar that determines the tolerance used in the optimization routine used to find |
noisy |
If |
eigtrunc |
Positive scalar value that determines truncation of eigenvalues for lambda search window. See |
lambda_method |
String selecting the objective minimized to choose |
By default, the upper bound is found as follows: set j to n and decrease it by one until the following is no longer true: sum(EigenValues / (EigenValues + j)) < 1.
By default, the lower bound is found as follows: get the position, q, of the eigenvalue that is closest to max(Eigenvalue)/1000. Set j to 0 and increase it in steps of 0.05 until the following is no longer true: sum(EigenValues / (EigenValues + j)) > q.
A scalar that contains the $\lambda$ that minimizes the sum of squared leave-one-out errors.
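A golden section search over the regularization parameter can be sketched in base R as below. This is an illustration under assumptions rather than the package routine: `golden_min` and `loo_loss` are hypothetical names, the search bounds are picked by hand, the loss uses a direct solve instead of a cached eigendecomposition, and the leave-one-out objective is not guaranteed to be unimodal.

```r
# Golden section search for a minimum of f on [lower, upper].
# One function value is reused per iteration, so each step costs one evaluation.
golden_min <- function(f, lower, upper, tol = 1e-6) {
  phi <- (sqrt(5) - 1) / 2                 # inverse golden ratio, about 0.618
  x1 <- upper - phi * (upper - lower)
  x2 <- lower + phi * (upper - lower)
  f1 <- f(x1); f2 <- f(x2)
  while (abs(upper - lower) > tol) {
    if (f1 < f2) {                         # keep [lower, x2]
      upper <- x2
      x2 <- x1; f2 <- f1                   # reuse the stored evaluation
      x1 <- upper - phi * (upper - lower)
      f1 <- f(x1)
    } else {                               # keep [x1, upper]
      lower <- x1
      x1 <- x2; f1 <- f2
      x2 <- lower + phi * (upper - lower)
      f2 <- f(x2)
    }
  }
  (lower + upper) / 2
}

# Sum of squared leave-one-out errors, using LOO_i = c_i / [(K + lambda I)^-1]_ii.
loo_loss <- function(lambda, K, y) {
  Ginv <- solve(K + lambda * diag(nrow(K)))
  coeffs <- Ginv %*% y
  sum((coeffs / diag(Ginv))^2)
}

set.seed(21)
N <- 40
X <- matrix(rnorm(N * 2), ncol = 2)
y <- sin(X[, 1]) + rnorm(N, sd = 0.2)
K <- exp(-as.matrix(dist(X))^2 / ncol(X))

lambda_hat <- golden_min(function(l) loo_loss(l, K, y), lower = 1e-3, upper = N)
lambda_hat
```

Because the interval shrinks by a constant factor at every step, the number of loss evaluations grows only logarithmically with the ratio of the interval width to the tolerance.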
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
Internal function that computes Leave-One-Out (LOO) Error for KRLS given a fixed value for lambda (the parameter that governs the tradeoff between model fit and complexity in KRLS).
This function is called internally by krls to find value of lambda that minimizes the LOO error. It would normally not be called by the user directly.
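The leave-one-out errors for a fixed lambda do not require N separate refits. For regularized least squares with G = K + lambda I, the i-th leave-one-out residual equals c_i divided by the i-th diagonal element of the inverse of G, and that inverse is cheap to form from an eigendecomposition of K. The sketch below compares this shortcut with brute-force refitting; the function name `loo_errors` and the simulated data are assumptions for illustration, not the package internals.

```r
set.seed(11)
N <- 30
X <- matrix(rnorm(N * 2), ncol = 2)
y <- sin(X[, 1]) + rnorm(N, sd = 0.2)
K <- exp(-as.matrix(dist(X))^2 / ncol(X))
eig <- eigen(K, symmetric = TRUE)      # computed once, reused for every lambda
lambda <- 0.3

# LOO residuals from the eigendecomposition K = Q diag(values) Q'.
loo_errors <- function(y, eig, lambda) {
  Q <- eig$vectors
  Ginv <- Q %*% (t(Q) / (eig$values + lambda))   # (K + lambda I)^-1
  coeffs <- Ginv %*% y
  as.numeric(coeffs / diag(Ginv))
}

# Brute force: drop observation i, refit, predict it, record the error.
loo_brute <- sapply(seq_len(N), function(i) {
  c_minus_i <- solve(K[-i, -i] + lambda * diag(N - 1), y[-i])
  y[i] - sum(K[i, -i] * c_minus_i)
})

fast <- loo_errors(y, eig, lambda)
Le <- sum(fast^2)                       # scalar LOO loss for this lambda
all.equal(fast, loo_brute)
```

Since only the term `eig$values + lambda` changes with lambda, the same eigendecomposition serves every candidate value during a search.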
looloss(y = NULL, Eigenobject = NULL, lambda = NULL, eigtrunc = NULL)
y |
n by 1 vector of outcomes. |
Eigenobject |
Object from call to |
lambda |
Positive scalar value for lambda parameter. |
eigtrunc |
Positive scalar value that determines truncation of eigenvalues for lambda search window. See |
Scalar value for LOO error.
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
Produces two types of plots. The first type of plot shows histograms for the pointwise partial derivatives to examine the heterogeneity in the marginal effects of each predictor (which==1). The second type of plot shows estimates of the conditional expectation functions $E[Y|X]$ for each predictor (which==2). For each plot, the predictor of interest varies from its 1st to its 3rd quartile values, while the other predictors are kept at the means (or other values specified in setx). For binary variables, $E[Y|X]$ is predicted at the max and the min value of the predictor (instead of the range from the 1st to the 3rd quartile).
## S3 method for class 'krls'
plot(x, which = c(1:2), main = "distributions of pointwise marginal effects",
     setx = "mean", ask = prod(par("mfcol")) < nplots, nvalues = 50,
     probs = c(.25, .75), ...)
x |
An object of class " |
which |
if a subset of the plots is required, specify a subset of the numbers |
main |
main title for histograms of pointwise partial derivatives. |
setx |
either one of |
ask |
logical; if |
nvalues |
scalar that specifies the number of values at which conditional expectations should be plotted. |
probs |
vector with numbers between 0 and 1 that specify the quantiles that determine the range of the predictor values for which the conditional expectation should be plotted. By default we vary each predictor from the 1st quartile to the 3rd quartile value. |
... |
additional arguments to be passed to lower level functions |
Notice that the histograms for the partial derivatives can only be plotted if the KRLS object was computed with krls(..., derivative=TRUE).
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
# non-linear example
# set up data
N <- 200
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 + .5 * x2 + rnorm(N, 0, .15)
X <- cbind(x1, x2)
# fit model
krlsout <- krls(X = X, y = y)
# summarize marginal effects and contribution of each variable
summary(krlsout)
# plot marginal effects and conditional expectation plots
plot(krlsout)
Predicted values and standard errors based on krls model object.
## S3 method for class 'krls'
predict(object, newdata, se.fit = FALSE, ...)
object |
Fitted KRLS model, i.e. an object of class |
newdata |
A data frame or matrix with variable values at which to predict the outcome. Number and order of columns in |
se.fit |
logical flag if standard errors should be computed for pointwise predictions. |
... |
additional arguments affecting the predictions produced. |
Function produces predicted values, obtained by evaluating the fitted krls function with
the newdata (i.e. the test points). The prediction at a new test point $x_i$ is based on
$$f(x_i) = \sum_{j=1}^{N} c_j \, K(x_i, x_j)$$
where $K$ is the kernel matrix and thus $K_{x_i}$
is a vector whose j-th entry is $K(x_i, x_j)$ (e.g. the distance between the test point $x_i$ and the training point $x_j$). The training points are passed to the function with the krls fit in object.
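The prediction rule amounts to multiplying an M by N cross-kernel matrix (test points against training points) by the vector of choice coefficients. The base R sketch below illustrates this with a hand-picked bandwidth and lambda; `cross_kernel` is a hypothetical helper, and the standardization the package may apply to `newdata` is left out.

```r
set.seed(5)
X <- matrix(seq(-3, 3, by = 0.25))           # N = 25 training points, D = 1
y <- sin(X[, 1]) + rnorm(nrow(X), sd = 0.1)

s2 <- 1                                      # assumed bandwidth
lambda <- 0.05                               # fixed by hand for illustration
K <- exp(-as.matrix(dist(X))^2 / s2)
coeffs <- solve(K + lambda * diag(nrow(X)), y)

# Hypothetical helper: M by N Gaussian kernel between new and training rows.
cross_kernel <- function(Xnew, Xtrain, s2) {
  D2 <- outer(rowSums(Xnew^2), rowSums(Xtrain^2), "+") - 2 * Xnew %*% t(Xtrain)
  exp(-pmax(D2, 0) / s2)                     # pmax guards tiny negative round-off
}

Xnew <- matrix(c(-1, 0, 1.1))                # M = 3 test points
newdataK <- cross_kernel(Xnew, X, s2)        # M by N
fit <- as.numeric(newdataK %*% coeffs)       # M predictions
cbind(x = Xnew[, 1], fit = fit, truth = sin(Xnew[, 1]))
```

Evaluating the same helper at the training points reproduces the in-sample fitted values, since the cross-kernel then coincides with K.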
When data are missing in newdata during prediction, the value of each $k(x_i, x_j)$ is computed by using an adjusted Euclidean distance in the kernel definition. Assume $x$ is D-dimensional but a given pair of observations $x_i$ and $x_j$ have only $D' < D$ non-missing dimensions in common. The adjusted Euclidean distance computes the sum of squared differences over the $D'$ non-missing dimensions, rescales this sum by $D/D'$, and takes the square root. The result corresponds to an assumption that conditional on the observed data, the missing values would not have contributed new information predictive of the outcome.
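A small worked example may help with the adjusted distance. The sketch assumes the rescaling factor is the total number of dimensions divided by the number of dimensions observed in both points, which is the same proportional rescaling that `stats::dist` documents for rows with missing values; `adj_sqdist` is a hypothetical helper.

```r
# Adjusted squared distance: sum over jointly observed dimensions,
# scaled up by (total dimensions) / (jointly observed dimensions).
adj_sqdist <- function(xi, xj) {
  ok <- !is.na(xi) & !is.na(xj)
  sum((xi[ok] - xj[ok])^2) * length(xi) / sum(ok)
}

xi <- c(1, 2, NA)
xj <- c(2, 4, 5)

# Two of three dimensions are observed: (1 + 4) * 3 / 2 = 7.5.
d_adj <- sqrt(adj_sqdist(xi, xj))

# Reference value from stats::dist, which rescales in the same way.
d_ref <- as.numeric(dist(rbind(xi, xj)))

c(adjusted = d_adj, dist = d_ref)
```

Plugging the adjusted squared distance into the Gaussian kernel in place of the ordinary squared distance gives the kernel value used for that pair.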
fit |
M by 1 vector of fitted values for M test points. |
se.fit |
M by 1 vector of standard errors for the fitted values for M test points ( |
vcov.fit |
M by M variance-covariance matrix for the fitted values for M test points ( |
newdata |
M by D data matrix of M test points with D predictors. |
newdataK |
M by N data matrix for pairwise Gauss Kernel distances between M test points and N training points from krls model fit in |
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
# make up data
X <- seq(-3, 3, .1)
Y <- sin(X) + rnorm(length(X), .1)
# fit krls
krlsout <- krls(y = Y, X = X)
# get in-sample prediction
predin <- predict(krlsout, newdata = X, se.fit = TRUE)
# get out-of-sample prediction
X2 <- runif(5)
predout <- predict(krlsout, newdata = X2, se.fit = TRUE)
# plot true function and predictions
plot(y = sin(X), x = X, type = "l", col = "red", ylim = c(-1.8, 1.8), lwd = 2, ylab = "f(X)")
points(y = predin$fit, x = X, col = "blue", pch = 19)
arrows(y1 = predin$fit + 2 * predin$se.fit,
       y0 = predin$fit - 2 * predin$se.fit,
       x1 = X, x0 = X, col = "blue", length = 0)
points(y = predout$fit, x = X2, col = "green", pch = 17)
arrows(y1 = predout$fit + 2 * predout$se.fit,
       y0 = predout$fit - 2 * predout$se.fit,
       x1 = X2, x0 = X2, col = "green", length = 0)
legend("bottomright",
       legend = c("true f(x)=sin(X)", "KRLS fitted in-sample", "KRLS fitted out-of-sample"),
       lty = c(1, NA, NA), pch = c(NA, 19, 17), lwd = c(2, NA, NA),
       col = c("red", "blue", "green"), cex = .8)
Internal function that computes choice coefficients for KRLS given a fixed value for lambda (the parameter that governs the tradeoff between model fit and complexity in KRLS).
This function is called internally by krls. It would normally not be called by the user directly.
solveforc(y = NULL, Eigenobject = NULL, lambda = NULL, eigtrunc = NULL)
y |
n by 1 matrix of outcomes. |
Eigenobject |
Object from call to |
lambda |
Positive scalar value for lambda parameter. |
eigtrunc |
Positive scalar value that determines truncation of eigenvalues for lambda search window. See |
Function relies on the fast eigenvalue decomposition method described in Rifkin and Lippert (2007).
coeffs |
n by 1 matrix of choice coefficients for KRLS model. |
Le |
n by 1 matrix of errors from leave-one-out validation. |
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
Rifkin, Ryan M. and Lippert, Ross A. (2007). Notes on Regularized Least Squares. MIT-CSAIL-TR-2007-025. CBCL-268
Summarizes average partial derivatives (i.e. marginal effects) and the distribution of the partial derivatives for each predictor. For binary predictors, the marginal effects are the first differences if krls(..., derivative=TRUE, binary=TRUE) was specified.
## S3 method for class 'krls'
summary(object, probs = c(.25, .5, .75), ...)
object |
Fitted krls model, i.e. an object of class krls |
probs |
numeric vector with numbers between 0 and 1 that specify the quantiles of the pointwise marginal effects for the summary (see the |
... |
additional arguments to be passed to lower level functions |
Notice that the partial derivatives can only be summarized if the krls object was computed with krls(..., derivative=TRUE).
coefficients |
matrix with average partial derivatives and/or first differences (point estimates, standard errors, t-values, p-values). |
qcoefficients |
matrix with 1st, 2nd, and 3rd quartiles of distribution of pointwise marginal effects. |
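The two summary tables can be pictured with a stand-in derivative matrix. In the sketch below, `derivs` is simulated data taking the place of the N by D matrix of pointwise marginal effects, and the layout of `qcoef` (one row per predictor, one column per quantile) is an assumption about presentation rather than a statement about the returned object.

```r
set.seed(9)
# Stand-in for an N by D matrix of pointwise marginal effects.
derivs <- cbind(x1 = rnorm(100, mean = 1.0, sd = 0.3),
                x2 = rnorm(100, mean = 0.5, sd = 0.1))

# Average marginal effect per predictor.
avg <- colMeans(derivs)

# Quartiles of the pointwise marginal effects, one row per predictor.
probs <- c(.25, .5, .75)
qcoef <- t(apply(derivs, 2, quantile, probs = probs))

avg
qcoef
```

A wide gap between the first and third quartile for a predictor is a sign of heterogeneous marginal effects that a single average would hide.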
Jens Hainmueller (Stanford) and Chad Hazlett (UCLA)
# non-linear example
# set up data
N <- 200
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 + .5 * x2 + rnorm(N, 0, .15)
X <- cbind(x1, x2)
# fit model
krlsout <- krls(X = X, y = y)
# summarize marginal effects and contribution of each variable
summary(krlsout)
# plot marginal effects and conditional expectation plots
plot(krlsout)