Linear regression with constraints on the coefficients
solve.QP can be passed arbitrary linear constraints, so it can certainly be used to model your constraints a + c >= 0 and c >= 0.
First, we can add a column of 1's to X to capture the intercept term, and then we can replicate standard linear regression with solve.QP:
X2 <- cbind(X, 1)
library(quadprog)
solve.QP(t(X2) %*% X2, t(Y) %*% X2, matrix(0, 3, 0), c())$solution
# [1] 0.08614041 0.21433372 -0.13267403
With the sample data from the question, neither constraint is met using standard linear regression.
By modifying both the Amat and bvec parameters, we can add our two constraints:
solve.QP(t(X2) %*% X2, t(Y) %*% X2, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
# [1] 0.0000000 0.1422207 0.0000000
Subject to these constraints, the squared residuals are minimized by setting the a and c coefficients to both equal 0.
You can handle missing values in Y or X2 just as the lm function does, by removing the offending observations. You might do something like the following as a pre-processing step:
has.missing <- rowSums(is.na(cbind(Y, X2))) > 0
Y <- Y[!has.missing]
X2 <- X2[!has.missing,]
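If you need the same constrained fit outside R, the quadratic program above can be sketched in Python with scipy.optimize.minimize. This is an illustration on made-up data, not part of the original answer; the two inequality constraints mirror the Amat columns c(1, 0, 1) and c(0, 0, 1) used above.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data standing in for the X and Y of the question
rng = np.random.default_rng(42)
X = rng.standard_normal((20, 2))
Y = rng.standard_normal(20)
X2 = np.column_stack([X, np.ones(len(X))])  # append intercept column, as in R

# Objective: sum of squared residuals
obj = lambda b: np.sum((Y - X2 @ b) ** 2)

# Inequality constraints in ">= 0" form: b[0] + b[2] >= 0 and b[2] >= 0
cons = [{"type": "ineq", "fun": lambda b: b[0] + b[2]},
        {"type": "ineq", "fun": lambda b: b[2]}]

res = minimize(obj, x0=np.zeros(3), constraints=cons)
print(res.x)
```

With constraints supplied and no method specified, minimize falls back to SLSQP, which handles general linear (and nonlinear) inequality constraints.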
Constrained Linear Regression in Python
Recent versions of scipy include a bounded-variable least-squares solver, scipy.optimize.lsq_linear:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.lsq_linear.html#scipy.optimize.lsq_linear
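A minimal sketch of lsq_linear on synthetic data (the data here are invented for illustration): it solves least squares subject to box constraints on the coefficients.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Synthetic system with one truly negative coefficient
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
true_coef = np.array([0.5, -0.2, 0.3])
b = A @ true_coef + 0.01 * rng.standard_normal(50)

# Constrain all coefficients to be non-negative;
# the negative one gets clamped toward zero
res = lsq_linear(A, b, bounds=(0, np.inf))
print(res.x)
```

Note that lsq_linear only supports elementwise lower/upper bounds; constraints that couple coefficients (like a + c >= 0 above) need a general QP or minimize-based approach.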
Impose Constraint on Intercept in Linear Regression Using R
First, an unconstrained linear model:
m0 <- lm(wt ~ qsec + hp + disp, data = mtcars)
m0
#
# Call:
# lm(formula = wt ~ qsec + hp + disp, data = mtcars)
#
# Coefficients:
# (Intercept) qsec hp disp
# -2.450047 0.201713 0.003466 0.006755
Force the intercept to be zero.
m1 <- lm(wt ~ qsec + hp + disp - 1, data = mtcars)
m1
#
# Call:
# lm(formula = wt ~ qsec + hp + disp - 1, data = mtcars)
#
# Coefficients:
# qsec hp disp
# 0.0842281 0.0002622 0.0072967
You can use nls to apply limits to the parameters (in this case a lower limit on the intercept):
m1n <- nls(wt ~ a + b1 * qsec + b2 * hp + b3 * disp,
data = mtcars,
start = list(a = 1, b1 = 1, b2 = 1, b3 = 1),
lower = c(0, -Inf, -Inf, -Inf), algorithm = "port")
m1n
# Nonlinear regression model
# model: wt ~ a + b1 * qsec + b2 * hp + b3 * disp
# data: mtcars
# a b1 b2 b3
# 0.0000000 0.0842281 0.0002622 0.0072967
# residual sum-of-squares: 4.926
#
# Algorithm "port", convergence message: relative convergence (4)
See here for other example solutions.
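For a Python counterpart to the nls lower-bound approach, scipy.optimize.curve_fit accepts parameter bounds. This is a hedged sketch on synthetic data loosely shaped like the mtcars columns, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic stand-ins for mtcars$qsec, hp, disp, wt
rng = np.random.default_rng(3)
qsec = rng.uniform(14, 23, 32)
hp = rng.uniform(50, 335, 32)
disp = rng.uniform(70, 472, 32)
wt = 0.08 * qsec + 0.0003 * hp + 0.007 * disp + 0.1 * rng.standard_normal(32)

def model(X, a, b1, b2, b3):
    qsec, hp, disp = X
    return a + b1 * qsec + b2 * hp + b3 * disp

# Lower bound of 0 on the intercept a; other parameters unbounded
popt, _ = curve_fit(model, (qsec, hp, disp), wt,
                    bounds=([0, -np.inf, -np.inf, -np.inf],
                            [np.inf, np.inf, np.inf, np.inf]))
print(popt)
```

When bounds are supplied, curve_fit switches to a trust-region solver ('trf'), analogous to nls with algorithm = "port".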
Simple linear regression with constraint
This works nicely:
import numpy as np
from scipy.optimize import minimize
# Define the Model
model = lambda b, X: b[0] * X[:,0] + b[1] * X[:,1] + b[2] * X[:,2]
# The objective Function to minimize (least-squares regression)
obj = lambda b, Y, X: np.sum(np.abs(Y-model(b, X))**2)
# Bounds: b[0], b[1], b[2] >= 0
bnds = [(0, None), (0, None), (0, None)]
# Constraint: b[0] + b[1] + b[2] - 1 = 0
cons = [{"type": "eq", "fun": lambda b: b[0]+b[1]+b[2] - 1}]
# Initial guess for b[0], b[1], b[2]:
xinit = np.array([0, 0, 1])
res = minimize(obj, args=(Y, X), x0=xinit, bounds=bnds, constraints=cons)
print(f"b1={res.x[0]}, b2={res.x[1]}, b3={res.x[2]}")
#Save the coefficients for further analysis on goodness of fit
beta1 = res.x[0]
beta2 = res.x[1]
beta3 = res.x[2]
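As a sketch of the goodness-of-fit follow-up mentioned in the comment above, R-squared can be computed directly from the fitted coefficients. The data and beta here are hypothetical stand-ins for the Y, X, and res.x of the snippet above.

```python
import numpy as np

# Hypothetical data and fitted coefficients (stand-ins for Y, X, res.x)
rng = np.random.default_rng(1)
X = rng.random((30, 3))
beta = np.array([0.2, 0.3, 0.5])  # sums to 1, as the equality constraint requires
Y = X @ beta + 0.05 * rng.standard_normal(30)

# R-squared: 1 minus residual sum of squares over total sum of squares
Y_hat = X @ beta
ss_res = np.sum((Y - Y_hat) ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```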