predict.lm with arbitrary coefficients r
Hacking the $coefficients
element does indeed seem to work. Can you show what doesn't work for you?
dd <- data.frame(x=1:5,y=1:5)
m1 <- lm(y~x,dd)
m1$coefficients <- c(-2,1)
m1
## Call:
## lm(formula = y ~ x, data = dd)
##
## Coefficients:
## [1] -2 1
predict(m1,newdata=data.frame(x=7)) ## 5 = -2+1*7
predict.lm(...)
gives the same results.
I would be very careful with this approach, checking each time you do something different with the hacked model.
In general it would be nice if predict
and simulate
methods took a newparams
argument, but they don't in general ...
R predict() dependent variable Y=f(X) using arbitrary coefficients
Changing the coefficients should do it. However, if you're predicting with the original data by just relying on the predict function using your original data if not passed any additional data to predict (i.e. not using newdata=something
) it uses the fitted values from the model object directly.
You can bypass this by telling it to use newdata=your_original_data
in your call to predict.
An example
> dat <- mtcars[1:5,]
> results <- glm(vs ~ mpg, data = dat, family = binomial(link = 'logit'))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> predict(results)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
-23.66411 -23.66411 186.49174 23.03719 -292.19658
> coefficients(results)
(Intercept) mpg
-2475.4823 116.7532
> results$coefficients[1] <- 0
> predict(results) # uses original fitted values
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
-23.66411 -23.66411 186.49174 23.03719 -292.19658
> predict(results, newdata = dat)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
2451.818 2451.818 2661.974 2498.519 2183.286
Use Dataframe of Coefficients to Make Test Set Predictions in R
I think you are trying to do something like this
set.seed(2021)
test_data <- data.frame(x=rnorm(7), y=rnorm(7))
test_data$xx <- test_data$x * test_data$x
test_data$xy <- test_data$x * test_data$y
print(test_data)
# x y xx xy
# 1 -0.1224600 0.91556637 0.01499645 -0.112120244
# 2 0.5524566 0.01377194 0.30520833 0.007608399
# 3 0.3486495 1.72996316 0.12155648 0.603150795
# 4 0.3596322 -1.08220485 0.12933535 -0.389195760
# 5 0.8980537 -0.27282518 0.80650043 -0.245011659
# 6 -1.9225695 0.18199540 3.69627356 -0.349898808
# 7 0.2617444 1.50854179 0.06851011 0.394852311
coeff <- c(x=-2, y=-1, xx=+3, xy=+2, constant=+7)
predictions <- as.matrix(cbind(test_data,1)) %*% coeff
print(predictions)
# [,1]
# [1,] 6.150102
# [2,] 6.812157
# [3,] 6.143709
# [4,] 6.972555
# [5,] 7.406196
# [6,] 21.052167
# [7,] 5.963204
where -2*-0.1224600 -1*0.91556637 +3*0.01499645 +2*-0.112120244 +7
is 6.150102
predict' gives different results than using manually the coefficients from 'summary'
The difference is really small, and I think is just due to the accuracy of the coefficients you are using (e.g. the real value of the intercept is -0.17947075338464965610...
not simply -.17947
).
In fact, if you take the coefficients value and apply the formula, the result is equal to predict:
intercept <- model$coefficients[1]
x1Coeff <- model$coefficients[2]
x2Coeff <- model$coefficients[3]
x3Coeff <- model$coefficients[4]
intercept + x1Coeff*x1[121:150] + x2Coeff*x2[121:150] + x3Coeff*x3[121:150]
Coefficient of determination in predict()
RMSE <- function(fitted, true){
sqrt(mean((fitted - true)^2))
}
R2 <- function(fitted, true){
1 - (sum((true - fitted)^2)/sum((true - mean(true))^2))
}
fitted values are in model2, while true values are probably in responce variable from daten2
RMSE(model2, daten2$y)
R2(model2, daten2$y)
The point is: to calculate a goodness of fit metric you need to supply the true outcomes along with the predicted values. "predict.lm" just provides the predicted values
Manually build logistic regression model for prediction in R
I could not find a simple function to do this. There is some code in the predict
function that depends on having a fitted model (like determining the rank of the model). However, we can create a function to make a fake glm object that you can use with predict. Here's my first attempt at such a function
makeglm <- function(formula, family, data=NULL, ...) {
dots <- list(...)
out<-list()
tt <- terms(formula, data=data)
if(!is.null(data)) {
mf <- model.frame(tt, data)
vn <- sapply(attr(tt, "variables")[-1], deparse)
if((yvar <- attr(tt, "response"))>0)
vn <- vn[-yvar]
xlvl <- lapply(data[vn], function(x) if (is.factor(x))
levels(x)
else if (is.character(x))
levels(as.factor(x))
else
NULL)
attr(out, "xlevels") <- xlvl[!vapply(xlvl,is.null,NA)]
attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
}
out$terms <- tt
coef <- numeric(0)
stopifnot(length(dots)>1 & !is.null(names(dots)))
for(i in seq_along(dots)) {
if((n<-names(dots)[i]) != "") {
v <- dots[[i]]
if(!is.null(names(v))) {
coef[paste0(n, names(v))] <- v
} else {
stopifnot(length(v)==1)
coef[n] <- v
}
} else {
coef["(Intercept)"] <- dots[[i]]
}
}
out$coefficients <- coef
out$rank <- length(coef)
out$qr <- list(pivot=seq_len(out$rank))
out$family <- if (class(family) == "family") {
family
} else if (class(family) == "function") {
family()
} else {
stop(paste("invalid family class:", class(family)))
}
out$deviance <- 1
out$null.deviance <- 1
out$aic <- 1
class(out) <- c("glm","lm")
out
}
So this function creates an object and passes all the values that predict
and print
expect to find on such an object. Now we can test it out. First, here's some test data
set.seed(15)
dd <- data.frame(
X1=runif(50),
X2=factor(sample(letters[1:4], 50, replace=T)),
X3=rpois(50, 5),
Outcome = sample(0:1, 50, replace=T)
)
And we can fit a standard binomial model with
mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)
Which gives
Call: glm(formula = Outcome ~ X1 + X2 + X3, family = binomial, data = dd)
Coefficients:
(Intercept) X1 X2b X2c X2d X3
-0.4411 0.8853 1.8384 0.9455 1.5059 -0.1818
Degrees of Freedom: 49 Total (i.e. Null); 44 Residual
Null Deviance: 68.03
Residual Deviance: 62.67 AIC: 74.67
Now let's say we wanted to try out model that we read in a publication on the same data. Here's how we use the makeglm
function
newmodel <- makeglm(Outcome~X1+X2+X3, binomial, data=dd,
-.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)
The first parameter is the formula of the model. This defines the response and the covariates just like you would when running glm
. Next you specify the family like you would with glm()
. And you need to pass a data frame so R can sniff the correct data types for each of the variables involved. This will also identify all of the factor variables and their levels using the data.frame. So this can be new data that's coded just like the fitted data.frame or it can be the original one.
Now we start to specify the coefficients to use in our model. The coefficients will be filled using the names of the parameters. The unnamed parameter will be used as the intercept. If you have a factor, you need to give coefficients to all the levels via named parameters. Here I just decided to round off the fitted estimates to "nice" numbers.
Now I can use our newmodel
with predict.
predict(mymodel, type="response")
# 1 2 3 4 5
# 0.4866398 0.3553439 0.6564668 0.7819917 0.3008108
predict(newmodel, newdata=dd, type="response")
# 1 2 3 4 5
# 0.5503572 0.4121811 0.7143200 0.7942776 0.3245525
Here I call predict on the original model and on the new model using the old data with my specified coefficients. We can see the estimate of probability have changed a bit.
Now I haven't thoroughly tested this function so use at your own risk. I don't do as much error checking as I probably should. Maybe someone else does know of a better way.
Set one or more of coefficients to a specific integer
You can use the offset term in the formula and include the desired coefficient and variable therein:
df<-data.frame(aa=1:6,bb=2:7,cc=c(4,2,7,5,8,3))
lm(cc ~ aa + offset(647*bb), data = df)
So this is regressing cc
on aa
plus the fixed term bb * 647
. For more than one given coefficient, add the appropriate additional offset()
terms.
Related Topics
Generate Random Integers Between Two Values with a Given Probability Using R
How to Import Only One Function from Another Package, Without Loading the Entire Namespace
R System Functions Always Returns Error 127
Get the Vector of Values from Different Columns of a Matrix
Arranging Arrows Between Points Nicely in Ggplot2
Glmulti and Liner Mixed Models
Ggplot Set Scale_Color_Gradientn Manually
Ggplot2: More Complex Faceting
How to Create an Infix %Between% Operator
Iteratively Constructed Dataframe in R
Linear Model with 'Lm': How to Get Prediction Variance of Sum of Predicted Values
Display Frequency Instead of Count with Geom_Bar() in Ggplot
Check to See If a Value Is Within a Range
R - Reading Lines from a .Txt-File After a Specific Line