Predict() with Arbitrary Coefficients in R

predict.lm with arbitrary coefficients in R

Hacking the $coefficients element does indeed seem to work. Can you show what doesn't work for you?

dd <- data.frame(x=1:5,y=1:5)
m1 <- lm(y~x,dd)
m1$coefficients <- c(-2,1)
m1
## Call:
## lm(formula = y ~ x, data = dd)
##
## Coefficients:
## [1] -2 1

predict(m1,newdata=data.frame(x=7)) ## 5 = -2+1*7

predict.lm(...) gives the same results.
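
For instance, calling the method directly on the hacked model above:

predict.lm(m1, newdata = data.frame(x = 7))  # also 5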

I would be very careful with this approach, checking each time you do something different with the hacked model.
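
For example, continuing with m1 from above (a quick sketch, not an exhaustive audit): components stored at fit time are not recomputed after the hack, while predict() with newdata uses the replaced coefficients.

fitted(m1)                                   # still the fitted values from the original fit
residuals(m1)                                # likewise unchanged
predict(m1, newdata = data.frame(x = 1:5))   # -1 0 1 2 3, i.e. -2 + 1*x with the new coefficients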

In general it would be nice if predict and simulate methods took a newparams argument, but they don't in general ...

R predict() dependent variable Y=f(X) using arbitrary coefficients

Changing the coefficients should do it. However, be aware that if you don't pass any new data to predict (i.e. you don't use newdata=something), it uses the fitted values stored in the model object directly, so your modified coefficients are ignored.

You can bypass this by telling it to use newdata=your_original_data in your call to predict.

An example

> dat <- mtcars[1:5,]
> results <- glm(vs ~ mpg, data = dat, family = binomial(link = 'logit'))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> predict(results)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
-23.66411 -23.66411 186.49174 23.03719 -292.19658
> coefficients(results)
(Intercept) mpg
-2475.4823 116.7532
> results$coefficients[1] <- 0
> predict(results) # uses original fitted values
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
-23.66411 -23.66411 186.49174 23.03719 -292.19658
> predict(results, newdata = dat)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
2451.818 2451.818 2661.974 2498.519 2183.286

Use Dataframe of Coefficients to Make Test Set Predictions in R

I think you are trying to do something like this

set.seed(2021)
test_data <- data.frame(x=rnorm(7), y=rnorm(7))
test_data$xx <- test_data$x * test_data$x
test_data$xy <- test_data$x * test_data$y
print(test_data)
# x y xx xy
# 1 -0.1224600 0.91556637 0.01499645 -0.112120244
# 2 0.5524566 0.01377194 0.30520833 0.007608399
# 3 0.3486495 1.72996316 0.12155648 0.603150795
# 4 0.3596322 -1.08220485 0.12933535 -0.389195760
# 5 0.8980537 -0.27282518 0.80650043 -0.245011659
# 6 -1.9225695 0.18199540 3.69627356 -0.349898808
# 7 0.2617444 1.50854179 0.06851011 0.394852311

coeff <- c(x=-2, y=-1, xx=+3, xy=+2, constant=+7)
predictions <- as.matrix(cbind(test_data,1)) %*% coeff
print(predictions)
# [,1]
# [1,] 6.150102
# [2,] 6.812157
# [3,] 6.143709
# [4,] 6.972555
# [5,] 7.406196
# [6,] 21.052167
# [7,] 5.963204

where -2*-0.1224600 -1*0.91556637 +3*0.01499645 +2*-0.112120244 +7 is 6.150102
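
If you have a whole data frame of coefficient sets (one set per row), the same matrix product gives one column of predictions per set. A sketch, with coeff_df invented for illustration and its columns kept in the same order as the columns of the model matrix:

coeff_df <- data.frame(x = c(-2, 1), y = c(-1, 0.5), xx = c(3, 0),
                       xy = c(2, -1), constant = c(7, 2))
X <- as.matrix(cbind(test_data[, c("x", "y", "xx", "xy")], 1))
X %*% t(as.matrix(coeff_df))   # one column of predictions per coefficient set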

'predict' gives different results than manually using the coefficients from 'summary'

The difference is really small, and I think it is just due to the precision of the coefficients you are using (e.g. the real value of the intercept is -0.17947075338464965610..., not simply -0.17947).

In fact, if you take the full-precision coefficient values and apply the formula, the result is equal to predict:

intercept <- model$coefficients[1]
x1Coeff <- model$coefficients[2]
x2Coeff <- model$coefficients[3]
x3Coeff <- model$coefficients[4]

intercept + x1Coeff*x1[121:150] + x2Coeff*x2[121:150] + x3Coeff*x3[121:150]
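
The same check on a small self-contained example (the data and model here are invented for illustration) shows that the full-precision coefficients reproduce predict(), while rounded coefficients are slightly off:

set.seed(1)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + rnorm(30, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = d)
new_x <- data.frame(x1 = 0.3, x2 = -1.2)
cf <- coef(fit)

predict(fit, newdata = new_x)
cf[1] + cf[2] * new_x$x1 + cf[3] * new_x$x2      # matches predict()
rcf <- round(cf, 3)
rcf[1] + rcf[2] * new_x$x1 + rcf[3] * new_x$x2   # slightly different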

Coefficient of determination in predict()

RMSE <- function(fitted, true){
  sqrt(mean((fitted - true)^2))
}

R2 <- function(fitted, true){
  1 - (sum((true - fitted)^2) / sum((true - mean(true))^2))
}

The fitted values are in model2 (extract them with fitted(model2)), while the true values are probably in the response variable of daten2:

RMSE(fitted(model2), daten2$y)

R2(fitted(model2), daten2$y)

The point is: to calculate a goodness-of-fit metric you need to supply the true outcomes along with the predicted values; predict.lm just provides the predicted values.
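
A self-contained sketch of the workflow (the data and model below are invented for illustration), using the two helper functions defined above:

set.seed(42)
daten <- data.frame(x = runif(40))
daten$y <- 3 + 2 * daten$x + rnorm(40, sd = 0.3)
model <- lm(y ~ x, data = daten)

RMSE(fitted(model), daten$y)
R2(fitted(model), daten$y)      # for lm with an intercept this equals
summary(model)$r.squared        # the multiple R-squared reported by summary()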

Manually build logistic regression model for prediction in R

I could not find a simple function to do this. There is some code in the predict function that depends on having a fitted model (like determining the rank of the model). However, we can create a function to make a fake glm object that you can use with predict. Here's my first attempt at such a function

makeglm <- function(formula, family, data = NULL, ...) {
  dots <- list(...)
  out <- list()
  tt <- terms(formula, data = data)
  if (!is.null(data)) {
    mf <- model.frame(tt, data)
    vn <- sapply(attr(tt, "variables")[-1], deparse)

    # drop the response so only the covariates are inspected
    if ((yvar <- attr(tt, "response")) > 0)
      vn <- vn[-yvar]
    # record factor levels so predict() can rebuild the right dummy columns
    xlvl <- lapply(data[vn], function(x) if (is.factor(x))
      levels(x)
    else if (is.character(x))
      levels(as.factor(x))
    else
      NULL)
    attr(out, "xlevels") <- xlvl[!vapply(xlvl, is.null, NA)]
    attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
  }
  out$terms <- tt
  coef <- numeric(0)
  stopifnot(length(dots) > 1 & !is.null(names(dots)))
  # named ... arguments become coefficients; the unnamed one is the intercept
  for (i in seq_along(dots)) {
    if ((n <- names(dots)[i]) != "") {
      v <- dots[[i]]
      if (!is.null(names(v))) {
        # a named vector supplies one coefficient per (factor) level
        coef[paste0(n, names(v))] <- v
      } else {
        stopifnot(length(v) == 1)
        coef[n] <- v
      }
    } else {
      coef["(Intercept)"] <- dots[[i]]
    }
  }
  out$coefficients <- coef
  out$rank <- length(coef)
  out$qr <- list(pivot = seq_len(out$rank))
  out$family <- if (class(family) == "family") {
    family
  } else if (class(family) == "function") {
    family()
  } else {
    stop(paste("invalid family class:", class(family)))
  }
  # dummy values so printing the object does not fail
  out$deviance <- 1
  out$null.deviance <- 1
  out$aic <- 1
  class(out) <- c("glm", "lm")
  out
}

So this function creates an object and passes all the values that predict and print expect to find on such an object. Now we can test it out. First, here's some test data

set.seed(15)
dd <- data.frame(
  X1 = runif(50),
  X2 = factor(sample(letters[1:4], 50, replace = TRUE)),
  X3 = rpois(50, 5),
  Outcome = sample(0:1, 50, replace = TRUE)
)

And we can fit a standard binomial model with

mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)

Which gives

Call:  glm(formula = Outcome ~ X1 + X2 + X3, family = binomial, data = dd)

Coefficients:
(Intercept) X1 X2b X2c X2d X3
-0.4411 0.8853 1.8384 0.9455 1.5059 -0.1818

Degrees of Freedom: 49 Total (i.e. Null); 44 Residual
Null Deviance: 68.03
Residual Deviance: 62.67 AIC: 74.67

Now let's say we wanted to try out a model that we read about in a publication on the same data. Here's how we use the makeglm function:

newmodel <- makeglm(Outcome~X1+X2+X3, binomial, data=dd, 
-.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)

The first parameter is the formula of the model. This defines the response and the covariates, just as when running glm. Next you specify the family, as you would with glm(). You also need to pass a data frame so R can infer the correct data type for each of the variables involved; this also identifies all of the factor variables and their levels. The data frame can be new data that's coded just like the fitted data, or it can be the original one.

Now we specify the coefficients to use in our model. The coefficients are filled in using the names of the parameters; the unnamed parameter is used as the intercept. If you have a factor, you need to give coefficients to its non-reference levels via a named vector (X2=c(b=1.5, c=1, d=1.5) above). Here I just decided to round off the fitted estimates to "nice" numbers.
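
For instance, a simpler hypothetical model with just an intercept and X1 (the coefficient values here are invented for illustration) would be set up the same way:

simple_model <- makeglm(Outcome ~ X1, binomial, data = dd, -1, X1 = 0.8)
predict(simple_model, newdata = dd, type = "response")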

Now I can use our newmodel with predict.

predict(mymodel, type="response")
# 1 2 3 4 5
# 0.4866398 0.3553439 0.6564668 0.7819917 0.3008108

predict(newmodel, newdata=dd, type="response")

# 1 2 3 4 5
# 0.5503572 0.4121811 0.7143200 0.7942776 0.3245525

Here I call predict on the original model and on the new model, using the old data with my specified coefficients. We can see the estimated probabilities have changed a bit.

Now, I haven't thoroughly tested this function, so use it at your own risk. I don't do as much error checking as I probably should. Maybe someone else knows of a better way.

Set one or more coefficients to a specific integer

You can use the offset term in the formula and include the desired coefficient and variable therein:

df<-data.frame(aa=1:6,bb=2:7,cc=c(4,2,7,5,8,3))

lm(cc ~ aa + offset(647*bb), data = df)

So this is regressing cc on aa plus the fixed term bb * 647. For more than one given coefficient, add the appropriate additional offset() terms.
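
A sketch with two coefficients held fixed (the extra column zz and the value -3 are invented for illustration); you can use separate offset() terms or combine them into one:

df2 <- data.frame(aa = 1:6, bb = 2:7, cc = c(4, 2, 7, 5, 8, 3), zz = 6:1)

lm(cc ~ aa + offset(647 * bb) + offset(-3 * zz), data = df2)
## equivalently:
lm(cc ~ aa + offset(647 * bb - 3 * zz), data = df2)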


