Predict() with Arbitrary Coefficients in R

predict.lm with arbitrary coefficients r

Hacking the $coefficients element does indeed seem to work. Can you show what doesn't work for you?

dd <- data.frame(x=1:5,y=1:5)
m1 <- lm(y~x,dd)
m1$coefficients <- c(-2,1)
m1
## Call:
## lm(formula = y ~ x, data = dd)
## 
## Coefficients:
## [1]  -2   1

predict(m1,newdata=data.frame(x=7))  ## 5  = -2+1*7

predict.lm(...) gives the same results.

I would be very careful with this approach, checking each time you do something different with the hacked model.

In general it would be nice if predict and simulate methods took a newparams argument, but they don't in general ...

R predict() dependent variable Y=f(X) using arbitrary coefficients

Changing the coefficients should do it. However, if you're predicting with the original data by just relying on the predict function using your original data if not passed any additional data to predict (i.e. not using newdata=something) it uses the fitted values from the model object directly.

You can bypass this by telling it to use newdata=your_original_data in your call to predict.

An example

> dat <- mtcars[1:5,]
> results <- glm(vs ~ mpg, data = dat, family = binomial(link = 'logit'))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> predict(results)
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout 
        -23.66411         -23.66411         186.49174          23.03719        -292.19658 
> coefficients(results)
(Intercept)         mpg 
 -2475.4823    116.7532 
> results$coefficients[1] <- 0
> predict(results) # uses original fitted values
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout 
        -23.66411         -23.66411         186.49174          23.03719        -292.19658 
> predict(results, newdata = dat)
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout 
         2451.818          2451.818          2661.974          2498.519          2183.286

Use Dataframe of Coefficients to Make Test Set Predictions in R

I think you are trying to do something like this

set.seed(2021)
test_data <- data.frame(x=rnorm(7), y=rnorm(7))
test_data$xx <- test_data$x * test_data$x 
test_data$xy <- test_data$x * test_data$y
print(test_data) 
#            x           y         xx           xy
# 1 -0.1224600  0.91556637 0.01499645 -0.112120244
# 2  0.5524566  0.01377194 0.30520833  0.007608399
# 3  0.3486495  1.72996316 0.12155648  0.603150795
# 4  0.3596322 -1.08220485 0.12933535 -0.389195760
# 5  0.8980537 -0.27282518 0.80650043 -0.245011659
# 6 -1.9225695  0.18199540 3.69627356 -0.349898808
# 7  0.2617444  1.50854179 0.06851011  0.394852311

coeff <- c(x=-2, y=-1, xx=+3, xy=+2, constant=+7)
predictions <- as.matrix(cbind(test_data,1)) %*% coeff
print(predictions)
#           [,1]
# [1,]  6.150102
# [2,]  6.812157
# [3,]  6.143709
# [4,]  6.972555
# [5,]  7.406196
# [6,] 21.052167
# [7,]  5.963204

where -2*-0.1224600 -1*0.91556637 +3*0.01499645 +2*-0.112120244 +7 is 6.150102

predict' gives different results than using manually the coefficients from 'summary'

The difference is really small, and I think is just due to the accuracy of the coefficients you are using (e.g. the real value of the intercept is -0.17947075338464965610... not simply -.17947).

In fact, if you take the coefficients value and apply the formula, the result is equal to predict:

intercept <- model$coefficients[1]
x1Coeff <- model$coefficients[2]
x2Coeff <- model$coefficients[3]
x3Coeff <- model$coefficients[4]

intercept + x1Coeff*x1[121:150] + x2Coeff*x2[121:150] + x3Coeff*x3[121:150]

Coefficient of determination in predict()

RMSE <- function(fitted, true){
  sqrt(mean((fitted - true)^2))
}

R2 <- function(fitted, true){
 1 - (sum((true - fitted)^2)/sum((true - mean(true))^2))
}

fitted values are in model2, while true values are probably in responce variable from daten2

RMSE(model2, daten2$y)

R2(model2, daten2$y)

The point is: to calculate a goodness of fit metric you need to supply the true outcomes along with the predicted values. "predict.lm" just provides the predicted values

Manually build logistic regression model for prediction in R

I could not find a simple function to do this. There is some code in the predict function that depends on having a fitted model (like determining the rank of the model). However, we can create a function to make a fake glm object that you can use with predict. Here's my first attempt at such a function

makeglm <- function(formula, family, data=NULL, ...) {
    dots <- list(...)
    out<-list()
    tt <- terms(formula, data=data)
    if(!is.null(data)) {
        mf <- model.frame(tt, data)
        vn <- sapply(attr(tt, "variables")[-1], deparse)

        if((yvar <- attr(tt, "response"))>0)
            vn <- vn[-yvar]
            xlvl <- lapply(data[vn], function(x) if (is.factor(x))
           levels(x)
        else if (is.character(x))
           levels(as.factor(x))
        else
            NULL)
        attr(out, "xlevels") <- xlvl[!vapply(xlvl,is.null,NA)]
        attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
    }
    out$terms <- tt
    coef <- numeric(0)
    stopifnot(length(dots)>1 & !is.null(names(dots)))
    for(i in seq_along(dots)) {
        if((n<-names(dots)[i]) != "") {
            v <- dots[[i]]
            if(!is.null(names(v))) {
                coef[paste0(n, names(v))] <- v
            } else {
                stopifnot(length(v)==1)
                coef[n] <- v
            }
        } else {
            coef["(Intercept)"] <- dots[[i]]
        }   
    }
    out$coefficients <- coef
    out$rank <- length(coef)
    out$qr <- list(pivot=seq_len(out$rank))
    out$family <- if (class(family) == "family") {
        family
    } else if (class(family) == "function") {
        family()
    } else {
        stop(paste("invalid family class:", class(family)))
    }
    out$deviance <- 1
    out$null.deviance <- 1
    out$aic <- 1
    class(out) <- c("glm","lm")
    out
}

So this function creates an object and passes all the values that predict and print expect to find on such an object. Now we can test it out. First, here's some test data

set.seed(15)
dd <- data.frame(
    X1=runif(50),
    X2=factor(sample(letters[1:4], 50, replace=T)),
    X3=rpois(50, 5),
    Outcome = sample(0:1, 50, replace=T)
)

And we can fit a standard binomial model with

mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)

Which gives

Call:  glm(formula = Outcome ~ X1 + X2 + X3, family = binomial, data = dd)

Coefficients:
(Intercept)           X1          X2b          X2c          X2d           X3  
    -0.4411       0.8853       1.8384       0.9455       1.5059      -0.1818  

Degrees of Freedom: 49 Total (i.e. Null);  44 Residual
Null Deviance:      68.03 
Residual Deviance: 62.67    AIC: 74.67

Now let's say we wanted to try out model that we read in a publication on the same data. Here's how we use the makeglm function

newmodel <- makeglm(Outcome~X1+X2+X3, binomial, data=dd, 
    -.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)

The first parameter is the formula of the model. This defines the response and the covariates just like you would when running glm. Next you specify the family like you would with glm(). And you need to pass a data frame so R can sniff the correct data types for each of the variables involved. This will also identify all of the factor variables and their levels using the data.frame. So this can be new data that's coded just like the fitted data.frame or it can be the original one.

Now we start to specify the coefficients to use in our model. The coefficients will be filled using the names of the parameters. The unnamed parameter will be used as the intercept. If you have a factor, you need to give coefficients to all the levels via named parameters. Here I just decided to round off the fitted estimates to "nice" numbers.

Now I can use our newmodel with predict.

predict(mymodel, type="response")
#         1         2         3         4         5
# 0.4866398 0.3553439 0.6564668 0.7819917 0.3008108

predict(newmodel, newdata=dd, type="response")

#         1         2         3         4         5
# 0.5503572 0.4121811 0.7143200 0.7942776 0.3245525

Here I call predict on the original model and on the new model using the old data with my specified coefficients. We can see the estimate of probability have changed a bit.

Now I haven't thoroughly tested this function so use at your own risk. I don't do as much error checking as I probably should. Maybe someone else does know of a better way.

Set one or more of coefficients to a specific integer

You can use the offset term in the formula and include the desired coefficient and variable therein:

df<-data.frame(aa=1:6,bb=2:7,cc=c(4,2,7,5,8,3))

lm(cc ~ aa + offset(647*bb), data = df)

So this is regressing cc on aa plus the fixed term bb * 647. For more than one given coefficient, add the appropriate additional offset() terms.