Using predict() and table() in R

You used the wrong argument name in predict(). It should be newdata=, not data=. Most predict() methods accept extra arguments through ..., so the unknown data= is silently ignored. When newdata is not supplied, predict() returns the predicted values for the data the model was built with. That is why you get 49511 elements: one prediction per row of your original data.
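A minimal sketch of the difference, using the built-in mtcars data and an lm fit as stand-ins (the model in the question may be of a different type, and new_cars is a made-up data frame):

```r
# Fit on a built-in data set (mtcars has 32 rows).
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new observations to score.
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Wrong: `data` is not an argument of predict.lm(), so it is absorbed by `...`
# and predict() falls back to the fitted values of the training data.
wrong <- predict(fit, data = new_cars)

# Right: `newdata` is the argument predict() actually uses.
right <- predict(fit, newdata = new_cars)

length(wrong)  # equals nrow(mtcars)
length(right)  # equals nrow(new_cars)
```

No warning is raised for the misspelled argument, so checking the length of the result against nrow() of the new data is a cheap sanity test.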

Use Predict on data.table with Linear Regression

You are predicting onto the entire new data set each time. If you want to predict only on the new data for each group you need to subset the "newdata" by group.

This is an instance where .BY will be useful. Here are two possibilities. Both rely on a join on .BY, so it is safest to key DT and new by group first (e.g. setkey(DT, group)).

a <- DT[, predict(lm(y ~ v1 + v2), new[.BY]), by = group]

b <- new[, predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata = .SD), by = group]

Both give identical results:

identical(a,b)
# [1] TRUE
a
#    group         V1
# 1:     a  -2.525502
# 2:     a   3.319445
# 3:     a   4.340253
# 4:     b -14.588933
# 5:     b  11.280766
# 6:     b  -1.132324
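To try the two variants end to end, here is a sketch on simulated data. The names DT, new, group, v1, v2 and y follow the answer; the values themselves are made up for illustration, and both tables are keyed by group so that the .BY joins work.

```r
library(data.table)

set.seed(42)  # simulated data, for illustration only

# Training data: two groups, 20 rows each
DT <- data.table(group = rep(c("a", "b"), each = 20),
                 v1 = rnorm(40), v2 = rnorm(40))
DT[, y := 1 + 2 * v1 - v2 + rnorm(40)]

# New data: three rows per group to predict on
new <- data.table(group = rep(c("a", "b"), each = 3),
                  v1 = rnorm(6), v2 = rnorm(6))

# Key both tables so DT[.BY] and new[.BY] join on group
setkey(DT, group)
setkey(new, group)

# Variant 1: fit per group of DT, predict on the matching rows of `new`
a <- DT[, predict(lm(y ~ v1 + v2), new[.BY]), by = group]

# Variant 2: iterate over groups of `new`, fit on the matching rows of DT
b <- new[, predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata = .SD), by = group]

nrow(a)                # one prediction per row of `new`
all.equal(a$V1, b$V1)  # compare the predictions of the two variants
```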

How to add a prediction column with linear models stored in data.table or tibble?

With the tidyverse, use map2 to loop over each 'model' and its corresponding 'x' value, passing the new data to predict as a data.frame or tibble:

library(tidyverse)
model_dt %>%
    mutate(pred_y = map2_dbl(model, x, ~ predict.lm(.x, tibble(x = .y))))
# A tibble: 2 x 4
#      id model      x pred_y
#   <dbl> <list> <dbl>  <dbl>
# 1     1 <lm>       3   1.6
# 2     2 <lm>       3   2.60

Or, if model_dt is a data.table, with Map:

model_dt[, pred_y := unlist(Map(function(mod, y)
    predict.lm(mod, data.frame(x = y)), model, x)), id][]
#    id model x pred_y
# 1:  1  <lm> 3    1.6
# 2:  2  <lm> 3    2.6
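The snippets above assume a model_dt that already holds one fitted model per row. A sketch of how such a table could be built and used follows; the training data here are hypothetical, chosen as exact straight lines so that the predictions at x = 3 come out as 1.6 and 2.6.

```r
library(data.table)

# Hypothetical training data: each id follows an exact straight line
train <- data.table(id = rep(1:2, each = 4), x = rep(1:4, times = 2))
train[, y := ifelse(id == 1, 0.1 + 0.5 * x, 0.2 + 0.8 * x)]

# Fit one lm per id, outside of data.table's grouping machinery
fits <- lapply(split(train, by = "id"), function(d) lm(y ~ x, data = d))

# Store the models in a list column next to the x value to predict at
model_dt <- data.table(id = 1:2, model = fits, x = 3)

# Walk over (model, x) pairs and predict one value per row
model_dt[, pred_y := unlist(Map(function(mod, x_new) {
    predict(mod, newdata = data.frame(x = x_new))
}, model, x))]

model_dt$pred_y
```

Here predict() is called instead of predict.lm(); method dispatch picks the lm method and would also work for other model classes stored in the column.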

Predict with linear models from a table using a new dataframe

To use list columns, iterate over them with purrr::map (or lapply) or variants. Expand columns with tidyr::unnest when you want.

library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(y = rep(seq(0, 240, by = 40), each = 7),
             x = rep(1:7, times = 7),
             vol = c(300, 380, 430, 460, 480, 485, 489,
                     350, 445, 505, 540, 565, 580, 585,
                     380, 490, 560, 605, 635, 650, 655,
                     400, 525, 605, 655, 690, 710, 715,
                     415, 555, 655, 710, 740, 760, 765,
                     420, 570, 680, 740, 775, 800, 805,
                     422, 580, 695, 765, 805, 830, 835))

# One row per value of y; x and vol are packed into the list column `data`
df.1 <- df %>%
    nest(data = -y) %>%   # current tidyr spelling of the older nest(-y)
    mutate(mods = map(data, ~ lm(vol ~ poly(x, 5), data = .x)),
           preds = map(mods, predict, newdata = data.frame(x = seq(1, 7, 0.001))))

df.1
#> # A tibble: 7 x 4
#> y data mods preds
#> <dbl> <list> <list> <list>
#> 1 0 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 2 40 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 3 80 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 4 120 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 5 160 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 6 200 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 7 240 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
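The same list-of-models pattern also works without purrr. As a rough sketch with base R's split() and lapply(), using the built-in mtcars data as a stand-in (not the data from the question) and a hypothetical prediction grid:

```r
# One data frame per number of cylinders (4, 6, 8)
pieces <- split(mtcars, mtcars$cyl)

# One model per piece, kept in a named list
fits <- lapply(pieces, function(d) lm(mpg ~ wt, data = d))

# Hypothetical grid of new predictor values
grid <- data.frame(wt = seq(2, 4, by = 0.5))

# Extra arguments to lapply() are passed on to predict()
preds <- lapply(fits, predict, newdata = grid)

lengths(preds)  # one prediction per grid row, for each model
```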

Why can't I use `predict` inside a data.table?

It appears that the presence of the "split" variable, which was used to split the original dataset, was causing problems. Removing it from the regression appears to solve the issue.

reg1 = lm(price ~ . - V1 - split, train)

R table to assess model performance--observed versus predicted class

If I understood your question correctly, it seems you just want a confusion matrix.

Of course, these are not difficult to calculate manually, but there are (at least) a dozen built-in functions across the various R packages that handle all of this for you: the data processing, table formatting, error checking, etc. The built-in function I use below also calculates the classification error.

The package mda has a built-in function called confusion. Use it like so:

> library(mda)
> data(iris)
> iris_fit = fda(Species ~., data=iris)

> CM = confusion(predict(iris_fit, iris), iris$Species)
> # observed classification (true) is column-wise;
> # predicted is row-wise
> CM

            true
predicted    setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         1
  virginica       0          2        49

attr(,"error")
[1] 0.02
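Computing the matrix by hand takes only base R's table(). A small sketch with made-up class labels (observed and predicted are hypothetical vectors, not output from the model above):

```r
# Hypothetical observed and predicted class labels
observed  <- factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"))
predicted <- factor(c("a", "a", "b", "b", "b", "b", "c", "c", "c", "a"),
                    levels = levels(observed))  # same levels give a square table

# Rows are predicted classes, columns are true classes
cm <- table(predicted = predicted, true = observed)

# Misclassification rate: share of cases off the diagonal
error <- 1 - sum(diag(cm)) / sum(cm)

cm
error
```

Giving both factors the same levels keeps the table square even when a class is never predicted, which is what makes diag() meaningful here.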

Again, there are many more functions among the third-party packages on CRAN that calculate a confusion matrix.

A quick search of the R package space using the sos package gave these results:

> library(sos)

> findFn("confusion", maxPages=5, sortby="MaxScore")

I deliberately limited this search to just the top 5 pages of results (87 individual functions returned). From these results, other R packages that have a confusion matrix function:

  • zmisclassification.matrix in package fpc

  • pamr.confusion in package pamr

  • confusion in package DAAG

Using predicted values to make predictions in data.table

You are looking to iteratively update rows of a data.table with values computed from rows updated in a previous iteration. It is generally better to find an explicit formulation of the problem that makes the updates independent, and that is possible in your case using a helper column holding the cumprod of param1 and a rolling join (dt[dt[...], ..., roll=TRUE]). As that is not always easy or possible, I will instead show how to do iterative updates of a data.table efficiently using data.table::set:

setkey(dt, cat, date) # sort by cat first, then by date, so that the reference value used for each calculation is in the row above
val_col_nr <- which(colnames(dt)=="val") # column number of val (set also accepts a column name)
dt[is.na(val), # we want to compute new values for val where val currently is NA
   # .I is a vector of the row numbers (in dt) of each row in .SD
   for (ii in .I) set(dt, i=ii, j=val_col_nr, value=dt[ii,param1]*dt[ii-1L,val]),
   by=cat] # for every 'cat'

You can use identical(dt, setkey(dt_out,cat,date)) to check the result.

Please also note that it is generally a bad idea to use names of base functions (cat in your case) as variable names (even in a distinct namespace).
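A self-contained sketch of the iterative update on made-up data. It assumes, as a simplification, that only the first row of each group has a known val and every later row is param1 times the previous row's val; the grouping column is called grp here rather than cat. The cumprod formulation is computed alongside as a cross-check.

```r
library(data.table)

# Hypothetical input: first row per group is known, the rest is NA
dt <- data.table(
    grp    = rep(c("a", "b"), each = 3),
    date   = rep(as.Date("2020-01-01") + 0:2, times = 2),
    param1 = c(1, 2, 3, 1, 0.5, 4),
    val    = c(10, NA, NA, 8, NA, NA)
)
setkey(dt, grp, date)  # previous row of a group sits directly above

# Explicit formulation: starting value times the running product of param1
closed_form <- dt[, val[1L] * cumprod(c(1, param1[-1L])), by = grp]$V1

# Iterative formulation: fill NA rows top to bottom, by reference
val_col <- which(names(dt) == "val")
for (ii in which(is.na(dt$val))) {
    set(dt, i = ii, j = val_col, value = dt$param1[ii] * dt$val[ii - 1L])
}

dt$val
all.equal(dt$val, closed_form)
```

The loop relies on the key order: each NA row is visited after the row above it has been filled, so dt$val[ii - 1L] is never NA under the stated assumption.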


