Using predict() and table() in R
You used the wrong parameter name in predict. It should be newdata=, not data=. That is why you get 49511 elements: when you don't specify new data, predict defaults to returning the predicted values for the data the model was fitted on. Hence you're getting the predicted values for your original data.
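A small base R sketch of the difference, using made-up data (train, new, fit, p_wrong and p_right are hypothetical names, not from the question). predict.lm has no data argument, so data = new appears to be swallowed by ... and ignored, and you get the fitted values for the training rows instead:

```r
# Hypothetical training data (100 rows) and new data (3 rows)
set.seed(1)
train <- data.frame(x = 1:100, y = 2 * (1:100) + rnorm(100))
new   <- data.frame(x = c(101, 102, 103))

fit <- lm(y ~ x, data = train)

# Wrong: 'data' is not a predict.lm argument, so predict() falls back
# to the fitted values of the training data.
p_wrong <- predict(fit, data = new)
length(p_wrong)   # 100, one value per training row

# Right: 'newdata' evaluates the model on the new rows.
p_right <- predict(fit, newdata = new)
length(p_right)   # 3, one value per new row
```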
Use Predict on data.table with Linear Regression
You are predicting onto the entire new data set each time. If you want to predict only on the new data for each group, you need to subset "newdata" by group.
This is a case where .BY is useful. Here are two possibilities:
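# the joins new[.BY] and DT[.BY] work when both tables are keyed by group,
# e.g. setkey(DT, group); setkey(new, group)
a <- DT[,predict(lm(y ~ v1 + v2), new[.BY]), by = group]
b <- new[,predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata=.SD),by = group]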
Both give identical results:
identical(a,b)
# [1] TRUE
a
# group V1
#1: a -2.525502
#2: a 3.319445
#3: a 4.340253
#4: b -14.588933
#5: b 11.280766
#6: b -1.132324
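The same per-group idea can be sketched in base R without data.table: split both tables by group, fit one model per group, and predict only on that group's new rows. The data and the names train, new, preds and result are hypothetical, chosen only to make the sketch runnable.

```r
# Hypothetical data: 'train' holds y, v1, v2 per group; 'new' holds rows to predict.
set.seed(42)
train <- data.frame(group = rep(c("a", "b"), each = 10),
                    v1 = rnorm(20), v2 = rnorm(20))
train$y <- ifelse(train$group == "a", 1, -1) + 2 * train$v1 - train$v2 +
  rnorm(20, sd = 0.1)
new <- data.frame(group = rep(c("a", "b"), each = 3),
                  v1 = rnorm(6), v2 = rnorm(6))

# split() returns one data.frame per group, ordered by the sorted group labels.
train_by_group <- split(train, train$group)
new_by_group   <- split(new, new$group)

# Fit on each group's training rows, predict on that group's new rows only.
preds <- Map(function(tr, nw) predict(lm(y ~ v1 + v2, data = tr), newdata = nw),
             train_by_group, new_by_group[names(train_by_group)])

result <- data.frame(group = rep(names(preds), lengths(preds)),
                     pred  = unlist(preds, use.names = FALSE))
result
```

Indexing new_by_group by names(train_by_group) pairs the groups by label rather than by position, so a group missing from one table cannot silently shift the pairing.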
How to add a prediction column with linear models stored in data.table or tibble?
With the tidyverse, we use map2 to loop through the 'model' column and the corresponding 'x' values, passing the new data to predict as a data.frame or tibble:
library(tidyverse)
model_dt %>%
mutate(pred_y = map2_dbl(model, x, ~ predict.lm(.x, tibble(x = .y))))
# A tibble: 2 x 4
# id model x pred_y
# <dbl> <list> <dbl> <dbl>
#1 1 <lm> 3 1.6
#2 2 <lm> 3 2.60
Or, with the data.table object, use Map:
model_dt[, pred_y := unlist(Map(function(mod, y)
predict.lm(mod, data.frame(x = y)), model, x)), id][]
# id model x pred_y
#1: 1 <lm> 3 1.6
#2: 2 <lm> 3 2.6
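For comparison, a base R sketch of the same pattern: models stored in a list column of a data.frame, each paired with its own x. The two toy models and model_df are hypothetical stand-ins for the question's model_dt, so the predictions do not reproduce the output above.

```r
# Hypothetical models: two straight lines with slope 0.2 and different intercepts.
fit1 <- lm(y ~ x, data = data.frame(x = 1:4, y = c(1.0, 1.2, 1.4, 1.6)))
fit2 <- lm(y ~ x, data = data.frame(x = 1:4, y = c(2.0, 2.2, 2.4, 2.6)))

model_df <- data.frame(id = 1:2, x = c(3, 3))
model_df$model <- list(fit1, fit2)   # list column holding the fitted models

# Pair each row's model with that row's x; vapply enforces one number per row.
model_df$pred_y <- vapply(seq_len(nrow(model_df)), function(i)
  unname(predict(model_df$model[[i]], newdata = data.frame(x = model_df$x[i]))),
  numeric(1))

model_df$pred_y   # 1.4 and 2.4
```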
Predict with linear models from a table using a new dataframe
To work with list columns, iterate over them with purrr::map (or lapply) or one of its variants. Expand them with tidyr::unnest when you need to.
library(tidyverse)
df <- tibble(y = rep(seq(0, 240, by = 40), each = 7),
x = rep(1:7, times = 7),
vol = c(300, 380, 430, 460, 480, 485, 489,
350, 445, 505, 540, 565, 580, 585,
380, 490, 560, 605, 635, 650, 655,
400, 525, 605, 655, 690, 710, 715,
415, 555, 655, 710, 740, 760, 765,
420, 570, 680, 740, 775, 800, 805,
422, 580, 695, 765, 805, 830, 835))
df.1 <- df %>%
nest(data = -y) %>%
mutate(mods = map(data, ~lm(vol ~ poly(x, 5), data = .x)),
preds = map(mods, predict, newdata = data.frame(x = seq(1, 7, 0.001))))
df.1
#> # A tibble: 7 x 4
#> y data mods preds
#> <dbl> <list> <list> <list>
#> 1 0 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 2 40 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 3 80 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 4 120 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 5 160 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 6 200 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 7 240 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
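The same nest, fit and predict pipeline can be sketched in base R with split() and lapply(); the last step stacks the predictions into long format, roughly what tidyr::unnest() would give you. The names grid, mods, preds and long are illustrative only.

```r
# Same data as above, built with base R only.
df <- data.frame(y = rep(seq(0, 240, by = 40), each = 7),
                 x = rep(1:7, times = 7),
                 vol = c(300, 380, 430, 460, 480, 485, 489,
                         350, 445, 505, 540, 565, 580, 585,
                         380, 490, 560, 605, 635, 650, 655,
                         400, 525, 605, 655, 690, 710, 715,
                         415, 555, 655, 710, 740, 760, 765,
                         420, 570, 680, 740, 775, 800, 805,
                         422, 580, 695, 765, 805, 830, 835))
grid <- data.frame(x = seq(1, 7, 0.001))   # 6001 new x values

# One degree-5 polynomial fit per value of y, then predictions on the grid.
mods  <- lapply(split(df, df$y), function(d) lm(vol ~ poly(x, 5), data = d))
preds <- lapply(mods, predict, newdata = grid)

# Long format: one row per (y, x) prediction.
long <- do.call(rbind, Map(function(yval, p)
  data.frame(y = as.numeric(yval), x = grid$x, pred = unname(p)),
  names(preds), preds))
```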
Why can't I use `predict` inside a data.table?
It appears that the "split" variable, which was used to split the original dataset, was causing the problem. Removing it from the regression appears to solve the issue:
reg1 = lm(price ~ . - V1 - split, train)
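A minimal sketch of the formula syntax used above, on made-up numeric data: . stands for every column other than the response, and each - term removes one of them again. To keep it simple, the sketch drops only a row-index column V1; the split column from the question is left out of the toy data because its type is not shown there (an assumption).

```r
# Hypothetical data: V1 is a row index that carries no information about price.
set.seed(7)
dat <- data.frame(V1 = 1:40,
                  size = runif(40, 50, 200),
                  rooms = sample(1:5, 40, replace = TRUE))
dat$price <- 1000 + 5 * dat$size + 30 * dat$rooms + rnorm(40, sd = 20)
train <- dat[1:30, ]
test  <- dat[31:40, ]

# '.' expands to every non-response column; '- V1' removes the index again.
reg1 <- lm(price ~ . - V1, data = train)
names(coef(reg1))   # "(Intercept)" "size" "rooms"

pred <- predict(reg1, newdata = test)
```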
R table to assess model performance--observed versus predicted class
If I understood your question correctly, you just want a confusion matrix.
Of course, confusion matrices are not difficult to compute manually, but there are at least a dozen functions across various R packages that handle all of this for you: the data processing, table formatting, error checking, and so on. The function I use below also calculates the classification error.
The mda package has a function called confusion. Use it like so:
> library(mda)
> data(iris)
> iris_fit = fda(Species ~., data=iris)
> CM = confusion(predict(iris_fit, iris), iris$Species)
> # observed classification (true) is column-wise;
> # predicted is row-wise
> CM
true
predicted setosa versicolor virginica
setosa 50 0 0
versicolor 0 48 1
virginica 0 2 49
attr(,"error")
[1] 0.02
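The question's title mentions table(): base R's table() builds the same kind of cross-tabulation without extra packages. A sketch with made-up class labels (observed, predicted, cm and error are hypothetical names). The error line computes the share of off-diagonal cases, which is consistent with the 0.02 above (1 + 2 misclassified out of 150).

```r
observed  <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))

# Rows = predicted, columns = observed (same orientation as the mda output).
# Shared factor levels keep the table square even if a class is never predicted.
cm <- table(predicted = predicted, true = observed)
cm

# Classification error: proportion of cases off the diagonal.
error <- 1 - sum(diag(cm)) / sum(cm)
error   # 2 of 6 cases misclassified, so about 0.333
```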
Again, many other third-party packages on CRAN provide functions to calculate a confusion matrix.
A quick search of the R package space using the sos package gave these results:
> library(sos)
> findFn("confusion", maxPages=5, sortby="MaxScore")
I deliberately limited this search to the top 5 pages of results (87 individual functions returned). Other R packages in these results that have a confusion matrix function:
zmisclassification.matrix in package fpc
pamr.confusion in package pamr
confusion in package DAAG
Using predicted values to make predictions in data.table
You want to update rows of a data.table iteratively, using values computed for rows that were updated in a previous iteration. It is generally better to find an explicit formulation that makes the updates independent, and in your case that is possible with a helper column holding the cumprod of param1 and a rolling join (dt[dt[...], ..., roll=TRUE]). Because such a formulation is not always easy or possible, though, I will show how to do iterative updates of a data.table efficiently using data.table::set:
setkey(dt, cat, date) # sort by cat, then by date, so the reference value for each calculation is in the row above
val_col_nr <- which(colnames(dt)=="val") # column number for set() (set() also accepts the column name)
dt[is.na(val), # we want to compute new values for val where val currently is NA
   # .I is a vector of the row numbers (in dt) of each row in .SD
   for (ii in .I) set(dt, i=ii, j=val_col_nr, value=dt[ii,param1]*dt[ii-1L,val]),
   by=cat] # for every 'cat'
You can use identical(dt, setkey(dt_out,cat,date)) to check the result.
Please also note that it is generally a bad idea to use names of base functions (cat in your case) as variable names, even in a distinct namespace.
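To illustrate the explicit formulation mentioned above, here is a base R sketch (no data.table) on made-up data. It assumes only the first val of each group is known, so every later val is that first val times the cumulative product of param1; with several known values per group you would need the rolling join instead. The grouping column is called grp here, following the advice about not naming it cat; d, iter and closed are hypothetical names.

```r
# Hypothetical data: within each grp, only the first val is known.
d <- data.frame(grp = rep(c("x", "y"), each = 4),
                date = rep(1:4, times = 2),
                param1 = c(NA, 1.1, 0.9, 1.2, NA, 2, 0.5, 1.5),
                val = c(100, NA, NA, NA, 10, NA, NA, NA))
d <- d[order(d$grp, d$date), ]

# 1) Iterative update: each NA val uses the val just computed in the row above.
iter <- d
for (i in seq_len(nrow(iter))) {
  if (is.na(iter$val[i])) iter$val[i] <- iter$param1[i] * iter$val[i - 1L]
}

# 2) Explicit formulation: first val of the group times cumprod(param1).
closed <- d
for (g in unique(d$grp)) {
  idx <- which(d$grp == g)
  p <- d$param1[idx]
  p[1] <- 1                     # the anchor row keeps its own value
  closed$val[idx] <- d$val[idx[1]] * cumprod(p)
}

all.equal(iter$val, closed$val)   # TRUE
```

The second version has no dependency between rows inside the loop body, which is what makes it amenable to vectorised or join-based approaches.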