R data.table add new column with values from other columns by referencing
With fcase:
cols <- unique(dt$Label)
dt[,newCol:=eval(parse(text=paste('fcase(',paste0("Label=='",cols,"',Col_",cols,collapse=','),')')))][]
Label Col_A Col_B Col_C newCol
<char> <num> <num> <num> <num>
1: A 2 1 2 2
2: B 3 4 0 4
3: C 5 3 4 4
4: A 0 5 1 0
5: B 2 2 5 2
6: C 7 0 6 6
7: A 6 7 7 6
8: B 8 5 3 5
9: C 9 8 0 0
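The eval(parse()) call above builds code from strings, which can be hard to debug. A minimal alternative sketch, assuming the value columns are named Col_<Label> as in the printed output, is base R matrix indexing. This is a different technique from fcase, and the dt below is rebuilt from the output shown above.

```r
library(data.table)

# Data rebuilt from the printed output above
dt <- data.table(
  Label = rep(c("A", "B", "C"), times = 3),
  Col_A = c(2, 3, 5, 0, 2, 7, 6, 8, 9),
  Col_B = c(1, 4, 3, 5, 2, 0, 7, 5, 8),
  Col_C = c(2, 0, 4, 1, 5, 6, 7, 3, 0)
)

cols <- unique(dt$Label)
# Numeric matrix with one column per label, in the same order as cols
vals <- as.matrix(dt[, paste0("Col_", cols), with = FALSE])
# For row i, pick the column whose label matches Label[i]
dt[, newCol := vals[cbind(seq_len(.N), match(Label, cols))]]
```

This relies on every Label having a matching Col_ column; a label without one would make the column selection fail.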
Create columns from column of list in data.table
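Try this (probs is not defined in the snippet; seq(0.1, 1, by = 0.1) is assumed here, matching the q10 to q100 names):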
DT2 <- DT[ , as.list(quantile(x,probs=probs)),by=y]
setnames(DT2, c("y", paste0("q", seq(10, 100, by=10))))
# y q10 q20 q30 q40 q50 q60 q70 q80
# 1: b -1.281704 -0.8402934 -0.5251957 -0.2595748 -0.001625739 0.2526686 0.5251940 0.8379979
# 2: c -1.269750 -0.8323597 -0.5133207 -0.2478633 0.003413041 0.2598378 0.5353759 0.8477539
# 3: a -1.281899 -0.8389189 -0.5224092 -0.2573562 0.001186281 0.2542550 0.5244238 0.8401411
# q90 q100
# 1: 1.284773 3.856234
# 2: 1.283465 4.322815
# 3: 1.273615 3.921410
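Since DT and probs are not shown in the answer, here is a self-contained sketch of the same idea. The simulated data, the seed, and the probs vector are assumptions made for illustration, so the numbers will differ from the output above.

```r
library(data.table)

set.seed(1)  # hypothetical data, not the original DT
DT <- data.table(
  x = rnorm(300),
  y = rep(c("a", "b", "c"), each = 100)
)
probs <- seq(0.1, 1, by = 0.1)  # assumed deciles

# as.list() turns the named quantile vector into one column per quantile
DT2 <- DT[, as.list(quantile(x, probs = probs)), by = y]
setnames(DT2, c("y", paste0("q", seq(10, 100, by = 10))))
```

The key step is as.list(): returning a list in j tells data.table to spread the values across columns instead of stacking them in rows.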
How to do operations on list columns in an R data.table to output another list column?
Another solution using mapply:
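# SIMPLIFY = FALSE keeps the result a list even when all elements have the same length
dt[, absvals := mapply(listcol, numericcol, FUN = function(x, y) abs(x - y), SIMPLIFY = FALSE)]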
#output
dt
numericcol listcol absvals
1: 42 1,22, 3 41,20,39
2: 42 6 36
3: 42 1 41
4: 42 12 30
5: 42 5, 6,1123 37, 36,1081
6: 42 3 39
7: 42 42 0
8: 42 1 41
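A related sketch uses Map(), which never simplifies its result and so always yields a list. The dt below is rebuilt from the printed output above; wrapping the result in list() is a defensive choice so that data.table treats it as a single list column.

```r
library(data.table)

# Data rebuilt from the printed output above
dt <- data.table(
  numericcol = rep(42, 8),
  listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
)

# Map() always returns a list; the outer list() marks it as one list column
dt[, absvals := list(Map(function(x, y) abs(x - y), listcol, numericcol))]
```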
create list from columns of data table expression
Get the data in long format and then aggregate by group.
library(data.table)
dt_long <- melt(dt, measure.vars = c('a', 'b'))
dt_long[, .N, .(variable, value)]
# variable value N
#1: a 1 2
#2: a 2 1
#3: a 3 1
#4: a 7 1
#5: b 4 3
#6: b 5 1
#7: b 6 1
In tidyverse:
library(dplyr)
library(tidyr)
dt %>%
  pivot_longer(cols = everything()) %>%
  count(name, value)
R data table - create a new column where each element is a list of values
If we need a list column in the dataset, wrap it with list:
DT[, UniqueCats := list(list(sort(unique(Category)))) , by = UserID]
str(DT)
#Classes ‘data.table’ and 'data.frame': 4 obs. of 6 variables:
# $ UserID : chr "aaa" "bbb" "aaa" "aaa"
# $ Time : chr "7:50" "5:05" "8:40" "10:00"
# $ ArticleID : chr "x" "x" "y" "z"
# $ Category : chr "sports" "sports" "politics" "sports"
# $ NumOfReading: int 1 1 2 3
# $ UniqueCats :List of 4
# ..$ : chr "politics" "sports"
# ..$ : chr "sports"
# ..$ : chr "politics" "sports"
# ..$ : chr "politics" "sports"
We can also create a string column by concatenating the elements with toString (which calls paste with collapse = ", "):
DT[, uniqueCats := toString(sort(unique(Category))), by = UserID]
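Once the list column exists, it can be queried with ordinary vectorised helpers. The sketch below rebuilds only the two relevant columns from the str() output above; the nCats and hasPolitics column names are hypothetical and chosen for illustration.

```r
library(data.table)

# Only the columns needed here, taken from the str() output above
DT <- data.table(
  UserID   = c("aaa", "bbb", "aaa", "aaa"),
  Category = c("sports", "sports", "politics", "sports")
)
DT[, UniqueCats := list(list(sort(unique(Category)))), by = UserID]

# Number of distinct categories for each row's user
DT[, nCats := lengths(UniqueCats)]
# TRUE where the user's categories include "politics"
DT[, hasPolitics := vapply(UniqueCats, function(v) "politics" %in% v, logical(1))]
```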
How to create a new column in data.table based on values of other columns
Another option is to use indexing to find the rows that fit the condition and update only those rows:
#for each group of ID and Cycle,
#find the row indices where Cycle_Date equals the last Positive_Test_Date
idxDT <- DT[, .I[Cycle_Date==Positive_Test_Date[.N]], .(ID, Cycle)]
#for those row indices, set the LH_Date to be Cycle_Date
#(NA indices are skipped and rows that are not updated stay NA, by design in data.table)
DT[idxDT$V1, LH_Date := Cycle_Date]
idxDT looks like this, and idxDT$V1 extracts the column V1:
ID Cycle V1
1: 1 1 2
2: 1 1 NA
3: 1 2 7
4: 1 2 NA
5: 2 1 9
6: 2 1 NA
7: 2 2 14
8: 2 2 NA
.I contains the row index within a data.table. From ?.I:
.I is an integer vector equal to seq_len(nrow(x)). While grouping, it holds for each item in the group, its row location in x. This is useful to subset in j; e.g. DT[, .I[which.max(somecol)], by=grp].
output:
ID Cycle Cycle_Day Cycle_Date Positive_Test_Date LH_Date
1: 1 1 1 3/28/2019 <NA> <NA>
2: 1 1 2 3/29/2019 <NA> 3/29/2019
3: 1 1 3 3/30/2019 <NA> <NA>
4: 1 1 NA <NA> 3/29/2019 <NA>
5: 1 2 1 4/23/2019 <NA> <NA>
6: 1 2 2 4/24/2019 <NA> <NA>
7: 1 2 3 4/25/2019 <NA> 4/25/2019
8: 1 2 NA <NA> 4/25/2019 <NA>
9: 2 1 1 3/18/2019 <NA> 3/18/2019
10: 2 1 2 3/19/2019 <NA> <NA>
11: 2 1 3 3/20/2019 <NA> <NA>
12: 2 1 NA <NA> 3/18/2019 <NA>
13: 2 2 1 4/23/2019 <NA> <NA>
14: 2 2 2 4/24/2019 <NA> 4/24/2019
15: 2 2 3 4/25/2019 <NA> <NA>
16: 2 2 NA <NA> 4/24/2019 <NA>
data:
library(data.table)
DT <- fread("ID Cycle Cycle_Day Cycle_Date Positive_Test_Date
1 1 1 3/28/2019 NA
1 1 2 3/29/2019 NA
1 1 3 3/30/2019 NA
1 1 NA NA 3/29/2019
1 2 1 4/23/2019 NA
1 2 2 4/24/2019 NA
1 2 3 4/25/2019 NA
1 2 NA NA 4/25/2019
2 1 1 3/18/2019 NA
2 1 2 3/19/2019 NA
2 1 3 3/20/2019 NA
2 1 NA NA 3/18/2019
2 2 1 4/23/2019 NA
2 2 2 4/24/2019 NA
2 2 3 4/25/2019 NA
2 2 NA NA 4/24/2019")
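The .I idiom quoted from ?.I can be tried on a smaller table. This is a sketch with made-up data; toy, grp, somecol, and is_max are hypothetical names, not part of the question.

```r
library(data.table)

toy <- data.table(
  grp     = c("a", "a", "b", "b", "b"),
  somecol = c(1, 5, 2, 9, 4)
)

# Row number (in the whole table) of the maximum within each group
idx <- toy[, .I[which.max(somecol)], by = grp]

# Update only those rows; all other rows get NA in the new column
toy[idx$V1, is_max := TRUE]
```

Because .I returns positions in the full table rather than within the group, idx$V1 can be used directly in i for the update.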
Using a List to Fetch Columns from a DataTable
You can replace:
Dim arrayOfObjects()() As Object = DT.AsEnumerable().Select(Function(b) {b("x1"), b("x2"), b("x3")}).ToArray()
With:
Dim mystr As String = "x1,x2,x3"
Dim tarCols As String() = mystr.Split({","}, StringSplitOptions.RemoveEmptyEntries)
' Shortcut
' Dim tarCols = { "x1", "x2", "x3" }
Dim arrayOfObjects As Object()() = dt.DefaultView.ToTable(False, tarCols).
AsEnumerable().Select(Function(x) x.ItemArray).ToArray()
This extracts the values of one or more given DataColumn objects and creates the jagged array.
How to Filter Data Table Rows with condition on column of Type list() in R
You can use the sapply function to check if any of the values in vals is in Product for each row:
vals = c("UG12210","UG10000-WISD")
dt[Period %chin% "2018-Q1" & sapply(Product, function(v) any(vals %chin% v))]
# Id Period Product
# 1: 1000797366 2018-Q1 UG10000-WISD
# 2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
# 3: 1000797366 2018-Q1 UG10000-WISD,UG12210
# 4: 1000797366 2018-Q1 UG10000-WISD,UG12210
# 5: 1000797366 2018-Q1 UG12210
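A self-contained sketch of the same filter follows, using vapply instead of sapply so the result is guaranteed to be a logical vector even for an empty table. The Id and Period values below are made up for illustration; only the product codes come from the output above.

```r
library(data.table)

# Hypothetical rows; Product is a list column of character vectors
dt <- data.table(
  Id      = 1:4,
  Period  = c("2018-Q1", "2018-Q1", "2018-Q2", "2018-Q1"),
  Product = list("UG10000-WISD", "NX11100", "UG12210", c("NX11100", "UG12210"))
)

vals <- c("UG12210", "UG10000-WISD")
# Keep rows in the period where at least one wanted code appears in Product
res <- dt[Period == "2018-Q1" &
            vapply(Product, function(v) any(vals %chin% v), logical(1))]
```

Row 2 is dropped because it has no matching product, and row 3 because it falls in a different period.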