Create Columns from Column of List in Data.Table

R data.table add new column with values from other columns by referencing

With fcase:

# Build the call fcase(Label=='A', Col_A, Label=='B', Col_B, ...) as text, then evaluate it
cols <- unique(dt$Label)
dt[,newCol:=eval(parse(text=paste('fcase(',paste0("Label=='",cols,"',Col_",cols,collapse=','),')')))][]

Label Col_A Col_B Col_C newCol
<char> <num> <num> <num> <num>
1: A 2 1 2 2
2: B 3 4 0 4
3: C 5 3 4 4
4: A 0 5 1 0
5: B 2 2 5 2
6: C 7 0 6 6
7: A 6 7 7 6
8: B 8 5 3 5
9: C 9 8 0 0
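
The same lookup can also be done without building code as text. The following is only a sketch on made-up data, assuming the value columns are all named Col_<Label>; the names value_cols and m are hypothetical helpers, not part of the original answer. It indexes a matrix by (row, column) pairs:

```r
library(data.table)

# Toy data in the same shape as the example above (values are illustrative)
dt <- data.table(
  Label = c("A", "B", "C"),
  Col_A = c(2, 3, 5),
  Col_B = c(1, 4, 3),
  Col_C = c(2, 0, 4)
)

# Collect the value columns into a matrix
value_cols <- grep("^Col_", names(dt), value = TRUE)
m <- as.matrix(dt[, ..value_cols])

# For row i, pick the column whose name is paste0("Col_", Label[i])
dt[, newCol := m[cbind(seq_len(nrow(m)), match(paste0("Col_", Label), colnames(m)))]]
dt
```

This avoids eval(parse()), at the cost of copying the value columns into a matrix first.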

Create columns from column of list in data.table

Try this:

probs <- seq(0.1, 1, by = 0.1)  # deciles, matching the q10 ... q100 column names below
DT2 <- DT[, as.list(quantile(x, probs = probs)), by = y]
setnames(DT2, c("y", paste0("q", seq(10, 100, by = 10))))

# y q10 q20 q30 q40 q50 q60 q70 q80
# 1: b -1.281704 -0.8402934 -0.5251957 -0.2595748 -0.001625739 0.2526686 0.5251940 0.8379979
# 2: c -1.269750 -0.8323597 -0.5133207 -0.2478633 0.003413041 0.2598378 0.5353759 0.8477539
# 3: a -1.281899 -0.8389189 -0.5224092 -0.2573562 0.001186281 0.2542550 0.5244238 0.8401411
# q90 q100
# 1: 1.284773 3.856234
# 2: 1.283465 4.322815
# 3: 1.273615 3.921410
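
A smaller self-contained version of the same idea, as a sketch: the data, the seed and the three probabilities are assumptions chosen for illustration, not the original poster's data. as.list() turns the named quantile vector into one column per quantile:

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the toy data is reproducible
DT <- data.table(x = rnorm(300), y = rep(c("a", "b", "c"), each = 100))

probs <- c(0.25, 0.5, 0.75)

# One row per group, one column per requested quantile
DT2 <- DT[, as.list(quantile(x, probs = probs)), by = y]
setnames(DT2, c("y", paste0("q", probs * 100)))
DT2
```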

How to do operations on list columns in an R data.table to output another list column?

Another solution using mapply:

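# SIMPLIFY = FALSE keeps the result a list even when every element has the same length
dt[, absvals := mapply(listcol, numericcol, FUN = function(x, y) abs(x - y), SIMPLIFY = FALSE)]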

#output
dt
numericcol listcol absvals
1: 42 1,22, 3 41,20,39
2: 42 6 36
3: 42 1 41
4: 42 12 30
5: 42 5, 6,1123 37, 36,1081
6: 42 3 39
7: 42 42 0
8: 42 1 41
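
Map() is a close relative of mapply() that always returns a list, so it fits list columns without extra arguments. A minimal sketch on made-up data that reuses the column names from the answer above:

```r
library(data.table)

# Toy data: a numeric column and a list column of numeric vectors
dt <- data.table(
  numericcol = c(42, 42, 42),
  listcol = list(c(1, 22, 3), 6, c(5, 6, 1123))
)

# Map() pairs listcol[[i]] with numericcol[i] and returns a list of results
dt[, absvals := Map(function(x, y) abs(x - y), listcol, numericcol)]
dt
```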

create list from columns of data table expression

Get the data in long format and then aggregate by group.

library(data.table)

dt_long <- melt(dt, measure.vars = c('a', 'b'))
dt_long[, .N, .(variable, value)]

# variable value N
#1: a 1 2
#2: a 2 1
#3: a 3 1
#4: a 7 1
#5: b 4 3
#6: b 5 1
#7: b 6 1

With the tidyverse:

library(dplyr)
library(tidyr)

dt %>%
  pivot_longer(cols = everything()) %>%
  count(name, value)
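
To try the data.table version end to end, here is a sketch with input data reconstructed from the counts printed above. The exact row order of the original dt is an assumption, and counts is a hypothetical name:

```r
library(data.table)

# Input assumed from the printed counts: a has 1 twice, b has 4 three times
dt <- data.table(a = c(1, 1, 2, 3, 7), b = c(4, 4, 4, 5, 6))

# Long format: one row per (variable, value) pair
dt_long <- melt(dt, measure.vars = c("a", "b"))

# Count how often each value occurs within each original column
counts <- dt_long[, .N, by = .(variable, value)]
counts
```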

R data table - create a new column where each element is a list of values

If we need a list column in the dataset, wrap the result in list() so that each group's vector is stored as a single list element:

DT[, UniqueCats := list(list(sort(unique(Category)))) , by = UserID]
str(DT)
#Classes ‘data.table’ and 'data.frame': 4 obs. of 6 variables:
# $ UserID : chr "aaa" "bbb" "aaa" "aaa"
# $ Time : chr "7:50" "5:05" "8:40" "10:00"
# $ ArticleID : chr "x" "x" "y" "z"
# $ Category : chr "sports" "sports" "politics" "sports"
# $ NumOfReading: int 1 1 2 3
# $ UniqueCats :List of 4
# ..$ : chr "politics" "sports"
# ..$ : chr "sports"
# ..$ : chr "politics" "sports"
# ..$ : chr "politics" "sports"

We can also create a string column by concatenating the elements with toString (a wrapper around paste with collapse = ", "):

DT[, uniqueCats := toString(sort(unique(Category))), by = UserID]
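
Both variants can be checked on a small table. This is a sketch whose data is rebuilt from the str() output above and limited to the two columns that matter; CatString is a hypothetical column name, chosen to avoid confusion with UniqueCats:

```r
library(data.table)

DT <- data.table(
  UserID = c("aaa", "bbb", "aaa", "aaa"),
  Category = c("sports", "sports", "politics", "sports")
)

# List column: the inner list() holds the whole vector as one element per group
DT[, UniqueCats := list(list(sort(unique(Category)))), by = UserID]

# String column: the same values collapsed into one comma-separated string
DT[, CatString := toString(sort(unique(Category))), by = UserID]
DT
```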

How to create a new column in data.table based on values of other columns

Another option is to use indexing to find the rows that fit the condition and update only those rows:

#for each group of ID and Cycle, 
#find the row indices where Cycle_Date equals the last Positive_Test_Date
idxDT <- DT[, .I[Cycle_Date==Positive_Test_Date[.N]], .(ID, Cycle)]

#for those row indices, set LH_Date to Cycle_Date
#(NA indices select nothing, and rows that are not selected are left as NA)
DT[idxDT$V1, LH_Date := Cycle_Date]

idxDT looks like this and idxDT$V1 extracts the column V1:

   ID Cycle V1
1: 1 1 2
2: 1 1 NA
3: 1 2 7
4: 1 2 NA
5: 2 1 9
6: 2 1 NA
7: 2 2 14
8: 2 2 NA

.I contains the row index within a data.table. From ?.I:

.I is an integer vector equal to seq_len(nrow(x)). While grouping, it holds for each item in the group, its row location in x. This is useful to subset in j; e.g. DT[, .I[which.max(somecol)], by=grp].

output:

    ID Cycle Cycle_Day Cycle_Date Positive_Test_Date   LH_Date
1: 1 1 1 3/28/2019 <NA> <NA>
2: 1 1 2 3/29/2019 <NA> 3/29/2019
3: 1 1 3 3/30/2019 <NA> <NA>
4: 1 1 NA <NA> 3/29/2019 <NA>
5: 1 2 1 4/23/2019 <NA> <NA>
6: 1 2 2 4/24/2019 <NA> <NA>
7: 1 2 3 4/25/2019 <NA> 4/25/2019
8: 1 2 NA <NA> 4/25/2019 <NA>
9: 2 1 1 3/18/2019 <NA> 3/18/2019
10: 2 1 2 3/19/2019 <NA> <NA>
11: 2 1 3 3/20/2019 <NA> <NA>
12: 2 1 NA <NA> 3/18/2019 <NA>
13: 2 2 1 4/23/2019 <NA> <NA>
14: 2 2 2 4/24/2019 <NA> 4/24/2019
15: 2 2 3 4/25/2019 <NA> <NA>
16: 2 2 NA <NA> 4/24/2019 <NA>

data:

library(data.table)
DT <- fread("ID Cycle Cycle_Day Cycle_Date Positive_Test_Date
1 1 1 3/28/2019 NA
1 1 2 3/29/2019 NA
1 1 3 3/30/2019 NA
1 1 NA NA 3/29/2019
1 2 1 4/23/2019 NA
1 2 2 4/24/2019 NA
1 2 3 4/25/2019 NA
1 2 NA NA 4/25/2019
2 1 1 3/18/2019 NA
2 1 2 3/19/2019 NA
2 1 3 3/20/2019 NA
2 1 NA NA 3/18/2019
2 2 1 4/23/2019 NA
2 2 2 4/24/2019 NA
2 2 3 4/25/2019 NA
2 2 NA NA 4/24/2019")
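
The .I idiom quoted from ?.I can also be tried in isolation. A minimal sketch on made-up data, where grp and somecol mirror the names used in the help text and idx is a hypothetical name:

```r
library(data.table)

DT <- data.table(
  grp = c("a", "a", "b", "b", "b"),
  somecol = c(3, 9, 4, 1, 8)
)

# For each group, the row number (in DT) of the maximum of somecol
idx <- DT[, .I[which.max(somecol)], by = grp]

# Use those row numbers to pull the full rows
DT[idx$V1]
```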

Using a List to Fetch Columns from a DataTable

You can replace:

Dim arrayOfObjects()() As Object = DT.AsEnumerable().Select(Function(b) {b("x1"), b("x2"), b("x3")}).ToArray()

With:

Dim mystr As String = "x1,x2,x3"

Dim tarCols As String() = mystr.Split({","}, StringSplitOptions.RemoveEmptyEntries)

' Shortcut
' Dim tarCols = { "x1", "x2", "x3" }

Dim arrayOfObjects As Object()() = dt.DefaultView.ToTable(False, tarCols).
    AsEnumerable().Select(Function(x) x.ItemArray).ToArray()

This extracts the values of one or more given DataColumns and builds the jagged array from them.

How to Filter Data Table Rows with condition on column of Type list() in R

You can use sapply to check, for each row, whether any of the values in vals is in Product:

vals = c("UG12210","UG10000-WISD")

dt[Period %chin% "2018-Q1" & sapply(Product, function(v) any(vals %chin% v))]

# Id Period Product
# 1: 1000797366 2018-Q1 UG10000-WISD
# 2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
# 3: 1000797366 2018-Q1 UG10000-WISD,UG12210
# 4: 1000797366 2018-Q1 UG10000-WISD,UG12210
# 5: 1000797366 2018-Q1 UG12210
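
A self-contained sketch of the same filter on made-up data; the Id values and the helper names keep and res are hypothetical. It uses vapply instead of sapply so that the result is guaranteed to be a logical vector:

```r
library(data.table)

# Toy table with a list column of product codes
dt <- data.table(
  Id = 1:3,
  Period = c("2018-Q1", "2018-Q1", "2018-Q2"),
  Product = list(c("NX11100", "UG12210"), "NX11100", "UG12210")
)

vals <- c("UG12210", "UG10000-WISD")

# TRUE for rows whose Product vector contains at least one of vals
keep <- vapply(dt$Product, function(v) any(vals %chin% v), logical(1))

res <- dt[Period == "2018-Q1" & keep]
res
```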

