How to 'Unlist' a Column in a Data.Table

How to 'unlist' a column in a data.table

Promoting my comment to an answer. Using:

dt1[,.(colB = unlist(colB)), by = setdiff(names(dt1), 'colB')]

gives:

   colA colC colD colB
1: A1 C1 D1 B1
2: A2 C2 D2 B2a
3: A2 C2 D2 B2b
4: A3 C3 D3 B3

Or as an alternative (a slight variation of @Frank's proposal):

dt1[rep(dt1[,.I], lengths(colB))][, colB := unlist(dt1$colB)][]

How do you delete a column by name in data.table?

Any of the following will remove column foo from the data.table df3:

# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]

df3[, c("foo","bar"):=NULL] # remove two columns

myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents

# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]

# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]

data.table also supports the following syntax:

## Method 3 (could then assign to df3, 
df3[, !"foo"]

though if you were actually wanting to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)

Less safe options, fine for interactive use:

The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.

As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you are wanting to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.

# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]

Lastly there are approaches using with=FALSE, though data.table is gradually moving away from using this argument so it's now discouraged where you can avoid it; showing here so you know the option exists in case you really do need it:

# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5b (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]

Unlist nested list columns in data.table

Not sure it is more "canonical" but here is a way to modify l so you can use by=a, considering you know the type of your data in list (with some improvements, thanks to @DavidArenburg):

dt[lengths(l) == 0, l := NA_integer_][, .(nm = names(unlist(l)), ul = unlist(l)), by = a]

# a nm ul
#1: a c1 6
#2: a c2 4
#3: b x 2
#4: b y 4
#5: b z 3
#6: c NA NA

unlist a list of data.tables with list index

as @akrun points outs idcol is available in data.tables from v.1.9.6

rbindlist(l1, idcol = 'g')

How to unlist objects in data.table that is made of data.table and lists made of data.table in R (example given)?

Here is a solution that answers your question using data.table

given_y[, unlist(.SD[[1]], recursive = F), by = country]

# country city age
# 1: abc Del 20
# 2: abc Mum 30
# 3: abc Kol 45
# 4: xyz Del 30
# 5: xyz Mum 45

Unlist data frame column preserving information from other column

Here, the idea is to first get the length of each list element using sapply and then use rep to replicate the col1 with that length

 l1 <- sapply(myDataFrame$col2, length)
unlist.col1 <- rep(myDataFrame$col1, l1)
unlist.col1
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"

Or as suggested by @Ananda Mahto, the above could be also done with vapply

   with(myDataFrame, rep(col1, vapply(col2, length, 1L)))
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"

Unlist a data.frame column

For unlisting list-columns, you need to call unnest from the tidyr package.

unnest(dataframe, nameofcolumns)

Best,

Colin

How to unlist a list and keep the names of the top level as a new variable in R

dplyrs bind_rows should do the trick:

library(dplyr)

bind_rows(my_list, .id = "Ticker")

This returns

# A tibble: 30 x 3
Ticker date Value
<chr> <date> <int>
1 Ticker1 2021-01-01 1
2 Ticker1 2021-01-02 2
3 Ticker1 2021-01-03 3
4 Ticker1 2021-01-04 4
5 Ticker1 2021-01-05 5
6 Ticker1 2021-01-06 6
7 Ticker1 2021-01-07 7
8 Ticker1 2021-01-08 8
9 Ticker1 2021-01-09 9
10 Ticker1 2021-01-10 10
# ... with 20 more rows

Removing columns from a data.table in R based on conditions

dt = data.table("col1" = "a", "col2" = "b", "col3" = "c", 
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)

not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
## your categorical columns
col1, col2, col3, col4, col5,
## new column pasted from non-0 numeric columns
new = as.numeric(paste0(unlist(.SD), collapse=""))
),
## this filters columns to be provided in .SD column subset
.SDcols = not0,
## we group by each row so it will handle input of multiple rows
by = .(row=seq_len(nrow(dt)))
][, row:=NULL ## this removes extra grouping column
][] ## this prints
# col1 col2 col3 col4 col5 new
#1: a b c d e 9799

Alternatively if you want to update in place existing table

is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove columns that has 0
dt[, which(sapply(dt, is0)) := NULL]

## add new column
dt[, new := as.numeric(
paste0(unlist(.SD), collapse="")
), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
][]
# col1 col2 col3 col4 col5 col6 col8 col10 new
#1: a b c d e 9 7 99 9799


Related Topics



Leave a reply



Submit