How to 'unlist' a column in a data.table
Promoting my comment to an answer. Using:
dt1[,.(colB = unlist(colB)), by = setdiff(names(dt1), 'colB')]
gives:
colA colC colD colB
1: A1 C1 D1 B1
2: A2 C2 D2 B2a
3: A2 C2 D2 B2b
4: A3 C3 D3 B3
Or as an alternative (a slight variation of @Frank's proposal):
dt1[rep(dt1[,.I], lengths(colB))][, colB := unlist(dt1$colB)][]
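To try both approaches end to end, here is a minimal sketch. The contents of dt1 are an assumption, reconstructed from the output shown above rather than taken from the original question.

```r
library(data.table)

# Hypothetical sample data: colB is a list column whose second entry holds two values
dt1 <- data.table(
  colA = c("A1", "A2", "A3"),
  colB = list("B1", c("B2a", "B2b"), "B3"),
  colC = c("C1", "C2", "C3"),
  colD = c("D1", "D2", "D3")
)

# Approach 1: group by every column except colB and unlist within each group
res1 <- dt1[, .(colB = unlist(colB)), by = setdiff(names(dt1), "colB")]

# Approach 2: repeat each row once per element of colB, then overwrite colB
res2 <- dt1[rep(dt1[, .I], lengths(colB))][, colB := unlist(dt1$colB)][]

print(res1)  # colB ends up last, after the grouping columns
print(res2)  # the original column order is kept
```

The two results hold the same values; they differ only in column order, which may matter if later code selects columns by position.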
How do you delete a column by name in data.table?
Any of the following will remove column foo from the data.table df3:
# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]
df3[, c("foo","bar"):=NULL] # remove two columns
myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents
# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]
# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]
data.table also supports the following syntax:
## Method 3 (the result could then be assigned back to df3)
df3[, !"foo"]
though if you actually wanted to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo"), you'd really want to use Method 1 instead.
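The difference between Method 3 and Method 1 can be checked with a small sketch. The table contents here are hypothetical; only the column name foo comes from the answer.

```r
library(data.table)

# Hypothetical table with a column named foo
df3 <- data.table(foo = 1:3, bar = letters[1:3])

# Method 3 returns a new data.table without foo; df3 is left untouched
view <- df3[, !"foo"]
has_foo_after_view <- "foo" %in% names(df3)

# Method 1 removes foo from df3 itself, by reference
df3[, foo := NULL]
has_foo_after_null <- "foo" %in% names(df3)

print(names(view))
print(c(has_foo_after_view, has_foo_after_null))
```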
(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo" if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
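The effect of anchoring can be seen on a plain character vector, without any data.table involved. The column names below are made up for illustration.

```r
# Hypothetical column names, chosen to include substring matches
cols <- c("foo", "fool", "buffoon", "bar")

unanchored <- grep("foo", cols, value = TRUE)    # any name containing "foo"
anchored   <- grep("^foo$", cols, value = TRUE)  # only the exact name "foo"

print(unanchored)
print(anchored)
```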
Less safe options, fine for interactive use:
The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.
As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you want to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.
# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]
Lastly, there are approaches using with=FALSE, though data.table is gradually moving away from this argument, so it is now discouraged where you can avoid it. They are shown here so you know the option exists in case you really do need it:
# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
Unlist nested list columns in data.table
Not sure it is more "canonical", but here is a way to modify l so you can use by=a, provided you know the type of the data in the list (with some improvements, thanks to @DavidArenburg):
dt[lengths(l) == 0, l := NA_integer_][, .(nm = names(unlist(l)), ul = unlist(l)), by = a]
# a nm ul
#1: a c1 6
#2: a c2 4
#3: b x 2
#4: b y 4
#5: b z 3
#6: c NA NA
unlist a list of data.tables with list index
As @akrun points out, idcol is available in data.table from v1.9.6:
rbindlist(l1, idcol = 'g')
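A minimal sketch of what idcol produces, assuming l1 is an unnamed list of data.tables (the contents below are hypothetical, not from the question):

```r
library(data.table)

# Hypothetical input: an unnamed list of two small data.tables
l1 <- list(
  data.table(x = 1:2, y = c("a", "b")),
  data.table(x = 3L,  y = "c")
)

# idcol adds a leading column identifying which list element each row came from
res <- rbindlist(l1, idcol = "g")
print(res)
```

For an unnamed list the id column holds the position of each source table; if the list has names, those names are used instead.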
How to unlist objects in data.table that is made of data.table and lists made of data.table in R (example given)?
Here is a solution that answers your question using data.table:
given_y[, unlist(.SD[[1]], recursive = FALSE), by = country]
# country city age
# 1: abc Del 20
# 2: abc Mum 30
# 3: abc Kol 45
# 4: xyz Del 30
# 5: xyz Mum 45
Unlist data frame column preserving information from other column
Here, the idea is to first get the length of each list element using sapply and then use rep to replicate col1 by those lengths:
l1 <- sapply(myDataFrame$col2, length)
unlist.col1 <- rep(myDataFrame$col1, l1)
unlist.col1
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
Or, as suggested by @Ananda Mahto, the above could also be done with vapply:
with(myDataFrame, rep(col1, vapply(col2, length, 1L)))
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
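Putting the replicated col1 next to the unlisted col2 gives the full long-format result. The sketch below uses a small hypothetical myDataFrame and swaps in base R's lengths() for the sapply call; the technique is otherwise the same.

```r
# Hypothetical data frame: col2 is a list column with entries of varying length
myDataFrame <- data.frame(col1 = c("A", "B"), stringsAsFactors = FALSE)
myDataFrame$col2 <- list(1:3, 4:5)

# Repeat each col1 value once per element of the matching col2 entry
long <- data.frame(
  col1 = rep(myDataFrame$col1, lengths(myDataFrame$col2)),
  col2 = unlist(myDataFrame$col2),
  stringsAsFactors = FALSE
)
print(long)
```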
Unlist a data.frame column
To unlist list-columns, call unnest from the tidyr package:
unnest(dataframe, nameofcolumns)
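As a concrete sketch, assuming tidyr 1.0 or later (where the columns are passed through the cols argument) and a hypothetical data frame with one list column:

```r
library(tidyr)

# Hypothetical data frame: col2 is a list column
dataframe <- data.frame(col1 = c("A", "B"), stringsAsFactors = FALSE)
dataframe$col2 <- list(1:3, 4:5)

# Each element of col2 becomes its own row; col1 is repeated to match
flat <- unnest(dataframe, cols = col2)
print(flat)
```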
Best,
Colin
How to unlist a list and keep the names of the top level as a new variable in R
dplyr's bind_rows should do the trick:
library(dplyr)
bind_rows(my_list, .id = "Ticker")
This returns
# A tibble: 30 x 3
Ticker date Value
<chr> <date> <int>
1 Ticker1 2021-01-01 1
2 Ticker1 2021-01-02 2
3 Ticker1 2021-01-03 3
4 Ticker1 2021-01-04 4
5 Ticker1 2021-01-05 5
6 Ticker1 2021-01-06 6
7 Ticker1 2021-01-07 7
8 Ticker1 2021-01-08 8
9 Ticker1 2021-01-09 9
10 Ticker1 2021-01-10 10
# ... with 20 more rows
Removing columns from a data.table in R based on conditions
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
## your categorical columns
col1, col2, col3, col4, col5,
## new column pasted from non-0 numeric columns
new = as.numeric(paste0(unlist(.SD), collapse=""))
),
## this filters columns to be provided in .SD column subset
.SDcols = not0,
## we group by each row so it will handle input of multiple rows
by = .(row=seq_len(nrow(dt)))
][, row:=NULL ## this removes extra grouping column
][] ## this prints
# col1 col2 col3 col4 col5 new
#1: a b c d e 9799
Alternatively, if you want to update the existing table in place:
is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove numeric columns that contain only 0
dt[, which(sapply(dt, is0)) := NULL]
## add new column
dt[, new := as.numeric(
paste0(unlist(.SD), collapse="")
), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
][]
# col1 col2 col3 col4 col5 col6 col8 col10 new
#1: a b c d e 9 7 99 9799