Reshape Long Structured Data.Table into a Wide Structure Using Data.Table Functionality

Reshape long structured data.table into a wide structure using data.table functionality?

I'll pick an example with unequal groups so that it's easier to illustrate for the general case:

A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
x y v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0

The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.

setkey(A, x, y)
A[CJ(unique(x), unique(y))]

Now, to get it to wide format, we should group by "x" and use as.list on v as follows:

out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
x V1 V2 V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA

Now, you can set the names of the reshaped columns using reference with setnames as follows:

setnames(out, c("x", as.character(unique(A$y)))

x 1 2 3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA

How can I reshape a data.table (long into wide) without doing a function like sum or mean?

You can use getanID to create a unique .id for the grouping variable id. Then, try with dcast.data.table (or simply dcast from versions 1.9.5 and beyond) and if needed change the column names using setnames

 library(splitstackshape)
res <- dcast(getanID(DT, 'id'), id~.id,value.var='score')
setnames(res, 2:3, paste0('score', 1:2))[]
# id score1 score2
#1: 1 5 4
#2: 2 5 NA
#3: 3 6 NA

Or using only data.table

 dcast(DT[, .id:=paste0('score', 1:.N), by=id],
id~.id, value.var='score')
# id score1 score2
#1: 1 5 4
#2: 2 5 NA
#3: 3 6 NA

Or from the code you were using (less number of characters)

cSplit(DT[, toString(score), by=id], 'V1', ',')
# id V1_1 V1_2
#1: 1 5 4
#2: 2 5 NA
#3: 3 6 NA

Reshape long to wide with two columns to expand in R data.table [R]

You may use dcast -

library(data.table)

setDT(data_sample)
dcast(data_sample, code~rowid(code), value.var = c('name', 'numberdata'))

# code name_1 name_2 numberdata_1 numberdata_2
#1: 1 bill bob 100 400
#2: 2 rob john 300 -500
#3: 3 max joe -200 -400
#4: 4 mitch bart 300 100

Reshaping 2 column data.table from long to wide

Maybe the base R function unstack is the cleanest solution:

unstack(df)
A B C
1 3 4 5
2 1 2 3

Note that this returns a data.frame rather than a data.table, so if you want a data.table at the end:

df2 <- setDT(unstack(df))

will return a data.table.

How can one transform a data table using data.table

dcast() is a data.table function, and I think this is elegant enough:

dcast(da, Gr ~ mmoSze)

Only problem is that you get NA instead of 0, but then you can do:

res = dcast(da, Gr ~ mmoSze)
res[is.na(res)] <- 0

Reshape the data from long to wide format

The crucial point here is to identify which rows belong to which group. The answers by Ronak and akrun both use rleid(Service_tier) assuming that a change in Service_tier indicate the begin of a new group.

This might be suggested by the sample dataset but cannot be taken as guaranteed. IMHO, Service_tier is rather an attribute than a key. As a matter of fact, the OP is testing for NN == EE in his code snippet to switch to a new group.

In the data.table solutions below, grouping is determined by cumsum(shift(NN == EE, fill = TRUE)) which tests for equality fo NN and EE, lags the result to the next row where the next group starts, and enumerates the groups by counting TRUE using cumsum().

In the simplified version (without reshaping), the hops are aggregated by the toString() function:

library(data.table)
setDT(d)[, .(SN = first(SN), hops = toString(NN), Service_tier = first(Service_tier)),
by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))][]
   grp SN       hops Service_tier
1: 1 A B, C economy
2: 2 P Q, S, R regular
3: 3 H I, J, K, L economy

For reshaping from long to wide format, dcast() is used:

library(data.table)
library(magrittr) # piping used to improve readability
w <- setDT(d)[, .(SN = first(SN), hops = NN, Service_tier = first(Service_tier)),
by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))] %>%
dcast(grp + ... ~ rowid(grp, prefix = "hop"), value.var = "hops", fill = "") %>%
setcolorder(c(1:2, 4:ncol(.), 3))

w
   grp SN hop1 hop2 hop3 hop4 Service_tier
1: 1 A B C economy
2: 2 P Q S R regular
3: 3 H I J K L economy

setcolorder() is used to rearrange columns in the order expected by the OP. This is done in-place, i.e., without copying the whole data object.

Data

library(data.table)

d <- fread("SN NN EE Service_tier
A B C economy
B C C economy
P Q R regular
Q S R regular
S R R regular
H I L economy
I J L economy
J K L economy
K L L economy")

reshape dataframe from wide to long in R

Using data.table:

library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'),
measure.vars = patterns(fixed='fixed_', current='current_'),
variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
## id name year fixed current
## 1: 1 A 2020 2300 3000
## 2: 2 A 2019 2100 3100
## 3: 3 A 2018 2600 3200
## 4: 4 A 2017 2600 3300
## 5: 5 A 2016 1900 3400

This should be very fast but your data-set is not very big tbh.

Note that this assumes the fixed and current columns are in the same order and associated with the same year(s). So if there is a fixed_2020 as the first fixed_* column, there is also a current_2020 as the first current_* column, and so on. Otherwise, the year column will correctly associate with fixed but not current



Related Topics



Leave a reply



Submit