Reshape long structured data.table into a wide structure using data.table functionality?
I'll pick an example with unequal groups so that it's easier to illustrate for the general case:
A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
x y v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0
The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.
setkey(A, x, y)
A[CJ(unique(x), unique(y))]
Now, to get it to wide format, we should group by "x" and use as.list
on v
as follows:
out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
x V1 V2 V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA
Now, you can set the names of the reshaped columns using reference with setnames
as follows:
setnames(out, c("x", as.character(unique(A$y)))
x 1 2 3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA
How can I reshape a data.table (long into wide) without doing a function like sum or mean?
You can use getanID
to create a unique .id
for the grouping variable id
. Then, try with dcast.data.table
(or simply dcast
from versions 1.9.5 and beyond) and if needed change the column names using setnames
library(splitstackshape)
res <- dcast(getanID(DT, 'id'), id~.id,value.var='score')
setnames(res, 2:3, paste0('score', 1:2))[]
# id score1 score2
#1: 1 5 4
#2: 2 5 NA
#3: 3 6 NA
Or using only data.table
dcast(DT[, .id:=paste0('score', 1:.N), by=id],
id~.id, value.var='score')
# id score1 score2
#1: 1 5 4
#2: 2 5 NA
#3: 3 6 NA
Or from the code you were using (less number of characters)
cSplit(DT[, toString(score), by=id], 'V1', ',')
# id V1_1 V1_2
#1: 1 5 4
#2: 2 5 NA
#3: 3 6 NA
Reshape long to wide with two columns to expand in R data.table [R]
You may use dcast
-
library(data.table)
setDT(data_sample)
dcast(data_sample, code~rowid(code), value.var = c('name', 'numberdata'))
# code name_1 name_2 numberdata_1 numberdata_2
#1: 1 bill bob 100 400
#2: 2 rob john 300 -500
#3: 3 max joe -200 -400
#4: 4 mitch bart 300 100
Reshaping 2 column data.table from long to wide
Maybe the base R function unstack
is the cleanest solution:
unstack(df)
A B C
1 3 4 5
2 1 2 3
Note that this returns a data.frame rather than a data.table, so if you want a data.table at the end:
df2 <- setDT(unstack(df))
will return a data.table.
How can one transform a data table using data.table
dcast()
is a data.table
function, and I think this is elegant enough:
dcast(da, Gr ~ mmoSze)
Only problem is that you get NA
instead of 0
, but then you can do:
res = dcast(da, Gr ~ mmoSze)
res[is.na(res)] <- 0
Reshape the data from long to wide format
The crucial point here is to identify which rows belong to which group. The answers by Ronak and akrun both use rleid(Service_tier)
assuming that a change in Service_tier
indicate the begin of a new group.
This might be suggested by the sample dataset but cannot be taken as guaranteed. IMHO, Service_tier
is rather an attribute than a key. As a matter of fact, the OP is testing for NN == EE
in his code snippet to switch to a new group.
In the data.table solutions below, grouping is determined by cumsum(shift(NN == EE, fill = TRUE))
which tests for equality fo NN
and EE
, lags the result to the next row where the next group starts, and enumerates the groups by counting TRUE
using cumsum()
.
In the simplified version (without reshaping), the hops are aggregated by the toString()
function:
library(data.table)
setDT(d)[, .(SN = first(SN), hops = toString(NN), Service_tier = first(Service_tier)),
by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))][]
grp SN hops Service_tier
1: 1 A B, C economy
2: 2 P Q, S, R regular
3: 3 H I, J, K, L economy
For reshaping from long to wide format, dcast()
is used:
library(data.table)
library(magrittr) # piping used to improve readability
w <- setDT(d)[, .(SN = first(SN), hops = NN, Service_tier = first(Service_tier)),
by = .(grp = cumsum(shift(NN == EE, fill = TRUE)))] %>%
dcast(grp + ... ~ rowid(grp, prefix = "hop"), value.var = "hops", fill = "") %>%
setcolorder(c(1:2, 4:ncol(.), 3))
w
grp SN hop1 hop2 hop3 hop4 Service_tier
1: 1 A B C economy
2: 2 P Q S R regular
3: 3 H I J K L economy
setcolorder()
is used to rearrange columns in the order expected by the OP. This is done in-place, i.e., without copying the whole data object.
Data
library(data.table)
d <- fread("SN NN EE Service_tier
A B C economy
B C C economy
P Q R regular
Q S R regular
S R R regular
H I L economy
I J L economy
J K L economy
K L L economy")
reshape dataframe from wide to long in R
Using data.table
:
library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'),
measure.vars = patterns(fixed='fixed_', current='current_'),
variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
## id name year fixed current
## 1: 1 A 2020 2300 3000
## 2: 2 A 2019 2100 3100
## 3: 3 A 2018 2600 3200
## 4: 4 A 2017 2600 3300
## 5: 5 A 2016 1900 3400
This should be very fast but your data-set is not very big tbh.
Note that this assumes the fixed and current columns are in the same order and associated with the same year(s). So if there is a fixed_2020
as the first fixed_*
column, there is also a current_2020
as the first current_*
column, and so on. Otherwise, the year
column will correctly associate with fixed
but not current
Related Topics
R Lubridate Converting Seconds to Date
Faster Way to Find the First True Value in a Vector
Percentage Histogram with Facet_Wrap
Constrain Multiple Sliderinput in Shiny to Sum to 100
How to Find Previous Sunday in R
Plotting Continuous and Discrete Series in Ggplot with Facet
R: Merge Based on Multiple Conditions (With Non-Equal Criteria)
Rotate Labels in a Chorddiagram (R Circlize)
How to Reset All Options() Arguments to Their Default Values
How to Pass the "..." Parameters in the Parent Function to Its Two Children Functions in R
Write a Data Frame to CSV File Without Column Header in R
String Split on Last Comma in R
Multiple Filled.Contour Plots in One Graph Using with Par(Mfrow=C())
How Can Library() Accept Both Quoted and Unquoted Strings
R: Selecting Subset Without Copying
Fast Way of Getting Index of Match in List
R: Creating a Map of Selected Canadian Provinces and U.S. States