Reshaping data frame in R
reshape always seems tricky to me too, but it always seems to work with a little trial and error. Here's what I ended up finding:
> x
unique_id seq response detailed.name treatment
1 a N1 123.23 dN1 T1
2 a N2 231.12 dN2 T1
3 a N3 231.23 dN3 T1
4 b N1 343.23 dN1 T2
5 b N2 281.13 dN2 T2
6 b N3 901.23 dN3 T2
> x2 <- melt(x, c("seq", "detailed.name", "treatment"), "response")
> x2
seq detailed.name treatment variable value
1 N1 dN1 T1 response 123.23
2 N2 dN2 T1 response 231.12
3 N3 dN3 T1 response 231.23
4 N1 dN1 T2 response 343.23
5 N2 dN2 T2 response 281.13
6 N3 dN3 T2 response 901.23
> cast(x2, seq + detailed.name ~ treatment)
seq detailed.name T1 T2
1 N1 dN1 123.23 343.23
2 N2 dN2 231.12 281.13
3 N3 dN3 231.23 901.23
Your original data was already in long format, but not in the long format that melt/cast uses, so I re-melted it. The second argument (id.vars) is the list of columns not to melt. The third argument (measure.vars) is the list of columns that vary.
Then, the cast uses a formula. Left of the tilde are the things that stay as they are, and right of the tilde are the columns that are used to condition the value column.
More or less...!
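For comparison, roughly the same wide table can be sketched with base R's reshape() instead of melt/cast. This is a different technique from the one above, and the data frame is retyped from the printed x, so treat it as an illustration under those assumptions rather than the original data.

```r
# Rebuild the example data (values copied from the printed x above).
x <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(c("dN1", "dN2", "dN3"), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = FALSE
)

# Drop unique_id first: it changes with treatment, so it cannot act as an id.
wide <- reshape(
  x[, c("seq", "detailed.name", "treatment", "response")],
  idvar     = c("seq", "detailed.name"),  # columns that stay as they are
  timevar   = "treatment",                # column whose values become new columns
  direction = "wide"
)
wide
```

Unlike cast, reshape() prefixes the new columns with the name of the value column, so they come out as response.T1 and response.T2.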
reshape dataframe from wide to long in R
Using data.table:
library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'),
measure.vars = patterns(fixed='fixed_', current='current_'),
variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
## id name year fixed current
## 1: 1 A 2020 2300 3000
## 2: 2 A 2019 2100 3100
## 3: 3 A 2018 2600 3200
## 4: 4 A 2017 2600 3300
## 5: 5 A 2016 1900 3400
This should be very fast, although your data set is not very big.
Note that this assumes the fixed and current columns are in the same order and associated with the same year(s). So if there is a fixed_2020 as the first fixed_* column, there is also a current_2020 as the first current_* column, and so on. Otherwise, the year column will correctly associate with fixed but not current.
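One way to check that assumption before melting is to compare the year suffixes of the two column groups. This is a minimal base R sketch; the small mydata below is hypothetical and only mimics the fixed_<year> / current_<year> naming.

```r
# Hypothetical wide table following the fixed_<year> / current_<year> pattern.
mydata <- data.frame(
  id = 1, name = "A",
  fixed_2020 = 2300, fixed_2019 = 2100,
  current_2020 = 3000, current_2019 = 3100
)

# Year suffix of each column group, in column order.
fixed_years   <- sub("^fixed_",   "", grep("^fixed_",   names(mydata), value = TRUE))
current_years <- sub("^current_", "", grep("^current_", names(mydata), value = TRUE))

# TRUE only if both groups list the same years in the same order.
aligned <- identical(fixed_years, current_years)
aligned
```

If aligned is FALSE, reordering the columns so the two groups match would be needed before the year lookup can be trusted for current.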
Reshaping data to wide format in R
Create a row number column for each id and reshape the data to wide format.
library(dplyr)
library(tidyr)
df %>%
group_by(id) %>%
mutate(col = row_number()) %>%
ungroup %>%
pivot_wider(names_from = col, values_from = x:stop)
# A tibble: 10 x 41
# id x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 A B C D E F G H I J
# 2 2 A B C D E F G H I J
# 3 3 A B C D E F G H I J
# 4 4 A B C D E F G H I J
# 5 5 A B C D E F G H I J
# 6 6 A B C D E F G H I J
# 7 7 A B C D E F G H I J
# 8 8 A B C D E F G H I J
# 9 9 A B C D E F G H I J
#10 10 A B C D E F G H I J
# … with 30 more variables: y_1 <chr>, y_2 <chr>, y_3 <chr>,
# y_4 <chr>, y_5 <chr>, y_6 <chr>, y_7 <chr>, y_8 <chr>, y_9 <chr>,
# y_10 <chr>, start_1 <date>, start_2 <date>, start_3 <date>,
# start_4 <date>, start_5 <date>, start_6 <date>, start_7 <date>,
# start_8 <date>, start_9 <date>, start_10 <date>, stop_1 <date>,
# stop_2 <date>, stop_3 <date>, stop_4 <date>, stop_5 <date>,
# stop_6 <date>, stop_7 <date>, stop_8 <date>, stop_9 <date>,
# stop_10 <date>
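The per-id row number is the key step here, since it gives every row within an id a distinct column suffix. As a rough base R sketch of just that step (using ave() rather than dplyr, on a small hypothetical df):

```r
# Hypothetical data: repeated ids with one value column.
df <- data.frame(
  id = c(1, 1, 2, 2, 2),
  x  = c("A", "B", "A", "B", "C")
)

# Number the rows within each id: 1, 2, ... restarting for every new id.
df$col <- ave(seq_along(df$id), df$id, FUN = seq_along)
df
```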
Reshaping segmented dataframe in R
R is for wusses. Let's just write C.
reblock <- function (data, x, y) {
  # Empty result with x columns, named after the first x columns of data
  reblocked <- as.data.frame(matrix(NA, 0, x))
  rn <- names(data)[seq_len(x)]
  names(reblocked) <- rn
  # Walk down the data in chunks of y rows
  while (nrow(data) >= y) {
    rows <- data[seq_len(y), ]
    # Walk across the chunk in blocks of x columns, stacking each block
    while (ncol(rows) >= x) {
      names(rows)[seq_len(x)] <- rn
      reblocked <- rbind(reblocked, rows[seq_len(x)])
      rows <- rows[-seq_len(x)]
    }
    # Remove the y rows that were just processed
    data <- data[-seq_len(y), ]
  }
  reblocked
}
tmp <- data.frame(
  a = rep(1:4, each = 6),
  b = rep(letters[1:4], each = 6),
  c = rep(5:8, each = 6),
  d = rep(letters[5:8], each = 6)
)
reblock(tmp, 2, 6)
Reshaping data.frame from wide to long format
reshape() takes a while to get used to, just as melt/cast. Here is a solution with reshape, assuming your data frame is called d:
reshape(d,
direction = "long",
varying = list(names(d)[3:7]),
v.names = "Value",
idvar = c("Code", "Country"),
timevar = "Year",
times = 1950:1954)
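To try the call end to end, here is a small self-contained sketch. The data frame d below is hypothetical: it only assumes two id columns (Code, Country) followed by one column per year from 1950 to 1954, which is what the column positions 3:7 in the call imply.

```r
# Hypothetical wide data: two id columns, then one value column per year.
d <- data.frame(
  Code    = c("AFG", "ALB"),
  Country = c("Afghanistan", "Albania"),
  `1950`  = c(1, 6),
  `1951`  = c(2, 7),
  `1952`  = c(3, 8),
  `1953`  = c(4, 9),
  `1954`  = c(5, 10),
  check.names = FALSE  # keep the year column names as they are
)

long <- reshape(d,
                direction = "long",
                varying   = list(names(d)[3:7]),  # the year columns to stack
                v.names   = "Value",              # name of the stacked value column
                idvar     = c("Code", "Country"),
                timevar   = "Year",
                times     = 1950:1954)            # label for each stacked column
long
```

Each country ends up with one row per year, with the year in Year and the former cell content in Value.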