Reshaping Data Frame in R

Reshaping data frame in R

reshape always seems tricky to me too, but it usually works after a little trial and error. Here's what I ended up with:

> x
  unique_id seq response detailed.name treatment
1         a  N1   123.23           dN1        T1
2         a  N2   231.12           dN2        T1
3         a  N3   231.23           dN3        T1
4         b  N1   343.23           dN1        T2
5         b  N2   281.13           dN2        T2
6         b  N3   901.23           dN3        T2

> library(reshape)
> x2 <- melt(x, c("seq", "detailed.name", "treatment"), "response")
> x2
  seq detailed.name treatment variable  value
1  N1           dN1        T1 response 123.23
2  N2           dN2        T1 response 231.12
3  N3           dN3        T1 response 231.23
4  N1           dN1        T2 response 343.23
5  N2           dN2        T2 response 281.13
6  N3           dN3        T2 response 901.23

> cast(x2, seq + detailed.name ~ treatment)
  seq detailed.name     T1     T2
1  N1           dN1 123.23 343.23
2  N2           dN2 231.12 281.13
3  N3           dN3 231.23 901.23

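Your original data was already in long format, but not in the long format that melt/cast uses, so I re-melted it. The second argument (id.vars) is the list of columns to keep as identifiers, i.e. the things not to melt. The third argument (measure.vars) is the list of columns holding the measured values, which get stacked into the value column. Columns in neither list (here unique_id) are dropped.

Then cast takes a formula. Variables left of the tilde stay as they are and identify the rows. Variables right of the tilde are spread out: each of their distinct values (here T1 and T2) becomes a column filled from the value column.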

More or less...!
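
For anyone who wants to run the steps above from scratch, here is a minimal sketch that rebuilds x by hand and repeats the same melt/cast calls. It assumes the reshape package is installed (not reshape2, which provides dcast rather than cast) and that the text columns are factors, as they would have been by default in older versions of R.

```r
library(reshape)

# Rebuild the example data shown above; stringsAsFactors = TRUE is an
# assumption that mimics the pre-4.0 R default.
x <- data.frame(
  unique_id     = rep(c("a", "b"), each = 3),
  seq           = rep(c("N1", "N2", "N3"), times = 2),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = rep(c("dN1", "dN2", "dN3"), times = 2),
  treatment     = rep(c("T1", "T2"), each = 3),
  stringsAsFactors = TRUE
)

# Keep seq, detailed.name and treatment as identifiers; stack response.
x2 <- melt(x,
           id.vars = c("seq", "detailed.name", "treatment"),
           measure.vars = "response")

# One row per seq/detailed.name pair, one column per treatment.
wide <- cast(x2, seq + detailed.name ~ treatment)
wide
```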

reshape dataframe from wide to long in R

Using data.table:

library(data.table)
setDT(mydata)
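result <- melt(mydata, id.vars = c('id', 'name'),
               measure.vars = patterns(fixed = 'fixed_', current = 'current_'),
               variable.name = 'year')
# melt() numbers the matched columns 1, 2, ...; recover the actual years from the fixed_* names
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year := years[year]]
# renumber id within each name
result[, id := seq(.N), by = .(name)]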
result
##    id name year fixed current
## 1:  1    A 2020  2300    3000
## 2:  2    A 2019  2100    3100
## 3:  3    A 2018  2600    3200
## 4:  4    A 2017  2600    3300
## 5:  5    A 2016  1900    3400

This should be very fast, although your data set is small enough that speed hardly matters.

Note that this assumes the fixed_* and current_* columns appear in the same order and cover the same years. So if fixed_2020 is the first fixed_* column, current_2020 must also be the first current_* column, and so on. Otherwise, the year column will be correct for fixed but not for current.
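
One way to guard against that is to compare the year suffixes of the two column groups before melting. The helper below is only a sketch in base R: the name same_year_order is made up for illustration, and it assumes the columns are named exactly fixed_<year> and current_<year>.

```r
# Hypothetical helper: TRUE if the fixed_* and current_* columns carry the
# same year suffixes in the same order, FALSE otherwise.
same_year_order <- function(col_names) {
  fixed_years   <- sub("^fixed_", "", grep("^fixed_", col_names, value = TRUE))
  current_years <- sub("^current_", "", grep("^current_", col_names, value = TRUE))
  identical(fixed_years, current_years)
}

# Example column names (made up for illustration)
aligned    <- c("id", "name", "fixed_2020", "fixed_2019", "current_2020", "current_2019")
misaligned <- c("id", "name", "fixed_2020", "fixed_2019", "current_2019", "current_2020")

same_year_order(aligned)
same_year_order(misaligned)
```

Calling stopifnot(same_year_order(names(mydata))) before melt would then stop early instead of silently mislabelling the current values.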

Reshaping data to wide format in R

Create a row number column for each id and reshape the data to wide format.

library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  mutate(col = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = x:stop)

# A tibble: 10 x 41
#      id x_1   x_2   x_3   x_4   x_5   x_6   x_7   x_8   x_9   x_10 
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1     1 A     B     C     D     E     F     G     H     I     J    
# 2     2 A     B     C     D     E     F     G     H     I     J    
# 3     3 A     B     C     D     E     F     G     H     I     J    
# 4     4 A     B     C     D     E     F     G     H     I     J    
# 5     5 A     B     C     D     E     F     G     H     I     J    
# 6     6 A     B     C     D     E     F     G     H     I     J    
# 7     7 A     B     C     D     E     F     G     H     I     J    
# 8     8 A     B     C     D     E     F     G     H     I     J    
# 9     9 A     B     C     D     E     F     G     H     I     J    
#10    10 A     B     C     D     E     F     G     H     I     J    
# … with 30 more variables: y_1 <chr>, y_2 <chr>, y_3 <chr>,
#   y_4 <chr>, y_5 <chr>, y_6 <chr>, y_7 <chr>, y_8 <chr>, y_9 <chr>,
#   y_10 <chr>, start_1 <date>, start_2 <date>, start_3 <date>,
#   start_4 <date>, start_5 <date>, start_6 <date>, start_7 <date>,
#   start_8 <date>, start_9 <date>, start_10 <date>, stop_1 <date>,
#   stop_2 <date>, stop_3 <date>, stop_4 <date>, stop_5 <date>,
#   stop_6 <date>, stop_7 <date>, stop_8 <date>, stop_9 <date>,
#   stop_10 <date>
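
The original df is not shown here, so the following is a smaller sketch of the same idea on made-up data with only two value columns, x and y. It assumes dplyr and tidyr are installed. The data frame and its values are purely illustrative.

```r
library(dplyr)
library(tidyr)

# Illustrative data: two rows per id (not the original df)
df <- data.frame(
  id = c(1, 1, 2, 2),
  x  = c("A", "B", "A", "B"),
  y  = c("p", "q", "r", "s")
)

wide <- df %>%
  group_by(id) %>%
  mutate(col = row_number()) %>%   # position of each row within its id
  ungroup() %>%
  pivot_wider(names_from = col, values_from = x:y)

# Column names are built as <value column>_<row number>
names(wide)
```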

Reshaping segmented dataframe in R

Here is a deliberately C-style solution in plain R, using explicit loops:

reblock <- function (data, x, y) {
  # empty result with x columns, named after the first x columns of data
  reblocked <- as.data.frame(matrix(NA, 0, x))
  rn <- names(data)[seq_len(x)]
  names(reblocked) <- rn

  # consume the data y rows at a time
  while (nrow(data) >= y) {
    rows <- data[seq_len(y), ]
    # peel off x columns at a time and stack them under the result
    while (ncol(rows) >= x) {
      names(rows)[seq_len(x)] <- rn
      reblocked <- rbind(reblocked, rows[seq_len(x)])
      rows <- rows[-seq_len(x)]
    }
    # remove the y rows just processed
    data <- data[-seq_len(y), ]
  }

  reblocked
}

tmp <- data.frame(
         a = rep(1:4, each = 6), 
         b = rep(letters[1:4], each = 6), 
         c = rep(5:8, each = 6), 
         d = rep(letters[5:8], each = 6)
       )
reblock(tmp, 2, 6)

Reshaping data.frame from wide to long format

reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape, assuming your data frame is called d and the year columns for 1950 to 1954 are columns 3 to 7:

reshape(d, 
        direction = "long",
        varying = list(names(d)[3:7]),
        v.names = "Value",
        idvar = c("Code", "Country"),
        timevar = "Year",
        times = 1950:1954)
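
To try the call without the original data, here is a small self-contained sketch in base R. The data frame d, its column names (y1950 to y1954) and its values are all made up for illustration. Only the reshape() arguments come from the answer above.

```r
# Illustrative wide data: one row per country, one column per year
d <- data.frame(
  Code    = c("AA", "BB"),
  Country = c("Aland", "Bland"),
  y1950 = c(1, 6), y1951 = c(2, 7), y1952 = c(3, 8),
  y1953 = c(4, 9), y1954 = c(5, 10)
)

long <- reshape(d,
                direction = "long",
                varying = list(names(d)[3:7]),   # the five year columns
                v.names = "Value",               # name of the stacked value column
                idvar = c("Code", "Country"),
                timevar = "Year",
                times = 1950:1954)               # labels for the year columns, in order

# 2 countries x 5 years = 10 rows
nrow(long)
```

The times vector is matched to the varying columns by position, so it has to list the years in the same order as the columns appear in d.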
