Reshape R Data with User Entries in Rows, Collapsing for Each User

How to reshape this data into a useable format?

Here is my attempt at what you need; adjust it if anything is not quite right. I used three libraries, but don't worry: in R they are very often used together and are good to know anyway. I could have written this in base R, but the code would have been much longer.

input.csv

,1971,1971,1971,1972,1972,1972
,var1,var2,var3,var1,var2,var3
person1,37,2,1,65,5,3
person2,65,2,1,123,3,1
person3,23,3,1,13,6,2

Code to modify representation

library(reshape2)
library(tidyr)
library(dplyr)

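# The first CSV row (years) becomes the header; R renames the duplicates to
# X1971, X1971.1, X1971.2, ... The second CSV row (var1, var2, ...) is read
# as data, so [-1,] drops it.
input <- read.table("input.csv", sep=",", na.strings="", header=TRUE)[-1,]
converted_input <- input %>%
  tidyr::gather(year, value, -X) %>%
  dplyr::mutate(
    # the numeric suffix (none, .1, .2) identifies the variable: var1, var2, var3
    var=paste0("var", as.numeric(gsub("^X.*", "0", gsub(".*\\.([0-9])$", "\\1", year)))+1),
    # strip the leading X and the suffix to recover the year
    year=gsub("X([^.]+).*", "\\1", year)) %>%
  reshape2::dcast(X + year ~ var, value.var="value") %>%
  dplyr::rename(person=X)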

print(converted_input)

Final result

 person year var1 var2 var3
person1 1971   37    2    1
person1 1972   65    5    3
person2 1971   65    2    1
person2 1972  123    3    1
person3 1971   23    3    1
person3 1972   13    6    2
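
For comparison, here is a minimal sketch of the same reshape with tidyr's pivot_longer, which avoids the regular expressions. It assumes the file has exactly the two header rows shown above; the inline csv_text and the names hdr, body and long are only for illustration.

```r
library(tidyr)

# Same content as input.csv above, inlined so the example is self-contained
csv_text <- ",1971,1971,1971,1972,1972,1972
,var1,var2,var3,var1,var2,var3
person1,37,2,1,65,5,3
person2,65,2,1,123,3,1
person3,23,3,1,13,6,2"

# Read the two header rows on their own, as text
hdr <- read.csv(text = csv_text, header = FALSE, nrows = 2,
                colClasses = "character")

# Read the data rows, skipping both header rows
body <- read.csv(text = csv_text, header = FALSE, skip = 2,
                 stringsAsFactors = FALSE)

# Build column names such as var1_1971 from the two header rows
names(body) <- c("person",
                 paste(unlist(hdr[2, -1]), unlist(hdr[1, -1]), sep = "_"))

# ".value" turns the part before "_" into output columns (var1, var2, var3);
# the part after "_" goes into the year column
long <- pivot_longer(body, cols = -person,
                     names_to = c(".value", "year"), names_sep = "_")

print(long)
```

Note that year comes out as a character column here; wrap it in as.integer() if you need numbers.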

Reshape dataframe from columns to rows and collapse cell values

I would use dplyr and tidyr rather than reshape.

library(dplyr)
library(tidyr)

Data <- data.frame(a=c(100,0,78),b=c(0,137,117),c=c(111,17,91))

Data %>%
  gather(Column, Value) %>%
  filter(Value != 0) %>%
  group_by(Column) %>%
  summarize(Value=paste0(Value,collapse=', '))

The gather function is similar to melt in reshape. The group_by function tells later functions to operate separately on each value of Column. Finally, summarize computes whatever summary we want for each group; in this case, it pastes all the values together.

Which should give you:

# A tibble: 3 × 2
  Column       Value
   <chr>       <chr>
1      a     100, 78
2      b    137, 117
3      c 111, 17, 91
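
In current tidyr, gather is superseded by pivot_longer. A sketch of the same pipeline with the newer function follows; the result name collapsed is just for illustration.

```r
library(dplyr)
library(tidyr)

Data <- data.frame(a = c(100, 0, 78), b = c(0, 137, 117), c = c(111, 17, 91))

collapsed <- Data %>%
  # stack all columns into Column/Value pairs
  pivot_longer(everything(), names_to = "Column", values_to = "Value") %>%
  # drop the zero entries
  filter(Value != 0) %>%
  # one output row per original column
  group_by(Column) %>%
  summarize(Value = paste0(Value, collapse = ", "))

print(collapsed)
```

pivot_longer stacks the values row by row rather than column by column, but the order within each Column group is unchanged, so the collapsed strings come out the same.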

Reshaping from wide to long data while collapsing variable values for same IDs in R

Here's one solution, using dplyr and tidyr:

library(dplyr)
library(tidyr)

d <- read.table(
text='PMID;Variable;Value
1;MH;Humans
1;MH;Male
1;MH;Middle Aged
1;RN;Aldosterone
1;RN;Renin
2;MH;Accidents, Traffic
2;MH;Male
2;RN;Antivenins
3;MH;Humans
3;MH;Crotulus
3;MH;Young Adult', 
header=TRUE, sep=';', stringsAsFactors=FALSE)

d %>% 
  group_by(PMID, Variable) %>% 
  summarise(Value=paste(gsub(' ', '_', Value), collapse=', ')) %>% 
  spread(Variable, Value)

## Source: local data frame [3 x 3]
## Groups: PMID [3]
## 
## # A tibble: 3 x 3
##    PMID                            MH                  RN
## * <int>                         <chr>               <chr>
## 1     1     Humans, Male, Middle_Aged  Aldosterone, Renin
## 2     2      Accidents,_Traffic, Male          Antivenins
## 3     3 Humans, Crotulus, Young_Adult                <NA>
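
spread is likewise superseded in current tidyr. Below is a sketch of the same idea with pivot_wider, run on a shortened, made-up subset of the data above. The .groups = "drop" argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Shortened illustrative subset of the data above
d <- data.frame(
  PMID = c(1, 1, 1, 1, 2, 2, 3),
  Variable = c("MH", "MH", "RN", "RN", "MH", "RN", "MH"),
  Value = c("Humans", "Middle Aged", "Aldosterone", "Renin",
            "Accidents, Traffic", "Antivenins", "Young Adult"),
  stringsAsFactors = FALSE
)

wide <- d %>%
  group_by(PMID, Variable) %>%
  # replace spaces inside a term, then join the terms of each group
  summarise(Value = paste(gsub(" ", "_", Value), collapse = ", "),
            .groups = "drop") %>%
  # one column per Variable; missing combinations become NA
  pivot_wider(names_from = Variable, values_from = Value)

print(wide)
```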

Collapse every series of four rows in a data frame into a single vector, overwriting missing values

How about this:

library(dplyr)
library(tidyr)
df <- df %>% mutate(obs = rep(1:(nrow(.)/4), each=4))
df <- df %>% 
  pivot_longer(-obs, names_to="var", values_to="vals") %>% 
  na.omit() %>% 
  group_by(obs) %>% 
  mutate(col = seq_along(obs)) %>% 
  select(obs, col, vals) %>% 
  pivot_wider(names_from="col", names_prefix="V", values_from="vals")
df
# # A tibble: 3 x 7
# # Groups:   obs [3]
#     obs V1    V2        V3                  V4      V5          V6             
#   <int> <chr> <chr>     <chr>               <chr>   <chr>       <chr>          
# 1     1 Buy   Completed 2021-02-11 20:49:19 0.11057 Fee1.00 USD Total199.00 USD
# 2     2 Buy   Completed 2021-02-11 20:48:03 82.146  Fee0.50 USD Total100.00 USD
# 3     3 Buy   Completed 2021-02-11 20:47:22 30.15   Fee0.64 USD Total127.00 USD
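
To see what each step does, here is the same pipeline on a small made-up data frame. The columns a and b and their values are hypothetical and not taken from the question; the only assumption carried over is that every observation spans exactly four rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: two observations, four rows each,
# with the non-missing values scattered across the columns
df <- data.frame(
  a = c("x1", NA, "x3", NA, "y1", NA, "y3", NA),
  b = c(NA, "x2", NA, "x4", NA, "y2", NA, "y4"),
  stringsAsFactors = FALSE
)

out <- df %>%
  # label each block of four rows with an observation id
  mutate(obs = rep(1:(nrow(.) / 4), each = 4)) %>%
  # stack all cells, keeping the observation id
  pivot_longer(-obs, names_to = "var", values_to = "vals") %>%
  # drop the missing cells
  na.omit() %>%
  # number the remaining values within each observation
  group_by(obs) %>%
  mutate(col = seq_along(obs)) %>%
  ungroup() %>%
  select(obs, col, vals) %>%
  # one row per observation, one column per position
  pivot_wider(names_from = "col", names_prefix = "V", values_from = "vals")

print(out)
```

Values are numbered in reading order (left to right, then top to bottom) within each block. If observations have different numbers of non-missing values, the shorter ones are padded with NA on the right.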

How best to use R to reshape dataframe from long to wide and combine values

library(tidyverse)
df %>%
  group_by(ID, Date) %>%
  summarize(Procedure = paste0(Procedure, collapse = ", ")) %>%
  mutate(col = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = c(Date, Procedure))

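As the output below shows, all the Date columns come before all the Procedure columns, so pairing each date with its procedures requires some reordering afterwards, which could be done as in this answer: https://stackoverflow.com/a/60400134/6851825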

# A tibble: 4 x 7
  ID    Date_1 Date_2 Date_3 Procedure_1                Procedure_2        Procedure_3          
  <chr> <chr>  <chr>  <chr>  <chr>                      <chr>              <chr>                
1 A66   2/2/01 NA     NA     Sedation, Excision         NA                 NA                   
2 D55   1/1/01 NA     NA     Sedation, Excision, Biopsy NA                 NA                   
3 G88   5/5/01 6/6/01 7/7/01 Sedation, Biopsy           Sedation, Excision Sedation, Re-excision
4 T44   3/3/01 4/4/01 NA     Sedation, Biopsy           Sedation, Excision NA
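
With a recent tidyr the reordering step may not be needed: pivot_wider has a names_vary argument (added in tidyr 1.2.0, as far as I know) that interleaves the columns directly. Here is a sketch on a small data frame reconstructed from two of the rows above; the exact input is an assumption.

```r
library(dplyr)
library(tidyr)

# Assumed input, reconstructed from the A66 and T44 rows of the output above
df <- data.frame(
  ID = c("A66", "A66", "T44", "T44", "T44", "T44"),
  Date = c("2/2/01", "2/2/01", "3/3/01", "3/3/01", "4/4/01", "4/4/01"),
  Procedure = c("Sedation", "Excision", "Sedation", "Biopsy",
                "Sedation", "Excision"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID, Date) %>%
  # collapse the procedures of each visit; stay grouped by ID
  summarize(Procedure = paste0(Procedure, collapse = ", "),
            .groups = "drop_last") %>%
  # number the visits within each ID
  mutate(col = row_number()) %>%
  ungroup() %>%
  # "slowest" keeps Date_1 next to Procedure_1, Date_2 next to Procedure_2, ...
  pivot_wider(names_from = col, values_from = c(Date, Procedure),
              names_vary = "slowest")

print(wide)
```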

R: melt data to collapse 3 columns into 1 column and double that for each row

With tidyr and dplyr,

library(tidyverse)

df1 %>% 
    # gather colors into long key and value columns
    gather(color, v, white_v:others_v) %>% 
    # drop "_v" endings; use regex if you prefer
    separate(color, 'color', extra = 'drop') %>% 
    # add a vector of 1s to spread
    mutate(n = 1) %>%    # more robust: count(id, count, color, v)
    # spread labels and 1s to wide form
    spread(color, n, fill = 0)

##    id count     v others pink white
## 1   1     1 0.400      0    0     1
## 2   1     1 0.500      0    1     0
## 3   1     1 0.600      1    0     0
## 4   1     2 0.500      0    1     1
## 5   1     2 0.747      1    0     0
## 6   1     3 0.570      0    1     0
## 7   1     3 0.870      1    0     1
## 8   2     1 1.200      1    0     0
## 9   2     1 1.500      0    0     1
## 10  2     1 2.500      0    1     0
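
The count variant mentioned in the comment could look like the sketch below, written with pivot_longer and pivot_wider instead of gather and spread. The input df1 is an assumption, reconstructed from the first rows of the output above.

```r
library(dplyr)
library(tidyr)

# Assumed input, reconstructed from the id 1, count 1 and count 2 rows above
df1 <- data.frame(
  id = c(1, 1),
  count = c(1, 2),
  white_v = c(0.4, 0.5),
  pink_v = c(0.5, 0.5),
  others_v = c(0.6, 0.747)
)

res <- df1 %>%
  # stack the three color columns into color/v pairs
  pivot_longer(white_v:others_v, names_to = "color", values_to = "v") %>%
  # drop the "_v" ending
  mutate(color = sub("_v$", "", color)) %>%
  # count occurrences instead of adding a constant column of 1s
  count(id, count, color, v) %>%
  # one indicator column per color, 0 where the color does not have that value
  pivot_wider(names_from = color, values_from = n, values_fill = 0)

print(res)
```

Unlike spread, pivot_wider does not sort the rows, so the row order may differ from the output shown above.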

How to reshape data from long to wide format

Using the base R reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
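
A small runnable sketch of that call follows. The contents of dat1 are hypothetical; it only assumes the name and numbers columns used in the call plus a single measurement column, here called value.

```r
# Hypothetical long-format data: one row per (name, numbers) pair
dat1 <- data.frame(
  name = rep(c("A", "B"), each = 3),
  numbers = rep(1:3, times = 2),
  value = c(0.34, -0.70, 1.46, -0.85, -1.19, 0.11)
)

# idvar identifies the rows of the wide result;
# each level of timevar becomes its own column (value.1, value.2, value.3)
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

print(wide)
```

The new column names are built from the measurement column name and the timevar value, joined by a dot.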