Reshape R Data with User Entries in Rows, Collapsing for Each User

How to reshape this data into a useable format?

Here is my attempt to figure out what you need. Modify it to your needs if something is not precise. I used three packages, but don't worry: in R they very often come together and are good to know for the future anyway. I could have written the code in base R, but it would have been much longer.

input.csv

,1971,1971,1971,1972,1972,1972
,var1,var2,var3,var1,var2,var3
person1,37,2,1,65,5,3
person2,65,2,1,123,3,1
person3,23,3,1,13,6,2

Code to modify representation

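library(reshape2)
library(tidyr)
library(dplyr)

# Row 1 becomes the header (X, X1971, X1971.1, ...); [-1,] drops the var1/var2/var3 row
input = read.table("input.csv", sep=",", na.strings="", header=TRUE)[-1,]
converted_input = input %>%
  # long form: one row per person and original column
  tidyr::gather(year, value, -X) %>%
  dplyr::mutate(
    # "X1971" -> var1, "X1971.1" -> var2, "X1971.2" -> var3
    var=paste0("var", as.numeric(gsub("^X.*", "0", gsub(".*\\.([0-9])$", "\\1", year)))+1),
    # "X1971.2" -> "1971"
    year=gsub("X([^.]+).*", "\\1", year)) %>%
  # back to wide: one row per person and year
  reshape2::dcast(X + year ~ var, value.var="value") %>%
  dplyr::rename(person=X)

print(converted_input)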

Final result

 person year var1 var2 var3
person1 1971 37 2 1
person1 1972 65 5 3
person2 1971 65 2 1
person2 1972 123 3 1
person3 1971 23 3 1
person3 1972 13 6 2
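An alternative worth considering: if the two header rows are read separately instead of relying on the auto-generated column names (X1971, X1971.1, ...), tidyr's pivot_longer() can split combined names like "1971_var1" directly. This is a sketch, not the approach above; it assumes tidyr 1.0.0 or later and inlines the sample input.csv as a string (txt) so it runs without the file.

```r
library(tidyr)

# Inline copy of the sample input.csv (assumption: same content as above)
txt <- ",1971,1971,1971,1972,1972,1972
,var1,var2,var3,var1,var2,var3
person1,37,2,1,65,5,3
person2,65,2,1,123,3,1
person3,23,3,1,13,6,2"

# Read the two header rows and the data rows separately
hdr <- read.csv(text = txt, header = FALSE, nrows = 2, colClasses = "character")
dat <- read.csv(text = txt, header = FALSE, skip = 2)

# Combine year and variable into names like "1971_var1"
names(dat) <- c("person", paste(unlist(hdr[1, -1]), unlist(hdr[2, -1]), sep = "_"))

# Split each name at "_": the first part goes to `year`,
# the ".value" part (var1, var2, var3) stays as separate columns
long <- pivot_longer(dat, -person,
                     names_to = c("year", ".value"),
                     names_sep = "_")
print(long)
```

A side benefit of reading the data rows on their own is that var1 to var3 come out numeric rather than character.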

reshape dataframe from columns to rows and collapse cell values

I would use dplyr and tidyr rather than reshape.

library(dplyr)
library(tidyr)

Data <- data.frame(a=c(100,0,78),b=c(0,137,117),c=c(111,17,91))

Data %>%
gather(Column, Value) %>%
filter(Value != 0) %>%
group_by(Column) %>%
summarize(Value=paste0(Value,collapse=', '))

The gather function is similar to melt in reshape. The filter step drops the zeros, and the group_by function tells later functions that you want to separate the rows based on the values in Column. Finally, summarize calculates whatever summary we want for each of the groups; in this case, it pastes all the remaining values together.

Which should give you:

# A tibble: 3 × 2
Column Value
<chr> <chr>
1 a 100, 78
2 b 137, 117
3 c 111, 17, 91
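In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same pipeline with pivot_longer(), assuming tidyr 1.0.0 or later, should give the same three collapsed strings:

```r
library(dplyr)
library(tidyr)

Data <- data.frame(a = c(100, 0, 78), b = c(0, 137, 117), c = c(111, 17, 91))

res <- Data %>%
  # every column becomes a (Column, Value) pair
  pivot_longer(everything(), names_to = "Column", values_to = "Value") %>%
  # drop the zeros before collapsing
  filter(Value != 0) %>%
  group_by(Column) %>%
  # one comma-separated string per original column
  summarize(Value = paste(Value, collapse = ", "))
print(res)
```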

Reshaping from wide to long data while collapsing variable values for same IDs in R

Here's one solution, using dplyr and tidyr:

library(dplyr)
library(tidyr)

d <- read.table(
text='PMID;Variable;Value
1;MH;Humans
1;MH;Male
1;MH;Middle Aged
1;RN;Aldosterone
1;RN;Renin
2;MH;Accidents, Traffic
2;MH;Male
2;RN;Antivenins
3;MH;Humans
3;MH;Crotulus
3;MH;Young Adult',
header=TRUE, sep=';', stringsAsFactors=FALSE)

d %>%
group_by(PMID, Variable) %>%
summarise(Value=paste(gsub(' ', '_', Value), collapse=', ')) %>%
spread(Variable, Value)

## Source: local data frame [3 x 3]
## Groups: PMID [3]
##
## # A tibble: 3 x 3
## PMID MH RN
## * <int> <chr> <chr>
## 1 1 Humans, Male, Middle_Aged Aldosterone, Renin
## 2 2 Accidents,_Traffic, Male Antivenins
## 3 3 Humans, Crotulus, Young_Adult <NA>
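pivot_wider(), the successor to spread(), can also do the collapsing itself through its values_fn argument, which is applied to the values that land in each cell. A sketch assuming tidyr 1.1.0 or later (where values_fn accepts a single function), rebuilding d with data.frame() instead of read.table():

```r
library(tidyr)

d <- data.frame(
  PMID = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  Variable = c("MH", "MH", "MH", "RN", "RN", "MH", "MH", "RN", "MH", "MH", "MH"),
  Value = c("Humans", "Male", "Middle Aged", "Aldosterone", "Renin",
            "Accidents, Traffic", "Male", "Antivenins",
            "Humans", "Crotulus", "Young Adult"),
  stringsAsFactors = FALSE
)

wide <- pivot_wider(
  d,
  id_cols = PMID,
  names_from = Variable,
  values_from = Value,
  # collapse all values that fall into the same PMID/Variable cell
  values_fn = function(x) paste(gsub(" ", "_", x), collapse = ", ")
)
print(wide)
```

Cells with no values (PMID 3 has no RN entries) are filled with NA, as in the spread() version.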

Collapse every series of four rows in a data frame into a single vector, overwriting missing values

How about this:

library(dplyr)
library(tidyr)
df <- df %>% mutate(obs = rep(1:(nrow(.)/4), each=4))
df <- df %>%
pivot_longer(-obs, names_to="var", values_to="vals") %>%
na.omit() %>%
group_by(obs) %>%
mutate(col = seq_along(obs)) %>%
select(obs, col, vals) %>%
pivot_wider(names_from="col", names_prefix="V", values_from="vals")
df
# # A tibble: 3 x 7
# # Groups: obs [3]
# obs V1 V2 V3 V4 V5 V6
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 Buy Completed 2021-02-11 20:49:19 0.11057 Fee1.00 USD Total199.00 USD
# 2 2 Buy Completed 2021-02-11 20:48:03 82.146 Fee0.50 USD Total100.00 USD
# 3 3 Buy Completed 2021-02-11 20:47:22 30.15 Fee0.64 USD Total127.00 USD
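For comparison, a base R sketch of the same idea: split the rows into blocks of four, read each block row by row, and keep the non-missing values. The df below is a hypothetical two-column stand-in for the question's data (which is not shown here), and the sketch assumes every block yields the same number of non-missing values, since rbind() would otherwise recycle the shorter rows with a warning.

```r
# Hypothetical stand-in: 8 rows = two blocks of four, with NAs scattered
df <- data.frame(
  x = c("Buy", NA, "0.11", NA, "Buy", NA, "82.1", NA),
  y = c(NA, "Completed", NA, "Fee1.00", NA, "Completed", NA, "Fee0.50"),
  stringsAsFactors = FALSE
)

# Block id: 1,1,1,1,2,2,2,2 (nrow(df) must be a multiple of 4)
obs <- rep(seq_len(nrow(df) / 4), each = 4)

rows <- lapply(split(df, obs), function(block) {
  # t() + as.vector() walks the block row by row, like pivot_longer()
  vals <- as.vector(t(as.matrix(block)))
  vals[!is.na(vals)]
})

# One output row per block; columns are named V1, V2, ...
res <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
print(res)
```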

How best to use R to reshape dataframe from long to wide and combine values

library(tidyverse)
df %>%
group_by(ID, Date) %>%
summarize(Procedure = paste0(Procedure, collapse = ", ")) %>%
mutate(col = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = col, values_from = c(Date, Procedure))

The columns come out grouped by variable (all Date_ columns first, then all Procedure_ columns), so this currently requires some reordering afterwards, which could be done as in this answer: https://stackoverflow.com/a/60400134/6851825

# A tibble: 4 x 7
ID Date_1 Date_2 Date_3 Procedure_1 Procedure_2 Procedure_3
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A66 2/2/01 NA NA Sedation, Excision NA NA
2 D55 1/1/01 NA NA Sedation, Excision, Biopsy NA NA
3 G88 5/5/01 6/6/01 7/7/01 Sedation, Biopsy Sedation, Excision Sedation, Re-excision
4 T44 3/3/01 4/4/01 NA Sedation, Biopsy Sedation, Excision NA
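If your tidyr is 1.2.0 or newer, pivot_wider()'s names_vary argument may make the reordering unnecessary: names_vary = "slowest" interleaves the columns by index (Date_1, Procedure_1, Date_2, ...). A sketch on a small hypothetical df, since the question's data is not reproduced here:

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per ID, Date and Procedure
df <- data.frame(
  ID = c("A66", "A66", "G88", "G88", "G88", "G88"),
  Date = c("2/2/01", "2/2/01", "5/5/01", "5/5/01", "6/6/01", "6/6/01"),
  Procedure = c("Sedation", "Excision", "Sedation", "Biopsy", "Sedation", "Excision"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(ID, Date) %>%
  summarize(Procedure = paste0(Procedure, collapse = ", "), .groups = "drop_last") %>%
  # index of each visit within an ID
  mutate(col = row_number()) %>%
  ungroup() %>%
  # "slowest" varies the index slowest: Date_1, Procedure_1, Date_2, ...
  pivot_wider(names_from = col, values_from = c(Date, Procedure),
              names_vary = "slowest")
print(res)
```

Note that Date is still character here, so visits are numbered in alphabetical rather than chronological order; converting it to a Date first would avoid that.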

R: melt data to collapse 3 columns into 1 column and double that for each row

With tidyr and dplyr,

library(tidyverse)

# gather colors into long key and value columns
df1 %>% gather(color, v, white_v:others_v) %>%
# drop "_v" endings; use regex if you prefer
separate(color, 'color', extra = 'drop') %>%
# add a vector of 1s to spread
mutate(n = 1) %>% # more robust: count(id, count, color, v)
# spread labels and 1s to wide form
spread(color, n, fill = 0)

## id count v others pink white
## 1 1 1 0.400 0 0 1
## 2 1 1 0.500 0 1 0
## 3 1 1 0.600 1 0 0
## 4 1 2 0.500 0 1 1
## 5 1 2 0.747 1 0 0
## 6 1 3 0.570 0 1 0
## 7 1 3 0.870 1 0 1
## 8 2 1 1.200 1 0 0
## 9 2 1 1.500 0 0 1
## 10 2 1 2.500 0 1 0
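The same pipeline can be written with the newer pivot_longer()/pivot_wider() verbs, using the count() variant mentioned in the comment so that duplicated rows are tallied instead of making the widening step fail on duplicate identifiers. A sketch assuming tidyr 1.1.0 or later; df1 here is a small hypothetical stand-in, since the question's data is not shown:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df1: one value column per color
df1 <- data.frame(
  id = c(1, 1),
  count = c(1, 2),
  white_v = c(0.4, 0.5),
  pink_v = c(0.5, 0.5),
  others_v = c(0.6, 0.747)
)

res <- df1 %>%
  # long form: one row per id/count/color with its value v
  pivot_longer(white_v:others_v, names_to = "color", values_to = "v") %>%
  # drop the "_v" suffix
  mutate(color = sub("_v$", "", color)) %>%
  # n = number of rows for each id/count/color/v combination
  count(id, count, color, v) %>%
  # one indicator column per color, 0 where the color is absent
  pivot_wider(names_from = color, values_from = n, values_fill = 0)
print(res)
```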

How to reshape data from long to wide format

Using the base R reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
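reshape() comes with base R (package stats), so nothing needs to be installed. A minimal illustration with a hypothetical dat1 in long format, one row per name and number:

```r
# Hypothetical long data: one row per name/numbers pair
dat1 <- data.frame(
  name = c("a", "a", "b", "b"),
  numbers = c(1, 2, 1, 2),
  value = c(10, 20, 30, 40)
)

# idvar identifies the rows of the result; timevar supplies the column suffixes
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
print(wide)
```

Each remaining column (here value) is spread into one column per value of numbers, named value.1 and value.2 by default; the sep argument controls the dot.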

