How to reshape this data into a useable format?
Here is my attempt at working out what you need; adjust it if anything is not exactly right. I used three libraries, but don't worry: in R they are very often used together and are worth knowing anyway. I could have written this in base R, but the code would have been much longer.
input.csv
,1971,1971,1971,1972,1972,1972
,var1,var2,var3,var1,var2,var3
person1,37,2,1,65,5,3
person2,65,2,1,123,3,1
person3,23,3,1,13,6,2
Code to modify representation
library(reshape2)
library(tidyr)
library(dplyr)
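# Read the CSV; the duplicated year headers become X1971, X1971.1, X1971.2, ...
# and [-1,] drops the var1/var2/var3 row
input = read.table("input.csv", sep=",", na.strings="", header=TRUE)[-1,]
converted_input = input %>%
  tidyr::gather(year, value, -X) %>%
  dplyr::mutate(
    # the numeric suffix (none, .1, .2) identifies var1, var2, var3
    var=paste0("var", as.numeric(gsub("^X.*", "0", gsub(".*\\.([0-9])$", "\\1", year)))+1),
    # strip the leading X and any suffix to recover the year
    year=gsub("X([^.]+).*", "\\1", year)) %>%
  reshape2::dcast(X + year ~ var, value.var="value") %>%
  dplyr::rename(person=X)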
print(converted_input)
Final result
person year var1 var2 var3
person1 1971 37 2 1
person1 1972 65 5 3
person2 1971 65 2 1
person2 1972 123 3 1
person3 1971 23 3 1
person3 1972 13 6 2
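As an alternative, here is a minimal sketch that uses only tidyr. It assumes tidyr 1.0.0 or later (for pivot_longer and its ".value" sentinel), and the inline csv_text string is a stand-in for input.csv; the names csv_text, header, body and long are illustrative and not part of the original answer. The two header rows are read separately and pasted into column names such as 1971_var1, which pivot_longer then splits back into a year column and one column per variable.

```r
library(tidyr)

# Inline copy of input.csv (assumption: same layout as the file shown above)
csv_text <- ",1971,1971,1971,1972,1972,1972
,var1,var2,var3,var1,var2,var3
person1,37,2,1,65,5,3
person2,65,2,1,123,3,1
person3,23,3,1,13,6,2"

# Read the two header rows and the data rows separately
header <- read.csv(text = csv_text, header = FALSE, nrows = 2,
                   colClasses = "character")
body <- read.csv(text = csv_text, header = FALSE, skip = 2)

# Build combined names such as "1971_var1" from the two header rows
years <- unlist(header[1, -1])
vars <- unlist(header[2, -1])
names(body) <- c("person", paste(years, vars, sep = "_"))

# ".value" keeps var1/var2/var3 as columns; the other name part becomes year
long <- pivot_longer(body, -person,
                     names_to = c("year", ".value"),
                     names_sep = "_")
print(long)
```

Note that year comes out as a character column here; wrap it in as.integer() if you need numbers.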
reshape dataframe from columns to rows and collapse cell values
I would use dplyr rather than reshape.
library(dplyr)
library(tidyr)
Data <- data.frame(a=c(100,0,78),b=c(0,137,117),c=c(111,17,91))
Data %>%
  gather(Column, Value) %>%
  filter(Value != 0) %>%
  group_by(Column) %>%
  summarize(Value=paste0(Value,collapse=', '))
The gather function is similar to melt in reshape. The group_by function tells later functions that you want to separate the data based on the values in Column. Finally, summarize calculates whatever summary we want for each of the groups; in this case, it pastes all the terms together.
Which should give you:
# A tibble: 3 × 2
Column Value
<chr> <chr>
1 a 100, 78
2 b 137, 117
3 c 111, 17, 91
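In current tidyr, gather is superseded by pivot_longer. The following is a sketch of the same pipeline with that function, assuming tidyr 1.0.0 or later; the name collapsed is only illustrative.

```r
library(dplyr)
library(tidyr)

Data <- data.frame(a = c(100, 0, 78), b = c(0, 137, 117), c = c(111, 17, 91))

collapsed <- Data %>%
  # stack all columns into Column/Value pairs
  pivot_longer(everything(), names_to = "Column", values_to = "Value") %>%
  # drop the zeros before collapsing
  filter(Value != 0) %>%
  group_by(Column) %>%
  # one comma-separated string per original column
  summarize(Value = paste0(Value, collapse = ", "))

print(collapsed)
```

pivot_longer stacks the data row by row rather than column by column, but the order of values within each group is unchanged, so the collapsed strings should be the same.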
Reshaping from wide to long data while collapsing variable values for same IDs in R
Here's one solution, using dplyr and tidyr:
library(dplyr)
library(tidyr)
d <- read.table(
text='PMID;Variable;Value
1;MH;Humans
1;MH;Male
1;MH;Middle Aged
1;RN;Aldosterone
1;RN;Renin
2;MH;Accidents, Traffic
2;MH;Male
2;RN;Antivenins
3;MH;Humans
3;MH;Crotulus
3;MH;Young Adult',
header=TRUE, sep=';', stringsAsFactors=FALSE)
d %>%
  group_by(PMID, Variable) %>%
  summarise(Value=paste(gsub(' ', '_', Value), collapse=', ')) %>%
  spread(Variable, Value)
## Source: local data frame [3 x 3]
## Groups: PMID [3]
##
## # A tibble: 3 x 3
## PMID MH RN
## * <int> <chr> <chr>
## 1 1 Humans, Male, Middle_Aged Aldosterone, Renin
## 2 2 Accidents,_Traffic, Male Antivenins
## 3 3 Humans, Crotulus, Young_Adult <NA>
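The same idea can be sketched with pivot_wider, which supersedes spread in tidyr 1.0.0 and later. The small data frame below is a cut-down, hypothetical version of the data above so that the snippet runs on its own, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Cut-down illustrative data (assumption: same columns as d above)
d_small <- data.frame(
  PMID = c(1, 1, 1, 2),
  Variable = c("MH", "MH", "RN", "MH"),
  Value = c("Humans", "Middle Aged", "Renin", "Male"),
  stringsAsFactors = FALSE
)

wide <- d_small %>%
  group_by(PMID, Variable) %>%
  # collapse all values for one PMID/Variable pair into a single string
  summarise(Value = paste(gsub(" ", "_", Value), collapse = ", "),
            .groups = "drop") %>%
  # one column per Variable; missing combinations become NA
  pivot_wider(names_from = Variable, values_from = Value)

print(wide)
```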
Collapse every series of four rows in a data frame into a single vector, overwriting missing values
How about this:
library(dplyr)
library(tidyr)
df <- df %>% mutate(obs = rep(1:(nrow(.)/4), each=4))
df <- df %>%
  pivot_longer(-obs, names_to="var", values_to="vals") %>%
  na.omit() %>%
  group_by(obs) %>%
  mutate(col = seq_along(obs)) %>%
  select(obs, col, vals) %>%
  pivot_wider(names_from="col", names_prefix="V", values_from="vals")
df
# # A tibble: 3 x 7
# # Groups: obs [3]
# obs V1 V2 V3 V4 V5 V6
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 Buy Completed 2021-02-11 20:49:19 0.11057 Fee1.00 USD Total199.00 USD
# 2 2 Buy Completed 2021-02-11 20:48:03 82.146 Fee0.50 USD Total100.00 USD
# 3 3 Buy Completed 2021-02-11 20:47:22 30.15 Fee0.64 USD Total127.00 USD
How best to use R to reshape dataframe from long to wide and combine values
library(tidyverse)
df %>%
  group_by(ID, Date) %>%
  summarize(Procedure = paste0(Procedure, collapse = ", ")) %>%
  mutate(col = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = col, values_from = c(Date, Procedure))
This currently requires some reordering afterwards, which could be done as in this answer: https://stackoverflow.com/a/60400134/6851825
# A tibble: 4 x 7
ID Date_1 Date_2 Date_3 Procedure_1 Procedure_2 Procedure_3
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A66 2/2/01 NA NA Sedation, Excision NA NA
2 D55 1/1/01 NA NA Sedation, Excision, Biopsy NA NA
3 G88 5/5/01 6/6/01 7/7/01 Sedation, Biopsy Sedation, Excision Sedation, Re-excision
4 T44 3/3/01 4/4/01 NA Sedation, Biopsy Sedation, Excision NA
R: melt data to collapse 3 columns into 1 column and double that for each row
With tidyr and dplyr,
library(tidyverse)
# gather colors into long key and value columns
df1 %>% gather(color, v, white_v:others_v) %>%
  # drop "_v" endings; use regex if you prefer
  separate(color, 'color', extra = 'drop') %>%
  # add a vector of 1s to spread
  mutate(n = 1) %>% # more robust: count(id, count, color, v)
  # spread labels and 1s to wide form
  spread(color, n, fill = 0)
## id count v others pink white
## 1 1 1 0.400 0 0 1
## 2 1 1 0.500 0 1 0
## 3 1 1 0.600 1 0 0
## 4 1 2 0.500 0 1 1
## 5 1 2 0.747 1 0 0
## 6 1 3 0.570 0 1 0
## 7 1 3 0.870 1 0 1
## 8 2 1 1.200 1 0 0
## 9 2 1 1.500 0 0 1
## 10 2 1 2.500 0 1 0
How to reshape data from long to wide format
Using the reshape function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
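For a self-contained illustration, here is a small sketch with made-up data. The contents of dat1 below are hypothetical (the original data is not shown here); only the column names name and numbers are taken from the call above, and value is an assumed measurement column.

```r
# Hypothetical long-format data: one row per name/numbers pair
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, times = 2),
  value = c(0.34, -0.70, -0.06, -0.21, -0.18, -0.36, -0.66, -0.51)
)

# idvar identifies a row in the wide result; timevar supplies the column suffix
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
print(wide)
```

The result has one row per name and one column per level of numbers, named value.1 to value.4.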