Reshape Data Long to Wide - Understanding Reshape Parameters

You can use the function dcast from the reshape2 package, which is easier to understand. The variables on the left side of the formula stay long (one row per combination), while those on the right side go wide (one column per combination).

fun.aggregate is the function applied when there is more than one value per cell. If you're sure you don't have repeated cases, you can use mean or sum, which simply return the single value unchanged.

library(reshape2)

dcast(data, formula = dogid + home + school ~ month + year + trainingtype,
      value.var = "timeincomp",
      fun.aggregate = sum)

This should give:

  dogid home school 1_2014_1 2_2014_1 12_2015_2
1 12345 1 1 340 360 0
2 31323 7 3 500 520 440
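
To see what fun.aggregate does when a case is repeated, here is a minimal sketch with made-up data (the data frame `long` and its values are assumptions for illustration, not the data from the question). Dog 1 has two rows for month 1, so sum collapses them into one cell. Dog 2 has no row for month 2, so that cell gets sum applied to nothing, which is 0.

```r
library(reshape2)

# Hypothetical long data: dog 1 appears twice in month 1
long <- data.frame(
  dogid      = c(1, 1, 1, 2),
  month      = c(1, 1, 2, 1),
  timeincomp = c(100, 40, 60, 500)
)

# dogid stays long (left side), month goes wide (right side)
wide <- dcast(long, dogid ~ month,
              value.var = "timeincomp",
              fun.aggregate = sum)
print(wide)
```

If you leave out fun.aggregate when there are duplicates, dcast falls back to counting rows with length, which is rarely what you want.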

How to reshape data from long to wide format

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
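
As a small self-contained sketch of that call (the contents of `dat1` here are made up for illustration): idvar names the column that identifies a row in the wide result, and timevar names the column whose values become the suffixes of the new columns.

```r
# Hypothetical long data: two measurements per name
dat1 <- data.frame(
  name    = c("a", "a", "b", "b"),
  numbers = c(1, 2, 1, 2),
  value   = c(10, 20, 30, 40)
)

# One row per name, one value.<numbers> column per level of numbers
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
print(wide)
```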

Reshape data table from wide to long with transpose

A better approach is to use the newer pivot_longer and pivot_wider functions from the tidyr package.

They follow an easier convention and have convenient text manipulation options built in. In this case, names_prefix removes the "X." that read.table added to the column names.

df <- read.table(header=TRUE, text="Mill   Acid `1_day`  `3_days` `1_week` `2_weeks` `4_weeks` `2_months` `3_months` `6-7_months`
Gävle 0 10.5 12.0 10.9 10.7 10.6 10.1 10 9.81
Gävle 0.5 8.79 10 9.29 9.08 9.39 9.13 9.14 8.86
Gävle 0.75 8.05 8.95 8.33 8.26 8.24 8.22 8.25 7.44
Gävle 1 6.7 7.82 7.77 8.02 8.19 7.79 7.97 6.99
Gävle 1.25 6.52 7.43 7.33 7.11 7.72 7.88 7.91 6.96
Gävle 1.5 6.41 7.25 7.28 6.92 7.63 7.01 7.64 6.7
Obbola 0 10.5 12.0 10.9 10.7 10.6 10.1 10 9.81
Obbola 0.5 8.79 10 9.29 9.08 9.39 9.13 9.14 8.86
Obbola 0.75 8.05 8.95 8.33 8.26 8.24 8.22 8.25 7.44
Obbola 1 6.7 7.82 7.77 8.02 8.19 7.79 7.97 6.99
Obbola 1.25 6.52 7.43 7.33 7.11 7.72 7.88 7.91 6.96
Obbola 1.5 6.41 7.25 7.28 6.92 7.63 7.01 7.64 6.7 ")

library(tidyr)

longdf <- df %>% pivot_longer(-c("Mill", "Acid"), names_to = "Time", values_to = "value", names_prefix = "X.")

answer <- longdf %>% pivot_wider(id_cols = c("Time", "Acid"), names_from = "Mill")
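
The same round trip on a tiny made-up data frame may make the arguments easier to follow (the names `wide`, `long`, `back` and the columns are assumptions for illustration). Note that names_prefix is treated as a regular expression, so this sketch escapes the dot to match a literal "X." only.

```r
library(tidyr)

# Hypothetical wide data with an "X." prefix on the measurement columns
wide <- data.frame(id = c(1, 2), X.a = c(10, 20), X.b = c(30, 40))

# Wide to long: the old column names go to "key", minus the "X." prefix
long <- pivot_longer(wide, -id,
                     names_to = "key", values_to = "value",
                     names_prefix = "X\\.")

# Long back to wide: one column per distinct key
back <- pivot_wider(long, id_cols = id,
                    names_from = key, values_from = value)
print(back)
```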

Reshape long to wide adding additional columns

library(data.table)
setDT(df)

melt(df, 1)[, i := paste(variable, 1:.N, sep = "_"),
keyby = .(ID, variable)][, dcast(.SD, ID ~ i),
.SDcols = c("ID", "value", "i")]

> ID X_1 X_2 Y_1 Y_2
1: 1 A B A A
2: 2 C <NA> A <NA>
3: 3 A A K A

Here is what happens:

  • First you melt the data, so all X and Y values end up in one column.
  • Then you create a new variable, i, that tells you whether a value is the first or second X or Y within each group of ID and variable (so the labels are meaningful).
  • Then you cast that table to wide, keeping ID as a column and using i as the column headers. The column variable is dropped, since it is already encoded in i.
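
The three steps can also be written out one at a time, which is easier to inspect than the chained version. This is a sketch on made-up data (the table `df` below is an assumption, not the data from the question), and it uses by instead of keyby and a plain dcast call at the end.

```r
library(data.table)

# Hypothetical data: ID 1 has two rows, ID 2 has one
df <- data.table(ID = c(1, 1, 2),
                 X  = c("A", "B", "C"),
                 Y  = c("A", "A", "A"))

# Step 1: melt, so X and Y values share one "value" column
long <- melt(df, id.vars = "ID")

# Step 2: number the values within each ID / variable group
long[, i := paste(variable, seq_len(.N), sep = "_"), by = .(ID, variable)]

# Step 3: cast to wide, using the new labels as column headers
wide <- dcast(long, ID ~ i, value.var = "value")
print(wide)
```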

Turn wide to long dataframe with R reshape function, with three levels of duplicates

Using reshape2...

library(reshape2)
dw2 <- melt(dw, id.vars="sbj", value.name="res") #melt to long format

#create new variables by splitting column at dots
dw2[, c("AB", "f", "var")] <- t(as.data.frame((strsplit(as.character(dw2$variable),"\\."))))

#reorder variables
dw2 <- dw2[,c("sbj", "AB", "f", "var", "res")]

dw2
sbj AB f var res
1 A A f1 avg 10
2 B A f1 avg 12
3 C A f1 avg 20
4 D A f1 avg 22
5 A A f1 sd 6
6 B A f1 sd 5
7 C A f1 sd 7
8 D A f1 sd 8
9 A A f2 avg 50
10 B A f2 avg 70
11 C A f2 avg 20
12 D A f2 avg 22
13 A A f2 sd 10
14 B A f2 sd 11
15 C A f2 sd 8
16 D A f2 sd 9
17 A B f1 avg 10
18 B B f1 avg 12
19 C B f1 avg 20
20 D B f1 avg 22
21 A B f1 sd 6
22 B B f1 sd 5
23 C B f1 sd 7
24 D B f1 sd 8
25 A B f2 avg 50
26 B B f2 avg 70
27 C B f2 avg 20
28 D B f2 avg 22
29 A B f2 sd 10
30 B B f2 sd 11
31 C B f2 sd 8
32 D B f2 sd 9
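
The splitting step is the part that usually needs explaining. A minimal base R sketch on a few made-up variable names follows; it uses do.call(rbind, ...) as a swapped-in alternative to t(as.data.frame(...)), and fixed = TRUE so the dot is not treated as a regex wildcard.

```r
# Hypothetical melted variable names of the form <AB>.<f>.<var>
variable <- c("A.f1.avg", "A.f1.sd", "B.f2.avg")

# Split each name at the dots and stack the pieces as rows of a matrix
parts <- do.call(rbind, strsplit(variable, ".", fixed = TRUE))
colnames(parts) <- c("AB", "f", "var")
print(parts)
```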

How to Reshape DF with categorical variables from long to wide in R?

First, use pivot_longer to move the values that will become column names into a single column. Then arrange by those future column names so the words are grouped as in your image, and call pivot_wider. That drops the animal column, so join it back, then arrange by id so the observations are in the same order as in your image.

library(dplyr)
library(tidyr)

pivot_longer(df, cols = color:country, names_to = "variable",
             values_to = "value") %>%                      # column names to rows
  arrange(variable, value) %>%                             # organize future column names
  pivot_wider(id_cols = !variable, names_from = value, values_from = animal,
              values_fn = list(animal = length), values_fill = 0) %>%
  left_join(distinct(df[, c(1, 5)])) %>%                   # add animals back
  select(id, animal, everything()) %>%                     # rearrange columns
  arrange(id)                                              # reorder observations

Update based on your comment - ordered by color, fruit, then country

Added mutate and modified the first arrange and pivot_wider:

pivot_longer(df, cols = color:country, names_to = "variable",
             values_to = "value") %>%                        # future col names to rows
  mutate(ordering = ifelse(variable == "color", 1,           # create organizer variable
                    ifelse(variable == "fruit", 2, 3))) %>%
  arrange(ordering, value) %>%                               # organize future column order
  pivot_wider(id_cols = !c(variable, ordering),              # make it wide
              names_from = value,
              values_from = animal,
              values_fn = list(animal = length),
              values_fill = 0) %>%
  left_join(distinct(df[, c(1, 5)])) %>%                     # add the animals back
  select(id, animal, everything()) %>%                       # move animals to 2nd position
  arrange(id)                                                # reorder observations
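
To see how values_fn = length and values_fill = 0 turn categories into 0/1 indicator columns, here is a stripped-down sketch on made-up data (the data frame and its values are assumptions, and it assumes a reasonably recent tidyr where values_fn accepts a plain function). Each cell counts how many rows had that value for that id, and missing combinations are filled with 0.

```r
library(tidyr)

# Hypothetical data: one row per id, two categorical columns
df <- data.frame(id     = c(1, 2),
                 animal = c("dog", "cat"),
                 color  = c("red", "blue"),
                 fruit  = c("apple", "apple"))

# Category values to rows
long <- pivot_longer(df, cols = color:fruit,
                     names_to = "variable", values_to = "value")

# One column per category value; count occurrences, fill gaps with 0
wide <- pivot_wider(long, id_cols = id,
                    names_from = value, values_from = animal,
                    values_fn = length, values_fill = 0)
print(wide)
```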

reshape from wide to long group of variables

If you've only got two locations, you can just chuck them in regex, accounting for the fact that they could be at the beginning or end of the name:

library(tidyverse)

df_wide %>%
gather(variable, value, -Month) %>%
mutate(location = sub('.*(Cabo|Acapulco).*', '\\1', variable),
variable = sub('_?(Cabo|Acapulco)_?', '', variable)) %>%
spread(variable, value)
#> # A tibble: 24 x 6
#> Month location BED_BUGS BU_PCT LOS_AVG TOTAL_OCCUPIED
#> * <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Acapulco 3 0.6260116 4.307667 6498
#> 2 1 Cabo 5 0.6470034 5.223000 19216
#> 3 2 Acapulco 0 0.6777457 4.247500 6566
#> 4 2 Cabo 3 0.6167027 5.893571 17095
#> 5 3 Acapulco 1 0.6348126 4.327742 6809
#> 6 3 Cabo 5 0.6372108 5.229677 19556
#> 7 4 Acapulco 6 0.6548170 4.220000 6797
#> 8 4 Cabo 4 0.6357912 5.356667 18883
#> 9 5 Acapulco 5 0.6409659 4.162903 6875
#> 10 5 Cabo 2 0.6449006 5.344194 19792
#> # ... with 14 more rows
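
The two sub() calls do the real work, so it may help to try them on their own. The column names below are made up for illustration (the original wide names are not shown above); the first pattern keeps only the location, the second strips the location and one adjoining underscore.

```r
# Hypothetical wide column names with the location at either end
variable <- c("Cabo_BU_PCT", "LOS_AVG_Acapulco", "BED_BUGS_Cabo")

# Keep only the captured location name
location <- sub(".*(Cabo|Acapulco).*", "\\1", variable)

# Remove the location plus an optional underscore on either side
measure <- sub("_?(Cabo|Acapulco)_?", "", variable)

print(data.frame(variable, location, measure))
```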

reshape from long to wide of non-categorical values

You could do:

a = aggregate(V1~V2,transform(df_long,V2 = cumsum(grepl("id",V1))),paste,collapse=',')[,2]
read.csv(text=a,header = FALSE,fill = TRUE)
V1 V2 V3 V4 V5
1 id A b b d d
2 id B kh kk ip
3 id C 99
4 id D
5 id E
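
The key trick is cumsum(grepl("id", V1)): every row that contains "id" starts a new group, and the running sum gives all following rows the same group number. A small sketch on a made-up vector (not the data from the question) shows the grouping before anything is pasted together:

```r
# Hypothetical long column: each "id ..." row starts a new record
V1 <- c("id A", "b", "b", "id B", "kh", "id C")

# TRUE at each record start; the running sum labels the records 1, 2, 3
grp <- cumsum(grepl("id", V1))

# Collapse each record into one comma-separated string
rows <- tapply(V1, grp, paste, collapse = ",")
print(rows)
```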

Since you need to transform it back, you should instead do:

f <- read.csv(text = with(df_long, tapply(V1, cumsum(grepl("id", V1)), paste0, collapse = ",")),
              header = FALSE, fill = TRUE, stringsAsFactors = FALSE, na.strings = "")

print(f,na = "")
V1 V2 V3 V4 V5
1 id A b b d d
2 id B kh kk ip
3 id C 99
4 id D
5 id E

Now to transform it back to your long_data, you could do:

with(g <- transform(stack(f),ind = c(row(f))),na.omit(g[order(ind),]))
values ind
1 id A 1
6 b 1
11 b 1
16 d 1
21 d 1
2 id B 2
7 kh 2
12 kk 2
17 ip 2
3 id C 3
8 99 3
4 id D 4
5 id E 5

Reshape Long to Wide Data in R

Given your data is ordered by userID and sessionID, and each row is a unique session, you could do:

library(data.table)

# Convert the data.frame to a data.table
df <- data.table(df)
df[, id := sequence(.N), by = c("userID")] # session sequence number per user

# Spread columns
reshape(df, timevar = "id", idvar = "userID", direction = "wide")
# userID date.1 sessionID.1 userType.1 date.2 sessionID.2 userType.2 date.3 sessionID.3 userType.3
#1 100105276 2015-01-01 1452632119 New Visitor 2015-01-02 1452634303 Returning Visitor 2015-01-02 1452637067 Returning Visitor

In this output userType is also spread into columns, but you can always drop those afterwards.
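
For a version without data.table, here is a minimal base R sketch of the same idea on made-up data (the data frame and its values are assumptions for illustration). It swaps in ave() to number the sessions within each user, then calls reshape as above.

```r
# Hypothetical sessions: user 1 has two, user 2 has one
df <- data.frame(userID    = c(1, 1, 2),
                 sessionID = c(11, 12, 21))

# Session sequence number per user (1, 2, ... within each userID)
df$id <- ave(seq_along(df$userID), df$userID, FUN = seq_along)

# One row per user, one sessionID.<id> column per session number
wide <- reshape(df, timevar = "id", idvar = "userID", direction = "wide")
print(wide)
```

Users with fewer sessions than the maximum get NA in the extra columns.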


