Reshape Multiple Value Columns to Wide Format


Your best option is to reshape your data to long format with melt() and then cast it back to wide format with dcast():

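library(reshape2)

# id.vars = 1:2 keeps the first two columns (expense_type and month) as identifiers;
# the remaining value columns (value, percent) are stacked into variable/value
meltExpensesByMonth <- melt(expensesByMonth, id.vars = 1:2)
# one column per month/variable pair; sum() collapses any duplicate rows
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)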

The first few lines of output:

         expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1          Adjustment           442.37        0.124025031             2.00       0.0005064625
2 Bank Service Charge           200.00        0.056072985           200.00       0.0506462461
3               Cable            21.33        0.005980184            36.33       0.0091998906
4             Charity             0.00        0.000000000             0.00       0.0000000000

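If reshape2 is not available, base R's stats::reshape() can spread several value columns in one call, because its v.names argument accepts more than one column and no melt step is needed. A minimal sketch, assuming a small hypothetical data frame `toy` in the same long shape as expensesByMonth (the values are made up for illustration):

```r
# Hypothetical long data shaped like expensesByMonth (made-up values)
toy <- data.frame(
  expense_type = c("Cable", "Cable", "Charity", "Charity"),
  month        = c("2012-02-01", "2012-03-01", "2012-02-01", "2012-03-01"),
  value        = c(21.33, 36.33, 0, 0),
  percent      = c(0.006, 0.009, 0, 0),
  stringsAsFactors = FALSE
)

# Every column listed in v.names is spread, one copy per distinct month
wide <- reshape(toy,
                idvar     = "expense_type",
                timevar   = "month",
                v.names   = c("value", "percent"),
                direction = "wide",
                sep       = "_")
print(wide)
```

Unlike dcast(), reshape() puts the value name first (value_2012-02-01 rather than 2012-02-01_value) and does not aggregate, so each expense_type/month pair should occur only once.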

Reshaping from long to wide with multiple columns

pivot_wider() from tidyr may be easier, since its values_from argument accepts several columns at once:

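library(dplyr)
library(stringr)
library(tidyr)

df %>%
  # prefix the time values so the new columns read age_t1, age_t2, ...
  mutate(time = str_c('t', time)) %>%
  # one column per value column and time: {value}_{time}
  pivot_wider(names_from = time, values_from = c(age, height))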

Output:

# A tibble: 2 × 5
    PIN age_t1 age_t2 height_t1 height_t2
  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
1  1001     84     86        58        58
2  1002     22     24        60        62

With reshape() from base R, you may need a sequence column that numbers the rows within each PIN:

# rn numbers the rows within each PIN (1, 2, ...)
out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN, FUN = seq_along)),
               idvar = "PIN", direction = "wide", timevar = "time", sep = "_")
# drop the spread copies of the helper column (rn_1, rn_2, ...)
out[!startsWith(names(out), 'rn_')]
#    PIN age_1 height_1 age_2 height_2
# 1 1001    84       58    86       58
# 3 1002    22       60    24       62
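When the long data has no time column at all, the sequence column can itself serve as timevar. A minimal base R sketch, assuming a hypothetical `df_notime` whose values are copied from the output above but with the time column left out:

```r
# Hypothetical long data without a time column: two rows per PIN
df_notime <- data.frame(PIN    = c(1001, 1001, 1002, 1002),
                        age    = c(84, 86, 22, 24),
                        height = c(58, 58, 60, 62))

# Number the rows within each PIN (1, 2, ...) to act as the time variable
df_notime$rn <- ave(seq_along(df_notime$PIN), df_notime$PIN, FUN = seq_along)

# All remaining columns (age, height) are spread, suffixed with rn
wide <- reshape(df_notime, idvar = "PIN", timevar = "rn",
                direction = "wide", sep = "_")
print(wide)
```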

Reshape dataframe without “timevar” and multiple value columns from long to wide format

A third option is dcast() from data.table. The missing 'time variable' is created with rowid(key), which numbers the rows within each key:

library(data.table)
# convert data to a data.table object
setDT(data)
# reshape
dcast(data, key ~ rowid(key), value.var = c("acitity", "intervall"))

Result:

#   key acitity_1    acitity_2 acitity_3   acitity_4 intervall_1 intervall_2 intervall_3 intervall_4
#1:   A  watering remove weeds       cut remove leaf           5           7           6           1
#2:   B  watering remove weeds       cut   fertilize           8           4           2           3

Convert data from long format to wide format with multiple measure columns

To spread several value variables at once, you need to melt the data before casting it.

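library("reshape2")

# melt() stacks the non-id columns (here X and Y) into variable/value;
# dcast() then spreads each variable/TIME pair into its own column
dcast(melt(my.df, id.vars = c("ID", "TIME")), ID ~ variable + TIME)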

which gives

  ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1  A   1   4   7  10  13  16  19  22  25  28
2  B   2   5   8  11  14  17  20  23  26  29
3  C   3   6   9  12  15  18  21  24  27  30
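To see what melt() and dcast() each contribute, the same table can be rebuilt in base R. A sketch, assuming my.df has IDs A to C, TIME 1 to 5, X = 1:15 and Y = 16:30 (a reconstruction chosen to reproduce the output above; the question's data is not shown):

```r
# Reconstructed example data (assumption, chosen to match the printed output)
my.df <- data.frame(ID   = rep(c("A", "B", "C"), 5),
                    TIME = rep(1:5, each = 3),
                    X    = 1:15,
                    Y    = 16:30,
                    stringsAsFactors = FALSE)

# The melt() step: stack the measure columns into variable/value pairs
long <- data.frame(ID       = rep(my.df$ID, 2),
                   TIME     = rep(my.df$TIME, 2),
                   variable = rep(c("X", "Y"), each = nrow(my.df)),
                   value    = c(my.df$X, my.df$Y),
                   stringsAsFactors = FALSE)

# The dcast(ID ~ variable + TIME) step: one column per variable/TIME pair
key <- paste(long$variable, long$TIME, sep = "_")
wide <- tapply(long$value, list(long$ID, key), sum)
print(wide)
```

Because tapply() needs an aggregation function, duplicate ID/TIME rows would be summed silently here, which is the same situation the EDIT that follows describes for dcast().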

EDIT based on comment:

The data frame

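num.id <- 10
num.time <- 10
my.df <- data.frame(ID = rep(LETTERS[1:num.id], num.time),
                    TIME = rep(1:num.time, each = num.id),
                    X = 1:(num.id * num.time),
                    # Y has twice as many values as the other columns, so data.frame()
                    # recycles ID, TIME and X: every ID/TIME pair appears twice
                    Y = (num.id * num.time) + 1:(2 * length(1:(num.id * num.time))))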

gives a different result (all entries are 2) because the ID/TIME combination does not identify a unique row: there are two rows for each ID/TIME combination. reshape2 assumes a single value for each combination of the casting variables and applies a summary function to produce a single value if there are multiple entries. That is why you get the warning

Aggregation function missing: defaulting to length

You can get something that works if you add another variable which breaks that redundancy.

my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)

This works because cycle/ID/TIME now uniquely identifies a row in my.df.
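Before casting, it can help to check whether the id variables really identify one row each. A minimal base R sketch using anyDuplicated(), which returns the index of the first duplicated row (or 0 if there is none), applied to the my.df definition above:

```r
num.id <- 10
num.time <- 10
my.df <- data.frame(ID = rep(LETTERS[1:num.id], num.time),
                    TIME = rep(1:num.time, each = num.id),
                    X = 1:(num.id * num.time),
                    Y = (num.id * num.time) + 1:(2 * length(1:(num.id * num.time))))

# Y has 200 values, so the other columns are recycled to 200 rows
print(nrow(my.df))

# Non-zero: row 101 repeats the ID/TIME pair of row 1
print(anyDuplicated(my.df[c("ID", "TIME")]))

# After adding cycle, cycle/ID/TIME is unique (0 = no duplicates)
my.df$cycle <- rep(1:2, each = num.id * num.time)
print(anyDuplicated(my.df[c("cycle", "ID", "TIME")]))
```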

Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() then spread() (see answer by @AndrewMacDonald), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L,
                    DateRange1Start = structure(7305, class = "Date"),
                    DateRange1End = structure(7307, class = "Date"),
                    Value1 = 4.4,
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"),
                    Value2 = 6.2,
                    DateRange3Start = structure(9255, class = "Date"),
                    DateRange3End = structure(9653, class = "Date"),
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similarly to gather().
However, it offers additional functionality such as multiple value columns.
With only one value column, all column names of the wide data set go into one long column with the name given in names_to.
For multiple value columns, names_to may receive multiple new names; the special name ".value" marks the part of each column name that becomes the name of an output value column.

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc.
Therefore, I renamed the columns in the first step.

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#> [1] "ID" "DateRangeStart_1" "DateRangeEnd_1"
#> [4] "Value_1" "DateRangeStart_2" "DateRangeEnd_2"
#> [7] "Value_2" "DateRangeStart_3" "DateRangeEnd_3"
#> [10] "Value_3"

pivot_longer(a,
             cols = -ID,
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#> ID group DateRangeEnd DateRangeStart Value
#> <int> <chr> <date> <date> <dbl>
#> 1 1 1 1990-01-03 1990-01-01 4.4
#> 2 1 2 1991-07-06 1991-05-04 6.2
#> 3 1 3 1996-06-06 1995-05-05 3.3

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
  build_longer_spec(cols = -ID) %>%
  dplyr::transmute(.name = .name,
                   group = readr::parse_number(name),
                   .value = stringr::str_extract(name, "Start|End|Value"))

# in released tidyr (>= 1.0.0) a spec is passed to pivot_longer_spec()
pivot_longer_spec(a, spec = spec)

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html
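For comparison, base R's reshape() can also produce several value columns when going from wide to long, by passing varying as a list with one element per output column. A sketch, assuming a plain data.frame version of `a` with the same dates and values (written as date strings here):

```r
a <- data.frame(ID = 1L,
                DateRange1Start = as.Date("1990-01-01"), DateRange1End = as.Date("1990-01-03"),
                Value1 = 4.4,
                DateRange2Start = as.Date("1991-05-04"), DateRange2End = as.Date("1991-07-06"),
                Value2 = 6.2,
                DateRange3Start = as.Date("1995-05-05"), DateRange3End = as.Date("1996-06-06"),
                Value3 = 3.3)

# Each list element collects the wide columns that feed one long column
long <- reshape(a,
                direction = "long",
                idvar     = "ID",
                varying   = list(paste0("DateRange", 1:3, "Start"),
                                 paste0("DateRange", 1:3, "End"),
                                 paste0("Value", 1:3)),
                v.names   = c("DateRangeStart", "DateRangeEnd", "Value"),
                timevar   = "group",
                times     = 1:3)
print(long)
```

As with pivot_longer(), the Date columns keep their class.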

R: Reshaping Multiple Columns from Long to Wide

One option is to replace the elements that are duplicated within each 'Letter' group with NA and then, in the reshaped data, remove the columns that are all NA:

library(data.table)
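# within each Letter, set repeated values to NA, then spread with a row counter per Letter
out <- dcast(
  setDT(sample_df)[, lapply(.SD, function(x) replace(x, duplicated(x), NA)), Letter],
  Letter ~ rowid(Letter),
  value.var = c("Number", "Fruit")
)
# names of the columns that contain no non-NA values
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
# delete them by reference and print
out[, (nm1) := NULL][]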
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1: a 1 2 Apple Plum Peach
#2: b 3 4 Pear Peach <NA>

If you prefer the tidyverse, a similar approach works. Note that when this was written, pivot_wider was only in the development version of tidyr (tidyr_0.8.3.9000); it is included in tidyr 1.0.0 and later.

library(tidyverse)
sample_df %>%
  group_by(Letter) %>%
  mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(
    names_from = rn,
    values_from = c("Number", "Fruit")) %>%
  select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
# <fct> <dbl> <dbl> <fct> <fct> <fct>
#1 a 1 2 Apple Plum Peach
#2 b 3 4 Pear Peach <NA>
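The same idea works in base R with ave() and reshape(). A sketch, assuming a hypothetical sample_df (the question's data is not shown; these rows were chosen so the result matches the output above):

```r
# Hypothetical data: Letter "a" repeats Number 2
sample_df <- data.frame(Letter = c("a", "a", "a", "b", "b"),
                        Number = c(1, 2, 2, 3, 4),
                        Fruit  = c("Apple", "Plum", "Peach", "Pear", "Peach"),
                        stringsAsFactors = FALSE)

# Within each Letter, blank out values that repeat an earlier one
for (col in c("Number", "Fruit")) {
  sample_df[[col]] <- ave(sample_df[[col]], sample_df$Letter,
                          FUN = function(x) replace(x, duplicated(x), NA))
}

# Row counter per Letter, used as the time variable
sample_df$rn <- ave(seq_along(sample_df$Letter), sample_df$Letter, FUN = seq_along)
out <- reshape(sample_df, idvar = "Letter", timevar = "rn",
               direction = "wide", sep = "_")

# Drop columns that contain only NA
out <- out[, colSums(!is.na(out)) > 0, drop = FALSE]
print(out)
```

The columns come out interleaved by row number (Number_1, Fruit_1, Number_2, ...) instead of grouped by variable.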

Converting to wide format from long with multiple id and value columns

Here's one way to do this with tidyr. The trick is to gather() first, so that the V1:V5 column names and Week can be combined into a single key:

library(tidyr)
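df_wide <- df %>%
  gather(key, value, V1:V5) %>%           # long: one row per Route/Address/Week/V column
  unite("key", key, Week, sep = ".") %>%  # combine into keys like V1.Week1
  spread(key, value)                      # one column per key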

df_wide
#> Route Address V1.Week1 V1.Week2 V2.Week1 V2.Week2 V3.Week1
#> 1 A 12345_SE_Court 0 0 1 0 0
#> 2 A 33333_NE_Street 0 1 1 0 1
#> 3 B 98765_NW_Drive 1 0 1 1 0
#> 4 C 10293_SW_Road 0 1 0 0 0
#> V3.Week2 V4.Week1 V4.Week2 V5.Week1 V5.Week2
#> 1 1 0 1 0 1
#> 2 1 0 0 0 0
#> 3 0 0 1 1 0
#> 4 0 0 0 1 1

Created on 2018-06-27 by the reprex package (v0.2.0).
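The same result can be obtained in base R without gathering first: reshape() accepts several id columns in idvar, spreads every remaining value column, and names the new columns value.time by default. A sketch, assuming a reduced hypothetical df with two addresses and only V1 and V2 (values taken from the output above):

```r
# Hypothetical reduced input: two addresses, two weeks, two value columns
df <- data.frame(Route   = c("A", "A", "B", "B"),
                 Address = c("12345_SE_Court", "12345_SE_Court",
                             "98765_NW_Drive", "98765_NW_Drive"),
                 Week    = c("Week1", "Week2", "Week1", "Week2"),
                 V1      = c(0, 0, 1, 0),
                 V2      = c(1, 0, 1, 1),
                 stringsAsFactors = FALSE)

# Route and Address together identify a row; V1 and V2 are spread by Week
df_wide <- reshape(df, idvar = c("Route", "Address"),
                   timevar = "Week", direction = "wide")
print(df_wide)
```

The columns come out grouped by week (V1.Week1, V2.Week1, V1.Week2, ...) rather than sorted as spread() does, so reorder them afterwards if that matters.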

Long to wide format by collecting multiple ID columns in R which have different values

Using reshape() from base R: the ave() call numbers the rows within each ID, which identifies the three consecutive sequences in the example data.

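# t2 numbers the rows within each ID (1, 2, 3) and serves as the time variable;
# seq_along() counts rows, whereas seq() would treat a single-row group's TIME as an upper bound
reshape(transform(my.df, t2 = ave(TIME, ID, FUN = seq_along)),
        idvar = "ID", timevar = "t2", direction = "wide")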
# ID TIME.1 X.1 Y.1 TIME.2 X.2 Y.2 TIME.3 X.3 Y.3
# 1 A 1 1 10 4 4 13 7 7 16
# 2 B 2 2 11 5 5 14 8 8 17
# 3 C 3 3 12 6 6 15 9 9 18

