Reshape multiple value columns to wide format
Your best option is to reshape your data to long format using melt, and then to dcast:
library(reshape2)
meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)
The first few lines of output:
expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000
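To try the melt-then-dcast pattern without the original data, here is a minimal sketch. The `expenses` data frame and its values are invented for illustration; only the column layout mirrors the answer above.

```r
library(reshape2)

# Hypothetical stand-in for expensesByMonth (values are invented)
expenses <- data.frame(
  expense_type = c("Cable", "Cable", "Charity", "Charity"),
  month        = c("2012-02-01", "2012-03-01", "2012-02-01", "2012-03-01"),
  value        = c(21.33, 36.33, 0, 0),
  percent      = c(0.006, 0.009, 0, 0)
)

# Step 1: stack the two measure columns into variable/value pairs
long <- melt(expenses, id.vars = c("expense_type", "month"))

# Step 2: spread to one column per month/variable combination
wide <- dcast(long, expense_type ~ month + variable,
              value.var = "value", fun.aggregate = sum)
names(wide)
```

Naming the id columns instead of using positions (`id.vars = 1:2`) is a small design choice that keeps the call valid if the column order changes.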
Reshaping from long to wide with multiple columns
pivot_wider may be easier:
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(time = str_c('t', time)) %>%
pivot_wider(names_from = time, values_from = c(age, height))
Output:
# A tibble: 2 × 5
PIN age_t1 age_t2 height_t1 height_t2
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1001 84 86 58 58
2 1002 22 24 60 62
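The answer does not show the input, so the sketch below reconstructs `df` from the printed output (an assumption). It also swaps the `mutate()`/`str_c()` step for the `names_glue` argument of pivot_wider, which builds the same `t` prefix with tidyr alone.

```r
library(tidyr)

# Input reconstructed from the output shown above (an assumption)
df <- data.frame(
  PIN    = c(1001, 1001, 1002, 1002),
  time   = c(1, 2, 1, 2),
  age    = c(84, 86, 22, 24),
  height = c(58, 58, 60, 62)
)

# names_glue builds the "t" prefix directly, replacing the mutate() step
wide <- pivot_wider(df,
                    names_from  = time,
                    values_from = c(age, height),
                    names_glue  = "{.value}_t{time}")
wide
```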
With reshape from base R, it may need a sequence column:
out <- reshape(transform(df, rn = ave(seq_along(PIN), PIN,
FUN = seq_along)), idvar = "PIN",
direction = "wide", timevar = "time", sep = "_")
out[!startsWith(names(out), 'rn_')]
PIN age_1 height_1 age_2 height_2
1 1001 84 58 86 58
3 1002 22 60 24 62
Reshape dataframe without “timevar” and multiple value columns from long to wide format
A third option uses dcast from data.table. We create the missing 'time variable' with rowid(key):
library(data.table)
# convert data to a data.table object
setDT(data)
# reshape
dcast(data, key ~ rowid(key), value.var = c("acitity", "intervall"))
Result
# key acitity_1 acitity_2 acitity_3 acitity_4 intervall_1 intervall_2 intervall_3 intervall_4
#1: A watering remove weeds cut remove leaf 5 7 6 1
#2: B watering remove weeds cut fertilize 8 4 2 3
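To see what rowid() contributes, here is a small self-contained sketch. The table `dt` and its column names (`id`, `activity`, `interval`) are hypothetical and are not the data from the question.

```r
library(data.table)

# Hypothetical data with no time column
dt <- data.table(
  id       = c("A", "A", "B", "B"),
  activity = c("watering", "cut", "watering", "fertilize"),
  interval = c(5, 6, 8, 3)
)

# rowid() numbers rows within each group, in order of appearance
pos <- rowid(dt$id)
pos

# The same expression inside the formula supplies the missing "time" dimension
wide <- dcast(dt, id ~ rowid(id), value.var = c("activity", "interval"))
names(wide)
```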
Convert data from long format to wide format with multiple measure columns
To handle multiple variables the way you want, you need to melt the data before casting it.
library("reshape2")
dcast(melt(my.df, id.vars=c("ID", "TIME")), ID~variable+TIME)
which gives
ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1 A 1 4 7 10 13 16 19 22 25 28
2 B 2 5 8 11 14 17 20 23 26 29
3 C 3 6 9 12 15 18 21 24 27 30
EDIT based on comment:
The data frame
num.id = 10
num.time=10
my.df <- data.frame(ID=rep(LETTERS[1:num.id], num.time),
TIME=rep(1:num.time, each=num.id),
X=1:(num.id*num.time),
Y=(num.id*num.time)+1:(2*length(1:(num.id*num.time))))
gives a different result (all entries are 2) because the ID/TIME combination does not identify a unique row. In fact, there are two rows for each ID/TIME combination. reshape2 assumes a single value for each possible combination of the variables and applies a summary function to produce a single value if there are multiple entries. That is why there is the warning
Aggregation function missing: defaulting to length
You can get something that works if you add another variable which breaks that redundancy.
my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)
This works because cycle/ID/TIME now uniquely defines a row in my.df.
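Before casting, it can help to check whether the id columns really identify each row. The following is a minimal base R sketch on a smaller, hypothetical data frame (`dup.df`) built with the same duplication pattern; it is not the exact data from the answer.

```r
num.id   <- 3
num.time <- 2

# Hypothetical data: every ID/TIME combination appears twice
dup.df <- data.frame(
  ID   = rep(LETTERS[1:num.id], 2 * num.time),
  TIME = rep(rep(1:num.time, each = num.id), 2),
  X    = 1:(2 * num.id * num.time)
)

# Count rows per ID/TIME combination; any count above 1 triggers aggregation
counts <- aggregate(X ~ ID + TIME, data = dup.df, FUN = length)
any(counts$X > 1)

# Adding a cycle column makes the combination unique again
dup.df$cycle <- rep(1:2, each = num.id * num.time)
anyDuplicated(dup.df[c("cycle", "ID", "TIME")])  # 0 means no duplicated keys
```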
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.
This is superior to the previous tidyr strategy of gather() then spread() (see answer by @AndrewMacDonald), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).
library("tidyr")
library("magrittr")
a <- structure(list(ID = 1L,
DateRange1Start = structure(7305, class = "Date"),
DateRange1End = structure(7307, class = "Date"),
Value1 = 4.4,
DateRange2Start = structure(7793, class = "Date"),
DateRange2End = structure(7856, class = "Date"),
Value2 = 6.2,
DateRange3Start = structure(9255, class = "Date"),
DateRange3End = structure(9653, class = "Date"),
Value3 = 3.3),
row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
pivot_longer() (counterpart: pivot_wider()) works similarly to gather().
However, it offers additional functionality such as multiple value columns.
With only one value column, all column names of the wide data set would go into one long column with the name given in names_to.
For multiple value columns, names_to may receive multiple new names.
This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc.
Therefore, I renamed the columns in the first step.
(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#> [1] "ID" "DateRangeStart_1" "DateRangeEnd_1"
#> [4] "Value_1" "DateRangeStart_2" "DateRangeEnd_2"
#> [7] "Value_2" "DateRangeStart_3" "DateRangeEnd_3"
#> [10] "Value_3"
pivot_longer(a,
cols = -ID,
names_to = c(".value", "group"),
# names_prefix = "DateRange",
names_sep = "_")
#> # A tibble: 3 x 5
#> ID group DateRangeEnd DateRangeStart Value
#> <int> <chr> <date> <date> <dbl>
#> 1 1 1 1990-01-03 1990-01-01 4.4
#> 2 1 2 1991-07-06 1991-05-04 6.2
#> 3 1 3 1996-06-06 1995-05-05 3.3
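When column names have no separator but still follow a regular pattern, `names_pattern` can stand in for the renaming step. The sketch below uses a hypothetical data frame with columns `x1`, `y1`, `x2`, `y2`; it illustrates the `.value` mechanism and is not a drop-in replacement for the date-range example above.

```r
library(tidyr)

# Hypothetical wide data: two measurement sets, x1/y1 and x2/y2
wide <- data.frame(id = 1:2,
                   x1 = c(10, 11), y1 = c(0.1, 0.2),
                   x2 = c(20, 21), y2 = c(0.3, 0.4))

# First regex group -> new value column names (.value),
# second regex group -> the "group" column
long <- pivot_longer(wide,
                     cols          = -id,
                     names_to      = c(".value", "group"),
                     names_pattern = "([a-z]+)(\\d+)")
long
```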
Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):
spec <- a %>%
build_longer_spec(cols = -ID) %>%
dplyr::transmute(.name = .name,
group = readr::parse_number(name),
.value = stringr::str_extract(name, "Start|End|Value"))
pivot_longer(a, spec = spec)
Created on 2019-03-26 by the reprex package (v0.2.1)
See also: https://tidyr.tidyverse.org/articles/pivot.html
R: Reshaping Multiple Columns from Long to Wide
One option is to replace the duplicated elements within each 'Letter' group with NA and then, in the reshaped data, remove the columns that are all NA:
library(data.table)
out <- dcast(setDT(sample_df)[, lapply(.SD, function(x)
replace(x, duplicated(x), NA)), Letter], Letter ~ rowid(Letter),
value.var = c("Number", "Fruit"))
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
out[, (nm1) := NULL][]
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1: a 1 2 Apple Plum Peach
#2: b 3 4 Pear Peach <NA>
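The two building blocks of this approach can be tried in isolation with base R. The vector `x` and the data frame `res` below are hypothetical and serve only to show what each step does.

```r
# Step 1: blank out repeated values, keeping the first occurrence
x <- c("Apple", "Apple", "Plum")
deduped <- replace(x, duplicated(x), NA)
deduped

# Step 2: find columns that contain nothing but NA
# Hypothetical reshaped result with one column that is entirely NA
res <- data.frame(a = c(1, 2), b = c(NA, NA), c = c("u", NA))
all_na <- names(which(!colSums(!is.na(res))))
all_na

# Drop those columns
res[setdiff(names(res), all_na)]
```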
If we want to use the tidyverse approach, a similar option can be used. Note that pivot_wider was only in the dev version of tidyr (tidyr_0.8.3.9000) when this was written; it has been in the released package since version 1.0.0.
library(tidyverse)
sample_df %>%
group_by(Letter) %>%
mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
mutate(rn = row_number()) %>%
pivot_wider(
names_from = rn,
values_from = c("Number", "Fruit")) %>%
select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
# <fct> <dbl> <dbl> <fct> <fct> <fct>
#1 a 1 2 Apple Plum Peach
#2 b 3 4 Pear Peach <NA>
Converting to wide format from long with multiple id and value columns
Here's the way to do this using tidyr. The trick is that you need to do a gather first:
library(tidyr)
df_wide <- df %>%
gather(key, value, V1:V5) %>%
unite("key", key, Week, sep = ".") %>%
spread(key, value)
df_wide
#> Route Address V1.Week1 V1.Week2 V2.Week1 V2.Week2 V3.Week1
#> 1 A 12345_SE_Court 0 0 1 0 0
#> 2 A 33333_NE_Street 0 1 1 0 1
#> 3 B 98765_NW_Drive 1 0 1 1 0
#> 4 C 10293_SW_Road 0 1 0 0 0
#> V3.Week2 V4.Week1 V4.Week2 V5.Week1 V5.Week2
#> 1 1 0 1 0 1
#> 2 1 0 0 0 0
#> 3 0 0 1 1 0
#> 4 0 0 0 1 1
Created on 2018-06-27 by the reprex package (v0.2.0).
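With tidyr 1.0.0 or later, the same shape can probably be reached in one step, because pivot_wider accepts several value columns at once. This is a sketch on hypothetical data with the same column layout as the question (only V1 and V2 are included to keep it short).

```r
library(tidyr)

# Hypothetical data in the same shape as the question
df <- data.frame(
  Route   = c("A", "A", "B", "B"),
  Address = c("12345_SE_Court", "12345_SE_Court",
              "98765_NW_Drive", "98765_NW_Drive"),
  Week    = c("Week1", "Week2", "Week1", "Week2"),
  V1      = c(0, 0, 1, 0),
  V2      = c(1, 0, 1, 1)
)

# One call replaces gather() + unite() + spread()
df_wide <- pivot_wider(df,
                       names_from  = Week,
                       values_from = c(V1, V2),
                       names_sep   = ".")
names(df_wide)
```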
Long to wide format by collecting multiple ID columns in R which have different values
Using reshape from base R. ave numbers the rows within each ID, which identifies the three consecutive sequences in the example data.
reshape(transform(my.df, t2=with(my.df, ave(TIME, ID, FUN=seq))), idvar=c("ID"),
timevar=c("t2"), direction="wide")
# ID TIME.1 X.1 Y.1 TIME.2 X.2 Y.2 TIME.3 X.3 Y.3
# 1 A 1 1 10 4 4 13 7 7 16
# 2 B 2 2 11 5 5 14 8 8 17
# 3 C 3 3 12 6 6 15 9 9 18
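The input is not shown, so the sketch below reconstructs `my.df` from the printed output (an assumption). It also uses `seq_along` rather than `seq` inside ave; that is a deliberate swap, since `seq()` on a group holding a single number n would return 1:n instead of 1.

```r
# Input reconstructed from the output shown above (an assumption)
my.df <- data.frame(ID   = rep(c("A", "B", "C"), 3),
                    TIME = 1:9,
                    X    = 1:9,
                    Y    = 10:18)

# Number the rows within each ID: 1, 2, 3 per group
my.df$t2 <- ave(seq_along(my.df$ID), my.df$ID, FUN = seq_along)

# Use the new counter as the time variable for the wide reshape
wide <- reshape(my.df, idvar = "ID", timevar = "t2", direction = "wide")
names(wide)
```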