Reshape multiple values at once
In "reshape2", you can use recast (though in my experience, this isn't a widely known function):
library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
# id transactions_expense transactions_income amount_expense amount_income
# 1 20 25 20 95 100
# 2 30 45 50 250 300
You can also use base R's reshape:
reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
# id transactions.income amount.income transactions.expense amount.expense
# 1 20 20 100 25 95
# 3 30 50 300 45 250
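For a runnable check of the base R route, here is a minimal sketch assuming a mydf reconstructed from the printed outputs (the original data isn't shown, so these values are an inference). It also sets reshape's sep argument to "_", which produces the same column names as recast, though in a different order.

```r
# Hypothetical mydf, reconstructed from the printed outputs (original data not shown)
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# Same call as above; sep = "_" builds names like "transactions_income"
wide <- reshape(mydf, direction = "wide", idvar = "id", timevar = "type", sep = "_")
print(wide)
```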
Or, you can melt and dcast, like this (here with "data.table"):
library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")),
id ~ variable + type, value.var = "value")
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300
In later versions of dcast.data.table from "data.table" (1.9.8), you will be able to do this directly. If I understand correctly, @Arun is working on reshaping without first having to melt the data. Melting first is what currently happens with recast, which is essentially a wrapper for a melt + dcast sequence of operations.
And, for thoroughness, here's the tidyr approach:
library(dplyr)
library(tidyr)
mydf %>%
gather(var, val, transactions:amount) %>%
unite(var2, type, var) %>%
spread(var2, val)
# id expense_amount expense_transactions income_amount income_transactions
# 1 20 95 25 100 20
# 2 30 250 45 300 50
Reshaping wide to long with multiple values columns
reshape does this with the appropriate arguments. varying lists the columns that exist in the wide format but are split into multiple rows in the long format. v.names gives their long-format equivalents. Between the two, a mapping is created.
From ?reshape:
Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.
Given these varying and v.names arguments, reshape matches columns by position rather than by name. varying must be grouped by time, with the v.names order repeated within each group (here f1.avg, f1.sd, f2.avg, f2.sd, i.e., the x.1, y.1, x.2, y.2 pattern from the help page). The original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.
Because v.names is given explicitly, reshape does not split the varying names on the sep character (. by default). sep is only used when reshape has to guess v.names and times from the column names. times specifies the values that fill the created var column (whose name comes from timevar), and v.names gives the names of the value columns in the long result.
Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).
reshape(dw, direction='long',
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')
## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9
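A self-contained sketch, assuming a dw reconstructed from the printed output with the f columns in positions 2:5, as the text implies (the original data isn't shown). It passes varying as a list, one element per v.names entry. This states explicitly which wide columns feed each long column instead of relying on the interleaved order of a plain vector.

```r
# Hypothetical dw, reconstructed from the printed output above
dw <- data.frame(
  sbj    = c("A", "B", "C", "D"),
  f1.avg = c(10, 12, 20, 22),
  f1.sd  = c(6, 5, 7, 8),
  f2.avg = c(50, 70, 20, 22),
  f2.sd  = c(10, 11, 8, 9),
  blabla = c("bA", "bB", "bC", "bD"),
  stringsAsFactors = FALSE
)

# Each list element holds one long column's wide sources, in the order of `times`
long <- reshape(dw, direction = "long",
                varying = list(c("f1.avg", "f2.avg"), c("f1.sd", "f2.sd")),
                v.names = c("avg", "sd"),
                timevar = "var", times = c("f1", "f2"),
                idvar = "sbj")
print(long)
```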
Reshape multiple value columns to wide format
Your best option is to reshape your data to long format using melt, and then to use dcast:
library(reshape2)
meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)
The first few lines of output:
expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000
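A base R sketch of the same idea, swapping melt + dcast for aggregate + reshape. It uses hypothetical toy data standing in for expensesByMonth and assumes columns expense_type, month, value and percent. Duplicate rows are summed first, mirroring fun.aggregate = sum. Names come out as value_2012-02-01 rather than 2012-02-01_value.

```r
# Hypothetical toy data (not the asker's); Cable has two 2012-02-01 rows
expensesByMonth <- data.frame(
  expense_type = c("Cable", "Cable", "Cable", "Charity", "Charity"),
  month        = c("2012-02-01", "2012-02-01", "2012-03-01", "2012-02-01", "2012-03-01"),
  value        = c(10, 5, 36, 0, 0),
  percent      = c(0.25, 0.125, 0.5, 0, 0),
  stringsAsFactors = FALSE
)

# Step 1: sum both value columns per expense_type and month
agg <- aggregate(cbind(value, percent) ~ expense_type + month,
                 data = expensesByMonth, FUN = sum)

# Step 2: one row per expense_type, one value/percent pair per month
wide <- reshape(agg, direction = "wide", idvar = "expense_type",
                timevar = "month", sep = "_")
print(wide)
```

One difference to keep in mind: dcast with fun.aggregate = sum fills missing combinations with 0 (the sum of a zero-length vector), whereas reshape leaves them as NA.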
How to melt / reshape multiple columns at once?
Consider using pivot_longer
library(dplyr)
library(tidyr)
library(stringr)
Tourism_data_current %>%
pivot_longer(cols = -Tourist, names_to = c("Country", ".value"),
names_sep="_", values_drop_na = TRUE) %>%
rename_with(~ str_c('Tour_', .), Rating:Year)
-output
# A tibble: 6 x 4
Tourist Country Tour_Rating Tour_Year
<int> <chr> <dbl> <dbl>
1 1 France 5 2021
2 1 Spain 4 2020
3 2 France 3 2016
4 2 Spain 5 2017
5 3 France 7 2018
6 4 France 4 2021
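For comparison, a base R sketch with a hypothetical Tourism_data_current reconstructed from the output above. The Country_Measure column names (e.g. France_Rating) are assumed from names_sep = "_". Dropping incomplete rows with complete.cases stands in for values_drop_na = TRUE and gives the same rows for this data.

```r
# Hypothetical input, reconstructed from the printed output
Tourism_data_current <- data.frame(
  Tourist       = 1:4,
  France_Rating = c(5, 3, 7, 4),
  France_Year   = c(2021, 2016, 2018, 2021),
  Spain_Rating  = c(4, 5, NA, NA),
  Spain_Year    = c(2020, 2017, NA, NA)
)

# One list element per output column; within each, one wide column per country
long <- reshape(Tourism_data_current, direction = "long",
                varying = list(c("France_Rating", "Spain_Rating"),
                               c("France_Year", "Spain_Year")),
                v.names = c("Tour_Rating", "Tour_Year"),
                timevar = "Country", times = c("France", "Spain"),
                idvar = "Tourist")

# Drop rows with missing values, then sort by tourist
long <- long[complete.cases(long[, c("Tour_Rating", "Tour_Year")]), ]
long <- long[order(long$Tourist), ]
print(long)
```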
How to reshape using R for multiple value columns across one gather column
Using tidyr::pivot_longer (which superseded gather) plus some additional data-wrangling steps, you could do:
library(tidyr)
library(dplyr)
data %>%
pivot_longer(-c(hhid, villageid), names_to = c(".value", "member"),
names_pattern = "(.*)_(.*)") %>%
rename(name = "hh") %>%
mutate(member = paste("hh", member, sep = "_"))
#> # A tibble: 8 × 5
#> hhid villageid member name age
#> <int> <int> <chr> <chr> <int>
#> 1 1 10 hh_1 ab 10
#> 2 1 10 hh_2 pq 17
#> 3 2 12 hh_1 cd 11
#> 4 2 12 hh_2 rs 25
#> 5 3 20 hh_1 ef 8
#> 6 3 20 hh_2 tu 13
#> 7 4 22 hh_1 gh 9
#> 8 4 22 hh_2 vw 3
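A base R sketch of the same reshape, assuming a hypothetical data reconstructed from the output with wide columns named variable_member (hh_1, age_1, hh_2, age_2), as names_pattern implies. Here v.names is omitted and sep = "_" is given, so reshape guesses the value columns (hh, age) and times (1, 2) from the names.

```r
# Hypothetical input, reconstructed from the printed output
data <- data.frame(
  hhid      = 1:4,
  villageid = c(10L, 12L, 20L, 22L),
  hh_1      = c("ab", "cd", "ef", "gh"),
  age_1     = c(10L, 11L, 8L, 9L),
  hh_2      = c("pq", "rs", "tu", "vw"),
  age_2     = c(17L, 25L, 13L, 3L),
  stringsAsFactors = FALSE
)

# reshape splits the varying names on "_" to guess v.names and times
long <- reshape(data, direction = "long",
                varying = c("hh_1", "age_1", "hh_2", "age_2"),
                sep = "_", timevar = "member", idvar = "hhid")

# Equivalent of the rename() and mutate() steps, then sort by household
names(long)[names(long) == "hh"] <- "name"
long$member <- paste("hh", long$member, sep = "_")
long <- long[order(long$hhid), ]
print(long)
```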
reshape2: dcast when there are multiple values for one cell but keep this values
This can be done with dcast (here from data.table), though you need a row identifier.
library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
# HLA_Status HLA_Status_1 CCL24 SPP1
#1: PC 1 5.698 2.698
#2: PC 2 89.457 9.457
#3: PC 3 78.230 8.230
#4: PP 1 9.645 23.120
#5: PP 2 56.320 36.320
#6: PP 3 7.268 17.268
data
dt <- fread(" HLA_Status variable value
PP CCL24 9.645
PP CCL24 56.32
PP CCL24 7.268
PC CCL24 5.698
PC CCL24 89.457
PC CCL24 78.23
PP SPP1 23.12
PP SPP1 36.32
PP SPP1 17.268
PC SPP1 2.698
PC SPP1 9.457
PC SPP1 8.23")
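A base R sketch of the same row-identifier trick, using the data above as a plain data.frame (renamed hla here). ave() numbers the rows within each HLA_Status/variable pair, as rowid(HLA_Status, variable) does. reshape then uses HLA_Status + rid as a compound id. Rows come out in order of first appearance (PP first) rather than sorted as in the dcast output.

```r
# The same data as above, as a base R data.frame
hla <- data.frame(
  HLA_Status = rep(rep(c("PP", "PC"), each = 3), 2),
  variable   = rep(c("CCL24", "SPP1"), each = 6),
  value      = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23,
                 23.12, 36.32, 17.268, 2.698, 9.457, 8.23),
  stringsAsFactors = FALSE
)

# Running row number within each HLA_Status/variable pair
hla$rid <- ave(seq_along(hla$value), hla$HLA_Status, hla$variable, FUN = seq_along)

# HLA_Status + rid together identify one row of the wide result
wide <- reshape(hla, direction = "wide", idvar = c("HLA_Status", "rid"),
                timevar = "variable", v.names = "value")
print(wide)
```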
Reshape long to wide where most columns have multiple values
You can do this with the base function reshape after adding in a consecutive count by IDnum. Assuming your data is stored in a data.frame named df:
df2 <- within(df, count <- ave(rep(1,nrow(df)),df$IDnum,FUN=cumsum))
This adds a new column, count, holding the consecutive count within each IDnum. Now we can reshape to wide format:
reshape(df2,direction="wide",idvar="IDnum",timevar="count")
IDnum zipcode.1 City.1 County.1 State.1 zipcode.2 City.2 County.2 State.2 zipcode.3 City.3 County.3 State.3 zipcode.4 City.4 County.4 State.4
1 10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga AL 36051 Marbury Autauga AL 36051 Prattville Autauga AL
(output truncated, goes all the way to zipcode.12, etc.)
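A toy run of the same approach, with hypothetical data in which IDs have different numbers of rows. reshape pads the missing slots with NA. This sketch builds the count with ave(seq_along(IDnum), IDnum, FUN = seq_along), which yields integers instead of the doubles from cumsum over rep(1, ...). That choice is cosmetic.

```r
# Hypothetical toy df: IDnum 1 has two rows, IDnum 2 only one
df <- data.frame(
  IDnum   = c(1, 1, 2),
  zipcode = c(101, 102, 201),
  City    = c("a1", "a2", "b1"),
  stringsAsFactors = FALSE
)

# Integer running count within each IDnum (1, 2, ...)
df2 <- within(df, count <- ave(seq_along(IDnum), IDnum, FUN = seq_along))

wide <- reshape(df2, direction = "wide", idvar = "IDnum", timevar = "count")
print(wide)
```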
Reshaping data: long to wide; multiple variables, multiple values
You could reshape the data as follows:
Load the data
data <- read.table(text=
"Site Year Day Variable Value Error Unit
1 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
1 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
2 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
2 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
3 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
3 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
4 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
4 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
5 2004 238 General-SolidsTSS 6.430e-01 1e-04 mg/L
5 2004 238 Phosphorus-OrthoP 3.000e-03 1e-04 mg/L
5 2004 238 Phosphorus-TP 4.000e-03 1e-04 mg/L
5 2004 238 Nitrogen-TN 5.000e-02 1e-03 mg/L
5 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
5 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
5 2004 238 General-Alkalinity 6.500e+01 1e-02 mg/L
6 2004 237 General-Alkalinity 5.540e+01 1e-03 mg/L
6 2004 237 General-SolidsTSS 1.292e+01 1e-03 mg/L
6 2004 237 Nitrogen-NO2 2.000e-03 1e-03 mg/L
6 2004 237 Nitrogen-NO3 2.200e-02 1e-03 mg/L
6 2004 237 Nitrogen-TDN 9.000e-02 1e-03 mg/L
6 2004 237 Phosphorus-TDP 4.000e-03 1e-03 mg/L
7 2004 238 General-Alkalinity 4.430e+01 1e-03 mg/L
7 2004 238 General-SolidsTSS 2.340e+00 1e-03 mg/L
7 2004 238 Nitrogen-NO2+NO3 4.800e-02 1e-03 mg/L
7 2004 238 Nitrogen-TDN 2.700e-01 1e-03 mg/L
7 2004 238 Phosphorus-TDP 6.000e-03 1e-03 mg/L
8 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
8 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
9 2010 194 Ca 1.450e+02 1e-01 mg/L
9 2010 194 General-Alkalinity 2.150e+02 5e-01 mg/L
9 2010 194 General-Hardness 4.800e+02 4e-01 mg/L
9 2010 194 SO4 2.540e+02 1e+01 mg/L
9 2010 194 Bi 5.000e-07 1e-06 mg/L
9 2010 194 Sn 2.500e-06 5e-06 mg/L
9 2010 194 Nitrogen-NO2 2.500e-03 5e-03 mg/L
9 2010 194 Nitrogen-NO3 2.500e-03 5e-03 mg/L
9 2010 194 Br 1.000e-02 2e-02 mg/L
9 2010 194 U 2.670e-03 5e-07 mg/L
9 2010 194 Ag 3.000e-06 1e-06 mg/L
9 2010 194 Be 1.300e-05 1e-06 mg/L
9 2010 194 Cd 5.400e-05 1e-06 mg/L
9 2010 194 Sb 8.500e-05 1e-06 mg/L
9 2010 194 Tl 1.700e-05 1e-06 mg/L
9 2010 194 Co 1.250e-03 2e-06 mg/L
9 2010 194 Mo 1.510e-03 5e-06 mg/L
9 2010 194 Pb 6.000e-05 5e-06 mg/L
9 2010 194 V 3.860e-04 5e-06 mg/L
9 2010 194 As 7.900e-04 1e-05 mg/L
9 2010 194 Cr 1.600e-04 1e-05 mg/L
9 2010 194 Li 3.230e-02 1e-05 mg/L", stringsAsFactors=F, header=T)
Cast it with data.table
library(data.table)
data$Variable <- gsub("\\+", "plus", data$Variable) #get rid of `+` for the sake of later pattern matching
setDT(data)
data2 <- dcast(data, Site+Year+Day~Variable, value.var = c("Value", "Error", "Unit"))
and reorder the columns
order_cols <- c()
for(i in unique(data$Variable)){
order_cols <- append(order_cols, grep(paste0(i, "$"), names(data2)))
}
setcolorder(data2, c(1:3, order_cols))
In your original dataset (data), the column Variable has 29 unique values. For each level of Variable, 3 columns are generated (value, error and unit), which gives 87 columns. 3 columns stay unchanged by the casting (Site, Year and Day), so altogether the result data2 has 90 columns. Each row represents one site.
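A base R alternative, shown on a small subset of the rows above (sites 1 and 6, Nitrogen-NO3 and Nitrogen-NO2 only). reshape with several v.names already places each Variable level's Value/Error/Unit columns together, in order of first appearance. That should make the separate reordering step unnecessary. On the full data this should give the same 3 + 29 × 3 = 90 columns.

```r
# Subset of the rows above: sites 1 and 6, nitrate and nitrite only
nitro <- data.frame(
  Site     = c(1, 1, 6, 6),
  Year     = c(2004, 2004, 2004, 2004),
  Day      = c(238, 238, 237, 237),
  Variable = c("Nitrogen-NO3", "Nitrogen-NO2", "Nitrogen-NO2", "Nitrogen-NO3"),
  Value    = c(0.001, 0.0025, 0.002, 0.022),
  Error    = c(0.002, 0.005, 0.001, 0.001),
  Unit     = "mg/L",
  stringsAsFactors = FALSE
)

# One row per Site/Year/Day; one Value/Error/Unit trio per Variable level
wide <- reshape(nitro, direction = "wide",
                idvar = c("Site", "Year", "Day"), timevar = "Variable",
                v.names = c("Value", "Error", "Unit"), sep = "_")
print(wide)
```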
R- How to reshape Long to Wide with multiple variables/columns
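Some of the variables are better combined: when you pass several columns to names_from, pivot_wider pastes them together (separated by "_") to form the new column names.
library(dplyr)
library(tidyr)
df %>%
pivot_wider(id_cols = c(UserID, Full.Name, DOB, EncounterID), names_from = c(QuestionID, QName, labelnospaces), values_from = responses)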
UserID Full.Name DOB EncounterID `505_Intro_Were you given any info?` `506_Care_By using this service..`
<int> <chr> <chr> <int> <chr> <chr>
1 1 John Smith 1-1-90 13 yes yes
2 2 Jane Doe 2-2-80 14 no no
`507_Out_How satisfied are you?`
<chr>
1 vsat
2 unsat
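A base R sketch of the same idea, using hypothetical long data shaped like the question's and limited to two questions. The three naming columns are pasted into a single key, as names_from does. The result is widened with reshape, and the "responses_" prefix that reshape adds is stripped afterwards. Full.Name, DOB and EncounterID are assumed constant within each UserID, so reshape carries them along.

```r
# Hypothetical long data (two questions only)
df <- data.frame(
  UserID        = c(1, 1, 2, 2),
  Full.Name     = c("John Smith", "John Smith", "Jane Doe", "Jane Doe"),
  DOB           = c("1-1-90", "1-1-90", "2-2-80", "2-2-80"),
  EncounterID   = c(13, 13, 14, 14),
  QuestionID    = c(505, 507, 505, 507),
  QName         = c("Intro", "Out", "Intro", "Out"),
  labelnospaces = c("Were you given any info?", "How satisfied are you?",
                    "Were you given any info?", "How satisfied are you?"),
  responses     = c("yes", "vsat", "no", "unsat"),
  stringsAsFactors = FALSE
)

# Combine the naming columns into one key, like names_from = c(...)
df$key <- paste(df$QuestionID, df$QName, df$labelnospaces, sep = "_")

# Keep only id columns, key and value, then widen
keep <- c("UserID", "Full.Name", "DOB", "EncounterID", "key", "responses")
wide <- reshape(df[, keep], direction = "wide", idvar = "UserID",
                timevar = "key", v.names = "responses", sep = "_")

# Drop reshape's "responses_" prefix to match the pivot_wider names
names(wide) <- sub("^responses_", "", names(wide))
print(wide)
```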