Reshape Multiple Values At Once

In "reshape2", you can use recast (though in my experience, this isn't a widely known function).

library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
# id transactions_expense transactions_income amount_expense amount_income
# 1 20 25 20 95 100
# 2 30 45 50 250 300

You can also use base R's reshape:

reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
# id transactions.income amount.income transactions.expense amount.expense
# 1 20 20 100 25 95
# 3 30 50 300 45 250

Or, you can melt and dcast, like this (here with "data.table"):

library(data.table)
dcast(melt(as.data.table(mydf), id.vars = c("id", "type")),
      id ~ variable + type, value.var = "value")
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300

In later versions of "data.table" (1.9.8), dcast.data.table will be able to do this directly. If I understand correctly, what @Arun is implementing would reshape the data without first having to melt it. That differs from recast, which is essentially a wrapper for a melt + dcast sequence of operations.
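
For illustration, here is a minimal sketch of that direct approach. It assumes an installed data.table version whose dcast accepts more than one value.var column; the mydf below is not the asker's data but a small reconstruction from the outputs shown above.

```r
library(data.table)

# Illustrative input, reconstructed from the outputs above (an assumption).
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250)
)

# Cast both value columns at once, with no intermediate melt step.
wide <- dcast(as.data.table(mydf), id ~ type,
              value.var = c("transactions", "amount"))
print(wide)
```

Each new column name combines a value column with a level of type, for example amount_income.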


And, for thoroughness, here's the tidyr approach:

library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, transactions:amount) %>%
  unite(var2, type, var) %>%
  spread(var2, val)
# id expense_amount expense_transactions income_amount income_transactions
# 1 20 95 25 100 20
# 2 30 250 45 300 50
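
In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider, and pivot_wider can spread several value columns in one call. A rough sketch, again using a mydf reconstructed from the outputs above (an assumption, not the original data):

```r
library(tidyr)

# Illustrative input, reconstructed from the outputs above (an assumption).
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250)
)

# One output column per combination of value column and type.
wide <- pivot_wider(mydf, names_from = type,
                    values_from = c(transactions, amount))
print(wide)
```

Here the column names put the value column first (for example transactions_income), unlike the unite() call above, which puts type first.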

Reshaping wide to long with multiple values columns

reshape does this with the appropriate arguments.

varying lists the columns that exist in the wide format but are stacked into multiple rows in the long format. v.names gives their long-format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Because v.names is given explicitly here, reshape does not parse the column names. It assigns the varying columns to v.names by position, cycling through v.names within each time, which is why varying is ordered f1.avg, f1.sd, f2.avg, f2.sd. Note that the original data has the columns in this order, so we can specify varying=2:5 for this example data, but that is not safe in general.

If v.names were omitted, reshape would instead guess the mapping by splitting the varying names on a . character (the default sep argument), treating the part before the dot as the variable name and the part after it as the time.

times specifies the values to be used in the created var column, one per group of varying columns, and v.names gives the names of the value columns in the long format.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long',
        varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
        timevar='var',
        times=c('f1', 'f2'),
        v.names=c('avg', 'sd'),
        idvar='sbj')

## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9
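
One way to avoid depending on the order of the columns is to pass varying as a list, with one vector of wide column names per long variable. This is only a sketch: the dw below is reconstructed from the long output above, and the position of blabla in it is an assumption.

```r
# Illustrative wide data, reconstructed from the long output above (an assumption).
dw <- data.frame(
  sbj    = c("A", "B", "C", "D"),
  f1.avg = c(10, 12, 20, 22),
  f1.sd  = c(6, 5, 7, 8),
  f2.avg = c(50, 70, 20, 22),
  f2.sd  = c(10, 11, 8, 9),
  blabla = c("bA", "bB", "bC", "bD")
)

# Each list element names the wide columns that feed one long column,
# in the same order as v.names and, within each element, as times.
dl <- reshape(dw, direction = "long",
              varying = list(c("f1.avg", "f2.avg"), c("f1.sd", "f2.sd")),
              v.names = c("avg", "sd"),
              timevar = "var",
              times   = c("f1", "f2"),
              idvar   = "sbj")
print(dl)
```

With the list form, the pairing between wide and long columns is spelled out explicitly, so reordering the columns of dw does not change the result.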

Reshape multiple value columns to wide format

Your best option is to reshape your data to long format, using melt, and then to dcast:

library(reshape2)

meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)

The first few lines of output:

         expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1          Adjustment           442.37        0.124025031             2.00       0.0005064625
2 Bank Service Charge           200.00        0.056072985           200.00       0.0506462461
3               Cable            21.33        0.005980184            36.33       0.0091998906
4             Charity             0.00        0.000000000             0.00       0.0000000000

How to melt / reshape multiple columns at once?

Consider using pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
Tourism_data_current %>%
  pivot_longer(cols = -Tourist, names_to = c("Country", ".value"),
               names_sep = "_", values_drop_na = TRUE) %>%
  rename_with(~ str_c('Tour_', .), Rating:Year)

-output

# A tibble: 6 x 4
Tourist Country Tour_Rating Tour_Year
<int> <chr> <dbl> <dbl>
1 1 France 5 2021
2 1 Spain 4 2020
3 2 France 3 2016
4 2 Spain 5 2017
5 3 France 7 2018
6 4 France 4 2021
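
To make the role of ".value" concrete, here is a small self-contained sketch. The input is not the asker's data: it is reconstructed from the output above, and the column names (France_Rating and so on) are assumptions.

```r
library(tidyr)

# Assumed wide input, reconstructed from the output above.
Tourism_data_current <- data.frame(
  Tourist       = 1:4,
  France_Rating = c(5, 3, 7, 4),
  France_Year   = c(2021, 2016, 2018, 2021),
  Spain_Rating  = c(4, 5, NA, NA),
  Spain_Year    = c(2020, 2017, NA, NA)
)

# The part of each name before "_" goes into the Country column; the part
# after it (".value") becomes the name of an output column.
long <- pivot_longer(Tourism_data_current, cols = -Tourist,
                     names_to = c("Country", ".value"),
                     names_sep = "_", values_drop_na = TRUE)
print(long)
```

Tourists 3 and 4 have no Spain values, so values_drop_na = TRUE leaves them with a single France row each.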

How to reshape using R for multiple value columns across one gather column

Using tidyr::pivot_longer, which superseded gather, plus some additional data wrangling steps, you could do:

library(tidyr)
library(dplyr)

data %>%
  pivot_longer(-c(hhid, villageid), names_to = c(".value", "member"),
               names_pattern = "(.*)_(.*)") %>%
  rename(name = "hh") %>%
  mutate(member = paste("hh", member, sep = "_"))
#> # A tibble: 8 × 5
#> hhid villageid member name age
#> <int> <int> <chr> <chr> <int>
#> 1 1 10 hh_1 ab 10
#> 2 1 10 hh_2 pq 17
#> 3 2 12 hh_1 cd 11
#> 4 2 12 hh_2 rs 25
#> 5 3 20 hh_1 ef 8
#> 6 3 20 hh_2 tu 13
#> 7 4 22 hh_1 gh 9
#> 8 4 22 hh_2 vw 3

reshape2: dcast when there are multiple values for one cell but keep this values

This can be done with dcast (here from data.table) though you need a row identifier.

library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
# HLA_Status HLA_Status_1 CCL24 SPP1
#1: PC 1 5.698 2.698
#2: PC 2 89.457 9.457
#3: PC 3 78.230 8.230
#4: PP 1 9.645 23.120
#5: PP 2 56.320 36.320
#6: PP 3 7.268 17.268

data

dt <- fread("    HLA_Status    variable      value
PP CCL24 9.645
PP CCL24 56.32
PP CCL24 7.268
PC CCL24 5.698
PC CCL24 89.457
PC CCL24 78.23
PP SPP1 23.12
PP SPP1 36.32
PP SPP1 17.268
PC SPP1 2.698
PC SPP1 9.457
PC SPP1 8.23")
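
To see why the row identifier matters, compare casting with and without it. This is a sketch on a cut-down version of the data above; the column name rep_id is made up for the example.

```r
library(data.table)

# Cut-down version of the data above: two values per (status, variable) pair.
dt <- data.table(
  HLA_Status = rep(c("PP", "PC"), each = 2, times = 2),
  variable   = rep(c("CCL24", "SPP1"), each = 4),
  value      = c(9.645, 56.32, 5.698, 89.457, 23.12, 36.32, 2.698, 9.457)
)

# Without a row identifier every cell would hold several values, so dcast
# has to aggregate them; length makes that visible as a count per cell.
counts <- dcast(dt, HLA_Status ~ variable, value.var = "value",
                fun.aggregate = length)
print(counts)

# rowid() numbers the repeats within each group, so each cell is unique
# and the original values are kept.
dt[, rep_id := rowid(HLA_Status, variable)]
wide <- dcast(dt, HLA_Status + rep_id ~ variable, value.var = "value")
print(wide)
```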

Reshape long to wide where most columns have multiple values

You can do this with the base function reshape after adding in a consecutive count by IDnum. Assuming your data is stored in a data.frame named df:

df2 <- within(df, count <- ave(rep(1,nrow(df)),df$IDnum,FUN=cumsum)) 

This adds a new column named "count" holding the consecutive count within each IDnum. Now we can reshape to wide format:

reshape(df2,direction="wide",idvar="IDnum",timevar="count") 

IDnum zipcode.1 City.1 County.1 State.1 zipcode.2 City.2 County.2 State.2 zipcode.3 City.3 County.3 State.3 zipcode.4 City.4 County.4 State.4
1 10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga AL 36051 Marbury Autauga AL 36051 Prattville Autauga AL

(output truncated, goes all the way to zipcode.12, etc.)
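
The same two steps on a tiny data frame, as a sketch. The first three rows reuse values from the output above; the fourth row is made up to show what happens when an ID has fewer records.

```r
# Small illustrative data frame; the last row is invented for the example.
df <- data.frame(
  IDnum   = c(10011, 10011, 10011, 10012),
  zipcode = c(36006, 36022, 36051, 99999),
  City    = c("Billingsley", "Deatsville", "Marbury", "Sometown"),
  stringsAsFactors = FALSE
)

# Running count within each IDnum: 1, 2, 3 for the first ID, 1 for the second.
df$count <- ave(rep(1, nrow(df)), df$IDnum, FUN = cumsum)

# One row per IDnum, one set of columns per value of count.
wide <- reshape(df, direction = "wide", idvar = "IDnum", timevar = "count")
print(wide)
```

IDs with fewer records than the maximum get NA in the columns they cannot fill.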

Reshaping data: long to wide; multiple variables, multiple values

You could reshape the data as follows:

Load the data

data <- read.table(text=
"Site Year Day Variable Value Error Unit
1 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
1 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
2 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
2 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
3 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
3 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
4 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
4 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
5 2004 238 General-SolidsTSS 6.430e-01 1e-04 mg/L
5 2004 238 Phosphorus-OrthoP 3.000e-03 1e-04 mg/L
5 2004 238 Phosphorus-TP 4.000e-03 1e-04 mg/L
5 2004 238 Nitrogen-TN 5.000e-02 1e-03 mg/L
5 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
5 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
5 2004 238 General-Alkalinity 6.500e+01 1e-02 mg/L
6 2004 237 General-Alkalinity 5.540e+01 1e-03 mg/L
6 2004 237 General-SolidsTSS 1.292e+01 1e-03 mg/L
6 2004 237 Nitrogen-NO2 2.000e-03 1e-03 mg/L
6 2004 237 Nitrogen-NO3 2.200e-02 1e-03 mg/L
6 2004 237 Nitrogen-TDN 9.000e-02 1e-03 mg/L
6 2004 237 Phosphorus-TDP 4.000e-03 1e-03 mg/L
7 2004 238 General-Alkalinity 4.430e+01 1e-03 mg/L
7 2004 238 General-SolidsTSS 2.340e+00 1e-03 mg/L
7 2004 238 Nitrogen-NO2+NO3 4.800e-02 1e-03 mg/L
7 2004 238 Nitrogen-TDN 2.700e-01 1e-03 mg/L
7 2004 238 Phosphorus-TDP 6.000e-03 1e-03 mg/L
8 2004 238 Nitrogen-NO3 1.000e-03 2e-03 mg/L
8 2004 238 Nitrogen-NO2 2.500e-03 5e-03 mg/L
9 2010 194 Ca 1.450e+02 1e-01 mg/L
9 2010 194 General-Alkalinity 2.150e+02 5e-01 mg/L
9 2010 194 General-Hardness 4.800e+02 4e-01 mg/L
9 2010 194 SO4 2.540e+02 1e+01 mg/L
9 2010 194 Bi 5.000e-07 1e-06 mg/L
9 2010 194 Sn 2.500e-06 5e-06 mg/L
9 2010 194 Nitrogen-NO2 2.500e-03 5e-03 mg/L
9 2010 194 Nitrogen-NO3 2.500e-03 5e-03 mg/L
9 2010 194 Br 1.000e-02 2e-02 mg/L
9 2010 194 U 2.670e-03 5e-07 mg/L
9 2010 194 Ag 3.000e-06 1e-06 mg/L
9 2010 194 Be 1.300e-05 1e-06 mg/L
9 2010 194 Cd 5.400e-05 1e-06 mg/L
9 2010 194 Sb 8.500e-05 1e-06 mg/L
9 2010 194 Tl 1.700e-05 1e-06 mg/L
9 2010 194 Co 1.250e-03 2e-06 mg/L
9 2010 194 Mo 1.510e-03 5e-06 mg/L
9 2010 194 Pb 6.000e-05 5e-06 mg/L
9 2010 194 V 3.860e-04 5e-06 mg/L
9 2010 194 As 7.900e-04 1e-05 mg/L
9 2010 194 Cr 1.600e-04 1e-05 mg/L
9 2010 194 Li 3.230e-02 1e-05 mg/L", stringsAsFactors=FALSE, header=TRUE)

Cast it with data.table

library(data.table)
data$Variable <- gsub("\\+", "plus", data$Variable) #get rid of `+` for the sake of later pattern matching
setDT(data)
data2 <- dcast(data, Site+Year+Day~Variable, value.var = c("Value", "Error", "Unit"))

and reorder the columns

order_cols <- c()
for (i in unique(data$Variable)) {
  order_cols <- append(order_cols, grep(paste0(i, "$"), names(data2)))
}
setcolorder(data2, c(1:3, order_cols))

In your original dataset (data), the column Variable has 29 unique values. For each level of Variable, 3 columns are generated (value, error and unit), which gives 87 columns. 3 columns stay unchanged by the casting (Site, Year and Day), so altogether the result data2 has 90 columns. Finally, each row represents one site.
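
As an alternative to matching names with grep, the target column order can be built directly. This is a sketch on toy data (the values are made up), and it assumes dcast names the new columns as the value column, an underscore, then the level of Variable.

```r
library(data.table)

# Toy long data with two value columns; the numbers are made up.
long <- data.table(
  Site     = c(1, 1, 2, 2),
  Variable = c("NO3", "NO2", "NO3", "NO2"),
  Value    = c(0.001, 0.0025, 0.022, 0.002),
  Error    = c(0.002, 0.005, 0.001, 0.001)
)
value_cols <- c("Value", "Error")

# dcast groups the output by value column: all Value_* first, then Error_*.
wide <- dcast(long, Site ~ Variable, value.var = value_cols)

# outer() pastes every value column onto every variable; reading the result
# column by column gives Value_NO3, Error_NO3, Value_NO2, Error_NO2.
new_order <- as.vector(outer(value_cols, unique(long$Variable), paste, sep = "_"))
setcolorder(wide, c("Site", new_order))
print(names(wide))
```

Because the names are constructed rather than matched by pattern, characters such as + in a variable name need no special handling here.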

R- How to reshape Long to Wide with multiple variables/columns

Several variables can be combined to form the new column names:

df %>%
  pivot_wider(id_cols = c(UserID, Full.Name, DOB, EncounterID),
              names_from = c(QuestionID, QName, labelnospaces),
              values_from = responses)

UserID Full.Name DOB EncounterID `505_Intro_Were you given any info?` `506_Care_By using this service..`
<int> <chr> <chr> <int> <chr> <chr>
1 1 John Smith 1-1-90 13 yes yes
2 2 Jane Doe 2-2-80 14 no no
`507_Out_How satisfied are you?`
<chr>
1 vsat
2 unsat
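
A reduced, self-contained version of the same idea is sketched below. The data frame is an assumption, loosely reconstructed from the output above with fewer columns and questions.

```r
library(tidyr)

# Assumed long input: one row per user and question.
df <- data.frame(
  UserID     = c(1, 1, 2, 2),
  QuestionID = c(505, 506, 505, 506),
  QName      = c("Intro", "Care", "Intro", "Care"),
  responses  = c("yes", "yes", "no", "no")
)

# The values of QuestionID and QName are joined with "_" (the default
# names_sep) to form each new column name.
wide <- pivot_wider(df, id_cols = UserID,
                    names_from = c(QuestionID, QName),
                    values_from = responses)
print(wide)
```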
