Aggregate by multiple columns and reshape from long to wide
Use dcast, or even acast, from the reshape2 package:
library(reshape2)
dcast(dat, Id ~ Description, mean)
Id Cat Dog
1 10 14.25 14.25
2 11 15.25 15.25
Base R might be a bit longer:
reshape(aggregate(.~Id+Description,dat,mean),direction = "wide",v.names = "Value",idvar = "Id",timevar = "Description")
Id Value.Cat Value.Dog
1 10 14.25 14.25
2 11 15.25 15.25
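To make the mechanics concrete, here is a minimal Python sketch of what dcast(dat, Id ~ Description, mean) computes: group the values by (row key, column key), then average each group. The function cast_mean and the rows in dat are hypothetical and were chosen to reproduce the output above. This is an illustration of the idea, not how reshape2 is implemented.

```python
from collections import defaultdict
from statistics import mean


def cast_mean(rows, row_key, col_key, value_key):
    """Spread value_key into one column per distinct col_key, averaging
    duplicate cells, roughly like dcast(dat, Id ~ Description, mean)."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(r[value_key])
    row_ids = sorted({rid for rid, _ in cells})
    col_ids = sorted({cid for _, cid in cells})
    # Combinations that never occur are left as None in this sketch.
    return {
        rid: {cid: mean(cells[(rid, cid)]) if (rid, cid) in cells else None
              for cid in col_ids}
        for rid in row_ids
    }


# Hypothetical long data with repeated Id/Description pairs.
dat = [
    {"Id": 10, "Description": "Cat", "Value": 14.0},
    {"Id": 10, "Description": "Cat", "Value": 14.5},
    {"Id": 10, "Description": "Dog", "Value": 14.25},
    {"Id": 11, "Description": "Cat", "Value": 15.0},
    {"Id": 11, "Description": "Cat", "Value": 15.5},
    {"Id": 11, "Description": "Dog", "Value": 15.25},
]
wide = cast_mean(dat, "Id", "Description", "Value")
print(wide)
```

The Base R route does the same two steps explicitly: aggregate computes the per-cell means, and reshape only spreads them out.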
Combine multiple long to wide transformations into one
You can use pivot_wider directly to cast multiple columns into multiple variables, supplying an aggregation function in values_fn:
tidyr::pivot_wider(dat, names_from = c(Month, City),
values_from = c(var_1, var_2), values_fn = sum)
And using data.table's dcast:
library(data.table)
dcast(setDT(dat), G1+G2+G3~Month+City, value.var = c('var_1', 'var_2'),
fun.aggregate = sum)
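Both calls do the same thing: one output row per G1/G2/G3 combination, and one column per value column crossed with every Month/City pair, with duplicates summed. The minimal Python sketch below shows that logic. The helper pivot_wider_sum and the input rows are hypothetical. The var_1_Jan_NYC style names are meant to resemble pivot_wider's default "_" separator, and combinations that never occur are simply absent here, whereas the R functions fill them in.

```python
from collections import defaultdict


def pivot_wider_sum(rows, id_cols, names_from, values_from, sep="_"):
    """One output row per id_cols combination; one column per
    (value column, names_from combination); duplicate cells are summed."""
    wide = defaultdict(dict)
    for r in rows:
        rid = tuple(r[c] for c in id_cols)
        suffix = sep.join(str(r[c]) for c in names_from)
        for v in values_from:
            col = f"{v}{sep}{suffix}"
            wide[rid][col] = wide[rid].get(col, 0) + r[v]
    return dict(wide)


rows = [  # hypothetical input; the first two rows share Month/City
    {"G1": "a", "G2": "x", "G3": 1, "Month": "Jan", "City": "NYC", "var_1": 1, "var_2": 10},
    {"G1": "a", "G2": "x", "G3": 1, "Month": "Jan", "City": "NYC", "var_1": 2, "var_2": 20},
    {"G1": "a", "G2": "x", "G3": 1, "Month": "Feb", "City": "LA", "var_1": 5, "var_2": 50},
]
wide = pivot_wider_sum(rows, ["G1", "G2", "G3"], ["Month", "City"], ["var_1", "var_2"])
print(wide)
```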
Reshape multiple value columns to wide format
Your best option is to reshape your data to long format using melt, and then cast it back to wide with dcast:
library(reshape2)
meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)
The first few lines of output:
expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000
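Melting first is what lets a single dcast handle several measure columns. After the melt, "value" and "percent" become levels of one variable column, so the formula expense_type ~ month + variable can paste month and variable together into column names. Here is a minimal Python sketch of that two-step idea. The melt and dcast_sum helpers and the expenses rows are hypothetical, and this sketch leaves absent cells out instead of filling them the way dcast does.

```python
def melt(rows, id_vars):
    """Wide -> long: one output row per (input row, non-id column)."""
    return [
        {**{k: r[k] for k in id_vars}, "variable": var, "value": val}
        for r in rows
        for var, val in r.items()
        if var not in id_vars
    ]


def dcast_sum(long, row_var, col_vars, sep="_"):
    """Long -> wide: column names are col_vars pasted together, cells summed."""
    wide = {}
    for r in long:
        row = wide.setdefault(r[row_var], {})
        col = sep.join(str(r[c]) for c in col_vars)
        row[col] = row.get(col, 0) + r["value"]
    return wide


expenses = [  # hypothetical rows with two measure columns
    {"month": "2012-02-01", "expense_type": "Cable", "value": 20.0, "percent": 0.5},
    {"month": "2012-03-01", "expense_type": "Cable", "value": 30.0, "percent": 0.25},
]
long = melt(expenses, ["month", "expense_type"])
wide = dcast_sum(long, "expense_type", ["month", "variable"])
print(wide)
```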
Convert data from long format to wide format with multiple measure columns
In order to handle multiple variables like you want, you need to melt the data before casting it.
library("reshape2")
dcast(melt(my.df, id.vars=c("ID", "TIME")), ID~variable+TIME)
which gives
ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1 A 1 4 7 10 13 16 19 22 25 28
2 B 2 5 8 11 14 17 20 23 26 29
3 C 3 6 9 12 15 18 21 24 27 30
EDIT based on comment:
The data frame
num.id = 10
num.time=10
my.df <- data.frame(ID=rep(LETTERS[1:num.id], num.time),
TIME=rep(1:num.time, each=num.id),
X=1:(num.id*num.time),
Y=(num.id*num.time)+1:(2*length(1:(num.id*num.time))))
gives a different result (all entries are 2) because the ID/TIME combination does not identify a unique row. In fact, there are two rows for each ID/TIME combination. reshape2 assumes a single value for each possible combination of the variables and applies a summary function to produce a single value if there are multiple entries. That is why there is the warning
Aggregation function missing: defaulting to length
You can get something that works if you add another variable which breaks that redundancy.
my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)
This works because cycle/ID/TIME now uniquely defines a row in my.df.
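The duplicate-key problem can be checked directly by counting how many melted rows fall into each cell. The Python sketch below rebuilds the shape of my.df with smaller sizes (num_id = 3 and num_time = 4 are hypothetical choices) and mimics R's recycling of the shorter columns. The count 2 per cell is exactly what the length default reports, and adding cycle brings every count down to 1.

```python
import string
from collections import Counter

num_id, num_time = 3, 4   # hypothetical, smaller than the 10 x 10 in the question
n = num_id * num_time

# Mirror the R data.frame: ID, TIME and X have n values but Y has 2n,
# so R recycles the shorter columns and the frame ends up with 2n rows.
rows = []
for i in range(2 * n):
    j = i % n
    rows.append({
        "cycle": 1 + i // n,                        # rep(1:2, each = n)
        "ID": string.ascii_uppercase[j % num_id],   # rep(LETTERS[1:num_id], num_time)
        "TIME": 1 + j // num_id,                    # rep(1:num_time, each = num_id)
        "X": 1 + j,                                 # 1:n, recycled
        "Y": n + 1 + i,                             # n + 1:(2n)
    })

# melt X and Y into (id columns, variable, value) rows
long = [
    {"cycle": r["cycle"], "ID": r["ID"], "TIME": r["TIME"], "variable": v, "value": r[v]}
    for r in rows
    for v in ("X", "Y")
]

# Rows per cell for ID ~ variable + TIME, with and without cycle
counts_without = Counter((r["ID"], r["TIME"], r["variable"]) for r in long)
counts_with = Counter((r["cycle"], r["ID"], r["TIME"], r["variable"]) for r in long)
print(set(counts_without.values()), set(counts_with.values()))
```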
Long to wide on multiple columns by data.table
We can first melt to 'long' format by specifying the patterns in measure, and then dcast with fun.aggregate set to sum:
library(data.table)
dcast(melt(dt, measure = patterns("^Value", "^Problem"),
value.name = c("Value", "Problem"))[Problem != ""
][, Problem := factor(Problem, levels = c("X", "Y", "Z", "W", "V"))],
Type ~Problem, value.var = "Value", sum, na.rm = TRUE)
# Type X Y Z W V
#1: A 1100 1000 1100 0 0
#2: B 0 700 100 200 200
#3: C 1000 0 500 100 1400
melt from data.table can take multiple patterns in the measure argument. So when we say "^Value", it matches all the columns whose names start (^) with "Value", and similarly for "Problem", creating two 'value' columns. In the above, we name those columns 'Value' and 'Problem' with the value.name argument. As the dataset has some blanks, the long format also has blank elements, which we remove with Problem != "". The next step is only important if we need the columns in a specific order: we convert 'Problem' to factor class and specify the levels in that order. That completes the melt part. The long format is then changed to 'wide' with dcast by specifying the formula, the value.var column and the fun.aggregate (here it is sum).
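Here is a minimal Python sketch of the same two stages, to show what "multiple patterns" buys. Columns matching each pattern are paired by position (the first "^Value" column with the first "^Problem" column, and so on), which is how I understand data.table pairs them. Blank problems are dropped and the rest are summed per Type, with the column order fixed the way the factor levels fix it. The helpers melt_patterns and cast_sum and the rows are hypothetical.

```python
import re
from collections import defaultdict


def melt_patterns(rows, id_vars, patterns, value_names):
    """Columns matching each regex are paired up by position and each
    pair becomes one long row with the given value names."""
    cols = list(rows[0].keys())
    groups = [[c for c in cols if re.match(p, c)] for p in patterns]
    long = []
    for r in rows:
        for matched in zip(*groups):
            rec = {k: r[k] for k in id_vars}
            rec.update({name: r[c] for name, c in zip(value_names, matched)})
            long.append(rec)
    return long


def cast_sum(long, row_var, col_var, value_var, col_order):
    """Sum value_var per (row_var, col_var); every column in col_order is
    present (0 when empty); blank col_var entries are skipped."""
    wide = defaultdict(lambda: dict.fromkeys(col_order, 0))
    for r in long:
        if r[col_var] == "":          # like Problem != ""
            continue
        wide[r[row_var]][r[col_var]] += r[value_var]
    return dict(wide)


rows = [  # hypothetical wide data with blanks
    {"Type": "A", "Value1": 100, "Problem1": "X", "Value2": 200, "Problem2": "Y"},
    {"Type": "A", "Value1": 300, "Problem1": "X", "Value2": None, "Problem2": ""},
    {"Type": "B", "Value1": 50, "Problem1": "Z", "Value2": 70, "Problem2": "Y"},
]
long = melt_patterns(rows, ["Type"], ["^Value", "^Problem"], ["Value", "Problem"])
wide = cast_sum(long, "Type", "Problem", "Value", ["X", "Y", "Z", "W", "V"])
print(wide)
```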
How can I reshape a wide dataset into a long dataset using multiple columns?
Let's try melt first, then pivot:
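tmp = df.melt(['ID','Treatment'], var_name='Animal')
# keep only the animal name, i.e. the part before the first underscore
tmp['Animal'] = tmp['Animal'].str.extract('^([^_]+)', expand=False)
# number rows within each Animal/Treatment pair so every pivot cell is unique
tmp['ID'] = tmp.groupby(['Animal','Treatment']).cumcount()
out = (tmp.pivot_table(index=['Animal','ID'], columns=['Treatment'],
                       values='value')
          .add_prefix('Animal_Weight_').reset_index()
      )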
Output:
Treatment Animal ID Animal_Weight_A Animal_Weight_B
0 Cat 0 10 50
1 Cat 1 20 60
2 Cat 2 30 70
3 Cat 3 40 80
4 Dog 0 20 60
5 Dog 1 30 70
6 Dog 2 40 80
7 Dog 3 50 90
8 Horse 0 100 500
9 Horse 1 200 600
10 Horse 2 300 700
11 Horse 3 400 800
12 Pig 0 1000 650
13 Pig 1 550 450
14 Pig 2 750 500
15 Pig 3 800 600
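The cumcount step is the key trick. Without it, all rows for a given Animal/Treatment pair would land in the same pivot cell and pivot_table would average them together (its default aggregation is the mean). The minimal stdlib sketch below reproduces that numbering and the pivot that follows. cumcount and pivot here are hypothetical helpers written for illustration, not the pandas methods, and the long rows stand in for the result of the melt and str.extract steps.

```python
from collections import defaultdict


def cumcount(rows, keys):
    """0, 1, 2, ... within each group, in row order."""
    seen = defaultdict(int)
    counts = []
    for r in rows:
        k = tuple(r[c] for c in keys)
        counts.append(seen[k])
        seen[k] += 1
    return counts


def pivot(rows, index, columns, values, prefix=""):
    """Assumes each (index..., columns) cell occurs once, so no aggregation."""
    wide = {}
    for r in rows:
        key = tuple(r[c] for c in index)
        wide.setdefault(key, {})[prefix + str(r[columns])] = r[values]
    return wide


# Hypothetical long rows, as left by the melt + str.extract steps
long = [
    {"Animal": "Cat", "Treatment": "A", "value": 10},
    {"Animal": "Cat", "Treatment": "A", "value": 20},
    {"Animal": "Cat", "Treatment": "B", "value": 50},
    {"Animal": "Cat", "Treatment": "B", "value": 60},
]
for r, k in zip(long, cumcount(long, ["Animal", "Treatment"])):
    r["ID"] = k
wide = pivot(long, ["Animal", "ID"], "Treatment", "value", prefix="Animal_Weight_")
print(wide)
```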
Reshaping data from long to wide with both sums and counts
From data.table v1.9.6, it is possible to cast multiple value.var columns and also cast by providing multiple fun.aggregate functions. See below:
library(data.table)
df <- data.table(df)
dcast(df, id ~ type, fun = list(length, sum), value.var = c("val"))
id val_length_A val_length_B val_length_C val_sum_A val_sum_B val_sum_C
1: 1 2 1 0 1 2 0
2: 2 1 1 1 0 0 4
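Passing a list of functions simply applies each function to every id/type cell and prefixes the column name with the function's name. The Python sketch below shows that idea. The helper cast_multi is hypothetical, and the input rows are made up so that they reproduce the output above. Empty cells are computed by applying each function to an empty list, which gives 0 for both length and sum.

```python
from collections import defaultdict


def cast_multi(rows, row_var, col_var, value_var, funs):
    """Apply every function in funs to every (row, column) cell; output
    columns are named <value>_<function>_<column level>."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r[row_var], r[col_var])].append(r[value_var])
    row_ids = sorted({k[0] for k in groups})
    col_ids = sorted({k[1] for k in groups})
    return {
        rid: {
            f"{value_var}_{fname}_{cid}": fn(groups.get((rid, cid), []))
            for fname, fn in funs.items()
            for cid in col_ids
        }
        for rid in row_ids
    }


df = [  # hypothetical rows consistent with the output above
    {"id": 1, "type": "A", "val": 0},
    {"id": 1, "type": "A", "val": 1},
    {"id": 1, "type": "B", "val": 2},
    {"id": 2, "type": "A", "val": 0},
    {"id": 2, "type": "B", "val": 0},
    {"id": 2, "type": "C", "val": 4},
]
wide = cast_multi(df, "id", "type", "val", {"length": len, "sum": sum})
print(wide)
```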
How to reshape data from long to wide format
Using the base reshape function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
Reshape from long to wide according to the number of occurrence of one variable
You can try something like this:
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
group_by(person, visitID) %>%
summarise(across(matches("v[0-9]+"), list)) %>%
group_by(person) %>%
mutate(visit = seq_len(n()) %>% str_c("visit.", .)) %>%
ungroup() %>%
pivot_wider(
id_cols = person,
names_from = visit,
values_from = c("visitID", matches("v[0-9]+"))
)
Replace list with ~str_c(.x, collapse = ",") if you want the values as comma-separated character strings rather than list-columns.
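Here is a minimal Python sketch of the same pipeline: collapse repeated person/visitID rows into one list per value column, number each person's visits, then spread them into visitID_visit.N and v1_visit.N style columns. The helper widen_visits, the v1/v2 column names and the rows are hypothetical. One difference is that this sketch numbers visits in order of first appearance, whereas group_by plus summarise sorts by visitID first. The optional collapse argument plays the role of str_c(.x, collapse = ",").

```python
from collections import defaultdict


def widen_visits(rows, value_cols, collapse=None):
    # 1. collapse repeated (person, visitID) rows into one list per value column
    by_visit = defaultdict(lambda: defaultdict(list))
    order = []
    for r in rows:
        key = (r["person"], r["visitID"])
        if key not in by_visit:
            order.append(key)
        for c in value_cols:
            by_visit[key][c].append(r[c])
    # 2. number each person's visits and spread them into columns
    wide = defaultdict(dict)
    visit_no = defaultdict(int)
    for person, visit_id in order:
        visit_no[person] += 1
        label = f"visit.{visit_no[person]}"
        wide[person][f"visitID_{label}"] = visit_id
        for c in value_cols:
            vals = by_visit[(person, visit_id)][c]
            wide[person][f"{c}_{label}"] = collapse(vals) if collapse else vals
    return dict(wide)


rows = [  # hypothetical: visit "a" has two rows
    {"person": "p1", "visitID": "a", "v1": 1, "v2": 2},
    {"person": "p1", "visitID": "a", "v1": 3, "v2": 4},
    {"person": "p1", "visitID": "b", "v1": 5, "v2": 6},
]
as_lists = widen_visits(rows, ["v1", "v2"])
as_text = widen_visits(rows, ["v1", "v2"], collapse=lambda v: ",".join(map(str, v)))
print(as_lists)
print(as_text)
```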
Reshape long to wide where most columns have multiple values
You can do this with the base function reshape after adding a consecutive count within each IDnum. Assuming your data is stored in a data.frame named df:
df2 <- within(df, count <- ave(rep(1, nrow(df)), df$IDnum, FUN = cumsum))
This adds a new column named "count" holding the consecutive count. Now we can reshape to wide format:
reshape(df2, direction = "wide", idvar = "IDnum", timevar = "count")
IDnum zipcode.1 City.1 County.1 State.1 zipcode.2 City.2 County.2 State.2 zipcode.3 City.3 County.3 State.3 zipcode.4 City.4 County.4 State.4
1 10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga AL 36051 Marbury Autauga AL 36051 Prattville Autauga AL
(output truncated, goes all the way to zipcode.12, etc.)
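The count column is what gives reshape a timevar. ave(..., FUN = cumsum) over a vector of ones numbers the rows 1, 2, 3, ... within each IDnum, and reshape then suffixes every other column with that number. Here is a minimal Python sketch of both steps. The helpers add_count and reshape_wide are hypothetical, and the two input rows are taken from the first entries of the output above.

```python
from collections import defaultdict


def add_count(rows, key):
    """Running 1, 2, 3, ... within each key, like ave(rep(1, n), key, FUN = cumsum)."""
    seen = defaultdict(int)
    out = []
    for r in rows:
        seen[r[key]] += 1
        out.append({**r, "count": seen[r[key]]})
    return out


def reshape_wide(rows, idvar, timevar):
    """One record per idvar; every other column gets a ".<timevar>" suffix."""
    wide = {}
    for r in rows:
        rec = wide.setdefault(r[idvar], {idvar: r[idvar]})
        for k, v in r.items():
            if k not in (idvar, timevar):
                rec[f"{k}.{r[timevar]}"] = v
    return wide


df = [
    {"IDnum": 10011, "zipcode": 36006, "City": "Billingsley", "County": "Autauga", "State": "AL"},
    {"IDnum": 10011, "zipcode": 36022, "City": "Deatsville", "County": "Autauga", "State": "AL"},
]
df2 = add_count(df, "IDnum")
wide = reshape_wide(df2, idvar="IDnum", timevar="count")
print(wide)
```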