Aggregate by Multiple Columns and Reshape from Long to Wide

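Use dcast (or acast, if you want a matrix rather than a data frame) from the reshape2 package:

library(reshape2)
dcast(dat, Id ~ Description, fun.aggregate = mean, value.var = "Value")
  Id   Cat   Dog
1 10 14.25 14.25
2 11 15.25 15.25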

Base R is a bit longer:

reshape(aggregate(. ~ Id + Description, dat, mean),
        direction = "wide", v.names = "Value",
        idvar = "Id", timevar = "Description")
  Id Value.Cat Value.Dog
1 10     14.25     14.25
2 11     15.25     15.25
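
The base R one-liner above does two things at once, so it may help to see the steps separately. The following is a minimal sketch that assumes a hypothetical `dat` with columns Id, Description and Value; the values are invented for illustration and are not the original poster's data.

```r
# Hypothetical long-format data: two Value rows per Id/Description pair.
dat <- data.frame(
  Id          = c(10, 10, 10, 10, 11, 11, 11, 11),
  Description = c("Cat", "Cat", "Dog", "Dog", "Cat", "Cat", "Dog", "Dog"),
  Value       = c(14, 14.5, 14, 14.5, 15, 15.5, 15, 15.5)
)

# Step 1: one mean per Id/Description combination (still long format).
agg <- aggregate(Value ~ Id + Description, data = dat, FUN = mean)

# Step 2: spread Description into columns, leaving one row per Id.
wide <- reshape(agg, direction = "wide",
                idvar = "Id", timevar = "Description", v.names = "Value")
print(wide)
```

Splitting the call this way makes it easy to inspect `agg` and confirm that each Id/Description pair appears exactly once before reshaping.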

Combine multiple long to wide transformations into one

You can use pivot_wider directly to spread multiple value columns across multiple name columns, supplying an aggregation function in values_fn:

tidyr::pivot_wider(dat, names_from = c(Month, City),
                   values_from = c(var_1, var_2), values_fn = sum)

The same can be done with dcast from data.table:

library(data.table)
dcast(setDT(dat), G1 + G2 + G3 ~ Month + City,
      value.var = c('var_1', 'var_2'), fun.aggregate = sum)

Reshape multiple value columns to wide format

Your best option is to reshape your data to long format, using melt, and then to dcast:

library(reshape2)

meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)

The first few lines of output:

         expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1          Adjustment           442.37        0.124025031             2.00       0.0005064625
2 Bank Service Charge           200.00        0.056072985           200.00       0.0506462461
3               Cable            21.33        0.005980184            36.33       0.0091998906
4             Charity             0.00        0.000000000             0.00       0.0000000000

Convert data from long format to wide format with multiple measure columns

In order to handle multiple variables like you want, you need to melt the data you have before casting it.

library("reshape2")

dcast(melt(my.df, id.vars=c("ID", "TIME")), ID~variable+TIME)

which gives

  ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1  A   1   4   7  10  13  16  19  22  25  28
2  B   2   5   8  11  14  17  20  23  26  29
3  C   3   6   9  12  15  18  21  24  27  30

EDIT based on comment:

The data frame

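num.id <- 10
num.time <- 10
# Y is twice as long as the other columns, so data.frame() recycles
# ID, TIME and X and the result has 200 rows.
my.df <- data.frame(ID = rep(LETTERS[1:num.id], num.time),
                    TIME = rep(1:num.time, each = num.id),
                    X = 1:(num.id * num.time),
                    Y = (num.id * num.time) + 1:(2 * length(1:(num.id * num.time))))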

gives a different result (all entries are 2) because an ID/TIME combination no longer identifies a unique row. In fact, there are two rows for each ID/TIME combination. reshape2 assumes a single value for each possible combination of the variables and applies a summary function to produce a single value if there are multiple entries. That is why you get the message

Aggregation function missing: defaulting to length

You can get something that works if you add another variable which breaks that redundancy.

my.df$cycle <- rep(1:2, each=num.id*num.time)
dcast(melt(my.df, id.vars=c("cycle", "ID", "TIME")), cycle+ID~variable+TIME)

This works because cycle/ID/TIME now uniquely defines a row in my.df.
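
Before casting, it can be useful to check whether the id columns really identify each row. The sketch below rebuilds the my.df from this answer and uses base R's table and anyDuplicated for the check; the helper name `combo.counts` is introduced here for illustration only.

```r
num.id <- 10
num.time <- 10
# Same construction as above: Y has 200 elements, so the shorter
# columns are recycled and the data frame has 200 rows.
my.df <- data.frame(ID = rep(LETTERS[1:num.id], num.time),
                    TIME = rep(1:num.time, each = num.id),
                    X = 1:(num.id * num.time),
                    Y = (num.id * num.time) + 1:(2 * num.id * num.time))

# Rows per ID/TIME combination; any count above 1 forces dcast to aggregate.
combo.counts <- table(my.df$ID, my.df$TIME)
print(max(combo.counts))

# After adding cycle, the three keys together should identify each row once.
my.df$cycle <- rep(1:2, each = num.id * num.time)
print(anyDuplicated(my.df[c("cycle", "ID", "TIME")]))  # 0 means no duplicates
```

If anyDuplicated returns a non-zero value for the columns on the left-hand side of your formula plus the variable being spread, dcast will fall back to an aggregation function.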

Long to wide on multiple columns by data.table

We can first melt to 'long' format by specifying patterns in measure, and then dcast with sum as the fun.aggregate:

library(data.table)
dcast(melt(dt, measure = patterns("^Value", "^Problem"),
           value.name = c("Value", "Problem"))[Problem != ""
      ][, Problem := factor(Problem, levels = c("X", "Y", "Z", "W", "V"))],
      Type ~ Problem, value.var = "Value", sum, na.rm = TRUE)
#    Type    X    Y    Z   W    V
# 1:    A 1100 1000 1100   0    0
# 2:    B    0  700  100 200  200
# 3:    C 1000    0  500 100 1400

melt from data.table accepts multiple patterns in its measure argument. The pattern "^Value" matches every column whose name starts (^) with "Value", and "^Problem" does the same for the "Problem" columns, so the melt produces two value columns, which value.name names 'Value' and 'Problem'. Because the dataset contains some blanks, the long format also has blank elements, which we remove with Problem != "". The next step matters only if the columns must appear in a specific order: we convert 'Problem' to a factor with the levels in that order. That completes the melt part. Finally, dcast reshapes the long format to 'wide' given the formula, the value.var column, and the fun.aggregate (here, sum).
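
To make the intermediate long table visible, here is a minimal sketch that runs the same steps one at a time. The input `dt` is hypothetical (two Value/Problem column pairs with invented values), since the original data is not shown, and the factor-ordering step is left out for brevity.

```r
library(data.table)

# Hypothetical wide table: two Value/Problem column pairs per row.
dt <- data.table(Type     = c("A", "A", "B"),
                 Value1   = c(100, 200, 300),
                 Problem1 = c("X", "Y", "X"),
                 Value2   = c(10, 20, 30),
                 Problem2 = c("Y", "", "Z"))

# Melt both column families at once: each pattern yields one value column.
long <- melt(dt, measure.vars = patterns("^Value", "^Problem"),
             value.name = c("Value", "Problem"))

# Drop rows with a blank Problem, then cast to wide, summing Value
# within each Type/Problem cell.
long <- long[Problem != ""]
wide <- dcast(long, Type ~ Problem, value.var = "Value",
              fun.aggregate = sum)
print(wide)
```

Printing `long` before the cast is a quick way to confirm that each Value ended up next to the Problem it belongs to.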

How can I reshape a wide dataset into a long dataset using multiple columns?

Let's try melt first, then pivot:

tmp = df.melt(['ID','Treatment'], var_name='Animal')
tmp['Animal'] = tmp['Animal'].str.extract('^([^_]+)')
tmp['ID'] = tmp.groupby(['Animal','Treatment']).cumcount()

out = (tmp.pivot_table(index=['Animal','ID'], columns=['Treatment'],
                       values='value')
          .add_prefix('Animal_Weight_').reset_index()
      )

Output:

Treatment Animal  ID  Animal_Weight_A  Animal_Weight_B
0            Cat   0               10               50
1            Cat   1               20               60
2            Cat   2               30               70
3            Cat   3               40               80
4            Dog   0               20               60
5            Dog   1               30               70
6            Dog   2               40               80
7            Dog   3               50               90
8          Horse   0              100              500
9          Horse   1              200              600
10         Horse   2              300              700
11         Horse   3              400              800
12           Pig   0             1000              650
13           Pig   1              550              450
14           Pig   2              750              500
15           Pig   3              800              600

Reshaping data from long to wide with both sums and counts

From data.table v1.9.6, it is possible to cast multiple value.var columns and also cast by providing multiple fun.aggregate functions. See below:

library(data.table)

df <- data.table(df)
dcast(df, id ~ type, fun.aggregate = list(length, sum), value.var = "val")
   id val_length_A val_length_B val_length_C val_sum_A val_sum_B val_sum_C
1:  1            2            1            0         1         2         0
2:  2            1            1            1         0         0         4
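
For a reproducible version, the sketch below uses a small hypothetical `df`, with values chosen only so that the result resembles the output above; it is not the original poster's data.

```r
library(data.table)

# Hypothetical long data with one value column.
df <- data.table(id   = c(1, 1, 1, 2, 2, 2),
                 type = c("A", "A", "B", "A", "B", "C"),
                 val  = c(0, 1, 2, 0, 0, 4))

# Each function in the list is applied to val within every id/type cell,
# giving one set of wide columns per function.
wide <- dcast(df, id ~ type,
              fun.aggregate = list(length, sum), value.var = "val")
print(wide)
```

The function name becomes part of each column name (for example val_length_A and val_sum_A), so counts and sums stay distinguishable in the wide result.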

How to reshape data from long to wide format

Using the base R reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
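
As a minimal sketch of what this call does, assume a hypothetical `dat1` with an id column name, a time-like column numbers, and a single measure column value (all values invented for illustration):

```r
# Hypothetical long data: one value per name/numbers pair.
dat1 <- data.frame(name    = rep(c("firstName", "secondName"), each = 3),
                   numbers = rep(1:3, times = 2),
                   value   = c(0.3, -0.6, 1.2, 0.5, 0.1, -0.9))

# Every column other than idvar and timevar is spread, one column per
# distinct value of numbers.
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
print(wide)
```

The new columns are named by joining the measure name and the timevar value with a dot, such as value.1.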

Reshape from long to wide according to the number of occurrence of one variable

You can try something like this:

library(dplyr)
library(tidyr)
library(stringr)

df1 %>%
  group_by(person, visitID) %>%
  summarise(across(matches("v[0-9]+"), list)) %>%
  group_by(person) %>%
  mutate(visit = seq_len(n()) %>% str_c("visit.", .)) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = person,
    names_from = visit,
    values_from = c("visitID", matches("v[0-9]+"))
  )

Replace list with ~str_c(.x, collapse = ",") if you want the values as character strings instead.

Reshape long to wide where most columns have multiple values

You can do this with the base function reshape after adding in a consecutive count by IDnum. Assuming your data is stored in a data.frame named df:

df2 <- within(df, count <- ave(rep(1,nrow(df)),df$IDnum,FUN=cumsum)) 

This adds a new column named "count" holding the consecutive count within each IDnum. Now we can reshape to wide format:

reshape(df2,direction="wide",idvar="IDnum",timevar="count") 

IDnum zipcode.1 City.1 County.1 State.1 zipcode.2 City.2 County.2 State.2 zipcode.3 City.3 County.3 State.3 zipcode.4 City.4 County.4 State.4
1 10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga AL 36051 Marbury Autauga AL 36051 Prattville Autauga AL

(output truncated, goes all the way to zipcode.12, etc.)
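
The two steps can be tried on a small example. This is a sketch under assumptions: the `df` below is hypothetical, with placeholder zipcode and City values rather than the original data.

```r
# Hypothetical data: several rows per IDnum, with placeholder values.
df <- data.frame(IDnum   = c(10011, 10011, 10011, 10012, 10012),
                 zipcode = c(11111, 22222, 33333, 44444, 55555),
                 City    = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon"))

# Running count within each IDnum: 1, 2, 3 for the first id, 1, 2 for the second.
df2 <- within(df, count <- ave(rep(1, nrow(df)), df$IDnum, FUN = cumsum))

# One row per IDnum; ids with fewer rows get NA in the trailing columns.
wide <- reshape(df2, direction = "wide", idvar = "IDnum", timevar = "count")
print(wide)
```

Because the second IDnum has only two rows here, its zipcode.3 and City.3 entries come out as NA.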


