Reshaping Several Variables Wide with Cast

Reshaping several variables wide with cast

I think the problem is that ff.df is not yet sufficiently molten. Try this:

library(reshape)

# Melt it down
ff.melt <- melt(ff.df, id.var = c("surveyNum", ".id"))

# Note the new "variable" column, which will be combined
# with .id to make each column header
head(ff.melt)

surveyNum .id variable value
1 1 1 pio 2
2 2 1 pio 2
3 3 1 pio 1
4 4 1 pio 2
5 5 1 pio 1
6 6 1 pio 1

# Cast it out - note that .id comes after variable in the formula;
# I think the only effect of that is that you get "pio_1" instead of "1_pio"
ff.cast <- cast(ff.melt, surveyNum ~ variable + .id)

head(ff.cast)

surveyNum pio_1 pio_2 pio_3 caremgmt_1 caremgmt_2 caremgmt_3 prev_1 prev_2 prev_3 price_1 price_2 price_3
1 1 2 2 2 2 1 1 1 2 2 2 6 3
2 2 2 1 2 1 2 2 2 2 1 1 5 5
3 3 1 2 1 1 2 1 2 1 2 2 5 2
4 4 2 1 1 2 2 2 1 2 2 5 4 5
5 5 1 2 2 1 2 1 1 1 1 3 4 4
6 6 1 2 1 2 1 1 2 1 1 4 2 5

Does that do the trick for you?

Essentially, when casting, the variables indicated on the right-hand side of the casting formula dictate the columns that will appear in the cast result. By indicating only .id, I believe that you were asking cast to somehow cram all of those vectors of values into just three columns - 1, 2, and 3. Melting the data all the way down creates the variable column, which lets you specify that the combination of the .id and variable vectors should define the columns of the cast data frame.

(Sorry if I'm being repetitious/pedantic! I'm trying to work it out for myself, too)

Reshape multiple value columns to wide format

Your best option is to reshape your data to long format, using melt, and then to dcast:

library(reshape2)

meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)

The first few lines of output:

             expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1 Adjustment 442.37 0.124025031 2.00 0.0005064625
2 Bank Service Charge 200.00 0.056072985 200.00 0.0506462461
3 Cable 21.33 0.005980184 36.33 0.0091998906
4 Charity 0.00 0.000000000 0.00 0.0000000000

Reshape data from long to wide format - more than one variable

The dcast() statement given by the OP works almost perfect with the recent versions of the data.table package as these allow for multiple measure variables to be used with dcast() and melt():

library(data.table)   # CRAN version 1.10.4
setDT(world) # coerce to data.table
data_wide <- dcast(world, Country ~ Year,
value.var = c("Growth", "Unemployment", "Population"))

data_wide
# Country Growth_2015 Growth_2016 Growth_2017 Unemployment_2015 Unemployment_2016 Unemployment_2017 Population_2015
#1: A 2.0 4.0 4.5 8.3 8.1 8.1 40
#2: B 3.0 3.5 4.4 9.2 9.0 8.4 32
#3: C 2.5 3.7 4.3 9.1 9.0 8.5 30
#4: D 1.5 3.1 4.2 6.1 5.3 5.2 27
# Population_2016 Population_2017
1: 42.0 42.5
2: 32.5 33.0
3: 31.0 30.0
4: 29.0 30.0

This is the same result as the tidyr solution.


However, the OP has requested a specific column order for his ideal solution where the different measure variables of each year are grouped together.

If the proper order of columns is important, there are two ways to achieve this. The first approach is to reorder the columns appropriately using setcolorder():

new_ord <- CJ(world$Year, c("Growth","Unemployment","Population"), 
sorted = FALSE, unique = TRUE)[, paste(V2, V1, sep = "_")]
setcolorder(data_wide, c("Country", new_ord))

data_wide
# Country Growth_2015 Unemployment_2015 Population_2015 Growth_2016 Unemployment_2016 Population_2016 Growth_2017
#1: A 2.0 8.3 40 4.0 8.1 42.0 4.5
#2: B 3.0 9.2 32 3.5 9.0 32.5 4.4
#3: C 2.5 9.1 30 3.7 9.0 31.0 4.3
#4: D 1.5 6.1 27 3.1 5.3 29.0 4.2
# Unemployment_2017 Population_2017
#1: 8.1 42.5
#2: 8.4 33.0
#3: 8.5 30.0
#4: 5.2 30.0

Note the the cross join function CJ() is used to create the cross product of the vectors.


The other approach to achieve the desired column order is to melt and recast:

molten <- melt(world, id.vars = c("Country", "Year"))
dcast(molten, Country ~ Year + variable)
# Country 2015_Growth 2015_Unemployment 2015_Population 2016_Growth 2016_Unemployment 2016_Population 2017_Growth
#1: A 2.0 8.3 40 4.0 8.1 42.0 4.5
#2: B 3.0 9.2 32 3.5 9.0 32.5 4.4
#3: C 2.5 9.1 30 3.7 9.0 31.0 4.3
#4: D 1.5 6.1 27 3.1 5.3 29.0 4.2
# 2017_Unemployment 2017_Population
#1: 8.1 42.5
#2: 8.4 33.0
#3: 8.5 30.0
#4: 5.2 30.0

Reshape long to wide where most columns have multiple values

You can do this with the base function reshape after adding in a consecutive count by IDnum. Assuming your data is stored in a data.frame named df:

df2 <- within(df, count <- ave(rep(1,nrow(df)),df$IDnum,FUN=cumsum)) 

Provides a new column of the consecutive count named "time". And now we can reshape to wide format

reshape(df2,direction="wide",idvar="IDnum",timevar="count") 

IDnum zipcode.1 City.1 County.1 State.1 zipcode.2 City.2 County.2 State.2 zipcode.3 City.3 County.3 State.3 zipcode.4 City.4 County.4 State.4
1 10011 36006 Billingsley Autauga AL 36022 Deatsville Autauga AL 36051 Marbury Autauga AL 36051 Prattville Autauga AL

(output truncated, goes all the way to zipcode.12, etc.)

Reshaping data.frame with a by-group where id variable repeats

You can create a unique row value for each id and use pivot_wider.

have %>%
group_by(id) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = id, values_from = values) %>%
select(-row)

# A tibble: 4 x 3
# `1001` `1002` `1003`
# <int> <int> <int>
#1 1 3 1
#2 3 2 3
#3 2 2 3
#4 2 2 3

Or using data.table

library(data.table)
dcast(setDT(have), rowid(id)~id, value.var = 'values')

data

df <- structure(list(id = c(1001L, 1001L, 1001L, 1001L, 1002L, 1002L, 
1002L, 1002L, 1003L, 1003L, 1003L, 1003L), values = c(2L, 2L,
2L, 3L, 2L, 3L, 2L, 2L, 1L, 3L, 1L, 2L)), class = "data.frame",
row.names = c(NA, -12L))

Reshaping data from long to wide with both sums and counts

From data.table v1.9.6, it is possible to cast multiple value.var columns and also cast by providing multiple fun.aggregate functions. See below:

library(data.table)

df <- data.table(df)
dcast(df, id ~ type, fun = list(length, sum), value.var = c("val"))
id val_length_A val_length_B val_length_C val_sum_A val_sum_B val_sum_C
1: 1 2 1 0 1 2 0
2: 2 1 1 1 0 0 4


Related Topics



Leave a reply



Submit