Wrapping Base R Reshape for Ease-Of-Use

Wrapping base R reshape for ease-of-use

I would also like to see an option to order the output, since that's one of the things I don't like about reshape in base R. As an example, let's use the Stata Learning Module: Reshaping data wide to long, which you are already familiar with. The example I'm looking at is the "kids height and weight at age 1 and age 2" example.

Here's what I normally do with reshape():

# library(foreign)
kidshtwt = read.dta("http://www.ats.ucla.edu/stat/stata/modules/kidshtwt.dta")
kidshtwt.l = reshape(kidshtwt, direction="long", idvar=1:2,
varying=3:6, sep="", timevar="age")
# The reshaped data is correct, just not in the order I want it
# so I always have to do another step like this
kidshtwt.l = kidshtwt.l[order(kidshtwt.l$famid, kidshtwt.l$birth),]

Since this is an annoying step that I always have to go through when reshaping the data, I think it would be useful to add that into your function.

I also suggest at least having an option for doing the same thing with the final column order for reshaping from long to wide.

Example function for column ordering

I'm not sure of the best way to integrate this into your function, but I put this together to sort a data frame based on basic patterns for the variable names.

col.name.sort = function(data, patterns) {
a = names(data)
b = length(patterns)

subs = vector("list", b)

for (i in 1:b) {
subs[[i]] = sort(grep(patterns[i], a, value=T))
}
x = unlist(subs)
data[ , x ]
}

It can be used in the following manner. Imagine we had saved the output of your reshapeasy long to wide example as a data frame named a, and we wanted it ordered by "surveyNum", "caremgmt" (1-3), "prev" (1-3), "pio" (1-3), and "price" (1-3), we could use:

col.name.sort(a, c("sur", "car", "pre", "pio", "pri"))

How to reproduce the result of pivot_longer with reshape in base R?

1) post process The reshape command in the question produces the same output except for row names, row order and an extra id column so just fix those up.

At the end we run pivot_longer and converting its output to data.frame show that that is identical to the fixed-up output of reshape.

out <- reshape(cprefm, dir = "long", varying = names(cprefm), 
v.names = "value", timevar = "profile", times = names(cprefm))
out <- out[order(out$id), 1:2]
rownames(out) <- NULL

out.piv <- cprefm %>% pivot_longer(1:16, "profile", "rating")

identical(out, as.data.frame(out.piv))
## [1] TRUE

2) pre process w transpose Alternately fix it up before passing it to reshape by reshaping the transpose of cprefm. In that case we only have to select out the desired columns while the row order comes out as desired without sorting.

out2 <- reshape(as.data.frame(t(cprefm)), dir = "long",
varying = 1:nrow(cprefm), idvar = "profile", v.names = "value",
ids = names(cprefm), new.row.names = 1:prod(dim(cprefm)))[3:2]

identical(out2, as.data.frame(out.piv))
## [1] TRUE

3) as.data.frame.table w transpose The transpose idea also works with as.data.frame.table:

out3 <- with(as.data.frame.table(t(cprefm), responseName = "value"),
data.frame(profile = as.character(Var1), value))

identical(out3, as.data.frame(out.piv))
## [1] TRUE

This one can be nicely written in terms of pipes like this:

cprefm |>
t() |>
as.data.frame.table(responseName = "value") |>
with(data.frame(profile = as.character(Var1), value))

4) stack w tranpose and with stack:

out4 <- with(stack(as.data.frame(t(cprefm))), 
data.frame(profile = names(cprefm), value = values))

identical(out4, as.data.frame(out.piv))
## [1] TRUE

Reshape in the middle

I am not sure if this is too obvious, but here goes. It should be self-explanatory. Pass in your so dataframe and it returns the reshaped data.

library("reshape2")

reshape.middle <- function(dat) {
dat <- melt(so, id="WorkerId")
dat$set <- substr(dat$variable, 5,5)
dat$name <- paste(substr(dat$variable, 1, 4),
substr(dat$variable, 7, 7),
sep="")
dat$variable <- NULL

dat <- melt(dat, id=c("WorkerId", "set", "name"))
dat$variable <- NULL

return(dcast(dat, WorkerId + set ~ name))
}

so # initial form
so <- reshape.middle(so)
so # as needed

Hope this helps.

Reshaping several variables wide with cast

I think the problem is that ff.df is not yet sufficiently molten. Try this:

library(reshape)

# Melt it down
ff.melt <- melt(ff.df, id.var = c("surveyNum", ".id"))

# Note the new "variable" column, which will be combined
# with .id to make each column header
head(ff.melt)

surveyNum .id variable value
1 1 1 pio 2
2 2 1 pio 2
3 3 1 pio 1
4 4 1 pio 2
5 5 1 pio 1
6 6 1 pio 1

# Cast it out - note that .id comes after variable in the formula;
# I think the only effect of that is that you get "pio_1" instead of "1_pio"
ff.cast <- cast(ff.melt, surveyNum ~ variable + .id)

head(ff.cast)

surveyNum pio_1 pio_2 pio_3 caremgmt_1 caremgmt_2 caremgmt_3 prev_1 prev_2 prev_3 price_1 price_2 price_3
1 1 2 2 2 2 1 1 1 2 2 2 6 3
2 2 2 1 2 1 2 2 2 2 1 1 5 5
3 3 1 2 1 1 2 1 2 1 2 2 5 2
4 4 2 1 1 2 2 2 1 2 2 5 4 5
5 5 1 2 2 1 2 1 1 1 1 3 4 4
6 6 1 2 1 2 1 1 2 1 1 4 2 5

Does that do the trick for you?

Essentially, when casting, the variables indicated on the right-hand side of the casting formula dictate the columns that will appear in the cast result. By indicating only .id, I believe that you were asking cast to somehow cram all of those vectors of values into just three columns - 1, 2, and 3. Melting the data all the way down creates the variable column, which lets you specify that the combination of the .id and variable vectors should define the columns of the cast data frame.

(Sorry if I'm being repetitious/pedantic! I'm trying to work it out for myself, too)

How to prevent reshape() from changing variable names

I suggest you use the reshape2 package instead - it's much easier to use melt and dcast than the built-in reshape():

library(reshape2)
dcast(cdc_city, site~variable, mean)
site var 1 var 2 var 3 var 4 var 5
1 site 1 4 7 2 6 3
2 site 2 89 43 12 54 23
3 site 3 76 62 13 43 23

Confusion on base reshape example

Because the subset operation on wide data.frame removes the custom attributes added by reshape and used by reshape itself to automagically perform the opposite reshaping.

In fact as you can notice the attributes list of wide contains reshapeWide storing all the necessary information to revert the reshape :

> names(attributes(wide))
[1] "row.names" "names" "class" "reshapeWide"

> attributes(wide)$reshapeWide
$v.names
[1] "conc"

$timevar
[1] "time"

$idvar
[1] "Subject"

$times
[1] 0.25 0.50 0.75 1.00 1.25 2.00 3.00 4.00 5.00 6.00 8.00

$varying
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] "conc.0.25" "conc.0.5" "conc.0.75" "conc.1" "conc.1.25" "conc.2" "conc.3" "conc.4" "conc.5" "conc.6" "conc.8"

while wide2 does not :

> names(attributes(wide2))
[1] "names" "class" "row.names"

How to use R's reshape in the following goal:

If its not necessary to use reshape

out <- unlist(lapply(SV37.refined, as.character))
out[out != "0"]
## Sample11 Sample21 Sample22 Sample181
## "string1" "string2" "string1" "string3"

or if you're into one-liners

Filter(function(x) x != "0", unlist(lapply(SV37.refined, as.character)))
## Sample11 Sample21 Sample22 Sample181
## "string1" "string2" "string1" "string3"

What am I doing wrong with this long-to-wide reshape?

I could not reproduce your error and I found some problems with dummy.plug(). I generated data with

library(data.table)
library(reshape2)
library("MASS")

set.seed(115)
pp<-dummy.plug(trial.count = 200,
chan.count = 4,
mean.diff = 100,
value.name = "Value")

And I don't care for data.table so i just converted it to a basic data.frame.

dd<-as.data.frame(pp)

Now you say that Time, Trial, and Channel should uniquely identify a value, but that does not seem to be the case in the dummy data. I see that

subset(dd, Time==-100 & Trial==1 & Channel=="V1")

# Time Trial Class Channel Value
# 1 -100 1 -1 V1 0.73642916
# 6401 -100 1 -1 V1 0.17648939
# 12801 -100 1 -1 V1 0.41366964
# 19201 -100 1 -1 V1 0.07044473
# 25601 -100 1 -1 V1 0.86583284
# 32001 -100 1 -1 V1 0.24255411
# 38401 -100 1 -1 V1 0.92473225
# 44801 -100 1 -1 V1 0.69989600

So there are clearly multiple values for each combination. So to proceed, I decided just to take the mean of the observed values. I had no problems using dcast with

xx<-dcast(dd, Class+Time+Trial~Channel, fun.aggregate=mean)

Then I split up the training/test datasets

train.trials = sample(unique(dd$Trial), 140)
train.data = subset(xx, Trial %in% train.trials)
test.data = subset(xx, !Trial %in% train.trials)

Then I ran lda as above

lda.model <- lda(Class ~ . -Trial, train.data)
lda.pred <- predict(lda.model, test.data)

And I checked out how I did

table(lda.pred$class, test.data$Class)
# -1 1
# -1 704 0
# 1 0 1216

And I appear to do much better than you did.

Unless something bad happened when i converted the data.table to a data.frame, there seems to be problems with your test data. Perhaps there is a problem with your non-cast reshape function. Seeing as how dcast works just fine, maybe you want to check that your function works as well.



Related Topics



Leave a reply



Submit