Wrapping base R reshape for ease-of-use
I would also like to see an option to order the output, since that's one of the things I don't like about reshape in base R. As an example, let's use the Stata Learning Module: Reshaping data wide to long, which you are already familiar with. The example I'm looking at is the "kids height and weight at age 1 and age 2" example.
Here's what I normally do with reshape()
:
# library(foreign)
kidshtwt = read.dta("http://www.ats.ucla.edu/stat/stata/modules/kidshtwt.dta")
kidshtwt.l = reshape(kidshtwt, direction="long", idvar=1:2,
varying=3:6, sep="", timevar="age")
# The reshaped data is correct, just not in the order I want it
# so I always have to do another step like this
kidshtwt.l = kidshtwt.l[order(kidshtwt.l$famid, kidshtwt.l$birth),]
Since this is an annoying step that I always have to go through when reshaping the data, I think it would be useful to add that into your function.
I also suggest at least having an option for doing the same thing with the final column order for reshaping from long
to wide
.
Example function for column ordering
I'm not sure of the best way to integrate this into your function, but I put this together to sort a data frame based on basic patterns for the variable names.
col.name.sort = function(data, patterns) {
a = names(data)
b = length(patterns)
subs = vector("list", b)
for (i in 1:b) {
subs[[i]] = sort(grep(patterns[i], a, value=T))
}
x = unlist(subs)
data[ , x ]
}
It can be used in the following manner. Imagine we had saved the output of your reshapeasy
long
to wide
example as a data frame named a
, and we wanted it ordered by "surveyNum", "caremgmt" (1-3), "prev" (1-3), "pio" (1-3), and "price" (1-3), we could use:
col.name.sort(a, c("sur", "car", "pre", "pio", "pri"))
How to reproduce the result of pivot_longer with reshape in base R?
1) post process The reshape command in the question produces the same output except for row names, row order and an extra id column so just fix those up.
At the end we run pivot_longer and converting its output to data.frame show that that is identical to the fixed-up output of reshape.
out <- reshape(cprefm, dir = "long", varying = names(cprefm),
v.names = "value", timevar = "profile", times = names(cprefm))
out <- out[order(out$id), 1:2]
rownames(out) <- NULL
out.piv <- cprefm %>% pivot_longer(1:16, "profile", "rating")
identical(out, as.data.frame(out.piv))
## [1] TRUE
2) pre process w transpose Alternately fix it up before passing it to reshape by reshaping the transpose of cprefm. In that case we only have to select out the desired columns while the row order comes out as desired without sorting.
out2 <- reshape(as.data.frame(t(cprefm)), dir = "long",
varying = 1:nrow(cprefm), idvar = "profile", v.names = "value",
ids = names(cprefm), new.row.names = 1:prod(dim(cprefm)))[3:2]
identical(out2, as.data.frame(out.piv))
## [1] TRUE
3) as.data.frame.table w transpose The transpose idea also works with as.data.frame.table:
out3 <- with(as.data.frame.table(t(cprefm), responseName = "value"),
data.frame(profile = as.character(Var1), value))
identical(out3, as.data.frame(out.piv))
## [1] TRUE
This one can be nicely written in terms of pipes like this:
cprefm |>
t() |>
as.data.frame.table(responseName = "value") |>
with(data.frame(profile = as.character(Var1), value))
4) stack w tranpose and with stack:
out4 <- with(stack(as.data.frame(t(cprefm))),
data.frame(profile = names(cprefm), value = values))
identical(out4, as.data.frame(out.piv))
## [1] TRUE
Reshape in the middle
I am not sure if this is too obvious, but here goes. It should be self-explanatory. Pass in your so
dataframe and it returns the reshaped data.
library("reshape2")
reshape.middle <- function(dat) {
dat <- melt(so, id="WorkerId")
dat$set <- substr(dat$variable, 5,5)
dat$name <- paste(substr(dat$variable, 1, 4),
substr(dat$variable, 7, 7),
sep="")
dat$variable <- NULL
dat <- melt(dat, id=c("WorkerId", "set", "name"))
dat$variable <- NULL
return(dcast(dat, WorkerId + set ~ name))
}
so # initial form
so <- reshape.middle(so)
so # as needed
Hope this helps.
Reshaping several variables wide with cast
I think the problem is that ff.df
is not yet sufficiently molten. Try this:
library(reshape)
# Melt it down
ff.melt <- melt(ff.df, id.var = c("surveyNum", ".id"))
# Note the new "variable" column, which will be combined
# with .id to make each column header
head(ff.melt)
surveyNum .id variable value
1 1 1 pio 2
2 2 1 pio 2
3 3 1 pio 1
4 4 1 pio 2
5 5 1 pio 1
6 6 1 pio 1
# Cast it out - note that .id comes after variable in the formula;
# I think the only effect of that is that you get "pio_1" instead of "1_pio"
ff.cast <- cast(ff.melt, surveyNum ~ variable + .id)
head(ff.cast)
surveyNum pio_1 pio_2 pio_3 caremgmt_1 caremgmt_2 caremgmt_3 prev_1 prev_2 prev_3 price_1 price_2 price_3
1 1 2 2 2 2 1 1 1 2 2 2 6 3
2 2 2 1 2 1 2 2 2 2 1 1 5 5
3 3 1 2 1 1 2 1 2 1 2 2 5 2
4 4 2 1 1 2 2 2 1 2 2 5 4 5
5 5 1 2 2 1 2 1 1 1 1 3 4 4
6 6 1 2 1 2 1 1 2 1 1 4 2 5
Does that do the trick for you?
Essentially, when casting, the variables indicated on the right-hand side of the casting formula dictate the columns that will appear in the cast result. By indicating only .id
, I believe that you were asking cast
to somehow cram all of those vectors of values into just three columns - 1, 2, and 3. Melting the data all the way down creates the variable
column, which lets you specify that the combination of the .id
and variable
vectors should define the columns of the cast data frame.
(Sorry if I'm being repetitious/pedantic! I'm trying to work it out for myself, too)
How to prevent reshape() from changing variable names
I suggest you use the reshape2
package instead - it's much easier to use melt
and dcast
than the built-in reshape()
:
library(reshape2)
dcast(cdc_city, site~variable, mean)
site var 1 var 2 var 3 var 4 var 5
1 site 1 4 7 2 6 3
2 site 2 89 43 12 54 23
3 site 3 76 62 13 43 23
Confusion on base reshape example
Because the subset operation on wide
data.frame removes the custom attributes added by reshape
and used by reshape
itself to automagically perform the opposite reshaping.
In fact as you can notice the attributes list of wide
contains reshapeWide
storing all the necessary information to revert the reshape :
> names(attributes(wide))
[1] "row.names" "names" "class" "reshapeWide"
> attributes(wide)$reshapeWide
$v.names
[1] "conc"
$timevar
[1] "time"
$idvar
[1] "Subject"
$times
[1] 0.25 0.50 0.75 1.00 1.25 2.00 3.00 4.00 5.00 6.00 8.00
$varying
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] "conc.0.25" "conc.0.5" "conc.0.75" "conc.1" "conc.1.25" "conc.2" "conc.3" "conc.4" "conc.5" "conc.6" "conc.8"
while wide2
does not :
> names(attributes(wide2))
[1] "names" "class" "row.names"
How to use R's reshape in the following goal:
If its not necessary to use reshape
out <- unlist(lapply(SV37.refined, as.character))
out[out != "0"]
## Sample11 Sample21 Sample22 Sample181
## "string1" "string2" "string1" "string3"
or if you're into one-liners
Filter(function(x) x != "0", unlist(lapply(SV37.refined, as.character)))
## Sample11 Sample21 Sample22 Sample181
## "string1" "string2" "string1" "string3"
What am I doing wrong with this long-to-wide reshape?
I could not reproduce your error and I found some problems with dummy.plug()
. I generated data with
library(data.table)
library(reshape2)
library("MASS")
set.seed(115)
pp<-dummy.plug(trial.count = 200,
chan.count = 4,
mean.diff = 100,
value.name = "Value")
And I don't care for data.table so i just converted it to a basic data.frame.
dd<-as.data.frame(pp)
Now you say that Time
, Trial
, and Channel
should uniquely identify a value, but that does not seem to be the case in the dummy data. I see that
subset(dd, Time==-100 & Trial==1 & Channel=="V1")
# Time Trial Class Channel Value
# 1 -100 1 -1 V1 0.73642916
# 6401 -100 1 -1 V1 0.17648939
# 12801 -100 1 -1 V1 0.41366964
# 19201 -100 1 -1 V1 0.07044473
# 25601 -100 1 -1 V1 0.86583284
# 32001 -100 1 -1 V1 0.24255411
# 38401 -100 1 -1 V1 0.92473225
# 44801 -100 1 -1 V1 0.69989600
So there are clearly multiple values for each combination. So to proceed, I decided just to take the mean of the observed values. I had no problems using dcast
with
xx<-dcast(dd, Class+Time+Trial~Channel, fun.aggregate=mean)
Then I split up the training/test datasets
train.trials = sample(unique(dd$Trial), 140)
train.data = subset(xx, Trial %in% train.trials)
test.data = subset(xx, !Trial %in% train.trials)
Then I ran lda as above
lda.model <- lda(Class ~ . -Trial, train.data)
lda.pred <- predict(lda.model, test.data)
And I checked out how I did
table(lda.pred$class, test.data$Class)
# -1 1
# -1 704 0
# 1 0 1216
And I appear to do much better than you did.
Unless something bad happened when i converted the data.table to a data.frame, there seems to be problems with your test data. Perhaps there is a problem with your non-cast reshape function. Seeing as how dcast
works just fine, maybe you want to check that your function works as well.
Related Topics
Converting Utc Time to Local Standard Time in R
Assigning and Removing Objects in a Loop: Eval(Parse(Paste(
R: Clustering Results Are Different Everytime I Run
Fuzzyjoin Two Data Frames Using Data.Table
How to Classify a Given Date/Time by the Season (E.G. Summer, Autumn)
How to Set Axis Ranges in Ggplot2 When Using a Log Scale
Enriching a Ggplot2 Plot with Multiple Geom_Segment in a Loop
How to Load CSV File into Sparkr on Rstudio
Plot Margins in Rmarkdown/Knitr
Assign Names to Vector Entries Without Assigning the Vector a Variable Name
R Reshape2 'Aggregation Function Missing: Defaulting to Length'
Ggplot Inserting Space Before Degree Symbol on Axis Label
Equal Frequency Discretization in R
R: How to Aggregate Some Columns While Keeping Other Columns
Scientific Notation Issue in R