Segfault in R Using Reshape2 Package and Dcast

Just to close out this old question: this was a bug that was fixed, as described in this GitHub issue.

Reshaping data in R without using dcast (reshape2)

This would be the pre-Hadley (base R) method: first aggregate to get the sums, then reshape.

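# Sum elasmo.discard within each EID / tspp.name / elasmo.name combination
foo <- aggregate(d[, 4, drop = FALSE], by = d[, 1:3], sum)
# Spread the sums into one column per elasmo.name
reshape(foo, v.names = "elasmo.discard", idvar = c("EID", "tspp.name"),
        timevar = "elasmo.name", direction = "wide")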
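To see the two steps end to end, here is a self-contained sketch on a tiny, made-up d. The column names come from the answer; the values and the species/elasmobranch labels are hypothetical.

```r
# Hypothetical data with the same four columns as d in the question
d <- data.frame(
  EID = c(1, 1, 1, 2, 2),
  tspp.name = c("cod", "cod", "cod", "hake", "hake"),
  elasmo.name = c("skate", "skate", "shark", "skate", "shark"),
  elasmo.discard = c(2, 3, 1, 4, 5),
  stringsAsFactors = FALSE
)

# Step 1: sum the discards within each EID / tspp.name / elasmo.name group
foo <- aggregate(d[, 4, drop = FALSE], by = d[, 1:3], sum)

# Step 2: one row per EID/tspp.name pair, one elasmo.discard.<name> column per elasmo.name
wide <- reshape(foo, v.names = "elasmo.discard", idvar = c("EID", "tspp.name"),
                timevar = "elasmo.name", direction = "wide")
print(wide)
```

The two skate rows for EID 1 are summed in step 1, so they end up as a single value in the elasmo.discard.skate column.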

If the first part is slow, it may help to have fewer columns in the "by" part. It looks like tspp.name is determined by EID; if so, don't aggregate by it, but add it back in afterwards.
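A minimal sketch of that idea, assuming (as the answer suspects) that each EID has exactly one tspp.name. The data are hypothetical, and the stopifnot guard checks the assumption before relying on it.

```r
# Hypothetical small d in which each EID maps to a single tspp.name
d <- data.frame(
  EID = c(1, 1, 1, 2, 2),
  tspp.name = c("cod", "cod", "cod", "hake", "hake"),
  elasmo.name = c("skate", "skate", "shark", "skate", "shark"),
  elasmo.discard = c(2, 3, 1, 4, 5),
  stringsAsFactors = FALSE
)

# Keep one tspp.name per EID aside as a lookup table
lookup <- unique(d[, c("EID", "tspp.name")])
stopifnot(!anyDuplicated(lookup$EID))  # assumption check: tspp.name depends only on EID

# Aggregate with one fewer grouping column
foo <- aggregate(elasmo.discard ~ EID + elasmo.name, data = d, FUN = sum)

# Add tspp.name back in after the fact
foo <- merge(foo, lookup, by = "EID")
print(foo)
```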

If the second part is slow, perhaps try one of the methods here:
https://stackoverflow.com/a/9617424/210673.

To get better help on speeding it up, provide a reproducible example (perhaps generated with sample or rep) that code can be tested on. Solution speed often depends on how many unique combinations of each variable there are.
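One way to build such a test set is sketched below. The sizes, the species codes and the elasmo.name levels are all hypothetical; the point is to control the number of distinct by-combinations and time the aggregation step on data of a known shape.

```r
# Hypothetical generator for a larger test set shaped like d
set.seed(42)
n_eid <- 100     # number of distinct EIDs (assumed)
per_eid <- 100   # rows per EID (assumed)
ids <- rep(seq_len(n_eid), each = per_eid)

d <- data.frame(
  EID = ids,
  tspp.name = paste0("sp", ids %% 10),  # fixed per EID, so it depends only on EID
  elasmo.name = sample(c("skate", "shark", "ray"), length(ids), replace = TRUE),
  elasmo.discard = rpois(length(ids), lambda = 3),
  stringsAsFactors = FALSE
)

# Number of distinct grouping combinations, which drives the cost of aggregate()
n_combos <- nrow(unique(d[, 1:3]))

# Time the first step of the base R approach
timing <- system.time(
  foo <- aggregate(d[, 4, drop = FALSE], by = d[, 1:3], sum)
)
print(timing)
```

Changing n_eid, per_eid or the number of elasmo.name levels shows how the timing scales with the number of combinations.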

Error message running the example from the reshape2 help page

For completeness:

PaulHurleyuk's comment:

Have you tried restarting R and trying the example in a fresh session?
Or do rm(list=ls()) to remove everything from the current session.
In the past I have managed to break things by assigning something to
something that shouldn't be assigned to.

Christoph_J's response:

Thanks ... that was exactly the problem...

The problem occurred because I most probably reassigned the mean
function while I was playing around with a statistics example during
an R session. Restarting R solved the problem. Now everything works
as expected again.
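A minimal base R sketch of how such masking behaves. The vector x and the replacement body of mean are hypothetical, since the thread does not show what was actually assigned. Any call made from the prompt, for instance passing fun.aggregate = mean to dcast, looks mean up in the global environment first, so a stray copy there wins over base::mean. Removing just that object with rm() is a lighter alternative to rm(list=ls()) or a restart.

```r
x <- c(1, 2, 3, 4)

# Simulate the accident: a user-defined `mean` in the current (top-level) environment
mean <- function(x, ...) "not a mean"

# Lookup finds this copy before base::mean
masked <- mean(x)

# TRUE when a masking copy exists in the current environment
has_global_mean <- exists("mean", envir = environment(), inherits = FALSE)

# Remove only the offending object instead of clearing the whole workspace
rm(mean)

# With the copy gone, lookup falls through to base::mean again
restored <- mean(x)
print(restored)
```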

adding row/column total data when aggregating data using plyr and reshape2 package in R

Looking at your desired output (now that I'm in front of a computer), perhaps you should look at the margins argument of dcast:

library(reshape2)
# Count rows per var2/var1 cell; margins = "var1" adds an "(all)" total column
dcast(temp.df, var2 ~ var1, value.var = "var2",
      fun.aggregate = length, margins = "var1")
#   var2 a b c d e (all)
# 1   11 3 1 6 4 2    16
# 2   12 1 3 6 5 5    20
# 3   13 5 9 3 6 1    24
# 4   14 4 7 3 6 2    22
# 5   15 0 5 1 5 7    18

Also look into the addmargins function in base R.
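A base R sketch of the addmargins route. temp.df here is a hypothetical stand-in (the real data is not shown), built only so the code runs. table() produces the same counts as fun.aggregate = length, and addmargins(..., margin = 2) appends a "Sum" column comparable to the "(all)" column from margins = "var1".

```r
# Hypothetical stand-in for temp.df
set.seed(1)
temp.df <- data.frame(
  var1 = sample(letters[1:5], 100, replace = TRUE),
  var2 = sample(11:15, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Cross-tabulate counts: var2 in rows, var1 in columns
counts <- table(var2 = temp.df$var2, var1 = temp.df$var1)

# margin = 2 extends the column dimension with a "Sum" column (one total per row)
with_col_totals <- addmargins(counts, margin = 2)

# With the default margin argument, both a "Sum" row and a "Sum" column are added
with_all_totals <- addmargins(counts)

print(with_col_totals)
```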

caught segfault - 'memory not mapped' error in R

It's not really an explanation of the problem or a satisfactory answer, but I examined the code more closely and found that in the first example the problem appears when using acast from the reshape2 package. I deleted it in this case because I realized it's not actually needed there, but it can also be replaced with base R's reshape function (from the stats package), as shown in another question: reshape(input, idvar="x", timevar="y", direction="wide")[-1].
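A self-contained sketch of that replacement. The input data frame and its value column are hypothetical; x is the id variable and y the variable whose levels become columns, as in the one-liner quoted in the text. Converting the result to a matrix gives something shaped like what acast would return, though the row and column order here follows first appearance rather than sorting.

```r
# Hypothetical long-format input: id column x, key column y, numeric value
input <- data.frame(
  x = rep(1:3, each = 2),
  y = rep(c("a", "b"), times = 3),
  value = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# One row per x, one "value.<y>" column per y level; [-1] drops the x column
wide <- reshape(input, idvar = "x", timevar = "y", direction = "wide")[-1]

# Label rows by x and columns by y, similar to acast(input, x ~ y, value.var = "value")
m <- as.matrix(wide)
dimnames(m) <- list(unique(input$x), unique(input$y))
print(m)
```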

As for the second example, it's not easy to find the exact cause of the problem, but as a workaround in my case it helped to reduce the number of cores used for parallel computation. The cluster has 48 cores, and I was using only 15, since even before this issue R ran out of memory when the code was run on all 48. When I reduced the number of cores to 10, it suddenly started working as before.
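A minimal sketch of capping the worker count, assuming the parallel package's PSOCK clusters (the original does not say which backend was used). Each PSOCK worker is a separate R process holding its own copy of the data it works on, so memory use grows roughly with the number of workers. The helper run_with_limited_cores and the cap of 2 are hypothetical; the author found 10 workable on their 48-core cluster.

```r
library(parallel)

# Hypothetical helper: run f over x on a cluster with an explicit worker cap
run_with_limited_cores <- function(x, f, n_cores) {
  cl <- makeCluster(n_cores)  # PSOCK cluster: one separate R process per worker
  on.exit(stopCluster(cl))    # shut the workers down even if f fails
  parLapply(cl, x, f)
}

# Use a small, explicit number of workers rather than every available core;
# detectCores() can return NA, hence na.rm = TRUE
n_cores <- min(2L, detectCores(), na.rm = TRUE)

squares <- run_with_limited_cores(1:4, function(i) i^2, n_cores)
print(unlist(squares))
```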

R: How to get something like adjacency matrix, but on the intersection value of third column?

You may try this, where df is your data frame:

library(reshape2)
# One row per V1, one column per V2; each cell holds the matching V3 value
dcast(df, V1 ~ V2, value.var = "V3")

#      V1 891552 891553
# 1 42966      C      B
# 2 83965      A      D
# 3 88599      B      D
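For comparison, a base R sketch that builds the same kind of matrix with two-column matrix indexing. The df below is hypothetical, reconstructed only to match the output shown. Unlike dcast, this does no aggregation: if a V1/V2 pair occurred twice, the last V3 value would silently win.

```r
# Hypothetical input reconstructed to match the dcast output
df <- data.frame(
  V1 = c(42966, 42966, 83965, 83965, 88599, 88599),
  V2 = c(891552, 891553, 891552, 891553, 891552, 891553),
  V3 = c("C", "B", "A", "D", "B", "D"),
  stringsAsFactors = FALSE
)

rows <- sort(unique(df$V1))
cols <- sort(unique(df$V2))

# Empty character matrix: one row per V1 value, one column per V2 value
m <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
            dimnames = list(as.character(rows), as.character(cols)))

# Each (row position, column position) pair receives the matching V3 value
m[cbind(match(df$V1, rows), match(df$V2, cols))] <- df$V3
print(m)
```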

Contingency table based on third variable (numeric)

You can use xtabs for that:

R> xtabs(Number ~ Customer + Product, data = input)
         Product
Customer  100001 100002 100003 100004 100008
  1000001      0      1      0      2      0
  1000002      0      0      3      0      0
  1000003      0      1      1      0      1
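A self-contained sketch with a hypothetical input, chosen so the result matches the table shown. When the formula has a left-hand side, xtabs sums that variable within each Customer/Product cell, so repeated rows for the same pair are added together (customer 1000001 buys product 100004 twice here).

```r
# Hypothetical input; IDs kept as character so they print exactly as written
input <- data.frame(
  Customer = c("1000001", "1000001", "1000001", "1000002",
               "1000003", "1000003", "1000003"),
  Product  = c("100002", "100004", "100004", "100003",
               "100002", "100003", "100008"),
  Number   = c(1, 1, 1, 3, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Sum Number within every Customer x Product cell; empty cells become 0
tab <- xtabs(Number ~ Customer + Product, data = input)
print(tab)
```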

