segfault in R using reshape2 package and dcast
Just to close out this old question: this was a bug, and it was fixed as described in this GitHub issue.
Reshaping data in R without using dcast (reshape2)
This would be the pre-Hadley method; first aggregate to get the sums, then reshape.
foo <- aggregate(d[,4,drop=FALSE], by=d[,1:3], sum)
reshape(foo, v.names="elasmo.discard", idvar=c("EID", "tspp.name"),
timevar="elasmo.name", direction="wide")
If the first part is slow, it may help to have fewer columns in the "by" part; it looks like tspp.name is defined by EID. If so, don't aggregate by it but instead add it in after the fact.
If the second part is slow, perhaps try one of the methods here:
https://stackoverflow.com/a/9617424/210673.
To get better help on speeding it up, provide an appropriate example (perhaps using sample or rep) that code can be tested on. Solution speed often depends on how many unique combinations of each variable there are.
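As a rough sketch of the approach above on test data: the data frame below is made up with sample and runif, and the species names, sizes, and the lookup table are assumptions for illustration only, not the asker's data. It aggregates without tspp.name and merges that column back in afterwards.

```r
# Hypothetical test data; only the column names follow the question.
set.seed(1)
n <- 1000
d <- data.frame(
  EID = sample(1:20, n, replace = TRUE),
  elasmo.name = sample(c("skate", "dogfish", "ray"), n, replace = TRUE),
  elasmo.discard = runif(n),
  stringsAsFactors = FALSE
)
# Assumption: tspp.name is fully determined by EID, so keep it in a lookup.
lookup <- data.frame(EID = 1:20, tspp.name = rep(c("cod", "haddock"), 10),
                     stringsAsFactors = FALSE)

# Step 1: sum discards per EID and elasmo.name (tspp.name left out of "by").
sums <- aggregate(d["elasmo.discard"], by = d[c("EID", "elasmo.name")], FUN = sum)

# Step 2: long to wide, one column per elasmo.name.
wide <- reshape(sums, v.names = "elasmo.discard", idvar = "EID",
                timevar = "elasmo.name", direction = "wide")

# Step 3: add tspp.name back in after the fact.
wide <- merge(lookup, wide, by = "EID")
```

Timing each step with system.time() on data of realistic size should show which part is worth optimising.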
Error message running the example from the reshape2 help page
For completeness:
PaulHurleyuk's comment:
Have you tried restarting R and trying the example in a fresh session?
Or do rm(list=ls()) to remove everything from the current session.
In the past I have managed to break things by assigning something to
something that shouldn't be assigned to.
Christoph_J's response:
Thanks ... that was exactly the problem...
The problem occurred because I most probably reassigned the mean
function while I was playing around with a statistics example during
an R session. Restarting R solved the problem. Now, everything works
as expected again.
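A minimal sketch of how this kind of masking happens and how to undo it without restarting. The replacement function here is made up for illustration; the original poster did not say what was assigned to mean.

```r
# Accidentally define an object called "mean" in the workspace.
mean <- function(x, ...) "not a mean"

# Any code that looks up mean from the workspace now gets the wrong function.
res_masked <- mean(c(1, 2, 3))

# Removing the user-defined object makes base::mean visible again.
rm(mean)
res_restored <- mean(c(1, 2, 3))
```

Calling base::mean() explicitly also bypasses a masking definition, and conflicts() lists names that are defined in more than one place on the search path.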
adding row/column total data when aggregating data using plyr and reshape2 package in R
Looking at your desired output (now that I'm in front of a computer), perhaps you should look at the margins argument of dcast:
library(reshape2)
dcast(temp.df, var2 ~ var1, value.var = "var2",
fun.aggregate=length, margins = "var1")
#   var2 a b c d e (all)
# 1   11 3 1 6 4 2    16
# 2   12 1 3 6 5 5    20
# 3   13 5 9 3 6 1    24
# 4   14 4 7 3 6 2    22
# 5   15 0 5 1 5 7    18
Also look into the addmargins function in base R.
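For comparison, a small sketch of the base R route with table and addmargins. The temp.df below is a made-up stand-in, since the asker's data is not shown here.

```r
# Hypothetical stand-in for temp.df with the same column names.
temp.df <- data.frame(
  var1 = c("a", "b", "a", "c", "b", "a"),
  var2 = c(11, 11, 12, 12, 13, 13)
)

# Count occurrences of each var2/var1 combination.
tab <- table(temp.df$var2, temp.df$var1)

# margin = 2 appends a "Sum" column holding the total of each row.
with_totals <- addmargins(tab, margin = 2)
```

Leaving out the margin argument adds both a total row and a total column.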
caught segfault - 'memory not mapped' error in R
It's not really an explanation of the problem or a satisfactory answer, but I examined the code more closely and found that in the first example the problem appears when using acast from the reshape2 package. I removed it in this case because it's not actually needed there; it can be replaced with reshape from base R's stats package (as shown in another question): reshape(input, idvar="x", timevar="y", direction="wide")[-1].
As for the second example, it's not easy to find the exact cause of the problem, but a workaround that helped in my case was to use fewer cores for the parallel computation. The cluster has 48 cores, and I was using only 15, since even before this issue R ran out of memory when the code was run on all 48. When I reduced the number of cores to 10, it started working as before.
R: How to get something like adjacency matrix, but on the intersection value of third column?
You may try this, where df is your data frame:
library(reshape2)
dcast(df, V1 ~ V2)
# V1 891552 891553
# 1 42966 C B
# 2 83965 A D
# 3 88599 B D
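When value.var is omitted, dcast guesses which column holds the values and prints a message saying so. A sketch with the value column named explicitly follows; the data frame is reconstructed from the output above, and the column name V3 is an assumption.

```r
library(reshape2)

# Hypothetical input rebuilt from the printed result; V3 is an assumed name.
df <- data.frame(
  V1 = c(42966, 42966, 83965, 83965, 88599, 88599),
  V2 = rep(c(891552, 891553), 3),
  V3 = c("C", "B", "A", "D", "B", "D"),
  stringsAsFactors = FALSE
)

# One row per V1, one column per V2, cells filled from V3.
wide <- dcast(df, V1 ~ V2, value.var = "V3")
```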
Contingency table based on third variable (numeric)
You can use xtabs for that:
R> xtabs(Number~Customer+Product, data=input)
         Product
Customer  100001 100002 100003 100004 100008
  1000001      0      1      0      2      0
  1000002      0      0      3      0      0
  1000003      0      1      1      0      1
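A self-contained sketch of the same call, using a small made-up input with the column names from the answer. The rows are illustrative and are not the asker's actual data.

```r
# Hypothetical long-format input: one row per customer/product pair.
input <- data.frame(
  Customer = c(1000001, 1000001, 1000002, 1000003, 1000003, 1000003),
  Product  = c(100002, 100004, 100003, 100002, 100003, 100008),
  Number   = c(1, 2, 3, 1, 1, 1)
)

# Sum Number within each Customer x Product cell; missing pairs become 0.
tab <- xtabs(Number ~ Customer + Product, data = input)

# Convert to a plain data frame if a table object is inconvenient downstream.
wide <- as.data.frame.matrix(tab)
```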