Caught Segfault Error in R

caught segfault - 'memory not mapped' error in R

It's not really an explanation of the problem or a satisfactory answer, but after examining the code more closely I figured out that in the first example the problem appears when using acast from the reshape2 package. I deleted it in this case because I realized it isn't actually needed there, but it can also be replaced with base R's reshape() (as shown in another question): reshape(input, idvar="x", timevar="y", direction="wide")[-1].
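For reference, here is a minimal sketch of that replacement, assuming input is a long-format data frame with an id column x, a time column y, and a value column (those column names are assumptions based on the call above):

## toy long-format data frame with the assumed structure
input <- data.frame(
  x = rep(1:3, each = 2),
  y = rep(c("a", "b"), times = 3),
  value = c(10, 20, 30, 40, 50, 60)
)

## stats::reshape to wide format; [-1] drops the id column, as in the call above
reshape(input, idvar = "x", timevar = "y", direction = "wide")[-1]
## a 3 x 2 data frame with columns value.a and value.b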

As for the second example, it's not easy to pin down the exact cause of the problem, but as a workaround it helped in my case to set a smaller number of cores for the parallel computation. The cluster has 48 cores, but I was only using 15, since even before this issue R was running out of memory when the code was run on all 48. When I reduced the number of cores to 10, it suddenly started working as before.
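For illustration, a minimal sketch of that workaround, assuming the jobs are run with the parallel package (the per-task function below is a placeholder, not my actual code):

library(parallel)

## use a fixed, smaller number of workers instead of all cores on the machine
n_cores <- 10
cl <- makeCluster(n_cores)

res <- parLapply(cl, 1:100, function(i) {
  ## the actual memory-heavy computation goes here
  sqrt(i)
})

stopCluster(cl)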

caught segfault error in R

In case anyone else runs into this problem (or a similar one) in the future: I sent a bug report to the package maintainer, and he recommended uninstalling all installed packages and starting over. I took his advice and it worked!

I followed advice from this posting: http://r.789695.n4.nabble.com/Reset-R-s-library-to-base-packages-only-remove-all-installed-contributed-packages-td3596151.html

## list installed packages and keep only those that are not base or recommended
ip <- installed.packages()
pkgs.to.remove <- ip[!(ip[, "Priority"] %in% c("base", "recommended")), 1]

## remove each contributed package
sapply(pkgs.to.remove, remove.packages)

fisher.test crash R with *** caught segfault *** error

I can confirm that this is a bug in R 4.2 and that it is now fixed in the development branch of R (with this commit on 7 May). I wouldn't be surprised if it were ported to a patch-release sometime soon, but that's unknown/up to the R developers. Running your example above doesn't segfault any more, but it does throw an error:

Error in fisher.test(d, simulate.p.value = FALSE) :
FEXACT[f3xact()] error: hash key 5e+09 > INT_MAX, kyy=203, it[i (= nco = 6)]= 0.

Rather set 'simulate.p.value=TRUE'

So this makes your workflow better (you can handle these errors with try()/tryCatch()), but it doesn't necessarily satisfy you if you really want to perform an exact Fisher test on these data. (Exact tests on large tables with large entries are extremely computationally difficult, as they essentially have to do computations over the set of all possible tables with given marginal values.)
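As a sketch of that kind of error handling (the wrapper name and the fallback to simulation are my additions, not part of fisher.test):

## hypothetical helper: try the exact test, fall back to simulation on error
safe_fisher <- function(d, B = 10000) {
  tryCatch(
    fisher.test(d, simulate.p.value = FALSE),
    error = function(e) {
      message("exact test failed: ", conditionMessage(e))
      fisher.test(d, simulate.p.value = TRUE, B = B)
    }
  )
}

## usage: safe_fisher(d)$p.value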

I don't have any brilliant ideas for detecting the exact conditions that will cause this problem (maybe you can come up with a rough rubric based on the dimensions of the table and the sum of the counts in the table, e.g. if (prod(dim(d)) > 30 && sum(d) > 200) ... ?)
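If you do want a guard of that sort, a hypothetical version (the thresholds are the rough guesses above, not validated cutoffs):

## hypothetical pre-check: simulate for large, dense tables
use_simulation <- function(d, dim_cut = 30, count_cut = 200) {
  prod(dim(d)) > dim_cut && sum(d) > count_cut
}

## e.g. fisher.test(d, simulate.p.value = use_simulation(d))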

Setting simulate.p.value=TRUE is the most sensible approach. However, if you expect precise results for extreme tables (e.g. you are working in bioinformatics and are going to apply a huge multiple-comparisons correction to the results), you're going to be disappointed. For example:

dd <- matrix(0, 6, 6)            ## 6 x 6 table of counts
dd[5, 5] <- dd[6, 6] <- 100      ## two large diagonal entries
fisher.test(dd)$p.value
## 2.208761e-59, reported as "< 2.2e-16"
fisher.test(dd, simulate.p.value = TRUE, B = 10000)$p.value
## 9.999e-05

fisher.test(..., simulate.p.value = TRUE) will never return a value smaller than 1/(B+1); this is what happens when none of the simulated tables is more extreme than the observed table (technically, the p-value ought to be reported as "<= 9.999e-05"). Therefore, you will never (in the lifetime of the universe) be able to compute a p-value like 1e-59; you'll only be able to put a bound on it based on how large you're willing to make B.
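For example, with B = 10000 the floor works out to the simulated value shown above:

B <- 10000
1 / (B + 1)
## 9.999e-05, the smallest p-value the simulation can report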


