Reordering Factor Gives Different Results, Depending on Which Packages Are Loaded

This happens because:

  1. gmodels imports gdata
  2. gdata creates a new method for reorder.factor

Start a clean session. Then:

methods("reorder")
[1] reorder.default* reorder.dendrogram*

Now load gdata (or load gmodels, which has the same effect):

library(gdata)
methods("reorder")
[1] reorder.default* reorder.dendrogram* reorder.factor

Notice that nothing is masked: base R has no reorder.factor method, so gdata simply adds a new method for the existing reorder generic.

Recreate the problem, but this time explicitly call the different packages:

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
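# stringsAsFactors = TRUE is needed in R >= 4.0 so that group becomes a factor
data = data.frame(group, num, stringsAsFactors = TRUE)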

The base R version (using reorder.default):

str(transform(data, group=stats:::reorder.default(group,-num)))
'data.frame': 6 obs. of 2 variables:
$ group: Factor w/ 6 levels "C","F","B","D",..: 1 2 4 3 5 6
..- attr(*, "scores")= num [1:6(1d)] -2 -7 -12 -7 -1 -11
.. ..- attr(*, "dimnames")=List of 1
.. .. ..$ : chr "A" "B" "C" "D" ...
$ num : num 12 11 7 7 2 1

The gdata version (using reorder.factor):

str(transform(data, group=gdata:::reorder.factor(group,-num)))
'data.frame': 6 obs. of 2 variables:
$ group: Factor w/ 6 levels "A","B","C","D",..: 3 6 4 2 1 5
$ num : num 12 11 7 7 2 1

Why does gdata:::reorder.factor behave differently from stats:::reorder.default for integers and doubles?


Why this is happening:

This happens because gdata:::reorder.factor takes an argument named sort, whose default value is mixedsort. mixedsort in turn relies on the mixedorder function from the gtools package. Loading gtools and reading ?mixedorder explains the behavior:

?mixedorder

Order or Sort strings with embedded numbers so that the numbers are in the correct order:

These functions sort or order character strings containing numbers so that the numbers are numerically sorted rather than sorted by character value. I.e. "Asprin 50mg" will come before "Asprin 100mg". In addition, case of character strings is ignored so that "a", will come before "B" and "C".

Also ?reorder.factor clearly states this:

?gdata:::reorder.factor

If sort is provided (as it is by default): The new factor level names are generated by applying the supplied function to the existing factor level names. With sort=mixedsort the factor levels are sorted so that combined numeric and character strings are sorted in according to character rules on the character sections (including ignoring case), and the numeric rules for the numeric sections. See mixedsort for details.



Solution:

Pass sort = NULL so that the default, mixedsort, is not applied.

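# x and z reconstructed from the debugger output shown further down
x <- factor(c("a", "b", "c", "d", "e", "f"))
z <- c(1.1, 2.4, 1.3, 2.5, 2.6, 1.2)
gdata:::reorder.factor(x, z, function(X)-X, sort=NULL)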
# [1] a b c d e f
# Levels: e d b c f a

Alternatively, as @BenBolker points out in the comments, you can pass base R's sort function as the sort argument:

gdata:::reorder.factor(x, z, function(X)-X, sort=sort)
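If you would rather not depend on which reorder method happens to be dispatched, the level order can also be computed explicitly with base functions. The following is only a sketch of that idea, reusing the group and num vectors from the first example; the variable names scores, lev and group_f are made up for illustration.

```r
group <- c("C", "F", "D", "B", "A", "E")
num   <- c(12, 11, 7, 7, 2, 1)

# One score per group, then sort the scores with base::sort explicitly
scores <- tapply(-num, group, mean)
lev    <- names(base::sort(scores))

# Build the factor with the level order fixed by hand
group_f <- factor(group, levels = lev)
levels(group_f)
# [1] "C" "F" "B" "D" "A" "E"
```

Because factor() and base::sort are called directly, loading gdata or gmodels later does not change the result.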


On debugging:

For future reference, debugonce is your friend for this sort of thing. Running

debugonce(gdata:::reorder.factor)
gdata:::reorder.factor(x, z, function(X)-X)

(then pressing Enter to step through and inspecting the output) shows that the problem comes from the last few lines that run:

else if (!missing(FUN)) 
new.order <- names(sort(tapply(X, x, FUN, ...)))

For your data,

> X
# [1] 1.1 2.4 1.3 2.5 2.6 1.2

> x
# [1] a b c d e f
# Levels: a b c d e f

And, tapply(...) gives:

> tapply(X, x, FUN, ...)
# a b c d e f
# -1.1 -2.4 -1.3 -2.5 -2.6 -1.2

Here, the "sort" should give:

> base:::sort(tapply(X, x, FUN, ...))
# e d b c f a
# -2.6 -2.5 -2.4 -1.3 -1.2 -1.1

But it gives:

#   b    d    e    a    f    c 
# -2.4 -2.5 -2.6 -1.1 -1.2 -1.3

This is because the "sort" that's being called is not from base, which can be seen by typing "sort" from within the debugger:

> sort # from within the function call (using debugonce)
# function (x)
# x[mixedorder(x)]
# <environment: namespace:gtools>

mixedorder is a function from the gtools package. Because the code takes the names of the sorted result, and the sorting is not numeric, the level names come back in the wrong order. In short, the sort being called is mixedsort, not base::sort.

This is easy to verify with gtools installed:

require(gtools)
gtools:::mixedorder(c(-2.4, -2.5, -2.6))
# [1] 1 2 3

order(c(-2.4, -2.5, -2.6))
# [1] 3 2 1

Therefore, you'll have to provide sort=NULL to make sure this doesn't happen.
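The same kind of disagreement can be reproduced without gtools. The sketch below is only an analogy, not gtools' actual algorithm: it compares ordering the values as numbers with ordering their character representations, which is one way a non-numeric sort ends up with the opposite order for negative values.

```r
vals <- c(-2.4, -2.5, -2.6)

# Numeric ordering: the most negative value comes first
num_order <- order(vals)
num_order
# [1] 3 2 1

# Character ordering of "-2.4", "-2.5", "-2.6" (radix = C-locale comparison)
chr_order <- order(as.character(vals), method = "radix")
chr_order
# [1] 1 2 3
```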

Determining which version of a function is active when many packages are loaded

You can get this information with a small helper function of your own.

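which_package <- function(fun) {
  # Accept either a function or its name as a string
  if (is.character(fun)) fun <- getFunction(fun)
  stopifnot(is.function(fun))
  # Name of the environment the function was defined in
  x <- environmentName(environment(fun))
  if (!is.null(x)) return(x)
}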

This returns R_GlobalEnv for functions defined in the global environment. There is also the packageName function if you want to restrict the result to packages only.
For example:

library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"
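The same idea can be applied to S3 methods rather than plain functions, which is closer to the reorder.factor problem. This is a sketch only: method_origin is a hypothetical helper name, and it relies on utils::getS3method with optional = TRUE so that a missing method yields NA instead of an error.

```r
# Hypothetical helper: which namespace supplies the S3 method generic.class?
method_origin <- function(generic, class) {
  m <- utils::getS3method(generic, class, optional = TRUE)
  if (is.null(m)) return(NA_character_)  # no such method registered
  environmentName(environment(m))
}

method_origin("reorder", "default")
# [1] "stats"

# NA in a clean session; expected to name gdata once gdata's namespace is loaded
method_origin("reorder", "factor")
```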

Transform and Reorder

gdata defines a reorder.factor method that behaves differently from reorder.default applied to factors. See the more detailed discussion in this other question.

I've included a transcript of this behavior, with and without gdata, starting in a new session below; note that once you have loaded gdata, it is very difficult to get rid of that method (for reasons explored here).

> tmp <- data.frame(Letters=letters[1:26],values=rnorm(26))
> tmp <- transform(tmp, Letters = reorder(Letters,values))
>
> identical(levels(tmp$Letters),letters[1:26])
[1] FALSE
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
>
> library("gdata")
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: ‘gdata’

The following object(s) are masked from ‘package:stats’:

nobs

The following object(s) are masked from ‘package:utils’:

object.size

>
> tmp <- data.frame(Letters=letters[1:26],values=rnorm(26))
> tmp <- transform(tmp, Letters = reorder(Letters,values))
>
> identical(levels(tmp$Letters),letters[1:26])
[1] TRUE
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] gdata_2.12.0

loaded via a namespace (and not attached):
[1] gtools_2.7.0

How can a non-imported method in a not-attached package be found by calls to functions not having it in their namespace?

I'm not sure I understand your question correctly, but the main point is that group is a character vector, while data$group is a factor.

After attaching gmodels, calling reorder on a factor dispatches to gdata:::reorder.factor, so reorder(factor(group)) calls it.

In transform, the expression is evaluated with the data frame (the first argument) as its environment, so in T2 <- transform(data, group = reorder(group,-num)), group is a factor.
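A small sketch of that distinction, using made-up values: the global group is a character vector, but inside transform the name group resolves to the data frame column, which is a factor. The column name seen_class is purely illustrative.

```r
group <- c("C", "F", "D")
data  <- data.frame(group = factor(group), num = c(12, 11, 7))

class(group)
# [1] "character"

# Inside transform, 'group' refers to data$group, not the global vector
out <- transform(data, seen_class = class(group))
as.character(out$seen_class)
# [1] "factor" "factor" "factor"
```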

UPDATED

library loads the namespaces of the packages that the attached package imports (without attaching them to the search path).

> loadedNamespaces()
[1] "RCurl" "base" "datasets" "devtools" "grDevices" "graphics" "methods"
[8] "stats" "tools" "utils"
> library(gmodels) # here, namespace:gdata is loaded
> loadedNamespaces()
[1] "MASS" "RCurl" "base" "datasets" "devtools" "gdata" "gmodels"
[8] "grDevices" "graphics" "gtools" "methods" "stats" "tools" "utils"

For reference, these are the reorder entries registered in namespace:stats:

> r <- ls(.__S3MethodsTable__., envir = asNamespace("stats"))
> r[grep("reorder", r)]
[1] "reorder" "reorder.default" "reorder.dendrogram"

In more detail:

A call to reorder searches for S3 methods in two places (see ?UseMethod):

first in the environment in which the generic function is called, and then in the registration data base for the environment in which the generic is defined (typically a namespace).

loadNamespace, in turn, registers a package's S3 methods in that registration database.

So, in your case: library(gmodels) -> loadNamespace(gdata) -> registerS3methods(gdata).

After this, you can find it by:

> methods(reorder)
[1] reorder.default* reorder.dendrogram* reorder.factor*

Non-visible functions are asterisked

However, because reorder.factor is not on your search path, you cannot access it directly:

> reorder.factor
Error: object 'reorder.factor' not found

That is probably the whole picture.
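The mechanism can be reproduced without any packages. The sketch below uses a toy generic called describe (a made-up name) and base R's registerS3method to mimic what loading a namespace does: the method becomes available for dispatch even though no object with that name is visible.

```r
# Toy generic with only a default method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

f <- factor(c("a", "b"))
before <- describe(f)
before
# [1] "default method"

# Register a factor method in the generic's registration table,
# without creating an object called describe.factor
registerS3method("describe", "factor",
                 function(x, ...) "factor method",
                 envir = environment(describe))

after <- describe(f)
after
# [1] "factor method"

exists("describe.factor")
# [1] FALSE
```

Dispatch changed as a side effect of registration alone, which is the situation after library(gmodels) loads the gdata namespace.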

Reorder levels of a factor without changing order of values

Use the levels argument of factor:

# stringsAsFactors = TRUE is needed in R >= 4.0 so that g is a factor
df <- data.frame(f = 1:4, g = letters[1:4], stringsAsFactors = TRUE)
df
# f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

levels(df$g)
# [1] "a" "b" "c" "d"

df$g <- factor(df$g, levels = letters[4:1])
levels(df$g)
# [1] "d" "c" "b" "a"

df
# f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
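To confirm that only the levels changed and not the values, the labels and the underlying integer codes can be compared. This is a minimal sketch on a standalone factor; the names g and g2 are illustrative.

```r
g  <- factor(c("a", "b", "c", "d"))
g2 <- factor(g, levels = rev(levels(g)))

# The labels of the values are unchanged
identical(as.character(g), as.character(g2))
# [1] TRUE

# Only the integer codes behind the labels differ
as.integer(g)
# [1] 1 2 3 4
as.integer(g2)
# [1] 4 3 2 1
```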

Reorder() not correctly reordering a factor variable in ggplot

Because you did not make it an ordered factor. Try:

ggplot(x, aes(reorder(country, wing, median, order=TRUE), wing)) + geom_boxplot()
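To see what that reorder call produces before plotting, it can be run on its own. The data below are invented for illustration (the original x is not shown); stats::reorder is called explicitly so the result does not depend on other loaded packages.

```r
country <- c("A", "A", "B", "B", "C", "C")
wing    <- c(5, 7, 1, 3, 9, 11)

# Levels are ordered by the median wing value per country
f <- stats::reorder(country, wing, median, order = TRUE)

is.ordered(f)
# [1] TRUE
levels(f)
# [1] "B" "A" "C"
```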
