Sort a List of Nontrivial Elements in R

To keep this as simple as I can, say your objects are lists with two elements, a name and a value. The value is numeric; that's what we want to sort by. You can imagine having more elements and needing to do something more complex to sort.

The sort help page tells us that sort uses xtfrm; the xtfrm help page in turn tells us that its default method will use the == and > methods for the class of x[i].

First I'll define an object that I want to sort:

xx <- lapply(c(3,5,7,2,4), function(i) list(name=LETTERS[i], value=i))
class(xx) <- "myobj"

Now, since xtfrm works on the x[i]'s, I need to define a [ function that returns the desired elements while keeping the right class:

`[.myobj` <- function(x, i) {
  class(x) <- "list"
  structure(x[i], class="myobj")
}

Now we need == and > functions for the myobj class. These could be made smarter by vectorizing them properly, but for the sort function we know that we're only going to be passing in myobj's of length 1, so I'll just use the first element to define the relations.

`>.myobj` <- function(e1, e2) {
  e1[[1]]$value > e2[[1]]$value
}

`==.myobj` <- function(e1, e2) {
  e1[[1]]$value == e2[[1]]$value
}

Now sort just works.

sort(xx)

It might be considered more proper to write a full Ops group method for your object; however, just to sort, this seems to be all you need. See pp. 89-90 in Venables/Ripley for more details about doing this in the S3 style. Also, if you can easily write an xtfrm method for your objects, that would be simpler and most likely faster.
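
As a minimal sketch of that last suggestion: when the sort key is already numeric, an xtfrm method can hand sort a plain numeric vector directly, so no == or > methods are needed. The method xtfrm.myobj and the vapply extraction below are my own illustration, not code from the answer above; a [ method is still needed so that the sorted result keeps its class.

```r
# Same example object as above
xx <- lapply(c(3, 5, 7, 2, 4), function(i) list(name = LETTERS[i], value = i))
class(xx) <- "myobj"

# Subsetting must still return a "myobj"
`[.myobj` <- function(x, i) structure(unclass(x)[i], class = "myobj")

# Assumption: every element has a numeric $value; return one sort key per element
xtfrm.myobj <- function(x) {
  vapply(unclass(x), function(el) el$value, numeric(1))
}

sorted <- sort(xx)

# Names of the elements in sorted order
vapply(unclass(sorted), function(el) el$name, character(1))
```

This only covers keys that map cleanly to numbers; for anything more involved, the comparison methods above remain the more general route.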

Order list elements in R

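# Strip the ".dat" extension, then split each name on "-" into two parts
dates <- do.call(rbind,
                 strsplit(gsub(".dat", "", myfiles, fixed=TRUE), "-"))
# Convert both parts to numbers so they order numerically, not as text
dates <- matrix(as.numeric(dates), ncol=2)

# Reorder the list by the first part, breaking ties with the second
myfilesContent[order(dates[,1], dates[,2])]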
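
The snippet above does not show what myfiles looks like. As a minimal sketch, assume the names have the form "number-number.dat" (for example year-month) and that myfilesContent is a list in the same order; both inputs below are made-up illustration data.

```r
# Hypothetical inputs: file names and their contents, in matching order
myfiles <- c("2012-10.dat", "2011-3.dat", "2012-2.dat")
myfilesContent <- list("october 2012", "march 2011", "february 2012")

# Parse the two numeric parts of each name into a two-column matrix
parts <- strsplit(gsub(".dat", "", myfiles, fixed = TRUE), "-")
dates <- matrix(as.numeric(do.call(rbind, parts)), ncol = 2)

# Numeric ordering puts 2012-2 before 2012-10
ord <- order(dates[, 1], dates[, 2])
sorted_content <- myfilesContent[ord]
```

Converting to numbers is the point of the exercise: compared as text, "2012-10" would sort ahead of "2012-2".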

Sorting list of list of elements of a custom class in R?

This answer from Aaron demonstrates exactly what is needed to apply a customized sort on a classed object. As Roland notes, what you actually need to sort is "L", so that is where the custom sort should focus. To allow flexibility in specifying which index of the elements of "L" to sort on, one way is to store an extra attr on "L".

Turn "L" into an appropriate object:

class(L) = "myclass"
attr(L, "sort_ind") = 1L

Ops methods need to be defined (extract the relevant element of your data):

"<.myclass" = function(x, y) 
{
    i = attr(x, "sort_ind") ## also check if 'x' and 'y' have the same 'attr(, "sort_ind")'
    x[[1]][[i]] < y[[1]][[i]]
}
"==.myclass" = function(x, y)
{
    i = attr(x, "sort_ind")
    x[[1]][[i]] == y[[1]][[i]]
}
">.myclass" = function(x, y)
{
    i = attr(x, "sort_ind")
    x[[1]][[i]] > y[[1]][[i]]
}
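
The comment in "<.myclass" suggests checking that both operands carry the same "sort_ind". One way that check might look is sketched below; the helper name common_sort_ind is hypothetical and not part of the original answer. Each comparison method could call it in place of the bare attr(x, "sort_ind") lookup.

```r
# Hypothetical helper: return the shared sort index, or fail loudly
common_sort_ind <- function(x, y) {
  i <- attr(x, "sort_ind")
  j <- attr(y, "sort_ind")
  if (is.null(i) || is.null(j) || !isTRUE(all.equal(i, j))) {
    stop("'x' and 'y' must have the same \"sort_ind\" attribute")
  }
  i
}

# Example use inside one of the comparison methods
`<.myclass` <- function(x, y) {
  i <- common_sort_ind(x, y)
  x[[1]][[i]] < y[[1]][[i]]
}

# Made-up single-element objects, as the comparison methods receive them
a <- structure(list(list(2, "b")), class = "myclass", sort_ind = 1L)
b <- structure(list(list(5, "a")), class = "myclass", sort_ind = 1L)
a < b
```

Objects produced by subsetting the same "L" always share the attribute, so the check mainly guards against comparing two separately built objects.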

And a subset method:

"[.myclass" = function(x, i) 
{
    y = .subset(x, i)
    attributes(y) = attributes(x)
    return(y)
}

The above methods need to be defined (except, perhaps, "<") because a call to sort/order ends up calling rank, which uses .gt to subset each element and compare them.

Finally, a get/set function for good measure:

sort_ind = function(x) attr(x, "sort_ind")
"sort_ind<-" = function(x, value)
{
    attr(x, "sort_ind") = value
    return(x)
}

And:

order(L)
#[1] 3 2 1
sort_ind(L) = 3
order(L)
#[1] 2 3 1

A sort method can also be created to wrap all of the above:

sort.myclass = function(x, sort_ind = attr(x, "sort_ind"), ...)
{
    sort_ind(x) = sort_ind
    NextMethod()
}

sort(L)
sort(L, sort_ind = 1)

(I assumed that your toList function would look something like toList = function(x) x[[1L]])

How to order a list by a custom function, discarding duplicates?

Adding another alternative, for completeness, regarding the "custom sort"/"custom unique" part of the question. By defining methods for certain functions (as seen in ?xtfrm) we can apply custom sort and unique functions to any list (or other object).

First, a "class" attribute needs to be added:

class(thresholds) = "thresholds"

Then, define the necessary custom functions:

"==.thresholds" = function(x, y) return(x[[1]][["value"]] == y[[1]][["value"]])
">.thresholds" = function(x, y) return(x[[1]][["value"]] > y[[1]][["value"]])
"[.thresholds" = function(x, i) return(structure(.subset(x, i), class = class(x)))
is.na.thresholds = function(x) return(is.na(x[[1]][["value"]]))

Now, we can apply sort:

sort(thresholds)

Finally, add a custom unique function:

duplicated.thresholds = function(x, ...) return(duplicated(sapply(x, function(elt) elt[["value"]])))
unique.thresholds = function(x, ...) return(x[!duplicated(x)])

And:

sort(unique(thresholds))
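
Since thresholds itself is not shown, here is a self-contained run on made-up data. The list below and its name/value fields are assumptions for illustration. The methods follow the ones defined above, except that is.na and duplicated are written with vapply so that they return one flag per element.

```r
# Hypothetical data: each element is a list with a name and a numeric value
thresholds <- list(
  list(name = "high", value = 9),
  list(name = "low",  value = 1),
  list(name = "mid",  value = 5),
  list(name = "low2", value = 1)   # same value as "low"
)
class(thresholds) <- "thresholds"

# Comparison and subset methods used when sort() ranks the elements
`==.thresholds` <- function(x, y) x[[1]][["value"]] == y[[1]][["value"]]
`>.thresholds`  <- function(x, y) x[[1]][["value"]] >  y[[1]][["value"]]
`[.thresholds`  <- function(x, i) structure(.subset(x, i), class = class(x))
is.na.thresholds <- function(x) {
  vapply(unclass(x), function(elt) is.na(elt[["value"]]), logical(1))
}

# Two elements count as duplicates when their values match
duplicated.thresholds <- function(x, ...) {
  duplicated(vapply(unclass(x), function(elt) elt[["value"]], numeric(1)))
}
unique.thresholds <- function(x, ...) x[!duplicated(x)]

res <- sort(unique(thresholds))

# Names of the surviving elements, in sorted order
vapply(unclass(res), function(elt) elt[["name"]], character(1))
```

Note that unique keeps the first element seen for each value, so which of two tied elements survives depends on the input order.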

(Similar answers and more information here and here)

How to sort by Library of Congress Classification (LCC) number in R

mixedsort from the gtools package (a CRAN add-on package, not part of base R) turns out to do just the trick:

library(gtools)
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
mixedsort(call_numbers)
## [1] "QA 7 H3 1992" "QA 76.73 R3 W53 2015" "QA 90 H33 2016" "QA 276.45 R3 A35 2010"

Further, mixedorder can be used to sort a data frame by one column.

This is a special case of what was answered earlier in How to sort a character vector where elements contain letters and numbers in R?
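
To illustrate the remark about data frames, here is a minimal sketch that sorts rows by a call-number column with mixedorder. The books data frame and its shelf column are invented for the example, and the gtools call is guarded so that the snippet does nothing if the package is not installed.

```r
# Hypothetical data frame; the call numbers are the ones used above, shuffled
books <- data.frame(
  call_number = c("QA 90 H33 2016", "QA 276.45 R3 A35 2010",
                  "QA 7 H3 1992", "QA 76.73 R3 W53 2015"),
  shelf = c("c", "d", "a", "b"),
  stringsAsFactors = FALSE
)

if (requireNamespace("gtools", quietly = TRUE)) {
  # mixedorder() returns a row permutation, in the same way base order() does
  books_sorted <- books[gtools::mixedorder(books$call_number), ]
  print(books_sorted)
}
```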

Group data frame by elements from a variable containing lists of elements

We can use a simple base R solution: unlist the list column, use table to calculate the frequencies, and then create a data.frame from that table object:

tbl <- table(unlist(df$y))
data.frame(group = names(tbl), n = as.vector(tbl))
#  group n
#1     A 2
#2     B 2
#3     C 2
#4     D 1
#5     E 1

Or another option with tidyverse

library(dplyr)
library(tidyr)
unnest(df, y) %>%
  group_by(group = y) %>%
  summarise(n=n())
#  group     n
#  <chr> <int>
#1 A         2
#2 B         2
#3 C         2
#4 D         1
#5 E         1

Or as @alexis_laz mentioned in the comments, an alternative is as.data.frame.table

as.data.frame(table(group = unlist(df$y)), responseName = "n")
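
The answer does not show df, so here is a small made-up data frame with a list column y that reproduces the counts above. The contents of df are an assumption; the two base R calls are the ones from the answer.

```r
# Hypothetical input: a data frame whose column y is a list of character vectors
df <- data.frame(x = 1:3)
df$y <- list(c("A", "B"), c("A", "B", "C"), c("C", "D", "E"))

# Flatten the list column and count each value
tbl <- table(unlist(df$y))
counts <- data.frame(group = names(tbl), n = as.vector(tbl))

# Same counts via as.data.frame.table (here group comes back as a factor)
counts2 <- as.data.frame(table(group = unlist(df$y)), responseName = "n")
```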

R - find rows with at least n distinct elements

Try any of these:

nuniq <- function(x) length(unique(x))
subset(dd, apply(dd, 1, nuniq) >= 2)

subset(dd, apply(dd, 1, sd) > 0)

subset(dd, apply(dd[-1] != dd[[1]], 1, any))

subset(dd, rowSums(dd[-1] != dd[[1]]) > 0)

subset(dd, lengths(lapply(as.data.frame(t(dd)), unique)) >= 2)

subset(dd, lengths(apply(dd, 1, table)) >= 2)

# nuniq is from above
subset(dd, tapply(as.matrix(dd), row(dd), nuniq) >= 2)

giving:

  col.1 col.2 col.3 col.4
1     0     0     1     0
2     0     2     2     1
5     0     1     1     1

Alternatives to nuniq

In the above nuniq could be replaced with any of these:

function(x) nlevels(factor(x))

function(x) sum(!duplicated(x))

function(x) length(table(x))

dplyr::n_distinct

Note

dd in reproducible form is:

dd <- structure(list(col.1 = c(0L, 0L, 2L, 0L, 0L), col.2 = c(0L, 2L, 
2L, 0L, 1L), col.3 = c(1L, 2L, 2L, 0L, 1L), col.4 = c(0L, 1L,
2L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L))
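
All of the one-liners above hard-code a threshold of 2. A small wrapper that takes the threshold as an argument might look like this; the function name rows_with_distinct and its n parameter are my own, and dd is the data frame from the Note.

```r
dd <- structure(list(col.1 = c(0L, 0L, 2L, 0L, 0L), col.2 = c(0L, 2L,
2L, 0L, 1L), col.3 = c(1L, 2L, 2L, 0L, 1L), col.4 = c(0L, 1L,
2L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L))

# Hypothetical helper: keep the rows that have at least n distinct values
rows_with_distinct <- function(d, n = 2) {
  nuniq <- apply(d, 1, function(r) length(unique(r)))
  d[nuniq >= n, , drop = FALSE]
}

rows_with_distinct(dd, 2)  # rows 1, 2 and 5
rows_with_distinct(dd, 3)  # only row 2, whose values are 0, 2, 2, 1
```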

Sort columns by column sums, identical columns adjacent

Here's a bit of a strange solution. You can secondarily order by a collapsed string representation of the columns, which will serve as a tiebreaker for column sets that have equal colSums(). This will ensure that identical columns are clustered together, as they will lexicographically sort next to each other.

dat[, order(colSums(dat, na.rm=TRUE), apply(dat, 2L, paste, collapse=''), decreasing=TRUE)]
##      var2 var1 var5 var3 var4
## [1,]    1    1    1    0    0
## [2,]    1    0    1    0    0
## [3,]    1    1    0    1    1
## [4,]    1    1    0    1    1
## [5,]    1    0    0    0    0
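
dat is not shown in the answer. The matrix below is a made-up input chosen to be consistent with the output above, so that the ordering can be checked end to end.

```r
# Hypothetical 0/1 matrix; var3 and var4 are identical columns
dat <- cbind(
  var1 = c(1, 0, 1, 1, 0),
  var2 = c(1, 1, 1, 1, 1),
  var3 = c(0, 0, 1, 1, 0),
  var4 = c(0, 0, 1, 1, 0),
  var5 = c(1, 1, 0, 0, 0)
)

sums <- colSums(dat, na.rm = TRUE)             # primary key: column totals
keys <- apply(dat, 2L, paste, collapse = "")   # tiebreaker: each column as one string
sorted <- dat[, order(sums, keys, decreasing = TRUE)]
colnames(sorted)
```

var5, var3 and var4 all sum to 2, so it is the string key that keeps the identical pair var3 and var4 next to each other.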

