Sort a list of nontrivial elements in R
To make this as simple as I can, say your objects are lists with two elements, a name and a value. The value is numeric; that's what we want to sort by. You can imagine having more elements and needing to do something more complex to sort.
The sort help page tells us that sort uses xtfrm; the xtfrm help page in turn tells us it will use the == and > methods for the class of x[i].
First I'll define an object that I want to sort:
xx <- lapply(c(3,5,7,2,4), function(i) list(name=LETTERS[i], value=i))
class(xx) <- "myobj"
Now, since xtfrm works on the x[i]'s, I need to define a [ function that returns the desired elements but still with the right class:
`[.myobj` <- function(x, i) {
  class(x) <- "list"
  structure(x[i], class = "myobj")
}
Now we need == and > functions for the myobj class. These could be smarter by vectorizing them properly, but for the sort function we know that we're only going to be passing in myobj's of length 1, so I'll just use the first element to define the relations.
`>.myobj` <- function(e1, e2) {
  e1[[1]]$value > e2[[1]]$value
}
`==.myobj` <- function(e1, e2) {
  e1[[1]]$value == e2[[1]]$value
}
Now sort just works.
sort(xx)
It might be considered more proper to write a full Ops function for your object; however, to just sort, this seems to be all you need. See p. 89-90 in Venables/Ripley for more details about doing this using the S3 style. Also, if you can easily write an xtfrm function for your objects, that would be simpler and most likely faster.
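As a rough sketch of that last suggestion, an xtfrm method can return the numeric sort key directly, so no == or > methods are needed. This assumes the same xx object as above and that the methods are defined at top level; the use of vapply to pull out the value field is just one way to do it.

```r
# Same example object as above: a classed list of (name, value) records.
xx <- lapply(c(3, 5, 7, 2, 4), function(i) list(name = LETTERS[i], value = i))
class(xx) <- "myobj"

# Subsetting must keep the class so the sorted result is still a "myobj".
`[.myobj` <- function(x, i) structure(unclass(x)[i], class = "myobj")

# xtfrm returns a numeric vector that sorts the same way as the object.
xtfrm.myobj <- function(x) {
  vapply(unclass(x), function(el) el$value, numeric(1))
}

sorted <- sort(xx)
sorted_values <- vapply(unclass(sorted), function(el) el$value, numeric(1))
print(sorted_values)
```

Because order() calls xtfrm() on classed objects, sort(xx) here only needs the xtfrm and [ methods.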
Order list elements in R
dates <- do.call(rbind,
                 strsplit(gsub(".dat", "",
                               myfiles, fixed=TRUE),
                          "-"))
dates <- matrix(as.numeric(dates), ncol=2)
myfilesContent[order(dates[,1], dates[,2])]
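The snippet above does not show its inputs. Here is a small self-contained sketch, assuming file names made of two numbers separated by a hyphen plus a .dat extension; the file names and contents are made up for illustration.

```r
# Hypothetical inputs: file names of the form "<number>-<number>.dat"
# and a parallel list holding each file's content.
myfiles <- c("10-2.dat", "2-15.dat", "2-3.dat", "1-7.dat")
myfilesContent <- list("a", "b", "c", "d")

# Strip the extension, split on "-", and stack the pieces into a matrix.
parts <- do.call(rbind, strsplit(gsub(".dat", "", myfiles, fixed = TRUE), "-"))
dates <- matrix(as.numeric(parts), ncol = 2)

# Order by the first number, breaking ties with the second.
ord <- order(dates[, 1], dates[, 2])
sorted_files <- myfiles[ord]
sorted_content <- myfilesContent[ord]
print(sorted_files)
```

Converting to numeric before ordering is what makes "2-3" sort ahead of "2-15" and "10-2" sort last.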
Sorting list of list of elements of a custom class in R?
This answer from Aaron demonstrates exactly what is needed to apply a customized sort on a classed object. As Roland notes, you actually need to sort "L", so that is where the focus of the custom sort should be. To provide flexibility in specifying which index of the elements of "L" to sort on, one way is to store an extra attr on "L":
Turn "L" to an appropriate object:
class(L) = "myclass"
attr(L, "sort_ind") = 1L
Ops methods need to be defined (extract the relevant element of your data):
"<.myclass" = function(x, y)
{
    i = attr(x, "sort_ind") ## also check if 'x' and 'y' have the same 'attr(, "sort_ind")'
    x[[1]][[i]] < y[[1]][[i]]
}
"==.myclass" = function(x, y)
{
    i = attr(x, "sort_ind")
    x[[1]][[i]] == y[[1]][[i]]
}
">.myclass" = function(x, y)
{
    i = attr(x, "sort_ind")
    x[[1]][[i]] > y[[1]][[i]]
}
And a subset method:
"[.myclass" = function(x, i)
{
    y = .subset(x, i)
    attributes(y) = attributes(x)
    return(y)
}
The above methods (except, perhaps, "<") need to be defined, since a call to sort/order ends up calling rank, which needs .gt to subset each element and compare the results.
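To make the role of each method concrete, here is a rough sketch of the kind of three-way comparison that rank relies on. The helper name compare_elements and the toy class are hypothetical; the code mimics the idea and is not R's internal implementation.

```r
# Toy object: a classed list whose elements are compared on their first field.
toy <- structure(list(list(2, "b"), list(1, "c"), list(2, "a")),
                 class = "toyclass")

# Subsetting keeps the class, so the comparison methods still dispatch.
`[.toyclass`  <- function(x, i) structure(.subset(x, i), class = "toyclass")
`==.toyclass` <- function(e1, e2) e1[[1]][[1]] == e2[[1]][[1]]
`>.toyclass`  <- function(e1, e2) e1[[1]][[1]] > e2[[1]][[1]]

# Hypothetical helper: subset two length-1 pieces, then compare them
# using only == and >. Returns 0 (equal), 1 (greater) or -1 (less).
compare_elements <- function(x, i, j) {
  xi <- x[i]
  xj <- x[j]
  if (xi == xj) 0L else if (xi > xj) 1L else -1L
}

print(compare_elements(toy, 1, 2))
print(compare_elements(toy, 1, 3))
print(compare_elements(toy, 2, 1))
```

Only [, == and > appear in the helper, which is consistent with "<" being optional for sorting.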
Finally, a get/set function for sauce:
sort_ind = function(x) attr(x, "sort_ind")
"sort_ind<-" = function(x, value)
{
    attr(x, "sort_ind") = value
    return(x)
}
And:
order(L)
#[1] 3 2 1
sort_ind(L) = 3
order(L)
#[1] 2 3 1
A method for sort can also be created to wrap all of the above:
sort.myclass = function(x, sort_ind = attr(x, "sort_ind"), ...)
{
    sort_ind(x) = sort_ind
    NextMethod()
}
sort(L)
sort(L, sort_ind = 1)
(I assumed that your toList function would look something like toList = function(x) x[[1L]])
How to order a list by a custom function, discarding duplicates?
Adding another alternative, for completeness, regarding the "custom sort"/"custom unique" part of the question. By defining methods for certain functions (as seen in ?xtfrm) we can apply custom sort and unique functions to any list (or other object).
First, a "class" attribute needs to be added:
class(thresholds) = "thresholds"
Then, define the necessary custom functions:
"==.thresholds" = function(x, y) return(x[[1]][["value"]] == y[[1]][["value"]])
">.thresholds" = function(x, y) return(x[[1]][["value"]] > y[[1]][["value"]])
"[.thresholds" = function(x, i) return(structure(.subset(x, i), class = class(x)))
is.na.thresholds = function(x) return(is.na(x[[1]][["value"]]))
Now, we can apply sort:
sort(thresholds)
Finally, add a custom unique function:
duplicated.thresholds = function(x, ...) return(duplicated(sapply(x, function(elt) elt[["value"]])))
unique.thresholds = function(x, ...) return(x[!duplicated(x)])
And:
sort(unique(thresholds))
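For comparison, the same sort-and-deduplicate result can be sketched without any S3 methods by extracting the values up front. This is a plain alternative, not the method above, and the thresholds list is hypothetical since the original data is not shown.

```r
# Hypothetical data: a list of records, each with a numeric "value" field.
thresholds <- list(
  list(name = "low",  value = 1),
  list(name = "high", value = 10),
  list(name = "mid",  value = 5),
  list(name = "mid2", value = 5)   # duplicate value, should be dropped
)

# Pull out the sort key once.
values <- vapply(thresholds, function(elt) elt[["value"]], numeric(1))

# Keep the first record for each distinct value, then order by value.
keep <- !duplicated(values)
result <- thresholds[keep][order(values[keep])]

result_names <- vapply(result, function(elt) elt[["name"]], character(1))
print(result_names)
```

The S3 approach is more convenient when sort and unique are called in many places; this version is shorter for a one-off.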
(Similar answers and more information here and here)
How to sort by Library of Congress Classification (LCC) number in R
mixedsort from the gtools package (a contributed package available on CRAN, not part of base R) turns out to do just the trick:
library(gtools)
call_numbers <- c("QA 7 H3 1992", "QA 76.73 R3 W53 2015", "QA 90 H33 2016", "QA 276.45 R3 A35 2010")
mixedsort(call_numbers)
## [1] "QA 7 H3 1992" "QA 76.73 R3 W53 2015" "QA 90 H33 2016" "QA 276.45 R3 A35 2010"
Further, mixedorder can be used to sort a data frame by one column.
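A minimal sketch of the data frame case, assuming gtools is installed; the books data frame and its id column are made up for illustration.

```r
library(gtools)

# Hypothetical data frame with call numbers in one column.
books <- data.frame(
  id = 1:4,
  call_number = c("QA 276.45 R3 A35 2010", "QA 7 H3 1992",
                  "QA 90 H33 2016", "QA 76.73 R3 W53 2015"),
  stringsAsFactors = FALSE
)

# mixedorder returns the row permutation; use it to index the rows.
sorted_books <- books[mixedorder(books$call_number), ]
print(sorted_books)
```

mixedorder relates to mixedsort the way order relates to sort: it returns indices rather than the sorted values.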
This is a special case of what was answered earlier in How to sort a character vector where elements contain letters and numbers in R?
Group data frame by elements from a variable containing lists of elements
We can use a simple base R solution with table to calculate the frequencies after unlisting the list column, and then create a data.frame based on that table object:
tbl <- table(unlist(df$y))
data.frame(group = names(tbl), n = as.vector(tbl))
# group n
#1 A 2
#2 B 2
#3 C 2
#4 D 1
#5 E 1
Or another option with tidyverse
library(dplyr)
library(tidyr)
unnest(df, cols = y) %>%
  group_by(group = y) %>%
  summarise(n = n())
#  group     n
#  <chr> <int>
#1 A 2
#2 B 2
#3 C 2
#4 D 1
#5 E 1
Or, as @alexis_laz mentioned in the comments, an alternative is as.data.frame.table:
as.data.frame(table(group = unlist(df$y)), responseName = "n")
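The input df is not shown above. Here is a self-contained sketch of the base R version, using a hypothetical df whose list column y was chosen to match the counts in the printed output; the id column is an assumption.

```r
# Hypothetical input: a data frame with a list column 'y'.
df <- data.frame(id = 1:3)
df$y <- list(c("A", "B"), c("A", "B", "C"), c("C", "D", "E"))

# Flatten the list column and count how often each element occurs.
tbl <- table(unlist(df$y))
counts <- data.frame(group = names(tbl), n = as.vector(tbl))
print(counts)
```

names(tbl) and as.vector(tbl) are used so the result has plain columns instead of the table's own structure.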
R - find rows with at least n distinct elements
Try any of these:
nuniq <- function(x) length(unique(x))
subset(dd, apply(dd, 1, nuniq) >= 2)
subset(dd, apply(dd, 1, sd) > 0)
subset(dd, apply(dd[-1] != dd[[1]], 1, any))
subset(dd, rowSums(dd[-1] != dd[[1]]) > 0)
subset(dd, lengths(lapply(as.data.frame(t(dd)), unique)) >= 2)
subset(dd, lengths(apply(dd, 1, table)) >= 2)
# nuniq is from above
subset(dd, tapply(as.matrix(dd), row(dd), nuniq) >= 2)
giving:
col.1 col.2 col.3 col.4
1 0 0 1 0
2 0 2 2 1
5 0 1 1 1
Alternatives to nuniq
In the above, nuniq could be replaced with any of these:
function(x) nlevels(factor(x))
function(x) sum(!duplicated(x))
function(x) length(table(x))
dplyr::n_distinct
Note
dd in reproducible form is:
dd <- structure(list(col.1 = c(0L, 0L, 2L, 0L, 0L), col.2 = c(0L, 2L,
2L, 0L, 1L), col.3 = c(1L, 2L, 2L, 0L, 1L), col.4 = c(0L, 1L,
2L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L))
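The examples above hard-code a threshold of 2. A small sketch that makes n a parameter might look like this; the helper name rows_with_n_distinct is hypothetical, and dd is the data shown in the note.

```r
dd <- data.frame(col.1 = c(0L, 0L, 2L, 0L, 0L),
                 col.2 = c(0L, 2L, 2L, 0L, 1L),
                 col.3 = c(1L, 2L, 2L, 0L, 1L),
                 col.4 = c(0L, 1L, 2L, 0L, 1L))

# Hypothetical helper: keep rows with at least n distinct values.
rows_with_n_distinct <- function(df, n) {
  counts <- apply(df, 1, function(x) length(unique(x)))
  df[counts >= n, , drop = FALSE]
}

at_least_2 <- rows_with_n_distinct(dd, 2)
at_least_3 <- rows_with_n_distinct(dd, 3)
print(at_least_2)
print(at_least_3)
```

With n = 2 this keeps the same rows as the subset calls above; with n = 3 only the second row qualifies.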
Sort columns by column sums, identical columns adjacent
Here's a bit of a strange solution. You can secondarily order by a collapsed string representation of the columns, which will serve as a tiebreaker for column sets that have equal colSums(). This will ensure that identical columns are clustered together, as they will lexicographically sort next to each other.
dat[, order(colSums(dat, na.rm=TRUE), apply(dat, 2L, paste, collapse=''), decreasing=TRUE)]
## var2 var1 var5 var3 var4
## [1,] 1 1 1 0 0
## [2,] 1 0 1 0 0
## [3,] 1 1 0 1 1
## [4,] 1 1 0 1 1
## [5,] 1 0 0 0 0
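The input matrix is not shown. The sketch below uses a hypothetical dat reconstructed from the printed result, with columns in the assumed original order var1 to var5, and splits the call into named steps.

```r
# Hypothetical input, reconstructed from the printed output above.
dat <- cbind(
  var1 = c(1, 0, 1, 1, 0),
  var2 = c(1, 1, 1, 1, 1),
  var3 = c(0, 0, 1, 1, 0),
  var4 = c(0, 0, 1, 1, 0),
  var5 = c(1, 1, 0, 0, 0)
)

# Primary key: column sums. Secondary key: each column pasted into one string.
sums <- colSums(dat, na.rm = TRUE)
keys <- apply(dat, 2L, paste, collapse = "")

# Identical columns share both keys, so they end up adjacent.
sorted <- dat[, order(sums, keys, decreasing = TRUE)]
print(colnames(sorted))
```

Pasting with an empty separator is unambiguous for 0/1 data like this; for multi-digit values a separator such as collapse = "," would be a safer assumption.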