Order a "Mixed" Vector (Numbers With Letters)


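> # alph is not defined in the original; this definition is reconstructed from the outputs shown below
> alph <- c("7", "10a", "10b", "10c", "8", "9", "11c", "11b", "11a", "12")
> library(gtools)
> mixedsort(alph)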

[1] "7" "8" "9" "10a" "10b" "10c" "11a" "11b" "11c" "12"
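For intuition about what the mixed sort is doing, here is a minimal base-R sketch. It assumes every string is a run of digits optionally followed by letters, which holds for alph but is far narrower than what gtools handles. natural_order is a hypothetical helper written for this illustration, not a gtools function.

```r
# Hypothetical helper (not part of gtools): orders strings shaped like "10a",
# i.e. leading digits followed by an optional letter suffix.
natural_order <- function(x) {
  num    <- as.numeric(sub("[^0-9].*$", "", x))  # leading digits as a number
  suffix <- sub("^[0-9]+", "", x)                # whatever follows the digits
  # Sort by the number first; the suffix only breaks ties between equal numbers.
  order(num, suffix, method = "radix")
}

alph <- c("7", "10a", "10b", "10c", "8", "9", "11c", "11b", "11a", "12")
alph[natural_order(alph)]
# [1] "7"   "8"   "9"   "10a" "10b" "10c" "11a" "11b" "11c" "12"
```

The point of the split is that "10a" and "9" are compared as 10 and 9 rather than as the characters "1" and "9".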

To sort a data.frame, use mixedorder instead:

> mydf <- data.frame(alph, USArrests[seq_along(alph),])
> mydf[mixedorder(mydf$alph),]

            alph Murder Assault UrbanPop Rape
Alabama        7   13.2     236       58 21.2
California     8    9.0     276       91 40.6
Colorado       9    7.9     204       78 38.7
Alaska       10a   10.0     263       48 44.5
Arizona      10b    8.1     294       80 31.0
Arkansas     10c    8.8     190       50 19.5
Florida      11a   15.4     335       80 31.9
Delaware     11b    5.9     238       72 15.8
Connecticut  11c    3.3     110       77 11.1
Georgia       12   17.4     211       60 25.8

mixedorder on multiple vectors (columns)

mixedorder does not appear to accept multiple vectors. The function below works around this by converting every character vector to a factor whose levels are ordered with mixedsort, and then passing all vectors on to the standard order function.

multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                # Levels are in mixed order, so order() ranks by level position
                factor(l, levels=mixedsort(unique(l)))
            } else {
                # Non-character vectors are passed to order() unchanged
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}

However, in your particular case multi.mixedorder gives the same result as the standard order: V2 is numeric and contains no ties, so V3 never has to break one.

df <- data.frame(
    V1 = c("A","A","B","B","C","C","D","D","E","E"),
    V2 = 19:10,
    V3 = alph,
    stringsAsFactors = FALSE)

df[multi.mixedorder(df$V2, df$V3),]

   V1 V2  V3
10  E 10  12
9   E 11 11a
8   D 12 11b
7   D 13 11c
6   C 14   9
5   C 15   8
4   B 16 10c
3   B 17 10b
2   A 18 10a
1   A 19   7

Notice that

  • 19:10 is equivalent to c(19:10). c stands for combine: it joins several short vectors into one long vector. In your case you only have one vector (19:10), so there is nothing to combine. For V1, however, you have 10 vectors of length 1, so there you do need c, as you already use it.
  • You need stringsAsFactors=FALSE to keep V1 and V3 from being converted to (incorrectly sorted) factors. That conversion was the default before R 4.0.0; from R 4.0.0 on, the default is already FALSE.
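The factor trick only changes the result when an earlier key has ties. The sketch below uses made-up data (g and id are hypothetical) and a small base-R stand-in for mixedsort(unique(id)) so that it runs without gtools; it assumes ids are digits with an optional letter suffix.

```r
# Hypothetical data: g has ties, so the second key decides the order within a group.
g  <- c("A", "A", "A", "B", "B")
id <- c("10a", "9", "10b", "11", "2")

# Base-R stand-in for mixedsort(unique(id)): rank by leading number, then suffix.
u    <- unique(id)
lvls <- u[order(as.numeric(sub("[^0-9].*$", "", u)),
                sub("^[0-9]+", "", u),
                method = "radix")]

plain <- order(g, id, method = "radix")                         # id compared as text
mixed <- order(g, factor(id, levels = lvls), method = "radix")  # id compared by level rank

id[plain]
# [1] "10a" "10b" "9"   "11"  "2"
id[mixed]
# [1] "9"   "10a" "10b" "2"   "11"
```

Within each group of g, the plain ordering puts "10a" before "9", while the factor-based ordering restores the numeric order.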

How to control the order of a variable mixed with string and numbers in R

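Here's one possible way with dplyr. Sorting by nchar(x) first puts the shorter strings (S1 to S9) ahead of the longer ones (S10 to S15); this works because every value has the same one-letter prefix and no leading zeros.

library(dplyr)

df %>%
  arrange(nchar(x), x)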

     x y
1   S1 a
2   S2 b
3   S3 c
4   S4 d
5   S5 e
6   S6 f
7   S7 g
8   S8 h
9   S9 i
10 S10 j
11 S11 k
12 S12 l
13 S13 m
14 S14 n
15 S15 o
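The same length-then-value idea can be tried in base R, which also makes its main limitation easy to see. This is only a sketch with made-up vectors (x and padded are hypothetical): it relies on the numeric part having no leading zeros.

```r
# Length first, then text: works when every label has the same prefix
# and the numbers carry no leading zeros.
x <- c("S10", "S2", "S1", "S11")
x[order(nchar(x), x, method = "radix")]
# [1] "S1"  "S2"  "S10" "S11"

# With zero padding the string length no longer tracks the size of the number,
# so "S2" lands ahead of "S001" even though 1 < 2.
padded <- c("S2", "S001")
padded[order(nchar(padded), padded, method = "radix")]
# [1] "S2"   "S001"
```

When labels may be padded or have prefixes of different lengths, extracting the number explicitly is the safer route.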

Order vector in R: Letter with number sorts funny

We can use mixedsort from gtools. According to ?mixedsort:

These functions sort or order character strings containing embedded numbers so that the numbers are numerically sorted rather than sorted by character value.

library(gtools)
mixedsort(v1)
#[1] "r_1" "r_2" "r_10"

The default sort gives the odd order because v1 is a character vector, not a numeric one, so its elements are compared character by character.

data

v1 <- c("r_1", "r_2", "r_10")
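A quick base-R check of the difference between text and numeric sorting, as a small illustration only. method = "radix" is used here so that the text comparison does not depend on the session's locale.

```r
v1 <- c("r_1", "r_2", "r_10")

# Text comparison goes character by character: "r_10" sorts before "r_2"
# because "1" comes before "2" at the third position.
sort(v1, method = "radix")
# [1] "r_1"  "r_10" "r_2"

# The same values as numbers sort the way one would expect.
sort(c(1, 2, 10))
# [1]  1  2 10
```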

How to do a sort of mixed values in R

It's slightly ugly, but you could just split the data frame in two using filter statements, arrange each section individually, and then bind them back together:

library(dplyr)

# as.numeric() returns NA (with a coercion warning) for non-numeric levels
df <- bind_rows(
  df %>%
    filter(!is.na(as.numeric(level))) %>%
    arrange(variable, as.numeric(level)),
  df %>%
    filter(is.na(as.numeric(level))) %>%
    arrange(variable, level)
)

Gives you:

# A tibble: 6 x 2
  variable level
  <chr>    <chr>
1 comp_ded 500
2 comp_ded 750
3 comp_ded 1000
4 channel  DIR
5 channel  EA
6 channel  IA
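If splitting and re-binding feels heavy, the same ordering can probably be reached with a single order call in base R. The sketch below is one way to do it under assumptions: the input data frame is made up to match the output above (the original unsorted data is not shown), and num and is_text are hypothetical helper variables.

```r
# Made-up input with the same six rows as the output above, in shuffled order.
df <- data.frame(
  variable = c("channel", "comp_ded", "channel", "comp_ded", "channel", "comp_ded"),
  level    = c("IA", "1000", "DIR", "500", "EA", "750"),
  stringsAsFactors = FALSE
)

num     <- suppressWarnings(as.numeric(df$level))  # NA where level is not a number
is_text <- is.na(num)
num[is_text] <- 0  # placeholder so text rows tie on the numeric key

# Keys: numeric rows first, then variable, then numeric value, then text value.
sorted <- df[order(is_text, df$variable, num, df$level, method = "radix"), ]
sorted$level
# [1] "500"  "750"  "1000" "DIR"  "EA"   "IA"
```

Putting is_text first reproduces the "numeric block, then text block" layout of the two-part approach.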

Sort a dataframe based on a character column containing letters followed by numbers in R

You can try using something like this that does numeric day sorting:

Day <- c("Day1","Day20","Day5","Day10")
A <- c(5,7,2,0)
B <- c(15,12,16,30)
df <- data.frame(Day,A,B, stringsAsFactors = FALSE)

df$DayNum <- as.numeric(gsub('Day', '', df$Day))
df <- df[order(df$DayNum), ]

Output as follows:

df
    Day A  B DayNum
1  Day1 5 15      1
3  Day5 2 16      5
4 Day10 0 30     10
2 Day20 7 12     20

The extra column was only there to show each step in full. You can avoid creating it by doing the following:

df <- df[order(as.numeric(substr(df$Day, 4, nchar(df$Day)))), ]

The output is the same as above.
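Both versions above depend on the prefix being exactly "Day". If the prefix can vary, one option is to strip every non-digit character instead. This is a sketch with made-up labels, and it assumes each string contains a single run of digits; a label such as "a1b2" would be read as 12.

```r
# Made-up labels with different prefixes; only the digits are used for ordering.
Day <- c("Day1", "Day20", "Week5", "D10")
num <- as.numeric(gsub("[^0-9]", "", Day))  # drop everything that is not a digit

Day[order(num)]
# [1] "Day1"  "Week5" "D10"   "Day20"
```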


