How to Sort a Character Vector Where Elements Contain Letters and Numbers

Sort strings by numbers inside them

You could do something like this:

no <- gsub("[^0-9]", "", all.files) # remove everything except 0-9
no <- as.numeric(no)
all.files[order(no)] # sort by numeric component
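To see the approach above on concrete input, here is a minimal sketch with a made-up all.files vector (the file names are hypothetical; the question's actual vector is not shown). Note that gsub keeps every digit in the string, so this only behaves well when each name contains a single number.

```r
# Hypothetical file names; the original question's vector is not shown.
all.files <- c("file10.txt", "file2.txt", "file1.txt")

# Strip everything that is not a digit, then convert to numeric.
no <- as.numeric(gsub("[^0-9]", "", all.files))

# Reorder the file names by their numeric component.
sorted.files <- all.files[order(no)]
print(sorted.files)
```

A plain sort(all.files) would instead put "file10.txt" before "file2.txt", because character vectors are compared character by character.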

Order a mixed vector (numbers with letters)

> library(gtools)
> mixedsort(alph)

[1] "7" "8" "9" "10a" "10b" "10c" "11a" "11b" "11c" "12"

To sort a data.frame you use mixedorder instead

> mydf <- data.frame(alph, USArrests[seq_along(alph),])
> mydf[mixedorder(mydf$alph),]

            alph Murder Assault UrbanPop Rape
Alabama        7   13.2     236       58 21.2
California     8    9.0     276       91 40.6
Colorado       9    7.9     204       78 38.7
Alaska       10a   10.0     263       48 44.5
Arizona      10b    8.1     294       80 31.0
Arkansas     10c    8.8     190       50 19.5
Florida      11a   15.4     335       80 31.9
Delaware     11b    5.9     238       72 15.8
Connecticut  11c    3.3     110       77 11.1
Georgia       12   17.4     211       60 25.8
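If gtools is not available, strings of the form "number plus optional letter suffix" can also be ordered in base R by splitting them into two keys. This is only a sketch: it assumes alph holds the values implied by the data frame output above and that every element starts with digits. It is not a general replacement for mixedsort.

```r
# Values of alph as implied by the output above (an assumption; the
# original definition is not shown).
alph <- c("7", "10a", "10b", "10c", "8", "9", "11c", "11b", "11a", "12")

num    <- as.numeric(sub("[^0-9].*$", "", alph))  # leading digits as a number
suffix <- sub("^[0-9]+", "", alph)                # whatever follows the digits

# Order by the number first, then by the suffix to break ties.
alph.sorted <- alph[order(num, suffix)]
print(alph.sorted)
```

The same order(num, suffix) index can be used to reorder the rows of a data frame, just like mixedorder.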

mixedorder on multiple vectors (columns)

Apparently mixedorder cannot handle multiple vectors. I have made a function that works around this by converting every character vector to a factor whose levels are sorted with mixedsort, and then passing all vectors on to the standard order function.

multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                factor(l, levels=mixedsort(unique(l)))
            } else {
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}

However, in your particular case multi.mixedorder gets you the same result as the standard order, since V2 is numeric.

df <- data.frame(
V1 = c("A","A","B","B","C","C","D","D","E","E"),
V2 = 19:10,
V3 = alph,
stringsAsFactors = FALSE)

df[multi.mixedorder(df$V2, df$V3),]

   V1 V2  V3
10  E 10  12
9   E 11 11a
8   D 12 11b
7   D 13 11c
6   C 14   9
5   C 15   8
4   B 16 10c
3   B 17 10b
2   A 18 10a
1   A 19   7

Notice that

  • 19:10 is equivalent to c(19:10). c combines (concatenates) several short vectors into one long vector, but in your case there is only one vector (19:10), so there is nothing to combine. For V1, however, you have 10 vectors of length 1, so you do need c there, as you already do.
  • You need stringsAsFactors = FALSE so that V1 and V3 are not converted to (incorrectly sorted) factors, which was the default before R 4.0.0.

Order vector in R: Letter with number sorts funny

We can use mixedsort from gtools. According to ?mixedsort

These functions sort or order character strings containing embedded numbers so that the numbers are numerically sorted rather than sorted by character value.

library(gtools)
mixedsort(v1)
#[1] "r_1" "r_2" "r_10"

The reason for the funny sort is that v1 is a character vector, not a numeric one, so sorting happens by character value.

data

v1 <- c("r_1", "r_2", "r_10")
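The difference is easy to reproduce with the same v1. The sketch below uses method = "radix" so that the plain sort does not depend on the locale, and adds a base R alternative that assumes every element has the form "r_" followed by a number.

```r
v1 <- c("r_1", "r_2", "r_10")

# Character sorting compares strings position by position,
# so "r_10" ends up before "r_2".
plain <- sort(v1, method = "radix")
print(plain)

# Base R alternative for this simple pattern: drop the "r_" prefix
# and order by the remaining number.
by.number <- v1[order(as.numeric(sub("^r_", "", v1)))]
print(by.number)
```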

R: order a vector of strings with both character and numeric values both alphabetically and numerically

EDIT: solution completely changed after the OP's clarification.

You can extract the last three numeric fields into a data.frame:

dat = read.table(text=sub('.*:1:([0-9]+):([0-9]+):([0-9]+)','\\1|\\2|\\3',a),sep='|')
dat
    V1    V2    V3
1 1102 14591 91480
2 1102 14592  3881
3 1102 14592 37103
4 1102 14592 37356

Then you order using 3 columns:

a[with(dat,order(V1,V2,V3))]
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
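An alternative to the regular expression is to split each name on ":" and order by the last three fields. This is a sketch only: it reuses the four read names from the output above, the shuffled input order is made up, and it assumes every name has exactly seven colon-separated fields.

```r
# The four read names shown above, in a hypothetical shuffled input order.
a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
       "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881")

# Split on ":" and bind the pieces into a character matrix, one row per read.
parts <- do.call(rbind, strsplit(a, ":", fixed = TRUE))

# Convert the last three fields to numbers and order by them in turn.
keys <- apply(parts[, 5:7], 2, as.numeric)
a.sorted <- a[order(keys[, 1], keys[, 2], keys[, 3])]
print(a.sorted)
```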

Sort vector with character and number in R

We set the names of 'm' and 'n' to '0' and '1' respectively and concatenate them into a single vector 'n1'. We then extract the numeric part of each element (using gsub), convert it to numeric, and use the resulting order index to sort 'n1'.

n1 <- c(setNames(m, rep(0, length(m))),setNames(n, rep(1, length(n))))
r1 <- n1[order(as.numeric(gsub("\\D+", "", n1)))]

as.vector(r1)
#[1] "1" "2" "<4" "<4" "5" "7" "8" "<12" "15" "17" "18" "20"
#[13] "<21" "<25" "25" "27" "34" "<35" "40" "43"

as.integer(names(r1))
#[1] 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 1 1
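Since 'm' and 'n' are not shown here, the following sketch runs the same steps on two small made-up vectors (the values are hypothetical). The names record which vector each element came from, and because order is stable, tied values keep their original relative position.

```r
# Hypothetical inputs; the original 'm' and 'n' are not shown.
m <- c("1", "<4", "5")
n <- c("2", "<4", "3")

# Tag elements of m with "0" and elements of n with "1", then combine.
n1 <- c(setNames(m, rep(0, length(m))), setNames(n, rep(1, length(n))))

# Drop non-digits, convert to numeric, and order n1 by that value.
r1 <- n1[order(as.numeric(gsub("\\D+", "", n1)))]

values    <- as.vector(r1)          # sorted strings without names
source.id <- as.integer(names(r1))  # 0 = came from m, 1 = came from n
print(values)
print(source.id)
```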

How to sort a character vector first by letter and then by number?

Find the strings not starting with a number. Create a logical index:

idx <- grepl("^[^0-9]", sample.condition)

Use this index for subsetting and sort the subsets. Then, combine both sorted subsets:

c(sort(sample.condition[idx]), sort(sample.condition[!idx]))

# [1] "GFP_t" "GFP_t" "1_t" "1_t" "2_t" "2_t" "3_t" "3_t"
# [9] "4_t" "4_t" "5_t" "5_t" "6_t" "6_t" "7_t" "7_t"
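One caveat: sort() on the numeric subset is still a character sort, so a value such as "10_t" would land before "2_t". The sketch below orders that subset by its leading number instead. The sample.condition values are hypothetical, and it assumes the number is everything before the first underscore.

```r
# Hypothetical input that includes a two-digit prefix.
sample.condition <- c("2_t", "GFP_t", "10_t", "1_t")

# TRUE for strings that do not start with a digit.
idx <- grepl("^[^0-9]", sample.condition)

# Letter-led strings: an ordinary character sort is fine.
letters.part <- sort(sample.condition[idx])

# Digit-led strings: order by the number before the underscore.
digits.part <- sample.condition[!idx]
digits.part <- digits.part[order(as.numeric(sub("_.*$", "", digits.part)))]

result <- c(letters.part, digits.part)
print(result)
```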

Is there a way to sort a string that has numerics in R?

One option is mixedsort

library(gtools)
mixedsort(x)
#[1] "1:A" "2:A" "201:A"

Or remove the non-numeric characters with gsub and order

x[order(as.numeric(gsub("\\D+", "", x)))]
#[1] "1:A" "2:A" "201:A"

