Sort strings by numbers inside them
You could do something like this:
no <- gsub("[^0-9]", "", all.files) # remove everything except 0-9
no <- as.numeric(no)
all.files[order(no)] # sort by numeric component
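As a minimal, self-contained sketch of the same idea: the file names below are hypothetical, since the original all.files is not shown. Note that gsub glues all digits in a name together, so this only works as intended when each string holds a single run of digits.

```r
# Hypothetical file names (assumption: one run of digits per name)
all.files <- c("file10.txt", "file2.txt", "file1.txt")

# Strip every non-digit character, then convert what is left to a number
no <- as.numeric(gsub("[^0-9]", "", all.files))

# Reorder the original strings by their numeric component
sorted <- all.files[order(no)]
print(sorted)
```

A name such as "v2_file10.txt" would yield 210 here, so a more specific pattern would be needed for names with several numbers.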
Order a mixed vector (numbers with letters)
> library(gtools)
> mixedsort(alph)
[1] "7" "8" "9" "10a" "10b" "10c" "11a" "11b" "11c" "12"
To sort a data.frame, use mixedorder instead:
> mydf <- data.frame(alph, USArrests[seq_along(alph),])
> mydf[mixedorder(mydf$alph),]
alph Murder Assault UrbanPop Rape
Alabama 7 13.2 236 58 21.2
California 8 9.0 276 91 40.6
Colorado 9 7.9 204 78 38.7
Alaska 10a 10.0 263 48 44.5
Arizona 10b 8.1 294 80 31.0
Arkansas 10c 8.8 190 50 19.5
Florida 11a 15.4 335 80 31.9
Delaware 11b 5.9 238 72 15.8
Connecticut 11c 3.3 110 77 11.1
Georgia 12 17.4 211 60 25.8
mixedorder on multiple vectors (columns)
Apparently mixedorder cannot handle multiple vectors. I have made a function that circumvents this by converting all character vectors to factors whose levels are sorted with mixedsort, and then passing all vectors on to the standard order function.
multi.mixedorder <- function(..., na.last = TRUE, decreasing = FALSE){
    do.call(order, c(
        lapply(list(...), function(l){
            if(is.character(l)){
                # factor levels in mixedsort order, so order() sorts them naturally
                factor(l, levels=mixedsort(unique(l)))
            } else {
                l
            }
        }),
        list(na.last = na.last, decreasing = decreasing)
    ))
}
However, in your particular case multi.mixedorder gets you the same result as the standard order, since V2 is numeric.
df <- data.frame(
V1 = c("A","A","B","B","C","C","D","D","E","E"),
V2 = 19:10,
V3 = alph,
stringsAsFactors = FALSE)
df[multi.mixedorder(df$V2, df$V3),]
V1 V2 V3
10 E 10 12
9 E 11 11a
8 D 12 11b
7 D 13 11c
6 C 14 9
5 C 15 8
4 B 16 10c
3 B 17 10b
2 A 18 10a
1 A 19 7
- Notice that 19:10 is equivalent to c(19:10). c means concatenate, that is, to make one long vector out of many short ones, but in your case you only have one vector (19:10), so there is no need to concatenate anything. However, in the case of V1 you have 10 vectors of length 1, so there you do need to concatenate, as you already do.
- You need stringsAsFactors=FALSE to avoid converting V1 and V3 to (incorrectly sorted) factors, which was the default before R 4.0.0.
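The factor trick that multi.mixedorder relies on can be seen in isolation with base R only. This is a rough sketch: the data are hypothetical, and the level ordering below (numeric part first, then the full string) is only a simple stand-in for mixedsort, not a reimplementation of it.

```r
v <- c("10a", "9", "10b", "7")   # hypothetical data

# Stand-in for mixedsort(): order the unique values by their numeric part,
# breaking ties with the full string
lev <- unique(v)
lev <- lev[order(as.numeric(gsub("\\D", "", lev)), lev)]

# order() on a factor sorts by the integer level codes, not by the labels
f <- factor(v, levels = lev)
idx <- order(f)
print(v[idx])
```

Because order works on the level codes, any number of such factors can be passed to it together with ordinary numeric vectors.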
Order vector in R: Letter with number sorts funny
We can use mixedsort from gtools. According to ?mixedsort:
These functions sort or order character strings containing embedded numbers so that the numbers are numerically sorted rather than sorted by character value.
library(gtools)
mixedsort(v1)
#[1] "r_1" "r_2" "r_10"
The reason for the funny sort order is that v1 is not a numeric vector, so sorting happens by character value.
data
v1 <- c("r_1", "r_2", "r_10")
R: order a vector of strings with both character and numeric values both alphabetically and numerically
EDIT: completely changed the solution after the OP's clarification.
You can extract the last three numeric fields into a data.frame:
dat = read.table(text=sub('.*:1:([0-9]+):([0-9]+):([0-9]+)','\\1|\\2|\\3',a),sep='|')
dat
V1 V2 V3
1 1102 14591 91480
2 1102 14592 3881
3 1102 14592 37103
4 1102 14592 37356
Then you order using 3 columns:
a[with(dat,order(V1,V2,V3))]
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
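An alternative sketch without the regular expression splits each string on ":" and orders by the last three fields as numbers. The input order of a below is an assumption (the original vector is not shown); the strings themselves are taken from the output above.

```r
# Assumed input order; only the sorted result appears in the original answer
a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14592:37356",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
       "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103")

# Split on the literal ":" and keep the last three fields of each string
parts <- strsplit(a, ":", fixed = TRUE)
key <- t(vapply(parts, function(p) as.numeric(utils::tail(p, 3)), numeric(3)))

# Order by the three numeric columns in turn
sorted <- a[order(key[, 1], key[, 2], key[, 3])]
print(sorted)
```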
Sort vector with character and number in R
We set the names of 'm' and 'n' to '0' and '1' respectively, concatenate them into a single vector 'n1', extract the numeric part of each element (using gsub), convert it to numeric, and use the resulting order index to sort 'n1'. The names then record which vector each element came from.
n1 <- c(setNames(m, rep(0, length(m))),setNames(n, rep(1, length(n))))
r1 <- n1[order(as.numeric(gsub("\\D+", "", n1)))]
as.vector(r1)
#[1] "1" "2" "<4" "<4" "5" "7" "8" "<12" "15" "17" "18" "20"
#[13] "<21" "<25" "25" "27" "34" "<35" "40" "43"
as.integer(names(r1))
#[1] 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 1 1
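For a version that runs on its own, here is the same approach on two small hypothetical vectors (the original m and n are not shown, so these values are assumptions):

```r
m <- c("1", "<4", "5")   # hypothetical
n <- c("2", "<4", "8")   # hypothetical

# Tag each element with its source: name "0" for m, name "1" for n
n1 <- c(setNames(m, rep(0, length(m))), setNames(n, rep(1, length(n))))

# Drop non-digits such as "<", convert to numeric, and order by that value
r1 <- n1[order(as.numeric(gsub("\\D+", "", n1)))]

values <- as.vector(r1)            # sorted values without names
source <- as.integer(names(r1))    # 0 = came from m, 1 = came from n
print(values)
print(source)
```

Ties such as the two "<4" entries keep their original relative order, so the element from m comes first.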
How to sort a character vector first by letter and then by number?
Find the strings not starting with a number. Create a logical index:
idx <- grepl("^[^0-9]", sample.condition)
Use this index for subsetting and sort the subsets. Then, combine both sorted subsets:
c(sort(sample.condition[idx]), sort(sample.condition[!idx]))
# [1] "GFP_t" "GFP_t" "1_t" "1_t" "2_t" "2_t" "3_t" "3_t"
# [9] "4_t" "4_t" "5_t" "5_t" "6_t" "6_t" "7_t" "7_t"
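Since sort compares strings character by character, a prefix such as "10" would land before "2". If multi-digit prefixes can occur, one possible variation (a sketch on hypothetical data, assuming the number always precedes the first underscore) orders the numeric subset by its leading number:

```r
# Hypothetical data, including a two-digit prefix
sample.condition <- c("10_t", "GFP_t", "2_t", "1_t", "GFP_t")

# TRUE for strings that do not start with a digit
idx <- grepl("^[^0-9]", sample.condition)

# Letter-led strings: plain alphabetical sort
letters_part <- sort(sample.condition[idx])

# Digit-led strings: order by the number before the first underscore
digits_part <- sample.condition[!idx]
digits_part <- digits_part[order(as.numeric(sub("_.*", "", digits_part)))]

result <- c(letters_part, digits_part)
print(result)
```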
Is there a way to sort a string that has numerics in R?
One option is mixedsort
library(gtools)
mixedsort(x)
#[1] "1:A" "2:A" "201:A"
Or remove the non-numeric characters with gsub and order:
x[order(as.numeric(gsub("\\D+", "", x)))]
#[1] "1:A" "2:A" "201:A"