Is it possible to sort a vector of alphanumeric values using lexical ordering in R?
You could look at the code for mixedsort and type it into R yourself. Then you would have the function without installing an additional package.
Or you can use the order function after splitting the character strings into their pieces:
v1 <- c('p 1', 'q 2','p 2','p 11', 'p 10')
sort(v1)
tmp <- strsplit(v1, ' +')
tmp1 <- sapply(tmp, '[[', 1)
tmp2 <- as.numeric(sapply(tmp, '[[', 2))
v1[ order( tmp1, tmp2 ) ]
Or you can automate this by writing a method for xtfrm and giving your vector the appropriate class:
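xtfrm.mixed <- function(x) {
  # split each string into its text part and its numeric part
  tmp <- strsplit(x, ' +')
  tmp1 <- sapply(tmp, '[[', 1)
  tmp2 <- as.numeric(sapply(tmp, '[[', 2))
  # rank both parts; the text rank is the integer part of the key
  tmp3 <- rank(tmp1, ties.method='min')
  tmp4 <- rank(tmp2, ties.method='min')
  # the scaled number rank stays below 1, so it only breaks ties
  tmp3+tmp4/(max(tmp4)+1)
}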
class(v1) <- 'mixed'
sort(v1)
If all of your data starts with "p " then you could just strip that off, coerce the rest to numeric, and use the result in order.
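A minimal sketch of that last suggestion, assuming every element really does start with the same "p " prefix. The vector v2 and the names num_key and sorted_v2 are made up for illustration:

```r
# Hypothetical data: every element starts with the same "p " prefix.
v2 <- c("p 1", "p 2", "p 11", "p 10")

# Strip the prefix and coerce what is left to numeric to get a sort key.
num_key <- as.numeric(sub("^p ", "", v2))

# Index the original vector by the order of the numeric key.
sorted_v2 <- v2[order(num_key)]
print(sorted_v2)
# [1] "p 1"  "p 2"  "p 10" "p 11"
```

If some elements had a different prefix, as.numeric would return NA for them (with a warning), so this shortcut only fits the uniform-prefix case.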
How to perform natural (lexicographic) sorting in R?
I don't think "alphanumeric sort" means what you think it means.
In any case, looks like you want mixedsort, part of gtools.
> install.packages('gtools')
[...]
> require('gtools')
Loading required package: gtools
> n
[1] "abc21" "abc2" "abc1" "abc01" "abc4" "abc201" "1b" "1a"
> mixedsort(n)
[1] "1a" "1b" "abc1" "abc01" "abc2" "abc4" "abc21" "abc201"
How to sort an alphanumeric character object?
Use order with as.numeric on the result of stripping out everything up to the last dot:
> tt[ order( as.numeric( sub("^.+\\.", "", tt) ) ) ]
[1] "/PATH.to.FILES/AA.22.1 " "/PATH.to.FILES/AA.22.2 "
[3] "/PATH.to.FILES/AA.22.3 " "/PATH.to.FILES/AA.22.4 "
[5] "/PATH.to.FILES/AA.22.5 " "/PATH.to.FILES/AA.22.6 "
[7] "/PATH.to.FILES/AA.22.7 " "/PATH.to.FILES/AA.22.8 "
[9] "/PATH.to.FILES/AA.22.9" "/PATH.to.FILES/AA.22.10"
[11] "/PATH.to.FILES/AA.22.11" "/PATH.to.FILES/AA.22.12"
[13] "/PATH.to.FILES/AA.22.13"
If you wanted to match the second-to-last item in strings separated by dots, it would be a bit more complicated. I've illustrated one possible approach for matching "digit" characters prior to removing 'dot'[alpha] endings.
sub("(^.+\\.)(\\d+)(\\.[A-Z]+$)", "\\2", "AA.BB.$i.2.CC")
[1] "2"
You need to look up ?regex.
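The same pattern can feed order directly. This is only a sketch: the file names below are invented, and it assumes each name ends in a dot followed by uppercase letters, as the pattern requires.

```r
# Hypothetical names: the number sits in the second-to-last dot-separated field.
files <- c("AA.BB.10.CC", "AA.BB.2.CC", "AA.BB.1.CC")

# Keep only the digits captured by the second group, then convert to numeric.
field_key <- as.numeric(sub("(^.+\\.)(\\d+)(\\.[A-Z]+$)", "\\2", files))

# Order the original strings by that numeric key.
sorted_files <- files[order(field_key)]
print(sorted_files)
# [1] "AA.BB.1.CC"  "AA.BB.2.CC"  "AA.BB.10.CC"
```

Strings that do not match the pattern are returned unchanged by sub, so as.numeric would give NA for them.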
How can I use the row.names attribute to order the rows of my dataframe in R?
This worked for me:
new_df <- df[ order(row.names(df)), ]
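Keep in mind that row.names returns character strings, so the line above orders them as text. If the row names happen to be numbers, a numeric key may be closer to what you want. A small sketch with a made-up data frame (the column name value and the row names are assumptions):

```r
# Hypothetical data frame whose row names are numbers stored as strings.
df <- data.frame(value = c("a", "b", "c"), row.names = c("10", "2", "1"))

# Character ordering of the row names: "1", "10", "2".
lexical_df <- df[order(row.names(df)), , drop = FALSE]

# Numeric ordering of the row names: 1, 2, 10.
numeric_df <- df[order(as.numeric(row.names(df))), , drop = FALSE]

print(row.names(lexical_df))
print(row.names(numeric_df))
```

The drop = FALSE is there only because this example has a single column; without it the result would collapse to a plain vector.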
Setting levels when creating a factor vs. `levels()<-`
F1 uses numeric sorting, as you figured out yourself.
F2 uses lexicographic sorting, first comparing the first character, breaking ties using the second, and so on, which is why "10 years" is between "1 years" and "2 years".
F4 is created from a character vector, but with an explicit list of possible levels. That list is taken (without sorting) and identified with the numbers 1 through 6. Then every item of your input is compared against the set of possible levels, and the associated number is stored. After all, a factor is simply a bunch of numbers (as.numeric will show them to you) associated with a list of levels used for printing. So F4 gets printed just like F2, but its levels are sorted differently.
F3 was created from F2, so its levels initially followed F2's lexicographic order. The assignment only replaces the set of level names, not the numbers in the vector. So you can think of this as renaming existing levels. If you look at the numbers, they will match those from F2, whereas the names associated, and the order of names in particular, match those from F4.
As for your question's claim that this was not purely a relabel: yes, it is a pure relabel. You obtain F3 from F2 using the following changes (in both rows of the printout):
- 10 → 2
- 2 → 3
- 20 → 10
- 25 → 20
- 3 → 25
The str function is also a good tool to look at the internal representation of a factor.
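The difference between passing levels at creation and assigning them afterwards can be reproduced with a small sketch. The question's actual data is not shown here, so the input below is hypothetical and uses bare digit strings; f2, f3 and f4 only mirror the roles of F2, F3 and F4.

```r
# Hypothetical input; the original question's data is not reproduced here.
x <- c("3", "10", "1", "25", "2", "20")
wanted <- c("1", "2", "3", "10", "20", "25")

f2 <- factor(x)                   # levels sorted as strings: 1 10 2 20 25 3
f4 <- factor(x, levels = wanted)  # levels taken in the order given

f3 <- f2
levels(f3) <- wanted              # renames f2's levels position by position

as.integer(f2)    # 6 2 1 5 3 4
as.integer(f3)    # 6 2 1 5 3 4  (same codes as f2)
as.integer(f4)    # 3 4 1 6 2 5
as.character(f3)  # "25" "2" "1" "20" "3" "10"  (values were relabelled)
as.character(f4)  # "3" "10" "1" "25" "2" "20"  (values unchanged)
```

So to reorder levels without changing the data, pass levels to factor; assigning with levels<- keeps the integer codes and swaps the labels underneath them.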
What is lexicographical order?
Lexicographical order is alphabetical order. The other type is numerical ordering. Consider the following values:
1, 10, 2
Those values are in lexicographical order. 10 comes after 2 in numerical order, but 10 comes before 2 in "alphabetical" order.
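In R the two orderings fall out of the type of the vector being sorted. A short sketch using the same three values (the variable names are arbitrary):

```r
vals <- c(1, 10, 2)

# Numeric vector: sorted by value.
numeric_sorted <- sort(vals)
print(numeric_sorted)
# [1]  1  2 10

# Character vector: sorted character by character, so "10" comes before "2".
lexical_sorted <- sort(as.character(vals))
print(lexical_sorted)
# [1] "1"  "10" "2"
```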
R - Can you compare which value is first in alphabetical order?
The built-in comparison operators work fine on strings.
x < y
[1] TRUE
y < x
[1] FALSE
Note the details in the help page ?Comparison, or perhaps more intuitively, ?`<`, especially the importance of locale:
Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use [...]
Beware of making any assumptions about the collation order
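A small sketch of both points, with made-up strings. The first comparison should hold in any common locale. The result of the plain sort call on mixed-case input is deliberately not shown because it depends on the locale in use, whereas method = "radix" (available since R 3.3.0) always collates character vectors as in the C locale.

```r
# Hypothetical strings for the comparison shown above.
x <- "apple"
y <- "banana"
x < y   # TRUE
y < x   # FALSE

# Mixed-case input: the default result depends on the current locale.
mixed_case <- c("b", "A", "a", "B")
locale_sorted <- sort(mixed_case)

# Radix sorting uses C-locale collation: uppercase before lowercase.
c_sorted <- sort(mixed_case, method = "radix")
print(c_sorted)
# [1] "A" "B" "a" "b"
```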