How to sort a character vector according to a specific order?
x <- c("white","white","blue","green","red","blue","red")
y <- c("red","white","blue","green")
x[order(match(x, y))]
# [1] "red" "red" "white" "white" "blue" "blue" "green"
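To see why this works, it can help to look at the intermediate result of match. The sketch below adds one extra value, "black", that does not occur in y (an assumption made only for illustration) to show where unmatched elements end up.

```r
x <- c("white", "white", "blue", "green", "red", "blue", "red", "black")
y <- c("red", "white", "blue", "green")

# Position of each element of x within y; NA when there is no match
pos <- match(x, y)
pos
# [1]  2  2  3  4  1  3  1 NA

# order() places NA last by default, so unmatched values go to the end
sorted <- x[order(pos)]
sorted
# "red" "red" "white" "white" "blue" "blue" "green" "black"
```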
Sort a vector of strings based on a specified order
You can do this by making y an ordered factor and then simply sorting.
x <- c("green", "red", "orange", "blue", "yellow")
set.seed(1066)
y <- factor(sample(x, 5, replace = TRUE), levels = x, ordered = TRUE)
y
[1] red blue blue red green
Levels: green < red < orange < blue < yellow
sort(y)
[1] green red red blue blue
Levels: green < red < orange < blue < yellow
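The same idea can be tried without random sampling. This sketch uses a small colour vector and a level order chosen for illustration; note that sort() drops NA values by default, so an element missing from the levels would disappear from the result.

```r
x <- c("white", "white", "blue", "green", "red", "blue", "red")
lev <- c("red", "white", "blue", "green")

# Turn x into a factor whose level order is the desired sort order
f <- factor(x, levels = lev)

# Sorting a factor follows the level order; convert back to character
sorted <- as.character(sort(f))
sorted
# "red" "red" "white" "white" "blue" "blue" "green"
```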
How to order a character vector according to a second character vector made up of substrings of the first?
You can use grep in combination with sapply, but this only works when the patterns in y do not overlap. It returns only the elements of x that match one of the patterns in y. The ^ anchors each pattern to the beginning of the string, and value = TRUE makes grep return the matching strings instead of their positions.
unlist(sapply(paste0("^",y), grep, x, value = TRUE))
# ^r1 ^r2 ^white1 ^white2 ^bl1 ^bl2 ^gree
# "red" "red" "white" "white" "blue" "blue" "green"
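The vectors x and y are not defined in this answer. Judging by the names in the output above, y probably holds prefixes; the values below are an assumption that happens to reproduce that output.

```r
# Assumed inputs (not given in the original answer)
x <- c("white", "white", "blue", "green", "red", "blue", "red")
y <- c("r", "white", "bl", "gree")

# One grep call per anchored pattern; unlist() flattens the per-pattern hits
res <- unlist(sapply(paste0("^", y), grep, x, value = TRUE))
res
#   ^r1   ^r2 ^white1 ^white2  ^bl1  ^bl2  ^gree
# "red" "red" "white" "white" "blue" "blue" "green"
```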
The following will also work with an overlap in y and takes the first hit.
x <- c(x, "redd"); y <- c(y, "redd")
x[unique(unlist(sapply(paste0("^",y), grep, x)))]
#[1] "red" "red" "redd" "white" "white" "blue" "blue" "green"
or get the last hit:
x[unique(unlist(sapply(paste0("^",y), grep, x)), fromLast = TRUE)]
[1] "red" "red" "white" "white" "blue" "blue" "green" "redd"
To keep every element of x and place the non-matching ones at the end, you can use:
x <- c(x, "yellow")
x[unique(c(unlist(sapply(paste0("^",y), grep, x)), seq_along(x)))]
[1] "red" "red" "redd" "white" "white" "blue" "blue" "green"
[9] "yellow"
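The last variant could be wrapped in a small helper. The sketch below is not from the original answer: the name order_by_prefix is hypothetical, the prefixes in y are assumed, and it swaps the regular expression for startsWith() so that prefixes containing regex metacharacters are taken literally.

```r
# Hypothetical helper: order x by the first prefix each element starts with
order_by_prefix <- function(x, prefixes) {
  # For each prefix in turn, collect the indices of x that start with it
  hits <- unlist(lapply(prefixes, function(p) which(startsWith(x, p))))
  # Append every index so unmatched elements land at the end;
  # unique() keeps only the first occurrence of each index
  x[unique(c(hits, seq_along(x)))]
}

x <- c("white", "white", "blue", "green", "red", "blue", "red", "redd", "yellow")
y <- c("r", "white", "bl", "gree", "redd")   # assumed prefixes

res <- order_by_prefix(x, y)
res
# red, red, redd, white, white, blue, blue, green, yellow
```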
Order data frame rows according to vector with specific order
Try match:
df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
df[match(target, df$name),]
name value
2 b TRUE
3 c FALSE
1 a TRUE
4 d FALSE
It will work as long as your target contains exactly the same elements as df$name, and neither contains duplicate values.
From ?match:
match returns a vector of the positions of (first) matches of its first argument
in its second.
Therefore match finds the row numbers that match target's elements, and then we return df in that order.
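If df$name can contain duplicates, or target lists names that are absent from the data frame, one alternative is to turn the call around and order the rows by their position in target. This is a sketch with made-up data, not part of the original answer.

```r
df <- data.frame(name = c("a", "b", "b", "c"), value = 1:4)
target <- c("c", "b", "a", "z")   # "z" does not occur in df$name

# match(df$name, target) gives each row its position in target;
# order() then sorts the rows by that position and keeps duplicates
res <- df[order(match(df$name, target)), ]
res
#   name value
# 4    c     4
# 2    b     2
# 3    b     3
# 1    a     1
```

Rows whose name is missing from target would get NA from match and be placed last by order.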
How to sort a vector according to a given sequence in R
Here's another option:
dat_value[match(rank(given_seq, ties.method = "random"), rank(dat_seq, ties.method = "random"))]
# [1] 0.7383247 0.5757814 -0.8204684 1.5952808 0.4874291 0.3295078
First we convert the two sequences into ones that have no repeated elements; e.g.,
rank(given_seq, ties.method = "random")
# [1] 3 5 6 1 2 4
That is, if two entries of given_seq are, say, (1,1), then they will randomly be converted into (1,2) or (2,1). The same is done with dat_seq and, consequently, we can match them and reorder dat_value accordingly. This method therefore introduces some randomization among tied values, which may or may not be desirable in your application.
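If the randomization is not wanted, a reproducible variant is to break ties by order of appearance with ties.method = "first". The data below are made up for illustration, since the original vectors are not shown, and the sketch assumes both sequences contain the same values the same number of times.

```r
# Made-up example data (the original vectors are not shown)
dat_seq   <- c(1, 1, 2, 3)
dat_value <- c(10, 20, 30, 40)
given_seq <- c(3, 1, 2, 1)

# ties.method = "first" numbers tied values in order of appearance,
# so both rank vectors are permutations of 1..n and the result is reproducible
idx <- match(rank(given_seq, ties.method = "first"),
             rank(dat_seq,   ties.method = "first"))

dat_value[idx]
# [1] 40 10 30 20
```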
What are the R sorting rules of character vectors?
The Details section of the help for sort() states:
The sort order for character vectors will depend on the collating
sequence of the locale in use: see ‘Comparison’. The sort order
for factors is the order of their levels (which is particularly
appropriate for ordered factors).
and help(Comparison) then shows:
Comparison of strings in character vectors is lexicographic within
the strings using the collating sequence of the locale in use: see
‘locales’. The collating sequence of locales such as ‘en_US’ is
normally different from ‘C’ (which should use ASCII) and can be
surprising. Beware of making _any_ assumptions about the
collation order: e.g. in Estonian ‘Z’ comes between ‘S’ and ‘T’,
and collation is not necessarily character-by-character - in
Danish ‘aa’ sorts as a single letter, after ‘z’. In Welsh ‘ng’
may or may not be a single sorting unit: if it is it follows ‘g’.
Some platforms may not respect the locale and always sort in
numerical order of the bytes in an 8-bit locale, or in Unicode
point order for a UTF-8 locale (and may not sort in the same order
for the same language in different character sets). Collation of
non-letters (spaces, punctuation signs, hyphens, fractions and so
on) is even more problematic.
So the sort order depends on your locale setting.
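If you need an ordering that does not change with the locale, one option is sort() with method = "radix", which is documented to use C-locale collation for character vectors. A small sketch:

```r
x <- c("b", "A", "a", "B")

# Current collation locale; the result of a plain sort(x) depends on it
Sys.getlocale("LC_COLLATE")

# The radix method sorts character vectors in the C locale,
# where uppercase ASCII letters come before lowercase ones
c_sorted <- sort(x, method = "radix")
c_sorted
# [1] "A" "B" "a" "b"
```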
R custom ordering of character vector by matching the first character
You can use sub to remove p or q and everything afterwards, and then use match and order.
test[order(match(sub("[pq].*", "", test), order_custom))]
#[1] "Xpsomethingelse" "3qsometext" "22qsomeothertext"
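The inputs test and order_custom are not shown in this answer. The values below are assumptions chosen to be consistent with the output above, so that the steps can be run one at a time.

```r
# Assumed inputs (not shown in the original answer)
test <- c("22qsomeothertext", "3qsometext", "Xpsomethingelse")
order_custom <- c("X", "3", "22")

# Drop the first "p" or "q" and everything after it, leaving the key
key <- sub("[pq].*", "", test)
# key is now c("22", "3", "X")

# Look up each key's position in order_custom and sort by it
res <- test[order(match(key, order_custom))]
res
# "Xpsomethingelse" "3qsometext" "22qsomeothertext"
```

This relies on the key itself never containing p or q; otherwise sub would cut it short.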