How to Sort a Data Frame by Alphabetic Order of a Character Variable in R

Sorting with order() works without any problem here:

df <- data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df

# v x
# 1 1 D
# 2 2 A
# 3 3 B
# 4 4 C
# 5 5 E

df <- df[order(df$x),]
df

# v x
# 2 2 A
# 3 3 B
# 4 4 C
# 1 1 D
# 5 5 E
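
Because sample() draws at random, the output above will differ between runs. A minimal reproducible sketch with fixed data (the values are chosen here to match the printout, and the names asc and desc are only illustrative), which also shows the descending variant:

```r
# Fixed data instead of sample(), so the result is the same on every run.
df <- data.frame(v = 1:5, x = c("D", "A", "B", "C", "E"))

# Ascending alphabetical order of x.
asc <- df[order(df$x), ]

# Descending alphabetical order of x.
desc <- df[order(df$x, decreasing = TRUE), ]

print(asc)
print(desc)
```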

Order a data.table based on a character column with a specific (not alphabetical) order in mind

One possibility is to join on the preferred order:

DT[preferred.order, on="x"]
x y
1: b 2
2: a 3
3: c 1

Note that this requires that the preferred.order vector contain every element of DT$x and have no duplicates.

As an alternative, you could create a factor variable of DT$x with the preferred ordering and then use setorder to order DT by reference.

DT[, xFac := factor(x, levels=preferred.order)]
setorder(DT, xFac)

which returns

DT
x y xFac
1: b 2 b
2: a 3 a
3: c 1 c

Which method is preferable will depend on the use case.
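
The snippets above do not show how DT and preferred.order were built. A self-contained sketch of both methods, assuming hypothetical input data chosen to reproduce the printed output:

```r
library(data.table)

# Hypothetical data: the original post does not show how DT was created.
DT <- data.table(x = c("c", "b", "a"), y = 1:3)
preferred.order <- c("b", "a", "c")

# Method 1: join on the preferred order. This returns a new table.
joined <- DT[preferred.order, on = "x"]

# Method 2: reorder DT by reference through a helper factor column,
# then drop the helper column again.
DT[, xFac := factor(x, levels = preferred.order)]
setorder(DT, xFac)
DT[, xFac := NULL]

print(joined)
print(DT)
```

The join leaves DT untouched and returns a copy, while setorder changes DT in place, which may matter for large tables.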

Sort column in R: strings first (alphabetically), then numbers (numerically)

For a base R option:

df <- data.frame(Col2=c("100", "B", "A", "Z", "10", "4"), stringsAsFactors=FALSE)
df[order(grepl("^\\d+$", df$Col2), sprintf("%10s", df$Col2)), ]

[1] "A" "B" "Z" "4" "10" "100"

The two sorting levels work as follows. The first places letters before numbers, because grepl returns FALSE for the letters and FALSE sorts before TRUE. The second left-pads every string to 10 characters with spaces (not zeroes) and sorts ascending, which is effectively an ascending numeric sort for the numbers. The trick here is to realize that number strings actually do sort correctly as text if they all have the same width.
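
If relying on padded strings feels fragile, the same ordering can be sketched with an explicit numeric key. This is an alternative to the padding trick, not the method from the answer above, and the names is_num and num_key are only illustrative:

```r
x <- c("100", "B", "A", "Z", "10", "4")

# TRUE for strings made up of digits only.
is_num <- grepl("^[0-9]+$", x)

# Numeric sort key: the parsed value for numbers, 0 for everything else.
num_key <- rep(0, length(x))
num_key[is_num] <- as.numeric(x[is_num])

# Letters first (FALSE < TRUE), then numbers by value;
# ties among the letters are broken alphabetically by x itself.
sorted <- x[order(is_num, num_key, x)]
print(sorted)
# [1] "A"   "B"   "Z"   "4"   "10"  "100"
```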

dplyr: order columns alphabetically in R

Try this:

df %>% select(noquote(order(colnames(df))))

or just, in base R:

df[,order(colnames(df))]

Update Dec 2021

New versions of dplyr (>= 1.0.7) work without the noquote:

df %>% select(order(colnames(df)))
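
A small base R sketch of the column reordering, using a made-up data frame. The drop = FALSE argument is an addition here: it keeps the result a data frame even if df has only one column.

```r
# Hypothetical data frame with columns in non-alphabetical order.
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

# order() on the column names gives the column positions in alphabetical order.
sorted <- df[, order(colnames(df)), drop = FALSE]

print(colnames(sorted))
# [1] "a" "b" "c"
```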

Order data frame rows according to vector with specific order

Try match:

df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
df[match(target, df$name),]

name value
2 b TRUE
3 c FALSE
1 a TRUE
4 d FALSE

It will work as long as your target contains exactly the same elements as df$name, and neither contains duplicate values.

From ?match:

match returns a vector of the positions of (first) matches of its first argument in its second.

Therefore match finds the row numbers that match target's elements, and then we return df in that order.
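
When df$name does contain duplicates, a common variant is to swap the arguments and wrap the result in order(). This is a sketch with made-up data, not part of the answer above:

```r
# Hypothetical data with a repeated name.
df <- data.frame(name = c("a", "b", "a", "c"), value = 1:4)
target <- c("c", "a", "b")

# match() gives each row its position in target;
# order() then sorts the rows by that position.
# Rows with equal positions keep their original relative order.
res <- df[order(match(df$name, target)), ]
print(res)
```

Names that do not appear in target get NA from match and end up at the bottom, since order puts NA last by default.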

How can I sort a dataframe by a predetermined order of factor levels in R?

We can specify the levels of 'group' as category_order and then use that in arrange:

library(dplyr)
df1 <- df %>%
  arrange(factor(group, levels = category_order))
df1
# group value
#1 tree 50
#2 house 2
#3 lake 1
#4 human 5

Or using fct_relevel:

library(forcats)
df %>%
  arrange(fct_relevel(group, category_order))
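
The same idea also works in base R. In this sketch, df and category_order are assumptions reconstructed from the printed output, since the question's data is not shown here:

```r
# Assumed input data (not shown in the original post).
df <- data.frame(group = c("human", "house", "lake", "tree"),
                 value = c(5, 2, 1, 50))
category_order <- c("tree", "house", "lake", "human")

# A factor sorts by the position of its levels, not alphabetically.
df1 <- df[order(factor(df$group, levels = category_order)), ]
print(df1)
```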

Sorting multiple columns by first letter and by numbers in R

Another method using dplyr:

library(dplyr)
arrange(df, sub('_.+$', '', item), mean)

An alternative would be to use str_extract from stringr to extract only the first letter (together with the underscore) from item:

library(stringr)
arrange(df, str_extract(item, '^._'), mean)

Result:

  item mean
1 a_c 2
2 a_a 4
3 a_b 5
4 b_e 1
5 b_f 3
6 b_d 7

Data:

df <- structure(list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"
), mean = c(5L, 2L, 4L, 7L, 3L, 1L)), .Names = c("item", "mean"
), class = "data.frame", row.names = c(NA, -6L))

Notes:

  • sub('_.+$', '', item) creates a temporary variable by removing _ and everything after that from item. _.+$ matches a literal underscore (_) followed by any character one or more times (.+) at the end of the string ($).

  • str_extract(item, '^._') creates a temporary variable by extracting any one character (.) followed by a literal underscore (_) at the beginning of the string (^).

  • The neat thing about dplyr::arrange is that you can create a temporary sorting variable within the function and not have it included in the output.
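
For comparison, the same two-level sort can be sketched in base R with order(), reusing the data from the Data section above. The name res is only illustrative:

```r
df <- data.frame(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
                 mean = c(5L, 2L, 4L, 7L, 3L, 1L))

# First key: the part of item before the underscore; second key: mean.
res <- df[order(sub("_.+$", "", df$item), df$mean), ]
print(res)
```

Unlike arrange, this keeps the original row names, so the printed row labels will not run from 1 to 6.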
