
How to Sort a Data Frame by Alphabetic Order of a Character Variable in R

How to sort a data frame by alphabetic order of a character variable in R?

Sorting with `order()` on the character column works without any problem:

```r
df <- data.frame(v=1:5, x=sample(LETTERS[1:5],5))
df
#   v x
# 1 1 D
# 2 2 A
# 3 3 B
# 4 4 C
# 5 5 E
df <- df[order(df$x),]
df
#   v x
# 2 2 A
# 3 3 B
# 4 4 C
# 1 1 D
# 5 5 E
```

Order a data.table based on a character column with a specific (not alphabetical) order in mind

One possibility is to join on the preferred order:

```r
DT[preferred.order, on="x"]
#    x y
# 1: b 2
# 2: a 3
# 3: c 1
```

Note that this requires that the `preferred.order` vector contain all elements of `DT$x` and have no duplicates.

As an alternative, you could create a factor variable from `DT$x` with the preferred ordering and then use `setorder` to order `DT` by reference.

```r
DT[, xFac := factor(x, levels=preferred.order)]
setorder(DT, xFac)
```

which returns

```r
DT
#    x y xFac
# 1: b 2    b
# 2: a 3    a
# 3: c 1    c
```

Which method is preferable depends on the use case.
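The answer above does not show how `DT` and `preferred.order` were built, so here is a self-contained sketch of both approaches. The input values are assumptions chosen to reproduce the printed output, and dropping the helper column at the end is an optional extra step, not part of the original answer.

```r
library(data.table)

# Hypothetical data: the original DT is not shown, so these values are assumed
DT <- data.table(x = c("a", "b", "c"), y = c(3, 2, 1))
preferred.order <- c("b", "a", "c")

# Option 1: join on the preferred order (returns a new data.table)
joined <- DT[preferred.order, on = "x"]

# Option 2: add a factor column with the preferred levels, then sort by reference
DT[, xFac := factor(x, levels = preferred.order)]
setorder(DT, xFac)
DT[, xFac := NULL]   # optional: remove the helper column once DT is sorted
```

The join leaves `DT` untouched and returns a reordered copy, while `setorder` changes `DT` itself, which can matter for large tables.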

Sort column in R: strings first (alphabetically), then numbers (numerically)

For a base R option:

```r
df <- data.frame(Col2=c("100", "B", "A", "Z", "10", "4"), stringsAsFactors=FALSE)
df[order(grepl("^\\d+$", df$Col2), sprintf("%10s", df$Col2)), ]
# [1] "A"   "B"   "Z"   "4"   "10"  "100"
```

There are two sorting keys here. The first places letters before numbers: `grepl` returns `FALSE` for values that are not all digits, and `FALSE` sorts before `TRUE`. The second left-pads every value to 10 characters with spaces (`%10s` pads with spaces, not zeroes) and sorts ascending. For the numbers this is effectively an ascending numeric sort. The trick is to realize that number strings do sort correctly as text if they all have the same width.
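To make the two keys visible, the same sort can be written step by step. This is a sketch, not the original answer's code: the variable names are illustrative, and `method = "radix"` is an added assumption so that strings are compared in the C locale, where a space always sorts before a digit. With the default method the result depends on the session's collation rules.

```r
# Same sample values as in the example above
col2 <- c("100", "B", "A", "Z", "10", "4")

is_num <- grepl("^[0-9]+$", col2)   # TRUE for strings made only of digits
padded <- sprintf("%10s", col2)     # right-justify each value in a 10-character field

# First key: FALSE (letters) before TRUE (numbers).
# Second key: the padded text, compared byte-wise because of method = "radix".
sorted <- col2[order(is_num, padded, method = "radix")]
sorted
```

If the numeric strings could be longer than 10 characters, the field width would need to grow accordingly.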

dplyr: order columns alphabetically in R

Try this:

``df %>% select(noquote(order(colnames(df))))``

or just

``df[,order(colnames(df))]``

Update Dec 2021

New versions of `dplyr` (>= 1.0.7) work without the `noquote`:

``df %>% select(order(colnames(df)))``
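As a quick check of what `order(colnames(df))` produces, here is a small base R sketch. The data frame is hypothetical, and `drop = FALSE` is an added safeguard so that a one-column data frame stays a data frame.

```r
# Hypothetical data frame with columns in non-alphabetical order
df <- data.frame(b = 1:2, c = 3:4, a = 5:6)

col_idx <- order(colnames(df))          # positions of the columns in alphabetical order
sorted  <- df[, col_idx, drop = FALSE]  # reorder columns; rows are unchanged
names(sorted)
```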

Order data frame rows according to vector with specific order

Try `match`:

```r
df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
df[match(target, df$name),]
#   name value
# 2    b  TRUE
# 3    c FALSE
# 1    a  TRUE
# 4    d FALSE
```

It will work as long as your `target` contains exactly the same elements as `df$name`, and neither contains duplicate values.

From `?match`:

``match returns a vector of the positions of (first) matches of its first argument in its second.``

Therefore `match` finds the row numbers that match `target`'s elements, and then we return `df` in that order.
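The intermediate index vector and the precondition can be made explicit. The sketch below reuses the data from the example; the names `idx` and `reordered` and the `stopifnot` guard are illustrative additions, not part of the original answer.

```r
df <- data.frame(name = letters[1:4], value = c(TRUE, TRUE, FALSE, FALSE))
target <- c("b", "c", "a", "d")

# Guard for the caveat above: same set of values on both sides, no duplicates
stopifnot(!anyDuplicated(target),
          !anyDuplicated(df$name),
          setequal(target, df$name))

idx <- match(target, df$name)   # for each element of target, its row in df
reordered <- df[idx, ]          # rows of df, now in the order given by target
```

Without the guard, a value in `target` that is missing from `df$name` would give an `NA` index and therefore a row of `NA`s.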

How can I sort a dataframe by a predetermined order of factor levels in R?

We can set the `levels` of `group` to `category_order` and then use that factor in `arrange`:

```r
library(dplyr)
df1 <- df %>%
  arrange(factor(group, levels = category_order))
df1
#  group value
#1  tree    50
#2 house     2
#3  lake     1
#4 human     5
```

Or using `fct_relevel`:

```r
library(forcats)
df %>%
  arrange(fct_relevel(group, category_order))
```
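The same idea also works without `dplyr`. The following base R sketch swaps `arrange` for `order`; the input `df` and `category_order` are not shown in the answer, so the values here are assumptions chosen to match the printed result.

```r
# Hypothetical input: values assumed from the output shown above
df <- data.frame(group = c("house", "human", "lake", "tree"),
                 value = c(2, 5, 1, 50))
category_order <- c("tree", "house", "lake", "human")

# factor() maps each group to its position in category_order;
# order() then sorts the rows by those positions
df1 <- df[order(factor(df$group, levels = category_order)), ]
df1
```

Any `group` value that is missing from `category_order` becomes `NA` in the factor and is placed last by `order`.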

Sorting multiple columns by first letter and by numbers in R

Another method using `dplyr`:

```r
library(dplyr)
arrange(df, sub('_.+$', '', item), mean)
```

An alternative would be to use `str_extract` from `stringr` to extract only the first letter (together with the underscore that follows it) from `item`:

```r
library(stringr)
arrange(df, str_extract(item, '^._'), mean)
```

Result:

```r
#   item mean
# 1  a_c    2
# 2  a_a    4
# 3  a_b    5
# 4  b_e    1
# 5  b_f    3
# 6  b_d    7
```

Data:

```r
df <- structure(
  list(item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
       mean = c(5L, 2L, 4L, 7L, 3L, 1L)),
  .Names = c("item", "mean"),
  class = "data.frame",
  row.names = c(NA, -6L)
)
```

Notes:

• `sub('_.+$', '', item)` creates a temporary variable by removing `_` and everything after it from `item`. `_.+$` matches a literal underscore (`_`) followed by any character one or more times (`.+`) at the end of the string (`$`).

• `str_extract(item, '^._')` creates a temporary variable by extracting any one character (`.`) followed by a literal underscore (`_`) at the beginning of the string (`^`).

• The neat thing about `dplyr::arrange` is that you can create a temporary sorting variable within the function and not have it included in the output.
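For comparison, a base R sketch of the same two-key sort is shown below. It is a swapped-in technique, not the answer's `dplyr` code: the temporary key lives in an ordinary vector (here called `prefix`, an illustrative name) and is passed to `order`, so it never becomes a column of the result either.

```r
df <- data.frame(
  item = c("a_b", "a_c", "a_a", "b_d", "b_f", "b_e"),
  mean = c(5L, 2L, 4L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

prefix <- sub("_.+$", "", df$item)        # part of item before the underscore
sorted <- df[order(prefix, df$mean), ]    # sort by prefix, then by mean within each prefix
sorted
```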