Sorting Rows Alphabetically

Sort each row of character strings alphabetically in R

With dplyr, you can try:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(names = paste(sort(unlist(strsplit(names, ", ", fixed = TRUE))), collapse = ", "))

  names                       var1  var2
  <chr>                      <dbl> <dbl>
1 John D., Josh C., Karl H. -0.226  19.9
2 Bob S., John D., Tim H.    0.424  24.8
3 Amy A., Art U., Wes T.     1.42   25.0
4 John D., Josh C., Karl H.  5.42   20.4

Sample data:

df <- data.frame(names, var1, var2,
stringsAsFactors = FALSE)
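
For comparison, the same split, sort, and rejoin step can be sketched in plain Python. The helper name `sort_names` and the unsorted input strings are assumptions made up for illustration; they are not part of the original answer.

```python
def sort_names(cell, sep=", "):
    """Split a delimited string, sort the pieces, and join them again."""
    return sep.join(sorted(cell.split(sep)))

# Hypothetical unsorted input rows (invented for this sketch).
rows = [
    "Karl H., John D., Josh C.",
    "Tim H., Bob S., John D.",
]

sorted_rows = [sort_names(r) for r in rows]
for r in sorted_rows:
    print(r)
```

As in the R version, the separator has to match the data exactly (a comma followed by one space), otherwise stray spaces end up inside the pieces and affect the order.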

Python order dataframe alphabetically

Before pandas 0.17 (DataFrame.sort was deprecated in 0.17 and removed in 0.20):

# Sort by student name, ascending
df.sort('student')
# Sort by student name, descending
df.sort('student', ascending=False)

pandas 0.17 and later:

# sort_values returns a new DataFrame; assign the result to keep it
# Sort by student name, ascending
df.sort_values('student')
# Sort by student name, descending
df.sort_values('student', ascending=False)
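
One thing to keep in mind is that Python compares strings by code point, so an "alphabetical" sort puts uppercase names before lowercase ones unless you normalise the case. The following is a minimal standard-library sketch of ascending, descending, and case-insensitive ordering on a list of records; the `students` data is made up for illustration and no pandas is involved.

```python
from operator import itemgetter

# Hypothetical records standing in for DataFrame rows.
students = [
    {"student": "bob", "score": 71},
    {"student": "Alice", "score": 90},
    {"student": "Carol", "score": 85},
]

ascending = sorted(students, key=itemgetter("student"))
descending = sorted(students, key=itemgetter("student"), reverse=True)
# casefold() makes the comparison ignore case.
ignore_case = sorted(students, key=lambda row: row["student"].casefold())

print([r["student"] for r in ascending])
print([r["student"] for r in descending])
print([r["student"] for r in ignore_case])
```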

How can I sort csv data alphabetically then numerically by column?

The following assumes bash (if you don't use bash, replace $'\t' with a quoted literal tab character) and GNU coreutils. It also assumes that you want to sort alphabetically by the Make column first, then numerically in decreasing order by Total, and finally keep at most the first 3 entries for each Make.

Sorting is a job for sort. head and tail can isolate the header line, and awk can keep at most 3 rows for each Make and renumber the first column:

$ head -n1 data.tsv; tail -n+2 data.tsv | sort -t$'\t' -k4,4 -k6,6rn |
  awk -F'\t' -vOFS='\t' '$4==p {n+=1} $4!=p {n=1;p=$4} n<=3 {$1=++r; print}'
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92

Note that this is different from your expected output: Make is sorted in alphabetic order (Audi comes after Acura, not Honda) and only the 3 largest Total are kept (112, 106, 102 for Honda, not 112, 102, 92).
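
If a shell pipeline is not an option, the same idea (sort by Make, then by Total descending, keep the first 3 rows per Make, renumber the first column) can be sketched with Python's standard library. This is only an illustration under stated assumptions: the function name `top_per_make` is hypothetical, the inline data reuses a few rows from the output above, and the Civic row with ID 999 is invented so that one Make has more than 3 entries.

```python
import csv
import io
from itertools import groupby

# Hypothetical tab-separated input; the Civic row is invented for the demo.
DATA = (
    "Ranking\tID\tYear\tMake\tModel\tTotal\n"
    "1\t83\t2014\tHonda\tAccord\t112\n"
    "2\t112\t2008\tAcura\tTL\t110\n"
    "3\t65\t2009\tHonda\tFit\t106\n"
    "4\t31\t2007\tHonda\tFit\t102\n"
    "5\t999\t2011\tHonda\tCivic\t92\n"
    "6\t113\t2012\tAcura\tTsx sportwagon\t116\n"
)

def top_per_make(text, limit=3):
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Make ascending, then Total descending (negated numeric key).
    rows = sorted(reader, key=lambda r: (r[3], -int(r[5])))
    kept = []
    for _, group in groupby(rows, key=lambda r: r[3]):
        for row in list(group)[:limit]:
            # Renumber the Ranking column consecutively.
            kept.append([str(len(kept) + 1)] + row[1:])
    return header, kept

header, kept = top_per_make(DATA)
for row in [header] + kept:
    print("\t".join(row))
```

Note that Python compares the Make strings by code point, whereas GNU sort follows the current locale, so the two can order mixed-case or accented names differently.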

If you use GNU awk, and your input file is small enough to fit in memory, you can also do all of this with awk alone, thanks to its arrays of arrays and its asorti function, which sorts an array by its indices:

$ awk -F'\t' -vOFS='\t' 'NR==1 {print; next} {l[$4][$6][$0]}
  END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for(m in l) {
      n = asorti(l[m], t, "@ind_num_desc"); n = (n>3) ? 3 : n
      for(i=1; i<=n; i++) for(s in l[m][t[i]]) {$0 = s; $1 = ++r; print}
    }
  }' data.tsv
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92
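
The group-in-memory approach can also be sketched in Python with a dictionary of lists instead of nested awk arrays. This is a rough analogue, not a translation: the `records` tuples are a simplified, partly invented stand-in for the real rows (the Civic entry is made up), and `heapq.nlargest` is swapped in for asorti.

```python
from collections import defaultdict
from heapq import nlargest

# Hypothetical (make, model, total) records; the Civic row is invented.
records = [
    ("Honda", "Accord", 112),
    ("Acura", "TL", 110),
    ("Honda", "Fit", 106),
    ("Honda", "Fit", 102),
    ("Honda", "Civic", 92),
    ("Acura", "TLX", 102),
]

# Group the rows by Make.
by_make = defaultdict(list)
for rec in records:
    by_make[rec[0]].append(rec)

ranked = []
for make in sorted(by_make):  # Make in ascending order
    # Keep the 3 rows with the largest Total, largest first.
    for rec in nlargest(3, by_make[make], key=lambda r: r[2]):
        ranked.append((len(ranked) + 1,) + rec)

for row in ranked:
    print(*row, sep="\t")
```

One difference worth knowing: the gawk script keeps the 3 largest distinct Total values per Make, so rows that tie on Total can push a group past 3 lines, while this sketch always keeps at most 3 rows.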

Sorting rows into non-alphabetical customized order

We can do this in base R:

i1 <- with(df, ave(seq_len(nrow(df)), as.integer(gl(nrow(df), 3, nrow(df))),
                   FUN = function(i) i[c(2, 1, 3)]))
out <- df[i1,]
row.names(out) <- NULL
out
#    Name Code
#1    Gas    2
#2    Tax    1
#3    Gas    2
#4    Gas    2
#5    Tax    1
#6    Gas    2
#7  Lunch    2
#8    Tax    1
#9  Lunch    2
#10   Car    2
#11   Tax    1
#12   Car    2

Or with the tidyverse:

library(tidyverse)
df %>% # initial dataset
  uncount(Code, .remove = FALSE) %>%
  mutate(rn = row_number()) %>%
  group_by(grp = gl(n(), 3, n())) %>%
  slice(c(2, 1, 3)) %>%
  ungroup() %>%
  select(-rn, -grp)
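
The underlying idea (repeat each row Code times, then reorder every block of three rows as 2nd, 1st, 3rd) can be sketched in plain Python. The input `rows` is an assumption reconstructed from the output shown above, since the original data frame is not included here.

```python
# Rows reconstructed from the output shown above; treat them as an assumption.
rows = [("Tax", 1), ("Gas", 2), ("Tax", 1), ("Gas", 2),
        ("Tax", 1), ("Lunch", 2), ("Tax", 1), ("Car", 2)]

# Repeat each row Code times (the role uncount() plays in the tidyverse version).
expanded = [row for row in rows for _ in range(row[1])]

ORDER = (1, 0, 2)  # zero-based form of c(2, 1, 3)
out = []
for start in range(0, len(expanded), 3):
    block = expanded[start:start + 3]
    # Skip positions that do not exist in a short final block.
    out.extend(block[i] for i in ORDER if i < len(block))

for name, code in out:
    print(name, code)
```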

