Sort each row of character strings alphabetically in R
With dplyr, you can try:
df %>%
rowwise() %>%
mutate(names = paste(sort(unlist(strsplit(names, ", ", fixed = TRUE))), collapse = ", "))
  names                       var1  var2
  <chr>                      <dbl> <dbl>
1 John D., Josh C., Karl H. -0.226  19.9
2 Bob S., John D., Tim H.    0.424  24.8
3 Amy A., Art U., Wes T.     1.42   25.0
4 John D., Josh C., Karl H.  5.42   20.4
Sample data:
df <- data.frame(names, var1, var2,
stringsAsFactors = FALSE)
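For comparison, the same split, sort, and rejoin idea can be sketched in plain Python. This is only an illustration, not part of the original answer: the helper name sort_names and the sample rows are made up here, and Python's sorted() compares by code point, so mixed-case names may order differently than R's locale-aware sort().

```python
def sort_names(cell, sep=", "):
    """Split a delimited string, sort the pieces, and join them back."""
    return sep.join(sorted(cell.split(sep)))

# Hypothetical rows; the original sample values are not shown above.
rows = ["Karl H., John D., Josh C.", "Tim H., Bob S., John D."]
sorted_rows = [sort_names(r) for r in rows]
print(sorted_rows)
```

Each row is handled independently, which mirrors what rowwise() does in the dplyr version.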
Python order dataframe alphabetically
Before pandas 0.17 (DataFrame.sort was deprecated in 0.17 and removed in 0.20):
# Sort by ascending student name
df.sort('student')
# Sort by descending student name
df.sort('student', ascending=False)
pandas 0.17+ (as mentioned in the other answers):
# ascending
df.sort_values('student')
# descending
df.sort_values('student', ascending=False)
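A small self-contained sketch of sort_values, assuming pandas 1.1 or later for the key argument. The data is invented; only the student column name comes from the answer above. It shows two things that are easy to miss: uppercase letters sort before lowercase ones by default, and sort_values returns a new DataFrame rather than sorting in place.

```python
import pandas as pd

# Hypothetical data; only the 'student' column name comes from the answer.
df = pd.DataFrame({"student": ["bob", "Alice", "Carol"], "score": [70, 90, 80]})

# Default ordering is case-sensitive: uppercase sorts before lowercase.
by_name = df.sort_values("student")

# Case-insensitive ordering via the key argument (pandas >= 1.1).
by_name_ci = df.sort_values("student", key=lambda s: s.str.lower())

# Descending order.
by_name_desc = df.sort_values("student", ascending=False)

print(by_name_ci)
```

If you want to keep the sorted result, assign it back (df = df.sort_values('student')), since the original frame is left unchanged.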
How can I sort csv data alphabetically then numerically by column?
The following assumes bash (if you don't use bash, replace $'\t' with a quoted literal tab character) and GNU coreutils. It also assumes that you want to sort alphabetically by the Make column first, then numerically in decreasing order by Total, and finally keep at most the first 3 entries of each Make.
Sorting is a job for sort; head and tail can be used to isolate the header line, and awk can be used to keep at most 3 entries of each Make and renumber the first column:
$ head -n1 data.tsv; tail -n+2 data.tsv | sort -t$'\t' -k4,4 -k6,6rn |
awk -F'\t' -vOFS='\t' '$4==p {n+=1} $4!=p {n=1;p=$4} n<=3 {$1=++r; print}'
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92
Note that this is different from your expected output: Make is sorted in alphabetic order (Audi comes after Acura, not Honda) and only the 3 largest Total are kept (112, 106, 102 for Honda, not 112, 102, 92).
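The same pipeline (sort by Make, then by Total descending, keep at most 3 rows per Make, renumber the first column) can be sketched in Python with only the standard library. This is an illustration under assumptions, not the answer's method: the function name top_per_make and the sample rows are hypothetical, and the column names Make and Total are taken from the header shown above.

```python
import csv
import io
from itertools import groupby

def top_per_make(tsv_text, keep=3):
    """Return header plus at most `keep` rows per Make, renumbered from 1."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header, *rows = list(reader)
    make, total = header.index("Make"), header.index("Total")
    # Sort by Make ascending, then by Total descending (numeric).
    rows.sort(key=lambda r: (r[make], -int(r[total])))
    out = [header]
    for _, group in groupby(rows, key=lambda r: r[make]):
        for row in list(group)[:keep]:
            # len(out) counts the header, so it equals the next rank.
            out.append([str(len(out))] + row[1:])
    return out

# Hypothetical sample input, tab-separated like data.tsv.
sample = "\n".join([
    "Ranking\tID\tYear\tMake\tModel\tTotal",
    "1\t10\t2009\tHonda\tFit\t106",
    "2\t11\t2014\tHonda\tAccord\t112",
    "3\t12\t2007\tHonda\tFit\t102",
    "4\t13\t2011\tHonda\tCivic\t92",
    "5\t14\t2012\tAcura\tTL\t110",
])
result = top_per_make(sample)
for row in result:
    print("\t".join(row))
```

Renumbering only the rows that are kept is what keeps the first column consecutive.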
If you use GNU awk and your input file is small enough to fit in memory, you can also do all this with just awk, thanks to its multidimensional arrays and its asorti function, which sorts arrays based on indices:
$ awk -F'\t' -vOFS='\t' 'NR==1 {print; next} {l[$4][$6][$0]}
  END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for(m in l) {
      n = asorti(l[m], t, "@ind_num_desc"); n = (n>3) ? 3 : n
      for(i=1; i<=n; i++) for(s in l[m][t[i]]) {$0 = s; $1 = ++r; print}
    }
  }' data.tsv
Ranking ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 116
2 112 2008 Acura TL 110
3 50 2015 Acura TLX 102
4 15 014 Audi S4 120
5 216 2007 Chrystler 300 96
6 83 2014 Honda Accord 112
7 65 2009 Honda Fit 106
8 31 2007 Honda Fit 102
9 128 2010 Infiniti G37 128
10 124 2015 Jeep Wrangler 124
11 91 2010 Mitsu Lancer 102
12 126 2010 Volkswagen Eos 92
Sorting rows into non-alphabetical customized order
We can do this in base R:
i1 <- with(df, ave(seq_len(nrow(df)), as.integer(gl(nrow(df), 3,
nrow(df))), FUN = function(i) c(i[c(2, 1, 3)])))
out <- df[i1,]
row.names(out) <- NULL
out
# Name Code
#1 Gas 2
#2 Tax 1
#3 Gas 2
#4 Gas 2
#5 Tax 1
#6 Gas 2
#7 Lunch 2
#8 Tax 1
#9 Lunch 2
#10 Car 2
#11 Tax 1
#12 Car 2
Or with tidyverse:
library(tidyverse)
df %>% # initial dataset
  uncount(Code, .remove = FALSE) %>%
  mutate(rn = row_number()) %>%
  group_by(grp = gl(n(), 3, n())) %>%
  slice(c(2, 1, 3)) %>%
  ungroup() %>%
  select(-rn, -grp)
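The core of both versions is "take the rows in blocks of 3 and emit each block in the order 2, 1, 3". A minimal Python sketch of just that step follows. The function name reorder_blocks and the input rows are assumptions made for illustration (the rows are chosen to be consistent with the output shown above), and the order is written 0-based, so (1, 0, 2) corresponds to R's c(2, 1, 3).

```python
def reorder_blocks(rows, order=(1, 0, 2)):
    """Reorder rows within consecutive blocks of len(order) items."""
    size = len(order)
    out = []
    for start in range(0, len(rows), size):
        block = rows[start:start + size]
        # Skip positions that fall outside a short final block.
        out.extend(block[i] for i in order if i < len(block))
    return out

# Hypothetical (Name, Code) rows, already expanded to one row per unit.
rows = [("Tax", 1), ("Gas", 2), ("Gas", 2),
        ("Tax", 1), ("Lunch", 2), ("Lunch", 2)]
reordered = reorder_blocks(rows)
print(reordered)
```

A trailing block with fewer than 3 rows keeps whichever of the requested positions exist.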