Row-wise Sorting in R
If we only need to sort by rows, use apply with MARGIN=1, transpose the output, and assign it back to the original columns.
df1[-1] <- t(apply(df1[-1], 1,
FUN=function(x) sort(x, decreasing=TRUE)))
df1
# Name English Math French
# 1 John 86 78 56
# 2 Sam 97 86 79
# 3 Viru 93 44 34
NOTE: After sorting each row, the values no longer correspond to the original column headers, so we may need to change the column names.
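A minimal self-contained sketch of the step above. The input df1 is not shown in the original, so the values below are reconstructed from the output tables in this section, and the generic names Top1 to Top3 are only an illustrative choice.

```r
# Input reconstructed from the outputs shown in this section (assumption).
df1 <- data.frame(
  Name    = c("John", "Sam", "Viru"),
  English = c(56, 79, 93),
  Math    = c(78, 97, 44),
  French  = c(86, 86, 34)
)

# Sort every row of the numeric columns, largest value first.
# apply() returns one sorted row per column, hence the t().
sorted_df <- df1
sorted_df[-1] <- t(apply(df1[-1], 1, sort, decreasing = TRUE))

# The subject names no longer describe the columns, so use neutral ones
# ("Top1", "Top2", ... are hypothetical names chosen for this sketch).
names(sorted_df)[-1] <- paste0("Top", seq_len(ncol(sorted_df) - 1))
sorted_df
```

Working on a copy (sorted_df) keeps df1 intact for the approaches that follow, which need the original column-to-value correspondence.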
Another option is to use apply separately to get the column names and the values. With Map we pair up the corresponding columns, and cbind with the first column gives the output.
nMat <- `dim<-`(names(df1)[-1][t(apply(df1[-1], 1,
order, decreasing=TRUE))], dim(df1[-1]))
vMat <- t(apply(df1[-1], 1, sort, decreasing=TRUE))
cbind(df1[1], data.frame(Map(cbind, as.data.frame(nMat,
stringsAsFactors=FALSE), as.data.frame(vMat))))
# Name V1.1 V1.2 V2.1 V2.2 V3.1 V3.2
#1 John French 86 Math 78 English 56
#2 Sam Math 97 French 86 English 79
#3 Viru English 93 Math 44 French 34
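The name lookup relies on order rather than sort. A small sketch for a single row, assuming a named vector holding John's marks as they appear in the output above:

```r
# One row of marks as a named vector (values taken from the output above).
x <- c(English = 56, Math = 78, French = 86)

# order() returns the positions of the values, largest first.
idx <- order(x, decreasing = TRUE)

top_names  <- names(x)[idx]    # column names in sorted order
top_values <- unname(x[idx])   # the values themselves, same order
top_names
top_values
```

Applying order to each row and indexing names(df1)[-1] with the result is what produces nMat, while sort produces the matching values in vMat.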
Another option is data.table. We melt the 'wide' format to 'long' format, order the 'value' in decreasing order in 'i' and get the Subset of Data.table (.SD) grouped by 'Name', create a new column ('N') grouped by 'Name', and use dcast to convert from 'long' to 'wide'.
library(data.table)
dcast(melt(setDT(df1), id.var='Name')[order(-value),
.SD, Name][, N:=paste0("Col", 1:.N) , .(Name)],
Name~N, value.var=c("variable", "value"))
# Name variable_Col1 variable_Col2 variable_Col3 value_Col1 value_Col2 value_Col3
#1: John French Math English 86 78 56
#2: Sam Math French English 97 86 79
#3: Viru English Math French 93 44 34
EDIT: The above data.table solution will not work if you have 10 or more value columns, because Col10 will then precede Col2 in the column ordering, even though Col2 holds the higher values. To resolve this, use plain numbers as the names of the new columns:
dcast(melt(setDT(df1), id.var='Name')[order(-value),
.SD, Name][, N:=1:.N , .(Name)],
Name~N, value.var=c("variable", "value"))
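The ordering problem can be reproduced without data.table, since it comes from sorting the labels as text. A short sketch; the zero-padded variant at the end is an alternative not mentioned in the original answer:

```r
# Character labels are sorted as text, so "Col10" lands before "Col2".
labels <- paste0("Col", 1:10)
text_order <- sort(labels)
text_order[1:3]

# Integer labels sort numerically, which is what the fix above relies on.
num_order <- sort(10:1)

# Alternative (assumption, not from the original): zero-pad the numbers
# so that text order and numeric order agree.
padded <- sprintf("Col%02d", 1:10)
identical(sort(padded), padded)
```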
R row-wise sort on specific columns
Instead of assigning to df, only assign to the columns you want to sort.
df[1:2] <- t(apply(df[1:2], 1,
FUN=function(x) sort(x, decreasing=FALSE)))
Or written more simply:
to_sort <- 1:2
df[to_sort] <- t(apply(df[to_sort], 1, sort, decreasing = FALSE))
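A runnable sketch of the same idea on a small hypothetical data frame (the column names a, b and id are invented for illustration). Restricting apply to the numeric columns also avoids coercing everything to character when other columns hold text.

```r
# Hypothetical data: two numeric columns to sort, one text column to keep.
df <- data.frame(a = c(3, 1), b = c(2, 5), id = c("x", "y"))

to_sort <- 1:2   # positions of the columns that take part in the sort

# Sort each row across the selected columns only; 'id' is untouched.
df[to_sort] <- t(apply(df[to_sort], 1, sort, decreasing = FALSE))
df
```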
Sorting dates row-wise
The apply approach also works with dates. They get coerced to a character matrix (yyyy-mm-dd strings still sort chronologically), but we can convert the result with as.data.frame and then lapply as.Date over its columns.
my_data[-1] <- as.data.frame(t(apply(my_data[-1], 1, sort))) |> lapply(as.Date)
Gives
my_data
# id d1 d2 d3 d4 d5
# 1 1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
# 2 2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
# 3 3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
# 4 4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
# 5 5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25
Where
str(my_data)
# 'data.frame': 5 obs. of 6 variables:
# $ id: int 1 2 3 4 5
# $ d1: Date, format: "1999-03-14" "2000-09-10" "2001-04-05" "1999-03-15" ...
# $ d2: Date, format: "2009-08-31" "2001-02-22" "2007-12-26" "2007-01-18" ...
# $ d3: Date, format: "2010-01-19" "2007-01-29" "2008-09-12" "2009-12-19" ...
# $ d4: Date, format: "2013-01-01" "2010-04-10" "2012-07-15" "2014-02-08" ...
# $ d5: Date, format: "2015-11-25" "2019-09-11" "2015-10-14" "2016-07-19" ...
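The original input is not shown, so here is a smaller sketch with hypothetical dates that makes the intermediate character coercion visible. It avoids the native pipe, so it should also run on R versions older than 4.1.

```r
# Hypothetical data with three Date columns (values invented for illustration).
my_data <- data.frame(
  id = 1:2,
  d1 = as.Date(c("2010-01-19", "2019-09-11")),
  d2 = as.Date(c("1999-03-14", "2000-09-10")),
  d3 = as.Date(c("2013-01-01", "2001-02-22"))
)

# apply() converts the Date columns to a character matrix before sorting.
sorted_chr <- t(apply(my_data[-1], 1, sort))
is.character(sorted_chr)

# Convert each column back to Date and put the result into the data frame.
my_data[-1] <- lapply(as.data.frame(sorted_chr), as.Date)
str(my_data)
```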
Sorting each row of a data frame
You could use the plain apply function with MARGIN = 1 to apply over rows and then transpose the result.
t(apply(df, 1, sort))
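The transpose is needed because apply stores each sorted row as a column of its result. A small sketch on a hypothetical numeric data frame:

```r
# Hypothetical data frame with two rows and three numeric columns.
df <- data.frame(x = c(9, 4), y = c(1, 8), z = c(5, 2))

raw <- apply(df, 1, sort)   # each sorted row becomes a COLUMN of 'raw'
dim(raw)                    # 3 rows, 2 columns

sorted <- t(raw)            # back to one sorted row per original row
sorted
```

Note that the result is a matrix, not a data frame; wrap it in as.data.frame if a data frame is needed.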
Fastest way to sort and desort rows of a matrix [r]
Row-wise sorting seems to be straightforward. To get the original order back (un-sort) we need the row-wise ranks rather than their order. Thereafter, what works for column sorting in @Josh O'Brien's answer we can adapt for rows.
Base R solution:
rr <- t(apply(m, 1, rank)) ## get initial RANKS by row
sm <- t(apply(m, 1, sort)) ## sort m
## DOING STUFF HERE ##
sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))] ## un-sort
all(m == sm) ## check
# [1] TRUE
Seems to work.
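One caveat, added here as a hedged note: with its default ties.method = "average", rank returns fractional ranks such as 2.5 for tied values, and those are not valid positions for the un-sort step. The runif data used here has no ties, but for data that might, a sketch using ties.method = "first" (a small hypothetical matrix with ties):

```r
# Hypothetical matrix with ties inside each row.
m <- matrix(c(2, 1, 2,
              3, 3, 1), nrow = 2, byrow = TRUE)

# ties.method = "first" gives every element a distinct integer rank,
# so each row of 'rr' is a valid permutation of 1..ncol(m).
rr <- t(apply(m, 1, rank, ties.method = "first"))
sm <- t(apply(m, 1, sort))   # row-wise sorted copy

# Un-sort: element (i, j) of the original sits at position rr[i, j]
# in sorted row i.
restored <- sm
restored[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]
all(restored == m)
```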
In your linked answer, the rowSort function of the Rfast package stands out in terms of performance, which may cover the sorting issue. There is also a rowRanks function that covers our ranking issue, so we can avoid apply.
Let's try it out.
m[1:3, ]
# [,1] [,2] [,3] [,4]
# [1,] 0.9148060 0.5142118 0.3334272 0.719355838
# [2,] 0.9370754 0.3902035 0.3467482 0.007884739
# [3,] 0.2861395 0.9057381 0.3984854 0.375489965
library(Rfast)
rr <- rowRanks(m) ## get initial RANKS by row
sm <- rowSort(m) ## sort m
sm[1:3, ] # check
# [,1] [,2] [,3] [,4]
# [1,] 0.3334272 0.5142118 0.7193558 0.9148060
# [2,] 0.007884739 0.3467482 0.3902035 0.9370754
# [3,] 0.2861395 0.3754900 0.3984854 0.9057381
## DOING STUFF HERE ##
sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))] ## un-sort
all(sm == m) ## check
# [1] TRUE
Ditto.
Benchmark
m.test <- matrix(runif(4e6), ncol = 4)
dim(m.test)
# [1] 1000000 4
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# Rfast 897.6286 910.91 956.6259 924.1914 986.1246 1048.058 3 a
# baseR 87931.2824 88004.73 95659.8671 88078.1737 99524.1594 110970.145 3 c
# forloop 58927.7784 59434.54 60317.3903 59941.2930 61012.1963 62083.100 3 b
Not so bad!! By median time, Rfast is roughly 95 times faster than the base R version here.
Data/Code:
set.seed(42)
m <- matrix(runif(100), nrow = 25, ncol = 4)
## benchmark
m.test <- matrix(runif(4e6), ncol = 4)
microbenchmark::microbenchmark(
  Rfast = {
    rr <- rowRanks(m.test)
    sm <- rowSort(m.test)
    sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]
  },
  baseR = {
    rr <- t(apply(m.test, 1, rank))
    sm <- t(apply(m.test, 1, sort))
    sm[] <- sm[cbind(as.vector(row(rr)), as.vector(rr))]
  },
  forloop = {
    om <- t(apply(m.test, 1, order, decreasing = TRUE))
    sm <- m.test
    ## sort each row
    for (i in seq_len(nrow(m.test))) {
      sm[i, ] <- sm[i, om[i, ]]
    }
    ## un-sort each row
    for (i in seq_len(nrow(m.test))) {
      sm[i, ] <- sm[i, order(om[i, ])]
    }
  },
  times = 3L
)
How to sort each row of a data frame WITHOUT losing the column names
Store the names and apply them:
nm = names(df)
sorted_df <- as.data.frame(t(apply(df, 1, sort)))
names(sorted_df) = nm
You could compress this down to a single line if you prefer:
sorted_df = setNames(as.data.frame(t(apply(df, 1, sort))), names(df))
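A runnable sketch of the one-liner on a hypothetical data frame. Keep in mind that after sorting, a column name only marks a position (smallest, next, ...) and no longer says which original column a value came from.

```r
# Hypothetical data frame; the rows are ordered differently from each other.
df <- data.frame(a = c(3, 1), b = c(1, 5), c = c(2, 4))

# Without setNames the result carries whatever names as.data.frame assigns.
plain <- as.data.frame(t(apply(df, 1, sort)))
names(plain)

# Re-apply the original names explicitly.
sorted_df <- setNames(as.data.frame(t(apply(df, 1, sort))), names(df))
sorted_df
```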