Sorting Each Row of a Data Frame

Fastest way to sort each row in a pandas dataframe

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # no ascending argument

In [13]: a = a[:, ::-1]  # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2
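
One caveat with the session above: `a.sort(axis=1)` sorts in place, and `df.values` can return a view of the frame's data, so `df` itself may end up modified. A minimal sketch that avoids this by using `np.sort`, which returns a copy, is shown below. The sample frame is an assumption, reconstructed from the outputs printed in this answer.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the outputs shown in this answer (an assumption).
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# np.sort returns a sorted copy, so df is left untouched.
# Reversing the columns turns the ascending result into a descending one.
desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]

sorted_df = pd.DataFrame(desc, index=df.index, columns=df.columns)
print(sorted_df)
```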

I had thought this might work, but it reorders the columns by label instead of sorting the values within each row:

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

And passing the column labels as sort keys makes pandas raise:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)
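
`DataFrame.sort` has since been removed from pandas, so the two calls above will not run on a current install. As a rough sketch of the closest modern equivalents (the sample frame is again an assumption reconstructed from the outputs above): `sort_index(axis=1)` reorders columns by label, and `sort_values(by=<row label>, axis=1)` reorders whole columns by the values in one row. Neither sorts every row independently, which is why the numpy route is still needed for that.

```python
import pandas as pd

# Sample frame reconstructed from the outputs shown in this answer (an assumption).
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Reorder columns by label, descending: D, C, B, A.
by_label = df.sort_index(axis=1, ascending=False)

# Reorder whole columns by the values found in the row labelled 0, descending.
# Every row keeps its values attached to the same column labels.
by_row0 = df.sort_values(by=0, axis=1, ascending=False)

print(by_label)
print(by_row0)
```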

How to sort a pandas dataframe by one column

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

To sort by two columns, pass a list of column labels to sort_values, ordered by sort priority. For example, df.sort_values(['2', '0']) sorts by column '2' first and uses column '0' only to break ties. That makes no visible difference in this example, because every value in df['2'] is unique.
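
To see the tie-breaking at work, here is a small sketch on made-up data (the column names `group` and `value` are hypothetical, chosen so that the first key contains ties):

```python
import pandas as pd

# Made-up data with ties in "group", so the second key decides the order.
df = pd.DataFrame({"group": [2, 1, 2, 1], "value": [0.5, 0.9, 0.1, 0.3]})

# Sort by "group" first, then by "value" within each group.
out = df.sort_values(["group", "value"])
print(out)
```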

How to sort each row of a data frame WITHOUT losing the column names

Store the names and apply them:

nm <- names(df)
sorted_df <- as.data.frame(t(apply(df, 1, sort)))
names(sorted_df) <- nm

You could compress this down to a single line if you prefer:

sorted_df <- setNames(as.data.frame(t(apply(df, 1, sort))), names(df))

Python - Sorting the values of every row in a table and get a new Pandas dataframe with original column index/labels in sorted sequence in each row

You can use .apply() on each row to sort values in descending order and get the index (i.e. column labels) of sorted sequence:

df2 = (df.set_index('Date')[['Company1', 'Company2', 'Company3']]
         .replace(r',', r'.', regex=True)
         .astype(float)
         .apply(lambda x: x.sort_values(ascending=False).index.tolist(), axis=1, result_type='expand')
         .pipe(lambda x: x.set_axis(x.columns+1, axis=1))
         .reset_index()
      )

Result:

print(df2)


         Date         1         2         3
0  01.01.2020  Company1  Company3  Company2
1  02.01.2020  Company2  Company3  Company1
2  24.10.2020  Company3  Company1  Company2
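
For larger frames, a vectorized alternative to the row-wise `.apply()` might look like the sketch below, which uses `numpy.argsort` on the negated values to get column positions in descending order. This is not the method from the answer above, and the numeric values are hypothetical, chosen only so that the ranking matches the result table shown.

```python
import numpy as np
import pandas as pd

# Hypothetical values, assumed to be already converted to float.
df = pd.DataFrame(
    {
        "Company1": [3.2, 1.0, 2.5],
        "Company2": [1.1, 4.0, 0.7],
        "Company3": [2.0, 2.2, 9.9],
    },
    index=pd.Index(["01.01.2020", "02.01.2020", "24.10.2020"], name="Date"),
)

# Negating the values makes argsort return positions in descending order.
order = np.argsort(-df.to_numpy(), axis=1, kind="stable")

# Map the positions back to column labels, one row at a time.
ranked = pd.DataFrame(
    df.columns.to_numpy()[order],
    index=df.index,
    columns=range(1, df.shape[1] + 1),
).reset_index()
print(ranked)
```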

Pandas Dataframe - sort a list of values in each row of a column

If you want to sort the lists in the type column and drop rows that duplicate another row on the other columns, sort each list with numpy.sort() and then call .drop_duplicates() with those columns as the subset.

numpy.sort() runs in compiled code, so it is usually faster than an equivalent pure-Python loop on large arrays; on short lists like these the difference is likely small.

import numpy as np

# If your "type" column holds strings rather than lists, run one of the following lines first
# (which one depends on the layout of the strings).
# Use this for a layout such as "['GFTBE', 'AYPIC', 'MNXYZ', 'BYPAC', 'KLUYT', 'PQRRC']":
df['type'] = df['type'].str.strip("[]").str.replace("'", "").str.split(', ')
# General case, needs "import ast"; parses a list-like string without executing arbitrary code as eval would:
#df['type'] = df['type'].map(ast.literal_eval)
# For strings with no extra spaces and no single quotes:
#df['type'] = df['type'].str.strip('[]').str.split(',')

# Sort each list, then convert the sorted numpy array back to a Python list
# to avoid odd formatting (e.g. missing commas) when writing to CSV.
df['type'] = df['type'].map(np.sort).map(list)
df = df.drop_duplicates(subset=['dt', 'name', 'City'])

Result:

print(df)

           dt name                                        type City
0  05-10-2021   MK  [AYPIC, BYPAC, GFTBE, KLUYT, MNXYZ, PQRRC]  NYC
2  05-12-2021   MK  [AYPIC, BYPAC, GFTBE, KLUYT, MNXYZ, PQRRC]  NYC
4  05-13-2021   PS  [CPQLE, LTRDX, ORSHC, QRTSL, VXWUT, XYDFE]  BAL
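
For reference, a self-contained variant of the same idea is sketched below. It swaps in the built-in `sorted()` for `numpy.sort()` and uses `ast.literal_eval` to parse the strings; both are substitutions rather than the approach above, and the two sample rows are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical rows; "type" holds list-like strings, as they would look after reading a CSV.
df = pd.DataFrame(
    {
        "dt": ["05-10-2021", "05-10-2021"],
        "name": ["MK", "MK"],
        "type": ["['GFTBE', 'AYPIC', 'MNXYZ']", "['MNXYZ', 'GFTBE', 'AYPIC']"],
        "City": ["NYC", "NYC"],
    }
)

# literal_eval only parses Python literals; sorted() returns a plain list.
df["type"] = df["type"].map(ast.literal_eval).map(sorted)

# Keep the first row for each (dt, name, City) combination.
df = df.drop_duplicates(subset=["dt", "name", "City"])
print(df)
```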

how to sort each row of a dataframe?

Use array_sort (added in Spark 2.4), which sorts the array in each row in ascending order:

from pyspark.sql import functions as F

df.withColumn('col', F.array_sort('col')).show(10, False)

# Output
# +-------------------------------------------------------+
# |col                                                    |
# +-------------------------------------------------------+
# |[Computer Programming, R Programming Language]         |
# |[R Programming Language, Working Under Pressure]       |
# |[Entity Relationship Models, Master Data Management]   |
# |[Master Data Management, Statistical Analysis Software]|
# +-------------------------------------------------------+

Sorting the values inside row in a data frame, by the order of its factor levels?

The problem in your code is this line:

arcana_table <- as.data.frame(matrix(shuffled_arcana, nrow = 5, ncol = 5))

shuffled_arcana is a factor, but R has no factor matrix: matrix() drops the factor attributes and stores the values as character, so sorting no longer follows the level order you defined.

Here's a way -

set.seed(2022)

arcanaVector <- c(rep("Supreme", 3),
                  rep(c("Good", "Moderate", "Poor", "Awful"), each = 5),
                  rep("Worst", 2))
arcanaLevels <- c("Supreme", "Good", "Moderate", "Poor", "Awful", "Worst")
shuffled_arcana <- sample(arcanaVector)
arcana_table <- matrix(shuffled_arcana, nrow = 5, ncol = 5)
row.names(arcana_table) <- c("Presence", "Manner", "Expression", "Complexity", "Tradition")

arcana_table <- apply(arcana_table, 1, function(x) sort(factor(x, arcanaLevels))) |>
  t() |>
  as.data.frame()

arcana_table

#                 V1       V2       V3       V4    V5
#Presence       Good     Good     Good     Good Awful
#Manner      Supreme Moderate     Poor    Awful Awful
#Expression  Supreme  Supreme Moderate Moderate  Poor
#Complexity Moderate Moderate     Poor     Poor Worst
#Tradition      Good     Poor    Awful    Awful Worst

If you want to change a specific row you may use -

arcana_table[1, ] <- as.character(sort(factor(arcana_table[1, ], arcanaLevels)))
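
If you ever need the same thing in pandas rather than R, a rough analogue of sorting by factor levels is an ordered `pd.Categorical`. The sketch below reuses the level order from this answer; the sample row is made up.

```python
import pandas as pd

levels = ["Supreme", "Good", "Moderate", "Poor", "Awful", "Worst"]

# Made-up row of values to sort.
row = ["Poor", "Worst", "Supreme", "Good", "Awful"]

# An ordered Categorical sorts by position in `levels`, not alphabetically.
cat = pd.Categorical(row, categories=levels, ordered=True)
sorted_row = cat.sort_values().tolist()
print(sorted_row)
```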

How to sort dataframe rows by multiple columns

Use sort_values, which can accept a list of sorting targets. In this case it sounds like you want to sort by S/N, then Dis, then Rate:

df = df.sort_values(['S/N', 'Dis', 'Rate'])

#     S/N      Dis       Rate
# 0   332   4.6030  91.204062
# 3   332   9.1985  76.212943
# 6   332  14.4405  77.664282
# 9   332  20.2005  76.725955
# 12  332  25.4780  31.597510
# 15  332  30.6670  74.096975
# 1   445   5.4280  60.233917
# 4   445   9.7345  31.902842
# 7   445  14.6015  36.261851
# 10  445  19.8630  40.705467
# 13  445  24.9050   4.897008
# 16  445  30.0550  35.217889
# ...
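
If the columns should not all be sorted in the same direction, `ascending` also accepts a list with one flag per sort key. A small sketch, using four rows copied from the output above and a sort order (S/N ascending, Dis descending) chosen purely for illustration:

```python
import pandas as pd

# Four rows copied from the output above, keeping their original index labels.
df = pd.DataFrame(
    {
        "S/N": [332, 445, 332, 445],
        "Dis": [4.6030, 5.4280, 9.1985, 9.7345],
        "Rate": [91.204062, 60.233917, 76.212943, 31.902842],
    },
    index=[0, 1, 3, 4],
)

# One ascending flag per key: S/N ascending, Dis descending within each S/N.
out = df.sort_values(["S/N", "Dis"], ascending=[True, False])
print(out)
```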