Sorting Each Row of a Data Frame

Fastest way to sort each row in a pandas dataframe

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1) # sorts in place, ascending only (no ascending argument)

In [13]: a = a[:, ::-1] # so reverse each row to get descending order

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2
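
The in-place `a.sort` above may also change `df` itself when `df.values` hands back a view of the underlying data. A minimal sketch that sidesteps this uses `np.sort`, which returns a sorted copy. The input frame here is an assumption, reconstructed from the outputs shown in this answer:

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# np.sort returns a sorted copy (ascending), so df itself is left untouched.
# Reversing the column order of the result gives descending order per row.
sorted_desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]

# Rebuild a frame with the original index and column labels.
result = pd.DataFrame(sorted_desc, index=df.index, columns=df.columns)
print(result)
```

Note that the column labels no longer describe where a value came from; they only name the positions of the sorted row.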

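I had thought this might work, but it sorts the column labels rather than the values in each row (DataFrame.sort was removed in pandas 0.20; df.sort_index(axis=1, ascending=False) is the current equivalent):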

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

Ah, and passing the column labels as sort keys makes pandas raise:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)

How to sort a pandas dataframe by one column

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values, ordered by sort priority. For example, df.sort_values(['2', '0']) sorts by column '2' first and then by column '0' within ties. Granted, this does not really make sense for this example because each value in df['2'] is unique.
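
To make the two-column case concrete, here is a small sketch with made-up data (the frame and its values are hypothetical, not the monthly data shown above) in which the first sort key has ties, so the second key actually matters:

```python
import pandas as pd

# Hypothetical data: column '2' contains ties, column '0' breaks them.
df = pd.DataFrame({
    "0": [30.0, 10.0, 20.0, 5.0],
    "1": ["b", "a", "c", "d"],
    "2": [2.0, 1.0, 1.0, 2.0],
})

# Sort by '2' first, then by '0' within equal values of '2'.
out = df.sort_values(["2", "0"])
print(out)
```

The `ascending` argument also accepts a list, for example `ascending=[True, False]`, to set the direction for each key separately.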

How to sort each row of a data frame WITHOUT losing the column names

Store the names and apply them:

nm <- names(df)
sorted_df <- as.data.frame(t(apply(df, 1, sort)))
names(sorted_df) <- nm

You could compress this down to a single line if you prefer:

sorted_df <- setNames(as.data.frame(t(apply(df, 1, sort))), names(df))

Python - Sorting the values of every row in a table and get a new Pandas dataframe with original column index/labels in sorted sequence in each row

You can use .apply() on each row to sort values in descending order and get the index (i.e. column labels) of sorted sequence:

df2 = (df.set_index('Date')[['Company1', 'Company2', 'Company3']]
         .replace(r',', r'.', regex=True)
         .astype(float)
         .apply(lambda x: x.sort_values(ascending=False).index.tolist(), axis=1, result_type='expand')
         .pipe(lambda x: x.set_axis(x.columns+1, axis=1))
         .reset_index()
      )

Result:

print(df2)

         Date         1         2         3
0  01.01.2020  Company1  Company3  Company2
1  02.01.2020  Company2  Company3  Company1
2  24.10.2020  Company3  Company1  Company2
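
For a self-contained run of the same idea, the sketch below uses invented input and splits the chain into separate steps. The prices are hypothetical; only the dates and company labels are taken from the result above:

```python
import pandas as pd

# Hypothetical input: numbers stored as strings with a decimal comma.
df = pd.DataFrame({
    "Date": ["01.01.2020", "02.01.2020", "24.10.2020"],
    "Company1": ["9,5", "1,0", "5,0"],
    "Company2": ["2,5", "8,0", "3,5"],
    "Company3": ["4,0", "6,5", "7,0"],
})

# Convert the decimal commas to points and parse the strings as floats.
values = (df.set_index("Date")
            .replace(",", ".", regex=True)
            .astype(float))

# For each row, list the column labels from largest to smallest value.
ranked = values.apply(
    lambda row: row.sort_values(ascending=False).index.tolist(),
    axis=1, result_type="expand",
)
ranked.columns = ranked.columns + 1  # label the rank positions 1, 2, 3
df2 = ranked.reset_index()
print(df2)
```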

Pandas Dataframe - sort a list of values in each row of a column

If you want to sort the lists in column type and then remove rows that are duplicates with respect to the other columns, sort each list with numpy.sort() and call .drop_duplicates() with those columns as the subset.

numpy.sort() runs in compiled code and is typically fast on NumPy arrays, though for short Python lists the gain over the built-in sorted() is likely to be small:

import numpy as np

# If column "type" holds strings rather than lists, run one of the following lines first (depending on the string layout):
# use this for strings such as "['GFTBE', 'AYPIC', 'MNXYZ', 'BYPAC', 'KLUYT', 'PQRRC']"
df['type'] = df['type'].str.strip("[]").str.replace("'", "").str.split(', ')
#df['type'] = df['type'].map(ast.literal_eval) # general conversion of a list-like string to a real list (needs "import ast"; safer than eval)
#df['type'] = df['type'].str.strip('[]').str.split(',') # for strings without extra spaces or single quotes


df['type'] = df['type'].map(np.sort).map(list) # convert the sorted numpy array to Python list to avoid incorrect formatting (e.g. missing comma) in writing to CSV
df = df.drop_duplicates(subset=['dt', 'name', 'City'])

Result:

print(df)

           dt  name                                        type  City
0  05-10-2021    MK  [AYPIC, BYPAC, GFTBE, KLUYT, MNXYZ, PQRRC]   NYC
2  05-12-2021    MK  [AYPIC, BYPAC, GFTBE, KLUYT, MNXYZ, PQRRC]   NYC
4  05-13-2021    PS  [CPQLE, LTRDX, ORSHC, QRTSL, VXWUT, XYDFE]   BAL
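
A compact end-to-end sketch of the two steps, assuming the type column already holds real Python lists. The rows below are invented for illustration and are not the data behind the result above:

```python
import numpy as np
import pandas as pd

# Hypothetical input: 'type' holds real Python lists, not strings.
df = pd.DataFrame({
    "dt": ["05-10-2021", "05-10-2021", "05-12-2021"],
    "name": ["MK", "MK", "MK"],
    "type": [["GFTBE", "AYPIC"], ["AYPIC", "GFTBE"], ["MNXYZ", "BYPAC"]],
    "City": ["NYC", "NYC", "NYC"],
})

# Sort each list; np.sort returns an array, so convert it back to a list.
df["type"] = df["type"].map(np.sort).map(list)

# Keep the first row for each (dt, name, City) combination.
df = df.drop_duplicates(subset=["dt", "name", "City"])
print(df)
```

Plain `sorted` would also work here, as in `df['type'].map(sorted)`, and returns ordinary Python strings without the intermediate array.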

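How to sort each row of a dataframe?

In PySpark, use array_sort (available since Spark 2.4), which sorts the elements of an array column in ascending order: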

from pyspark.sql import functions as F

df.withColumn('col', F.array_sort('col')).show(10, False)

# Output
# +-------------------------------------------------------+
# |col |
# +-------------------------------------------------------+
# |[Computer Programming, R Programming Language] |
# |[R Programming Language, Working Under Pressure] |
# |[Entity Relationship Models, Master Data Management] |
# |[Master Data Management, Statistical Analysis Software]|
# +-------------------------------------------------------+

Sorting the values inside row in a data frame, by the order of its factor levels?

In your code, the issue is happening at this line.

arcana_table <- as.data.frame(matrix(shuffled_arcana, nrow = 5, ncol = 5))

shuffled_arcana is a factor, but R has no factor matrix: matrix() drops the factor class and stores the values as character strings, so the rows are later sorted alphabetically rather than by level order.

Here's a way -

set.seed(2022)

arcanaVector <- c(rep("Supreme", 3),
                  rep(c("Good", "Moderate", "Poor", "Awful"), each = 5),
                  rep("Worst", 2))
arcanaLevels <- c("Supreme", "Good", "Moderate", "Poor", "Awful", "Worst")
shuffled_arcana <- sample(arcanaVector)
arcana_table <- matrix(shuffled_arcana, nrow = 5, ncol = 5)
row.names(arcana_table) <- c("Presence", "Manner", "Expression", "Complexity", "Tradition")

arcana_table <- apply(arcana_table, 1, function(x) sort(factor(x, arcanaLevels))) |>
  t() |>
  as.data.frame()

arcana_table

#                 V1       V2       V3       V4    V5
#Presence       Good     Good     Good     Good Awful
#Manner      Supreme Moderate     Poor    Awful Awful
#Expression  Supreme  Supreme Moderate Moderate  Poor
#Complexity Moderate Moderate     Poor     Poor Worst
#Tradition      Good     Poor    Awful    Awful Worst

If you want to change a specific row you may use -

arcana_table[1, ] <- as.character(sort(factor(arcana_table[1, ], arcanaLevels))) 

How to sort dataframe rows by multiple columns

Use sort_values, which can accept a list of sorting targets. In this case it sounds like you want to sort by S/N, then Dis, then Rate:

df = df.sort_values(['S/N', 'Dis', 'Rate'])

#     S/N      Dis       Rate
# 0   332   4.6030  91.204062
# 3   332   9.1985  76.212943
# 6   332  14.4405  77.664282
# 9   332  20.2005  76.725955
# 12  332  25.4780  31.597510
# 15  332  30.6670  74.096975
# 1   445   5.4280  60.233917
# 4   445   9.7345  31.902842
# 7   445  14.6015  36.261851
# 10  445  19.8630  40.705467
# 13  445  24.9050   4.897008
# 16  445  30.0550  35.217889
# ...

