How to Sort Pandas Dataframe from One Column

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
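
As a minimal sketch of the two-column case, the frame below reuses the column labels '0', '1' and '2' from the example above, but the values in column '2' are made up so that it contains ties (an assumption for illustration only). The second label then decides the order within each tie:

```python
import pandas as pd

# Hypothetical data: column '2' contains ties so the second sort key matters.
df = pd.DataFrame({
    '0': [354.7, 55.4, 176.5, 95.5],
    '1': ['April', 'August', 'December', 'February'],
    '2': [2.0, 1.0, 2.0, 1.0],
})

# Sort by column '2' first, then by column '0' within equal values of '2'.
result = df.sort_values(['2', '0'])
print(result)
```

Rows with the same value in '2' end up ordered by '0', which is the priority the list of labels expresses.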

How to sort a pandas DataFrame on one column given an already ordered list of the values in that column?

Approach 1

Convert the fruit column to an ordered categorical type, then sort by it:

df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')

Approach 2

Sort the values by passing a key function that maps each fruit name to its position in the ordered list:

df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))


   id      fruit  trash
2   3  pineapple     93
1   2     banana     22
3   4     orange      1
4   5     orange     15
0   1      apple     38
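
The snippets above do not show how df and ordered_list were built. The sketch below fills that gap with inputs reconstructed from the printed output, so both the frame and the list are assumptions rather than the asker's actual data. It runs both approaches side by side and passes kind='stable' so that the two "orange" rows keep their original relative order:

```python
import pandas as pd

# Assumed inputs, reconstructed from the output shown above.
ordered_list = ['pineapple', 'banana', 'orange', 'apple']
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'fruit': ['apple', 'banana', 'pineapple', 'orange', 'orange'],
    'trash': [38, 22, 93, 1, 15],
})

# Approach 1: an ordered categorical sorts by category position.
# assign() works on a copy, so df['fruit'] keeps its string dtype.
by_category = df.assign(
    fruit=pd.Categorical(df['fruit'], categories=ordered_list, ordered=True)
).sort_values('fruit', kind='stable')

# Approach 2: map each name to its position in ordered_list via a key function.
rank = {name: pos for pos, name in enumerate(ordered_list)}
by_key = df.sort_values('fruit', key=lambda s: s.map(rank), kind='stable')

print(by_key)
```

Approach 1 changes the column's dtype, which is useful if the order is needed repeatedly. Approach 2 leaves the data untouched but needs a pandas version whose sort_values accepts key.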

Sort pandas dataframe by two columns using key in one of them, kind mergesort, not working

import pandas as pd

data = {
"col1": ["chr5","chr5","chr5","chr3","chr3","chr3","chr3","chr2","chr2","chr2","chr11"],
"col2": ["CDS","gene","mRNA","three_prime_UTR","gene","CDS","mRNA","CDS","gene","mRNA","CDS"]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print("Before Sort:",df)
df['col2'] = pd.Categorical(df['col2'],categories=['gene','mRNA','five_prime_UTR', 'CDS', 'three_prime_UTR'],ordered=True)
# raw string avoids an invalid-escape warning; expand=False returns a Series
df['new'] = df['col1'].str.extract(r'(\d+)$', expand=False).astype(int)
df = df.sort_values(by=['new', 'col2']).drop('new', axis=1)
df.reset_index(drop=True, inplace=True)
print("\n\nAfter sort:",df)

For col2 I used a categorical sort. For col1 I extracted the number at the end of each value into a temporary column "new", sorted on it, and then dropped that column.

Output:

Before Sort:      col1             col2
0    chr5              CDS
1    chr5             gene
2    chr5             mRNA
3    chr3  three_prime_UTR
4    chr3             gene
5    chr3              CDS
6    chr3             mRNA
7    chr2              CDS
8    chr2             gene
9    chr2             mRNA
10  chr11              CDS

After sort:      col1             col2
0    chr2             gene
1    chr2             mRNA
2    chr2              CDS
3    chr3             gene
4    chr3             mRNA
5    chr3              CDS
6    chr3  three_prime_UTR
7    chr5             gene
8    chr5             mRNA
9    chr5              CDS
10  chr11              CDS
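
Since the question asks about key, here is a hedged sketch of the same sort without a temporary column. It relies on sort_values calling key once per column in by and on the Series it passes carrying the column label as its name. The helper sort_key and the names feature_order and feature_rank are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["chr5", "chr5", "chr5", "chr3", "chr3", "chr3", "chr3",
             "chr2", "chr2", "chr2", "chr11"],
    "col2": ["CDS", "gene", "mRNA", "three_prime_UTR", "gene", "CDS", "mRNA",
             "CDS", "gene", "mRNA", "CDS"],
})

feature_order = ['gene', 'mRNA', 'five_prime_UTR', 'CDS', 'three_prime_UTR']
feature_rank = {name: pos for pos, name in enumerate(feature_order)}

def sort_key(column):
    # Called once per column in `by`; column.name says which one this is.
    if column.name == "col1":
        # "chr11" -> 11, so chromosomes compare numerically, not as text.
        return column.str.extract(r'(\d+)$', expand=False).astype(int)
    # col2: position of each feature in the desired order.
    return column.map(feature_rank)

result = df.sort_values(by=["col1", "col2"], key=sort_key, ignore_index=True)
print(result)
```

Both columns are turned into integers inside the key, so the result does not depend on col2 being categorical.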

Sort a pandas dataframe by 2 columns (one with integers, one with alphanumerics) with priority for integer column

You can do it this way:

  1. Split the second column with alphanumeric strings into 2 columns: one column Letter to hold the first letter and another column Number to hold a number of one or two digits.
  2. Convert Number column from string to integer.
  3. Then, sort these 2 new columns together with the first column of integers

Let's illustrate the process with an example below:

Assume we have the dataframe df as follows:

print(df)

   Col1  Col2
0     2   B12
1    11    C2
2     2    A1
3    11    B2
4     2    B1
5    11   C12
6     2   A12
7    11    C1
8     2    A2
Step 1 & 2: Split Col2 into 2 columns Letter & Number + Convert Number column from string to integer:

df['Letter'] = df['Col2'].str[0]               # take 1st char
df['Number'] = df['Col2'].str[1:].astype(int) # take 2nd char onwards and convert to integer

Result:

print(df)

   Col1  Col2  Letter  Number
0     2   B12       B      12
1    11    C2       C       2
2     2    A1       A       1
3    11    B2       B       2
4     2    B1       B       1
5    11   C12       C      12
6     2   A12       A      12
7    11    C1       C       1
8     2    A2       A       2

Step 3: Sort Col1, Letter and Number with priority: Col1 ---> Number ---> Letter:

df = df.sort_values(by=['Col1', 'Number', 'Letter'])

Result:

print(df)

   Col1  Col2  Letter  Number
2     2    A1       A       1
4     2    B1       B       1
8     2    A2       A       2
6     2   A12       A      12
0     2   B12       B      12
7    11    C1       C       1
3    11    B2       B       2
1    11    C2       C       2
5    11   C12       C      12

After sorting, you can remove the Letter and Number columns, as follows:

df = df.drop(['Letter', 'Number'], axis=1)

If you want to do all in one step, you can also chain the instructions, as follows:

df = (df.assign(Letter=df['Col2'].str[0],
                Number=df['Col2'].str[1:].astype(int))
        .sort_values(by=['Col1', 'Number', 'Letter'])
        .drop(['Letter', 'Number'], axis=1)
     )

Result:

print(df)

   Col1  Col2
2     2    A1
4     2    B1
8     2    A2
6     2   A12
0     2   B12
7    11    C1
3    11    B2
1    11    C2
5    11   C12
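
Slicing with .str[0] assumes the prefix is exactly one letter. If the prefix could be longer, one possible variation (a sketch, not part of the answer above) is to split Col2 with a single regular expression. The named groups Letter and Number are chosen here only to match the column names used earlier:

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': [2, 11, 2, 11, 2, 11, 2, 11, 2],
    'Col2': ['B12', 'C2', 'A1', 'B2', 'B1', 'C12', 'A12', 'C1', 'A2'],
})

# Split Col2 into a letter prefix and a numeric suffix with one regex.
parts = df['Col2'].str.extract(r'^(?P<Letter>[A-Za-z]+)(?P<Number>\d+)$')
parts['Number'] = parts['Number'].astype(int)

# Sort on the helper columns, then keep only the original columns.
result = (
    df.join(parts)
      .sort_values(by=['Col1', 'Number', 'Letter'])
      .drop(columns=['Letter', 'Number'])
)
print(result)
```

On this data it produces the same row order as the chained version above.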

How to sort ascending and descending depending on a value in another column in pandas?

If you can assume that your "price" column will always contain non-negative values, we could "cheat". Assign a negative value to the prices of buy or sell operations, sort, and then calculate the absolute value to go back to the original prices:

  1. If type is "buy", the price remains positive (2 * 1 - 1 = 1). If type is "sell", the price will become negative (2 * 0 - 1 = -1).

    df["price"] = df["price"] * (2 * (df["type"] == "buy").astype(int) - 1)
  2. Now sort values normally. I've included both "initiator_id" and "type" columns to match your expected output:

    df = df.sort_values(["initiator_id", "type", "price"])
  3. Finally, calculate the absolute value of the "price" column to retrieve your original values:

    df["price"] = df["price"].abs()

Expected output of this operation on your sample input:

   initiator_id   price  type  bidnum
0             1  170.81  sell       0
2             2  169.19   buy       0
1             2  170.81  sell       0
4             3  169.19   buy       0
3             3  170.81  sell       0
5             3   70.81  sell       1
9             4   69.19   buy       1
7             4  169.19   buy       0
6             4  170.81  sell       0
8             4   70.81  sell       1
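
The three steps overwrite "price" and then restore it. A variation that leaves "price" alone puts the signed values in a helper column instead. In the sketch below the sample rows are reconstructed from the output above and the column name _signed_price is made up, so treat both as assumptions:

```python
import pandas as pd

# Sample rows reconstructed from the output above (an assumption about the input).
df = pd.DataFrame({
    "initiator_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "price": [170.81, 170.81, 169.19, 170.81, 169.19,
              70.81, 170.81, 169.19, 70.81, 69.19],
    "type": ["sell", "sell", "buy", "sell", "buy",
             "sell", "sell", "buy", "sell", "buy"],
    "bidnum": [0, 0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# Keep buy prices positive and negate sell prices in a helper Series,
# so the "price" column itself is never overwritten.
signed = df["price"].where(df["type"] == "buy", -df["price"])

result = (
    df.assign(_signed_price=signed)
      .sort_values(["initiator_id", "type", "_signed_price"])
      .drop(columns="_signed_price")
)
print(result)
```

Within each initiator_id, buys come out in ascending price order and sells in descending order, as in the output above.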

How to sort dataframe rows by multiple columns

Use sort_values, which can accept a list of sorting targets. In this case it sounds like you want to sort by S/N, then Dis, then Rate:

df = df.sort_values(['S/N', 'Dis', 'Rate'])

#     S/N      Dis       Rate
# 0   332   4.6030  91.204062
# 3   332   9.1985  76.212943
# 6   332  14.4405  77.664282
# 9   332  20.2005  76.725955
# 12  332  25.4780  31.597510
# 15  332  30.6670  74.096975
# 1   445   5.4280  60.233917
# 4   445   9.7345  31.902842
# 7   445  14.6015  36.261851
# 10  445  19.8630  40.705467
# 13  445  24.9050   4.897008
# 16  445  30.0550  35.217889
# ...
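
If the columns should not all be sorted in the same direction, sort_values also accepts a list for ascending, one flag per label. This goes slightly beyond what the question asked. The sketch uses four rows copied from the output above in shuffled order:

```python
import pandas as pd

# A few rows taken from the output above, deliberately out of order.
df = pd.DataFrame({
    'S/N': [445, 332, 445, 332],
    'Dis': [9.7345, 9.1985, 5.4280, 4.6030],
    'Rate': [31.902842, 76.212943, 60.233917, 91.204062],
})

# S/N ascending, then Dis descending within each S/N.
result = df.sort_values(['S/N', 'Dis'], ascending=[True, False])
print(result)
```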

Sort values in a dataframe by a column and take second one only if equal

Your solution almost works, but inplace=True in reset_index applies only to the object reset_index is called on; it does not make the sort_values step modify df.

One possible solution is to pass ignore_index=True to sort_values, so that reset_index is not needed at all.

import numpy as np
import pandas as pd

np.random.seed(2022)
df = pd.DataFrame({'col1': np.random.random(5), 'col2': np.random.random(5)})
df = df.sort_values(by=['col2','col1'],ascending=False, ignore_index=True)
print (df)
       col1      col2
0  0.499058  0.897657
1  0.049974  0.896963
2  0.685408  0.721135
3  0.113384  0.647452
4  0.009359  0.486988

Or, if you want to use inplace, pass it only to sort_values, together with ignore_index=True:

df.sort_values(by=['col2','col1'],ascending=False, ignore_index=True,inplace=True)
print (df)
       col1      col2
0  0.499058  0.897657
1  0.049974  0.896963
2  0.685408  0.721135
3  0.113384  0.647452
4  0.009359  0.486988
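
To see why the original attempt had no effect, assume it chained sort_values and reset_index(..., inplace=True) in one expression. The question's code is not shown here, so this is a guess at its shape. A minimal sketch of what happens in that case:

```python
import numpy as np
import pandas as pd

np.random.seed(2022)
df = pd.DataFrame({'col1': np.random.random(5), 'col2': np.random.random(5)})
before = df.copy()

# sort_values returns a new, temporary DataFrame. reset_index(inplace=True)
# modifies that temporary object and returns None, so nothing is kept.
returned = df.sort_values(by=['col2', 'col1'], ascending=False).reset_index(drop=True, inplace=True)
print(returned)           # None
print(df.equals(before))  # True: df was never sorted

# ignore_index=True sorts and renumbers the index in a single call.
sorted_df = df.sort_values(by=['col2', 'col1'], ascending=False, ignore_index=True)
print(sorted_df)
```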

