How to sort a pandas DataFrame by one column
Use sort_values to sort the df by a specific column's values:
In [18]:
df.sort_values('2')
Out[18]:
0 1 2
4 85.6 January 1.0
3 95.5 February 2.0
7 104.8 March 3.0
0 354.7 April 4.0
8 283.5 May 5.0
6 238.7 June 6.0
5 152.0 July 7.0
1 55.4 August 8.0
11 212.7 September 9.0
10 249.6 October 10.0
9 278.8 November 11.0
2 176.5 December 12.0
If you want to sort by two columns, pass a list of column labels to sort_values, with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
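To see the sort priority actually matter, here is a minimal sketch with made-up data in which column '2' contains ties. The values are hypothetical and only reuse the string labels '0' and '2' from the example above.

```python
import pandas as pd

# Hypothetical data: column '2' has repeated values, so column '0'
# is needed to break the ties.
df = pd.DataFrame({
    '0': [354.7, 55.4, 176.5, 95.5],
    '2': [2.0, 1.0, 2.0, 1.0],
})

# Sort by '2' first; rows with equal '2' are then ordered by '0'.
result = df.sort_values(['2', '0'])
print(result)
```

Rows that share a value in '2' stay together, and within each such group the rows are ordered by '0'.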
How to sort a pandas DataFrame on one column given an already ordered list of the values in that column?
Approach 1
Convert the fruit column to an ordered categorical type and sort the values
df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')
Approach 2
Sort the values by passing a key function, which maps the fruit names to their corresponding order
df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))
id fruit trash
2 3 pineapple 93
1 2 banana 22
3 4 orange 1
4 5 orange 15
0 1 apple 38
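For reference, here is a self-contained sketch of both approaches. The DataFrame is rebuilt from the output shown above, and the contents of ordered_list are an assumption inferred from that output order, since the original list is not shown.

```python
import pandas as pd

# Assumed ordering, inferred from the output above.
ordered_list = ["pineapple", "banana", "orange", "apple"]

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "fruit": ["apple", "banana", "pineapple", "orange", "orange"],
    "trash": [38, 22, 93, 1, 15],
})

# Approach 1: ordered categorical (done on a copy via assign, so df keeps
# its original string column).
by_cat = df.assign(
    fruit=pd.Categorical(df["fruit"], categories=ordered_list, ordered=True)
).sort_values("fruit")

# Approach 2: key function mapping each fruit name to its position in the list.
rank = {fruit: pos for pos, fruit in enumerate(ordered_list)}
by_key = df.sort_values("fruit", key=lambda s: s.map(rank))

print(by_cat)
print(by_key)
```

In both approaches, a fruit that is missing from ordered_list turns into NaN and ends up at the bottom with the default na_position='last'.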
Sort pandas dataframe by two columns using key in one of them, kind mergesort, not working
import pandas as pd
data = {
"col1": ["chr5","chr5","chr5","chr3","chr3","chr3","chr3","chr2","chr2","chr2","chr11"],
"col2": ["CDS","gene","mRNA","three_prime_UTR","gene","CDS","mRNA","CDS","gene","mRNA","CDS"]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print("Before Sort:",df)
df['col2'] = pd.Categorical(df['col2'],categories=['gene','mRNA','five_prime_UTR', 'CDS', 'three_prime_UTR'],ordered=True)
df['new'] = df['col1'].str.extract(r'(\d+$)', expand=False).astype(int)
df = df.sort_values(by=['new', 'col2']).drop('new', axis=1)
df.reset_index(drop=True, inplace=True)
print("\n\nAfter sort:",df)
For col2 I used a categorical sort; for col1 I extracted the trailing number, sorted on it, and then dropped the helper column "new".
Output:
Before Sort: col1 col2
0 chr5 CDS
1 chr5 gene
2 chr5 mRNA
3 chr3 three_prime_UTR
4 chr3 gene
5 chr3 CDS
6 chr3 mRNA
7 chr2 CDS
8 chr2 gene
9 chr2 mRNA
10 chr11 CDS
After sort: col1 col2
0 chr2 gene
1 chr2 mRNA
2 chr2 CDS
3 chr3 gene
4 chr3 mRNA
5 chr3 CDS
6 chr3 three_prime_UTR
7 chr5 gene
8 chr5 mRNA
9 chr5 CDS
10 chr11 CDS
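Since the question title mentions key and kind='mergesort', here is a sketch of how those two can be combined without a helper column. This is an assumption about what the asker was aiming for, not their original code: sort by the secondary column first, then do a stable sort on the primary column with a key. The helper name chr_number and the shortened sample data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["chr5", "chr3", "chr3", "chr2", "chr11", "chr2"],
    "col2": ["CDS", "three_prime_UTR", "gene", "mRNA", "CDS", "gene"],
})
order = ["gene", "mRNA", "five_prime_UTR", "CDS", "three_prime_UTR"]
df["col2"] = pd.Categorical(df["col2"], categories=order, ordered=True)

def chr_number(s):
    # Hypothetical helper: pull the trailing digits out of e.g. "chr11".
    return s.str.extract(r"(\d+)$", expand=False).astype(int)

result = (
    df.sort_values("col2")                                   # secondary column first
      .sort_values("col1", key=chr_number, kind="mergesort")  # stable sort keeps col2 order
      .reset_index(drop=True)
)
print(result)
```

The second sort has to be stable, otherwise the order established by the first sort is not guaranteed to survive within each col1 group.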
Sort a pandas dataframe by 2 columns (one with integers, one with alphanumerics) with priority for integer column
You can do it this way:
- Split the second column with alphanumeric strings into 2 columns: one column Letter to hold the first letter and another column Number to hold a number of one or two digits.
- Convert Number column from string to integer.
- Then, sort these 2 new columns together with the first column of integers
Let's illustrate the process with an example below:
Assume we have the dataframe df as follows:
print(df)
Col1 Col2
0 2 B12
1 11 C2
2 2 A1
3 11 B2
4 2 B1
5 11 C12
6 2 A12
7 11 C1
8 2 A2
Step 1 & 2: Split Col2 into 2 columns Letter & Number + Convert Number column from string to integer:
df['Letter'] = df['Col2'].str[0] # take 1st char
df['Number'] = df['Col2'].str[1:].astype(int) # take 2nd char onwards and convert to integer
Result:
print(df)
Col1 Col2 Letter Number
0 2 B12 B 12
1 11 C2 C 2
2 2 A1 A 1
3 11 B2 B 2
4 2 B1 B 1
5 11 C12 C 12
6 2 A12 A 12
7 11 C1 C 1
8 2 A2 A 2
Step 3: Sort Col1, Letter and Number with priority: Col1 ---> Number ---> Letter:
df = df.sort_values(by=['Col1', 'Number', 'Letter'])
Result:
print(df)
Col1 Col2 Letter Number
2 2 A1 A 1
4 2 B1 B 1
8 2 A2 A 2
6 2 A12 A 12
0 2 B12 B 12
7 11 C1 C 1
3 11 B2 B 2
1 11 C2 C 2
5 11 C12 C 12
After sorting, you can remove the Letter and Number columns, as follows:
df = df.drop(['Letter', 'Number'], axis=1)
If you want to do all in one step, you can also chain the instructions, as follows:
df = (df.assign(Letter=df['Col2'].str[0],
Number=df['Col2'].str[1:].astype(int))
.sort_values(by=['Col1', 'Number', 'Letter'])
.drop(['Letter', 'Number'], axis=1)
)
Result:
print(df)
Col1 Col2
2 2 A1
4 2 B1
8 2 A2
6 2 A12
0 2 B12
7 11 C1
3 11 B2
1 11 C2
5 11 C12
How to sort ascending and descending depending on a value in another column in pandas?
If you can assume that your "price" column will always contain non-negative values, we could "cheat". Assign a negative value to the prices of buy or sell operations, sort, and then calculate the absolute value to go back to the original prices:
If type is "buy", the price remains positive (2 * 1 - 1 = 1). If type is "sell", the price will become negative (2 * 0 - 1 = -1).
df["price"] = df["price"] * (2 * (df["type"] == "buy").astype(int) - 1)
Now sort values normally. I've included both "initiator_id" and "type" columns to match your expected output:
df = df.sort_values(["initiator_id", "type", "price"])
Finally, calculate the absolute value of the "price" column to retrieve your original values:
df["price"] = df["price"].abs()
Expected output of this operation on your sample input:
initiator_id price type bidnum
0 1 170.81 sell 0
2 2 169.19 buy 0
1 2 170.81 sell 0
4 3 169.19 buy 0
3 3 170.81 sell 0
5 3 70.81 sell 1
9 4 69.19 buy 1
7 4 169.19 buy 0
6 4 170.81 sell 0
8 4 70.81 sell 1
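If you would rather not overwrite "price" at all, a variant of the same trick is to put the signed values in a temporary column. This is only a sketch: the DataFrame is rebuilt from the output above, and the column name signed_price is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "initiator_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "price": [170.81, 170.81, 169.19, 170.81, 169.19, 70.81, 170.81, 169.19, 70.81, 69.19],
    "type": ["sell", "sell", "buy", "sell", "buy", "sell", "sell", "buy", "sell", "buy"],
    "bidnum": [0, 0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# Buy prices stay positive (ascending); sell prices are negated so that an
# ascending sort on the helper column puts the highest sell price first.
signed = np.where(df["type"] == "buy", df["price"], -df["price"])

result = (
    df.assign(signed_price=signed)  # hypothetical helper column
      .sort_values(["initiator_id", "type", "signed_price"])
      .drop(columns="signed_price")
)
print(result)
```

The same non-negative assumption applies here, but the original "price" values are never modified, so no abs() step is needed at the end.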
How to sort dataframe rows by multiple columns
Use sort_values, which can accept a list of sorting targets. In this case it sounds like you want to sort by S/N, then Dis, then Rate:
df = df.sort_values(['S/N', 'Dis', 'Rate'])
# S/N Dis Rate
# 0 332 4.6030 91.204062
# 3 332 9.1985 76.212943
# 6 332 14.4405 77.664282
# 9 332 20.2005 76.725955
# 12 332 25.4780 31.597510
# 15 332 30.6670 74.096975
# 1 445 5.4280 60.233917
# 4 445 9.7345 31.902842
# 7 445 14.6015 36.261851
# 10 445 19.8630 40.705467
# 13 445 24.9050 4.897008
# 16 445 30.0550 35.217889
# ...
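If some of the columns should be sorted in descending order, ascending also accepts a list with one flag per column. Here is a small sketch using four rows taken from the data above; the descending order on Dis is just an illustration, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    "S/N": [445, 332, 445, 332],
    "Dis": [9.7345, 9.1985, 5.4280, 4.6030],
    "Rate": [31.902842, 76.212943, 60.233917, 91.204062],
})

# All three columns ascending (the default).
all_asc = df.sort_values(["S/N", "Dis", "Rate"])

# S/N ascending, Dis descending, Rate ascending: one flag per column.
mixed = df.sort_values(["S/N", "Dis", "Rate"], ascending=[True, False, True])

print(all_asc)
print(mixed)
```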
Sort values in a dataframe by a column and take second one only if equal
Your solution almost works, but if you use inplace in reset_index, it does not carry over to sort_values.
A possible solution is to add ignore_index=True, so reset_index is not necessary.
import numpy as np
import pandas as pd
np.random.seed(2022)
df = pd.DataFrame({'col1':np.random.random(5), 'col2':np.random.random(5)})
df = df.sort_values(by=['col2','col1'],ascending=False, ignore_index=True)
print (df)
col1 col2
0 0.499058 0.897657
1 0.049974 0.896963
2 0.685408 0.721135
3 0.113384 0.647452
4 0.009359 0.486988
Or, if you want to use inplace, add it only to sort_values and also add ignore_index=True:
df.sort_values(by=['col2','col1'],ascending=False, ignore_index=True,inplace=True)
print (df)
col1 col2
0 0.499058 0.897657
1 0.049974 0.896963
2 0.685408 0.721135
3 0.113384 0.647452
4 0.009359 0.486988
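With random floats there are no ties, so the second column never actually comes into play. Here is a small sketch with made-up integers where col2 has a tie, which also shows what happens when inplace is used at the end of a chain (my guess at what the original attempt looked like).

```python
import pandas as pd

# Hypothetical data: col2 has a tie (2, 2), so col1 decides the order there.
df = pd.DataFrame({"col1": [5, 3, 4], "col2": [1, 2, 2]})

# inplace=True on the chained reset_index modifies only the temporary
# DataFrame returned by sort_values and returns None; df is left untouched.
chained = df.sort_values(by=["col2", "col1"], ascending=False).reset_index(
    drop=True, inplace=True
)
print(chained)  # None

# ignore_index=True gives a fresh 0..n-1 index directly.
out = df.sort_values(by=["col2", "col1"], ascending=False, ignore_index=True)
print(out)
```

The rows with col2 equal to 2 are ordered by col1 (descending here, because ascending=False applies to both columns), and the row with the lower col2 comes last.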