Naturally sorting Pandas DataFrame
If you want to sort the df, just sort the index or the data and assign directly to the index of the df rather than trying to pass the df as an arg as that yields an empty list:
In [7]:
df.index = natsorted(a)
df.index
Out[7]:
Index(['0hr', '48hr', '72hr', '96hr', '128hr'], dtype='object')
Note that df.index = natsorted(df.index)
also works
if you pass the df as an arg it yields an empty list, in this case because the df is empty (has no columns), otherwise it will return the columns sorted which is not what you want:
In [10]:
natsorted(df)
Out[10]:
[]
EDIT
If you want to sort the index so that the data is reordered along with the index then use reindex
:
In [13]:
df=pd.DataFrame(index=a, data=np.arange(5))
df
Out[13]:
0
0hr 0
128hr 1
72hr 2
48hr 3
96hr 4
In [14]:
df = df*2
df
Out[14]:
0
0hr 0
128hr 2
72hr 4
48hr 6
96hr 8
In [15]:
df.reindex(index=natsorted(df.index))
Out[15]:
0
0hr 0
48hr 6
72hr 4
96hr 8
128hr 2
Note that you have to assign the result of reindex
to either a new df or to itself, it does not accept the inplace
param.
Natural sort a data frame column in pandas
You can convert values to ordered categorical with sorted catgories by natsorted
and then use sort_values
:
import natsort as ns
df['a'] = pd.Categorical(df['a'], ordered=True, categories= ns.natsorted(df['a'].unique()))
df = df.sort_values('a')
print (df)
a b
5 a1 b1
2 a1 b11
4 a3 b4
3 a10 b22
6 a11 b12
1 a20 b2
0 a22 b5
df['b'] = pd.Categorical(df['b'], ordered=True, categories= ns.natsorted(df['b'].unique()))
df = df.sort_values('b')
print (df)
a b
5 a1 b1
1 a20 b2
4 a3 b4
0 a22 b5
2 a1 b11
6 a11 b12
3 a10 b22
Natural sorting in pandas
Use str.extract
, sort_values
, then use the index to reindex df
.
idx = (df.assign(ID2=df.ID.str.extract(r'(\d+)$').astype(int))
.sort_values(['ID2', 'Time'])
.index)
df.iloc[idx]
ID Time oneMissing singleValue empty oneEmpty
0 CS1-1 1 10000.0 NaN None 0.0
2 CS1-1 2 30000.0 NaN None 0.0
3 CS1-2 1 10000.0 NaN None NaN
1 CS1-2 2 20000.0 0.0 None 0.0
5 CS1-2 3 30000.0 NaN None NaN
4 CS1-11 1 NaN 0.0 None NaN
This is under the assumption that your ID column follows the pattern "XXX-NUMBER".
A fool-proof solution will involve the use of the natsort
module, which excels at fast natural sorting. With a little elbow-grease, we can argsort your data.
from natsort import natsorted
idx, *_ = zip(*natsorted(
zip(df.index, df.ID, df.Time), key=lambda x: (x[1], x[2])))
df.iloc[list(idx)]
ID Time oneMissing singleValue empty oneEmpty
0 CS1-1 1 10000.0 NaN None 0.0
2 CS1-1 2 30000.0 NaN None 0.0
3 CS1-2 1 10000.0 NaN None NaN
1 CS1-2 2 20000.0 0.0 None 0.0
5 CS1-2 3 30000.0 NaN None NaN
4 CS1-11 1 NaN 0.0 None NaN
Use PyPi to install: pip install natsort
.
Naturally sort pandas DataFrame by index
To handle the additional requirement that all "S" come before "N", use the natsort
module, and pass a key=...
argument to natsorted
:
natsorted(df.index, lambda x: (x[:-1], x[-1] == 'N'))
# ['1S', '1N', '2S', '2N', '15N', '16S', '16N']
df.loc[natsorted(df.index, lambda x: (x[:-1], x[-1] == 'N'))]
material
1S 716.2
1N 724.5
2S 723.5
2N 721.5
15N 649.7
16S 703.2
16N 711.7
natsort
is more generalised for natural sorting problems, and is a handy tool for times like this. Installation can be done using PyPi.
If the ordering of "S" and "N" is not important, consider argsort
for performance:
df.iloc[df.index.str[:-1].astype(int).argsort()]
Or,
df.iloc[np.argsort([int(x[:-1]) for x in df.index])]
material
1S 716.2
1N 724.5
2S 723.5
2N 721.5
15N 649.7
16S 703.2
16N 711.7
Sort pandas dataframe on index that is string+digits
Onne idea is convert index
to Series
with Series.str.split
to DataFrame
, convert second column to integers and sorting both columns and this index is used for change order in original df.index
by DataFrame.reindex
:
df1 = df.index.to_series().str.split('_',expand=True)
df1[1] = df1[1].astype(int)
df1 = df1.sort_values([0, 1], ascending=[True, False])
print (df1)
0 1
index
A_100 A 100
A_60 A 60
A_30 A 30
B_100 B 100
B_60 B 60
B_30 B 30
df = df.reindex(df1.index)
print (df)
vals
index
A_100 0
A_60 12
A_30 13
B_100 12
B_60 6
B_30 6
Sort a pandas dataframe by 2 columns (one with integers, one with alphanumerics) with priority for integer column
You can do it this way:
- Split the second column with alphanumeric strings into 2 columns: one column
Letter
to hold the first letter and another columnNumber
to hold a number of one or two digits. - Convert
Number
column from string to integer. - Then, sort these 2 new columns together with the first column of integers
Let's illustrate the process with an example below:
Assume we have the dataframe df
as follows:
print(df)
Col1 Col2
0 2 B12
1 11 C2
2 2 A1
3 11 B2
4 2 B1
5 11 C12
6 2 A12
7 11 C1
8 2 A2
Step 1 & 2: Split Col2
into 2 columns Letter
& Number
+ Convert Number
column from string to integer:
df['Letter'] = df['Col2'].str[0] # take 1st char
df['Number'] = df['Col2'].str[1:].astype(int) # take 2nd char onwards and convert to integer
Result:
print(df)
Col1 Col2 Letter Number
0 2 B12 B 12
1 11 C2 C 2
2 2 A1 A 1
3 11 B2 B 2
4 2 B1 B 1
5 11 C12 C 12
6 2 A12 A 12
7 11 C1 C 1
8 2 A2 A 2
Step 3: Sort Col1
, Letter
and Number
with priority: Col1
---> Number
---> Letter
:
df = df.sort_values(by=['Col1', 'Number', 'Letter'])
Result:
print(df)
Col1 Col2 Letter Number
2 2 A1 A 1
4 2 B1 B 1
8 2 A2 A 2
6 2 A12 A 12
0 2 B12 B 12
7 11 C1 C 1
3 11 B2 B 2
1 11 C2 C 2
5 11 C12 C 12
After sorting, you can remove the Letter
and Number
columns, as follows:
df = df.drop(['Letter', 'Number'], axis=1)
If you want to do all in one step, you can also chain the instructions, as follows:
df = (df.assign(Letter=df['Col2'].str[0],
Number=df['Col2'].str[1:].astype(int))
.sort_values(by=['Col1', 'Number', 'Letter'])
.drop(['Letter', 'Number'], axis=1)
)
Result:
print(df)
Col1 Col2
2 2 A1
4 2 B1
8 2 A2
6 2 A12
0 2 B12
7 11 C1
3 11 B2
1 11 C2
5 11 C12
How can I sort strings in pandas data file according to the value in specific column?
For last pandas versions is possible use parameter key
with split values by _
and converting values to integers:
df_sorted = df.sort_values('Source', key=lambda x: x.str.split('_').str[1].astype(int))
Or is possible get positions of sorted values by Series.argsort
and pass to DataFrame.iloc
:
df_sorted = df.iloc[df['Source'].str.split('_').str[1].astype(int).argsort()]
print (df_sorted)
Source LogP MolWt HBA HBD
0 cne_1 1.1732 263.405 3 1
1 cne_10 2.6639 197.237 2 0
2 cne_100 -0.2886 170.193 4 2
1031 cne_995 3.0179 347.219 4 2
1032 cne_996 4.8419 407.495 6 2
1033 cne_997 3.3560 354.524 3 1
1034 cne_998 7.5465 635.316 4 2
1035 cne_999 3.3514 389.556 4 1
3 cne_1000 1.9644 304.709 5 1
4 cne_1001 1.4986 162.144 3 1
Sort a Pandas Dataframe by Multiple Columns Using Key Argument
You can split column one
into its constituent parts, add them as columns to the dataframe and then sort on them with column two
. Finally, remove the temporary columns.
>>> (df.assign(lhs=df['one'].str[0], rhs=df['one'].str[1:].astype(int))
.sort_values(['two', 'rhs', 'lhs'])
.drop(columns=['lhs', 'rhs']))
one two
5 A1 1
1 B1 1
3 A2 1
2 A1 2
4 B1 2
0 A2 2
Related Topics
Which Is the Easiest Way to Simulate Keyboard and Mouse on Python
How to Do Exponentiation in Python
Usage of Sys.Stdout.Flush() Method
How to Apply Piecewise Linear Fit in Python
How to Make a For-Loop Pyramid More Concise in Python
Python Pandas Max Value in a Group as a New Column
Doing Something Before Program Exit
Fast Replacement of Values in a Numpy Array
What Is the G Object in This Flask Code
Cannot Concatenate 'Str' and 'Float' Objects
How to Change Default Anaconda Python Environment
Could Not Find a Version That Satisfies the Requirement Tensorflow
Cleanest Way to Get Last Item from Python Iterator