Sorting Columns in Pandas Dataframe Based on Column Name

Sorting columns in pandas dataframe based on column name

df = df.reindex(sorted(df.columns), axis=1)

This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.
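For names like Q9.1 and Q10.3, one possible approach is to pass a key function to sorted that compares the numeric parts as integers. This is only a sketch: natural_key is a hypothetical helper, not a pandas function, and the column names are made up for illustration.

```python
import re

import pandas as pd


def natural_key(name):
    """Split a name into text and digit chunks so digit runs compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]


# Hypothetical column names that do not sort lexicographically
df = pd.DataFrame(columns=['Q10.3', 'Q9.1', 'Q9.10', 'Q9.2'])

df = df.reindex(sorted(df.columns, key=natural_key), axis=1)
print(list(df.columns))  # ['Q9.1', 'Q9.2', 'Q9.10', 'Q10.3']
```

The sorting itself is plain Python; pandas only receives the final column order.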

Python/Pandas: Sort dataframe columns based on a column name

You can change the order of the columns by passing the desired order to reindex:

import pandas as pd

data = {'X1': ['11', '12'],
        'X2': ['21', '22'],
        'X3': ['31', '32']}

df = pd.DataFrame(data)
df
X1 X2 X3
0 11 21 31
1 12 22 32

df = df.reindex(['X3','X1','X2'], axis=1)
df

X3 X1 X2
0 31 11 21
1 32 12 22

Note: you need to provide the desired order yourself.

You can also write a function that moves a given column to the front:

def sorter(desired, df):
    # Move the column named `desired` to the first position
    columns = df.columns.tolist()
    columns.remove(desired)
    columns.insert(0, desired)
    return df.reindex(columns, axis=1)

sorter('X2',df)

X2 X1 X3
0 21 11 31
1 22 12 32

Re-ordering columns in a pandas dataframe by column header names, where the names are strings that end in a number

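You can sort with a key that extracts the trailing number. The key below takes the last underscore-separated piece of each name, drops its first character, and converts the rest to int. Note that df.columns[1:] skips the first column, so that column is not part of the reindexed result: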

df = df.reindex(sorted(df.columns[1:], key=lambda x: int(x.split('_')[-1][1:])), axis=1)
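If the first column should stay in place rather than be dropped, a variation along these lines might work. The column names (id, val_t3, ...) and the helper trailing_number are assumptions made for illustration, since the original question's data is not shown here.

```python
import pandas as pd

# Hypothetical frame: one leading column, then names ending in a letter plus a number
df = pd.DataFrame([[1, 30, 10, 100]],
                  columns=['id', 'val_t3', 'val_t1', 'val_t10'])


def trailing_number(name):
    # 'val_t10' -> 't10' -> '10' -> 10
    return int(name.split('_')[-1][1:])


# Keep the first column in front and sort the rest numerically
ordered = [df.columns[0]] + sorted(df.columns[1:], key=trailing_number)
df = df.reindex(ordered, axis=1)
print(list(df.columns))  # ['id', 'val_t1', 'val_t3', 'val_t10']
```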

Sort a subset of columns of a pandas dataframe alphabetically by column name

You can split your dataframe by column name using the normal indexing operator [], sort the remaining columns alphabetically using sort_index(axis=1), and concat the two parts back together:

>>> pd.concat([df[['subject', 'timepoint']],
...            df[df.columns.difference(['subject', 'timepoint'])].sort_index(axis=1)],
...           ignore_index=False, axis=1)

subject timepoint a b c d
0 1 1 2 2 2 2
1 1 2 3 3 3 3
2 1 3 4 4 4 4
3 1 4 5 5 5 5
4 1 5 6 6 6 6
5 1 6 7 7 7 7
6 2 1 3 3 3 3
7 2 2 4 4 4 4
8 2 3 1 1 1 1
9 2 4 2 2 2 2
10 2 5 3 3 3 3
11 2 6 4 4 4 4
12 3 1 5 5 5 5
13 3 2 4 4 4 4
14 3 4 5 5 5 5
15 4 1 8 8 8 8
16 4 2 4 4 4 4
17 4 3 5 5 5 5
18 4 4 6 6 6 6
19 4 5 2 2 2 2
20 4 6 3 3 3 3
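A shorter variation on the same idea is to build the column list directly and index with it. This is a sketch on a small made-up frame; the names fixed and rest and the sample values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: two key columns followed by unsorted value columns
df = pd.DataFrame({'subject': [1, 1], 'timepoint': [1, 2],
                   'c': [30, 31], 'a': [10, 11],
                   'd': [40, 41], 'b': [20, 21]})

fixed = ['subject', 'timepoint']                      # columns to leave in front
rest = sorted(c for c in df.columns if c not in fixed)  # the others, alphabetically

result = df[fixed + rest]
print(list(result.columns))  # ['subject', 'timepoint', 'a', 'b', 'c', 'd']
```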

Reordering Pandas Columns based on Column name

This type of sorting is called natural sorting. (There are more details in "Naturally sorting Pandas DataFrame", which demonstrates how to sort rows using natsort.)

Setup with natsort

import pandas as pd
from natsort import natsorted

df = pd.DataFrame(columns=[f'company_{i}' for i in [5, 2, 3, 4, 1, 10]])

# Before column sort
print(df)

df = df.reindex(natsorted(df.columns), axis=1)

# After column sort
print(df)

Before sort:

Empty DataFrame
Columns: [company_5, company_2, company_3, company_4, company_1, company_10]
Index: []

After sort:

Empty DataFrame
Columns: [company_1, company_2, company_3, company_4, company_5, company_10]
Index: []

Compared to lexicographic sorting with sorted:

df = df.reindex(sorted(df.columns), axis=1)
print(df)
Empty DataFrame
Columns: [company_1, company_10, company_2, company_3, company_4, company_5]
Index: []

Sorting pandas dataframe by column index instead of column name

sort_values is a method, not an indexer, so it has to be called with () rather than []. That does not appear to be the main problem, though.

If you want to sort your dataframe by the second column whatever the name, use:

>>> df.sort_values(df.columns[1])
name score
1 Joe 3
0 Mary 10
2 Jessie 13
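As a self-contained sketch, the frame below is rebuilt from the output shown above (the original row order is assumed from the index labels). The point is that df.columns[1] looks up the label by position, so the call keeps working if the column is renamed.

```python
import pandas as pd

# Data reconstructed from the output above; row order assumed from the index labels
df = pd.DataFrame({'name': ['Mary', 'Joe', 'Jessie'],
                   'score': [10, 3, 13]})

# df.columns[1] is the label of the second column, whatever it is called
by_second = df.sort_values(df.columns[1])
print(by_second)
```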

Pandas: How to sort dataframe on columns with same column labels

You can create temporary helper columns by copying the two columns by position using iloc, sort by those helper columns, and finally drop them, as follows:

df_test = df_test.assign(A=df_test.iloc[:, 0], B=df_test.iloc[:, 1]).sort_values(by=['A', 'B'], ascending=(False,False)).drop(['A', 'B'], axis=1)

Result:

print(df_test)

Region Region
2 San Francisco 4.0
4 San Francisco 1.0
1 Portland 1.0
0 Peninsula 2.0
3 Los Angeles 3.0
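A possible alternative that leaves the original frame untouched is to sort a renamed copy of the two columns and reuse its index order. This is a sketch rather than the answer's method: the names keys, k0, and k1 are hypothetical, the data is rebuilt from the result above, and it assumes the row index is unique.

```python
import pandas as pd

# Data reconstructed from the result above, in original index order
df_test = pd.DataFrame(
    [['Peninsula', 2.0], ['Portland', 1.0], ['San Francisco', 4.0],
     ['Los Angeles', 3.0], ['San Francisco', 1.0]],
    columns=['Region', 'Region'],
)

# Copy both columns by position and give the copy unique, temporary labels
keys = df_test.iloc[:, [0, 1]].copy()
keys.columns = ['k0', 'k1']

# Sort the copy, then reorder the original rows by the resulting index
order = keys.sort_values(['k0', 'k1'], ascending=[False, False]).index
result = df_test.loc[order]
print(result)
```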

How to sort pandas dataframe from one column

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
0 1 2
4 85.6 January 1.0
3 95.5 February 2.0
7 104.8 March 3.0
0 354.7 April 4.0
8 283.5 May 5.0
6 238.7 June 6.0
5 152.0 July 7.0
1 55.4 August 8.0
11 212.7 September 9.0
10 249.6 October 10.0
9 278.8 November 11.0
2 176.5 December 12.0

If you want to sort by two columns, pass a list of column labels to sort_values with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
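To see the second key take effect, the sort column needs ties. The small frame below is made up for illustration (it is not the data from the question) and uses string labels '0' and '2' like the example above.

```python
import pandas as pd

# Hypothetical data with repeated values in column '2'
df = pd.DataFrame({'0': [3.0, 1.0, 2.0, 5.0],
                   '2': [2.0, 1.0, 2.0, 1.0]})

# Sort by '2' first; rows that tie on '2' are ordered by '0'
result = df.sort_values(['2', '0'])
print(result)
```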


