Deleting Multiple Columns Based on Column Names in Pandas

Deleting multiple columns based on column names in Pandas

I'm not sure what you mean by inefficient, but if you mean the amount of typing, it can be easier to select just the columns of interest and assign the result back to df:

df = df[cols_of_interest]

Where cols_of_interest is a list of the columns you care about.
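A minimal sketch of this selection approach. The frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: two useful columns plus two leftover "Unnamed" columns.
df = pd.DataFrame({
    "name": ["x", "y"],
    "value": [1, 2],
    "Unnamed: 2": [None, None],
    "Unnamed: 3": [None, None],
})

cols_of_interest = ["name", "value"]

# Indexing with a list of labels returns a new DataFrame holding only those columns.
df = df[cols_of_interest]
print(list(df.columns))
```

This reads well when the list of columns to keep is shorter than the list of columns to remove.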

Or you can slice the columns by label with .loc (the older .ix indexer was removed in pandas 1.0) and pass the result to drop:

df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)

The call to head(0) just selects zero rows, since we're only interested in the column names rather than the data.
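A runnable sketch of the slice-and-drop idea, written with .loc for the label slice. The frame and its contents are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with a run of unwanted columns in the middle.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["a", "Unnamed: 24", "Unnamed: 25", "Unnamed: 60", "b"],
)

# Label slices with .loc include both endpoints.
to_drop = df.loc[:, "Unnamed: 24":"Unnamed: 60"].columns

# drop returns a new DataFrame, so assign the result back to keep it.
df = df.drop(to_drop, axis=1)
print(list(df.columns))
```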

Update

Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:

In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df

Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []

In [4]:
~df.columns.str.contains('Unnamed:')

Out[4]:
array([ True, False, False, True], dtype=bool)

In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]

Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []

Pandas - Delete multiple columns based on column position

In your case, you can combine numpy.r_ with iloc to select the columns to keep by position (the .copy() avoids a later SettingWithCopyWarning when the result is modified):

import numpy as np
out = df.iloc[:, np.r_[3:6, 7]].copy()
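To make the position arithmetic concrete, here is a small sketch on an invented eight-column frame. It shows both keeping the listed positions and, as an assumed variant that is not in the answer above, dropping them by translating positions to labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with eight columns named c0..c7.
df = pd.DataFrame(np.arange(16).reshape(2, 8), columns=[f"c{i}" for i in range(8)])

# np.r_ concatenates the slice 3:6 and the scalar 7 into array([3, 4, 5, 7]).
positions = np.r_[3:6, 7]

# Keep only those positions.
kept = df.iloc[:, positions].copy()

# Drop those positions instead, by looking up their labels first.
dropped = df.drop(columns=df.columns[positions])

print(list(kept.columns))
print(list(dropped.columns))
```

Note that dropping by label this way assumes the column names are unique. With duplicated names, every column sharing a dropped label would be removed.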

Dropping multiple columns in a pandas dataframe between two columns based on column names

You can use .loc with a column range. For example, if you have this dataframe:

   A  B  C  D  E
0  1  3  3  6  0
1  2  2  4  9  1
2  3  1  5  8  4

Then to delete columns B to D:

df = df.drop(columns=df.loc[:, "B":"D"].columns)
print(df)

Prints:

   A  E
0  1  0
1  2  1
2  3  4

How to drop multiple columns (using column names) from a dataframe using pandas?

df.drop(df.columns.to_series()["column_name_1":"column_name_2"], axis=1)

Converting the column index to a Series lets you use a label slice to pick the range to drop. You just need to know the first and last column names of the range.
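A short sketch of the to_series approach on an invented five-column frame, with the placeholder names replaced by "B" and "D":

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 3, 6, 0]], columns=["A", "B", "C", "D", "E"])

# The Series has the column names as both index and values,
# so a label slice picks a contiguous run of names (endpoints included).
names = df.columns.to_series()["B":"D"]

# drop returns a new DataFrame without the selected columns.
result = df.drop(names, axis=1)
print(list(result.columns))
```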

Removing multiple columns with the same name except the first one?

Let df be a dataframe with two duplicated columns:

df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=("a","a","b"))
#    a  a  b
# 0  1  2  3
# 1  4  5  6
# 2  7  8  9

df.columns.duplicated() flags every repeat of a column name after its first occurrence, so inverting the mask keeps only the first column of each name:

df1 = df.loc[:, ~df.columns.duplicated()]
#    a  b
# 0  1  3
# 1  4  6
# 2  7  9

Dropping Multiple Columns from a dataframe

To delete multiple columns at the same time in pandas, you can pass the column names as a list, as shown below. The option inplace=True is needed if you want the change applied to the same dataframe; without it, drop returns a new dataframe and leaves the original unchanged.

flight_data_copy.drop(['TailNum', 'OriginStateFips', 
'DestStateFips', 'Diverted'], axis=1, inplace=True)

Source: Python Pandas - Deleting multiple series from a data frame in one command
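For comparison, a sketch of the non-inplace form on a small stand-in frame. The data is invented, and only some of the column names match the ones above. It also uses the columns= keyword and errors="ignore", both of which are standard parameters of DataFrame.drop.

```python
import pandas as pd

# Hypothetical stand-in for flight_data_copy.
flight_data_copy = pd.DataFrame({
    "TailNum": ["N1", "N2"],
    "Diverted": [0, 1],
    "Distance": [100, 200],
})

# Without inplace=True, drop leaves the original untouched and returns a new frame.
trimmed = flight_data_copy.drop(columns=["TailNum", "Diverted"])

# errors="ignore" skips labels that are not present instead of raising KeyError.
trimmed_again = trimmed.drop(columns=["TailNum"], errors="ignore")

print(list(flight_data_copy.columns))
print(list(trimmed.columns))
```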

Delete a column from a Pandas DataFrame

As you've guessed, the right syntax is

del df['column_name']

It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
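A brief sketch of the item-style deletion on an invented frame. DataFrame.pop is included as a related alternative that is not mentioned above: it also removes the column in place, but returns it.

```python
import pandas as pd

df = pd.DataFrame({"column_name": [1, 2], "other": [3, 4]})

# del with item syntax removes the column in place.
del df["column_name"]

# pop also removes in place, and returns the removed column as a Series.
removed = df.pop("other")

print(list(df.columns))
print(removed.tolist())
```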

Drop columns whose name contains a specific string from pandas DataFrame

import pandas as pd
import numpy as np

array = np.random.random((2, 4))
df = pd.DataFrame(array, columns=('Test1', 'toto', 'test2', 'riri'))
print(df)

      Test1      toto     test2      riri
0  0.923249  0.572528  0.845464  0.144891
1  0.020438  0.332540  0.144455  0.741412

# keep only the columns whose name does not start with 'test' (case-insensitive)
cols = [c for c in df.columns if c.lower()[:4] != 'test']
df = df[cols]
print(df)

       toto      riri
0  0.572528  0.144891
1  0.332540  0.741412
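The same filter can be sketched with the vectorized string methods on the column index instead of a list comprehension. This is an alternative formulation, not the answer's own code, and the values are fixed here rather than random.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["Test1", "toto", "test2", "riri"])

# Boolean mask: True for columns whose lower-cased name starts with "test".
is_test = df.columns.str.lower().str.startswith("test")

# Keep every column that is not flagged.
df = df.loc[:, ~is_test]
print(list(df.columns))
```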

