Drop Data Frame Columns by Name

Delete a column from a Pandas DataFrame

As you've guessed, the right syntax is

del df['column_name']

It's difficult to make del df.column_name work, simply because of syntactic limitations in Python. Python translates del df[name] to df.__delitem__(name) under the covers.
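The difference between the two forms can be seen in a small sketch. The DataFrame and its column names here are made up for illustration, and the attribute-deletion behaviour is what I would expect from current pandas versions rather than a documented guarantee:

```python
import pandas as pd

# Hypothetical example data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Item deletion: Python turns this into df.__delitem__("a").
del df["a"]

# The explicit call does the same thing for column "b".
df.__delitem__("b")

# Attribute deletion is routed through __delattr__ instead, which is not
# wired up to column removal, so the column is expected to survive.
attr_delete_failed = False
try:
    del df.c
except AttributeError:
    attr_delete_failed = True

print(list(df.columns))
```

Only the bracket form reaches __delitem__, which is why it is the one that removes the column.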

Drop data frame columns by name

There's also the subset command, useful if you know which columns you want:

df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))

UPDATED after comment by @hadley: To drop columns a,c you could do:

df <- subset(df, select = -c(a, c))

How to delete column name

By default, pandas requires column names.

If you really want to 'remove' them, you can assign empty strings, but this is strongly discouraged because it produces duplicated column names:

df.columns = [''] * len(df.columns)

If you need to write df to a file without column names and index, pass header=False and index=False to to_csv or to_excel:

df.to_csv('file.csv', header=False, index=False)

df.to_excel('file.xlsx', header=False, index=False)
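As a rough illustration of both options, the sketch below writes to an in-memory buffer instead of 'file.csv' so that the result can be inspected. The DataFrame contents and the variable names (buffer, lines, blank) are assumptions made for the example:

```python
import io

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Write without the header row and without the index column.
buffer = io.StringIO()
df.to_csv(buffer, header=False, index=False)
lines = buffer.getvalue().splitlines()
print(lines)

# The discouraged alternative: blank out the names on a copy.
# Every column is now called "", so the labels are duplicated.
blank = df.copy()
blank.columns = [""] * len(blank.columns)
print(list(blank.columns))
```

Writing with header=False leaves the DataFrame itself untouched, which is why it is the safer of the two.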

How to drop columns by name in a data frame

You should use either indexing or the subset function. For example:

R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
R> df
  x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8

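Then you can use the which function and the - operator when indexing columns (note that if none of the names match, which returns an empty vector and this expression drops every column):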

R> df[ , -which(names(df) %in% c("z","u"))]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

Or, much simpler, use the select argument of the subset function: you can then use the - operator directly on a vector of column names, and you can even omit the quotes around the names!

R> subset(df, select=-c(z,u))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

Note that you can also select the columns you want instead of dropping the others:

R> df[ , c("x","y")]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

R> subset(df, select=c(x,y))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

pandas dataframe drop column name None

Because None is also the default value of the labels, index, and columns arguments of DataFrame.drop, passing a bare None is indistinguishable from omitting the argument, so the column is not dropped.

A remedy is to supply a list with 1 element:

df = df.drop([None], axis=1)

or equivalently,

df = df.drop(columns=[None])

What is the best way to remove columns in pandas

According to the documentation:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

And pandas.DataFrame.drop:

Drop specified labels from rows or columns.

So, I think we should stick with df.drop. Why? I think the pros are:

  1. It gives us more control over the removal:

    # Returns a NEW DataFrame object and leaves the original `df` untouched.
    df.drop('a', axis=1)
    # Modifies `df` in place **and returns `None`**.
    df.drop('a', axis=1, inplace=True)
  2. It can handle more complicated cases with its arguments. E.g. with level we can handle MultiIndex deletion, and with errors we can prevent some bugs.

  3. It's a more unified and object-oriented way.
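The points above can be sketched roughly as follows. The DataFrames, the "missing" label and the variable names are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new DataFrame; `df` still has column "a" afterwards.
dropped = df.drop("a", axis=1)

# errors="ignore" skips labels that are not present instead of raising KeyError.
safe = df.drop(columns=["a", "missing"], errors="ignore")

# inplace=True mutates `df` itself.
df.drop("a", axis=1, inplace=True)

# level= targets one level of a MultiIndex on the columns.
wide = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("x", "p"), ("x", "q"), ("y", "p")]),
)
without_q = wide.drop(columns="q", level=1)

print(list(dropped.columns), list(safe.columns), list(df.columns))
print(list(without_q.columns))
```

Assigning the result back (df = df.drop(...)) and using inplace=True lead to the same columns. The non-inplace form is usually easier to chain with other calls.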


And just like @jezrael noted in his answer:

Option 1: Using the del keyword is limited.

Option 3: df=df[['b','c']] isn't really a deletion. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).

Deleting multiple columns based on column names in Pandas

I don't know what you mean by inefficient, but if you mean in terms of typing, it could be easier to select just the columns of interest and assign them back to df:

df = df[cols_of_interest]

Where cols_of_interest is a list of the columns you care about.

Or you can slice the columns and pass this to drop:

df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)

(.loc is used here because the older .ix indexer has been removed from pandas.) The call to head just selects 0 rows, as we're only interested in the column names rather than the data.
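Both approaches can be tried on a small frame. This is only a sketch: the column names below are invented stand-ins for the 'Unnamed: 24' to 'Unnamed: 60' range, and the label slice assumes the column labels are unique:

```python
import pandas as pd

# Hypothetical frame with a run of unwanted columns in the middle.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["id", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3", "value"],
)

# Approach 1: keep only the columns of interest.
cols_of_interest = ["id", "value"]
selected = df[cols_of_interest]

# Approach 2: slice the unwanted labels (both endpoints are included)
# and hand them to drop.
to_drop = df.loc[:, "Unnamed: 1":"Unnamed: 3"].columns
dropped = df.drop(to_drop, axis=1)

print(list(selected.columns), list(dropped.columns))
```

Neither call modifies df, so the result has to be assigned back if the original name should point at the reduced frame.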

update

Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:

In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df

Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []

In [4]:
~df.columns.str.contains('Unnamed:')

Out[4]:
array([ True, False, False, True], dtype=bool)

In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]

Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []
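The same mask idea as a standalone script, assuming a frame that actually contains data. The column names and values are made up, and df.loc[:, keep] is used here as an equivalent way of applying the mask:

```python
import pandas as pd

# Hypothetical frame with two auto-generated "Unnamed" columns.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["a", "Unnamed: 1", "Unnamed: 2", "foo"],
)

# True for columns to keep, False for those whose name contains "Unnamed:".
keep = ~df.columns.str.contains("Unnamed:")

# Apply the boolean mask along the column axis.
cleaned = df.loc[:, keep]

print(list(cleaned.columns))
```

Because the mask is computed from the names alone, it works regardless of how many "Unnamed" columns there are or where they sit.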

