Delete a column from a Pandas DataFrame
As you've guessed, the right syntax is
del df['column_name']
It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
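To make the translation concrete, here is a minimal sketch. The frame and its column names a, b, c are made up for illustration; it only shows that the del statement and an explicit __delitem__ call have the same effect.

```python
import pandas as pd

# Hypothetical example frame; column names are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["a"]          # statement form
df.__delitem__("b")  # the method call Python makes for the statement form

remaining = list(df.columns)
print(remaining)
```

Both calls modify df in place, so only column c is left afterwards.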
Drop data frame columns by name
There's also the subset command, useful if you know which columns you want:
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))
UPDATED after comment by @hadley: To drop columns a,c you could do:
df <- subset(df, select = -c(a, c))
How to delete column name
By default, pandas needs column names.
If you really want to 'remove' them, which is strongly discouraged because it leaves you with duplicated column names, you can assign empty strings:
df.columns = [''] * len(df.columns)
But if you need to write df to a file without column names and index, add the parameters header=False and index=False to to_csv or to_excel.
df.to_csv('file.csv', header=False, index=False)
df.to_excel('file.xlsx', header=False, index=False)
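As a rough check of what header=False and index=False do, the sketch below calls to_csv without a path so that it returns the text instead of writing a file. The small frame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path argument, to_csv returns the CSV text as a string.
with_labels = df.to_csv()
without_labels = df.to_csv(header=False, index=False)

print(with_labels)
print(without_labels)
```

The second string contains only the data rows, while df itself still has its column names.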
How to drop columns by name in a data frame
You should use either indexing or the subset function. For example:
R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
R> df
  x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
Then you can use the which function and the - operator in column indexing:
R> df[ , -which(names(df) %in% c("z","u"))]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
Or, much simpler, use the select argument of the subset function: you can then use the - operator directly on a vector of column names, and you can even omit the quotes around the names!
R> subset(df, select=-c(z,u))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
Note that you can also select the columns you want to keep instead of dropping the others:
R> df[ , c("x","y")]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
R> subset(df, select=c(x,y))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
pandas dataframe drop column name None
Since None coincides with the default values of the arguments to DataFrame.drop, confusion arises and no drop happens.
A remedy is to supply a list with 1 element:
df = df.drop([None], axis=1)
or equivalently,
df = df.drop(columns=[None])
What is the best way to remove columns in pandas
Follow the docs:
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
And pandas.DataFrame.drop:
Drop specified labels from rows or columns.
So, I think we should stick with df.drop. Why? I think the pros are:
It gives us more control of the remove action:
# This will return a NEW DataFrame object, leave the original `df` untouched.
df.drop('a', axis=1)
# This will modify the `df` inplace. **And return a `None`**.
df.drop('a', axis=1, inplace=True)
It can handle more complicated cases with its args. E.g. with level, we can handle MultiIndex deletion. And with errors, we can prevent some bugs.
It's a more unified and object-oriented way.
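The points about level and errors can be sketched as follows. The frames and labels here are invented for illustration; only DataFrame.drop and its axis, columns, level, errors and inplace parameters come from pandas itself.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new DataFrame; df keeps all three columns.
smaller = df.drop("a", axis=1)

# inplace=True modifies the object it is called on and returns None.
scratch = df.copy()
result = scratch.drop("a", axis=1, inplace=True)

# errors="ignore" skips labels that are not present instead of raising KeyError.
safe = df.drop(columns=["a", "missing"], errors="ignore")

# With MultiIndex columns, level chooses which level the label is matched against.
wide = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "a")]),
)
no_a = wide.drop(columns="a", level=1)

print(list(smaller.columns), result, list(safe.columns), list(no_a.columns))
```

Here level=1 removes every column whose second-level label is a, regardless of its first-level label.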
And just like @jezrael noted in his answer:
Option 1: Using the keyword del is a limited way.
Option 3: And df=df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
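A small sketch of that rebinding, assuming a made-up frame and a second name, original, that keeps pointing at the first object:

```python
import pandas as pd

# Hypothetical example frame; `original` is a second name for the same object.
original = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
df = original

# Indexing builds a new DataFrame, and the assignment rebinds the name df to it.
df = df[["b", "c"]]

print(list(df.columns))
print(list(original.columns))
print(df is original)
```

The first object still has all of its columns; only the name df now refers to something else.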
Deleting multiple columns based on column names in Pandas
I don't know what you mean by inefficient, but if you mean in terms of typing, it could be easier to just select the columns of interest and assign back to the df:
df = df[cols_of_interest]
Where cols_of_interest is a list of the columns you care about.
Or you can slice the columns and pass this to drop:
df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)
The call to head just selects 0 rows, as we're only interested in the column names rather than the data.
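As a rough illustration of the slicing approach on a smaller frame, the sketch below uses label-based slicing with loc. The column names are made up and stand in for the longer 'Unnamed: 24' to 'Unnamed: 60' range.

```python
import pandas as pd

# Hypothetical example frame with two unwanted columns in the middle.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["a", "Unnamed: 1", "Unnamed: 2", "foo"],
)

# A label slice is inclusive on both ends; head(0) keeps the labels and no rows.
to_drop = df.loc[:, "Unnamed: 1":"Unnamed: 2"].head(0).columns
trimmed = df.drop(to_drop, axis=1)

print(list(trimmed.columns))
```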
update
Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:
In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df
Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []
In [4]:
~df.columns.str.contains('Unnamed:')
Out[4]:
array([ True, False, False, True], dtype=bool)
In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]
Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []
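The same mask can be applied to a frame that actually holds data. This is a sketch with invented values; passing the mask to loc is one possible way to apply it, alongside the df[df.columns[...]] form shown above.

```python
import pandas as pd

# Hypothetical example frame with one row of data.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["a", "Unnamed: 1", "Unnamed: 2", "foo"],
)

# True for every column whose name does not contain the substring.
keep = ~df.columns.str.contains("Unnamed:")

# A boolean mask on the column axis selects the matching columns.
cleaned = df.loc[:, keep]

print(list(cleaned.columns))
```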