Delete a column from a Pandas DataFrame
As you've guessed, the right syntax is
del df['column_name']
It's difficult to make del df.column_name work, simply as a result of syntactic limitations in Python: del df[name] gets translated to df.__delitem__(name) under the covers by Python.
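To make that translation concrete, here is a small sketch using a hypothetical ToyFrame class (not pandas code) that logs whenever __delitem__ runs. It shows that del obj[key] is routed to __delitem__, while del obj.name takes a different route (__delattr__) and never reaches __delitem__ at all.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: columns stored in a dict."""

    def __init__(self):
        self.columns = {"a": [1, 2], "b": [3, 4]}
        self.log = []

    def __delitem__(self, key):
        # Python calls this for the statement: del obj[key]
        self.log.append(f"__delitem__({key!r})")
        del self.columns[key]


fr = ToyFrame()
del fr["a"]          # Python rewrites this as fr.__delitem__("a")
print(fr.log)        # ["__delitem__('a')"]
print(fr.columns)    # {'b': [3, 4]}

try:
    del fr.b         # attribute deletion goes through __delattr__, not __delitem__
except AttributeError:
    print("del fr.b -> AttributeError; __delitem__ was never called")
print(fr.log)        # still ["__delitem__('a')"]
```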
What is the best way to remove columns in pandas?
Following the docs:
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
And pandas.DataFrame.drop:
Drop specified labels from rows or columns.
So, I think we should stick with df.drop. Why? I think the pros are:
First, it gives us more control over the removal:
# This returns a NEW DataFrame object and leaves the original `df` untouched.
df.drop('a', axis=1)
# This modifies `df` in place and returns None.
df.drop('a', axis=1, inplace=True)
Second, it can handle more complicated cases through its arguments. For example, with level we can handle MultiIndex deletion, and with errors we can prevent some bugs.
Third, it's a more unified and object-oriented way.
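A minimal sketch of the level and errors arguments mentioned above, on a made-up frame (the column names and the "not_there" typo are illustrative assumptions). errors="ignore" skips missing labels instead of raising, and level=1 drops every column whose second-level label matches.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# errors="ignore": labels that don't exist are skipped instead of raising KeyError
out = df.drop(["a", "not_there"], axis=1, errors="ignore")
print(out.columns.tolist())  # ['b', 'c']

# With the default errors="raise", the same typo is reported immediately
try:
    df.drop(["not_there"], axis=1)
except KeyError as exc:
    print("KeyError:", exc)

# level=: match labels at one level of a MultiIndex and drop every hit
cols = pd.MultiIndex.from_tuples(
    [("x", "mean"), ("x", "std"), ("y", "mean"), ("y", "std")]
)
mdf = pd.DataFrame([[1, 2, 3, 4]], columns=cols)
trimmed = mdf.drop("std", axis=1, level=1)
print(trimmed.columns.tolist())  # [('x', 'mean'), ('y', 'mean')]
```

Whether silently ignoring missing labels is desirable depends on the situation; the default errors="raise" is usually the safer choice, since it surfaces typos in column names.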
And, as @jezrael noted in his answer:
Option 1: using the keyword del is a limited way.
Option 3: df = df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
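The difference between rebinding and deleting can be seen by keeping a second name for the same DataFrame. This is a small sketch with an invented three-column frame: after df = df[['b','c']] the old object still has column 'a', whereas del changes the object that every alias points to.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
original = df                      # a second name for the same object

df = df[["b", "c"]]                # builds a NEW DataFrame, then rebinds the name df
print(df is original)              # False
print(original.columns.tolist())   # ['a', 'b', 'c']  (the old object is untouched)

alias = original
del alias["a"]                     # del mutates the object itself...
print(original.columns.tolist())   # ['b', 'c']  (...so every name bound to it sees it)
```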
How to delete a column from a data frame with pandas?
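To actually delete the column, del df['id'] or df.drop('id', axis=1) should have worked, provided the passed column name matches exactly. (Older code often wrote df.drop('id', 1), but recent pandas versions require axis to be passed as a keyword.)
However, if you don't need to delete the column, you can just select the column of interest like so: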
In [54]:
df['text']
Out[54]:
0 text1
1 text2
2 textn
Name: text, dtype: object
If you never wanted it in the first place, you can pass a list of columns to read_csv via the usecols parameter:
In [53]:
import io
import pandas as pd
temp="""id text
363.327 text1
366.356 text2
37782 textn"""
df = pd.read_csv(io.StringIO(temp), delimiter=r'\s+', usecols=['text'])
df
Out[53]:
text
0 text1
1 text2
2 textn
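If listing the columns to keep is inconvenient, usecols also accepts a callable that is evaluated against each column name; only names for which it returns True are loaded. A sketch on the same sample data, excluding 'id' rather than enumerating the columns to keep:

```python
import io
import pandas as pd

temp = """id text
363.327 text1
366.356 text2
37782 textn"""

# The lambda is called once per column name; 'id' is the only name rejected
df = pd.read_csv(io.StringIO(temp), sep=r"\s+", usecols=lambda name: name != "id")
print(df.columns.tolist())  # ['text']
```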
Regarding your error: it's because 'id' is not in your columns, either because it's spelt differently or because it has whitespace around it. To check this, look at the output of print(df.columns.tolist()); this will output a list of the columns and show whether any of them have leading/trailing whitespace.
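Here is a sketch of that diagnosis on a hypothetical CSV whose header has a stray space after "id" (the data is invented for illustration). Stripping the column names with df.columns.str.strip() is one common fix, not the only one.

```python
import io
import pandas as pd

# Hypothetical CSV whose header has a trailing space after "id"
temp = "id ,text\n1,text1\n2,text2\n"
df = pd.read_csv(io.StringIO(temp))
raw_cols = df.columns.tolist()
print(raw_cols)              # ['id ', 'text']  -> the trailing space is visible

try:
    del df["id"]
except KeyError:
    print("del df['id'] -> KeyError, because the real label is 'id '")

# Strip whitespace from every column name, then the drop works as expected
df.columns = df.columns.str.strip()
df = df.drop("id", axis=1)
print(df.columns.tolist())   # ['text']
```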
Removing a column permanently from a data frame in Python
You have to assign the result back to mydf if you want the change to be permanent, i.e. do
mydf = mydf.drop('Z', axis=1)
instead.
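A short sketch of why the assignment matters, using an invented mydf with columns X, Y and Z: calling drop on its own returns a new frame that is immediately discarded, so mydf only changes once the name is rebound.

```python
import pandas as pd

mydf = pd.DataFrame({"X": [1], "Y": [2], "Z": [3]})

mydf.drop("Z", axis=1)          # returns a new frame that is immediately discarded
print(mydf.columns.tolist())    # ['X', 'Y', 'Z']  (unchanged)

mydf = mydf.drop("Z", axis=1)   # rebind the name to the returned frame
print(mydf.columns.tolist())    # ['X', 'Y']
```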
Remove an entire column from a data.frame in R
You can set it to NULL.
> Data$genome <- NULL
> head(Data)
chr region
1 chr1 CDS
2 chr1 exon
3 chr1 CDS
4 chr1 exon
5 chr1 CDS
6 chr1 exon
As pointed out in the comments, here are some other possibilities:
Data[2] <- NULL # Wojciech Sobala
Data[[2]] <- NULL # same as above
Data <- Data[,-2] # Ian Fellows
Data <- Data[-2] # same as above
You can remove multiple columns via:
Data[1:2] <- list(NULL) # Marek
Data[1:2] <- NULL # does not work!
Be careful with matrix-subsetting though, as you can end up with a vector:
Data <- Data[,-(2:3)] # vector
Data <- Data[,-(2:3),drop=FALSE] # still a data.frame
How to remove a list of columns from pydatatable dataframe?
Removing columns (or rows) from a Frame is easy: take any syntax that you would normally use to select those columns, and then prepend Python's del keyword.
Thus, if you want to delete columns 'id', 'country', and 'egg', run
>>> del comidas_gen_dt[:, ['id','country','egg']]
>>> comidas_gen_dt
| veg fork beef
-- + --- ---- ----
0 | 30 5 90
1 | 40 10 50
2 | 10 2 20
3 | 3 1 NA
4 | 5 9 4
[5 rows x 3 columns]
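At the Python level, "prepend del to the selection syntax" works because obj[:, cols] and del obj[:, cols] hand the object exactly the same key, a tuple of a slice and the column list. The sketch below uses a hypothetical KeyEcho class (not part of datatable) that simply records what it receives; how datatable interprets that key is up to its own implementation.

```python
class KeyEcho:
    """Hypothetical object that records the key Python passes to it."""

    def __init__(self):
        self.seen = []

    def __getitem__(self, key):
        # Called for the selection: obj[:, [...]]
        self.seen.append(("get", key))

    def __delitem__(self, key):
        # Called for the deletion: del obj[:, [...]]
        self.seen.append(("del", key))


obj = KeyEcho()
obj[:, ["id", "egg"]]        # selection syntax
del obj[:, ["id", "egg"]]    # same syntax with del prepended
print(obj.seen)
# [('get', (slice(None, None, None), ['id', 'egg'])), ('del', (slice(None, None, None), ['id', 'egg']))]
```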
If you want to keep the original frame unmodified and select a new frame with some of the columns removed, the easiest way is to copy the frame first and then use the del operation:
>>> DT = comidas_gen_dt.copy()
>>> del DT[:, columns_to_remove]
(Note that .copy() makes a shallow copy, i.e. its cost is typically negligible.)
You can also use the f[:].remove() approach. It's a bit strange that it didn't work the way you've written it, but going from a list of strings to a list of f-symbols is quite straightforward:
from datatable import f

def pydt_remove_cols(DT, *rmcols):
    # Build one f-symbol per column name, then remove them from the full selection f[:]
    return DT[:, f[:].remove([f[col] for col in rmcols])]
Here I use the fact that f.A is the same as f["A"], where the inner string "A" might as well be replaced with any variable.