How to Remove Columns from a Data.Frame

Delete a column from a Pandas DataFrame

As you've guessed, the right syntax is

del df['column_name']

It's difficult to make del df.column_name work, simply because of syntactic limitations in Python. Under the covers, Python translates del df[name] into df.__delitem__(name).
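A quick way to see the translation described above: on a small hypothetical frame, the statement form and the explicit dunder call should both remove a column.

```python
import pandas as pd

# Hypothetical example frame with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

del df["a"]            # statement form
df.__delitem__("b")    # what `del df["b"]` turns into under the covers

print(df.columns.tolist())  # ['c']
```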

What is the best way to remove columns in pandas

According to the docs:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

And pandas.DataFrame.drop:

Drop specified labels from rows or columns.

So, I think we should stick with df.drop. Why? I think the pros are:

  1. It gives us more control of the remove action:

    # This returns a NEW DataFrame object and leaves the original `df` untouched.
    df.drop('a', axis=1)
    # This modifies `df` in place and returns None.
    df.drop('a', axis=1, inplace=True)
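  2. It can handle more complicated cases through its arguments. E.g. with level, we can drop labels from one level of a MultiIndex; with errors, we choose whether a missing label raises a KeyError or is silently ignored, which helps prevent some bugs.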

  3. It's a more unified and object-oriented way.
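A minimal sketch of the drop options listed above, using small hypothetical frames. The MultiIndex part assumes the label to remove lives on the second column level (level=1).

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new DataFrame; `df` still has all three columns afterwards.
smaller = df.drop("a", axis=1)

# errors="ignore" skips labels that don't exist instead of raising KeyError.
safe = df.drop(["a", "not_a_column"], axis=1, errors="ignore")

# inplace=True modifies `df` itself and the call returns None.
result = df.drop("a", axis=1, inplace=True)

# MultiIndex columns: `level` selects which level the label is matched on.
cols = pd.MultiIndex.from_tuples([("x", "one"), ("x", "two"), ("y", "one")])
mdf = pd.DataFrame([[1, 2, 3]], columns=cols)
no_one = mdf.drop("one", axis=1, level=1)  # drops ("x", "one") and ("y", "one")
```

Assigning the result of the inplace call (result) is a common mistake: it is None, not a DataFrame.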


And just like @jezrael noted in his answer:

Option 1: Using the keyword del is a limited approach.

Option 3: And df=df[['b','c']] isn't really a deletion at all. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
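To see that Option 3 only rebinds a name, here is a small sketch with a hypothetical frame and a second name (alias) pointing at the same object: after the selection, alias still sees the original columns.

```python
import pandas as pd

# Hypothetical example frame, plus a second name bound to the same object.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
alias = df

# Selection builds a NEW DataFrame; only the name `df` is rebound to it.
df = df[["b", "c"]]

print(alias.columns.tolist())  # ['a', 'b', 'c'], the original object is untouched
print(df is alias)             # False
```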

How to delete a column from a data frame with pandas?

To actually delete the column

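del df['id'] or df.drop('id', axis=1) should have worked, provided the name you pass matches the column name exactly. (Recent pandas versions no longer accept the axis positionally, as in df.drop('id', 1), so pass axis=1 as a keyword.)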

However, if you don't need to delete the column, you can simply select the column of interest like so:

In [54]:

df['text']
Out[54]:
0 text1
1 text2
2 textn
Name: text, dtype: object

If you never wanted the column in the first place, you can pass a list of columns to read_csv via its usecols parameter:

In [53]:
import io
import pandas as pd
temp="""id text
363.327 text1
366.356 text2
37782 textn"""
df = pd.read_csv(io.StringIO(temp), delimiter=r'\s+', usecols=['text'])
df
Out[53]:
text
0 text1
1 text2
2 textn

As for your error, it happens because 'id' is not among your columns: it is spelt differently or has extra whitespace. To check, look at the output of print(df.columns.tolist()); it prints the list of column names and reveals any leading or trailing whitespace.
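A minimal sketch of the whitespace case, assuming hypothetical CSV data whose header has a stray space before "id". Stripping the column names first makes the exact-match drop work.

```python
import io

import pandas as pd

# Hypothetical data: note the space before "id" in the header.
temp = " id,text\n1,text1\n2,text2\n"
df = pd.read_csv(io.StringIO(temp))

print(df.columns.tolist())  # [' id', 'text'], the leading space is visible

# Remove leading/trailing whitespace from every column name, then drop.
df.columns = df.columns.str.strip()
df = df.drop("id", axis=1)
```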

Removing a column permanently from a data frame in Python

You have to assign the result back to mydf if you want the change to be permanent, i.e. do

mydf = mydf.drop('Z', axis=1)

instead.
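A short sketch of the difference, using a hypothetical mydf with columns X, Y and Z: calling drop without assigning the result leaves mydf as it was.

```python
import pandas as pd

# Hypothetical frame with columns X, Y and Z.
mydf = pd.DataFrame({"X": [1, 2], "Y": [3, 4], "Z": [5, 6]})

mydf.drop("Z", axis=1)             # returns a new frame that is discarded
still_there = "Z" in mydf.columns  # True: mydf itself was not changed

mydf = mydf.drop("Z", axis=1)      # rebinding mydf makes the removal stick
```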

Remove an entire column from a data.frame in R

You can set it to NULL.

> Data$genome <- NULL
> head(Data)
chr region
1 chr1 CDS
2 chr1 exon
3 chr1 CDS
4 chr1 exon
5 chr1 CDS
6 chr1 exon

As pointed out in the comments, here are some other possibilities:

Data[2] <- NULL    # Wojciech Sobala
Data[[2]] <- NULL # same as above
Data <- Data[,-2] # Ian Fellows
Data <- Data[-2] # same as above

You can remove multiple columns via:

Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL # does not work!

Be careful with matrix-subsetting though, as you can end up with a vector:

Data <- Data[,-(2:3)]             # vector
Data <- Data[,-(2:3),drop=FALSE] # still a data.frame

How to remove a list of columns from pydatatable dataframe?

Removing columns (or rows) from a Frame is easy: take any syntax that you would normally use to select those columns, and put the Python del keyword in front of it.

Thus, if you want to delete columns 'id', 'country', and 'egg', run

>>> del comidas_gen_dt[:, ['id','country','egg']]
>>> comidas_gen_dt
   |  veg  fork  beef
-- + ----  ----  ----
 0 |   30     5    90
 1 |   40    10    50
 2 |   10     2    20
 3 |    3     1    NA
 4 |    5     9     4

[5 rows x 3 columns]

If you want to keep the original frame unmodified and get a new frame with some of the columns removed, the easiest way is to copy the frame first and then apply del to the copy:

>>> columns_to_remove = ['id', 'country', 'egg']
>>> DT = comidas_gen_dt.copy()
>>> del DT[:, columns_to_remove]

(Note that .copy() makes a shallow copy, so its cost is typically negligible.)

You can also use the f[:].remove() approach. It's a bit strange that it didn't work the way you've written it, but going from a list of strings to a list of f-symbols is quite straightforward:

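from datatable import f

def pydt_remove_cols(DT, *rmcols):
    # Turn each column name into an f-symbol and remove them from the full selector f[:]
    return DT[:, f[:].remove([f[col] for col in rmcols])]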

Here I use the fact that f.A is the same as f["A"], where the inner string "A" might as well be replaced with any variable.