How to Remove Columns from a Data.Frame by Data Type

How to drop DataFrame columns based on dtype

You can use select_dtypes to exclude columns of a particular type.

import pandas as pd

df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': [1, 2, 3], 'z': ['d', 'e', 'f']})

df = df.select_dtypes(exclude=['object'])
print(df)

How to remove columns from a data.frame by data type?

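Assuming a generic data.frame, this will remove columns of type factor. Be aware that if there are no factor columns, which() returns an empty vector and the negative index then selects no columns at all: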

df[,-which(sapply(df, class) == "factor")]

EDIT

As per @Roland's suggestion, you can also just keep those which are not factor. Whichever you prefer.

df[, sapply(df, class) != "factor"]

EDIT 2

As you are concerned with the cor function, @Ista also points out that it would be safer in that particular instance to filter on is.numeric. The above are only to remove factor types.

df[,sapply(df, is.numeric)]
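
For readers doing the same thing in pandas, a rough analogue of the is.numeric filter is sketched below. The column names and values are made up for illustration; the point is only that filtering to numeric columns first keeps non-numeric data out of the correlation.

```python
import pandas as pd

# Hypothetical example data: one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
})

# Keep only numeric columns, similar in spirit to df[, sapply(df, is.numeric)] in R.
numeric_df = df.select_dtypes(include="number")

# The correlation matrix is now computed on numeric data only.
corr = numeric_df.corr()
print(corr)
```

Filtering by what you want to keep (numeric) rather than what you want to drop (factor, text) tends to be the more robust choice when other column types may show up later.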

Drop columns from a DataFrame based on their data types

Use the exclude parameter of DataFrame.select_dtypes:

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1.3, 3, 5, 7, 1, 0],
})

print(df.dtypes)
A     object
B      int64
C      int64
D    float64
dtype: object

print(df.select_dtypes(exclude='int64'))
   A    D
0  a  1.3
1  b  3.0
2  c  5.0
3  d  7.0
4  e  1.0
5  f  0.0
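
As a small extension of the example above, exclude also accepts a list of dtypes, and the same result can be reached with drop by building the label list from df.dtypes. This is a sketch on the same sample frame; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "B": [4, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": [1.3, 3, 5, 7, 1, 0],
})

# Exclude several dtypes at once by passing a list.
no_numbers = df.select_dtypes(exclude=["int64", "float64"])

# Same idea with drop: collect the labels whose dtype matches, then drop them.
int_cols = df.columns[df.dtypes == "int64"]
without_ints = df.drop(columns=int_cols)

print(no_numbers.columns.tolist())
print(without_ints.columns.tolist())
```

select_dtypes is usually the shorter option; the drop variant is mainly useful when the dtype test is only one of several conditions used to pick columns.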

Remove columns from data.frame that are of type list

If you need to use logical indexing:

df[, !purrr::map_lgl(df, is.list)] %>%
  names()
[1] "CATEGORY" "BIBTEXKEY" "ADDRESS" "ANNOTE" "BOOKTITLE"
[6] "CHAPTER" "CROSSREF" "EDITION" "HOWPUBLISHED" "INSTITUTION"
[11] "JOURNAL" "KEY" "MONTH" "NOTE" "NUMBER"
[16] "ORGANIZATION" "PAGES" "PUBLISHER" "SCHOOL" "SERIES"
[21] "TITLE" "TYPE" "VOLUME" "YEAR" "ISSN"
[26] "DOI" "ISBN" "URL"

You can also do df %>% select_if(Negate(is.list))

Also, as mentioned by @akrun, you can simply use discard from purrr:

purrr::discard(dat, is.list) 

Or as @markus points out, we can use keep and negate:

keep(dat, negate(is.list))

Otherwise:

We can unnest:

library(tidyverse)
df %>%
  unnest(AUTHOR) %>%
  select(-AUTHOR)

What is the best way to remove columns in pandas

Follow the doc:

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.

And pandas.DataFrame.drop:

Drop specified labels from rows or columns.

So, I think we should stick with df.drop. Why? I think the pros are:

  1. It gives us more control over the removal:

    # This returns a NEW DataFrame object and leaves the original `df` untouched.
    df.drop('a', axis=1)
    # This modifies `df` in place **and returns `None`**.
    df.drop('a', axis=1, inplace=True)
  2. It can handle more complicated cases through its arguments. E.g. with level, we can handle MultiIndex deletion, and with errors, we can prevent some bugs.

  3. It's a more unified and object-oriented way.


And just like @jezrael noted in his answer:

Option 1: Using the keyword del is a limited way.

Option 3: And df=df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
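
The points about df.drop can be sketched as follows. The frames and labels here are invented for illustration; only the drop parameters themselves (columns, errors, inplace, level) come from pandas.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Default behaviour: a new DataFrame is returned, the original is untouched.
smaller = df.drop(columns=["a"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
safe = df.drop(columns=["a", "not_there"], errors="ignore")

# inplace=True mutates df and returns None, so the result should not be assigned back.
result = df.drop(columns=["b"], inplace=True)

# level= tells drop which level of a MultiIndex the label refers to.
wide = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("x", "one"), ("x", "two"), ("y", "one")]),
)
no_one = wide.drop(columns="one", level=1)

print(smaller.columns.tolist(), safe.columns.tolist(), df.columns.tolist())
print(no_one.columns.tolist())
```

A common mistake is writing df = df.drop(..., inplace=True), which rebinds df to None; the sketch keeps the two styles separate for that reason.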

Drop data frame columns by name

There's also the subset command, useful if you know which columns you want:

df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))

UPDATED after comment by @hadley: To drop columns a,c you could do:

df <- subset(df, select = -c(a, c))

How do you remove columns from a data.frame?

I use data.table's := operator to delete columns instantly regardless of the size of the table.

DT[, coltodelete := NULL]

or

DT[, c("col1","col20") := NULL]

or

DT[, (125:135) := NULL]

or

DT[, (variableHoldingNamesOrNumbers) := NULL]

Any solution using <- or subset will copy the whole table. data.table's := operator merely modifies the internal vector of pointers to the columns, in place. That operation is therefore (almost) instant.

How do I delete columns in R data frame

We can use setdiff to get all the columns except the 'year' and 'category'.

df1 <- df[setdiff(colnames(df), c('year', 'category'))]
df1
# vin make model
#1 1 A D
#2 2 B E
#3 3 C F

Including the comments from @Frank and @Ben Bolker.

We can assign the columns to NULL

df$year <- NULL
df$category <- NULL

Or use subset from base R or select from dplyr

subset(df, select = -c(year, category))
library(dplyr)
select(df, -year, -category)

data

df <- data.frame(vin = 1:3, make = LETTERS[1:3], model = LETTERS[4:6],
                 year = 1991:1993, category = paste0('GR', 1:3))

Delete a column from a Pandas DataFrame

As you've guessed, the right syntax is

del df['column_name']

It's difficult to make del df.column_name work because of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python, whereas del df.column_name goes through attribute deletion instead.
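
The difference between the two del forms can be seen with a toy class. Frame below is a hypothetical stand-in, not pandas code; it only records which special method Python calls for each form.

```python
class Frame:
    """Toy stand-in for a DataFrame (illustrative only, not pandas)."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self.calls = []

    def __delitem__(self, key):
        # Called for: del frame[key]
        self.calls.append(("__delitem__", key))
        del self._columns[key]

    def __delattr__(self, name):
        # Called for: del frame.name
        self.calls.append(("__delattr__", name))
        object.__delattr__(self, name)


frame = Frame({"a": 1, "b": 2})

del frame["a"]  # item deletion reaches __delitem__ and removes the column

try:
    del frame.b  # attribute deletion reaches __delattr__, which knows nothing about columns
except AttributeError:
    print("no attribute named 'b' to delete")

print(frame.calls)
```

To support del frame.b, the class would have to map attribute deletion onto its columns inside __delattr__, which is the extra work the answer above refers to.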

How to delete a column from a data frame with pandas?

To actually delete the column

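del df['id'] or df.drop('id', axis=1) should have worked if the passed column name matches exactly. (Pass axis by keyword: recent pandas versions no longer accept it as a second positional argument.)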

However, if you don't need to delete the column then you can just select the column of interest like so:

In [54]:

df['text']
Out[54]:
0 text1
1 text2
2 textn
Name: text, dtype: object

If you never wanted the column in the first place, you can pass the list of columns to keep to read_csv through the usecols parameter:

In [53]:
import io
temp="""id text
363.327 text1
366.356 text2
37782 textn"""
df = pd.read_csv(io.StringIO(temp), delimiter=r'\s+', usecols=['text'])
df
Out[53]:
text
0 text1
1 text2
2 textn
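
usecols also accepts a callable that is evaluated against each column name, which allows excluding a column at read time instead of listing every column to keep. A minimal sketch using the same sample text (the lambda is just one possible predicate):

```python
import io

import pandas as pd

temp = """id text
363.327 text1
366.356 text2
37782 textn"""

# Keep every column whose name is not 'id'; the callable receives each column name.
df = pd.read_csv(
    io.StringIO(temp),
    sep=r"\s+",
    usecols=lambda name: name != "id",
)
print(df.columns.tolist())
```

This is handy when the file has many columns and only one or two need to be skipped.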

Regarding your error: it occurs because 'id' is not in your columns, is spelt differently, or has surrounding whitespace. To check this, look at the output of print(df.columns.tolist()). It lists the column names and shows any leading or trailing whitespace.
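
The whitespace case can be reproduced and repaired as sketched below. The frame with a padded ' id ' column is a made-up example; stripping the labels first is one way to make the drop succeed.

```python
import pandas as pd

# Hypothetical frame whose 'id' column name carries stray whitespace.
df = pd.DataFrame({" id ": [1, 2], "text": ["x", "y"]})

# The list form shows the quotes, which makes the padding visible.
print(df.columns.tolist())

# Strip whitespace from every column label, then drop by the clean name.
df.columns = df.columns.str.strip()
df = df.drop(columns=["id"])
print(df.columns.tolist())
```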
