How to drop DataFrame columns based on dtype
You can use select_dtypes to exclude columns of a particular type.
import pandas as pd
df = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': [1, 2, 3], 'z': ['d', 'e', 'f']})
df = df.select_dtypes(exclude=['object'])
print(df)
How to remove columns from a data.frame by data type?
Assuming a generic data.frame, this will remove columns of type factor:
df[,-which(sapply(df, class) == "factor")]
EDIT
As per @Roland's suggestion, you can also just keep those which are not factor. Whichever you prefer.
df[, sapply(df, class) != "factor"]
EDIT 2
As you are concerned with the cor function, @Ista also points out that it would be safer in that particular instance to filter on is.numeric. The expressions above only remove factor types.
df[,sapply(df, is.numeric)]
Drop columns from a DataFrame based on their data types
Use the exclude parameter of DataFrame.select_dtypes:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1.3,3,5,7,1,0],
})
print (df.dtypes)
A object
B int64
C int64
D float64
dtype: object
print (df.select_dtypes(exclude='int64'))
A D
0 a 1.3
1 b 3.0
2 c 5.0
3 d 7.0
4 e 1.0
5 f 0.0
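As a small sketch of the same idea, select_dtypes also accepts generic selectors such as 'number', so numeric columns can be dropped or kept without naming each width. The frame and its column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame for illustration: one string column, two numeric ones.
df = pd.DataFrame({
    'A': list('abc'),
    'B': [4, 5, 4],
    'D': [1.3, 3.0, 5.0],
})

# 'number' matches every numeric dtype (integers and floats alike).
non_numeric = df.select_dtypes(exclude='number')
numeric_only = df.select_dtypes(include='number')

print(list(non_numeric.columns))
print(list(numeric_only.columns))
```

Both calls return new frames, so df itself still holds all three columns.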
Remove columns from data.frame that are of type list
If you need to use logical indexing:
df[,!purrr::map_lgl(df,is.list)] %>%
names()
[1] "CATEGORY" "BIBTEXKEY" "ADDRESS" "ANNOTE" "BOOKTITLE"
[6] "CHAPTER" "CROSSREF" "EDITION" "HOWPUBLISHED" "INSTITUTION"
[11] "JOURNAL" "KEY" "MONTH" "NOTE" "NUMBER"
[16] "ORGANIZATION" "PAGES" "PUBLISHER" "SCHOOL" "SERIES"
[21] "TITLE" "TYPE" "VOLUME" "YEAR" "ISSN"
[26] "DOI" "ISBN" "URL"
You can also do df %>% select_if(Negate(is.list))
Also, as mentioned by @akrun, you can simply use discard from purrr:
purrr::discard(dat, is.list)
Or, as @markus points out, we can use keep and negate:
keep(dat, negate(is.list))
Otherwise:
We can unnest:
library(tidyverse)
df %>%
unnest(AUTHOR) %>%
select(-AUTHOR)
What is the best way to remove columns in pandas
Follow the doc:
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
And pandas.DataFrame.drop:
Drop specified labels from rows or columns.
So, I think we should stick with df.drop. Why? I think the pros are:
It gives us more control of the remove action:
# This will return a NEW DataFrame object, leave the original `df` untouched.
df.drop('a', axis=1)
# This will modify `df` in place and return None.
df.drop('a', axis=1, inplace=True)
It can handle more complicated cases with its args. E.g. with level, we can handle MultiIndex deletion. And with errors, we can prevent some bugs.
It's a more unified and object-oriented way.
And just like @jezrael noted in his answer:
Option 1: Using the keyword del is a limited way.
Option 3: And df=df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
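A minimal sketch of the points above, assuming a small made-up frame with columns a, b and c. It shows the default copy-returning behaviour, the errors argument, and the None returned by an in-place drop.

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# By default drop returns a new frame; df keeps all three columns.
smaller = df.drop(columns=['a'])

# errors='ignore' skips labels that do not exist instead of raising KeyError.
safe = df.drop(columns=['a', 'missing'], errors='ignore')

# inplace=True mutates df and returns None.
result = df.drop(columns=['c'], inplace=True)

print(list(smaller.columns), list(safe.columns), result, list(df.columns))
```

Assigning the result of an in-place call back to df is a common slip, since it replaces the frame with None.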
Drop data frame columns by name
There's also the subset command, useful if you know which columns you want:
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))
UPDATED after comment by @hadley: To drop columns a,c you could do:
df <- subset(df, select = -c(a, c))
How do you remove columns from a data.frame?
I use data.table's := operator to delete columns instantly regardless of the size of the table.
DT[, coltodelete := NULL]
or
DT[, c("col1","col20") := NULL]
or
DT[, (125:135) := NULL]
or
DT[, (variableHoldingNamesOrNumbers) := NULL]
Any solution using <- or subset will copy the whole table. data.table's := operator merely modifies the internal vector of pointers to the columns, in place. That operation is therefore (almost) instant.
How do I delete columns in R data frame
We can use setdiff to get all the columns except 'year' and 'category'.
df1 <- df[setdiff(colnames(df), c('year', 'category'))]
df1
# vin make model
#1 1 A D
#2 2 B E
#3 3 C F
Including the comments from @Frank and @Ben Bolker.
We can assign the columns to NULL
df$year <- NULL
df$category <- NULL
Or use subset from base R or select from dplyr
subset(df, select=-c(year, category))
library(dplyr)
select(df, -year, -category)
data
df <- data.frame(vin=1:3, make=LETTERS[1:3], model=LETTERS[4:6],
year=1991:1993, category= paste0('GR', 1:3))
Delete a column from a Pandas DataFrame
As you've guessed, the right syntax is
del df['column_name']
It's difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
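The translation can be seen with a toy class. This is only a sketch: Table is a hypothetical stand-in written for this example, not pandas code.

```python
class Table:
    """Toy stand-in for a DataFrame (hypothetical, for illustration only)."""

    def __init__(self, **columns):
        self._columns = dict(columns)

    def __delitem__(self, name):
        # Python calls this method for `del table[name]`.
        del self._columns[name]

    def columns(self):
        return list(self._columns)


table = Table(id=[1, 2], text=['x', 'y'])
del table['id']          # equivalent to table.__delitem__('id')
print(table.columns())
```

By contrast, del table.id would be routed to __delattr__, a separate hook from item deletion.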
How to delete a column from a data frame with pandas?
To actually delete the column, either
del df['id']
or df.drop('id', axis=1)
should have worked if the passed column name matches exactly.
However, if you don't need to delete the column then you can just select the column of interest like so:
In [54]:
df['text']
Out[54]:
0 text1
1 text2
2 textn
Name: text, dtype: object
If you never wanted it in the first place, then you can pass a list of columns to read_csv as the usecols parameter:
In [53]:
import io
temp="""id text
363.327 text1
366.356 text2
37782 textn"""
df = pd.read_csv(io.StringIO(temp), delimiter=r'\s+', usecols=['text'])
df
Out[53]:
text
0 text1
1 text2
2 textn
Regarding your error, it's because 'id' is not in your columns, or it's spelt differently or has whitespace. To check this, look at the output from print(df.columns.tolist()); this will output a list of the columns and will show if you have any leading/trailing whitespace.
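A short sketch of that check, assuming a made-up frame whose 'id' header carries a trailing space. Stripping the labels first is one possible fix, not the only one.

```python
import pandas as pd

# Hypothetical frame whose first header has a trailing space.
df = pd.DataFrame({'id ': [1, 2], 'text': ['text1', 'text2']})

# The list repr makes the stray space visible.
print(df.columns.tolist())

# Strip surrounding whitespace from every column label, then drop by name.
df.columns = df.columns.str.strip()
df = df.drop(columns='id')

print(df.columns.tolist())
```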