Remove an Entire Column from a Data.Frame in R

Remove an entire column from a data.frame in R

You can set it to NULL.

> Data$genome <- NULL
> head(Data)
chr region
1 chr1 CDS
2 chr1 exon
3 chr1 CDS
4 chr1 exon
5 chr1 CDS
6 chr1 exon

As pointed out in the comments, here are some other possibilities:

Data[2] <- NULL    # Wojciech Sobala
Data[[2]] <- NULL # same as above
Data <- Data[,-2] # Ian Fellows
Data <- Data[-2] # same as above

You can remove multiple columns via:

Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL # does not work!

Be careful with matrix-subsetting though, as you can end up with a vector:

Data <- Data[,-(2:3)]             # vector
Data <- Data[,-(2:3),drop=FALSE] # still a data.frame

Drop data frame columns by name

There's also the subset command, useful if you know which columns you want:

df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))

UPDATED after comment by @hadley: To drop columns a,c you could do:

df <- subset(df, select = -c(a, c))

How do I delete columns in R data frame

We can use setdiff to get all the columns except the 'year' and 'category'.

 df1 <- df[setdiff(colnames(df), c('year', 'category'))]
df1
# vin make model
#1 1 A D
#2 2 B E
#3 3 C F

Including the comments from @Frank and @Ben Bolker.

We can assign the columns to NULL

  df$year <- NULL
df$category <- NULL

Or use subset from base R or select from dplyr

  subset(df, select=-c(year, category))
library(dplyr)
select(df, -year, -category)

data

df <- data.frame(vin=1:3, make=LETTERS[1:3], model=LETTERS[4:6],
year=1991:1993, category= paste0('GR', 1:3))

How do you delete a column by name in data.table?

Any of the following will remove column foo from the data.table df3:

# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]

df3[, c("foo","bar"):=NULL] # remove two columns

myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents

# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]

# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]

data.table also supports the following syntax:

## Method 3 (could then assign to df3, 
df3[, !"foo"]

though if you were actually wanting to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)

Less safe options, fine for interactive use:

The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.

As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you are wanting to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.

# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]

Lastly there are approaches using with=FALSE, though data.table is gradually moving away from using this argument so it's now discouraged where you can avoid it; showing here so you know the option exists in case you really do need it:

# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5b (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]

How to drop columns by name in a data frame

You should use either indexing or the subset function. For example :

R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
R> df
x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8

Then you can use the which function and the - operator in column indexation :

R> df[ , -which(names(df) %in% c("z","u"))]
x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

Or, much simpler, use the select argument of the subset function : you can then use the - operator directly on a vector of column names, and you can even omit the quotes around the names !

R> subset(df, select=-c(z,u))
x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

Note that you can also select the columns you want instead of dropping the others :

R> df[ , c("x","y")]
x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

R> subset(df, select=c(x,y))
x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

How to delete a column in R dataframe

This: y$B <- NULL removes column B from dataframe y.

How do you remove columns from a data.frame?

I use data.table's := operator to delete columns instantly regardless of the size of the table.

DT[, coltodelete := NULL]

or

DT[, c("col1","col20") := NULL]

or

DT[, (125:135) := NULL]

or

DT[, (variableHoldingNamesOrNumbers) := NULL]

Any solution using <- or subset will copy the whole table. data.table's := operator merely modifies the internal vector of pointers to the columns, in place. That operation is therefore (almost) instant.

Remove selected columns in R dataframe

###Use subset command:###
dataframename <- subset(dataframename, select = -c(col1,col4) )

###One more approach is you can use list(NULL) to the dataframe:###
dataframename[,c("col1","col4")] <- list(NULL)


Related Topics



Leave a reply



Submit