Selecting Columns in R Data Frame Based on Those *Not* in a Vector

Select Subset of Columns based on Vector R

Use %in%:

names.use <- names(df)[!(names(df) %in% f)]

Then names.use will contain the names of all the columns which are not contained in your vector of names f.

To subset your data frame using the columns you want, you can use the following:

df.subset <- df[, names.use]
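
For illustration, here is a minimal, self-contained sketch of the same approach. The data frame and the vector f are made up for the example and are not from the original question.

```r
# Hypothetical example data; the column names are illustrative only
df <- data.frame(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE), d = 4:6)

# Names to exclude; "zzz" is not a column of df and is simply ignored by %in%
f <- c("b", "d", "zzz")

# Keep the column names that are NOT in f
names.use <- names(df)[!(names(df) %in% f)]

# drop = FALSE keeps the result a data frame even when one column remains
df.subset <- df[, names.use, drop = FALSE]

names(df.subset)
# [1] "a" "c"
```

Without drop = FALSE, a selection that leaves a single column would be simplified to a plain vector.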

dplyr r : selecting columns whose names are in an external vector

We can use any_of() with select():

library(dplyr)
data %>%
select(any_of(col_names))

Output:

   a b
1  1 e
2  4 e
3 13 f
4  8 m
5 10 z
6  3 y
...
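
The original data frame is not shown in full, so the following is only a sketch with made-up data. It assumes dplyr 1.0.0 or later, where any_of() is available, and shows that names missing from the data are skipped rather than causing an error.

```r
library(dplyr)

# Hypothetical data standing in for the question's data frame
data <- data.frame(a = c(1, 4, 13), b = c("e", "e", "f"), c = c(TRUE, FALSE, TRUE))

# External character vector; "not_there" is deliberately absent from data
col_names <- c("a", "b", "not_there")

# any_of() selects the names that exist and silently skips the rest
result <- data %>%
  select(any_of(col_names))

colnames(result)
# [1] "a" "b"
```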

How NOT to select columns using select() dplyr when you have character vector of colnames?

Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that:

library(dplyr)

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

cols <- c("Petal.Length", "Sepal.Length")

select(iris, one_of(cols)) %>% colnames

# [1] "Petal.Length" "Sepal.Length"

select(iris, -one_of(cols)) %>% colnames

# [1] "Sepal.Width" "Petal.Width" "Species"
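
In dplyr 1.0.0 and later, one_of() is superseded by all_of() and any_of(). As a sketch of the same selection with all_of(), which is strict about names that do not exist (the name "nope" below is made up to trigger the error):

```r
library(dplyr)

cols <- c("Petal.Length", "Sepal.Length")

# Select, then deselect, the columns named in cols
kept <- iris %>% select(all_of(cols)) %>% colnames()
dropped <- iris %>% select(-all_of(cols)) %>% colnames()

# all_of() raises an error if any requested name is missing
strict_fails <- inherits(
  try(select(iris, all_of(c(cols, "nope"))), silent = TRUE),
  "try-error"
)

kept
# [1] "Petal.Length" "Sepal.Length"
dropped
# [1] "Sepal.Width" "Petal.Width" "Species"
strict_fails
# [1] TRUE
```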

You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:

starts_with(): starts with a prefix

ends_with(): ends with a suffix

contains(): contains a literal string

matches(): matches a regular expression

num_range(): a numerical range like x01, x02, x03.

one_of(): variables in character vector.

everything(): all variables.
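
To make the helpers concrete, here is a short sketch that applies several of them to the built-in iris data. The small data frame wide is made up to demonstrate num_range().

```r
library(dplyr)

starts <- iris %>% select(starts_with("Petal")) %>% colnames()
ends   <- iris %>% select(ends_with("Width")) %>% colnames()
has    <- iris %>% select(contains(".")) %>% colnames()        # "." is literal here
rx     <- iris %>% select(matches("^Sepal\\.")) %>% colnames() # regular expression

# Hypothetical data frame with zero-padded numeric suffixes
wide <- data.frame(x01 = 1, x02 = 2, x03 = 3, y = 4)
nr <- wide %>% select(num_range("x", 1:2, width = 2)) %>% colnames()

starts  # "Petal.Length" "Petal.Width"
ends    # "Sepal.Width"  "Petal.Width"
rx      # "Sepal.Length" "Sepal.Width"
nr      # "x01" "x02"
```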


Given a data frame with columns named a through z, use select() like this (each call assumes the data frame is piped in, e.g. df %>% select(...)):

select(-a, -b, -c, -d, -e)

# OR

select(-c(a, b, c, d, e))

# OR

select(-(a:e))

# OR if you want to keep b

select(-a, -(c:e))

# OR a different way to keep b, by just putting it back in

select(-(a:e), b)
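
Here is a runnable sketch of these variants. It uses a made-up one-row data frame with columns a through h instead of a through z.

```r
library(dplyr)

# Hypothetical one-row data frame with columns a, b, ..., h
df <- as.data.frame(as.list(setNames(1:8, letters[1:8])))

drop_each  <- df %>% select(-a, -b, -c, -d, -e) %>% colnames()
drop_vec   <- df %>% select(-c(a, b, c, d, e)) %>% colnames()
drop_range <- df %>% select(-(a:e)) %>% colnames()

# Two ways to keep b: skip it in the exclusion, or add it back afterwards
keep_b_1 <- df %>% select(-a, -(c:e)) %>% colnames()
keep_b_2 <- df %>% select(-(a:e), b) %>% colnames()

drop_range  # "f" "g" "h"
keep_b_1    # "b" "f" "g" "h"
```

The first three calls return the same columns. The last two keep the same set of columns, but when b is added back it is appended after the remaining ones.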

So if I wanted to omit two of the columns from the iris dataset, I could say:

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species"

But of course, the best and most concise way to achieve that is using one of select's helper functions:

select(iris, -ends_with(".Length")) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species"

P.S. It's odd that you are passing quoted values to dplyr. One of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare column names work fine with dplyr and ggplot2.

dplyr: select all variables except for those contained in vector

select(df, -any_of(excluded_vars))
This is now the safest way to do it: the code will not break if excluded_vars contains a variable name that does not exist in df.
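
As a small sketch on the built-in iris data, with an exclusion vector made up for the example and a deliberately non-existent name:

```r
library(dplyr)

# "no_such_column" does not exist in iris; any_of() tolerates that
excluded_vars <- c("Species", "Sepal.Width", "no_such_column")

remaining <- iris %>%
  select(-any_of(excluded_vars)) %>%
  colnames()

remaining
# [1] "Sepal.Length" "Petal.Length" "Petal.Width"
```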

Extracting specific columns from a data frame

Using the dplyr package, if your data.frame is called df1:

library(dplyr)

df1 %>%
select(A, B, E)

This can also be written without the %>% pipe as:

select(df1, A, B, E)
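
For comparison, a sketch showing that base R indexing by name picks the same columns. The contents of df1 are made up here, since the original data is not shown.

```r
library(dplyr)

# Hypothetical data frame with columns A through E
df1 <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8, E = 9:10)

via_dplyr <- df1 %>% select(A, B, E)

# Base R equivalent: index the columns with a character vector
via_base <- df1[, c("A", "B", "E")]

names(via_dplyr)
# [1] "A" "B" "E"
names(via_base)
# [1] "A" "B" "E"
```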

How to subset dataframe based on values that do not match in two columns in R?

You may use %in% to select rows where response_id is present in tweet_id.

subset(df, response_id %in% unique(tweet_id))

#  tweet_id response_id     time
#1        1           2 22:10:47
#2        3           1 22:08:27
#3        4           3 21:54:49
#4        5           4 21:49:35
#5        6           5 21:46:23
#6        8           6 21:30:45

If you want to use dplyr:

library(dplyr)
df %>% filter(response_id %in% unique(tweet_id))
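
The question's data is not reproduced here, so the following base R sketch uses made-up IDs. It shows both the matching rows and, by negating the condition, the rows whose response_id does not appear in tweet_id.

```r
# Hypothetical data; the values are invented for illustration
df <- data.frame(
  tweet_id    = c(1, 3, 4, 7),
  response_id = c(3, 1, 9, 4)
)

# Rows whose response_id occurs somewhere in tweet_id
matched <- subset(df, response_id %in% tweet_id)

# Rows whose response_id does NOT occur in tweet_id
unmatched <- subset(df, !(response_id %in% tweet_id))

matched$response_id
# [1] 3 1 4
unmatched$response_id
# [1] 9
```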

How do you subset a data frame based on column names?

Saving your dataframe to a variable df:

df <-
  structure(
    list(
      Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
      Date = structure(
        1:6,
        .Label = c(
          "7/13/2017 15:01",
          "7/13/2017 15:02",
          "7/13/2017 15:03",
          "7/13/2017 15:04",
          "7/13/2017 15:05",
          "7/13/2017 15:06"
        ),
        class = "factor"
      ),
      Host_CPU = c(
        1.812950134,
        2.288070679,
        1.563278198,
        1.925239563,
        5.350669861,
        2.612503052
      ),
      UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 38.22),
      jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99),
      jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57),
      jvm3 = c(75.65, 76.88, 56.93, 58.99, 65.29, 67.97),
      jvm4 = c(39.43, 40.86, 42.27, 43.71, 45.09, 45.33),
      jvm5 = c(27.42, 29.63, 31.02, 32.37, 33.72, 37.71)
    ),
    .Names = c(
      "Server",
      "Date",
      "Host_CPU",
      "UsedMemPercent",
      "jvm1",
      "jvm2",
      "jvm3",
      "jvm4",
      "jvm5"
    ),
    class = "data.frame",
    row.names = c(NA, -6L)
  )

df[, select] should be what you're looking for, where select is the vector of column names you want to keep.
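
As a self-contained sketch, here is a cut-down, made-up version of the data frame. The contents of select are an assumption, since the vector from the question is not shown. The second variant, which picks columns by a name pattern with grep(), is an addition and not part of the original answer.

```r
# Hypothetical, shortened stand-in for the data frame above
df <- data.frame(
  Server   = "servera",
  Host_CPU = c(1.81, 2.29),
  jvm1     = c(10.91, 11.13),
  jvm2     = c(11.47, 11.70)
)

# Assumed vector of wanted column names
select <- c("Server", "jvm1")
by_name <- df[, select]

# Alternative: pick every column whose name starts with "jvm"
jvm_cols <- grep("^jvm", names(df), value = TRUE)
by_pattern <- df[, jvm_cols, drop = FALSE]

names(by_name)
# [1] "Server" "jvm1"
names(by_pattern)
# [1] "jvm1" "jvm2"
```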

How can I select columns/rows with the opposite of the column/row names in R?

We can use setdiff between column names (colnames) and not_target_col to get the column names which do not match with not_target_col.

setdiff(colnames(mat), not_target_col)
#[1] "c" "d"

If we need to select those columns from the matrix

mat[, setdiff(colnames(mat), not_target_col)]

#     c  d
#[1,] 7 10
#[2,] 8 11
#[3,] 9 12
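
A self-contained sketch of the same idea, extended to rows. The matrix, its row names, and not_target_row are assumptions made up for the example; the column part is chosen to reproduce the output above.

```r
# Hypothetical 3 x 4 matrix with named rows and columns
mat <- matrix(
  1:12,
  nrow = 3,
  dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c", "d"))
)

not_target_col <- c("a", "b")
not_target_row <- "r2"

# Names that are NOT in the exclusion vectors
keep_cols <- setdiff(colnames(mat), not_target_col)
keep_rows <- setdiff(rownames(mat), not_target_row)

# drop = FALSE keeps a matrix even if one row or column remains
sub <- mat[keep_rows, keep_cols, drop = FALSE]

sub
#    c  d
# r1 7 10
# r3 9 12
```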

