Select Subset of Columns based on Vector R
Use %in%:
names.use <- names(df)[!(names(df) %in% f)]
Then names.use will contain the names of all the columns that are not in your vector of names f.
To subset your data frame using the columns you want, you can use the following:
df.subset <- df[, names.use]
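A minimal sketch of the steps above on a made-up data frame. The columns of df and the contents of f are hypothetical, chosen only for illustration; drop = FALSE is an addition that keeps the result a data frame even when a single column remains.

```r
# Toy data frame; the columns are illustrative, not from the original question
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Hypothetical vector of column names to exclude
f <- c("b", "d")

# Keep every column whose name is NOT in f
names.use <- names(df)[!(names(df) %in% f)]

# drop = FALSE keeps a data frame even if only one column is left
df.subset <- df[, names.use, drop = FALSE]

print(names(df.subset))
```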
dplyr r : selecting columns whose names are in an external vector
We could use any_of with select:
library(dplyr)
data %>%
select(any_of(col_names))
-output
a b
1 1 e
2 4 e
3 13 f
4 8 m
5 10 z
6 3 y
...
How NOT to select columns using select() dplyr when you have character vector of colnames?
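Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that (in current dplyr releases one_of() is superseded by any_of() and all_of(), which are called the same way):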
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
cols <- c("Petal.Length", "Sepal.Length")
select(iris, one_of(cols)) %>% colnames
# [1] "Petal.Length" "Sepal.Length"
select(iris, -one_of(cols)) %>% colnames
# [1] "Sepal.Width" "Petal.Width" "Species"
You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:
starts_with(): starts with a prefix
ends_with(): ends with a suffix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03.
one_of(): variables in character vector.
everything(): all variables.
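A short sketch of a few of these helpers applied to the built-in iris dataset. It assumes dplyr is installed; the variable names petal_cols, width_cols and sepal_cols are only illustrative.

```r
library(dplyr)

# Columns whose names start with "Petal"
petal_cols <- names(select(iris, starts_with("Petal")))

# Columns whose names end with "Width"
width_cols <- names(select(iris, ends_with("Width")))

# Columns whose names match a regular expression
sepal_cols <- names(select(iris, matches("^Sepal\\.")))

print(petal_cols)
print(width_cols)
print(sepal_cols)
```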
Given a dataframe with column names a:z, use select like this (each call below expects the dataframe as its first argument or piped in with %>%):
select(-a, -b, -c, -d, -e)
# OR
select(-c(a, b, c, d, e))
# OR
select(-(a:e))
# OR if you want to keep b
select(-a, -(c:e))
# OR a different way to keep b, by just putting it back in
select(-(a:e), b)
So if I wanted to omit two of the columns from the iris dataset, I could say:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
But of course, the best and most concise way to achieve that is using one of select's helper functions:
select(iris, -ends_with(".Length")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
P.S. It's weird that you are passing quoted values to dplyr. One of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare values work fine with dplyr and ggplot2.
dplyr: select all variables except for those contained in vector
select(df, -any_of(excluded_vars)) is now the safest way to do this (the code will not break if a variable name that doesn't exist in df is included in excluded_vars).
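To illustrate the point about safety, here is a small sketch contrasting any_of() with the stricter all_of(). It assumes dplyr is installed; the data frame and the name "not_a_column" are made up for the example.

```r
library(dplyr)

# Toy data frame; contents are illustrative only
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# One real column name and one that does not exist in df
excluded_vars <- c("b", "not_a_column")

# any_of() silently ignores names that are missing from df
kept <- select(df, -any_of(excluded_vars))
print(names(kept))

# all_of() is strict: a missing name raises an error, caught here
strict <- tryCatch(
  select(df, -all_of(excluded_vars)),
  error = function(e) "error"
)
print(strict)
```

Use all_of() when a missing column should be treated as a bug, and any_of() when the vector may legitimately contain names that are absent.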
Extracting specific columns from a data frame
Using the dplyr package, if your data.frame is called df1:
library(dplyr)
df1 %>%
select(A, B, E)
This can also be written without the %>% pipe as:
select(df1, A, B, E)
How to subset dataframe based on values that do not match in two columns in R?
You may use %in% to select rows where response_id is present in tweet_id.
subset(df, response_id %in% unique(tweet_id))
# tweet_id response_id time
#1 1 2 22:10:47
#2 3 1 22:08:27
#3 4 3 21:54:49
#4 5 4 21:49:35
#5 6 5 21:46:23
#6 8 6 21:30:45
If you want to use dplyr:
library(dplyr)
df %>% filter(response_id %in% unique(tweet_id))
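A self-contained sketch of the same filter in base R. The data below is invented for illustration (the original data frame is not shown in full), and the negated version is an assumption about what "do not match" in the question may have meant.

```r
# Toy data; values are hypothetical, not the original tweets
df <- data.frame(
  tweet_id    = c(1, 3, 4, 5),
  response_id = c(3, 1, 9, 4)
)

# Rows whose response_id appears somewhere in tweet_id
matched <- subset(df, response_id %in% unique(tweet_id))

# Negate the test to get rows whose response_id has no matching tweet_id
unmatched <- subset(df, !(response_id %in% tweet_id))

print(matched)
print(unmatched)
```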
how do you subset a data frame based on column names?
Saving your dataframe to a variable df:
df <-
structure(
list(
Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
Date = structure(
1:6,
.Label = c(
"7/13/2017 15:01",
"7/13/2017 15:02",
"7/13/2017 15:03",
"7/13/2017 15:04",
"7/13/2017 15:05",
"7/13/2017 15:06"
),
class = "factor"
),
Host_CPU = c(
1.812950134,
2.288070679,
1.563278198,
1.925239563,
5.350669861,
2.612503052
),
UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19,
38.22),
jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99),
jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57),
jvm3 = c(75.65,
76.88, 56.93, 58.99, 65.29, 67.97),
jvm4 = c(39.43, 40.86,
42.27, 43.71, 45.09, 45.33),
jvm5 = c(27.42, 29.63, 31.02,
32.37, 33.72, 37.71)
),
.Names = c(
"Server",
"Date",
"Host_CPU",
"UsedMemPercent",
"jvm1",
"jvm2",
"jvm3",
"jvm4",
"jvm5"
),
class = "data.frame",
row.names = c(NA,-6L)
)
df[, select]
should be what you're looking for, where select is your character vector of column names.
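A compact sketch of the same indexing on a cut-down data frame. The values are shortened for illustration, and the name cols_to_keep is a hypothetical stand-in for the vector called select above; a different name avoids confusion with dplyr's select() function.

```r
# Cut-down version of the data; values are illustrative only
df <- data.frame(
  Server   = "servera",
  Host_CPU = c(1.81, 2.29),
  jvm1     = c(10.91, 11.13),
  jvm2     = c(11.47, 11.70)
)

# Hypothetical character vector of the columns to keep
cols_to_keep <- c("Server", "jvm1")

# Index by name; drop = FALSE keeps a data frame even for one column
df_subset <- df[, cols_to_keep, drop = FALSE]

print(df_subset)
```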
How can I select columns/rows with the opposite of the column/row names in R?
We can use setdiff between the column names (colnames) and not_target_col to get the column names that do not match not_target_col.
setdiff(colnames(mat), not_target_col)
#[1] "c" "d"
If we need to select those columns from the matrix:
mat[, setdiff(colnames(mat), not_target_col)]
# c d
#[1,] 7 10
#[2,] 8 11
#[3,] 9 12
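The matrix itself is not shown above, so the sketch below assumes a 3 x 4 matrix with columns a to d and not_target_col set to c("a", "b"), which is consistent with the printed output. drop = FALSE is an addition that keeps the result a matrix if only one column is left.

```r
# Assumed input: 3 x 4 matrix with column names a, b, c, d
mat <- matrix(1:12, nrow = 3, dimnames = list(NULL, c("a", "b", "c", "d")))

# Assumed vector of columns to leave out
not_target_col <- c("a", "b")

# Column names that are NOT in not_target_col
keep <- setdiff(colnames(mat), not_target_col)

# Select the remaining columns
result <- mat[, keep, drop = FALSE]

print(result)
```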