How to Drop Columns by Name Pattern in R

How to drop columns whose names contain a specific string?

Using dplyr:

library(dplyr)

df %>%
  select(-contains(c("epm", "enn", "jkk")))
#> name agelk
#> 1 Jon 23
#> 2 Bill 41
#> 3 Maria 32
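The original data frame is not shown, so here is a minimal base R sketch of the same idea for readers who do not have dplyr loaded. The column names epm1, xennx and a_jkk are hypothetical, chosen only so that the result matches the output above; grepl() with an alternation pattern plays the role of contains().

```r
# Hypothetical data: only `name` and `agelk` come from the output above;
# the other three column names are assumptions made for illustration.
df <- data.frame(
  name  = c("Jon", "Bill", "Maria"),
  agelk = c(23, 41, 32),
  epm1  = 1:3,
  xennx = 4:6,
  a_jkk = 7:9
)

# grepl() returns TRUE for every name containing any of the substrings;
# negating it keeps only the columns that do not match.
drop <- grepl("epm|enn|jkk", names(df))
kept <- df[!drop]
names(kept)
```

Because the substrings are joined into a regular expression here, any regex metacharacters in them would need escaping (or fixed matching, one substring at a time).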

Drop data frame columns by name

There's also the subset command, useful if you know which columns you want:

df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))

Update, following a comment by @hadley: to drop columns a and c instead, you could do:

df <- subset(df, select = -c(a, c))
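subset() takes unquoted column names, which is awkward when the names to drop are stored as strings. As a rough sketch of one alternative (not part of the original answer), the names can be compared against names(df) with %in%; the variable to_drop is a hypothetical name introduced here.

```r
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)

# Hypothetical variable holding the names to drop as a character vector
to_drop <- c("a", "c")

# Keep every column whose name is not in to_drop.
# drop = FALSE keeps the result a data frame even if one column remains.
kept <- df[, !(names(df) %in% to_drop), drop = FALSE]
names(kept)
```

Names in to_drop that do not exist in df are simply ignored, which makes this form forgiving when the same drop list is applied to several data frames.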

Drop columns and order the data by specific columns' names

Solution with data.table

Since you're using data.table, here you can find a full data.table solution:

library(data.table)

# get files this way: it is preferable not to use setwd()
# the dot is escaped so that it matches a literal "." rather than any character
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# fastest way to read your csv.
# drop here the column you don't want (I assumed it was ExamTitle)
# add id to reshape later
dt <- rbindlist(lapply(files, fread, drop = "ExamTitle"), idcol = "id")

# reshape with `data.table`
dcast(dt, id ~ ParametersName, value.var = "ParametersValue")
#> id CervicalLordosisDepth_SAG SagittalImbalance_SAG TrunkInclination_VPDM_SAG
#> 1: 1 30 -4 -0,49
#> 2: 2 30 -4 -0,49
#> TrunkLenght_VPDM_SAG
#> 1: 446
#> 2: 446

Solution with tidyverse

You can also use the tidyverse; which approach fits best depends on you and your project.

library(tidyverse)

# read and bind dataframes, add id
map_df(files, read_csv2, .id = "id") %>%

# remove column
select(-ExamTitle) %>%

# reshape
pivot_wider(names_from = ParametersName, values_from = ParametersValue)
#> # A tibble: 2 x 5
#> id TrunkLenght_VPDM_SAG TrunkInclination~ SagittalImbalan~ CervicalLordosi~
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 446 -0.49 -4 30
#> 2 2 446 -0.49 -4 30

Solution with Base R

To conclude, you can also solve the problem with a base R one-liner:

unstack(Reduce(rbind, lapply(files, read.csv2)), form = ParametersValue ~ ParametersName)
#> CervicalLordosisDepth_SAG SagittalImbalance_SAG TrunkInclination_VPDM_SAG TrunkLenght_VPDM_SAG
#> 1 30 -4 -0.49 446
#> 2 30 -4 -0.49 446

Reproducible example

Here is a simple reproducible example that creates the input files used by the code above.

dir <- tempdir()
write("ExamTitle;ParametersName;ParametersValue
Titolo nuovo esame;TrunkLenght_VPDM_SAG;446
Titolo nuovo esame;TrunkInclination_VPDM_SAG;-0,49
Titolo nuovo esame;SagittalImbalance_SAG;-4
Titolo nuovo esame;CervicalLordosisDepth_SAG;30",
file = file.path(dir, "tmp1.csv"))
write("ExamTitle;ParametersName;ParametersValue
Titolo nuovo esame;TrunkLenght_VPDM_SAG;446
Titolo nuovo esame;TrunkInclination_VPDM_SAG;-0,49
Titolo nuovo esame;SagittalImbalance_SAG;-4
Titolo nuovo esame;CervicalLordosisDepth_SAG;30",
file = file.path(dir, "tmp2.csv"))

R delete columns from data frame matching regex pattern

We can use grep to match a regular expression against the column names. Here the pattern matches the letter 'd' at the start (^) of the string, followed by one or more digits (\\d+) up to the end ($) of the string. Setting invert = TRUE (the default is FALSE) makes grep return the indices of the names that do not match, and those numeric indices are then used to subset the columns:

df[grep("^d\\d+$", names(df), invert = TRUE)]
# b c
#1 3 4
#2 4 5
#3 5 3
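The data frame behind that output is not shown, so the sketch below uses a hypothetical df (the column names d1 and d22 are assumptions) to illustrate the same pattern with grepl(), and to show why invert = TRUE is safer than negative indexing with -grep(...) when nothing matches.

```r
# Hypothetical data: b and c mirror the output above; d1 and d22 are assumed.
df <- data.frame(d1 = 1:3, d22 = 4:6, b = c(3, 4, 5), c = c(4, 5, 3))

# Logical alternative to grep(..., invert = TRUE)
kept <- df[!grepl("^d\\d+$", names(df))]
names(kept)

# Pitfall: if no name matches, grep() returns integer(0), and negating an
# empty index still gives an empty index, so every column is lost.
no_match <- grep("^z\\d+$", names(df))
lost <- df[-no_match]

# invert = TRUE returns the indices of all non-matching names instead.
safe <- df[grep("^z\\d+$", names(df), invert = TRUE)]
c(ncol(lost), ncol(safe))
```

Both the grepl() form and the invert = TRUE form leave the data frame unchanged when no column matches, which is usually the intended behaviour.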

R: Dropping columns with names containing a substring anywhere except the start using regular expressions in dplyr

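The substring must be preceded by at least one character, so anchor the pattern at the start of the name and require one or more of any character (.) before it: ^.{1,} (equivalent to ^.+).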

df %>% dplyr::select(-matches("^.{1,}foo1"))
# bar foo1
# 1 -1.077056 -0.5649875
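To make the behaviour of the pattern easier to check, here is a small base R sketch on a hypothetical data frame; the column names bar_foo1 and foo1_foo1 are assumptions added for illustration. It uses grepl() in place of matches() and the shorter ^.+ form of the same pattern.

```r
# Hypothetical data: one name starts with "foo1", two contain it after the
# first character, and one does not contain it at all.
df <- data.frame(foo1 = 1, bar = 2, bar_foo1 = 3, foo1_foo1 = 4)

# "^.+foo1" requires at least one character before "foo1", so a name that
# merely starts with "foo1" is not flagged.
drop <- grepl("^.+foo1", names(df))
kept <- df[!drop]
names(kept)
```

Note that foo1_foo1 is dropped as well: it starts with the substring but also contains it again later, and the pattern only asks whether some occurrence sits after the first character.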

