Apply a function to all variables starting with specific pattern in R
To answer exactly what the OP asked for, mapply(c, test1, test2, ..., testn), do:
do.call(mapply, c(FUN = c, mget(paste0("test", 1:n))))
If you don't know how many lists (n) you have and want to find them using a pattern:
do.call(mapply, c(FUN = c, mget(ls(pattern = "^test\\d+$"))))
Like the other answers so far, this method using ls will not sort the objects properly if there are more than nine of them, because ls returns the names in alphabetical order (test10 comes before test2). The longer but fully robust version would be:
test.lists <- ls(pattern = "^test\\d+$")
ordered.lists <- test.lists[order(as.integer(sub("test", "", test.lists)))]
do.call(mapply, c(FUN = c, mget(ordered.lists)))
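To see why the numeric reordering matters, here is a minimal sketch. It assumes eleven hypothetical lists test1 ... test11 kept in a scratch environment e (an assumption made so the example does not touch the workspace; the code above looks in the calling environment instead).

```r
# Hypothetical test data: test1 ... test11, each a list with elements a and b,
# stored in a scratch environment 'e' (an assumption for this sketch).
e <- new.env()
for (i in 1:11) assign(paste0("test", i), list(a = i, b = i * 10), envir = e)

test.lists <- ls(e, pattern = "^test\\d+$")   # sorted as text, not by number
ordered.lists <- test.lists[order(as.integer(sub("test", "", test.lists)))]

# Combine element-wise: all 'a' values together, all 'b' values together
res <- do.call(mapply, c(FUN = c, mget(ordered.lists, envir = e)))
rownames(res)   # follows the numeric order test1, test2, ..., test11
```

Without the reordering step, the rows would follow the text order returned by ls.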
apply function to all variables with string in name
Normally one tries to group such variables in a list but if not then we can do this:
for(nm in ls(pattern = "^VAR")) .GlobalEnv[[nm]] <- as.character(.GlobalEnv[[nm]])
Environment that is not the global environment
If you have these in an environment that is not the global environment then modify this as follows. The first line of the function body defines the test data, the next line puts the current environment in a variable e for convenience, and the line after that performs the transformations. Finally we check what the variables have been transformed to.
f <- function() {
VAR1 <- 1; VAR2 <- 2; VAR3 <- 3 # test data
e <- environment() # current environment
for(nm in ls(pattern = "^VAR")) e[[nm]] <- as.character(e[[nm]])
str(VAR1); str(VAR2); str(VAR3) # check results
}
f()
List
If you can arrange that these are in a list instead then:
L <- list(VAR1 = 1, VAR2 = 2, VAR3 = 3) # test data
L <- lapply(L, as.character)
or if there are some elements that are not to be processed:
L2 <- list(VAR1 = 1, VAR2 = 2, VAR3 = 3, other = 4) # test data
ix <- grep("^VAR", names(L2))
L2[ix] <- lapply(L2[ix], as.character)
If you don't want to overwrite L and L2 (overwriting tends to make debugging more difficult), then use Lnew <- lapply(L, as.character) and L2new <- replace(L2, ix, lapply(L2[ix], as.character)) instead.
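A short sketch of the non-overwriting variant, reusing the test data from above, shows that the original list is left untouched:

```r
L2 <- list(VAR1 = 1, VAR2 = 2, VAR3 = 3, other = 4)   # test data from above
ix <- grep("^VAR", names(L2))                         # positions of the VAR* elements

# replace() returns a modified copy; L2 itself is not changed
L2new <- replace(L2, ix, lapply(L2[ix], as.character))

str(L2new)   # VAR1..VAR3 are now character, 'other' is still numeric
str(L2)      # the original is still all numeric
```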
How to get all variables with pattern in name into a list while inside function
You can create an environment and then create variables inside it. Then, by calling ls() with that environment and a suitable pattern, you get the names of the variables in the environment that match the pattern.
test_function <- function(x) {
myenv <- new.env()
myenv$hello1 = "hello1"
myenv$hello2 = "hello2"
myenv$cello2 = "hello2"
mylist <- ls(name = myenv, pattern = "hello")
print(mylist)
}
test_function(1)
# [1] "hello1" "hello2"
You can use mget to extract the values for a vector of variable names inside an environment.
test_function <- function(x, y, z, pattern) {
myenv <- new.env()
ls_vars <- list( hello1 = x,
hello2 = y,
cello2 = z)
list2env( ls_vars, myenv ) # add list of variables to myenv environment
newvar <- "hello3"
assign(newvar, value = "dfsfsf", envir = myenv) # assign new variable
mylist <- ls(name = myenv, pattern = pattern)
return(mget(mylist, envir = myenv))
}
test_function(x = "hello1", y = "hello2", z = "sdfsd", pattern = "hello")
# $hello1
# [1] "hello1"
#
# $hello2
# [1] "hello2"
#
# $hello3
# [1] "dfsfsf"
test_function(x = "hello1", y = "hello2", z = "sdfsd", pattern = "cello")
# $cello2
# [1] "sdfsd"
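If the variables already live in the function's own frame, a separate environment may not be needed. The sketch below uses a hypothetical helper, vars_matching (the name and the parent.frame() default are assumptions, not part of the answer above), which collects matching variables from the caller's environment.

```r
# Hypothetical helper: return all variables in 'envir' whose names match 'pattern'.
# By default it looks in the environment of the function that called it.
vars_matching <- function(pattern, envir = parent.frame()) {
  mget(ls(envir, pattern = pattern), envir = envir)
}

f <- function() {
  hello1 <- 1
  hello2 <- 2
  cello2 <- 3                 # does not match, so it is left out
  vars_matching("^hello")
}

f()
```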
Apply function to several variables with same name pattern
Just use grepl to match the column names you want to operate on, returning a logical vector, inside the [ operator to subset the dataframe. Because log10 is vectorised you can just do this:
df[ , grepl( "htotal_" , names( df ) ) ] <- -log10( df[ , grepl( "htotal_" , names( df ) ) ] )
Vectorised example
# Set up the data
df <- data.frame( matrix( sample( c(1,10,1000) , 16 , replace = TRUE ) , 4 , 4 ) )
names( df ) <- c("htotal_1" , "htotal_2" , "not1" , "not2" )
# htotal_1 htotal_2 not1 not2
#1 10 10 10 1000
#2 10 10 1 10
#3 1000 1 1 1000
#4 10 1000 10 1000
df[ , grepl( "htotal_" , names( df ) ) ] <- -log10( df[ , grepl( "htotal_" , names( df ) ) ] )
# htotal_1 htotal_2 not1 not2
#1 -1 -1 10 1000
#2 -1 -1 1 10
#3 -3 0 1 1000
#4 -1 -3 10 1000
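When the function is not vectorised over a whole data frame, a column-by-column variant can be used instead. This is a sketch on small hypothetical data; it swaps in startsWith() and lapply() in place of grepl() and the direct call, and uses list-style indexing (df[sel]) so the selection stays a data frame even if only one column matches.

```r
# Hypothetical data with two 'htotal_' columns and one other column
df <- data.frame(htotal_1 = c(10, 1000), htotal_2 = c(1, 100), not1 = c(10, 1))

sel <- startsWith(names(df), "htotal_")             # TRUE for the columns to transform
df[sel] <- lapply(df[sel], function(x) -log10(x))   # apply column by column

df
```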
Apply a function to every specified column in a data.table and update by reference
This seems to work:
dt[ , (cols) := lapply(.SD, "*", -1), .SDcols = cols]
The result is
a b d
1: -1 -1 1
2: -2 -2 2
3: -3 -3 3
There are a few tricks here:
- Because there are parentheses in (cols) :=, the result is assigned to the columns specified in cols, instead of to some new variable named "cols".
- .SDcols tells the call that we're only looking at those columns, and allows us to use .SD, the Subset of the Data associated with those columns.
- lapply(.SD, ...) operates on .SD, which is a list of columns (like all data.frames and data.tables). lapply returns a list, so in the end j looks like cols := list(...).
EDIT: Here's another way that is probably faster, as @Arun mentioned:
for (j in cols) set(dt, j = j, value = -dt[[j]])
Apply filter criteria to variables that contain/start with certain string in R
In base R you can use lapply/sapply:
d[Reduce(`|`, lapply(d[-1], grepl, pattern = 'd')), ]
#d[rowSums(sapply(d[-1], grepl, pattern = 'd')) > 0, ]
# ID test1 test2 test3 test4
#2 b b b c d
#4 d d a c a
#5 e a s d f
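The original data frame d is not shown here, so the following sketch uses hypothetical data (rows 2, 4 and 5 follow the output above; rows 1 and 3 are invented) to show that both base R variants select the same rows.

```r
# Hypothetical data: rows 2, 4 and 5 mirror the output above, rows 1 and 3 are made up
d <- data.frame(
  ID    = c("a", "b", "c", "d", "e"),
  test1 = c("a", "b", "c", "d", "a"),
  test2 = c("b", "b", "a", "a", "s"),
  test3 = c("c", "c", "b", "c", "d"),
  test4 = c("a", "d", "c", "a", "f"),
  stringsAsFactors = FALSE
)

# One logical vector per test column, combined with OR across columns
keep <- Reduce(`|`, lapply(d[-1], grepl, pattern = "d"))

# Same idea: count matches per row and keep rows with at least one
keep2 <- rowSums(sapply(d[-1], grepl, pattern = "d")) > 0

d[keep, ]
```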
If you are interested in a dplyr solution, you can use any of the methods below:
library(dplyr)
library(stringr)
#1.
d %>%
filter_at(vars(starts_with('test')), any_vars(str_detect(., 'd')))
#2.
d %>%
rowwise() %>%
filter(any(str_detect(c_across(starts_with('test')), 'd')))
#3.
d %>%
filter(Reduce(`|`, across(starts_with('test'), str_detect, 'd')))
How to apply the same function to several variables in R?
Here is an option
library(dplyr)
library(stringr)
library(purrr)
map(actorlist, ~ df %>%
select(.x) %>%
filter(!str_detect(!! rlang::sym(.x), "^s\\d+$")) %>%
pull(1))
#[[1]]
#[1] "nons1" "nons2" "nons3" "nons4" "nons5"
#[[2]]
#[1] "nons2" "nons6" "nons1" "nons4"
It can be wrapped as a function as well. Note that the input is a string, so instead of enquo, use sym to convert it to a symbol and then evaluate it (!!):
f1 <- function(dat, colNm) {
dat %>%
select(colNm) %>%
filter(!str_detect(!! rlang::sym(colNm), "^s\\d+$")) %>%
pull(1) %>%
unique
}
map(actorlist, f1, dat = df)
NOTE: This can be done more easily, but here we are using similar code from the OP's post
Another option is to use split with grepl in base R, which returns a list of both the 'nons' and the 's' values after removing the NAs:
lapply(df[2:3], function(x) {
x1 <- x[!is.na(x)]
split(x1, grepl("nons", x1))})
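As a small illustration of what split with grepl returns, here is a sketch on a single hypothetical vector (the values are made up to resemble the 'nons'/'s' codes discussed above):

```r
# Hypothetical column with 'nons' codes, 's' codes and a missing value
x <- c("nons1", "s1", NA, "nons2", "s22")

x1 <- x[!is.na(x)]                      # drop the NA first
res <- split(x1, grepl("nons", x1))     # groups are named "FALSE" and "TRUE"

res[["TRUE"]]    # values containing "nons"
res[["FALSE"]]   # the remaining values
```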
set na all values that starts with certain string in dplyr environment is.na(), na_if(), startsWith(), regex
If you're able to do it for one column using mutate, you should be able to do it for multiple columns using mutate_at() or mutate_all(), explained here: https://dplyr.tidyverse.org/reference/mutate_all.html
Without knowing what your data looks like, I think you'd want mutate_all()
to modify all columns which have data which matches your condition.
In this example using the iris dataset, we replace all instances of 5 with the word five:
library(dplyr)    # %>%, tibble, mutate_all
library(stringr)  # str_replace, str_detect
iris %>%
tibble %>%
mutate_all(function(x) str_replace(x, '5', 'five'))
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <chr> <chr> <chr> <chr>
1 five.1 3.five 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.five 0.2 setosa
5 five 3.6 1.4 0.2 setosa
6 five.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 five 3.4 1.five 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.five 0.1 setosa
Or, like your condition, we can do this only when the string starts with 5, using the regex ^5 (^ indicates the start of the string, so ^5 matches a 5 at the beginning of the string).
iris %>%
tibble %>%
mutate_all(function(x) str_replace(x, '^5', 'five'))
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <chr> <chr> <chr> <chr>
1 five.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 five 3.6 1.4 0.2 setosa
6 five.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 five 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
Update: To change the entire value if it has a 5 at the start, you just need to replace the str_replace function with a function that can change the entire value. In this case, we use an ifelse statement:
iris %>%
tibble %>%
mutate_all(function(x) ifelse(str_detect(x, '^5'), 'had_five', x))
# A tibble: 150 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<chr> <dbl> <chr> <dbl> <int>
1 had_five 3.5 1.4 0.2 1
2 4.9 3 1.4 0.2 1
3 4.7 3.2 1.3 0.2 1
4 4.6 3.1 1.5 0.2 1
5 had_five 3.6 1.4 0.2 1
6 had_five 3.9 1.7 0.4 1
7 4.6 3.4 1.4 0.3 1
8 had_five 3.4 1.5 0.2 1
9 4.4 2.9 1.4 0.2 1
10 4.9 3.1 1.5 0.1 1
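Note that Species shows up as integers in the output above. The likely reason is that ifelse() does not keep the factor class of its arguments, so a factor column comes back as its underlying integer codes. A minimal base R sketch, using a hypothetical two-level factor, illustrates this and one possible workaround (converting to character first):

```r
# Hypothetical factor standing in for a column such as Species
f <- factor(c("setosa", "virginica"))
starts_with_5 <- startsWith(as.character(f), "5")   # FALSE for both values

# Passing the factor directly: the result holds integer codes, not labels
codes <- ifelse(starts_with_5, "had_five", f)

# Converting to character first keeps the labels
labels <- ifelse(starts_with_5, "had_five", as.character(f))

codes
labels
```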
Another update: From your comments, it sounds like you want to apply the function only to character columns. To do this, you can replace mutate_all(your_fun) with mutate_if(is.character, your_fun), as described in the help documentation linked at the start of this answer (the same page describes mutate_all, mutate_if and mutate_at).
Using your sample data as an example, we can set anything beginning with '0' to NA. I am confused by your example though: do you want to look for '0' or '0\n(' at the start of the string? Either way, this is how to do it:
# sample data
string <- c("asff", "1\n(", '0asfd', '0\n(asdf)')
num <- c(0,1,2,3)
df <- data.frame(string, num, stringsAsFactors = FALSE) # keep string as character (only the default in R >= 4.0)
# for only a 0 at the start of the string
df %>%
mutate_if(is.character, function(x) ifelse(str_detect(x, '^0'), NA, x))
string num
1 asff 0
2 1\n( 1
3 <NA> 2
4 <NA> 3
# for '0\n(' at the start of the string
df %>%
mutate_if(is.character, function(x) ifelse(str_detect(x, '^0\\n\\('), NA, x))
string num
1 asff 0
2 1\n( 1
3 0asfd 2
4 <NA> 3
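For comparison, the first case (a 0 at the start of the string) can also be sketched in base R with startsWith() and replace() in place of the dplyr/stringr calls. The sample data is the same as above; chr is a helper name introduced here for illustration.

```r
# Same sample data as above
string <- c("asff", "1\n(", "0asfd", "0\n(asdf)")
num <- c(0, 1, 2, 3)
df <- data.frame(string, num, stringsAsFactors = FALSE)

# Find the character columns, then blank out values that start with "0"
chr <- vapply(df, is.character, logical(1))
df[chr] <- lapply(df[chr], function(x) replace(x, startsWith(x, "0"), NA))

df
```

The numeric column is skipped, so its leading 0 is left alone.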
Function to look for different patterns in specific positions in a string in R
base R
rowSums(outer(strings, seq_len(nrow(mutations)),
function(st, i) {
substr(st, mutations$position[i], mutations$position[i]) == mutations$AA[i]
}))
# [1] 2 1 1
Walk-through:
- outer effectively just produces two vectors, an expansion of the cartesian product of the two arguments. If we insert a browser() as the first line of the inner anon-func, we'd see

data.frame(st, i)
# st i
# 1 EVQLVESGGGLAKPG 1
# 2 VQLVESGGGLAKPGGS 1
# 3 EVQLVESGGALAKPGGSLRLSCAAS 1
# 4 EVQLVESGGGLAKPG 2
# 5 VQLVESGGGLAKPGGS 2
# 6 EVQLVESGGALAKPGGSLRLSCAAS 2

(Shown as a frame only for a columnar presentation. Both st and i are simple vectors.)

- From here, knowing that substr is vectorized across all arguments, a single call to substr will extract the character at mutations$position[i] from each of the strings in st.

- The result of the substr is a vector of letters. Continuing the same browser() session from above,

substr(st, mutations$position[i], mutations$position[i])
# [1] "G" "G" "G" "G" "L" "A"
mutations$AA[i]
# [1] "G" "G" "G" "G" "G" "G"
substr(st, mutations$position[i], mutations$position[i]) == mutations$AA[i]
# [1] TRUE TRUE TRUE TRUE FALSE FALSE

The mutations$AA[i] shows us what we're looking for. A nice thing about the vectorized method here is that mutations$AA[i] will always be the same length as, and in the same order as, the letters retrieved by substr(.).

- The outer itself returns a matrix, with length(X) rows and length(Y) columns (X and Y are the first and second args to outer, respectively).

outer(strings, seq_len(nrow(mutations)),
function(st, i) {
substr(st, mutations$position[i], mutations$position[i]) == mutations$AA[i]
})
# [,1] [,2]
# [1,] TRUE TRUE
# [2,] TRUE FALSE
# [3,] TRUE FALSE

- The number of correct mutations found in each string is just a sum of each row. (Ergo rowSums.)
If you're concerned about a large number of mutations and strings, you can replace the outer and iterate over each row of mutations instead:
rowSums(sapply(seq_len(nrow(mutations)), function(i) substr(strings, mutations$position[i], mutations$position[i]) == mutations$AA[i]))
# [1] 2 1 1
This calls substr once for each mutations row, so if the outer-explosion is too much, this might reduce the memory footprint.
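The inputs are not defined in this answer, so here is a self-contained sketch. The strings are taken from the walk-through output; the mutations data frame (positions 9 and 10, both expecting "G") is an assumption chosen to be consistent with that output, not the original data.

```r
# Strings as shown in the walk-through above
strings <- c("EVQLVESGGGLAKPG", "VQLVESGGGLAKPGGS", "EVQLVESGGALAKPGGSLRLSCAAS")

# Assumed lookup table: expected amino acid (AA) at each position
mutations <- data.frame(position = c(9, 10), AA = c("G", "G"),
                        stringsAsFactors = FALSE)

# Variant 1: outer() expands every string/mutation pair in one go
via_outer <- rowSums(outer(strings, seq_len(nrow(mutations)),
  function(st, i) {
    substr(st, mutations$position[i], mutations$position[i]) == mutations$AA[i]
  }))

# Variant 2: one substr() call per mutation row
via_sapply <- rowSums(sapply(seq_len(nrow(mutations)), function(i) {
  substr(strings, mutations$position[i], mutations$position[i]) == mutations$AA[i]
}))

via_outer
via_sapply
```

Both variants count, per string, how many of the listed positions hold the expected letter.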