How to Sweep Specific Columns with Dplyr

How do I sweep specific columns with dplyr?

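From dplyr 1.0.0 onwards, you can do the following. Inside rowwise(), across(A:D) returns a one-row data frame for the current row, so dividing it by factors (one value per column A to D) divides each column by its own factor:

data %>%
  rowwise() %>%
  mutate(across(A:D) / factors) %>%
  ungroup()  # drop the row-wise grouping afterwards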

ID Type A B C D
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 X 3 0.833 3.75 5.33
2 2 X 174 107. 82.5 76
3 3 X 6 1.67 2.5 5.33
4 4 Y 1377 849. 312. 335.
5 5 Y 537 353. 161. 165.
6 6 Y 173 116. 50 50.7
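A vectorised sketch that avoids rowwise(), assuming `factors` is a numeric vector named after the columns A to D (the question's actual `factors` is not shown, and the `data` values below are hypothetical). `cur_column()` (dplyr >= 1.0.0) returns the name of the column that `across()` is currently processing, so each column can look up its own divisor:

```r
library(dplyr)

# Hypothetical data; assumption: factors is named by column (A, B, C, D)
data <- tibble(
  ID   = 1:3,
  Type = c("X", "X", "Y"),
  A = c(6, 348, 12), B = c(2, 428, 8),
  C = c(15, 330, 10), D = c(16, 228, 16)
)
factors <- c(A = 2, B = 4, C = 4, D = 3)

res <- data %>%
  # divide each whole column by the factor stored under its name
  mutate(across(A:D, ~ .x / factors[[cur_column()]]))
res
```

Because the division runs over whole columns instead of row by row, this typically scales better than rowwise() on large data.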

Sweep-like operations with dplyr/tidyverse

(For what it's worth, I think the down-votes are a bit harsh and unwarranted here. The problem statement is clear, and sample data has been included in an edit.)

You can achieve what you're after by converting data in numeric columns from wide to long (using gather), grouping by rows (using group_by), subtracting the minimum (using mutate), and converting back from long to wide (using spread).

library(tidyverse)
df %>%
  gather(k, v, starts_with("X")) %>%
  group_by(nm) %>%
  mutate(v = v - min(v)) %>%
  spread(k, v) %>%
  select(names(df))
## A tibble: 5 x 7
## Groups: nm [5]
# nm X1799.38928 X1798.01526 X1796.64124 source color rep
# <fct> <dbl> <dbl> <dbl> <int> <fct> <int>
#1 s001c1 18.6 5.72 0. 1 c 1
#2 s001c2 14.2 0. 12.0 1 c 2
#3 s001c3 0. 16.8 21.8 1 c 3
#4 s001c4 0. 11.4 17.8 1 c 4
#5 s001c5 6.80 0. 3.58 1 c 5

Sample data

df <- read.table(text =
"nm X1799.38928 X1798.01526 X1796.64124 source color rep
1 s001c1 13901.944 13889.056 13883.334 01 c 1
2 s001c2 17293.586 17279.375 17291.365 01 c 2
3 s001c3 8011.764 8028.584 8033.548 01 c 3
4 s001c4 7499.272 7510.719 7517.064 01 c 4
5 s001c5 20300.408 20293.604 20297.185 01 c 5")
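Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same wide-to-long-to-wide approach with the newer verbs, using the sample data above:

```r
library(dplyr)
library(tidyr)

df <- read.table(text =
"nm X1799.38928 X1798.01526 X1796.64124 source color rep
1 s001c1 13901.944 13889.056 13883.334 01 c 1
2 s001c2 17293.586 17279.375 17291.365 01 c 2
3 s001c3 8011.764 8028.584 8033.548 01 c 3
4 s001c4 7499.272 7510.719 7517.064 01 c 4
5 s001c5 20300.408 20293.604 20297.185 01 c 5")

res <- df %>%
  # wide -> long: one row per (nm, spectral column)
  pivot_longer(starts_with("X"), names_to = "k", values_to = "v") %>%
  group_by(nm) %>%
  # subtract the minimum within each original row
  mutate(v = v - min(v)) %>%
  ungroup() %>%
  # long -> wide, then restore the original column order
  pivot_wider(names_from = k, values_from = v) %>%
  select(all_of(names(df)))
res
```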

Selecting specific columns when using mutate_each function from dplyr

For this case, matches() would be more appropriate than starts_with():

df %>%
  mutate_each(funs(. * Freq), matches("^[A-Z]\\.", ignore.case = FALSE))

Here, I am assuming that you wanted to select only the columns whose names start with a capital letter (^[A-Z]) followed by a literal dot. We have to escape the dot (\\.), otherwise it matches any single character. ignore.case = FALSE is also needed: matches() ignores case by default, and [A-Z] would then match lowercase letters too.

I am not changing anything except the starts_with() part. In mutate_each(), a function or expression is passed inside a funs() call; here we multiply each of the columns (.) selected by matches() by the Freq column.

According to ?select

‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’

EDIT: Added @docendodiscimus's comment.
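mutate_each() (deprecated in dplyr 0.7.0) and funs() (deprecated in dplyr 0.8.0) are replaced by across() in dplyr >= 1.0.0. A minimal sketch of the same selection with across(), on a small hypothetical data frame where two column names match the pattern and one lowercase name does not:

```r
library(dplyr)

# Hypothetical data: A.1 and B.2 match "^[A-Z]\\.", c.3 does not
df <- data.frame(A.1 = 1:3, B.2 = 4:6, c.3 = 7:9, Freq = c(2, 3, 4))

res <- df %>%
  # ignore.case = FALSE keeps the lowercase c.3 out of the selection
  mutate(across(matches("^[A-Z]\\.", ignore.case = FALSE), ~ .x * Freq))
res
```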

Dividing selected columns by vector in dplyr

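You can use rowwise() with c_across(). Judging from the output, df2 is a one-row data frame with columns b1, b2 and b3, so within each row c_across(a1:a3) / df2 returns a one-row data frame named after df2's columns, and .keep = "unused" drops a1:a3:

df1 %>%
  rowwise() %>%
  mutate(c_across(a1:a3) / df2, .keep = "unused") %>%
  ungroup()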

# # A tibble: 5 x 4
# x b1 b2 b3
# <dbl> <dbl> <dbl> <dbl>
# 1 19 0.333 4 0.4
# 2 38 0.667 8 0.8
# 3 57 1 12 1.2
# 4 76 1.33 16 1.6
# 5 95 1.67 20 2

Alternatively, in base R:

df1[-1] <- t(t(df1[-1]) / unlist(df2))
df1

# # A tibble: 5 x 4
# x a1 a2 a3
# <dbl> <dbl> <dbl> <dbl>
# 1 19 0.333 4 0.4
# 2 38 0.667 8 0.8
# 3 57 1 12 1.2
# 4 76 1.33 16 1.6
# 5 95 1.67 20 2
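Since the topic is sweep-like operations, base R's sweep() also works here: with MARGIN = 2 it divides each column by the matching element of the statistics vector. The df1/df2 values below are hypothetical, chosen only so that the result reproduces the output shown above:

```r
# Hypothetical data (the original df1/df2 are not shown in full)
df1 <- data.frame(x = c(19, 38, 57, 76, 95), a1 = 1:5, a2 = 1:5 * 8, a3 = 1:5 * 2)
df2 <- data.frame(b1 = 3, b2 = 2, b3 = 5)

# MARGIN = 2: divide column j of df1[-1] by the j-th element of unlist(df2)
df1[-1] <- sweep(df1[-1], 2, unlist(df2), "/")
df1
```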

How to select columns depending on multiple conditions in dplyr

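Inside where(), we need to supply a predicate function that returns a single TRUE or FALSE for each column. Passing a bare predicate without where() has been deprecated since tidyselect 1.1.0.

library(dplyr)

# \(x) is the R >= 4.1.0 shorthand for function(x)
select(df1, where(\(x) all(x < 5)))

# or this, which might be more semantically correct
select(df1, where(\(x) is.numeric(x) && all(x < 5)))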

a1
1 1
2 0
3 3
4 0

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)),
  class = "data.frame", row.names = c(NA, -4L))
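Another way to express multiple conditions, sketched on the same data: tidyselect combines selections with & (intersection), so each condition can live in its own where(). where() also accepts a purrr-style formula:

```r
library(dplyr)

df1 <- data.frame(
  X  = c("B", "C", "D", "E"),
  a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1),
  a3 = c(3, 1000, 900, 2)
)

# keep columns that are numeric AND whose values are all below 5
res <- select(df1, where(is.numeric) & where(~ all(.x < 5)))
names(res)
```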

How to exclude a column when applying the sweep function to a data set

You could do this using:

nm1 <- setdiff(colnames(longley), "Year")
# col() gives each cell's column index, so every value is paired with its own column mean
res1 <- longley[nm1] - colMeans(longley[nm1])[col(longley[nm1])]

Or using sweep

res2 <- sweep(longley[nm1], 2, FUN = `-`, apply(longley[nm1], 2, mean))
identical(res1, res2)
#[1] TRUE

Or you can replace the apply with colMeans

sweep(longley[nm1], 2, FUN = `-`, colMeans(longley[nm1]))
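For comparison, a dplyr sketch of the same column-mean centring: across(-Year, ...) applies the function to every column except Year, so no separate vector of names is needed and Year is left untouched:

```r
library(dplyr)

# longley ships with base R (datasets package)
res <- longley %>%
  # centre every column except Year on its own mean
  mutate(across(-Year, ~ .x - mean(.x)))
head(res)
```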

iterate repetitive operation of columns of certain class

We can use mutate_at to divide specific columns

library(dplyr)

t %>% mutate_at(vars(norWT, stateWT),list(avWT1 = ~./avWT))

# Sp norWT stateWT avWT norWT_avWT1 stateWT_avWT1
#1 ALF-01 4.00 4.00 1.1 3.636364 3.636364
#2 AMB-01 74.25 74.25 3.4 21.838235 21.838235

Using base R, you could do it directly as well.

cols <- c("norWT", "stateWT")
t[paste0(cols, "_avWT1")] <- t[cols]/t$avWT

Also, t is the name of a base R function (matrix transpose), so it is better to use some other name for the data frame.


If there are many more columns and we need to apply this only to the numeric columns, we can use mutate_if. It also divides avWT by itself, so that extra avWT_avWT1 column is dropped afterwards:

t %>%
  mutate_if(is.numeric, list(avWT1 = ~ . / avWT)) %>%
  select(-avWT_avWT1)
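mutate_at() and mutate_if() are superseded by across() in dplyr >= 1.0.0, whose .names argument controls the new column names. A sketch covering both variants; the data frame is rebuilt from the printed output above and renamed to df_wt (a hypothetical name) to avoid masking t():

```r
library(dplyr)

# Rebuilt from the output above; renamed from t to df_wt
df_wt <- data.frame(
  Sp = c("ALF-01", "AMB-01"),
  norWT = c(4, 74.25), stateWT = c(4, 74.25), avWT = c(1.1, 3.4)
)

# explicitly named columns
res1 <- df_wt %>%
  mutate(across(c(norWT, stateWT), ~ .x / avWT, .names = "{.col}_avWT1"))

# every numeric column except the divisor itself, so no clean-up select() is needed
res2 <- df_wt %>%
  mutate(across(where(is.numeric) & !avWT, ~ .x / avWT, .names = "{.col}_avWT1"))
res1
```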

What is the tidy equivalent of using `sweep` across rows?

Ah, just as soon as I finally post, I found an answer.

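tidyX %>%
  rowwise() %>%
  # across() without .cols is deprecated since dplyr 1.1.0, so everything() is spelled out
  mutate(across(everything()) * coefs) %>%
  ungroup()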

I still find this syntax nonintuitive, but that does just what I'm looking for.
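A non-row-wise alternative, sketched with purrr::map2() under the assumption that coefs is a plain numeric vector with one element per column of tidyX, in column order; the small tidyX and coefs below are hypothetical stand-ins for the question's data. map2() walks the columns of the data frame and the elements of coefs in parallel, mirroring sweep(x, 2, coefs, "*") in base R:

```r
library(purrr)
library(tibble)

# Hypothetical inputs: one multiplier per column, in column order
tidyX <- tibble(a = 1:3, b = c(10, 20, 30))
coefs <- c(2, 0.5)

# multiply column i by coefs[i]; map2() keeps the column names of tidyX
res <- as_tibble(map2(tidyX, coefs, `*`))
res
```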

How to select only columns (type=factor) with less than n levels with dplyr?

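Just pass a function to select_if(), much like mutate_if() (see ?nlevels). Check is.factor() as well: nlevels() returns 0 for non-factor columns, so the count column n would otherwise be selected too:

Titanic %>%
  as_tibble() %>%  # as_tibble() replaces the deprecated as_data_frame()
  mutate_if(is.character, factor) %>%
  select_if(~ is.factor(.) && nlevels(.) < 4)

Note that you could also write this as: select_if(function(x) is.factor(x) && nlevels(x) < 4)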
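In dplyr >= 1.0.0, select_if() and mutate_if() are superseded by select(where()) and mutate(across(where())). A sketch of the same selection in that style:

```r
library(dplyr)

res <- Titanic %>%
  as_tibble() %>%
  mutate(across(where(is.character), factor)) %>%
  # the is.factor() guard keeps the integer count column n out
  select(where(~ is.factor(.x) && nlevels(.x) < 4))
names(res)
```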

R sweep a dataframe for characters, but only in the parameter columns

You can detect the values in the param columns that consist only of digits (with at most one dot), replace the values that don't with NA, and then convert the columns to numeric.

Example:

df <- data.frame(
  species = letters[1:5],
  param1 = c("123.56", "23", "ds%", "12.ab", "123"),
  param2 = c("%23", "43.23", "abc", "45", "0.23"),
  stringsAsFactors = FALSE
)

library(dplyr)
library(stringr)

df %>%
  mutate(
    across(
      matches("^param[0-9]+"),
      ~ ifelse(str_detect(.x, "^[0-9]+\\.{0,1}[0-9]*$"), .x, NA_character_) %>%
        as.numeric()
    )
  )

gives:

  species param1 param2
1 a 123.56 NA
2 b 23.00 43.23
3 c NA NA
4 d NA 45.00
5 e 123.00 0.23

where param columns are numeric.

Note: param columns must be character and not factors. If they are factors, you need to convert them to characters first; otherwise ifelse() returns the factor's integer codes rather than its labels.
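A sketch that performs that conversion inside the lambda, so the same code also handles factor columns; it swaps stringr::str_detect() for base grepl() to drop the stringr dependency, and the factor version of the example data is hypothetical:

```r
library(dplyr)

# Same example values, but read in as factors
df <- data.frame(
  species = letters[1:5],
  param1 = c("123.56", "23", "ds%", "12.ab", "123"),
  param2 = c("%23", "43.23", "abc", "45", "0.23"),
  stringsAsFactors = TRUE
)

res <- df %>%
  mutate(across(
    matches("^param[0-9]+"),
    ~ {
      x <- as.character(.x)  # work on the labels, not the factor codes
      as.numeric(ifelse(grepl("^[0-9]+\\.?[0-9]*$", x), x, NA_character_))
    }
  ))
res
```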


