How to Sweep Specific Columns with Dplyr

How do I sweep specific columns with dplyr?

As of dplyr 1.0.0, you can combine rowwise() with across():

data %>%
 rowwise() %>%
 mutate(across(A:D)/factors)

     ID Type      A       B      C      D
  <dbl> <chr> <dbl>   <dbl>  <dbl>  <dbl>
1     1 X         3   0.833   3.75   5.33
2     2 X       174 107.     82.5   76   
3     3 X         6   1.67    2.5    5.33
4     4 Y      1377 849.    312.   335.  
5     5 Y       537 353.    161.   165.  
6     6 Y       173 116.     50     50.7
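
Since the answer does not show the input, here is a minimal reproducible sketch of the same pattern. The toy `data` and `factors` below are assumptions made up for illustration, not the asker's values. Under rowwise(), across() returns a one-row data frame, so each element of `factors` lines up with one selected column.

```r
library(dplyr)

# Hypothetical input: the original `data` and `factors` are not shown in the answer
data <- tibble(ID = 1:3, A = c(6, 12, 18), B = c(10, 20, 30))
factors <- c(2, 5)  # one divisor per selected column, in column order

res <- data %>%
  rowwise() %>%                      # evaluate mutate() one row at a time
  mutate(across(A:B) / factors) %>%  # 1-row data frame divided element-wise
  ungroup()

# Base R equivalent, for comparison
expected <- sweep(as.matrix(data[c("A", "B")]), 2, factors, "/")
```

Here `res$A` and `res$B` should agree with the columns of `expected`; ID is left untouched.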

Sweep-like operations with dplyr/tidyverse


You can achieve what you're after by converting data in numeric columns from wide to long (using gather), grouping by rows (using group_by), subtracting the minimum (using mutate), and converting back from long to wide (using spread).

library(tidyverse)
df %>%
    gather(k, v, starts_with("X")) %>%
    group_by(nm) %>%
    mutate(v = v - min(v)) %>%
    spread(k, v) %>%
    select(names(df))
## A tibble: 5 x 7
## Groups:   nm [5]
#  nm     X1799.38928 X1798.01526 X1796.64124 source color   rep
#  <fct>        <dbl>       <dbl>       <dbl>  <int> <fct> <int>
#1 s001c1       18.6         5.72        0.        1 c         1
#2 s001c2       14.2         0.         12.0       1 c         2
#3 s001c3        0.         16.8        21.8       1 c         3
#4 s001c4        0.         11.4        17.8       1 c         4
#5 s001c5        6.80        0.          3.58      1 c         5
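
In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below follows the same wide-to-long, group, subtract, long-to-wide steps with the newer functions. It uses a small made-up data frame (an assumption for illustration) rather than the sample data from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two "X" measurement columns
df <- tibble(
  nm  = c("s1", "s2"),
  X1  = c(10, 7),
  X2  = c(12, 5),
  rep = 1:2
)

res <- df %>%
  pivot_longer(starts_with("X"), names_to = "k", values_to = "v") %>%
  group_by(nm) %>%                      # one group per original row
  mutate(v = v - min(v)) %>%            # subtract the row minimum
  ungroup() %>%
  pivot_wider(names_from = k, values_from = v) %>%
  select(all_of(names(df)))             # restore the original column order
```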

Sample data

df <- read.table(text =
    "nm X1799.38928 X1798.01526 X1796.64124 source color rep
1 s001c1   13901.944   13889.056   13883.334     01     c   1
2 s001c2   17293.586   17279.375   17291.365     01     c   2
3 s001c3    8011.764    8028.584    8033.548     01     c   3
4 s001c4    7499.272    7510.719    7517.064     01     c   4
5 s001c5   20300.408   20293.604   20297.185     01     c   5")

Selecting specific columns when using mutate_each function from dplyr

For these cases, matches() is more appropriate than starts_with():

  df %>%
      mutate_each(funs(.*Freq), matches("^[A-Z]\\.", ignore.case=FALSE))

Here, I am assuming that you want to select only the columns whose names start with a capital letter (^[A-Z]) followed by a literal dot. The dot has to be escaped (\\.); otherwise it matches any single character.

The only thing I changed is the starts_with part. To apply a function in mutate_each, wrap it in a funs call. The code above multiplies each column selected by matches (represented by .) by the 'Freq' column.

According to ?select:

‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’

EDIT: Added @docendodiscimus's comment.
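
Note that mutate_each() and funs() are deprecated in recent dplyr releases. A rough equivalent with across() might look like the sketch below; the column names and values are invented for illustration, since the original data frame is not shown.

```r
library(dplyr)

# Hypothetical data: two columns matching "^[A-Z]\\." and one that does not
df <- data.frame(
  A.x  = c(1, 2),
  B.y  = c(3, 4),
  a.z  = c(5, 6),     # lower-case start, so it should be left alone
  Freq = c(10, 100)
)

res <- df %>%
  mutate(across(matches("^[A-Z]\\.", ignore.case = FALSE), ~ .x * Freq))
```

Only A.x and B.y are multiplied by Freq; a.z stays as it was because the match is case-sensitive.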

Dividing selected columns by vector in dplyr

You can use rowwise() with c_across():

df1 %>%
  rowwise() %>% 
  mutate(c_across(a1:a3) / df2, .keep = "unused") %>%
  ungroup()

# # A tibble: 5 x 4
#       x    b1    b2    b3
#   <dbl> <dbl> <dbl> <dbl>
# 1    19 0.333     4   0.4
# 2    38 0.667     8   0.8
# 3    57 1        12   1.2
# 4    76 1.33     16   1.6
# 5    95 1.67     20   2

Another option, in base R:

df1[-1] <- t(t(df1[-1]) / unlist(df2))
df1

# # A tibble: 5 x 4
#       x    a1    a2    a3
#   <dbl> <dbl> <dbl> <dbl>
# 1    19 0.333     4   0.4
# 2    38 0.667     8   0.8
# 3    57 1        12   1.2
# 4    76 1.33     16   1.6
# 5    95 1.67     20   2
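
To see why the double transpose is needed, it may help to compare it with plain division on a small matrix. The matrix `m` and divisor vector `v` below are made up for illustration. R recycles a vector down the rows (column-major order), so the divisors only line up with columns after transposing.

```r
# Hypothetical matrix and divisors (not the df1/df2 from the answer)
m <- matrix(c(2, 4, 6, 10, 20, 30), nrow = 3,
            dimnames = list(NULL, c("a1", "a2")))
v <- c(2, 10)  # one divisor per column

wrong <- m / v                # v is recycled down the rows, not across columns
right <- t(t(m) / v)          # transpose, divide, transpose back
same  <- sweep(m, 2, v, "/")  # sweep() expresses the same column-wise division
```

`right` and `same` both divide a1 by 2 and a2 by 10, while `wrong` mixes the two divisors within each column.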

How to select columns depending on multiple conditions in dplyr

Inside where(), we need to supply a function that returns a single logical value (TRUE or FALSE) for each column.

library(dplyr)

select(df1, where(\(x) all(x < 5)))

# or this, which is safer because it checks the column type first
select(df1, where(\(x) is.numeric(x) && all(x < 5)))

  a1
1  1
2  0
3  3
4  0

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0), 
    a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA, 
-4L))
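
As a self-contained check, the sketch below reuses the data above and writes the predicate as a named function. The helper name `small_numeric` is hypothetical. Using `&&` means the comparison is only evaluated for numeric columns, so the character column X is rejected before `x < 5` is ever tried.

```r
library(dplyr)

df1 <- data.frame(
  X  = c("B", "C", "D", "E"),
  a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1),
  a3 = c(3, 1000, 900, 2)
)

# Hypothetical helper: TRUE only for numeric columns whose values are all below 5
small_numeric <- function(x) is.numeric(x) && all(x < 5)

res <- select(df1, where(small_numeric))
```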

How to exclude a column when applying the sweep function to a data set

You could do this using:

  nm1 <- setdiff(colnames(longley), "Year")
  res1 <- longley[nm1]-colMeans(longley[nm1])[col(longley[nm1])]

Or using sweep

 res2 <- sweep(longley[nm1], 2, FUN=`-`, apply(longley[nm1], 2, mean))
 identical(res1, res2)
 #[1] TRUE

Or you can replace the apply with colMeans

 sweep(longley[nm1], 2, FUN=`-`, colMeans(longley[nm1]))
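
For comparison, a dplyr version of the same centering could look like the following sketch. It keeps Year in the result instead of dropping it, which is a design choice of this example rather than something the answer above does.

```r
library(dplyr)

nm1 <- setdiff(colnames(longley), "Year")

# Base R: subtract each column mean, excluding Year
res_sweep <- sweep(longley[nm1], 2, colMeans(longley[nm1]), FUN = `-`)

# dplyr: center every column except Year, leaving Year unchanged
res_dplyr <- longley %>%
  mutate(across(-Year, ~ .x - mean(.x)))
```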

Iterate a repetitive operation over columns of a certain class

We can use mutate_at to divide specific columns by avWT:

library(dplyr)

t %>% mutate_at(vars(norWT, stateWT),list(avWT1 = ~./avWT))

#     Sp norWT stateWT avWT norWT_avWT1 stateWT_avWT1
#1 ALF-01  4.00    4.00  1.1    3.636364      3.636364
#2 AMB-01 74.25   74.25  3.4   21.838235     21.838235

Using base R, you could do it directly as well.

cols <- c("norWT", "stateWT")
t[paste0(cols, "_avWT1")] <- t[cols]/t$avWT

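Also, t is the name of a base R function (matrix transpose), so it is better to use a different name for the data frame.

If there are many more columns and we need to apply this only to the numeric ones, we can use mutate_if. Because avWT is itself numeric, it is also divided by itself, so the resulting avWT_avWT1 column is dropped afterwards: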

t %>%
  mutate_if(is.numeric, list(avWT1 = ~./avWT)) %>%
  select(-avWT_avWT1)
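
In dplyr 1.0.0 and later, the scoped verbs (mutate_at, mutate_if) are superseded by across(). A sketch of the first example in that style follows; the two rows are taken from the output printed above, and the data frame name `dat` is an arbitrary choice that avoids masking t().

```r
library(dplyr)

# The two rows shown in the printed output above
dat <- data.frame(
  Sp      = c("ALF-01", "AMB-01"),
  norWT   = c(4, 74.25),
  stateWT = c(4, 74.25),
  avWT    = c(1.1, 3.4)
)

res <- dat %>%
  mutate(across(c(norWT, stateWT), ~ .x / avWT, .names = "{.col}_avWT1"))
```

The `.names` argument reproduces the `norWT_avWT1` and `stateWT_avWT1` column names that mutate_at generated.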

What is the tidy equivalent of using `sweep` across rows?

Ah, just as soon as I finally post, I found an answer.

tidyX %>% 
  rowwise() %>% 
  mutate(across(everything()) * coefs)

I still find this syntax nonintuitive, but it does exactly what I was looking for.
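
If the rowwise() form feels opaque, one alternative (a different technique from the one above) is to look up each column's coefficient by name with cur_column(). This is only a sketch: it assumes `coefs` is a named vector whose names match the column names, and the values of `tidyX` and `coefs` here are invented.

```r
library(dplyr)

# Hypothetical data and coefficients, named after the columns they apply to
tidyX <- tibble(x1 = c(1, 2, 3), x2 = c(4, 5, 6))
coefs <- c(x1 = 10, x2 = 100)

res <- tidyX %>%
  mutate(across(everything(), ~ .x * coefs[[cur_column()]]))

# Base R equivalent, for comparison
expected <- sweep(as.matrix(tidyX), 2, coefs, "*")
```

This stays vectorised over rows, which may matter on larger data, at the cost of requiring the names to match.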

How to select only columns (type=factor) with less than n levels with dplyr?

Just pass a function to select_if, much like mutate_if (see ?nlevels):

Titanic %>%
  as_tibble() %>%
  mutate_if(is.character, factor) %>%
  select_if(~ nlevels(.) < 4)

Note that you could also write this as: select_if(function(x) nlevels(x) < 4)
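
One thing to be aware of: nlevels() returns 0 for non-factor columns, so the predicate above also keeps numeric columns such as the count column. A sketch using select() with where() and an explicit is.factor() check is shown below. It uses base as.data.frame() on the built-in Titanic table, which already yields factor columns plus a numeric Freq column.

```r
library(dplyr)

# as.data.frame() on a table gives factor columns and a numeric Freq column
titanic_df <- as.data.frame(Titanic)

res <- titanic_df %>%
  select(where(function(x) is.factor(x) && nlevels(x) < 4))
```

Class has four levels and Freq is not a factor, so both are dropped.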

R sweep a dataframe for characters, but only in the parameter columns

You can detect which values in the param columns consist only of digits (with at most one decimal point), replace everything else with NA, and then convert the columns to numeric.

Example:

df <- data.frame(
  species = letters[1:5],
  param1 = c("123.56", "23", "ds%", "12.ab", "123"),
  param2 = c("%23", "43.23", "abc", "45", "0.23"),
  stringsAsFactors = FALSE
)

library(dplyr)
library(stringr)

df %>%
  mutate(
    across(
      matches("^param[0-9]+"),
      ~ifelse(str_detect(.x, "^[0-9]+\\.{0,1}[0-9]*$"), .x, NA_character_) %>%
        as.numeric()
    )
  )

gives:

  species param1 param2
1       a 123.56     NA
2       b  23.00  43.23
3       c     NA     NA
4       d     NA  45.00
5       e 123.00   0.23

where the param columns are now numeric.

Note: the param columns must be character, not factor. If they are factors, convert them to character first.
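
The factor case can be folded into the same step. Calling as.numeric() directly on a factor returns its integer level codes rather than the labels, which is why the conversion to character matters. The sketch below uses base grepl() instead of stringr; the helper name `is_number` and the small data frame are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data where the param column arrived as a factor
df <- data.frame(
  species = c("a", "b", "c"),
  param1  = factor(c("123.56", "ds%", "23")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for strings made of digits with at most one dot
is_number <- function(x) grepl("^[0-9]+\\.?[0-9]*$", x)

res <- df %>%
  mutate(across(matches("^param[0-9]+"), function(x) {
    chr <- as.character(x)  # factor labels, not level codes
    as.numeric(ifelse(is_number(chr), chr, NA_character_))
  }))
```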