Mutate Multiple/Consecutive Columns (With Dplyr or Base R)


Here is one way with the package zoo:

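library(zoo)
# df: 100 numeric columns; average each row over blocks of 10 consecutive columns
# (width = 10 with by = 10 gives non-overlapping windows)
t(rollapply(t(df), width = 10, by = 10, FUN = mean))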

Here is one way to do it with base R:

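# column indices 1:100 arranged as a 10 x 10 matrix; each matrix column is one block
splits <- 1:100
dim(splits) <- c(10, 10)
splits <- split(splits, col(splits))
# row-wise mean of each block, bound side by side into one data frame
results <- do.call("cbind", lapply(splits, function(x) data.frame(rowSums(df[, x] / 10))))
names(results) <- paste0("wave_", 1:10)
results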

Another very succinct way with base R (courtesy of G. Grothendieck):

# for each row, average the values within each of the 10 blocks defined by gl(10, 10)
t(apply(df, 1, tapply, gl(10, 10), mean))
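To check that the block-averaging ideas above agree, here is a small base R sketch on a hypothetical toy data frame (6 columns averaged in blocks of 3 instead of 100 columns in blocks of 10). The names `df_small`, `block_size` and `blocks` are made up for this illustration.

```r
# Hypothetical toy data: 2 rows, 6 numeric columns X1..X6
df_small <- data.frame(matrix(1:12, nrow = 2))
block_size <- 3  # assumed block width for this example

# Column indices grouped into consecutive blocks: 1:3 and 4:6
blocks <- split(seq_along(df_small), ceiling(seq_along(df_small) / block_size))

# rowMeans() over each block of columns, one result column per block
res_rowmeans <- sapply(blocks, function(idx) rowMeans(df_small[, idx, drop = FALSE]))

# The tapply() idea: for every row, average the values by a block factor
res_tapply <- t(apply(df_small, 1, tapply, gl(length(blocks), block_size), mean))

res_rowmeans
```

With the real data, a block width of 10 (and `gl(10, 10)`) corresponds to the 100-column case.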

And here is a solution with dplyr and tidyr:

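library(dplyr)
library(tidyr)
df$row <- 1:nrow(df)
# long format: one row per (row, original column)
df2 <- df %>% gather(column, value, -row)
# map column names X1..X100 to the blocks (0,10], (10,20], ..., (90,100]
df2$column <- cut(as.numeric(gsub("X", "", df2$column)), breaks = c(0:10 * 10))
# mean of each block per row; ungroup so that row can be dropped at the end
df2 <- df2 %>% group_by(row, column) %>% summarise(value = sum(value) / 10) %>% ungroup()
df2 %>% spread(column, value) %>% select(-row)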
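The key step in the tidyr answer is `cut()`, which turns the numeric part of each column name into a block label. A quick base R check with a few hypothetical column names:

```r
# Hypothetical column names of the form X<number>
col_num <- as.numeric(gsub("X", "", c("X1", "X10", "X11", "X100")))

# Intervals are open on the left and closed on the right: (0,10], (10,20], ...
blocks <- cut(col_num, breaks = c(0:10 * 10))

as.character(blocks)
```

Because the intervals are right-closed, column X10 still belongs to the first block and X11 starts the second one.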

Mutate multiple / consecutive columns (with dplyr)

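Because rowSums() already works on each row, it can be used directly within mutate(), and no group_by() is needed to get the row totals. group_by() is only used afterwards, to add up those row totals within each `ITEM#`. Note that distinct() without .keep_all = TRUE returns only the 'DESCRIPTION' column with its duplicate values removed, dropping the other columns.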

library(dplyr)
df1 %>%
  # row totals over columns 4 to 17, ignoring NAs
  mutate(Total = rowSums(.[4:17], na.rm = TRUE)) %>%
  # then add up those row totals within each ITEM#
  group_by(`ITEM#`) %>%
  mutate(Total = sum(Total, na.rm = TRUE))

NOTE: Judging by the 'DESCRIPTION' values in the image, all the elements are unique, so distinct() is not needed.
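For comparison, here is a minimal base R sketch of the same two steps (row totals, then a total per item). The toy `df1` with three numeric columns `m1`, `m2`, `m3` is a hypothetical stand-in for the question's data, and the per-item total goes into a separate, made-up column `ItemTotal` instead of overwriting `Total`.

```r
# Hypothetical stand-in for the question's df1: an item id plus three numeric columns
df1 <- data.frame(`ITEM#` = c("A", "A", "B"),
                  m1 = c(1, NA, 3),
                  m2 = c(2, 2, NA),
                  m3 = c(0, 4, 5),
                  check.names = FALSE)

# Step 1: row totals, ignoring NAs (like rowSums(.[4:17], na.rm = TRUE))
df1$Total <- rowSums(df1[c("m1", "m2", "m3")], na.rm = TRUE)

# Step 2: sum of the row totals within each item, repeated on every row of
# that item (like group_by(`ITEM#`) followed by mutate(sum(Total)))
df1$ItemTotal <- ave(df1$Total, df1$`ITEM#`, FUN = sum)

df1
```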

Mutate multiple columns with conditions using dplyr

Although I prefer a solution with all variables in one column, as suggested by @Patrick (I would use something like %>% mutate(new_col = case_when(etc...))), here is a way with a for-loop:

library(dplyr)

# I changed your data a tiny bit
df <- tibble("a" = sample(1990:2000, size = 10), # better to use 'sample' than 'runif'!
             "event" = 1995) %>%
  mutate("relative_event" = a - event)

Now the actual work

for (i in min(df$relative_event):max(df$relative_event)) {

  # the indexing value is your difference in years, so run the index from the lowest difference to the highest

  if (i < 0) {
    # negative differences: "before" columns event_b1, event_b2, ...
    df[[paste0('event_b', abs(i))]] <- ifelse(i == df$relative_event, 1, 0)
  }
  if (i >= 0) {
    # zero and positive differences: "forward" columns event_f0, event_f1, ...
    df[[paste0('event_f', abs(i))]] <- ifelse(i == df$relative_event, 1, 0)
  }
}
df

# A tibble: 10 x 14
       a event relative_event event_b5 event_b4 event_b3 event_b2 event_b1
   <int> <dbl>          <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
 1  1990  1995             -5        1        0        0        0        0
 2  1992  1995             -3        0        0        1        0        0
 3  1991  1995             -4        0        1        0        0        0
 4  2000  1995              5        0        0        0        0        0
 5  1998  1995              3        0        0        0        0        0
 6  1993  1995             -2        0        0        0        1        0
 7  1996  1995              1        0        0        0        0        0
 8  1997  1995              2        0        0        0        0        0
 9  1994  1995             -1        0        0        0        0        1
10  1999  1995              4        0        0        0        0        0
# ... with 6 more variables: event_f0 <dbl>, event_f1 <dbl>, event_f2 <dbl>,
#   event_f3 <dbl>, event_f4 <dbl>, event_f5 <dbl>

If you don't want to run through every possible difference in years (this creates all-zero columns for differences that never occur), you could simply create a vector with unique(df$relative_event) and run i through that vector instead.
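A minimal sketch of that variant, looping only over the differences that actually occur. It rebuilds the example data in base R (`data.frame()` instead of `tibble()`), and `set.seed(1)` is only there to make the random sample reproducible.

```r
set.seed(1)  # only so the toy data is reproducible
df <- data.frame(a = sample(1990:2000, size = 10), event = 1995)
df$relative_event <- df$a - df$event

# Loop only over the differences that occur, in increasing order
for (i in sort(unique(df$relative_event))) {
  # negative differences -> event_b<k> ("before"), others -> event_f<k>
  prefix <- if (i < 0) "event_b" else "event_f"
  df[[paste0(prefix, abs(i))]] <- as.numeric(df$relative_event == i)
}

df
```

Since no difference is skipped or repeated, every row ends up with exactly one indicator equal to 1 and no all-zero columns are created.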

Mutating multiple columns in a data frame using dplyr

You are really close.

library(dplyr)

df2 <- df %>%
  mutate(v1v3 = v1 * v3,
         v2v4 = v2 * v4)

Such a beautifully simple language, right?
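When there are many pairs, a base R loop over a list of column pairs avoids writing one line per product. This is a sketch: the toy `df` and the `pairs` list are hypothetical, and each new column is named by pasting the two source names together (v1v3, v2v4).

```r
# Hypothetical toy data with the four columns from the question
df <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12)

# Hypothetical list of column pairs to multiply
pairs <- list(c("v1", "v3"), c("v2", "v4"))

# One new column per pair, named by pasting the two source names together
for (p in pairs) {
  df[[paste0(p[1], p[2])]] <- df[[p[1]]] * df[[p[2]]]
}

df
```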


EDIT:
Thanks to @Facotton's pointer to this answer: https://stackoverflow.com/a/34377242/5088194, here is a tidy approach to this problem. It avoids hard-coding a separate line for each new column. While it is a bit more verbose than the base R approach, its logic is at least more transparent and readable. Each row gets an id, the data is reshaped to long format, the paired columns are multiplied within each row, and the products are spread back into one column per pair before being joined to the original rows.

library(dplyr)
library(tidyr)

# give every row an id so the products can be joined back later
df <-
  df %>%
  mutate(row_id = row_number())

# converting data to tidy format and pairing columns to be multiplied together
# (v1 with v3, v2 with v4)
tidy_df <-
  df %>%
  gather(column, value, -row_id) %>%
  mutate(column = as.numeric(sub("v", "", column)),
         pair = column - 2) %>%
  mutate(pair = if_else(pair < 1, pair + 2, pair))

# multiply the members of each pair within each row, then spread the
# products into one column per pair (v1v3, v2v4)
prod_df <-
  tidy_df %>%
  group_by(row_id, pair) %>%
  summarize(val = prod(value), .groups = "drop") %>%
  mutate(pair = paste0("v", pair, "v", pair + 2)) %>%
  spread(pair, val)

# put the original frame and the product columns together
final_df <-
  df %>%
  left_join(prod_df, by = "row_id") %>%
  select(-row_id)

How to select consecutive columns with across function dplyr

across() takes two basic arguments: the first selects the columns to be modified, and the second is the function to apply to those columns. In addition, vars() is no longer needed to select the variables. Thus, the correct form is:

library(dplyr)

# replace NAs with 0 in the consecutive columns V1 through V4 only
d %>%
  mutate(across(V1:V4, ~ replace(., is.na(.), 0)))

   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   2  6  0  6  5  6 10  5  3   1
2   2  9  2  4 10  6  9  4 NA  NA
3   5  5  3  0  3  7  1  5  9   5
4   7  1  1  6  2  1  8 NA  8   4
5   3  5  3  0  2  3  4  2  3  NA
6   0 10  0  2  5 10  1 10  4   3
7   4  3 10  6 NA  5  9  3  3   9
8   9  9  8  5  8  1  3  1 NA  10
9   6  3  0  1  1  9  3  5  8   4
10  3  2  9  1  5  2  4 NA  6   1
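The same replacement in base R, as a sketch on randomly generated data (`set.seed(42)` and the 10 x 10 frame are assumptions made only for the example). The column range is computed from the names, so it behaves like `V1:V4` in `across()`.

```r
set.seed(42)  # reproducible toy data, not the data from the question
d <- as.data.frame(matrix(sample(c(0:10, NA), 100, replace = TRUE), nrow = 10))
d_before <- d

# Positions of the consecutive columns V1 through V4
cols <- match("V1", names(d)):match("V4", names(d))

# Replace NAs with 0 in those columns only; V5:V10 keep their NAs
d[cols][is.na(d[cols])] <- 0

d
```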

Mutate across multiple columns using dplyr

Two possibilities using dplyr:

library(dplyr)

mtcars %>%
rowwise() %>%
mutate(varmean = mean(c_across(mpg:vs)))

This returns

# A tibble: 32 x 12
# Rowwise: 
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb varmean
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4    40.0
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4    40.1
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1    31.7
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    52.8
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    73.2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    47.7
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    81.2
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2    33.1
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    36.7
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    42.8
# ... with 22 more rows

And without rowwise(), using base R's rowMeans():

mtcars %>% 
mutate(varmean = rowMeans(across(mpg:vs)))

returns

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  varmean
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 39.99750
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 40.09938
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 31.69750
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 52.76687
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 73.16375
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 47.69250
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 81.24000
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 33.12250
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 36.69625
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 42.80750
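Without dplyr at all, `subset(..., select = mpg:vs)` selects the same consecutive range of columns in base R. A short sketch: `rowMeans()` for the mean, and `apply()` as the general fallback for any other row-wise summary (the median is just an example choice).

```r
# Consecutive columns mpg through vs of the built-in mtcars data
mpg_to_vs <- subset(mtcars, select = mpg:vs)

# Row means, matching rowMeans(across(mpg:vs))
varmean <- rowMeans(mpg_to_vs)

# Any row-wise summary via apply(), here the median of each row
varmedian <- apply(mpg_to_vs, 1, median)

head(cbind(mtcars, varmean, varmedian))
```

`apply()` is slower than `rowMeans()`, much like `rowwise()` is slower than a vectorised row function in dplyr.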

R: mutate over multiple columns to create a new column

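library(dplyr)

Test2 <- Test %>%
  dplyr::select(starts_with("Test")) %>%
  # TRUE where a value is one of the codes of interest
  mutate_all(function(x) x %in% c("DF60", "DF61", "DF62", "DF63")) %>%
  # out = 1 if at least one column matched in that row, else 0
  mutate(out = ifelse(rowSums(.) < 1, 0, 1))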

Adjustment after comment

If you want to keep the other columns, mutate_at(), as proposed by yutannihilation, is far better. The problem then becomes computing rowSums() inside mutate() on a selection of the columns. I'm not sure the following is best practice, but it works (reworked from an answer to a previous question of mine: dplyr mutate on column subset (one function on all these columns combined)).

library(tidyverse)

Test1 <- c("DF64", "DF63", "DF89", "DF30", "DF70")
Test2 <- c("DF61", "DF25", "DF00", "DF30", "DF99")
Test3 <- c("DF80", "DF63", "DF60", "DF63", "DF70")
Test <- data.frame(Test1, Test2, Test3)

Test$ExtraCol <- LETTERS[1:5]

Test2 <- Test %>%
  # adds Test1_bin, Test2_bin, Test3_bin next to the original columns
  mutate_at(vars(starts_with("Test")), list(bin = ~ . %in% c("DF60", "DF61", "DF62", "DF63"))) %>%
  # wrap the whole frame in a one-element list so that . inside mutate refers to all of it
  split(., 1 < 10) %>%
  map_df(~ mutate(., out = rowSums(.[paste0("Test", 1:3, "_bin")]) > 0))

Test1 Test2 Test3 ExtraCol Test1_bin Test2_bin Test3_bin   out
 DF64  DF61  DF80        A     FALSE      TRUE     FALSE  TRUE
 DF63  DF25  DF63        B      TRUE     FALSE      TRUE  TRUE
 DF89  DF00  DF60        C     FALSE     FALSE      TRUE  TRUE
 DF30  DF30  DF63        D     FALSE     FALSE      TRUE  TRUE
 DF70  DF99  DF70        E     FALSE     FALSE     FALSE FALSE
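For comparison, here is a base R sketch that keeps all columns and adds only `out`, skipping the intermediate `_bin` columns. The names `codes`, `test_cols` and `hits` are introduced just for this example.

```r
Test <- data.frame(Test1 = c("DF64", "DF63", "DF89", "DF30", "DF70"),
                   Test2 = c("DF61", "DF25", "DF00", "DF30", "DF99"),
                   Test3 = c("DF80", "DF63", "DF60", "DF63", "DF70"),
                   ExtraCol = LETTERS[1:5])

codes <- c("DF60", "DF61", "DF62", "DF63")

# Which columns start with "Test"? (ExtraCol is left alone)
test_cols <- startsWith(names(Test), "Test")

# Logical matrix: TRUE where a value is one of the codes
hits <- sapply(Test[test_cols], function(x) x %in% codes)

# TRUE if at least one Test column matched in that row
Test$out <- rowSums(hits) > 0

Test
```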

Sum across multiple columns with dplyr

dplyr >= 1.0.0 using across

sum up each row using rowSums() (rowwise() works for any aggregation, but is slower)

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))

sum down each column

# assumes every column is numeric; NAs are skipped instead of turning the sum into NA
df %>%
  summarise(across(everything(), ~ sum(., na.rm = TRUE)))

dplyr < 1.0.0

sum up each row

df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))

sum down each column using the superseded summarise_all():

df %>%
  replace(is.na(.), 0) %>%
  summarise_all(sum)
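In base R, both directions take one call each, and `na.rm = TRUE` gives the same sums as replacing NAs with 0 first. A sketch on a small hypothetical data frame:

```r
# Hypothetical toy data with a few NAs
df <- data.frame(x1 = c(1, NA, 3), x2 = c(4, 5, NA), x3 = c(NA, 2, 2))

# Keep only the numeric columns (all of them here)
num_cols <- vapply(df, is.numeric, logical(1))

# Sum down each column, treating NA as 0
col_totals <- colSums(df[num_cols], na.rm = TRUE)

# Sum across each row, treating NA as 0
df$sum <- rowSums(df[num_cols], na.rm = TRUE)

col_totals
df
```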

