How to Search for Columns with Same Name, Add The Column Values and Replace These Columns with Same Name by Their Sum? Using R

Sum and replace columns with same name R for a data frame containing different classes

Try

dat1 <- dat #to keep a copy of the original dataset 
indx <- sapply(dat, is.numeric)#check which columns are numeric
nm1 <- which(indx)#get the numeric index of the column
indx2 <- duplicated(names(nm1))#check which among the
# integer columns are duplicated
#use `Map` after splitting the "nm1" with its "names", do the `rowSums`
dat[ nm1[!indx2]] <- Map(function(x,y) rowSums(x[y]), list(dat),
split(nm1, names(nm1)))

dat[ -nm1[indx2]]

Update

Or to make it more efficient, only take the "duplicated" and "numeric" columns while leaving the others intact. Create an "index" (indx2) of columns that are duplicated. Subset the "nm1" based on the "indx2" and then do rowSums as described above. Finally, remove the unwanted columns (duplicated ones) by using the "indx3"

 indx2 <- duplicated(names(nm1))|duplicated(names(nm1),fromLast=TRUE)
nm2 <- nm1[indx2]
indx3 <- duplicated(names(nm2))
dat[nm2[!indx3]] <- Map(function(x,y) rowSums(x[y]),
list(dat),split(nm2, names(nm2)))
datN <- dat[ -nm2[indx3]]
datN
# col1 col2 col3 col4 col5
#1 16 23 2 10 10
#2 10 18 12 8 18
#3 21 23 15 6 10
#4 14 37 3 5 15
#5 29 39 5 1 11
#6 26 31 14 2 20
#7 25 31 2 8 10
#8 36 31 12 8 6
#9 32 26 13 6 4
#10 16 38 1 7 3

Checking the results

 rowSums(dat1[names(dat1) %in% 'col1'])
#[1] 16 10 21 14 29 26 25 36 32 16
rowSums(dat1[names(dat1) %in% 'col2'])
#[1] 23 18 23 37 39 31 31 31 26 38

data

dat <- structure(list(col1 = c(6L, 5L, 15L, 11L, 14L, 19L, 6L, 16L, 
17L, 6L), col2 = c(13L, 8L, 14L, 14L, 7L, 19L, 4L, 1L, 11L, 3L
), col3 = structure(c(2L, 5L, 8L, 3L, 4L, 7L, 2L, 5L, 6L, 1L), .Label = c("1",
"2", "3", "5", "12", "13", "14", "15"), class = "factor"), col2 = c(7L,
5L, 8L, 3L, 19L, 5L, 15L, 13L, 14L, 20L), col4 = structure(c(7L,
6L, 4L, 3L, 1L, 2L, 6L, 6L, 4L, 5L), .Label = c("1", "2", "5",
"6", "7", "8", "10"), class = "factor"), col5 = c(10L, 18L, 10L,
15L, 11L, 20L, 10L, 6L, 4L, 3L), col1 = c(10L, 5L, 6L, 3L, 15L,
7L, 19L, 20L, 15L, 10L), col2 = c(3L, 5L, 1L, 20L, 13L, 7L, 12L,
17L, 1L, 15L)), .Names = c("col1", "col2", "col3", "col2", "col4",
"col5", "col1", "col2"), row.names = c(NA, -10L), class = "data.frame")

Sum column with similar name

You could use

library(dplyr)
df %>%
mutate(across(starts_with("AB"),
~.x + df[[gsub("AB", "XB", cur_column())]],
.names = "sum_{.col}"))

This returns

# A tibble: 1 x 9
AB1 AB3 AB4 XB1 XB3 XB4 sum_AB1 sum_AB3 sum_AB4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12 34 0 5 3 7 17 37 7
  • We use across and mutate in this approach.
  • First we select all columns starting with AB. The desired sums are always ABn + XB2, so we can use this pattern.
  • Next we replace AB in the name of the current selected column with XB and sum those two columns. These sums are stored in a new column prefixed with sum_.

Row-wise sum of values grouped by columns with same name

We can transpose dat , calculate rowsum per group (colnames of the original dat), then transpose the result back to original structure.

t(rowsum(t(dat), group = colnames(dat), na.rm = T))
# A C G T
#1 1 0 1 0
#2 4 0 6 0
#3 0 1 0 1
#4 2 0 1 0
#5 1 0 1 0
#6 0 1 0 1
#7 0 1 0 1

Sum multiple columns that have specific name in columns

Another dplyr way is to use helper functions starts_with to select columns and then use rowSums to sum those columns.

library(dplyr)

df$Vars <- df %>% select(starts_with("Var")) %>% rowSums()
df$Cols <- df %>% select(starts_with("Col")) %>% rowSums()

df
# ID Var1 Var2 Col1 Col2 Vars Cols
#1 1 34 22 34 24 56 58
#2 2 3 25 54 65 28 119
#3 3 87 68 14 78 155 92
#4 4 66 98 98 100 164 198
#5 5 55 13 77 2 68 79

sum of one column (with same name) in all data frames in a list

You can use colSums, i.e.

do.call(rbind, lapply(l, function(i)colSums(i['X1'])))
# X1
#[1,] 6
#[2,] 8

Replace column based on column names

With ifelse and sapply:

df[2:ncol(df)] <- sapply(2:ncol(df), function(i) ifelse(df[i] == colnames(df[i]), 2, 1))

output

#> df
Name D A D E
1 Rose 2 1 1 1
2 Smith 1 2 2 1
3 Lora 1 2 2 1
4 Javid 1 1 2 1
5 Ahmed 1 2 1 1
6 Helen 1 2 2 1
7 Nadia 1 2 2 1

data

df <- structure(list(Name = c("Rose", "Smith", "Lora", "Javid", "Ahmed", 
"Helen", "Nadia"), D = c("D", "B", "A", "A", "C", "B", "A"),
A = c("D", "A", "A", "D", "A", "A", "A"), D = c("C", "D",
"D", "D", "E", "D", "D"), E = c("B", "D", "D", "B", "A",
"D", "A")), class = "data.frame", row.names = c(NA, -7L))


Related Topics



Leave a reply



Submit