R: Colsums When Not All Columns Are Numeric

How do I make a column that sums only numeric columns?

I ended up using this code:

df$SUMCOL <- rowSums(df[sapply(df, is.numeric)], na.rm = TRUE)
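For reference, here is a minimal sketch of that line on a small data frame. The columns name, x and y are hypothetical and only for illustration:

```r
# Hypothetical mixed-type data frame: one character column, two numeric ones
df <- data.frame(
  name = c("a", "b", "c"),
  x = c(1, 2, NA),
  y = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for numeric columns (x and y), FALSE for name
num_cols <- sapply(df, is.numeric)

# Row-wise sum over the numeric columns only; NA values are skipped
df$SUMCOL <- rowSums(df[num_cols], na.rm = TRUE)
df$SUMCOL  # 11, 22 and 30
```

SUMCOL is itself numeric, so if you run the line a second time it will be included in the sum. In that case, exclude it by name.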

Sum all values in every column of a data.frame in R

You can use the function colSums() to calculate the sum of each column. [,-1] ensures that the first column, which holds people's names, is excluded.

colSums(people[,-1])
# Height Weight
#    199    425

Assuming there could be multiple columns that are not numeric, or that your column order is not fixed, a more general approach would be:

colSums(Filter(is.numeric, people))
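The following sketch shows why the position-based version is fragile. It uses a hypothetical people data frame in which the name column is not first:

```r
# Hypothetical data where the Name column is not the first column
people <- data.frame(
  Height = c(150, 49),
  Name = c("Ann", "Bob"),
  Weight = c(200, 225),
  stringsAsFactors = FALSE
)

# Dropping the first column by position removes Height and keeps Name,
# so colSums() stops with an error about non-numeric input
by_position <- tryCatch(colSums(people[, -1]), error = function(e) e)
inherits(by_position, "error")  # TRUE

# Filter() keeps only the columns for which is.numeric() returns TRUE
colSums(Filter(is.numeric, people))  # Height 199, Weight 425
```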

Sum over numeric columns and report indices of column sums that lie within a specified range

Find out which columns are numeric and use colSums to get their sums.

cols <- sapply(dat, is.numeric)
column_sum <- colSums(dat[cols])
column_sum

# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 55 155 255 355 455 555 655 755 855 955

To find the sums that lie within the range, you can do:

column_sum[column_sum >= 655 & column_sum <= 855]
#  X7  X8  X9
# 655 755 855

# to get only the names
names(column_sum[column_sum >= 655 & column_sum <= 855])
# [1] "X7" "X8" "X9"
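The question also asks for indices. Here is a small base R sketch using which() and match(). The data is hypothetical: a character id column followed by X1..X10, built so the sums match the output above.

```r
# Hypothetical data: a character id column followed by X1..X10 (values 1..100)
m <- matrix(1:100, ncol = 10, dimnames = list(NULL, paste0("X", 1:10)))
dat <- data.frame(id = letters[1:10], m, stringsAsFactors = FALSE)

cols <- sapply(dat, is.numeric)
column_sum <- colSums(dat[cols])

in_range <- column_sum >= 655 & column_sum <= 855

# Positions within column_sum (numeric columns only): X7, X8, X9 are 7, 8, 9
which(in_range)

# Positions within dat itself, which differ because id comes first: 8, 9, 10
match(names(column_sum)[in_range], names(dat))
```

Which of the two index sets you want depends on whether you later subset column_sum or the original data frame.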

Using dplyr:

library(dplyr)
dat %>%
  summarise(across(where(is.numeric), sum)) %>%
  select(where(~between(., 655, 855)))

Normalizing columns in mixed numeric/non-numeric DataFrame with tidyverse (dplyr)?

First problem

test = df %>% mutate_if(is.numeric, ~./sum(.))
test %>% select_if(is.numeric) %>% colSums(na.rm = T)

test = df %>% mutate_if(is.numeric, function(x) x/sum(x))
test %>% select_if(is.numeric) %>% colSums()

You can handle your problem by specifying na.rm = T so that the NaN values are dropped (na.rm also removes NaN).
They occur because you divide by 0: for a column whose sum is 0, x/sum(x) is 0/0, which is NaN.
The second syntax does exactly the same thing. mutate_if applies the operation to each numeric column, so for the third one it returns NaN because that column sums to 0.
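Here is a base R sketch of the same effect, with a hypothetical data frame where column c sums to 0. The lapply() line stands in for mutate_if(is.numeric, ...):

```r
# Hypothetical data: column c sums to 0, so normalising it divides 0 by 0
df <- data.frame(g = c("u", "v"), a = c(1, 3), c = c(0, 0),
                 stringsAsFactors = FALSE)
num <- sapply(df, is.numeric)

# Base R counterpart of mutate_if(is.numeric, function(x) x / sum(x))
test <- df
test[num] <- lapply(df[num], function(x) x / sum(x))

colSums(test[num])                # a is 1, c is NaN because 0 / 0 is NaN
colSums(test[num], na.rm = TRUE)  # NaN is treated as missing, so c is 0
```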

Second problem

test = df %>% mutate_if(is.numeric, function(x){ifelse(x > 0, x/sum(x), rep(0, length(x)))})
test %>% select_if(is.numeric) %>% colSums()

test = df %>% mutate_if(is.numeric, function(x) ifelse(sum(x)>0, x/sum(x), 0))
test %>% select_if(is.numeric) %>% colSums()

ifelse returns a value with the same shape as its test argument. Because your test, sum(x) > 0, has length 1, only the first value of x/sum(x) is returned, and mutate_if then recycles it down the whole column. See:

https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/ifelse
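A small base R check of that behaviour follows. Inside mutate_if() the single value would be recycled to fill the column, which is why every row ends up with the same number:

```r
x <- c(2, 3, 5)

# The test sum(x) > 0 has length 1, so ifelse() returns a single value:
# the first element of x / sum(x)
r1 <- ifelse(sum(x) > 0, x / sum(x), 0)
length(r1)  # 1 (the value is 0.2)

# A test with one element per value gives one result per value
r2 <- ifelse(x > 0, x / sum(x), 0)
length(r2)  # 3
```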

Third problem

test = df %>% mutate_if(is.numeric, ~apply(., 2, function(x) x/sum(x)))

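This one is tricky: mutate_if applies the function to each column separately, so inside it x is a plain vector. apply, however, only works on objects with dimensions, such as a matrix or a data.frame, so calling it on a vector fails. Since x is already a single column, x/sum(x) is all you need.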
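To see the error outside of dplyr, here is a base R sketch. It uses a small numeric vector standing in for one column:

```r
x <- c(2, 3, 5)

# A plain vector has no dim attribute, so apply() stops with an error
msg <- tryCatch(apply(x, 2, function(v) v / sum(v)),
                error = function(e) conditionMessage(e))
msg  # the message mentions dim(X)

# Inside mutate_if() each column already arrives as a vector, so divide directly
x / sum(x)

# apply() itself is fine on a matrix, even with a single column
apply(matrix(x, ncol = 1), 2, function(v) v / sum(v))
```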

One good answer

test = df %>% mutate_if(is.numeric, function(x) if(sum(x)>0) x/sum(x))
test %>% select_if(is.numeric) %>% colSums()

This is valid syntax because, unlike ifelse, if does not require the result to have a specific size.

Alternatively, you could still use ifelse, but with a vector condition: for non-negative values, the sum isn't zero as soon as at least one element is different from 0.

test = df %>% mutate_if(is.numeric, function(x){ifelse(x > 0, x/sum(x), rep(0, length(x)))})
test %>% select_if(is.numeric) %>% colSums()
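The ifelse() version relies on the values being non-negative. Here is a quick check in base R. The function name normalise_ifelse is just for illustration, and the scalar 0 replaces rep(0, length(x)), which ifelse() recycles anyway:

```r
# Vector-condition version, wrapped as a (hypothetical) helper function
normalise_ifelse <- function(x) ifelse(x > 0, x / sum(x), 0)

normalise_ifelse(c(1, 0, 3))  # 0.25, 0 and 0.75
normalise_ifelse(c(0, 0, 0))  # all 0: the NaN values of 0 / 0 are never selected
normalise_ifelse(c(-1, 1))    # 0 and Inf: with negative values sum(x) can be 0
```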

I hope this helps you understand what is going on when the errors appear. The solution isn't unique.

Edit 1:

The reason is that the function returns something only if the sum is strictly greater than 0; otherwise if returns NULL. You must specify what to do in the other case, for instance:

test = df %>% mutate_if(is.numeric, function(x) if(sum(x)>0){x/sum(x)}else{0})
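Here is a base R sketch of the if/else version, with a hypothetical normalise_if helper and the same kind of toy data. It returns rep(0, length(x)) rather than a scalar 0, so each column keeps its length without relying on recycling:

```r
# if/else version: always return something, here a vector of zeros
normalise_if <- function(x) if (sum(x) > 0) x / sum(x) else rep(0, length(x))

# Hypothetical data: column c sums to 0
df <- data.frame(g = c("u", "v"), a = c(1, 3), c = c(0, 0),
                 stringsAsFactors = FALSE)
num <- sapply(df, is.numeric)

test <- df
test[num] <- lapply(df[num], normalise_if)
colSums(test[num])  # a is 1, c is 0

# Without an else branch, if() returns NULL when the condition is FALSE
is.null(if (FALSE) 1)  # TRUE
```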

Delete columns in a matrix with value 0 when all cols are not numeric

A matrix can hold only one type, so if it contains any non-numeric element, all the numbers are stored as characters too. In that case, you can do

mymatrix[, colSums(mymatrix != "0") != 0]

# d1 d3 d4 d5559 d5560 d5561
#[1,] "R1" "3" "4" "9" "grey" "simone"
#[2,] "R1" "2" "2" "7" "blue" "Emma"
#[3,] "R1" "1" "2" "4" "grey" "simone"
#[4,] "R1" "3" "2" "8" "red" "Evelyn"

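Or the other way around (this keeps only the columns that contain no "0" at all, which gives the same result here because every column of mymatrix is either all "0" or has none):

mymatrix[, colSums(mymatrix == "0") == 0]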

You can also use apply column-wise with the same logic:

mymatrix[, apply(mymatrix != "0", 2, any)]

and

mymatrix[, !apply(mymatrix == "0", 2, all)]

data

mymatrix <- structure(c("R1", "R1", "R1", "R1", "0", "0", "0", "0", "3", 
"2", "1", "3", "4", "2", "2", "2", "0", "0", "0", "0", "9", "7",
"4", "8", "grey", "blue", "grey", "red", "simone", "Emma", "simone",
"Evelyn"), .Dim = c(4L, 8L), .Dimnames = list(NULL, c("d1", "d2",
"d3", "d4", "d5", "d5559", "d5560", "d5561")))
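The variants above agree on mymatrix because each of its columns is either all "0" or has no "0". Here is a sketch with a hypothetical matrix m where they differ:

```r
# Hypothetical character matrix: p is all "0", q contains one "0", r has none
m <- matrix(c("0", "0", "0", "1", "x", "y"), nrow = 2,
            dimnames = list(NULL, c("p", "q", "r")))

# Keep columns that are not entirely "0": drops only p
keep_any <- m[, colSums(m != "0") != 0, drop = FALSE]
colnames(keep_any)  # "q" "r"

# Keep columns that contain no "0" at all: drops p and q
keep_none <- m[, colSums(m == "0") == 0, drop = FALSE]
colnames(keep_none)  # "r"
```

drop = FALSE keeps the result a matrix even when only one column survives.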

Select or subset variables whose column sums are not zero

Try this:

df %>% select_if(~ !is.numeric(.) || sum(.) != 0)
# A C D
# 1 a 3 0
# 2 a 0 3
# 3 b 0 2
# 4 c 1 1
# 5 c 1 4
# 6 d 2 5

The rationale is that with ||, if the left-hand side is TRUE, the right-hand side is not evaluated, so sum() is never called on a non-numeric column.

Note:

  • The second argument of select_if should be a function name or a formula (lambda function). The ~ is necessary to tell select_if that !is.numeric(.) || sum(.) != 0 should be converted to a function.
  • As commented below by @zx8754, is.factor(.) should be used if one only wants to keep factor columns.
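The same predicate also works with base R's Filter(), which makes the short-circuit easy to see. The data frame here is hypothetical:

```r
# Hypothetical data: B is numeric with sum 0, A is character
df <- data.frame(A = c("a", "a", "b"), B = c(0, 0, 0), C = c(3, 0, 0),
                 stringsAsFactors = FALSE)

# Base R analogue of the select_if() call: keep non-numeric columns and
# numeric columns whose sum is not 0
kept <- Filter(function(x) !is.numeric(x) || sum(x) != 0, df)
names(kept)  # "A" "C"

# Without short-circuiting, sum() would be called on A, which is an error
inherits(tryCatch(sum(df$A), error = function(e) e), "error")  # TRUE
```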

Edit: a base R solution

cols <- c('B', 'C', 'D')
cols.to.keep <- cols[colSums(df[cols]) != 0]
# use the element-wise |, not ||, to combine the two logical vectors
df[!names(df) %in% cols | names(df) %in% cols.to.keep]
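Here is a runnable sketch of the base R approach. The data frame is rebuilt from the output shown above, and column B is assumed to be all zeros because it is the one that gets dropped:

```r
# Hypothetical data consistent with the output above; B is assumed all zero
df <- data.frame(
  A = c("a", "a", "b", "c", "c", "d"),
  B = 0,
  C = c(3, 0, 0, 1, 1, 2),
  D = c(0, 3, 2, 1, 4, 5),
  stringsAsFactors = FALSE
)

cols <- c("B", "C", "D")
cols.to.keep <- cols[colSums(df[cols]) != 0]

# | works element-wise over all column names
kept <- df[!names(df) %in% cols | names(df) %in% cols.to.keep]
names(kept)  # "A" "C" "D"
```

With || instead of |, current R versions signal an error, because || expects single TRUE/FALSE values.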

How do I summarise all columns except one(s) I specify?

Edit:

Modified versions of the two methods below for dplyr >= 1.0.0, since summarise_at() is superseded:

df %>%
  summarise(across(where(is.numeric) & !Registered, sum))

df %>%
  summarise(across(-Registered, sum))

Original Answer:

I would use summarise_at() with a logical vector that is FALSE for non-numeric columns and for Registered, and TRUE otherwise; which() turns it into column positions:

df %>%
  summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)

If you wanted to just summarise all but one column, you could do

df %>%
  summarise_at(vars(-Registered), sum)

but in this case you also have to make sure that all the remaining columns are numeric.
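Without dplyr, the same selection can be sketched in base R with colSums(). The columns other than Registered are hypothetical:

```r
# Hypothetical data with a Registered column and one non-numeric column
df <- data.frame(
  Registered = c(10, 20),
  Casual = c(1, 2),
  Day = c("Mon", "Tue"),
  Count = c(5, 7),
  stringsAsFactors = FALSE
)

# TRUE only for numeric columns other than Registered
keep <- sapply(df, is.numeric) & names(df) != "Registered"
colSums(df[keep])  # Casual 3, Count 12
```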

Notes:

  • factors are stored as integer codes, but is.numeric() returns FALSE for them, so sapply(df, is.numeric) already excludes factor columns. Writing sapply(df, function(x) is.numeric(x) & !is.factor(x)) makes that explicit but gives the same result.

  • sapply(df[1,], is.numeric) returns the same result as sapply(df, is.numeric), but it is unlikely to be faster on big data: is.numeric() only checks each column's type and does not scan its values.
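A quick way to check which column types is.numeric() selects is to run it on a small hypothetical data frame with one column of each common type:

```r
df <- data.frame(
  i = 1:2,                      # integer
  d = c(1.5, 2.5),              # double
  f = factor(c("lo", "hi")),    # factor
  s = c("a", "b"),              # character
  l = c(TRUE, FALSE),           # logical
  stringsAsFactors = FALSE
)

# Which columns a filter based on is.numeric() would keep
sapply(df, is.numeric)  # TRUE for i and d only
```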

add row to a data frame that calculates sums of all numeric columns

The janitor package has this ready to go:


library(janitor)

df %>%
  adorn_totals("row", fill = "Total")

segment subSegment var.1 var.2 var.3 var.4
   seg1    subseg1   100   200    50    60
   seg1    subseg2    20    30    50    50
   seg2    subseg1    30    30    40    35
   seg2    subseg2    50    70    20    53
   seg3    subseg1    40    30    30    42
   seg3    subseg2    40   140    40    20
  Total      Total   280   500   230   260
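If you'd rather not add a dependency, here is a base R sketch that appends a similar total row. The data frame is rebuilt from the output shown above, not taken from the original question:

```r
df <- data.frame(
  segment = rep(c("seg1", "seg2", "seg3"), each = 2),
  subSegment = rep(c("subseg1", "subseg2"), times = 3),
  var.1 = c(100, 20, 30, 50, 40, 40),
  var.2 = c(200, 30, 30, 70, 30, 140),
  var.3 = c(50, 50, 40, 20, 30, 40),
  var.4 = c(60, 50, 35, 53, 42, 20),
  stringsAsFactors = FALSE
)

num <- sapply(df, is.numeric)

# Build the total row from the first row so the column types match
total <- df[1, ]
total[!num] <- "Total"                    # label every non-numeric column
total[num] <- as.list(colSums(df[num]))   # column sums for numeric columns

out <- rbind(df, total)
rownames(out) <- NULL
out[nrow(out), ]  # Total Total 280 500 230 260
```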

