How do I make a column that sums only numeric columns?
I ended up using this code:
df$SUMCOL <- rowSums(df[sapply(df, is.numeric)], na.rm = TRUE)
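A minimal sketch of that one-liner on a made-up data frame. The column names `name`, `a` and `b` are assumptions for illustration, not taken from the question:

```r
# Hypothetical data: one character column and two numeric columns
df <- data.frame(
  name = c("x", "y", "z"),
  a = c(1, 2, NA),
  b = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Logical vector marking which columns are numeric
num_cols <- sapply(df, is.numeric)

# Row-wise sum over the numeric columns only, ignoring NA
df$SUMCOL <- rowSums(df[num_cols], na.rm = TRUE)
```

Note that once SUMCOL exists it is itself numeric, so re-running the same line would include it in the sum.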
Sum all values in every column of a data.frame in R
You can use the function colSums() to calculate the sum of all values. [,-1] ensures that the first column, which holds the names of people, is excluded.
colSums(people[,-1])
Height Weight
199 425
Assuming there could be multiple columns that are not numeric, or that your column order is not fixed, a more general approach would be:
colSums(Filter(is.numeric, people))
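As a rough illustration of why the Filter() form is more general, here is a sketch on an invented `people` data frame (the values and the extra `City` column are assumptions; the original data is not shown):

```r
# Hypothetical data with non-numeric columns in two different positions
people <- data.frame(
  Name = c("Ann", "Bob"),
  Height = c(160, 180),
  City = c("Oslo", "Rome"),
  Weight = c(55, 80),
  stringsAsFactors = FALSE
)

# Filter() keeps only the columns for which is.numeric() is TRUE,
# regardless of where they sit in the data frame
totals <- colSums(Filter(is.numeric, people))
```

With this layout, people[,-1] would still contain the character column City, so colSums(people[,-1]) would fail, while the Filter() version does not depend on column order.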
Sum over numeric columns and report indices of column sums that lie within a specified range
Find out which columns are numeric and use colSums to get their sums.
cols <- sapply(dat, is.numeric)
column_sum <- colSums(dat[cols])
column_sum
# X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 55 155 255 355 455 555 655 755 855 955
To find the sums that fall within a given range, you can do:
column_sum[column_sum >= 655 & column_sum <= 855]
# X7 X8 X9
#655 755 855
#to get only the names
names(column_sum[column_sum >= 655 & column_sum <= 855])
#[1] "X7" "X8" "X9"
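The answer does not show how dat was built. One data frame that is consistent with the printed sums is sketched below; the construction and the extra `label` column are assumptions:

```r
# Assumed data: ten integer columns X1..X10 holding 1:100, plus one text column
dat <- data.frame(matrix(1:100, nrow = 10))
dat$label <- letters[1:10]

# Sum only the numeric columns
cols <- sapply(dat, is.numeric)
column_sum <- colSums(dat[cols])

# Names of the columns whose sum lies in [655, 855]
in_range <- names(column_sum)[column_sum >= 655 & column_sum <= 855]
```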
Using dplyr:
library(dplyr)
dat %>%
summarise(across(where(is.numeric), sum)) %>%
select(where(~between(., 655, 855)))
Normalizing columns in mixed numeric/non-numeric DataFrame with tidyverse (dplyr)?
First problem
test = df %>% mutate_if(is.numeric, ~./sum(.))
test %>% select_if(is.numeric) %>% colSums(na.rm = TRUE)
test = df %>% mutate_if(is.numeric, function(x) x/sum(x))
test %>% select_if(is.numeric) %>% colSums()
You can handle your problem by specifying na.rm = TRUE so that the NaN values are dropped from the sum. They occur because you divide by 0.
The same applies to the second syntax, which does the same thing. mutate_if applies the desired operation to each numeric column, so for the third column it returns NaN because that column's sum is 0.
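A small base R sketch of where the NaN comes from and what na.rm = TRUE changes. The two vectors are made up for illustration:

```r
# Hypothetical columns: one sums to zero, one does not
zero_col <- c(0, 0, 0)
ok_col <- c(1, 3, 4)

zero_share <- zero_col / sum(zero_col)  # 0 / 0 gives NaN for every element
ok_share <- ok_col / sum(ok_col)

shares <- cbind(zero_share, ok_share)
plain <- colSums(shares)                  # NaN propagates into the column sum
cleaned <- colSums(shares, na.rm = TRUE)  # NaN values are dropped before summing
```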
Second problem
test = df %>% mutate_if(is.numeric, function(x){ifelse(x > 0, x/sum(x), rep(0, length(x)))})
test %>% select_if(is.numeric) %>% colSums()
test = df %>% mutate_if(is.numeric, function(x) ifelse(sum(x)>0, x/sum(x), 0))
test %>% select_if(is.numeric) %>% colSums()
ifelse returns a value with the same shape as its test argument. In your case the test is 'sum(x) > 0', which is a single logical value, so only the first value is returned. See:
https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/ifelse
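To make the shape rule concrete, here is a short sketch on an invented vector comparing a scalar test with a vector test:

```r
x <- c(1, 3, 0)

# Scalar test: the result has length 1, so only the first share survives
scalar_result <- ifelse(sum(x) > 0, x / sum(x), 0)

# Vector test: the result has one element per element of x
vector_result <- ifelse(x > 0, x / sum(x), 0)
```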
Third problem
test = df %>% mutate_if(is.numeric, ~apply(., 2, function(x) x/sum(x)))
This one is tricky: mutate_if already applies the function one column (vector) at a time, and you then call apply on that vector. apply only works on objects that have dimensions, such as a matrix or data.frame, so it fails on a plain vector.
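The failure can be reproduced outside of mutate_if. This is a sketch on a made-up vector; tryCatch is used only so that the error message can be inspected without stopping the script:

```r
x <- c(1, 2, 3)

# apply() needs an object with dimensions; a plain vector has none
msg <- tryCatch(
  apply(x, 2, function(v) v / sum(v)),
  error = function(e) conditionMessage(e)
)

# The same call runs once the vector is wrapped as a one-column matrix
m <- apply(cbind(x), 2, function(v) v / sum(v))
```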
One good answer
test = df %>% mutate_if(is.numeric, function(x) if(sum(x)>0) x/sum(x))
test %>% select_if(is.numeric) %>% colSums()
This is valid syntax because if does not have to return an object of a specific size.
However, you could also use ifelse with a vector condition. This works because a sum of non-negative values is non-zero as soon as at least one element differs from 0.
test = df %>% mutate_if(is.numeric, function(x){ifelse(x > 0, x/sum(x), rep(0, length(x)))})
test %>% select_if(is.numeric) %>% colSums()
I hope this helps you understand what is going on when an error appears. The solution isn't unique.
Edit 1:
The reason is that you return something only if your sum is strictly greater than 0. You must also specify what to do otherwise, for instance:
test = df %>% mutate_if(is.numeric, function(x) if(sum(x)>0){x/sum(x)}else{0})
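The same if/else idea can be sketched in base R without dplyr. The helper name `normalise` and the example data are hypothetical, and returning a full-length zero vector (rather than a single 0) is a choice made here for clarity:

```r
# Hypothetical helper: divide by the column total, or return zeros if the total is not positive
normalise <- function(x) {
  total <- sum(x)
  if (total > 0) x / total else rep(0, length(x))
}

# Made-up data: one text column, one ordinary numeric column, one all-zero column
df <- data.frame(id = c("a", "b"), p = c(1, 3), q = c(0, 0),
                 stringsAsFactors = FALSE)

# Apply the helper to the numeric columns only
num <- sapply(df, is.numeric)
df[num] <- lapply(df[num], normalise)
```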
Delete columns in a matrix with value 0 when all cols are not numeric
A matrix can hold only one type, so if it contains any non-numeric element, all the numbers are converted to character as well. In that case, you can do
mymatrix[, colSums(mymatrix != "0") != 0]
# d1 d3 d4 d5559 d5560 d5561
#[1,] "R1" "3" "4" "9" "grey" "simone"
#[2,] "R1" "2" "2" "7" "blue" "Emma"
#[3,] "R1" "1" "2" "4" "grey" "simone"
#[4,] "R1" "3" "2" "8" "red" "Evelyn"
Or the other way around:
mymatrix[, colSums(mymatrix == "0") == 0]
You can also use apply column-wise with the same logic:
mymatrix[, apply(mymatrix != "0", 2, any)]
and
mymatrix[, !apply(mymatrix == "0", 2, all)]
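A compact sketch of the coercion and of the colSums filter on a smaller, invented matrix (the values are assumptions, chosen to mirror the structure of mymatrix):

```r
# Mixing character and numeric vectors in cbind() yields a character matrix
m <- cbind(id = c("R1", "R1"), d2 = c(0, 0), d3 = c(3, 2))
m_type <- typeof(m)

# Count the entries in each column that are not the string "0";
# keep the columns where that count is non-zero
keep <- colSums(m != "0") != 0
kept <- m[, keep]
```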
data
mymatrix <- structure(c("R1", "R1", "R1", "R1", "0", "0", "0", "0", "3",
"2", "1", "3", "4", "2", "2", "2", "0", "0", "0", "0", "9", "7",
"4", "8", "grey", "blue", "grey", "red", "simone", "Emma", "simone",
"Evelyn"), .Dim = c(4L, 8L), .Dimnames = list(NULL, c("d1", "d2",
"d3", "d4", "d5", "d5559", "d5560", "d5561")))
Select or subset variables whose column sums are not zero
Try this:
df %>% select_if(~ !is.numeric(.) || sum(.) != 0)
# A C D
# 1 a 3 0
# 2 a 0 3
# 3 b 0 2
# 4 c 1 1
# 5 c 1 4
# 6 d 2 5
The rationale is that for ||, if the left-hand side is TRUE, the right-hand side won't be evaluated.
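The short-circuit behaviour can be checked directly. In this sketch the function name `keep_col` is hypothetical; it wraps the same predicate as the formula:

```r
# Same predicate as the formula, written as a named function
keep_col <- function(col) !is.numeric(col) || sum(col) != 0

# Character column: the left side is TRUE, so sum() is never called
# (calling sum() on a character vector would raise an error)
keep_text <- keep_col(c("a", "b"))

# Numeric columns: the right side decides
keep_zero <- keep_col(c(0, 0))
keep_nonzero <- keep_col(c(0, 3))
```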
Note:
- The second argument for select_if should be a function name or a formula (lambda function). The ~ is necessary to tell select_if that !is.numeric(.) || sum(.) != 0 should be converted to a function.
- As commented below by @zx8754, is.factor(.) should be used if one only wants to keep factor columns.
Edit: a base R solution
cols <- c('B', 'C', 'D')
cols.to.keep <- cols[colSums(df[cols]) != 0]
df[!names(df) %in% cols | names(df) %in% cols.to.keep]
How do I summarise all columns except one(s) I specify?
Edit:
Modified versions of the two methods below for dplyr version >= 1, since summarise_at is superseded:
df %>%
summarise(across(where(is.numeric) & !Registered, sum))
df %>%
summarise(across(-Registered, sum))
Original Answer:
I would use summarise_at, and just make a logical vector which is FALSE for non-numeric columns and Registered, and TRUE otherwise, i.e.
df %>%
summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)
If you wanted to just summarise all but one column you could do
df %>%
summarise_at(vars(-Registered), sum)
but in this case you have to check if it's numeric also.
Notes:
- Factors are stored as integer codes, but is.numeric() returns FALSE for them, so sapply(df, is.numeric) already leaves factor columns out. If you want that exclusion to be explicit, replace sapply(df, is.numeric) with sapply(df, function(x) is.numeric(x) & !is.factor(x)).
- If your data is big I think it is faster to use sapply(df[1,], is.numeric) instead of sapply(df, is.numeric). (Someone please correct me if I'm wrong.)
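A quick check of how is.numeric() treats different column types, including factors, on a made-up data frame (column names are assumptions):

```r
# Hypothetical data: a factor, a numeric and a character column
df <- data.frame(
  f = factor(c("a", "b")),
  n = c(1, 2),
  s = c("x", "y"),
  stringsAsFactors = FALSE
)

# Column-wise type check on the full data and on the first row only
full_check <- sapply(df, is.numeric)
first_row_check <- sapply(df[1, ], is.numeric)
```

On this small example both checks agree; whether the first-row variant is actually faster on large data is not measured here.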
add row to a data frame that calculates sums of all numeric columns
The janitor package has this ready to go:
library(janitor)
df %>%
adorn_totals("row", fill = "Total")
 segment subSegment var.1 var.2 var.3 var.4
    seg1    subseg1   100   200    50    60
    seg1    subseg2    20    30    50    50
    seg2    subseg1    30    30    40    35
    seg2    subseg2    50    70    20    53
    seg3    subseg1    40    30    30    42
    seg3    subseg2    40   140    40    20
   Total      Total   280   500   230   260
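If adding a package is not an option, a similar totals row can be sketched in base R. This is not how janitor implements adorn_totals, only a rough equivalent under the assumption that every non-numeric column should read "Total"; the data is a shortened, made-up version of the table above:

```r
# Hypothetical, shortened data
df <- data.frame(
  segment = c("seg1", "seg2"),
  var.1 = c(100, 30),
  var.2 = c(200, 30),
  stringsAsFactors = FALSE
)

num <- sapply(df, is.numeric)

# Start from a one-row copy, then overwrite labels and sums
total <- df[1, ]
total[!num] <- "Total"
total[num] <- as.list(colSums(df[num]))

out <- rbind(df, total)
```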