Sum of Two Columns of Data Frame with NA Values

Sum of two Columns of Data Frame with NA Values

dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
#   a  b c d e
# 1 1  2 3 4 5
# 2 5 NA 7 8 7
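
For readers doing the same thing in pandas, here is a minimal sketch of the equivalent row-wise sum. The frame below is a hypothetical reconstruction that mirrors the R output above; it relies on DataFrame.sum skipping NaN by default, which plays the role of na.rm=TRUE.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the R example: column b has one missing value.
dat = pd.DataFrame({
    "a": [1, 5],
    "b": [2, np.nan],
    "c": [3, 7],
    "d": [4, 8],
})

# skipna=True is the default, so NaN is ignored within each row,
# much like na.rm=TRUE in rowSums().
dat["e"] = dat[["b", "c"]].sum(axis=1)
print(dat)
```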

Sum two dataframes with NA values and factors

Base R Version:

library(dplyr) # only for pipe operator
rbind(data1, data2) %>%
  split(.$NAMES) %>%
  lapply(function(x) {
    data.frame(NAMES = unique(x$NAMES), as.list(colSums(x[,-1])))
  }) %>%
  do.call(rbind, .)

#       NAMES X1 X2
# name1 name1  5 NA
# name2 name2 NA 22
# name3 name3  9 24
# name3 name3 9 24

Notice that NAMES now also appears as row names. This is because split() returns a named list. You can either keep the row names and drop NAMES = unique(x$NAMES) (first snippet below), or add unname() to the pipe after split() (second snippet):

rbind(data1, data2) %>%
  split(.$NAMES) %>%
  lapply(function(x) {
    data.frame(as.list(colSums(x[,-1])))
  }) %>%
  do.call(rbind, .)

#       X1 X2
# name1  5 NA
# name2 NA 22
# name3  9 24

rbind(data1, data2) %>%
  split(.$NAMES) %>%
  unname() %>%
  lapply(function(x) {
    data.frame(NAMES = unique(x$NAMES), as.list(colSums(x[,-1])))
  }) %>%
  do.call(rbind, .)

#   NAMES X1 X2
# 1 name1  5 NA
# 2 name2 NA 22
# 3 name3  9 24

To treat NAs as zeros, add na.rm = TRUE to colSums:

rbind(data1, data2) %>%
  split(.$NAMES) %>%
  unname() %>%
  lapply(function(x) {
    data.frame(NAMES = unique(x$NAMES), as.list(colSums(x[,-1], na.rm = TRUE)))
  }) %>%
  do.call(rbind, .)

#   NAMES X1 X2
# 1 name1  5 10
# 2 name2  0 22
# 3 name3  9 24

dplyr + purrr Version:

library(purrr)
library(dplyr)

list(data1, data2) %>%
  reduce(function(x, y) cbind(NAMES = x$NAMES, x[,-1] + y[-1]))

Result:

  NAMES X1 X2
1 name1  5 NA
2 name2 NA 22
3 name3  9 24

To treat NAs as zeros:

list(data1, data2) %>%
  map(function(x) {
    modify_if(x, is.numeric, function(y) ifelse(is.na(y), 0, y))
  }) %>%
  reduce(function(x, y) cbind(NAMES = x$NAMES, x[,-1] + y[-1]))

Result:

  NAMES X1 X2
1 name1  5 10
2 name2  0 22
3 name3  9 24

Important Note:

Replacing NA's with zeros is often a bad idea since they mean different things. NA could mean that the data is missing, not necessarily zero, so replacing all NA's with zeros could bias your results. Please only do it if you are sure that NA's mean zero in the context of your data.
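
To make the bias concrete, here is a small sketch in pandas with made-up readings (the numbers are purely illustrative). Filling the missing value with zero pulls the mean down, because the zero is counted as a real observation.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one missing reading.
readings = pd.Series([10.0, np.nan, 20.0])

# NaN is skipped: (10 + 20) / 2
mean_skipping_missing = readings.mean()

# NaN is counted as a real 0: (10 + 0 + 20) / 3
mean_with_zero_fill = readings.fillna(0).mean()

print(mean_skipping_missing, mean_with_zero_fill)
```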

Additional Notes:

  1. Both map and modify_if are from the purrr package. map applies a function to each element of a list and always returns a list. modify does the same except that it returns the same type as the input.
  2. modify_if only "maps" the elements that satisfy a condition.
  3. In the first pipe, I used map to "map" each element of list(data1, data2) with the modify_if function, while modify_if replaces NA's with zeros for each numeric column only. This way I can use the + operator in the next pipe without worrying about NA's.
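  4. reduce adds the numeric columns of data1 and data2 element-wise, then cbinds the result with the NAMES column from data1. This assumes both data frames list the same names in the same row order.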
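
The map / modify_if / reduce steps described above can be sketched in pandas roughly as follows. The original data1 and data2 are not shown, so the two frames here are hypothetical values chosen to reproduce the printed result, and zero_filled is an illustrative helper name. Unlike the R version, this sketch indexes by NAMES, so rows are aligned by name rather than by position.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical inputs, chosen to be consistent with the results shown above.
names = ["name1", "name2", "name3"]
data1 = pd.DataFrame({"NAMES": names, "X1": [1, np.nan, 4], "X2": [np.nan, 10, 12]})
data2 = pd.DataFrame({"NAMES": names, "X1": [4, np.nan, 5], "X2": [10, 12, 12]})


def zero_filled(df):
    # Move NAMES into the index so only numeric columns remain,
    # then replace NaN with 0 (the role modify_if plays in the R code).
    return df.set_index("NAMES").fillna(0)


# Fold the list of frames together with +, like purrr::reduce.
total = reduce(lambda x, y: x + y, [zero_filled(d) for d in (data1, data2)])
total = total.reset_index()
print(total)
```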

Pandas sum of two columns - dealing with nan-values correctly

From the documentation for pandas.DataFrame.sum:

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

Change your code to:

data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1, min_count=1)

Output (the original data, then the data with the new Sum column):

   Surf1  Surf2
0   10.0   22.0
1    NaN    8.0
2    8.0   15.0
3    NaN    NaN
4   16.0   14.0
5   15.0    7.0

   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0
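
As a self-contained sketch of the difference, the snippet below rebuilds the frame from the values printed above and compares the default behaviour with min_count=1. The variable names default_sum and strict_sum are only illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the output shown above.
data = pd.DataFrame({
    "Surf1": [10.0, np.nan, 8.0, np.nan, 16.0, 15.0],
    "Surf2": [22.0, 8.0, 15.0, np.nan, 14.0, 7.0],
})
cols = ["Surf1", "Surf2"]

# Default (min_count=0): a row with no valid values sums to 0.0.
default_sum = data[cols].sum(axis=1)

# min_count=1: at least one non-NaN value is required, otherwise the result is NaN.
strict_sum = data[cols].sum(axis=1, min_count=1)

print(pd.DataFrame({"default": default_sum, "min_count=1": strict_sum}))
```

Only row 3, where both columns are NaN, differs between the two results.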

Pandas Summing Two Columns with Nan

You can use add to get your sums, with fill_value=0:

>>> d.col1.add(d.col2, fill_value=0)
0 1.0
1 4.0
dtype: float64

>>> d.col1.add(d.col3, fill_value=0)
0 6.0
1 NaN
dtype: float64
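
The frame d is not shown in the answer, so the following is a hypothetical reconstruction consistent with the printed results. It illustrates that fill_value=0 only substitutes when one of the two operands is missing; when both are NaN, the result stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame chosen to match the outputs above.
d = pd.DataFrame({
    "col1": [1.0, np.nan],
    "col2": [np.nan, 4.0],
    "col3": [5.0, np.nan],
})

# Row 0: 1 + (NaN -> 0) = 1; row 1: (NaN -> 0) + 4 = 4
sum_12 = d.col1.add(d.col2, fill_value=0)

# Row 0: 1 + 5 = 6; row 1: both missing, so the result is NaN
sum_13 = d.col1.add(d.col3, fill_value=0)

print(sum_12)
print(sum_13)
```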

Pandas sum two columns, skipping NaN

With fillna():

frame['c'] = frame.fillna(0)['a'] + frame.fillna(0)['b']

or, as suggested:

frame['c'] = frame.a.fillna(0) + frame.b.fillna(0)

giving:

     a    b  c
0    1    3  4
1    2  NaN  2
2  NaN    4  4

sum the column values(group_by) keeping NA values and not replacing with zero in R

You could return NA for groups whose values are all NA, then use rank() with na.last = "keep" so that those groups also get a rank of NA:

library(dplyr)

df %>%
  group_by(column2) %>%
  summarise(column3 = if (all(is.na(column3))) NA else
                        sum(column3, na.rm = TRUE)) %>%
  ungroup %>%
  mutate(rank = rank(-column3, na.last = "keep"))

#   column2 column3  rank
#   <fct>     <int> <dbl>
# 1 gb           14     2
# 2 Hs           83     1
# 3 Rd           NA    NA
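
A comparable sketch in pandas, assuming a hypothetical df whose group totals match the output above (the individual rows are invented). Here min_count=1 keeps an all-NaN group as NaN, and na_option="keep" leaves its rank as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the group totals (gb=14, Hs=83, Rd=NA) come from the output above.
df = pd.DataFrame({
    "column2": ["gb", "gb", "Hs", "Hs", "Hs", "Rd", "Rd"],
    "column3": [4, 10, 80, np.nan, 3, np.nan, np.nan],
})

# Sum per group, skipping NaN, but return NaN when a group has no valid values.
summary = df.groupby("column2")["column3"].sum(min_count=1).reset_index()

# Rank in descending order; groups with a NaN sum keep a NaN rank.
summary["rank"] = summary["column3"].rank(ascending=False, na_option="keep")
print(summary)
```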

How to sum values in multiple rows to a new column in R?

Update II on new request:

library(dplyr)

df %>%
  group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
                                        Topic %in% c(2,5,6) ~ 2,
                                        Topic %in% c(3,4) ~ 3)) %>%
  mutate(new_variable = sum(Gamma)) %>%
  ungroup %>%
  select(-grp)

   Observation Topic Gamma new_variable
   <chr>       <int> <dbl>        <dbl>
 1 Apple           1   0.1          0.1
 2 Apple           2   0.1          0.7
 3 Apple           3   0.2          0.4
 4 Apple           4   0.2          0.4
 5 Apple           5   0.1          0.7
 6 Apple           6   0.5          0.7
 7 Blueberry       1   0.2          0.2
 8 Blueberry       2   0.1          0.6
 9 Blueberry       3   0.3          0.8
10 Blueberry       4   0.5          0.8
11 Blueberry       5   0.4          0.6
12 Blueberry       6   0.1          0.6

Update, at the OP's new request. This solution is fully inspired by PaulS's solution (credit to him):

library(dplyr)

df %>%
  group_by(grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>%
  mutate(new_variable = sum(Gamma)) %>%
  ungroup %>%
  select(-grp)

  Observation Topic Gamma new_variable
  <chr>       <int> <dbl>        <dbl>
1 Apple           1   0.1          0.1
2 Blueberry       2   0.1          0.7
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.4
5 Eggplant        5   0.1          0.7
6 Fruits          6   0.5          0.7

First answer:
We could sum Gamma separately over the odd and the even rows, using an ifelse() statement to give each row the sum that matches its parity. In this case row_number() could be replaced by Topic, since Topic equals the row number:

library(dplyr)

df %>%
  mutate(new_variable = ifelse(row_number() %% 2 == 1,
                               sum(Gamma[row_number() %% 2 == 1]), # odd rows: 1, 3, 5
                               sum(Gamma[row_number() %% 2 == 0])) # even rows: 2, 4
  )

  Observation Topic Gamma new_variable
1 Apple           1   0.1          0.4
2 Blueberry       2   0.1          0.3
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.3
5 Eggplant        5   0.1          0.4

data:

df <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
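
For comparison, the grouped sum from the Update (topics {1}, {2, 5, 6} and {3, 4}) can be sketched in pandas using the five-row data above. The topic_group mapping is an illustrative stand-in for the case_when() call, and transform("sum") plays the role of mutate() on a grouped data frame.

```python
import pandas as pd

# Same five rows as the R data above.
df = pd.DataFrame({
    "Observation": ["Apple", "Blueberry", "Cirtus", "Dates", "Eggplant"],
    "Topic": [1, 2, 3, 4, 5],
    "Gamma": [0.1, 0.1, 0.2, 0.2, 0.1],
})

# Map each Topic to its group, mirroring case_when().
topic_group = {1: 1, 2: 2, 5: 2, 6: 2, 3: 3, 4: 3}
grp = df["Topic"].map(topic_group)

# transform("sum") broadcasts each group's total back onto its rows.
df["new_variable"] = df.groupby(grp)["Gamma"].transform("sum")
print(df)
```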

Microbenchmark: AndrewGB's base R is fastest
