Sum of two Columns of Data Frame with NA Values
dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7
Sum two dataframes with NA values and factors
Base R version:
library(dplyr) # only for pipe operator
rbind(data1, data2) %>%
split(.$NAMES) %>%
lapply(function(x){
data.frame(NAMES = unique(x$NAMES),as.list(colSums(x[,-1])))
}) %>%
do.call(rbind, .)
# NAMES X1 X2
# name1 name1 5 NA
# name2 name2 NA 22
# name3 name3 9 24
Notice that NAMES now also appears as rownames. This is because split outputs a named list. You can either keep the rownames and remove NAMES = unique(x$NAMES), or add an unname() pipe after split:
rbind(data1, data2) %>%
split(.$NAMES) %>%
lapply(function(x){
data.frame(as.list(colSums(x[,-1])))
}) %>%
do.call(rbind, .)
# X1 X2
# name1 5 NA
# name2 NA 22
# name3 9 24
rbind(data1, data2) %>%
split(.$NAMES) %>%
unname() %>%
lapply(function(x){
data.frame(NAMES = unique(x$NAMES),as.list(colSums(x[,-1])))
}) %>%
do.call(rbind, .)
# NAMES X1 X2
# 1 name1 5 NA
# 2 name2 NA 22
# 3 name3 9 24
To treat NA's as zeros, just add na.rm = TRUE to colSums:
rbind(data1, data2) %>%
split(.$NAMES) %>%
unname() %>%
lapply(function(x){
data.frame(NAMES = unique(x$NAMES),as.list(colSums(x[,-1], na.rm = TRUE)))
}) %>%
do.call(rbind, .)
# NAMES X1 X2
# 1 name1 5 10
# 2 name2 0 22
# 3 name3 9 24
dplyr + purrr version:
library(purrr)
library(dplyr)
list(data1, data2) %>%
reduce(function(x, y) cbind(NAMES = x$NAMES, x[,-1] + y[-1]))
Result:
NAMES X1 X2
1 name1 5 NA
2 name2 NA 22
3 name3 9 24
To treat NA's as zero:
list(data1, data2) %>%
map(function(x){
modify_if(x, is.numeric, function(y) ifelse(is.na(y), 0, y))
}) %>%
reduce(function(x, y) cbind(NAMES = x$NAMES, x[,-1] + y[-1]))
Result:
NAMES X1 X2
1 name1 5 10
2 name2 0 22
3 name3 9 24
Important Note:
Replacing NA's with zeros is often a bad idea since they mean different things. NA could mean that the data is missing, not necessarily zero, so replacing all NA's with zeros could bias your results. Please only do it if you are sure that NA's mean zero in the context of your data.
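To make the warning concrete, here is a small sketch in pandas (the readings are hypothetical, invented only for illustration). It compares a mean that skips the missing value with a mean computed after filling it with zero:

```python
import pandas as pd

# Hypothetical measurements; NaN marks a reading that was never taken.
readings = pd.Series([10.0, 12.0, float("nan"), 14.0])

# NaN is skipped by default: (10 + 12 + 14) / 3
mean_skipping_nan = readings.mean()

# NaN is counted as a real zero: (10 + 12 + 0 + 14) / 4
mean_zero_filled = readings.fillna(0).mean()

print(mean_skipping_nan)  # 12.0
print(mean_zero_filled)   # 9.0
```

The zero-filled mean is pulled down by a value that was never observed, which is the kind of bias the note refers to.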
Additional Notes:
- Both map and modify_if are from the purrr package. map applies a function to each element of a list and always returns a list. modify does the same except that it returns the same type as the input. modify_if only "maps" the elements that satisfy a condition.
- In the first pipe, I used map to "map" each element of list(data1, data2) with the modify_if function, while modify_if replaces NA's with zeros for each numeric column only. This way I can use the + operator in the next pipe without worrying about NA's. reduce does matrix addition on data1 and data2, then cbinds it with the NAMES column from data1.
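For readers working in pandas rather than R, the same key-wise sum might look like the sketch below. The contents of data1 and data2 are not given in this answer, so the two frames here are hypothetical, chosen only to be consistent with the totals printed above.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs, chosen only to be consistent with the totals shown above.
data1 = pd.DataFrame({"NAMES": ["name1", "name2", "name3"],
                      "X1": [1.0, np.nan, 3.0],
                      "X2": [np.nan, 2.0, 4.0]})
data2 = pd.DataFrame({"NAMES": ["name1", "name2", "name3"],
                      "X1": [4.0, np.nan, 6.0],
                      "X2": [10.0, 20.0, 20.0]})

left = data1.set_index("NAMES")
right = data2.set_index("NAMES")

# A NaN in either frame propagates to the result.
strict = left.add(right)

# A NaN on one side is treated as 0; a cell missing in both frames stays NaN.
lenient = left.add(right, fill_value=0)

# Closest match to colSums(..., na.rm = TRUE): an all-NaN group sums to 0.
as_zero = pd.concat([data1, data2]).groupby("NAMES").sum()

print(strict)
print(lenient)
print(as_zero)
```

Note the difference for name2/X1, which is missing in both frames: add with fill_value=0 leaves it as NaN, while the groupby sum returns 0, as na.rm = TRUE does in R.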
Pandas sum of two columns - dealing with nan-values correctly
From the documentation of pandas.DataFrame.sum:
By default, the sum of an empty or all-NA Series is 0.
>>> pd.Series([]).sum()  # min_count=0 is the default
0.0
This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.
Change your code to:
data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1, min_count=1)
Output (the original frame, then the frame with the new Sum column):
Surf1 Surf2
0 10.0 22.0
1 NaN 8.0
2 8.0 15.0
3 NaN NaN
4 16.0 14.0
5 15.0 7.0
Surf1 Surf2 Sum
0 10.0 22.0 32.0
1 NaN 8.0 8.0
2 8.0 15.0 23.0
3 NaN NaN NaN
4 16.0 14.0 30.0
5 15.0 7.0 22.0
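As a self-contained check of the behaviour described above, the sketch below rebuilds the Surf1/Surf2 frame from the printed values and compares the default row sum with the min_count=1 variant:

```python
import numpy as np
import pandas as pd

# Same values as the Surf1/Surf2 frame printed above.
data = pd.DataFrame({"Surf1": [10, np.nan, 8, np.nan, 16, 15],
                     "Surf2": [22, 8, 15, np.nan, 14, 7]})

cols = ["Surf1", "Surf2"]

# Default (min_count=0): a row with no valid values sums to 0.0.
default_sum = data[cols].sum(axis=1)

# min_count=1: at least one valid value is required, otherwise the result is NaN.
strict_sum = data[cols].sum(axis=1, min_count=1)

print(default_sum[3], strict_sum[3])  # 0.0 nan
```

Rows with a single NaN are unaffected by min_count=1; only the all-NaN row (index 3) changes.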
Pandas Summing Two Columns with Nan
You can use add to get your sums, with fill_value=0:
>>> d.col1.add(d.col2, fill_value=0)
0 1.0
1 4.0
dtype: float64
>>> d.col1.add(d.col3, fill_value=0)
0 6.0
1 NaN
dtype: float64
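The frame d is not shown in this answer, so the sketch below uses a hypothetical one (one of several that would reproduce the output above). It illustrates why the second result still contains NaN: fill_value only replaces a missing value when the other operand is present.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one of several that would reproduce the output above.
d = pd.DataFrame({"col1": [1.0, np.nan],
                  "col2": [np.nan, 4.0],
                  "col3": [5.0, np.nan]})

# Every row has exactly one real value, so fill_value=0 fills the other side.
one_sided = d.col1.add(d.col2, fill_value=0)

# Row 1 is NaN in both columns, so the result for that row stays NaN.
both_missing = d.col1.add(d.col3, fill_value=0)

print(one_sided.tolist())     # [1.0, 4.0]
print(both_missing.tolist())  # [6.0, nan]
```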
Pandas sum two columns, skipping NaN
With fillna():
frame['c'] = frame.fillna(0)['a'] + frame.fillna(0)['b']
or, as suggested:
frame['c'] = frame.a.fillna(0) + frame.b.fillna(0)
giving:
a b c
0 1 3 4
1 2 NaN 2
2 NaN 4 4
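A short sketch comparing the fillna approach with a plain row-wise sum, which skips NaN by default. The first three rows match the frame above; the fourth row, with both values missing, is a hypothetical addition to show the edge case.

```python
import numpy as np
import pandas as pd

# First three rows as in the output above; the last row is a hypothetical
# addition where both values are missing.
frame = pd.DataFrame({"a": [1, 2, np.nan, np.nan],
                      "b": [3, np.nan, 4, np.nan]})

# Explicitly replace NaN with 0 before adding.
via_fillna = frame["a"].fillna(0) + frame["b"].fillna(0)

# sum(axis=1) skips NaN by default, so it gives the same totals.
via_sum = frame[["a", "b"]].sum(axis=1)

print(via_fillna.tolist())  # [4.0, 2.0, 4.0, 0.0]
print(via_sum.tolist())     # [4.0, 2.0, 4.0, 0.0]
```

Both approaches report 0 for the row where nothing was observed, so neither is suitable if that row should stay NaN.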
sum the column values(group_by) keeping NA values and not replacing with zero in R
You could use rank with na.last = "keep" to give rank as NA:
library(dplyr)
df %>%
group_by(column2) %>%
summarise(column3 = if(all(is.na(column3))) NA else
sum(column3, na.rm = TRUE)) %>%
ungroup %>%
mutate(rank = rank(-column3, na.last = "keep"))
# column2 column3 rank
# <fct> <int> <dbl>
#1 gb 14 2
#2 Hs 83 1
#3 Rd NA NA
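A rough pandas counterpart of the same idea, for comparison. The input rows are hypothetical, chosen only to reproduce the totals above; min_count=1 plays the role of the all(is.na(...)) check, and rank keeps NaN by default.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to reproduce the totals shown above.
df = pd.DataFrame({"column2": ["gb", "gb", "Hs", "Hs", "Rd"],
                   "column3": [4, 10, 80, 3, np.nan]})

# min_count=1 keeps a group with no valid values as NaN instead of 0.
totals = df.groupby("column2")["column3"].sum(min_count=1).reset_index()

# Rank in descending order; NaN totals keep a NaN rank
# (na_option="keep" is the default, spelled out here for clarity).
totals["rank"] = totals["column3"].rank(ascending=False, na_option="keep")

print(totals)
```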
How to sum values in multiple rows to a new column in R?
Update II on new request:
library(dplyr)
df %>%
group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Apple 2 0.1 0.7
3 Apple 3 0.2 0.4
4 Apple 4 0.2 0.4
5 Apple 5 0.1 0.7
6 Apple 6 0.5 0.7
7 Blueberry 1 0.2 0.2
8 Blueberry 2 0.1 0.6
9 Blueberry 3 0.3 0.8
10 Blueberry 4 0.5 0.8
11 Blueberry 5 0.4 0.6
12 Blueberry 6 0.1 0.6
Update, on a new request from the OP. This solution is fully inspired by PaulS's solution (credits to him):
library(dplyr)
df %>%
group_by(grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Blueberry 2 0.1 0.7
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.4
5 Eggplant 5 0.1 0.7
6 Fruits 6 0.5 0.7
First answer:
We could sum Gamma after identifying odd and even rows in an ifelse statement. In this case row_number could be replaced by Topic:
library(dplyr)
df %>%
mutate(new_variable = ifelse(row_number() %% 2 == 1,
sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
sum(Gamma[row_number() %% 2 == 0])) # even 2,4
)
Observation Topic Gamma new_variable
1 Apple 1 0.1 0.4
2 Blueberry 2 0.1 0.3
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.3
5 Eggplant 5 0.1 0.4
data:
structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
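For comparison, the grouped version (topics {1}, {2, 5, 6} and {3, 4}) could be sketched in pandas as follows, using the five-row data above. The dictionary name topic_to_group is illustrative; transform("sum") is the rough equivalent of mutate() after group_by(), since it returns one value per original row.

```python
import pandas as pd

# Data from the structure() call above.
df = pd.DataFrame({"Observation": ["Apple", "Blueberry", "Cirtus", "Dates", "Eggplant"],
                   "Topic": [1, 2, 3, 4, 5],
                   "Gamma": [0.1, 0.1, 0.2, 0.2, 0.1]})

# Same grouping as the case_when() call: {1}, {2, 5, 6}, {3, 4}.
topic_to_group = {1: 1, 2: 2, 5: 2, 6: 2, 3: 3, 4: 3}
grp = df["Topic"].map(topic_to_group)

# One group total per original row, without collapsing the frame.
df["new_variable"] = df.groupby(grp)["Gamma"].transform("sum")

print(df)
```

With this input, topic 1 keeps its own Gamma, topics 2 and 5 share one total, and topics 3 and 4 share another.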
Microbenchmark: AndrewGB's base R is fastest