How to sum rows by rows?
This is pretty similar to Jaap's comment, but a little more spelled out and uses the row names explicitly:
mat = as.matrix(dat[, 2:5])
row.names(mat) = dat$MUN
mat = rbind(mat, colSums(mat[c("Angra dos Reis (RJ)", "Areal (RJ)"), ], na.rm = TRUE))
row.names(mat)[nrow(mat)] = "X"
mat
# X1990 X1991 X1992 X1993
# Angra dos Reis (RJ) 11 10 10 10
# Aperibé (RJ) NA NA NA NA
# Araruama (RJ) 12040 14589 14231 14231
# Areal (RJ) NA NA NA 3
# Armação dos Búzios (RJ) NA NA NA NA
# X 11 10 10 13
The result is a matrix; you can convert it back to a data frame if needed:
dat_result = data.frame(MUN = row.names(mat), mat, row.names = NULL)
I dislike the format of your data as a data frame. I would either convert it to a matrix (as above) or convert it to long format with, e.g., tidyr::gather(dat, key = year, value = value, -MUN) and work with it "by group" using data.table or dplyr.
Using this data:
dat = read.table(text = " MUN X1990 X1991 X1992 X1993
1 'Angra dos Reis (RJ)' 11 10 10 10
2 'Aperibé (RJ)' NA NA NA NA
3 'Araruama (RJ)' 12040 14589 14231 14231
4 'Areal (RJ)' NA NA NA 3
5 'Armação dos Búzios (RJ)' NA NA NA NA", header = TRUE)
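The long-format route mentioned above could look roughly like the sketch below. It is not the answer's own code: it assumes dplyr and tidyr are installed, uses tidyr::pivot_longer in place of gather, and works on a cut-down copy of the data. The names to_sum, x_rows and long_result are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the example data (two of the year columns only)
dat <- data.frame(
  MUN   = c("Angra dos Reis (RJ)", "Areal (RJ)", "Araruama (RJ)"),
  X1990 = c(11, NA, 12040),
  X1993 = c(10, 3, 14231)
)

# Hypothetical choice: the rows to combine into a new row called "X"
to_sum <- c("Angra dos Reis (RJ)", "Areal (RJ)")

# One row per municipality/year pair
long <- pivot_longer(dat, cols = -MUN, names_to = "year", values_to = "value")

# Sum the chosen municipalities within each year, ignoring NAs
x_rows <- long %>%
  filter(MUN %in% to_sum) %>%
  group_by(year) %>%
  summarise(value = sum(value, na.rm = TRUE), .groups = "drop") %>%
  mutate(MUN = "X")

# Append the new "X" rows to the long data
long_result <- bind_rows(long, x_rows)
long_result
```

In long format the "sum these rows" step becomes an ordinary grouped summary, which is why this layout is easier to work with by group.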
Summing Entries in Multiple Unequally-Sized Data Frames With Some (but not All) Rows and Columns the Same
I think this should work. With row AND column names and one data type, I prefer matrices to data frames, but you can convert the final matrix back to a data frame if you need.
# put things in a list
df_list = list(df1, df2, df3)
# get the complete set of row and column names
all_rows = unique(unlist(lapply(df_list, rownames)))
all_cols = unique(unlist(lapply(df_list, colnames)))
# initialize a final matrix to NA
final_mat = matrix(NA, nrow = length(all_rows), ncol = length(all_cols))
rownames(final_mat) = all_rows
colnames(final_mat) = all_cols
# go through each df in the list
for (i in seq_along(df_list)) {
  # row and column names of the current data frame
  r = rownames(df_list[[i]])
  cl = colnames(df_list[[i]])
  # set any NAs in the selection to 0
  final_mat[r, cl][is.na(final_mat[r, cl])] = 0
  # add the data frame to the selection
  final_mat[r, cl] = final_mat[r, cl] + as.matrix(df_list[[i]])
}
final_mat
# A B D C E F
# row1 1 7 4 1 2 NA
# row2 2 4 5 NA NA NA
# row3 15 28 6 2 3 2
# row4 4 6 7 NA NA NA
# row5 5 13 8 3 4 NA
# row6 6 8 9 NA NA NA
# row7 7 16 10 4 5 NA
# row8 8 10 11 NA NA NA
# row9 19 27 12 NA NA 4
# row10 21 29 13 NA NA 3
# row11 NA 8 NA 5 6 NA
# row12 13 19 NA NA NA 1
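Since df1, df2 and df3 are not shown here, the following is a small self-contained sketch of the same idea wrapped in a function. The data frames a and b and the function name sum_aligned are made up for illustration, and it assumes every data frame has unique row names and only numeric columns.

```r
# Two small, made-up data frames with partly overlapping row and column names
a <- data.frame(A = c(1, 2), B = c(3, 4), row.names = c("row1", "row2"))
b <- data.frame(B = c(10, 20), C = c(5, 6), row.names = c("row2", "row3"))

# Hypothetical helper: sum a list of data frames, matching cells by name
sum_aligned <- function(df_list) {
  all_rows <- unique(unlist(lapply(df_list, rownames)))
  all_cols <- unique(unlist(lapply(df_list, colnames)))
  # start from an all-NA numeric matrix covering every row and column name
  out <- matrix(NA_real_, nrow = length(all_rows), ncol = length(all_cols),
                dimnames = list(all_rows, all_cols))
  for (d in df_list) {
    r <- rownames(d)
    cl <- colnames(d)
    block <- out[r, cl, drop = FALSE]
    block[is.na(block)] <- 0          # cells seen for the first time start at 0
    out[r, cl] <- block + as.matrix(d)
  }
  out
}

res <- sum_aligned(list(a, b))
res
```

Cells that appear in none of the inputs stay NA, which separates "never observed" from a true sum of zero.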
Row Wise Addition in R
If it is just across all columns, then you can just use rowSums.
library(dplyr)
df %>%
mutate(score = rowSums(df))
Or in base R
df$score <- rowSums(df)
Output
all_d tit_for_tat perm_retal random joss tester tit_for_two score
1 200 199 199 102 199 198 198 1295
2 204 600 600 483 215 599 600 3301
3 204 600 600 113 208 202 600 2527
4 579 440 608 445 481 429 381 3363
5 204 250 245 417 201 251 538 2106
6 208 599 207 453 225 598 303 2593
7 208 600 600 610 639 798 600 4055
8 1807 3288 3059 2623 2168 3075 3220 19240
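If the data frame also holds columns that should not be summed (an identifier, or a score column from an earlier run), one option is to pick the columns explicitly. This is a minimal sketch on made-up data; the id column and the score_cols vector are assumptions, not part of the original question.

```r
# Made-up data: one non-numeric id column plus two strategy columns
df <- data.frame(
  id    = c("p1", "p2"),
  all_d = c(200, 204),
  joss  = c(199, NA)
)

# Hypothetical choice of the columns that enter the total
score_cols <- c("all_d", "joss")

# Sum only those columns; na.rm = TRUE treats missing values as 0
df$score <- rowSums(df[score_cols], na.rm = TRUE)
df
```

Naming the columns also makes the line safe to re-run, because the existing score column is never included in the new total.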
sum all rows pairwise in two data frames and save to matrix
Convert both data frames to matrices and add them. Adding the two data frames directly works as well.
df1 = data.frame(colA = c(30, 3, 15), colB = c(2, 100, 9))
df2 = data.frame(colA = c(10, 0, 55), colB = c(200, 10, 1))
as.matrix(df1)+ as.matrix(df2)
df1+df2
> as.matrix(df1)+ as.matrix(df2)
colA colB
[1,] 40 202
[2,] 3 110
[3,] 70 10
> df1+df2
colA colB
1 40 202
2 3 110
3 70 10
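Both forms above add cell by cell according to position, so they rely on the two data frames having their columns in the same order. If that is not guaranteed, one possible approach is to reorder by name first. A sketch, where df2_shuffled is a made-up variant of df2 with its columns swapped:

```r
df1 <- data.frame(colA = c(30, 3, 15), colB = c(2, 100, 9))
# Same values as df2, but with the columns in a different order (made up)
df2_shuffled <- data.frame(colB = c(200, 10, 1), colA = c(10, 0, 55))

# Reorder the second data frame to the column order of the first, then add
aligned <- as.matrix(df1) + as.matrix(df2_shuffled[names(df1)])
aligned
```

This assumes both data frames contain the same set of column names and the same number of rows.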
calculate row sum and product in data.frame
Try
transform(df, sum=rowSums(df), prod=x*y*z)
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10
Or
transform(df, sum=rowSums(df), prod=Reduce(`*`, df))
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10
Another option would be to use rowProds from matrixStats:
library(matrixStats)
transform(df, sum=rowSums(df), prod=rowProds(as.matrix(df)))
If you are using apply
df[,c('sum', 'prod')] <- t(apply(df, 1, FUN=function(x) c(sum(x), prod(x))))
df
# x y z sum prod
#1 1 2 3 6 6
#2 2 3 4 9 24
#3 5 1 2 8 10
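For a self-contained check, the sketch below rebuilds df from the printed output above (the original definition of df is not shown, so this is an assumption) and confirms that the Reduce and apply variants give the same products.

```r
# df reconstructed from the printed output; the original definition is not shown
df <- data.frame(x = c(1, 2, 5), y = c(2, 3, 1), z = c(3, 4, 2))

# Row sums plus row products, multiplying the columns together one by one
res <- transform(df, sum = rowSums(df), prod = Reduce(`*`, df))

# The same products computed row by row
via_apply <- apply(df, 1, prod)

res
```

Reduce multiplies whole columns at once, so it avoids the row-by-row loop that apply performs and needs no extra package.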
Filtering out columns from an R data.frame based on the sum of its values for all rows
We can use select with a condition on the sum, i.e. select a column if its sum is greater than or equal to the threshold:
library(dplyr)
subDf <- df %>%
select(where( ~ sum(.) >= pestCutoff))
NOTE: This assumes the condition should be applied to all columns and that all of them are numeric. If there are columns of other types that you do not want to select, short-circuit with an is.numeric check:
subDf <- df %>%
select(where(~ is.numeric(.) && sum(.) >= pestCutoff))
Or in base R with Filter:
Filter(function(x) sum(x) >= pestCutoff, df)
Or with colSums
df[colSums(df) >= pestCutoff]
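The base R variants can be tried on a small made-up data frame. In this sketch the column names, the site column and the cutoff value are all assumptions, and na.rm = TRUE is added so that a missing value does not turn a column's sum into NA.

```r
# Made-up counts per pest, plus one non-numeric column
df <- data.frame(
  site  = c("a", "b", "c"),
  pestA = c(0, 1, 0),
  pestB = c(5, 7, 9),
  pestC = c(2, NA, 4)
)
pestCutoff <- 5  # hypothetical threshold

# TRUE for numeric columns whose sum reaches the cutoff; && short-circuits,
# so sum() is never called on the character column
keep <- vapply(
  df,
  function(x) is.numeric(x) && sum(x, na.rm = TRUE) >= pestCutoff,
  logical(1)
)

subDf <- df[keep]
subDf
```

vapply returns a named logical vector, so keep can be inspected to see why each column was kept or dropped.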