Sum columns by group (row names) in a matrix
Here's a vectorized base R solution:
rowsum(x, row.names(x))
# Mon Tue Wed Thurs
# Cake 2 1 1 2
# Pie 0 0 3 3
Or a data.table version, using keep.rownames = TRUE to convert the row names to a column:
library(data.table)
as.data.table(x, keep.rownames = TRUE)[, lapply(.SD, sum), by = rn]
# rn Mon Tue Wed Thurs
# 1: Cake 2 1 1 2
# 2: Pie 0 0 3 3
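The question's matrix is not reproduced here, so the following is a minimal sketch with a hypothetical x whose values were chosen only so that the group sums match the output printed above. It shows that rowsum accepts duplicated row names as the grouping vector.

```r
# Hypothetical input: the values are an assumption, not the asker's data
x <- matrix(
  c(1, 1, 0, 1,
    0, 0, 1, 1,
    1, 0, 1, 1,
    0, 0, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Cake", "Pie", "Cake", "Pie"),
                  c("Mon", "Tue", "Wed", "Thurs"))
)

# One output row per distinct row name; columns are summed within each group
res <- rowsum(x, row.names(x))
res
```

By default rowsum sorts the groups, so the result rows come back in alphabetical order of the row names.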
Row and column matrix sum in R by group
We can compute the sums with xtabs after truncating the dimnames to their first 4 characters with substr:
dimnames(m1) <- lapply(dimnames(m1), substr, 1, 4)
xtabs(Freq ~ Var1 + Var2, as.data.frame.table(m1))
# Var2
#Var1 UKC1 UKC2
# UKC1 14 22
# UKC2 46 54
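As a cross-check, the same totals can be obtained without xtabs by applying rowsum twice, once over the rows and once over the columns. This is an alternative sketch, not the method used in the answer; row_grp, col_grp and by_row are illustrative names, and m1 is rebuilt here with the same values and dimnames as the data given for this answer.

```r
# Same values and dimnames as the m1 defined in the data section
m1 <- matrix(
  1:16, nrow = 4, byrow = TRUE,
  dimnames = list(c("UKC1_SS1", "UKC1_SS2", "UKC2_SS1", "UKC2_SS2"),
                  c("UKC1_SS1", "UKC1_SS2", "UKC2_SS1", "UKC2_SS1.1"))
)

# Group labels: first 4 characters of each dimname
row_grp <- substr(rownames(m1), 1, 4)
col_grp <- substr(colnames(m1), 1, 4)

# Collapse rows first, then collapse columns by transposing around rowsum
by_row <- rowsum(m1, row_grp)
res <- t(rowsum(t(by_row), col_grp))
res
```

The result is a plain matrix rather than an xtabs table, which may be more convenient if further matrix arithmetic follows.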
data
m1 <- structure(c(1L, 5L, 9L, 13L, 2L, 6L, 10L, 14L, 3L, 7L, 11L, 15L,
4L, 8L, 12L, 16L), .Dim = c(4L, 4L), .Dimnames = list(c("UKC1_SS1",
"UKC1_SS2", "UKC2_SS1", "UKC2_SS2"), c("UKC1_SS1", "UKC1_SS2",
"UKC2_SS1", "UKC2_SS1.1")))
Sum row-wise values that are grouped by column name but keep all columns in R?
You can try ave, as below (with the help of col and row):
> ave(myMat,colnames(myMat)[col(myMat)], row(myMat), FUN = sum)
x y x y
[1,] 1 3 1 3
[2,] 5 9 5 9
[3,] 4 13 4 13
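To make the call above reproducible, here is a small sketch with a hypothetical myMat. The values are an assumption, picked only so that the result matches the printed output; grp_col is an illustrative name.

```r
# Hypothetical input with repeated column names (not the asker's data)
myMat <- matrix(
  c(0, 1, 1, 2,
    2, 4, 3, 5,
    1, 6, 3, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("x", "y", "x", "y"))
)

# One column-name label per cell, in the same column-major order as the matrix
grp_col <- colnames(myMat)[col(myMat)]

# Each cell is replaced by the sum of its (column name, row) group,
# so the matrix keeps its original shape
res <- ave(myMat, grp_col, row(myMat), FUN = sum)
res
```

Because ave writes the group result back into every member of the group, all original columns are kept and same-named columns end up identical.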
Sum values in rows with same names in R
We can use rowsum. Assume that the dataset shown is a matrix and not a data.frame, as a data.frame cannot have duplicated row names:
rowsum(df1, row.names(df1))
Or using aggregate
aggregate(df1, list(row.names(df1)), sum)
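For reference, a short sketch of what the rowsum call returns on this answer's data, plus the reorder argument, which controls whether groups are sorted or kept in order of first appearance. The names sorted and as_seen are illustrative.

```r
# Same values and dimnames as the df1 given in the data section
df1 <- matrix(
  c(5L, 3L, 7L, 1L, 3L, 6L, 6L, 4L, 2L, 7L),
  ncol = 2,
  dimnames = list(c("bacteria", "bacteria", "bacteria", "archaea", "archaea"),
                  c("category1", "category2"))
)

# Default: groups are sorted, so "archaea" comes first
sorted <- rowsum(df1, row.names(df1))

# reorder = FALSE keeps groups in order of first appearance
as_seen <- rowsum(df1, row.names(df1), reorder = FALSE)

sorted
as_seen
```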
data
df1 <- structure(c(5L, 3L, 7L, 1L, 3L, 6L, 6L, 4L, 2L, 7L), .Dim = c(5L,
2L), .Dimnames = list(c("bacteria", "bacteria", "bacteria", "archaea",
"archaea"), c("category1", "category2")))
How to calculate sum of values in each column based on row names in R?
We can use colSums with startsWith:
colSums(mat[startsWith(row.names(mat), "A"), , drop = FALSE])
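A minimal sketch of this approach on a hypothetical mat (the row names, column names and values are assumptions). It also illustrates why drop = FALSE is worth adding: when only one row matches, plain subsetting returns a vector, and colSums requires at least two dimensions.

```r
# Hypothetical matrix with prefixed row names
mat <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("A1", "A2", "B1"), c("c1", "c2"))
)

# Logical mask of rows whose name starts with "A"
keep <- startsWith(row.names(mat), "A")
sums_A <- colSums(mat[keep, , drop = FALSE])

# Only one row starts with "B"; drop = FALSE keeps it a 1-row matrix
sums_B <- colSums(mat[startsWith(row.names(mat), "B"), , drop = FALSE])

sums_A
sums_B
```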
Calculate summary statistics for each row of a matrix based on columns grouped by column names
This is a good use case for tapply:
# col(t(data)) is the original row index, so this also works for non-square matrices
tapply(t(data), list(col(t(data)), array(colnames(data), dim(t(data)))), mean)
A B
1 3 8
2 13 18
3 23 28
4 33 38
5 43 48
6 53 58
7 63 68
8 73 78
9 83 88
10 93 98
tapply(data, list(t(colnames(data))[rep(1,nrow(data)), ], row(data)), mean)
1 2 3 4 5 6 7 8 9 10
A 3 13 23 33 43 53 63 73 83 93
B 8 18 28 38 48 58 68 78 88 98
tapply(t(data), interaction(colnames(data), col(t(data))), mean)
A.1 B.1 A.2 B.2 A.3 B.3 A.4 B.4 A.5 B.5 A.6 B.6 A.7 B.7 A.8 B.8 A.9 B.9 A.10 B.10
3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98
More base R solutions:
sapply(split.default(data.frame(data), colnames(data)), rowMeans)
A B
[1,] 3 8
[2,] 13 18
[3,] 23 28
[4,] 33 38
[5,] 43 48
[6,] 53 58
[7,] 63 68
[8,] 73 78
[9,] 83 88
[10,] 93 98
data.frame(data) |>
  reshape(split(1:ncol(data), colnames(data)), direction = 'long') |>
  (\(x) aggregate(. ~ id, x, mean))()
id time A B
1 1 3 3 8
2 2 3 13 18
3 3 3 23 28
4 4 3 33 38
5 5 3 43 48
6 6 3 53 58
7 7 3 63 68
8 8 3 73 78
9 9 3 83 88
10 10 3 93 98
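The input matrix is not shown in this answer. Judging from the printed means, it appears to be a 10 x 10 matrix holding 1 to 100 by row, with five columns named A followed by five named B; that layout is an inference, not something the answer states. Under that assumption, a self-contained sketch of the grouped row means, using an index split instead of converting to a data frame (idx is an illustrative name):

```r
# Assumed input, inferred from the printed output above
data <- matrix(
  1:100, nrow = 10, byrow = TRUE,
  dimnames = list(NULL, rep(c("A", "B"), each = 5))
)

# Column positions grouped by column name
idx <- split(seq_len(ncol(data)), colnames(data))

# Row means within each group of columns; drop = FALSE guards single-column groups
res <- sapply(idx, function(j) rowMeans(data[, j, drop = FALSE]))
res
```

Swapping rowMeans for another row-wise function, for example one built with apply, gives other per-group summary statistics.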
R sum rows of matrix by column name
You can use rowsum with the column names as the grouping variable:
t(rowsum(t(z), colnames(z)))
# a b c
#[1,] 8 20 9
#[2,] 11 7 3
#[3,] 8 18 8
#[4,] 8 11 10
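A quick way to convince yourself that the double transpose does what is claimed is to compare it against an explicit loop over the distinct column names. The sketch below uses a hypothetical z (not the asker's data); via_rowsum and via_loop are illustrative names.

```r
# Hypothetical matrix with a repeated column name "a"
z <- matrix(
  1:12, nrow = 3,
  dimnames = list(NULL, c("a", "b", "a", "c"))
)

# Approach from the answer: transpose, group rows, transpose back
via_rowsum <- t(rowsum(t(z), colnames(z)))

# Explicit alternative: sum the matching columns for each distinct name
via_loop <- sapply(unique(colnames(z)), function(nm) {
  rowSums(z[, colnames(z) == nm, drop = FALSE])
})

via_rowsum
all(via_rowsum == via_loop)
```

Note that rowsum sorts the group names while unique keeps first-appearance order; the two agree here only because the names are already in alphabetical order.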
Row-wise sum of values grouped by columns with same name
We can transpose dat, calculate rowsum per group (colnames of the original dat), then transpose the result back to the original structure.
t(rowsum(t(dat), group = colnames(dat), na.rm = TRUE))
# A C G T
#1 1 0 1 0
#2 4 0 6 0
#3 0 1 0 1
#4 2 0 1 0
#5 1 0 1 0
#6 0 1 0 1
#7 0 1 0 1
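The effect of na.rm = TRUE is easiest to see on a tiny example. The sketch below uses a hypothetical dat containing missing values (an assumption, not the asker's data): NAs are discarded before summing, so a group that is entirely NA in a row comes out as 0 rather than NA.

```r
# Hypothetical input with NAs and repeated column names
dat <- matrix(
  c(1,  NA, 2,  3,
    NA, 5,  NA, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "G", "A", "G"))
)

# Transpose, sum rows per column-name group ignoring NAs, transpose back
res <- t(rowsum(t(dat), group = colnames(dat), na.rm = TRUE))
res

# Without na.rm, any NA in a group propagates to the group's sum
res_na <- t(rowsum(t(dat), group = colnames(dat)))
res_na
```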