R: Calculating row mean based on column name partial match
We can split
and get the rowMeans
sapply(split.default(df1, sub("\\..*", "", names(df1))), rowMeans)
# as er op rt
#a 34.66667 3.5 87 4
#b 22.66667 4.5 9 5
#c 5.00000 7.5 6 9
#d 6.00000 0.5 6 3
#e 3.00000 8.0 7 89
Calculate row means based on (partial) matching column names
I'm sure it can be done more elegantly, but this is one possibility that seems to work.
# declare the column names
colnames = c("V1", "V2", "V3", "V4")
# calculate the means
means = lapply(colnames, function(name) { apply(DF[,grep(name, names(DF))], 1, mean) })
# build the result
result = do.call(cbind, means)
result = as.data.frame(t(result))
rownames(result) = DF$Date
I should also describe, what I did.
First, I declared the column names to be partially matched.
Then, using the grep
command to partially select the columns in your data frame (that matched the particular substring). The apply
command calculates the means and lapply
does it for all columns partially matched by the substring.
Using do.call
and cbind
(as suggested by DWin), we concatenate individual columns.
Finally, we set the column names from the Date
column of the original data frame.
The problem can be solved more elgantly and efficiently, see solutions by DWin and Maiasaura.
Calculate column means based on (partial) matching row names
Here's an alternative approach using tidyverse
X<-c(1,2,3,4,5,6,7,8,9)
Y<-c(2,4,6,8,10,12,14,16,18)
Z<-c(1,3,5,7,10,13,16,19,22)
Name<-c("AA.11", "AA.12", "AA.13", "AB.11", "AB.12", "AB.13", "AC.11", "AC.12", "AC.13")
DF<-data.frame(Name,X,Y,Z)
library(tidyverse)
df_res = tibble(rownames = c("AA", "AB", "AC", ".11", ".12", ".13")) %>% # save rownames of interest as a tibble
mutate(d = map(rownames, ~DF[grep(.x, DF$Name),]), # get the corresponding subsets
c_means = map(d, ~colMeans(.x[-1]))) %>% # get the corresponding column means
print() # print result
# # A tibble: 6 x 3
# rownames d c_means
# <chr> <list> <list>
# 1 AA <data.frame [3 x 4]> <dbl [3]>
# 2 AB <data.frame [3 x 4]> <dbl [3]>
# 3 AC <data.frame [3 x 4]> <dbl [3]>
# 4 .11 <data.frame [3 x 4]> <dbl [3]>
# 5 .12 <data.frame [3 x 4]> <dbl [3]>
# 6 .13 <data.frame [3 x 4]> <dbl [3]>
So far you have an output that has stored the subsets (d
) the process used to get the column means and the column means themselves (c_means
), for each rowname
you provided.
You can access those stored variables like this:
# see the subset used for AC
df_res$d[df_res$rownames == "AC"]
# [[1]]
# Name X Y Z
# 7 AC.11 7 14 16
# 8 AC.12 8 16 19
# 9 AC.13 9 18 22
# see the column means for AC
df_res$c_means[df_res$rownames == "AC"]
# [[1]]
# X Y Z
# 8 16 19
# see all column means
df_res$c_means
# [[1]]
# X Y Z
# 2 4 3
#
# [[2]]
# X Y Z
# 5 10 10
#
# [[3]]
# X Y Z
# 8 16 19
#
# [[4]]
# X Y Z
# 4 8 8
#
# [[5]]
# X Y Z
# 5.00000 10.00000 10.66667
#
# [[6]]
# X Y Z
# 6.00000 12.00000 13.33333
How to grep columns matching a pattern and calculate the row means of those columns and add the mean values as a new column to the data frame in r?
An option is to remove the digits at the end (\\d+$
) with sub
, use that to split
the dataset into a list
of data.frame
s, get the rowMeans
and assign it to new columns in the dataset
nm1 <- sub("\\d+$", "", names(df))
df[paste0(unique(nm1), "_mean")] <- sapply(split.default(df, nm1), rowMeans)
Match column names by sequential numeric pattern, and calculate rowMeans for all matches
a data.table
approach using .SDcols
with a regex-patterns for column selection
library(data.table)
setDT(DF)
DF[, Redbax := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns("^Red.*bax$")]
DF[, Redstr := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns("^Red.*str$")]
DF[, N34 := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns("^N34.*str$")]
# A2 B3 st ba N34_303str Red1str Red2str Red7bax Red2bax Red9str Red1bax N34_22str Redbax Redstr N34
# 1: 1 3 1 3 2 3 2 3 2 3 3 3 2.666667 2.666667 2.5
# 2: 2 4 2 4 4 4 3 4 3 4 4 4 3.666667 3.666667 4.0
# 3: 2 3 2 3 2 5 4 5 4 5 5 2 4.666667 4.666667 2.0
# 4: 3 2 3 2 6 6 5 6 NA 6 6 6 6.000000 5.666667 6.0
R: Sorting columns based on partial match of column names with row names
We can loop through the rownames and grep
to find the index of the column names that match, unlist
and use that to arrange the columns
df1[unlist(lapply(gsub("\\d+", "", row.names(df1)), function(x) grep(x, names(df1))))]
#W3_FA_22 RR_BI_12 YU_BI_21 T2_KL_21 PL_EW_12 A1_LC_11 RT_LC_22
#FA 3 4 7 1 5 2 6
#BI 3 4 7 1 5 2 6
#KL 3 4 7 1 5 2 6
#EW 3 4 7 1 5 2 6
#LC 3 4 7 1 5 2 6
Subset data based on partial match of column names
You mentioned you may be looking for symbols, so for this particular example we can use [[:punct:]]
as our regular expression. This will find all the strings with punctuation symbols in the column names.
d <- data.frame(1:3, 3:1, 11:13, 13:11, rep(1, 3))
names(d) <- c("FullColName1", "FullColName2", "FullColName3",
"PartString1()","PartString2()")
d[grepl("[[:punct:]]", names(d))]
# PartString1() PartString2()
# 1 13 1
# 2 12 1
# 3 11 1
This last part just illustrates another way to do this with other string processing functions from stringr
library(stringr)
d[str_detect(names(d), "[[:punct:]]")]
# PartString1() PartString2()
# 1 13 1
# 2 12 1
# 3 11 1
ADD per OPs comment
d[grepl("ring[12()]", names(d))]
to get either of the substrings ring1()
or ring2()
from the names vector
How do you identify column numbers based on partial string match in a way that can be used with lapply?
I'm not sure what your expected output is, but it seems that I can't reproduce the problem described above. Your approach seems to work here, even with non-syntactical names:
cols <- c(grep("support", colnames(df)))
df[cols] <- lapply(df[cols], factor,
levels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"),
labels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"))
str(df)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 25 obs. of 7 variables:
#> $ mission support : Factor w/ 5 levels "Strongly Oppose",..: 5 5 4 4 5 4 4 5 4 5 ...
#> $ mission opposition : chr NA NA NA NA ...
#> $ ed support : Factor w/ 5 levels "Strongly Oppose",..: 5 5 4 4 5 4 4 5 4 4 ...
#> $ ed mission opposition: chr NA NA NA NA ...
#> $ non-agency engagement: chr "Yes" "No" "No" "No" ...
#> $ agency knowledge : chr "Yes" "Yes" "Yes" "Yes" ...
#> $ agency engagement : chr "Yes" "Yes" "Yes" "Yes" ...
Created on 2021-12-28 by the reprex package (v0.3.0)
Another option is to use dplyr::across
instead of lapply
:
library(dplyr)
df %>%
mutate(across(contains("support"),
factor,
levels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"),
labels = c("Strongly Oppose", "Oppose", "Neither oppose nor support", "Support", "Strongly Support"))
) %>%
glimpse
#> Rows: 25
#> Columns: 7
#> $ `mission support` <fct> Strongly Support, Strongly Support, Support, S…
#> $ `mission opposition` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ `ed support` <fct> Strongly Support, Strongly Support, Support, S…
#> $ `ed mission opposition` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ `non-agency engagement` <chr> "Yes", "No", "No", "No", "Maybe", "No", "No", …
#> $ `agency knowledge` <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No",…
#> $ `agency engagement` <chr> "Yes", "Yes", "Yes", "Yes", "Yes", NA, NA, "Ye…
Created on 2021-12-28 by the reprex package (v0.3.0)
Related Topics
Extract English Words from a Text in R
Solve Homogenous System Ax = 0 for Any M * N Matrix a in R (Find Null Space Basis for A)
Error When Mapping in Ggmap with API Key (403 Forbidden)
Interleave Columns of Two Data Frames
How to Edit Column Names in Datatable Function When Running R Shiny App
Visual Bug When Changing Robinson Projection's Central Meridian with Ggplot2
Code Folding for Individual Chunks in R Markdown
Usage of Dot/Period in R Functions
R: "Make" Not Found When Installing a R-Package from Local Tar.Gz
R Table Function - How to Remove 0 Counts
How to Annotate Ggplot2 Qplot Outside of Legend and Plotarea? (Similar to Mtext())
How to Display Line Numbers for Code Chunks in Rmarkdown HTML and PDF
How to Pass R Variable into SQLdf
Cumulative Number of Unique Values in a Column Up to Current Row
Replacing for Loop with Foreach Loop