How to Divide Each Row of a Dataframe by Rows of Corresponding Columns in a Dataframe While Recycling Values

Using tidyverse:

library(tidyverse)
res <- cbind(df[1],
             `/`(df %>% select(ends_with('x')),
                 df %>% select(ends_with('y')) %>% 
                   fill(everything())))

#   Gene.names  mean_0.x  mean_1.x  mean_2.x
# 1        ESR 3.0000000 6.0000000 0.5000000
# 2      ESR.1 2.0000000 2.0000000 0.3333333
# 3      ESR.2 5.0000000 5.0000000 1.5000000
# 4      ESR.3 9.0000000 1.0000000 1.5000000
# 5        PKB 0.3333333 3.0000000 1.5000000
# 6      PKB.1 0.6666667 0.3333333 1.7500000
# 7      PKB.2 1.0000000 0.3333333 0.7500000
# 8      PKB.3 1.1666667 3.0000000 0.7500000
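As a rough illustration of what the call above does, here is a base R sketch on a made-up data frame. The values and the fill_down helper are hypothetical and not taken from the original question. Each .y column is filled downward so that its last observed value is recycled, and the .x columns are then divided element-wise by the filled .y columns.

```r
# Hypothetical toy data: each .y column has a value only on some rows
df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "PKB", "PKB.1"),
  mean_0.x = c(6, 4, 1, 2),
  mean_1.x = c(6, 2, 9, 1),
  mean_0.y = c(2, NA, 3, NA),
  mean_1.y = c(1, NA, 3, NA)
)

# Hypothetical helper: carry the last non-NA value forward
fill_down <- function(v) {
  idx <- cumsum(!is.na(v))        # index of the latest non-NA value so far
  c(NA, v[!is.na(v)])[idx + 1]    # leading NAs (idx == 0) stay NA
}

x_cols <- endsWith(names(df), ".x")
y_cols <- endsWith(names(df), ".y")

# Fill every .y column, then divide the .x block by the filled .y block
y_filled <- as.data.frame(lapply(df[y_cols], fill_down))
res <- cbind(df[1], df[x_cols] / y_filled)
res
```

The division pairs columns by position, so this only works when the .x and .y columns appear in the same order.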

And this would be the idiomatic way:

df %>%
  fill(ends_with('y')) %>%
  gather(,,-1) %>%
  separate(key,c("key","xy"),sep="\\.") %>%
  spread(xy,value) %>%
  transmute(Gene.names,key, value=x /y) %>%
  spread(key,value) 

#   Gene.names    mean_0    mean_1    mean_2
# 1        ESR 3.0000000 6.0000000 0.5000000
# 2      ESR.1 2.0000000 2.0000000 0.3333333
# 3      ESR.2 5.0000000 5.0000000 1.5000000
# 4      ESR.3 9.0000000 1.0000000 1.5000000
# 5        PKB 0.3333333 3.0000000 1.5000000
# 6      PKB.1 0.6666667 0.3333333 1.7500000
# 7      PKB.2 1.0000000 0.3333333 0.7500000
# 8      PKB.3 1.1666667 3.0000000 0.7500000
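In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same pipeline with the newer verbs, assuming tidyr 1.0 or later and run on a small made-up data frame (the values are hypothetical, not the data from the question).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column naming scheme
df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "PKB", "PKB.1"),
  mean_0.x = c(6, 4, 1, 2),
  mean_1.x = c(6, 2, 9, 1),
  mean_0.y = c(2, NA, 3, NA),
  mean_1.y = c(1, NA, 3, NA)
)

res <- df %>%
  fill(ends_with("y")) %>%                          # recycle the last .y value
  pivot_longer(-Gene.names,
               names_to = c("key", "xy"),
               names_sep = "\\.") %>%               # "mean_0.x" -> "mean_0", "x"
  pivot_wider(names_from = xy, values_from = value) %>%
  transmute(Gene.names, key, value = x / y) %>%
  pivot_wider(names_from = key, values_from = value)
res
```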

R: Divide all values in columns starting with stage by values in a column called time: rowwise

1) across. Use across:

library(dplyr)

df %>% mutate(across(starts_with("stage"), ~ . / time))

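It could alternatively be written like this, although passing extra arguments through the ... of across() is deprecated as of dplyr 1.1.0, so the lambda form above is the safer choice: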

df %>% mutate(across(starts_with("stage"), `/`, time))

2) pivot. Another way to do this is to reshape into long form, perform the division and then reshape back.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(starts_with("stage")) %>%
  mutate(value = value / time) %>%
  pivot_wider()

3) base R. It can also be done readily in base R:

ok <- startsWith(names(df), "stage")
replace(df, ok, df[ok] / df$time)
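To see the base R version on concrete numbers, here is a minimal sketch with a made-up data frame (the column values are hypothetical). Because df$time has one value per row, dividing the data frame df[ok] by it divides every stage column element-wise by time.

```r
# Hypothetical toy data
df <- data.frame(
  id     = 1:3,
  time   = c(2, 4, 5),
  stage1 = c(10, 20, 30),
  stage2 = c(4, 8, 10)
)

ok <- startsWith(names(df), "stage")       # logical mask of the stage columns
res <- replace(df, ok, df[ok] / df$time)   # other columns are left untouched
res
```

replace() returns a modified copy, so df itself is unchanged unless the result is assigned back to it.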

4) ftransformv. The collapse package has ftransformv, which applies the indicated function to the selected columns. It is written in C/C++ and, when I benchmarked it, ran 13x faster than the base solution, 112x faster than the dplyr solution and 363x faster than the tidyr solution.

library(collapse)
ftransformv(df, startsWith(names(df), "stage"), `/`, time)

Transform multiple rows for each variable observation into single columns in R

You could use a data.table dcast:

library(data.table)

dt <- data.table(
  VAR = c('VAR 1', 'VAR 1', 'VAR 1','VAR 1', 'VAR 2', 'VAR 2', 'VAR 2', 'VAR 2'),
  Country = c('France', 'France', 'UK', 'UK', 'France', 'France', 'UK', 'UK'),
  Year = c(2014, 2018, 2014, 2018, 2014, 2018, 2014, 2018),
  Value = c(1, 2, 5, 6, 2, 3, 7, 8)
)

dt.wide <- dcast(
  dt,
  Country + Year ~ VAR, value.var = 'Value'
)


dt.wide
   Country Year VAR 1 VAR 2
1:  France 2014     1     2
2:  France 2018     2     3
3:      UK 2014     5     7
4:      UK 2018     6     8
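For comparison, a similar wide layout can be sketched in base R with reshape(), with no extra packages. This is only an alternative illustration, not part of the data.table answer. The value columns come out named Value.VAR 1 and Value.VAR 2 because reshape() prefixes them with the name of the value column.

```r
# Same data as above, as a plain data frame
d <- data.frame(
  VAR = c('VAR 1', 'VAR 1', 'VAR 1', 'VAR 1', 'VAR 2', 'VAR 2', 'VAR 2', 'VAR 2'),
  Country = c('France', 'France', 'UK', 'UK', 'France', 'France', 'UK', 'UK'),
  Year = c(2014, 2018, 2014, 2018, 2014, 2018, 2014, 2018),
  Value = c(1, 2, 5, 6, 2, 3, 7, 8)
)

# One row per Country/Year pair, one Value column per level of VAR
wide <- reshape(d,
                idvar = c("Country", "Year"),
                timevar = "VAR",
                direction = "wide")
wide
```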

How to add a new column in dataframe based on a single character in dataframe names?

Because of R's naming restrictions, I removed the brackets from the data frame names, so I think the reproducible example should look something like this:

df_list <- list()
Height <- c(175,180,179)
Weight <- c(75,85,79)
Person <- c('Alex','Gerard','Clyde')
df_list$'1.3.A' <- data.frame(Person,Height,Weight)
Person <- c('Missy','Britany','Sussie')
df_list$'2.2.A' <- data.frame(Person,Height,Weight)
Person <- c('Luke','Alex','Haley')
df_list$'1.1.B' <- data.frame(Person,Height,Weight)

$`1.3.A`
  Person Height Weight
1   Alex    175     75
2 Gerard    180     85
3  Clyde    179     79

$`2.2.A`
   Person Height Weight
1   Missy    175     75
2 Britany    180     85
3  Sussie    179     79

$`1.1.B`
  Person Height Weight
1   Luke    175     75
2   Alex    180     85
3  Haley    179     79

Without using any libraries, and similar to Anoushiravan R's answer, I use strsplit and Map to extract the Trial, Day and Group values from each data frame's name:

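# Split the list name (e.g. "1.3.A") into Trial, Day and Group and
# repeat those three values on every row of the data frame.
ColAdd <- \(DF, Names) {
  DF[, c('Trial', 'Day', 'Group')] <-
    (Names |> strsplit(split = '.', fixed = TRUE))[[1]] |>
    rep(nrow(DF)) |>
    matrix(ncol = 3, byrow = TRUE)
  return(DF)
}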

df_list <- Map(ColAdd,df_list,names(df_list))

$`1.3.A`
  Person Height Weight Trial Day Group
1   Alex    175     75     1   3     A
2 Gerard    180     85     1   3     A
3  Clyde    179     79     1   3     A

$`2.2.A`
   Person Height Weight Trial Day Group
1   Missy    175     75     2   2     A
2 Britany    180     85     2   2     A
3  Sussie    179     79     2   2     A

$`1.1.B`
  Person Height Weight Trial Day Group
1   Luke    175     75     1   1     B
2   Alex    180     85     1   1     B
3  Haley    179     79     1   1     B
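The intermediate values inside ColAdd can be sketched on a single name as follows. The variable names here are illustrative only. Note that the new columns are character vectors, so the last line shows one possible way to convert a numeric-looking piece.

```r
nm <- "1.3.A"

# fixed = TRUE treats "." as a literal dot
parts <- strsplit(nm, split = ".", fixed = TRUE)[[1]]
parts

# Without fixed = TRUE, "." is a regular expression matching any character,
# so the name is not split into its three pieces
no_fixed <- strsplit(nm, split = ".")[[1]]

# Repeat the three pieces once per row and lay them out row by row
n_rows <- 3
filled <- matrix(rep(parts, n_rows), ncol = 3, byrow = TRUE)
filled

# Optional: convert a piece to a number if it is needed for arithmetic
trial <- as.integer(parts[1])
```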

Delete rows in a data frame that have a specific value in a corresponding row of another data frame

You could generate a TRUE/FALSE vector instead of your indic vector, which contains 0/1. This makes the final filtering more obvious.

> data
  col1 col2 col3 col4
1    1    3    1    2
2    3    4    3     
3    1    2          
4    3    3          
5    2

The two inner any calls check whether a row contains a 1 or a 2, respectively. The outer any tells you whether at least one of the two conditions was met. apply() then runs through all the rows, because its second parameter is set to 1.

indic <- apply(data, 1, function(row) {
    any(c(any(row == 1), any(row == 2)))
})


> indic
[1]  TRUE FALSE  TRUE FALSE  TRUE

> data[indic,]
  col1 col2 col3 col4
1    1    3    1    2
3    1    2          
5    2

As the title of your question suggests, the indic vector could also be applied to another data frame. In that case, make sure that the length of indic matches the number of rows of that data frame, unless you are deliberately relying on vector recycling.
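A minimal sketch of applying the logical vector to a second data frame, with an explicit length check, might look like this. The data frame other and its columns are hypothetical, and the blank cells of data are assumed to be empty strings.

```r
data <- data.frame(
  col1 = c("1", "3", "1", "3", "2"),
  col2 = c("3", "4", "2", "3", ""),
  col3 = c("1", "3", "", "", ""),
  col4 = c("2", "", "", "", ""),
  stringsAsFactors = FALSE
)

# TRUE for every row that contains a 1 or a 2
indic <- apply(data, 1, function(row) any(row == 1 | row == 2))

# Hypothetical second data frame with one row per row of data
other <- data.frame(id = letters[1:5], score = c(10, 20, 30, 40, 50))

# Guard against silent recycling when the sizes differ
stopifnot(length(indic) == nrow(other))

other_kept    <- other[indic, ]    # rows whose counterpart in data has a 1 or 2
other_dropped <- other[!indic, ]   # the remaining rows
```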

Picking up @nicola's suggestion to use vectorization.

data[rowSums(data=="1" | data=="2")>0,]

This does the job most efficiently, since it avoids both the loop and the separate indic vector. The TRUE/FALSE vector returned by rowSums(data=="1" | data=="2")>0 could still be saved in a variable, though.

extracting variables in R using frequencies

Here's one way, using subset to omit any x that occurs fewer than 3 times, and then aggregate to find the maximum value by group:

d <- read.table(text='x y
a 1
b 2
a 3
a 4
b 5
c 6
a 7
d 8
a 9
b 10
e 12
b 13
c 15', header=TRUE)


with(subset(d, x %in% names(which(table(d$x) >= 3))),
     aggregate(list(y=y), list(x=x), max))

#   x  y
# 1 a  9
# 2 b 13
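The nested call above can be unpacked into its intermediate steps. The following base R sketch uses the same data and swaps aggregate for tapply purely for brevity; the variable names are illustrative.

```r
d <- data.frame(
  x = c("a", "b", "a", "a", "b", "c", "a", "d", "a", "b", "e", "b", "c"),
  y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15)
)

counts   <- table(d$x)                  # how often each x occurs
frequent <- names(which(counts >= 3))   # the x values seen at least 3 times
kept     <- d[d$x %in% frequent, ]      # rows belonging to those x values
res      <- tapply(kept$y, kept$x, max) # maximum y per remaining x
res
```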

And for good measure, a dplyr approach:

library(dplyr)
d %>% 
  group_by(x) %>% 
  filter(n() >= 3) %>% 
  summarise(max(y))


# Source: local data frame [2 x 2]
# 
#    x max(y)
# 1 a      9
# 2 b     13

Extend last observed values using na.locf for specific country/variable pairs

One option is a data.table join on the 'Country' column. Map then applies na.locf0 to the columns of the second dataset ('nm1') wherever the corresponding column of the first dataset ('nm2', which carries the i. prefix in the join) equals 1, and := assigns the output back to those columns:

library(zoo)
library(data.table)
nm1 <- c('acctm', 'acctf')
nm2 <- paste0("i.", nm1)
setDT(df2)[df1, (nm1)  := Map(function(x, y) if(y == 1) na.locf0(x)
      else x, mget(nm1), mget(nm2)), on = .(Country), by = .EACHI]
df2
#   Country Year acctm acctf
#1:     FRA 2010    20    20
#2:     FRA 2020    30    20
#3:     DEU 2010    10    15
#4:     DEU 2020    NA    15
#5:     CHE 2010    20    40
#6:     CHE 2020    20    NA
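The function passed to Map can be tried in isolation. This sketch assumes the zoo package is installed; the helper name fill_if_flagged and the input vectors are hypothetical. It shows that a column is carried forward only when its flag equals 1.

```r
library(zoo)

# Hypothetical stand-alone version of the function passed to Map
fill_if_flagged <- function(x, flag) {
  if (flag == 1) na.locf0(x) else x
}

filled    <- fill_if_flagged(c(20, NA), 1)   # flag is 1: last value is extended
untouched <- fill_if_flagged(c(10, NA), 0)   # flag is 0: NA is kept
leading   <- fill_if_flagged(c(NA, 5, NA), 1) # a leading NA has nothing to carry
```

Unlike na.locf, na.locf0 always returns a vector of the same length as its input, which is what makes it safe to assign back with :=.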

subtract a constant vector from each row in a matrix in r

This is exactly what sweep was made for:

b <- matrix(1:20, nrow=4, ncol=5)
x <- c(5,6,7)

b[,3:5] <- sweep(b[,3:5], 2, x)
b

#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    5    4    7   10
#[2,]    2    6    5    8   11
#[3,]    3    7    6    9   12
#[4,]    4    8    7   10   13

...or even without subsetting or reassignment (applied to the original b, since the assignment above has already modified it):

sweep(b, 2, c(0,0,x))
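To make the behaviour of sweep explicit, this small sketch compares it with a manual transpose-based version on a made-up matrix, and also shows the FUN argument, which allows the same pattern to be used for division. The matrix and vector are illustrative only.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)
x <- c(1, 10, 100)           # one value per column

swept  <- sweep(m, 2, x)     # MARGIN = 2: x[j] is subtracted from column j
manual <- t(t(m) - x)        # same result via transposing twice

# Any binary function can be swept out; the default FUN is "-"
divided <- sweep(m, 2, x, FUN = "/")
```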
