How to Divide Each Row of a Dataframe by Rows of Corresponding Columns in a Dataframe While Recycling Values

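The question's `df` is not shown; a smaller hypothetical reconstruction (two value pairs instead of the question's three, with the column naming and NA pattern guessed from the outputs below) might look like this:

```r
# Hypothetical input: paired *.x/*.y columns, where the *.y columns
# contain NAs that fill() will carry forward (last observation down)
df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "PKB", "PKB.1"),
  mean_0.x   = c(6, 4, 2, 4),
  mean_0.y   = c(2, NA, 6, NA),
  mean_1.x   = c(6, 2, 9, 1),
  mean_1.y   = c(1, NA, 3, NA)
)
```

With an input of this shape, the answers below divide each *.x column by the corresponding (NA-filled) *.y column.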

Using tidyverse:

library(tidyverse)
res <- cbind(df[1],
             `/`(df %>% select(ends_with('x')),
                 df %>% select(ends_with('y')) %>%
                   fill(everything())))

# Gene.names mean_0.x mean_1.x mean_2.x
# 1 ESR 3.0000000 6.0000000 0.5000000
# 2 ESR.1 2.0000000 2.0000000 0.3333333
# 3 ESR.2 5.0000000 5.0000000 1.5000000
# 4 ESR.3 9.0000000 1.0000000 1.5000000
# 5 PKB 0.3333333 3.0000000 1.5000000
# 6 PKB.1 0.6666667 0.3333333 1.7500000
# 7 PKB.2 1.0000000 0.3333333 0.7500000
# 8 PKB.3 1.1666667 3.0000000 0.7500000

And this would be the idiomatic way (note that gather()/spread() have since been superseded by pivot_longer()/pivot_wider()):

df %>%
  fill(ends_with('y')) %>%
  gather(key, value, -1) %>%
  separate(key, c("key", "xy"), sep = "\\.") %>%
  spread(xy, value) %>%
  transmute(Gene.names, key, value = x / y) %>%
  spread(key, value)

# Gene.names mean_0 mean_1 mean_2
# 1 ESR 3.0000000 6.0000000 0.5000000
# 2 ESR.1 2.0000000 2.0000000 0.3333333
# 3 ESR.2 5.0000000 5.0000000 1.5000000
# 4 ESR.3 9.0000000 1.0000000 1.5000000
# 5 PKB 0.3333333 3.0000000 1.5000000
# 6 PKB.1 0.6666667 0.3333333 1.7500000
# 7 PKB.2 1.0000000 0.3333333 0.7500000
# 8 PKB.3 1.1666667 3.0000000 0.7500000

R: Divide all values in columns starting with stage by values in a column called time: rowwise

1) across: Use across():

library(dplyr)

df %>% mutate(across(starts_with("stage"), ~ . / time))
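To see it in action, here is a small hypothetical data frame (not from the question) run through the across() call:

```r
library(dplyr)

# Hypothetical data: a time column plus two stage columns
df <- data.frame(time = c(2, 4), stage1 = c(10, 20), stage2 = c(4, 8))

# Divide every stage* column by time, row by row
df %>% mutate(across(starts_with("stage"), ~ . / time))
#   time stage1 stage2
# 1    2      5      2
# 2    4      5      2
```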

It could alternately be written like this:

df %>% mutate(across(starts_with("stage"), `/`, time))

2) pivot: Another way to do this is to reshape into long form, perform the division and then reshape back.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(starts_with("stage")) %>%
  mutate(value = value / time) %>%
  pivot_wider()

3) base R: It can also be done readily in base R:

ok <- startsWith(names(df), "stage")
replace(df, ok, df[ok] / df$time)
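A quick check on a small hypothetical data frame (not from the question) shows the base R version giving the same result:

```r
# Hypothetical data: a time column plus two stage columns
df <- data.frame(time = c(2, 4), stage1 = c(10, 20), stage2 = c(4, 8))

# Logical vector marking the stage* columns, then column-wise division
ok <- startsWith(names(df), "stage")
replace(df, ok, df[ok] / df$time)
#   time stage1 stage2
# 1    2      5      2
# 2    4      5      2
```

Note that replace() returns a modified copy; the original df is untouched unless you assign the result back.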

4) ftransformv: The collapse package has ftransformv to apply the indicated function to the selected columns. It is written in C/C++; when I benchmarked it, it ran 13x faster than the base solution, 112x faster than the dplyr solution and 363x faster than the tidyr solution.

library(collapse)
ftransformv(df, startsWith(names(df), "stage"), `/`, time)

Transform multiple rows for each variable observation into single columns in R

You could use a data.table dcast:

library(data.table)

dt <- data.table(
  VAR = c('VAR 1', 'VAR 1', 'VAR 1', 'VAR 1', 'VAR 2', 'VAR 2', 'VAR 2', 'VAR 2'),
  Country = c('France', 'France', 'UK', 'UK', 'France', 'France', 'UK', 'UK'),
  Year = c(2014, 2018, 2014, 2018, 2014, 2018, 2014, 2018),
  Value = c(1, 2, 5, 6, 2, 3, 7, 8)
)

dt.wide <- dcast(dt, Country + Year ~ VAR, value.var = 'Value')


> dt.wide
Country Year VAR 1 VAR 2
1: France 2014 1 2
2: France 2018 2 3
3: UK 2014 5 7
4: UK 2018 6 8

How to add a new column in dataframe based on a single character in dataframe names?

Considering R's naming restrictions, I removed the brackets from the data frame names, so I think the reproducible example should look something like this:

df_list <- list()
Height <- c(175,180,179)
Weight <- c(75,85,79)
Person <- c('Alex','Gerard','Clyde')
df_list$'1.3.A' <- data.frame(Person,Height,Weight)
Person <- c('Missy','Britany','Sussie')
df_list$'2.2.A' <- data.frame(Person,Height,Weight)
Person <- c('Luke','Alex','Haley')
df_list$'1.1.B' <- data.frame(Person,Height,Weight)

df_list
$`1.3.A`
Person Height Weight
1 Alex 175 75
2 Gerard 180 85
3 Clyde 179 79

$`2.2.A`
Person Height Weight
1 Missy 175 75
2 Britany 180 85
3 Sussie 179 79

$`1.1.B`
Person Height Weight
1 Luke 175 75
2 Alex 180 85
3 Haley 179 79

Without using any libraries, similar to Anoushiravan R's answer, I use strsplit and Map to extract the Trial, Day and Group values from each data frame's name:

ColAdd <- \(DF, Names){
  DF[, c('Trial', 'Day', 'Group')] <-
    (Names |> strsplit(split = '.', fixed = TRUE))[[1]] |>
    rep(nrow(DF)) |>
    matrix(ncol = 3, byrow = TRUE)
  return(DF)
}

df_list <- Map(ColAdd, df_list, names(df_list))
df_list

$`1.3.A`
Person Height Weight Trial Day Group
1 Alex 175 75 1 3 A
2 Gerard 180 85 1 3 A
3 Clyde 179 79 1 3 A

$`2.2.A`
Person Height Weight Trial Day Group
1 Missy 175 75 2 2 A
2 Britany 180 85 2 2 A
3 Sussie 179 79 2 2 A

$`1.1.B`
Person Height Weight Trial Day Group
1 Luke 175 75 1 1 B
2 Alex 180 85 1 1 B
3 Haley 179 79 1 1 B

Delete rows in a data frame that have a specific value in a corresponding row of another data frame

You could try to generate a TRUE/FALSE vector instead of your indic vector, which contains 0/1. This makes the final filtering more obvious.

> data
col1 col2 col3 col4
1 1 3 1 2
2 3 4 3
3 1 2
4 3 3
5 2

The use of any gives you easy access to whether a row contains a 1 or a 2. A second any tells you if either of the two conditions was met. apply() then runs through all the rows when its second argument is set to 1.

indic <- apply(data, 1, function(row) {
  any(c(any(row == 1), any(row == 2)))
})


> indic
[1] TRUE FALSE TRUE FALSE TRUE

> data[indic,]
col1 col2 col3 col4
1 1 3 1 2
3 1 2
5 2

As the title of your question suggests, the indic vector could also be applied to another data frame, but make sure that data frame and the indic vector have matching dimensions, or that you are deliberately relying on vector recycling.

Picking up @nicola's suggestion to use vectorization:

data[rowSums(data == "1" | data == "2") > 0, ]

This does the job most efficiently, sparing both the loop and the creation of indic. The TRUE/FALSE vector produced by rowSums(data == "1" | data == "2") > 0 could still be saved in a variable if needed.
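For reference, here is a hypothetical reconstruction of data (reading the blanks in the printout as empty strings in character columns) on which both approaches agree:

```r
# Hypothetical reconstruction: blank cells as empty strings
data <- data.frame(
  col1 = c("1", "3", "1", "3", "2"),
  col2 = c("3", "4", "2", "3", ""),
  col3 = c("1", "3", "",  "",  ""),
  col4 = c("2", "",  "",  "",  "")
)

# Keep rows containing a "1" or a "2" anywhere
data[rowSums(data == "1" | data == "2") > 0, ]
```

Comparing a data frame with `==` yields a logical matrix, so rowSums() can be applied directly.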

extracting variables in R using frequencies

Here's one way, using subset to omit any x that occur less than 3 times, and then aggregate to find the maximum value by group:

d <- read.table(text='x y
a 1
b 2
a 3
a 4
b 5
c 6
a 7
d 8
a 9
b 10
e 12
b 13
c 15', header=TRUE)


with(subset(d, x %in% names(which(table(d$x) >= 3))),
     aggregate(list(y = y), list(x = x), max))

# x y
# 1 a 9
# 2 b 13

And for good measure, a dplyr approach:

library(dplyr)
d %>%
  group_by(x) %>%
  filter(n() >= 3) %>%
  summarise(max(y))


# Source: local data frame [2 x 2]
#
# x max(y)
# 1 a 9
# 2 b 13

Extend last observed values using na.locf for specific country/variable pairs

One option is a join with data.table on the 'Country' column, then use Map to apply na.locf on the columns of the second dataset ('nm1') based on the value of the corresponding columns of the first dataset, and assign (:=) the output back to those columns.
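The question's inputs are not shown; a hypothetical df1 (where 1 means "carry the last observation forward") and df2 consistent with the output below could be:

```r
# Hypothetical flag table: 1 = extend the last observed value
df1 <- data.frame(Country = c("FRA", "DEU", "CHE"),
                  acctm = c(1, 0, 1),
                  acctf = c(1, 1, 0))

# Hypothetical data with gaps (NAs) to fill where the flag is 1
df2 <- data.frame(Country = rep(c("FRA", "DEU", "CHE"), each = 2),
                  Year  = rep(c(2010, 2020), 3),
                  acctm = c(20, 30, 10, NA, 20, NA),
                  acctf = c(20, NA, 15, 15, 40, NA))
```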

library(zoo)
library(data.table)
nm1 <- c('acctm', 'acctf')
nm2 <- paste0("i.", nm1)
setDT(df2)[df1, (nm1) := Map(function(x, y) if (y == 1) na.locf0(x) else x,
                             mget(nm1), mget(nm2)),
           on = .(Country), by = .EACHI]
df2
# Country Year acctm acctf
#1: FRA 2010 20 20
#2: FRA 2020 30 20
#3: DEU 2010 10 15
#4: DEU 2020 NA 15
#5: CHE 2010 20 40
#6: CHE 2020 20 NA

subtract a constant vector from each row in a matrix in r

This is exactly what sweep was made for:

b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

b[,3:5] <- sweep(b[,3:5], 2, x)
b

# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 5 4 7 10
#[2,] 2 6 5 8 11
#[3,] 3 7 6 9 12
#[4,] 4 8 7 10 13

...or even without subsetting or reassignment:

sweep(b, 2, c(0,0,x))
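sweep() subtracts by default; spelling out the FUN argument makes that explicit (same result as the call above):

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

# FUN = "-" is the default; the leading zeros leave the
# first two columns untouched
sweep(b, 2, c(0, 0, x), FUN = "-")
```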

