How to divide each row of a dataframe by rows of corresponding columns in a dataframe while recycling values
Using tidyverse:
library(tidyverse)
res <- cbind(df[1],
`/`(df %>% select(ends_with('x')),
df %>% select(ends_with('y')) %>%
fill(everything())))
# Gene.names mean_0.x mean_1.x mean_2.x
# 1 ESR 3.0000000 6.0000000 0.5000000
# 2 ESR.1 2.0000000 2.0000000 0.3333333
# 3 ESR.2 5.0000000 5.0000000 1.5000000
# 4 ESR.3 9.0000000 1.0000000 1.5000000
# 5 PKB 0.3333333 3.0000000 1.5000000
# 6 PKB.1 0.6666667 0.3333333 1.7500000
# 7 PKB.2 1.0000000 0.3333333 0.7500000
# 8 PKB.3 1.1666667 3.0000000 0.7500000
And this would be the idiomatic way:
df %>%
fill(ends_with('y')) %>%
gather(,,-1) %>%
separate(key,c("key","xy"),sep="\\.") %>%
spread(xy,value) %>%
transmute(Gene.names,key, value=x /y) %>%
spread(key,value)
# Gene.names mean_0 mean_1 mean_2
# 1 ESR 3.0000000 6.0000000 0.5000000
# 2 ESR.1 2.0000000 2.0000000 0.3333333
# 3 ESR.2 5.0000000 5.0000000 1.5000000
# 4 ESR.3 9.0000000 1.0000000 1.5000000
# 5 PKB 0.3333333 3.0000000 1.5000000
# 6 PKB.1 0.6666667 0.3333333 1.7500000
# 7 PKB.2 1.0000000 0.3333333 0.7500000
# 8 PKB.3 1.1666667 3.0000000 0.7500000
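gather() and spread() still work, but tidyr now treats them as superseded in favour of pivot_longer() and pivot_wider(). The following is a minimal sketch of the same pipeline with the newer functions. The input data frame is made up for illustration (the original df is not shown here), and the sketch assumes every column other than Gene.names is named <key>.x or <key>.y with a single dot.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the .y columns hold a value only on some rows
df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "PKB", "PKB.1"),
  mean_0.x = c(6, 4, 2, 8),
  mean_1.x = c(3, 9, 5, 10),
  mean_0.y = c(2, NA, 4, NA),
  mean_1.y = c(3, NA, 5, NA)
)

res <- df %>%
  # carry the last non-missing divisor down each .y column
  fill(ends_with("y")) %>%
  # "mean_0.x" becomes key = "mean_0" plus a value column named "x"
  pivot_longer(-Gene.names,
               names_to = c("key", ".value"),
               names_sep = "\\.") %>%
  transmute(Gene.names, key, value = x / y) %>%
  # back to one column per key
  pivot_wider(names_from = key, values_from = value)

res
```

The ".value" entry in names_to tells pivot_longer() to turn the second name piece (x or y) into columns, which replaces the separate() and first spread() steps.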
R: Divide all values in columns starting with stage by values in a column called time: rowwise
1) across. Use across:
library(dplyr)
df %>% mutate(across(starts_with("stage"), ~ . / time))
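It could alternatively be written like this, although passing extra arguments through the ... of across is deprecated as of dplyr 1.1.0, so the first form is preferable in new code: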
df %>% mutate(across(starts_with("stage"), `/`, time))
2) pivot. Another way to do this is to reshape into long form, perform the division and then reshape back.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with("stage")) %>%
mutate(value = value / time) %>%
pivot_wider()
3) base R. It can also be done readily in base R:
ok <- startsWith(names(df), "stage")
replace(df, ok, df[ok] / df$time)
4) ftransformv. The collapse package has ftransformv to apply the indicated function to the selected columns. It is written in C/C++ and ran 13x faster than the base solution, 112x faster than the dplyr solution and 363x faster than the tidyr solution when I benchmarked it.
library(collapse)
ftransformv(df, startsWith(names(df), "stage"), `/`, time)
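To try the across and base R variants side by side, here is a small self-contained sketch. The data frame and its column values are invented for illustration; only the column naming (stage..., time) follows the question. It uses an anonymous function inside across() rather than extra arguments.

```r
library(dplyr)

# Hypothetical data: two stage columns and a time column
df <- data.frame(
  id = 1:3,
  time = c(2, 4, 5),
  stage1 = c(4, 8, 10),
  stage2 = c(6, 12, 20)
)

# dplyr: divide every stage column by time, row by row
res_dplyr <- df %>%
  mutate(across(starts_with("stage"), \(x) x / time))

# base R: the time vector is recycled down each selected column
ok <- startsWith(names(df), "stage")
res_base <- replace(df, ok, df[ok] / df$time)

res_base
```

Both results leave id and time untouched and overwrite the stage columns in place.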
Transform multiple rows for each variable observation into single columns in R
You could use data.table's dcast:
library(data.table)
dt <- data.table(
VAR = c('VAR 1', 'VAR 1', 'VAR 1','VAR 1', 'VAR 2', 'VAR 2', 'VAR 2', 'VAR 2'),
Country = c('France', 'France', 'UK', 'UK', 'France', 'France', 'UK', 'UK'),
Year = c(2014, 2018, 2014, 2018, 2014, 2018, 2014, 2018),
Value = c(1, 2, 5, 6, 2, 3, 7, 8)
)
dt.wide <- dcast(
dt,
Country + Year ~ VAR, value.var = 'Value'
)
dt.wide
Country Year VAR 1 VAR 2
1: France 2014 1 2
2: France 2018 2 3
3: UK 2014 5 7
4: UK 2018 6 8
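For comparison, the same long-to-wide reshape can be sketched with tidyr's pivot_wider(). This is an alternative to the dcast answer rather than part of it; the data are the same values as above, held in a plain data frame.

```r
library(tidyr)

df <- data.frame(
  VAR = c("VAR 1", "VAR 1", "VAR 1", "VAR 1", "VAR 2", "VAR 2", "VAR 2", "VAR 2"),
  Country = c("France", "France", "UK", "UK", "France", "France", "UK", "UK"),
  Year = c(2014, 2018, 2014, 2018, 2014, 2018, 2014, 2018),
  Value = c(1, 2, 5, 6, 2, 3, 7, 8)
)

# Country and Year identify the rows; each VAR level becomes a column
wide <- pivot_wider(df, names_from = VAR, values_from = Value)

wide
```

As with dcast, this assumes each Country/Year/VAR combination occurs only once; duplicates would need an aggregation step.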
How to add a new column in dataframe based on a single character in dataframe names?
Because of R's naming restrictions, I removed the brackets from the data frame names, so I think the reproducible example should look something like this:
df_list <- list()
Height <- c(175,180,179)
Weight <- c(75,85,79)
Person <- c('Alex','Gerard','Clyde')
df_list$'1.3.A' <- data.frame(Person,Height,Weight)
Person <- c('Missy','Britany','Sussie')
df_list$'2.2.A' <- data.frame(Person,Height,Weight)
Person <- c('Luke','Alex','Haley')
df_list$'1.1.B' <- data.frame(Person,Height,Weight)
$`1.3.A`
Person Height Weight
1 Alex 175 75
2 Gerard 180 85
3 Clyde 179 79
$`2.2.A`
Person Height Weight
1 Missy 175 75
2 Britany 180 85
3 Sussie 179 79
$`1.1.B`
Person Height Weight
1 Luke 175 75
2 Alex 180 85
3 Haley 179 79
Without using any libraries, and similar to Anoushiravan R's approach, I use strsplit and Map to extract the Trial, Day and Group values from each data frame's name:
# Split the list name on literal dots and add the pieces as three new columns
ColAdd <- \(DF, Names) {
  DF[, c('Trial', 'Day', 'Group')] <-
    (Names |> strsplit(split = '.', fixed = TRUE))[[1]] |>
    rep(nrow(DF)) |>
    matrix(ncol = 3, byrow = TRUE)
  return(DF)
}
df_list <- Map(ColAdd,df_list,names(df_list))
$`1.3.A`
Person Height Weight Trial Day Group
1 Alex 175 75 1 3 A
2 Gerard 180 85 1 3 A
3 Clyde 179 79 1 3 A
$`2.2.A`
Person Height Weight Trial Day Group
1 Missy 175 75 2 2 A
2 Britany 180 85 2 2 A
3 Sussie 179 79 2 2 A
$`1.1.B`
Person Height Weight Trial Day Group
1 Luke 175 75 1 1 B
2 Alex 180 85 1 1 B
3 Haley 179 79 1 1 B
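The same idea can be sketched a little more simply by relying on scalar recycling instead of building a matrix. The helper name add_name_cols and the two small data frames below are hypothetical, chosen only to keep the example self-contained. Note that fixed = TRUE matters: without it, strsplit treats "." as a regular expression that matches any character.

```r
# Hypothetical helper: split the list name on literal dots and
# assign each piece to a new column (a length-1 value is recycled per row)
add_name_cols <- function(DF, name) {
  parts <- strsplit(name, split = ".", fixed = TRUE)[[1]]
  DF$Trial <- parts[1]
  DF$Day <- parts[2]
  DF$Group <- parts[3]
  DF
}

# Made-up input list, named in the Trial.Day.Group pattern
df_list <- list(
  "1.3.A" = data.frame(Person = c("Alex", "Gerard"), Height = c(175, 180)),
  "2.2.B" = data.frame(Person = c("Missy", "Britany"), Height = c(175, 180))
)

df_list <- Map(add_name_cols, df_list, names(df_list))

df_list[["2.2.B"]]
```

The new columns are character vectors; wrap parts[1] and parts[2] in as.integer() if numeric Trial and Day values are needed.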
Delete rows in a data frame that have a specific value in a corresponding row of another data frame
You could try to generate a TRUE/FALSE vector instead of your indic vector, which contains 0/1. This makes the final filtering more obvious.
> data
col1 col2 col3 col4
1 1 3 1 2
2 3 4 3
3 1 2
4 3 3
5 2
Using any gives you an easy check for whether a row contains 1 or 2. A second any tells you whether at least one of the two conditions was met. apply() then runs through all the rows when its second parameter is set to 1.
indic <- apply(data, 1, function(row) {
any(c(any(row == 1), any(row == 2)))
})
> indic
[1] TRUE FALSE TRUE FALSE TRUE
> data[indic,]
col1 col2 col3 col4
1 1 3 1 2
3 1 2
5 2
As the title of your question suggests, the indic vector could also be applied to another data frame, but then it is important that the data frame has as many rows as indic has elements, unless you are deliberately relying on vector recycling.
Picking up @nicola's suggestion to use vectorization:
data[rowSums(data=="1" | data=="2")>0,]
This does the job most efficiently, avoiding both the loop and the need to create indic, although the TRUE/FALSE vector returned by rowSums(data=="1" | data=="2")>0 could still be saved in a variable.
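A minimal runnable sketch comparing the two approaches follows. The data frame is an assumption: it holds the same values as the printout above, but the positions of the blank cells were guessed, and the cells are stored as character strings with "" for blanks.

```r
# Hypothetical layout of the example data (blank cells stored as "")
data <- data.frame(
  col1 = c("1", "3", "", "3", ""),
  col2 = c("3", "4", "1", "", ""),
  col3 = c("1", "3", "2", "3", "2"),
  col4 = c("2", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Row-wise version: TRUE if the row contains "1" or "2"
indic_apply <- apply(data, 1, function(row) any(row %in% c("1", "2")))

# Vectorized version: count matching cells per row
indic_vec <- rowSums(data == "1" | data == "2") > 0

kept <- data[indic_vec, ]      # rows that contain 1 or 2
dropped <- data[!indic_vec, ]  # negate the vector to delete those rows instead
```

Negating the logical vector with ! is what turns the filter into a deletion, which a 0/1 vector would not allow directly.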
extracting variables in R using frequencies
Here's one way, using subset to omit any x that occurs fewer than 3 times, and then aggregate to find the maximum value by group:
d <- read.table(text='x y
a 1
b 2
a 3
a 4
b 5
c 6
a 7
d 8
a 9
b 10
e 12
b 13
c 15', header=TRUE)
with(subset(d, x %in% names(which(table(d$x) >= 3))),
aggregate(list(y=y), list(x=x), max))
# x y
# 1 a 9
# 2 b 13
And for good measure, a dplyr approach:
library(dplyr)
d %>%
group_by(x) %>%
filter(n() >= 3) %>%
summarise(max(y))
# Source: local data frame [2 x 2]
#
# x max(y)
# 1 a 9
# 2 b 13
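The frequency filter can also be sketched with ave() and tapply() in base R. This is a variation on the answers above, not a replacement for them; it rebuilds the same x and y values directly so that it runs on its own.

```r
d <- data.frame(
  x = c("a", "b", "a", "a", "b", "c", "a", "d", "a", "b", "e", "b", "c"),
  y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the size of that row's x group
group_size <- ave(seq_along(d$x), d$x, FUN = length)
keep <- group_size >= 3

# maximum y within each remaining group
max_by_x <- tapply(d$y[keep], d$x[keep], max)

max_by_x
```

Here max_by_x is a named vector with one entry per group that occurs at least 3 times.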
Extend last observed values using na.locf for specific country/variable pairs
One option is a data.table join on the 'Country' column, then use Map to apply na.locf0 to the columns of the second dataset ('nm1') depending on the value of the corresponding columns of the first dataset, and assign (:=) the output back to those columns:
library(zoo)
library(data.table)
nm1 <- c('acctm', 'acctf')
nm2 <- paste0("i.", nm1)
setDT(df2)[df1, (nm1) := Map(function(x, y) if(y == 1) na.locf0(x)
else x, mget(nm1), mget(nm2)), on = .(Country), by = .EACHI]
df2
# Country Year acctm acctf
#1: FRA 2010 20 20
#2: FRA 2020 30 20
#3: DEU 2010 10 15
#4: DEU 2020 NA 15
#5: CHE 2010 20 40
#6: CHE 2020 20 NA
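The logic of the join can be sketched step by step without data.table. Both data frames below are hypothetical (the original df1 and df2 are not shown here); the sketch assumes df1 holds a 0/1 flag per country and variable saying whether the last observed value should be carried forward.

```r
library(zoo)

# Hypothetical flags: 1 = extend the last observed value, 0 = leave as is
df1 <- data.frame(
  Country = c("FRA", "DEU"),
  acctm = c(1, 0),
  acctf = c(1, 1)
)

# Hypothetical observations with missing values in 2020
df2 <- data.frame(
  Country = c("FRA", "FRA", "DEU", "DEU"),
  Year = c(2010, 2020, 2010, 2020),
  acctm = c(20, NA, 10, NA),
  acctf = c(20, NA, 15, NA)
)

for (v in c("acctm", "acctf")) {
  # look up each row's flag by country
  flag <- df1[[v]][match(df2$Country, df1$Country)]
  # fill forward within each country; na.locf0 keeps the vector length
  filled <- ave(df2[[v]], df2$Country, FUN = na.locf0)
  # use the filled values only where the flag is 1
  df2[[v]] <- ifelse(flag == 1, filled, df2[[v]])
}

df2
```

na.locf0 is used rather than na.locf because it always returns a vector of the same length as its input, which is what column assignment needs.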
subtract a constant vector from each row in a matrix in r
This is exactly what sweep was made for:
b <- matrix(1:20, nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- sweep(b[,3:5], 2, x)
b
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 5 4 7 10
#[2,] 2 6 5 8 11
#[3,] 3 7 6 9 12
#[4,] 4 8 7 10 13
...or even without subsetting or reassignment, starting from the original b:
sweep(b, 2, c(0,0,x))
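To make explicit what sweep does here, the sketch below compares it with two hand-written equivalents and shows that the operation can be swapped through the FUN argument. The variable names other than b and x are introduced only for this illustration.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)
stats <- c(0, 0, x)  # one value per column; zeros leave columns 1 and 2 alone

# sweep over columns (MARGIN = 2); FUN defaults to "-"
by_sweep <- sweep(b, 2, stats)

# equivalent: transpose so that recycling runs along the original rows
by_transpose <- t(t(b) - stats)

# equivalent: expand stats to the full matrix shape, column by column
by_rep <- b - rep(stats, each = nrow(b))

# any binary function can be swept, for example division
scaled <- sweep(b, 2, c(1, 1, x), FUN = "/")

by_sweep
```

All three subtraction results are the same matrix; sweep simply states the intent (one statistic per column) most directly.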