Separate columns with constant numbers and condense them to one row in R data.frame
Maybe we need the first row of each group:
library(dplyr)
d %>%
group_by(study.name) %>%
slice(1)
Or in base R: after grouping by 'study.name', get the first row of each group while specifying na.action = NULL, because the default, na.omit, drops any row that has an NA in any of the columns.
aggregate(.~ study.name, d, head, 1, na.action = NULL)
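If we want to keep only the columns that are constant within every 'study.name' group (assuming 'study.name' is the first column, so d[-1] drops it):
# lapply (rather than sapply) always returns a list, so lengths() counts unique values per column
nm1 <- names(which(!colSums(!do.call(rbind, by(d[-1], d$study.name,
FUN = function(x) lengths(lapply(x, unique)) == 1)))))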
unique(d[c("study.name", nm1)])
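To make the column selection easier to follow, here is a minimal sketch on a hypothetical toy data frame (the columns year and score and the helper name const_by_group are assumptions for illustration, not part of the original question):

```r
# Hypothetical toy data: 'year' is constant within each study, 'score' is not
d <- data.frame(
  study.name = c("A", "A", "B", "B"),
  year       = c(2001, 2001, 2005, 2005),
  score      = c(1, 2, 3, 3)
)

# One row per study, one column per variable:
# TRUE where the variable holds a single distinct value in that study
const_by_group <- do.call(rbind, by(d[-1], d$study.name,
  FUN = function(x) lengths(lapply(x, unique)) == 1))

# Keep the variables that are constant in every study
nm1 <- colnames(const_by_group)[colSums(!const_by_group) == 0]

res <- unique(d[c("study.name", nm1)])
res
```

Here only year survives, because score varies inside study A, and the result has one row per study.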
Follow-up: Separate columns with constant numbers and condense them to one row in R data.frame
Assuming that the all-NA rows should be preserved, loop over the list with lapply and drop the duplicated rows of each data frame, but keep any row whose elements are all NA:
lapply(L, function(x) x[(rowSums(is.na(x)) == ncol(x))|!duplicated(x),])
#$A
# A A.1
#1 1 3
#$B
# B B.1
#1 4 7
#$X
# X X.1
#1 2 1
#2 2 NA
#3 1 3
#4 1 1
#5 NA NA
#6 NA NA
If we also need to check that every column is constant before condensing to one row:
is_constant <- function(x) length(unique(x)) == 1L
lapply(L, function(x) if(all(sapply(x, is_constant))) x[1,, drop = FALSE] else x)
#$A
# A A.1
#1 1 3
#$B
# B B.1
#1 4 7
#$X
# X X.1
#1 2 1
#2 2 NA
#3 1 3
#4 1 1
#5 NA NA
#6 NA NA
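The two variants above can be tried side by side on a small list. This is only a sketch: the list L below is hypothetical and merely resembles the structure printed in the answer.

```r
# Hypothetical list of data frames (not the asker's data)
L <- list(
  A = data.frame(A = c(1, 1, 1), A.1 = c(3, 3, 3)),
  X = data.frame(X = c(2, 2, NA, NA), X.1 = c(1, NA, NA, NA))
)

# Variant 1: drop duplicated rows, but always keep rows that are entirely NA
dedup <- lapply(L, function(x)
  x[(rowSums(is.na(x)) == ncol(x)) | !duplicated(x), ])

# Variant 2: condense to the first row only when every column is constant
is_constant <- function(x) length(unique(x)) == 1L
condensed <- lapply(L, function(x)
  if (all(sapply(x, is_constant))) x[1, , drop = FALSE] else x)
```

In both variants A shrinks to a single row, while X keeps all four rows, including the two all-NA rows.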
Extracting columns with constant numbers in R data.frames
We can first find the constant columns and then use lapply to loop over them and select only their first row in each study.name.
is_constant <- function(x) length(unique(x)) == 1L
cols <- names(Filter(all, aggregate(.~study.name, DATA, is_constant)[-1]))
L[cols] <- lapply(L[cols], function(x)
x[ave(x[[1]], DATA$study.name, FUN = seq_along) == 1, ])
L
#$ESL
# ESL ESL.1
#1 1 1
#7 2 2
#9 1 1
#17 1 1
#23 1 1
#35 1 1
#37 2 2
#49 2 2
#$prof
# prof prof.1
#1 2 2
#7 2 2
#9 3 3
#17 2 2
#23 2 2
#35 2 2
#37 NA NA
#49 2 2
#.....
Collapsing rows with consecutive ranges in two separate columns
There are several ways to achieve this; here is one:
library(tidyverse)
genomic_ranges %>%
group_by(sample_ID) %>%
summarize(start = min(start),
end = max(end),
feature = feature[1])
which gives:
# A tibble: 3 x 4
sample_ID start end feature
<chr> <dbl> <dbl> <chr>
1 A 1 5 normal
2 B 20 70 DUP
3 C 250 400 DUP
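Grouping by sample_ID alone merges every row of a sample, even when the ranges are not adjacent. If a sample can contain several separate blocks, a run identifier is needed first. The following base R sketch assumes that "consecutive" means a row starts exactly where the previous row ends; the data and the names new_run, run_id and collapsed are hypothetical.

```r
# Hypothetical input: sample A has two separate blocks (1-5 and 10-12)
genomic_ranges <- data.frame(
  sample_ID = c("A", "A", "A", "B"),
  start     = c(1, 3, 10, 20),
  end       = c(3, 5, 12, 70),
  feature   = c("normal", "normal", "normal", "DUP")
)

n <- nrow(genomic_ranges)

# A new run starts when the sample changes or the start does not match
# the previous row's end
new_run <- c(TRUE,
  genomic_ranges$sample_ID[-1] != genomic_ranges$sample_ID[-n] |
  genomic_ranges$start[-1]     != genomic_ranges$end[-n])
run_id <- cumsum(new_run)

# Collapse each run to a single row
collapsed <- do.call(rbind, lapply(split(genomic_ranges, run_id), function(g)
  data.frame(sample_ID = g$sample_ID[1],
             start     = min(g$start),
             end       = max(g$end),
             feature   = g$feature[1])))
collapsed
```

With this input, sample A yields two rows (1 to 5 and 10 to 12) instead of one merged range.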
Select set of columns so that each row has at least one non-NA entry
Using a while loop, this greedy approach should give a small set of variables with at least one non-NA per row (a greedy search is not guaranteed to find the true minimum).
best <- function(df){
best <- which.max(colSums(sapply(df, complete.cases)))
while(any(rowSums(sapply(df[best], complete.cases)) == 0)){
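# restrict to rows still uncovered, i.e. with no non-NA value in any selected column
best <- c(best, which.max(sapply(df[rowSums(!is.na(df[best])) == 0, , drop = FALSE], \(x) sum(complete.cases(x)))))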
}
best
}
Testing:
best(df)
#d c
#4 3
df[best(df)]
# d c
#1 1 1
#2 1 NA
#3 1 NA
#4 1 NA
#5 NA 1
First, select the column with the fewest NAs (stored in best). Then, add the column that has the highest number of non-NA values on the remaining rows (those where the columns in best are still all NA), and repeat until every row has at least one non-NA value.
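The greedy selection described above can be sketched as a self-contained function. The name pick_cols and the toy data frame are hypothetical; the data were chosen so that the result matches the d, c output shown in the answer. The stop() call is an added safeguard for rows that are NA in every column, which would otherwise loop forever.

```r
# Greedy selection: repeatedly add the column that covers the most
# rows that are still uncovered
pick_cols <- function(df) {
  has_value <- !is.na(df)                      # logical matrix of non-NA cells
  best <- which.max(colSums(has_value))        # column with the most non-NA values
  repeat {
    # rows with no non-NA value among the columns selected so far
    uncovered <- rowSums(has_value[, best, drop = FALSE]) == 0
    if (!any(uncovered)) break
    gain <- colSums(has_value[uncovered, , drop = FALSE])
    if (max(gain) == 0) stop("some rows are NA in every column")
    best <- c(best, which.max(gain))
  }
  best
}

# Hypothetical data: 'd' covers rows 1-4, 'c' covers the remaining row 5
df <- data.frame(
  a = c(1, NA, NA, NA, NA),
  b = c(NA, 1, NA, NA, NA),
  c = c(1, NA, NA, NA, 1),
  d = c(1, 1, 1, 1, NA)
)

sel <- pick_cols(df)
df[sel]
```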
Condensing/combining multiple columns with same name and logical values
One base R option: split the logical columns by name and combine each group element-wise with `|`.
l <- sapply(df, is.logical)
cbind(df[!l], lapply(split(as.list(df[l]), names(df)[l]), Reduce, f = `|`))
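A small reproducible run may help here. The data frame below is hypothetical; it is built with check.names = FALSE so that two columns can share the name flag.

```r
# Hypothetical data with two logical columns that share a name
df <- data.frame(id    = 1:3,
                 flag  = c(TRUE, FALSE, FALSE),
                 flag  = c(FALSE, FALSE, TRUE),
                 other = c(FALSE, TRUE, FALSE),
                 check.names = FALSE)

l <- sapply(df, is.logical)   # which columns are logical

# Group the logical columns by their original names, then OR each group
res <- cbind(df[!l],
             lapply(split(as.list(df[l]), names(df)[l]), Reduce, f = `|`))
res
```

Note that the grouping uses names(df)[l] rather than names(df[l]), so the original duplicated names are what define the groups.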
Merging r dataset columns by copying rows
You could pivot your data so that the column names (AH, SS, QS) appear in one column and the logical values in another, and then filter this dataset for rows that have the value TRUE in the new logical column. This can be done with pivot_longer from the tidyr package:
library(tidyr)
library(dplyr)
data %>%
pivot_longer(cols = AH:QS, # columns that will be pivoted
names_to = "Variable", # Column name of the 'variable' column
values_to = "LogVal") %>% # column name of the logical value column
filter(LogVal) %>% # filter only rows that contain a TRUE
select(-LogVal) # remove the logical column
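For readers without tidyr, roughly the same result can be obtained in base R. This is a different technique (matrix indexing with which(arr.ind = TRUE)), not the pivot_longer approach itself, and the data frame and the id column are hypothetical.

```r
# Hypothetical data with three logical columns
data <- data.frame(id = 1:3,
                   AH = c(TRUE, FALSE, TRUE),
                   SS = c(FALSE, TRUE, TRUE),
                   QS = c(FALSE, FALSE, TRUE))

flags <- as.matrix(data[c("AH", "SS", "QS")])

# Row/column positions of every TRUE cell
hits <- which(flags, arr.ind = TRUE)
# Order by row, then column, to mimic the row-wise order of a long pivot
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

# One output row per TRUE cell, labelled with the column it came from
long <- data.frame(id       = data$id[hits[, "row"]],
                   Variable = colnames(flags)[hits[, "col"]])
long
```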
How to create a new column using the values of the next n rows?
You could use sapply over the row indices, making sure the window does not overrun the number of rows, and paste the values together. Here parm is the window length (3 in the output below).
n <- nrow(df)
df$des_column <- sapply(
1:n,
\(x) paste(df$a[x:min(x+parm-1, n)], collapse = ",")
)
df
#> a des_column
#> 1 A A,B,C
#> 2 B B,C,D
#> 3 C C,D,E
#> 4 D D,E
#> 5 E E
\(x) is shorthand for function(x), introduced in R 4.1.
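A self-contained version that reproduces the output above, assuming a single column a and a window length of parm = 3 (both inferred from the printed result). It uses function(x) instead of \(x) so that it also runs on R versions before 4.1.

```r
# Input inferred from the printed output
df <- data.frame(a = c("A", "B", "C", "D", "E"))
parm <- 3          # window length: current row plus the next parm - 1 rows
n <- nrow(df)

# For each row, paste the values from that row up to the end of the window,
# capped at the last row
df$des_column <- sapply(
  seq_len(n),
  function(x) paste(df$a[x:min(x + parm - 1, n)], collapse = ",")
)
df
```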
Extract information from multiple columns at different positions using R
We can use separate_rows from tidyr:
library(dplyr)
library(tidyr)
df %>%
separate_rows(ID, value_1, convert = TRUE)
Output:
# A tibble: 8 x 3
# ID value_1 value_2
# <chr> <int> <chr>
#1 sample1 10 130
#2 sample2 20 130
#3 sample3 30 130
#4 sample3 30 130
#5 sample3 30 130
#6 sample4 40 130
#7 sample5 50 130
#8 sample6 60 130
Or using cSplit from the splitstackshape package:
library(splitstackshape)
cSplit(df, c("ID", "value_1"), ";", "long")