Separate Columns with Constant Numbers and Condense Them to One Row in R Data.Frame

Maybe we need the first row of each study.name group:

library(dplyr)
d %>%
group_by(study.name) %>%
slice(1)

Or, in base R, group by 'study.name' and take the first row of each group. Specify na.action = NULL, because the default, na.omit, drops any row that has an NA in any of the columns:

aggregate(.~ study.name, d, head, 1, na.action = NULL)
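To see why na.action matters here, the following is a minimal sketch on a made-up data frame (the values in d are hypothetical, not the data from the question). With the default na.omit, the first row of study "A" is dropped before grouping, so a different row ends up being the "first" one.

```r
# Hypothetical data: the first row of study "A" has an NA in x
d <- data.frame(study.name = c("A", "A", "B"),
                x = c(NA, 2, 3),
                y = c(1, 1, 5))

# Keep NA rows: the first row of "A" is the one containing NA
with_na <- aggregate(. ~ study.name, d, head, 1, na.action = NULL)

# Default na.action = na.omit: the NA row is removed before grouping
default <- aggregate(. ~ study.name, d, head, 1)

with_na
default
```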

If we want to keep only the columns that are constant within every study.name group:

nm1 <- names(which(!colSums(!do.call(rbind, by(d[-1], d$study.name,
FUN = function(x) lengths(sapply(x, unique)) == 1)))))
unique(d[c("study.name", nm1)])
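The same idea can be sketched more explicitly with tapply. This is an illustrative variant, not the code above, and the data frame d and its column names are assumptions made up for the example.

```r
# Hypothetical data: 'year' is constant within each study, 'score' is not
d <- data.frame(study.name = c("A", "A", "B", "B"),
                year  = c(2001, 2001, 2005, 2005),
                score = c(1, 2, 3, 4))

# For every non-grouping column, check that each group has a single unique value
is_const <- sapply(d[-1], function(col)
  all(tapply(col, d$study.name, function(v) length(unique(v)) == 1L)))

const_cols <- names(is_const)[is_const]

# One row per study, constant columns only
res <- unique(d[c("study.name", const_cols)])
res
```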

Follow-up: Separate columns with constant numbers and condense them to one row in R data.frame

Assuming that the NA rows should be preserved, loop over the list and apply duplicated to each data frame, while also keeping any row whose elements are all NA:

lapply(L, function(x)  x[(rowSums(is.na(x)) == ncol(x))|!duplicated(x),])
#$A
# A A.1
#1 1 3

#$B
# B B.1
#1 4 7

#$X
# X X.1
#1 2 1
#2 2 NA
#3 1 3
#4 1 1
#5 NA NA
#6 NA NA
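As a small self-contained illustration of that filter, here is a sketch on a made-up list (the contents of L below are hypothetical). Duplicated rows are dropped, but rows that are entirely NA are kept even when they repeat.

```r
# Hypothetical list of data frames
L <- list(
  A = data.frame(A = c(1, 1), A.1 = c(3, 3)),
  X = data.frame(X = c(2, NA, NA), X.1 = c(1, NA, NA))
)

keep <- lapply(L, function(x) {
  all_na <- rowSums(is.na(x)) == ncol(x)  # rows where every value is NA
  x[all_na | !duplicated(x), ]            # keep all-NA rows and first occurrences
})

keep
```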

If we also need to check that every column is constant before collapsing a data frame to one row:

is_constant <- function(x) length(unique(x)) == 1L
lapply(L, function(x) if(all(sapply(x, is_constant))) x[1,, drop = FALSE] else x)
#$A
# A A.1
#1 1 3

#$B
# B B.1
#1 4 7

#$X
# X X.1
#1 2 1
#2 2 NA
#3 1 3
#4 1 1
#5 NA NA
#6 NA NA

Extracting columns with constant numbers in R data.frames

We can first find constant columns and then use lapply to loop over them and select only their first row in each study.name.

is_constant <- function(x) length(unique(x)) == 1L 
cols <- names(Filter(all, aggregate(.~study.name, DATA, is_constant)[-1]))

L[cols] <- lapply(L[cols], function(x)
x[ave(x[[1]], DATA$study.name, FUN = seq_along) == 1, ])
L

#$ESL
# ESL ESL.1
#1 1 1
#7 2 2
#9 1 1
#17 1 1
#23 1 1
#35 1 1
#37 2 2
#49 2 2

#$prof
# prof prof.1
#1 2 2
#7 2 2
#9 3 3
#17 2 2
#23 2 2
#35 2 2
#37 NA NA
#49 2 2
#.....
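The ave(..., FUN = seq_along) == 1 idiom numbers the rows within each group and keeps the rows numbered 1. A minimal sketch, using a made-up DATA frame (the values are hypothetical):

```r
# Hypothetical data with two studies
DATA <- data.frame(study.name = c("s1", "s1", "s2", "s2", "s2"),
                   ESL = c(1, 1, 2, 2, 2))

# Row counter within each study: 1, 2, 1, 2, 3
counter <- ave(seq_len(nrow(DATA)), DATA$study.name, FUN = seq_along)

# TRUE only for the first row of each study
first_in_group <- counter == 1

DATA[first_in_group, ]
```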

Collapsing rows with consecutive ranges in two separate columns

There are several ways to achieve this; here is one:

library(tidyverse)
genomic_ranges %>%
group_by(sample_ID) %>%
summarize(start = min(start),
end = max(end),
feature = feature[1])

which gives:

# A tibble: 3 x 4
  sample_ID start   end feature
  <chr>     <dbl> <dbl> <chr>
1 A             1     5 normal
2 B            20    70 DUP
3 C           250   400 DUP
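For readers without the tidyverse, a base R sketch of the same collapse is shown below. The contents of genomic_ranges are made up for illustration, and like the code above it assumes that all ranges of a sample are consecutive and share one feature.

```r
# Hypothetical input: sample "A" has two consecutive ranges
genomic_ranges <- data.frame(sample_ID = c("A", "A", "B"),
                             start     = c(1, 3, 20),
                             end       = c(2, 5, 70),
                             feature   = c("normal", "normal", "DUP"))

# Split by sample, summarise each piece, and bind the pieces back together
collapsed <- do.call(rbind, lapply(
  split(genomic_ranges, genomic_ranges$sample_ID),
  function(g) data.frame(sample_ID = g$sample_ID[1],
                         start     = min(g$start),
                         end       = max(g$end),
                         feature   = g$feature[1])
))

collapsed
```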

Select set of columns so that each row has at least one non-NA entry

Using a while loop, this greedy approach should give a small set of variables such that every row has at least one non-NA value.

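best <- function(df){
  # start with the column that has the most non-NA values
  best <- which.max(colSums(!is.na(df)))
  # rows that have no non-NA value in any selected column
  uncovered <- rowSums(!is.na(df[best])) == 0
  while(any(uncovered)){
    # add the column with the most non-NA values among the uncovered rows
    best <- c(best, which.max(colSums(!is.na(df[uncovered, , drop = FALSE]))))
    uncovered <- rowSums(!is.na(df[best])) == 0
  }
  best
}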

Testing:

best(df)
#d c
#4 3

df[best(df)]
# d c
#1 1 1
#2 1 NA
#3 1 NA
#4 1 NA
#5 NA 1

First, select the column with the fewest NAs (stored in best). Then, repeatedly add the column that has the most non-NA values among the remaining rows (those where every column in best is still NA), until every row has at least one non-NA value.
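A quick way to sanity-check any selection returned by best() is to confirm that no row is left entirely NA. The helper covers_all_rows and the data frame below are hypothetical and only serve as a sketch.

```r
# Hypothetical data: column d covers rows 1-4, column c covers rows 1 and 5
df <- data.frame(a = c(1, NA, NA, NA, NA),
                 b = c(NA, 1, NA, NA, NA),
                 c = c(1, NA, NA, NA, 1),
                 d = c(1, 1, 1, 1, NA))

# TRUE if every row has at least one non-NA value in the chosen columns
covers_all_rows <- function(df, cols) all(rowSums(!is.na(df[cols])) > 0)

covers_all_rows(df, c("d", "c"))
covers_all_rows(df, "d")
```

The first call should report that d and c together cover every row, while d alone leaves row 5 uncovered.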

Condensing/combining multiple columns with same name and logical values

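Find the logical columns, split them by column name, and combine each group of same-named columns with the | operator: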

l <- sapply(df, is.logical)

cbind(df[!l], lapply(split(as.list(df[l]), names(df)[l]), Reduce, f = `|`))
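To make the effect concrete, here is a sketch on a made-up data frame with two logical columns that share the name flag (the data are hypothetical; check.names = FALSE is needed so that data.frame() keeps the duplicate name).

```r
# Hypothetical data with a duplicated logical column name
df <- data.frame(id    = 1:3,
                 flag  = c(TRUE, FALSE, FALSE),
                 flag  = c(FALSE, FALSE, TRUE),
                 other = c(FALSE, TRUE, FALSE),
                 check.names = FALSE)

l <- sapply(df, is.logical)

# Group the logical columns by their original names and OR each group together
combined <- cbind(df[!l],
                  lapply(split(as.list(df[l]), names(df)[l]), Reduce, f = `|`))

combined
```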

Merging r dataset columns by copying rows

You could pivot your data, so that the column names (AH, SS, QS) appear in one column and the logical values in another column, and then you filter this dataset for rows that have the value TRUE in the new logical column. This can be done by using pivot_longer from the tidyr package:

library(tidyr)
library(dplyr)

data %>%
  pivot_longer(cols = AH:QS,             # columns that will be pivoted
               names_to = "Variable",    # column name of the 'variable' column
               values_to = "LogVal") %>% # column name of the logical value column
  filter(LogVal) %>%                     # keep only rows that contain a TRUE
  select(-LogVal)                        # remove the logical column
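If tidyr is not available, roughly the same result can be sketched in base R with which(..., arr.ind = TRUE). This is a different technique from pivot_longer, and the data frame below (including the id column) is an assumption made up for the example.

```r
# Hypothetical data: one id column and three logical columns
data <- data.frame(id = 1:2,
                   AH = c(TRUE, FALSE),
                   SS = c(TRUE, TRUE),
                   QS = c(FALSE, FALSE))

flags <- c("AH", "SS", "QS")

# Row/column positions of every TRUE in the logical columns
hits <- which(as.matrix(data[flags]), arr.ind = TRUE)

# One output row per TRUE value, labelled with the column it came from
long <- data.frame(id = data$id[hits[, "row"]],
                   Variable = flags[hits[, "col"]])
long <- long[order(long$id), ]

long
```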

How to create a new column using the values of the next n rows?

You could loop over the row indices with sapply, cap each window at the last row so that it does not overrun the data, and paste the values together. Here parm is the window length, that is, the number of rows to combine.

n <- nrow(df)

df$des_column <- sapply(
1:n,
\(x) paste(df$a[x:min(x+parm-1, n)], collapse = ",")
)

df
#> a des_column
#> 1 A A,B,C
#> 2 B B,C,D
#> 3 C C,D,E
#> 4 D D,E
#> 5 E E

\(x) is shorthand for function(x), introduced in R 4.1.0.
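A self-contained version that also runs on R versions before 4.1 is sketched below. The data frame and the value parm = 3 are assumptions chosen to match the output shown above.

```r
# Hypothetical input; parm is the assumed window length
df <- data.frame(a = c("A", "B", "C", "D", "E"), stringsAsFactors = FALSE)
parm <- 3
n <- nrow(df)

# For each row, paste the current value and the following ones,
# stopping at the last row so the window never runs past the data
des <- sapply(seq_len(n), function(x)
  paste(df$a[x:min(x + parm - 1, n)], collapse = ","))

df$des_column <- des
df
```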

Extract information from multiple columns at different positions using R

We can use separate_rows from tidyr:

library(dplyr)
library(tidyr)
df %>%
separate_rows(ID, value_1, convert = TRUE)

Output:

# A tibble: 8 x 3
# ID value_1 value_2
# <chr> <int> <chr>
#1 sample1 10 130
#2 sample2 20 130
#3 sample3 30 130
#4 sample3 30 130
#5 sample3 30 130
#6 sample4 40 130
#7 sample5 50 130
#8 sample6 60 130

Or using cSplit

library(splitstackshape)
cSplit(df, c("ID", "value_1"), ";", "long")
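Without extra packages, the same split can be sketched with strsplit. The input below is hypothetical, and the sketch assumes that ID and value_1 are ";"-separated strings with the same number of pieces in each row.

```r
# Hypothetical input with ";"-separated values in two columns
df <- data.frame(ID      = c("sample1;sample2", "sample3"),
                 value_1 = c("10;20", "30"),
                 value_2 = c("130", "130"),
                 stringsAsFactors = FALSE)

ids  <- strsplit(df$ID, ";", fixed = TRUE)
vals <- strsplit(df$value_1, ";", fixed = TRUE)

# One row per piece; value_2 is repeated once for every piece of its row
long <- data.frame(ID      = unlist(ids),
                   value_1 = as.integer(unlist(vals)),
                   value_2 = rep(df$value_2, lengths(ids)),
                   stringsAsFactors = FALSE)

long
```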
