How to preserve base data frame rownames upon filtering in dplyr chain
You can convert the rownames to a column and convert them back after filtering:
library(dplyr)
library(tibble) # for `rownames_to_column` and `column_to_rownames`
df %>%
rownames_to_column('gene') %>%
filter_if(is.numeric, all_vars(. >= 8)) %>%
column_to_rownames('gene')
# BoneMarrow Pulmonary
# ATP1B1 30 3380
# PRR11 2703 27
How to mutate columns but keep rownames in R pipe?
That is because mutate, or dplyr in general, renumbers the rownames from 1 after an operation, so the original rownames are not maintained.
If you need them for further manipulation, store them as a column first.
library(dplyr)
iris %>%
.[which(as.numeric(rownames(.))%%3!=0),] %>%
mutate(row = rownames(.),
Sepal.Length=Sepal.Length+1) %>%
pull(row)
# [1] "1" "2" "4" "5" "7" "8" "10" "11" "13" "14" "16" "17" "19" "20" "22" "23" "25" "26"
# [19] "28" "29" "31" "32" "34" "35" "37" "38" "40" "41" "43" "44" "46" "47" "49" "50" "52" "53"
# [37] "55" "56" "58" "59" "61" "62" "64" "65" "67" "68" "70" "71" "73" "74" "76" "77" "79" "80"
# [55] "82" "83" "85" "86" "88" "89" "91" "92" "94" "95" "97" "98" "100" "101" "103" "104" "106" "107"
# [73] "109" "110" "112" "113" "115" "116" "118" "119" "121" "122" "124" "125" "127" "128" "130" "131" "133" "134"
# [91] "136" "137" "139" "140" "142" "143" "145" "146" "148" "149"
How can I avoid rowSums() dropping rownames?
dplyr (or the tidyverse in general) doesn't support rownames.
One way to preserve them is to move the rownames into a new column, perform the data manipulation that you want, and then move the rownames back.
library(dplyr)
library(tibble)
x %>%
rownames_to_column() %>%
mutate(Total = rowSums(.[-1])) %>%
column_to_rownames()
# x1 x2 Total
#a 1 2 3
#b 0 4 4
#c 2 5 7
#d 3 0 3
#e 4 9 13
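The answer above does not show how x was built. The following is a minimal reproducible sketch of the same round trip, assuming x is a plain data frame with the values seen in the printed output; the construction of x is an assumption, not part of the original answer.

```r
library(dplyr)
library(tibble)

# Assumed input, reconstructed from the printed output above
x <- data.frame(x1 = c(1, 0, 2, 3, 4),
                x2 = c(2, 4, 5, 0, 9),
                row.names = letters[1:5])

res <- x %>%
  rownames_to_column() %>%            # rownames become the column "rowname"
  mutate(Total = rowSums(.[-1])) %>%  # sum every column except "rowname"
  column_to_rownames()                # move "rowname" back to the rownames

rownames(res)
```

Both tibble helpers default to a column called "rowname", which is why no column name has to be passed here.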
Filter rows in dplyr chain if a set of rows doesn't contain a specific word
We create a grouping column in which every four rows form a new block (using gl), then filter to keep only the groups where the first element of 'name' does not contain _number or _slider, then ungroup and remove the temporary 'grp' column:
library(dplyr)
library(stringr) # for `str_detect`
df %>%
group_by(grp = as.integer(gl(n(), 4, n()))) %>%
filter(!str_detect(first(name), "_(number|slider)")) %>%
ungroup %>%
select(-grp)
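To see what the grouping column looks like, here is a small base R sketch of the same idea. The data frame and its values are hypothetical, and ave() and grepl() stand in for group_by(), first() and str_detect().

```r
# Hypothetical data: two blocks of four rows each
df <- data.frame(name = c("a_number", "a_x", "a_y", "a_z",
                          "b_text", "b_x", "b_y", "b_z"),
                 stringsAsFactors = FALSE)

n <- nrow(df)
grp <- as.integer(gl(n, 4, n))   # block id per row: 1 1 1 1 2 2 2 2

# First 'name' of each block, repeated for every row of that block
first_name <- ave(df$name, grp, FUN = function(v) v[1])

# Keep only the blocks whose first name has no _number or _slider
keep <- !grepl("_(number|slider)", first_name)
result <- df[keep, , drop = FALSE]
result
```

Only the second block survives here, because the first block starts with "a_number".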
Update
Based on the comments from the OP, i.e. blocks are determined by their common prefix, extract the first word, use that as the grouping variable and do the filter as before:
library(stringr)
df %>%
group_by(grp = word(name, 1, sep="_")) %>%
filter(!str_detect(first(name), "_(number|slider)"))
The ungroup part remains the same as in the previous code.
If there are repeating prefixes, i.e. non-adjacent runs of the same prefix that need to be treated as separate blocks, then use rleid from data.table to create the grouping variable:
library(data.table) # for `rleid`
df %>%
group_by(grp = rleid(word(name, 1, sep="_"))) %>%
filter(!str_detect(first(name), "_(number|slider)"))
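The difference between grouping by prefix and grouping by runs of a prefix can be sketched in base R. This is only an illustration with hypothetical names; rle() is used here to build a run id of the kind rleid returns, and sub() stands in for word().

```r
# Hypothetical names: the prefix "a" appears in two non-adjacent runs
name <- c("a_text", "a_x", "b_number", "b_x", "a_slider", "a_y")

prefix <- sub("_.*$", "", name)   # "a" "a" "b" "b" "a" "a"

# Run id: a new group starts whenever the prefix changes
runs <- rle(prefix)
grp <- rep(seq_along(runs$lengths), runs$lengths)   # 1 1 2 2 3 3

# First name of each run, repeated for every row of that run
first_name <- ave(name, grp, FUN = function(v) v[1])
keep <- !grepl("_(number|slider)", first_name)
name[keep]
```

Grouping by prefix alone would merge rows 1-2 with rows 5-6 and keep "a_slider" and "a_y", because the first name of the merged group is "a_text".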
Filtering data.frame based on row_number()
Actually, dplyr's slice function is made for this kind of subsetting:
df %>% slice(2:7)
(I'm a little late to the party but thought I'd add this for future readers)
Filter for complete cases in data.frame using dplyr (case-wise deletion)
Try this:
df %>% na.omit
or this:
df %>% filter(complete.cases(.))
or this:
library(tidyr)
df %>% drop_na
If you want to filter based on one variable's missingness, use a conditional:
df %>% filter(!is.na(x1))
or
df %>% drop_na(x1)
Other answers indicate that, of the solutions above, na.omit is much slower, but that has to be balanced against the fact that it returns the row indices of the omitted rows in the na.action attribute, whereas the other solutions above do not.
str(df %>% na.omit)
## 'data.frame': 2 obs. of 2 variables:
## $ x1: num 1 2
## $ x2: num 1 2
## - attr(*, "na.action")= 'omit' Named int 3 4
## ..- attr(*, "names")= chr "3" "4"
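A minimal base R sketch of that trade-off follows. The data frame is hypothetical, chosen only so that rows 3 and 4 contain missing values as in the str output above.

```r
# Hypothetical data: rows 3 and 4 each contain an NA
df <- data.frame(x1 = c(1, 2, NA, 4),
                 x2 = c(1, 2, 3, NA))

by_omit  <- na.omit(df)                 # records which rows were dropped
by_cases <- df[complete.cases(df), ]    # same rows, no record kept

dropped <- attr(by_omit, "na.action")   # indices of the omitted rows
names(dropped)                          # their original rownames
attr(by_cases, "na.action")             # NULL: nothing was recorded
```

If the omitted rows are needed later, for example to report them, the na.action attribute saves recomputing them.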
ADDED Have updated to reflect latest version of dplyr and comments.
ADDED Have updated to reflect latest version of tidyr and comments.
How to correctly write class methods in R6 and chain them
If you want to chain member functions, those member functions need to return self. This means that the R6 object has to modify the data it contains. Since the benefit of R6 is to reduce copies, I would probably keep a full copy of the data, and have select_func and filter_func update some row and column indices:
library(R6)
dataFrame <- R6Class("dataFrame",
public = list(
data = data.frame(),
rows = 0,
columns = 0,
initialize = function(data) {
self$data <- data
self$rows <- seq(nrow(data))
self$columns <- seq_along(data)
},
get_data = function() {self$data[self$columns][self$rows,]},
select_func = function(cols) {
if(is.character(cols)) cols <- match(cols, names(self$data))
self$columns <- cols
self
},
filter_func = function(r) {
if(is.logical(r)) r <- which(r)
self$rows <- r
self
})
)
This allows us to chain the filter and select methods:
dataFrame$new(iris)$filter_func(1:5)$select_func(1:2)$get_data()
#> Sepal.Length Sepal.Width
#> 1 5.1 3.5
#> 2 4.9 3.0
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5.0 3.6
and our select method can take names too:
dataFrame$new(mtcars)$select_func(c("mpg", "wt"))$get_data()
#> mpg wt
#> Mazda RX4 21.0 2.620
#> Mazda RX4 Wag 21.0 2.875
#> Datsun 710 22.8 2.320
#> Hornet 4 Drive 21.4 3.215
#> Hornet Sportabout 18.7 3.440
#> Valiant 18.1 3.460
#> Duster 360 14.3 3.570
#> Merc 240D 24.4 3.190
#> Merc 230 22.8 3.150
#> Merc 280 19.2 3.440
#> Merc 280C 17.8 3.440
#> Merc 450SE 16.4 4.070
#> Merc 450SL 17.3 3.730
#> Merc 450SLC 15.2 3.780
#> Cadillac Fleetwood 10.4 5.250
#> Lincoln Continental 10.4 5.424
#> Chrysler Imperial 14.7 5.345
#> Fiat 128 32.4 2.200
#> Honda Civic 30.4 1.615
#> Toyota Corolla 33.9 1.835
#> Toyota Corona 21.5 2.465
#> Dodge Challenger 15.5 3.520
#> AMC Javelin 15.2 3.435
#> Camaro Z28 13.3 3.840
#> Pontiac Firebird 19.2 3.845
#> Fiat X1-9 27.3 1.935
#> Porsche 914-2 26.0 2.140
#> Lotus Europa 30.4 1.513
#> Ford Pantera L 15.8 3.170
#> Ferrari Dino 19.7 2.770
#> Maserati Bora 15.0 3.570
#> Volvo 142E 21.4 2.780
For completeness, you need some type safety, and I would also add a reset method to remove all filtering. This effectively gives you a data frame where the filtering and selecting are non-destructive, which could actually be very useful.
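One way the reset method and some basic checks might look is sketched below. This is a hypothetical variant rather than the answer's own code: the class name dataFrame2, the reset method and the particular stopifnot checks are all assumptions.

```r
library(R6)

# Hypothetical variant of the class above with reset() and index checks
dataFrame2 <- R6Class("dataFrame2",
  public = list(
    data = NULL,
    rows = NULL,
    columns = NULL,
    initialize = function(data) {
      stopifnot(is.data.frame(data))
      self$data <- data
      self$reset()
    },
    # Drop all filtering and selection, back to the full data
    reset = function() {
      self$rows <- seq_len(nrow(self$data))
      self$columns <- seq_along(self$data)
      invisible(self)
    },
    select_func = function(cols) {
      if (is.character(cols)) cols <- match(cols, names(self$data))
      # Unknown names become NA in match(), so reject them here
      stopifnot(is.numeric(cols), !anyNA(cols),
                all(cols >= 1 & cols <= ncol(self$data)))
      self$columns <- cols
      invisible(self)
    },
    filter_func = function(r) {
      if (is.logical(r)) r <- which(r)
      stopifnot(is.numeric(r), !anyNA(r),
                all(r >= 1 & r <= nrow(self$data)))
      self$rows <- r
      invisible(self)
    },
    get_data = function() {
      # drop = FALSE keeps a data frame even for a single column
      self$data[self$rows, self$columns, drop = FALSE]
    }
  )
)

obj <- dataFrame2$new(iris)
sub <- obj$filter_func(1:5)$select_func(c("Sepal.Length", "Sepal.Width"))$get_data()
full <- obj$reset()$get_data()
```

Returning invisible(self) still allows chaining; it only suppresses printing of the object when a chain ends on a modifying method.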
Created on 2022-05-01 by the reprex package (v2.0.1)
R: Select rows by value and always include previous row
Create a position index where the 'time' value is 13 using which, then subtract 1 from that index to get the previous rows, and concatenate both sets of indices to subset:
i1 <- which(df1$time == 13)
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]
-output
ID speed dist time
2 B 7 10 8
3 C 7 18 13
4 C 8 4 5
5 A 5 6 13
6 D 6 2 13
data
df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L,
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L,
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)),
class = "data.frame", row.names = c(NA,
-7L))
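The index arithmetic can be followed step by step in this sketch, which uses the same df1 as above. It builds the index with c(i1 - 1, i1) instead of the rep() form; that is a swapped-in but equivalent way to get the matching rows together with their predecessors.

```r
df1 <- data.frame(ID = c("A", "B", "C", "C", "A", "D", "E"),
                  speed = c(4L, 7L, 7L, 8L, 5L, 6L, 7L),
                  dist = c(12L, 10L, 18L, 4L, 6L, 2L, 2L),
                  time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L))

i1 <- which(df1$time == 13)          # rows where time is 13
prev <- i1 - 1                       # the row before each match
ind <- sort(unique(c(prev, i1)))     # both sets, no duplicates, in order
ind <- ind[ind > 0]                  # guards against a match in row 1
res <- df1[ind, ]
res
```

The ind > 0 step matters only when the first row matches, since its "previous row" would be index 0.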