How to Preserve Base Data Frame Rownames Upon Filtering in Dplyr Chain

You can convert the row names to a column and restore them after filtering:

library(dplyr)
library(tibble)  # for `rownames_to_column` and `column_to_rownames`

df %>%
    rownames_to_column('gene') %>%
    filter_if(is.numeric, all_vars(. >= 8)) %>%
    column_to_rownames('gene')

#        BoneMarrow Pulmonary
# ATP1B1         30      3380
# PRR11        2703        27
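
As a self-contained sketch of the same round trip, the example below uses a made-up data frame (the third gene name and all values are hypothetical) and swaps in if_all(), which newer dplyr releases (1.0.4 and later) offer as a replacement for filter_if() with all_vars(). It is an illustration under those assumptions, not the original poster's data.

```r
library(dplyr)
library(tibble)

# Hypothetical data; values and the "GENE_X" row are invented for illustration
df <- data.frame(
  BoneMarrow = c(30, 2, 2703),
  Pulmonary  = c(3380, 15, 27),
  row.names  = c("ATP1B1", "GENE_X", "PRR11")
)

kept <- df %>%
  rownames_to_column("gene") %>%
  # keep rows where every numeric column is >= 8 (needs dplyr >= 1.0.4)
  filter(if_all(where(is.numeric), ~ .x >= 8)) %>%
  column_to_rownames("gene")

rownames(kept)
```

The "gene" column is character, so where(is.numeric) skips it and only the measurement columns are tested.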

How to mutate columns but keep rownames in R pipe?

That is because mutate, like dplyr verbs in general, may reset the row names to 1, 2, ... after an operation, so the original row names are not reliably maintained.

If you need them for further manipulation, store them as a column first.

library(dplyr)

iris %>% 
   .[which(as.numeric(rownames(.))%%3!=0),] %>%
    mutate(row = rownames(.),
            Sepal.Length=Sepal.Length+1) %>%
    pull(row)

#  [1] "1"   "2"   "4"   "5"   "7"   "8"   "10"  "11"  "13"  "14"  "16"  "17"  "19"  "20"  "22"  "23"  "25"  "26" 
# [19] "28"  "29"  "31"  "32"  "34"  "35"  "37"  "38"  "40"  "41"  "43"  "44"  "46"  "47"  "49"  "50"  "52"  "53" 
# [37] "55"  "56"  "58"  "59"  "61"  "62"  "64"  "65"  "67"  "68"  "70"  "71"  "73"  "74"  "76"  "77"  "79"  "80" 
# [55] "82"  "83"  "85"  "86"  "88"  "89"  "91"  "92"  "94"  "95"  "97"  "98"  "100" "101" "103" "104" "106" "107"
# [73] "109" "110" "112" "113" "115" "116" "118" "119" "121" "122" "124" "125" "127" "128" "130" "131" "133" "134"
# [91] "136" "137" "139" "140" "142" "143" "145" "146" "148" "149"
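
A possible variant of the same idea, sketched with the built-in mtcars data because its row names are meaningful: tibble's rownames_to_column() stores the names in a regular column up front, so they survive any later verb. The column name "car" is an arbitrary choice.

```r
library(dplyr)
library(tibble)

res <- mtcars %>%
  rownames_to_column("car") %>%   # row names become an ordinary column
  mutate(mpg = mpg + 1) %>%
  filter(cyl == 4)

head(res$car, 3)
```

Because "car" is now data rather than an attribute, it does not matter how a given verb treats row names.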

How can I avoid rowSums() dropping rownames?

dplyr (and the tidyverse in general) does not support row names.

One way to preserve them is to add the row names as a new column, perform the data manipulation you want, and then move the column back to the row names.

library(dplyr)
library(tibble)

x %>%
  rownames_to_column() %>%
  mutate(Total = rowSums(.[-1])) %>% 
  column_to_rownames()

#  x1 x2 Total
#a  1  2     3
#b  0  4     4
#c  2  5     7
#d  3  0     3
#e  4  9    13
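
For a reproducible version, the sketch below rebuilds x from the printed output above and runs the same pipeline. It also shows a base R alternative (an addition, not part of the original answer): plain column assignment leaves the row names alone.

```r
library(dplyr)
library(tibble)

# Data reconstructed from the printed output above
x <- data.frame(
  x1 = c(1, 0, 2, 3, 4),
  x2 = c(2, 4, 5, 0, 9),
  row.names = c("a", "b", "c", "d", "e")
)

out <- x %>%
  rownames_to_column() %>%            # adds a "rowname" column in first position
  mutate(Total = rowSums(.[-1])) %>%  # .[-1] drops that column before summing
  column_to_rownames()

# Base R alternative: assigning a column does not touch the row names
x$Total <- rowSums(x)
```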

Filter rows in dplyr chain if a set of rows doesn't contain a specific word

We create a grouping column that starts a new block every fourth row (using gl), keep only the groups whose first 'name' does not contain _number or _slider, then ungroup and remove the temporary 'grp' column:

library(dplyr)
library(stringr)  # for `str_detect`

df %>% 
    group_by(grp = as.integer(gl(n(), 4, n()))) %>% 
    filter(!str_detect(first(name), "_(number|slider)")) %>%
    ungroup() %>%
    select(-grp)

Update

Based on the comments from the OP, blocks are determined by their common prefix. In that case, extract the first word of 'name', use it as the grouping variable, and apply the same filter as before:

library(stringr)
df %>%
  group_by(grp = word(name, 1, sep="_")) %>% 
  filter(!str_detect(first(name), "_(number|slider)"))

The ungroup and select steps remain the same as before.

If a prefix can repeat in non-adjacent positions and each run needs to be treated as a separate block, use rleid from data.table to create the grouping variable:

df %>%
  group_by(grp = data.table::rleid(word(name, 1, sep="_"))) %>%
  filter(!str_detect(first(name), "_(number|slider)"))
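
To make the rleid behaviour concrete, here is a small sketch on invented data (the names below are hypothetical, not the OP's). The prefix "a" occurs in two separate runs, and rleid gives each run its own group id, so the two "a" blocks are judged independently.

```r
library(dplyr)
library(stringr)

# Hypothetical data: the prefix "a" appears in two separate blocks
df <- data.frame(
  name = c("a_number", "a_x", "b_text", "b_y", "a_slider", "a_z")
)

res <- df %>%
  # run ids for the prefixes a, a, b, b, a, a are 1, 1, 2, 2, 3, 3
  group_by(grp = data.table::rleid(word(name, 1, sep = "_"))) %>%
  filter(!str_detect(first(name), "_(number|slider)")) %>%
  ungroup() %>%
  select(-grp)

res$name
```

Only the "b" block survives here, because both "a" blocks start with a _number or _slider entry.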

Filtering data.frame based on row_number()

Actually dplyr's slice function is made for this kind of subsetting:

df %>% slice(2:7)

(I'm a little late to the party but thought I'd add this for future readers)

Filter for complete cases in data.frame using dplyr (case-wise deletion)

Try this:

df %>% na.omit

or this:

df %>% filter(complete.cases(.))

or this:

library(tidyr)
df %>% drop_na

If you want to filter based on one variable's missingness, use a conditional:

df %>% filter(!is.na(x1))

df %>% drop_na(x1)

Other answers indicate that, of the solutions above, na.omit is much slower. That has to be balanced against the fact that it returns the row indices of the omitted rows in the na.action attribute, whereas the other solutions do not.

str(df %>% na.omit)
## 'data.frame':   2 obs. of  2 variables:
##  $ x1: num  1 2
##  $ x2: num  1 2
##  - attr(*, "na.action")= 'omit' Named int  3 4
##    ..- attr(*, "names")= chr  "3" "4"
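
The comparison can be sketched on a small hypothetical data frame, chosen so that rows 3 and 4 each contain a missing value (the original df is not shown, so these values are assumptions):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: rows 3 and 4 each have one NA
df <- data.frame(x1 = c(1, 2, NA, 4), x2 = c(1, 2, 3, NA))

a <- df %>% na.omit()
b <- df %>% filter(complete.cases(.))
d <- df %>% drop_na()

# All three keep the same two rows, but only na.omit() records
# which rows were dropped
attr(a, "na.action")
attr(b, "na.action")  # NULL
attr(d, "na.action")  # NULL
```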

ADDED Have updated to reflect latest version of dplyr and comments.

ADDED Have updated to reflect latest version of tidyr and comments.

How to correctly write class methods in R6 and chain them

If you want to chain member functions, you need those member functions to return self. This means that the R6 object has to modify the data it contains. Since the benefit of R6 is to reduce copies, I would probably keep a full copy of the data, and have select_func and filter_func update some row and column indices:

library(R6)

dataFrame <- R6Class("dataFrame",
  public = list(
    data = data.frame(),
    rows = 0,
    columns = 0,
    initialize = function(data) {
      self$data <- data
      self$rows <- seq_len(nrow(data))
      self$columns <- seq_along(data)
    },
    # Return the current view; drop = FALSE keeps a data frame
    # even when a single column is selected
    get_data = function() {
      self$data[self$rows, self$columns, drop = FALSE]
    },
    select_func = function(cols) {
      if (is.character(cols)) cols <- match(cols, names(self$data))
      self$columns <- cols
      self  # return the object so calls can be chained
    },
    filter_func = function(r) {
      if (is.logical(r)) r <- which(r)
      self$rows <- r
      self
    }
  )
)

This allows us to chain the filter and select methods:

dataFrame$new(iris)$filter_func(1:5)$select_func(1:2)$get_data()
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6

and our select method can take names too:

dataFrame$new(mtcars)$select_func(c("mpg", "wt"))$get_data()
#>                      mpg    wt
#> Mazda RX4           21.0 2.620
#> Mazda RX4 Wag       21.0 2.875
#> Datsun 710          22.8 2.320
#> Hornet 4 Drive      21.4 3.215
#> Hornet Sportabout   18.7 3.440
#> Valiant             18.1 3.460
#> Duster 360          14.3 3.570
#> Merc 240D           24.4 3.190
#> Merc 230            22.8 3.150
#> Merc 280            19.2 3.440
#> Merc 280C           17.8 3.440
#> Merc 450SE          16.4 4.070
#> Merc 450SL          17.3 3.730
#> Merc 450SLC         15.2 3.780
#> Cadillac Fleetwood  10.4 5.250
#> Lincoln Continental 10.4 5.424
#> Chrysler Imperial   14.7 5.345
#> Fiat 128            32.4 2.200
#> Honda Civic         30.4 1.615
#> Toyota Corolla      33.9 1.835
#> Toyota Corona       21.5 2.465
#> Dodge Challenger    15.5 3.520
#> AMC Javelin         15.2 3.435
#> Camaro Z28          13.3 3.840
#> Pontiac Firebird    19.2 3.845
#> Fiat X1-9           27.3 1.935
#> Porsche 914-2       26.0 2.140
#> Lotus Europa        30.4 1.513
#> Ford Pantera L      15.8 3.170
#> Ferrari Dino        19.7 2.770
#> Maserati Bora       15.0 3.570
#> Volvo 142E          21.4 2.780

For completeness, you need some type safety, and I would also add a reset method to remove all filtering. This effectively gives you a data frame where the filtering and selecting are non-destructive, which could actually be very useful.

Created on 2022-05-01 by the reprex package (v2.0.1)
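
One way the suggested type safety and reset method might look is sketched below. The reset method, the stopifnot checks, and the use of invisible(self) are illustrative assumptions layered on the dataFrame class above, not part of the original answer.

```r
library(R6)

# Hypothetical extension: `reset` and the validation checks are assumptions
dataFrame <- R6Class("dataFrame",
  public = list(
    data = NULL,
    rows = NULL,
    columns = NULL,
    initialize = function(data) {
      stopifnot(is.data.frame(data))
      self$data <- data
      self$reset()
    },
    get_data = function() {
      self$data[self$rows, self$columns, drop = FALSE]
    },
    select_func = function(cols) {
      if (is.character(cols)) cols <- match(cols, names(self$data))
      # reject unknown names (NA from match) and out-of-range positions
      stopifnot(is.numeric(cols), !anyNA(cols),
                all(cols >= 1 & cols <= ncol(self$data)))
      self$columns <- cols
      invisible(self)
    },
    filter_func = function(r) {
      if (is.logical(r)) {
        stopifnot(length(r) == nrow(self$data))
        r <- which(r)
      }
      stopifnot(is.numeric(r), !anyNA(r),
                all(r >= 1 & r <= nrow(self$data)))
      self$rows <- r
      invisible(self)
    },
    # restore the full, unfiltered view
    reset = function() {
      self$rows <- seq_len(nrow(self$data))
      self$columns <- seq_along(self$data)
      invisible(self)
    }
  )
)

obj <- dataFrame$new(iris)
obj$filter_func(1:5)$select_func("Species")
dim_filtered <- dim(obj$get_data())
obj$reset()
dim_reset <- dim(obj$get_data())
```

Because the stored data is never modified, reset only has to restore the two index vectors.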

R: Select rows by value and always include previous row

Use which to get the positions where 'time' is 13, subtract 1 from them to get the preceding rows, and combine both sets of positions to subset the data:

i1 <- which(df1$time == 13) 
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]

Output

  ID speed dist time
2  B     7   10    8
3  C     7   18   13
4  C     8    4    5
5  A     5    6   13
6  D     6    2   13

data

df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L, 
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L, 
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)), 
class = "data.frame", row.names = c(NA, 
-7L))
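
An equivalent result can be sketched with a logical mask instead of position arithmetic. This is an alternative formulation rather than the method used above; the names hit and prev are arbitrary.

```r
df1 <- data.frame(
  ID    = c("A", "B", "C", "C", "A", "D", "E"),
  speed = c(4L, 7L, 7L, 8L, 5L, 6L, 7L),
  dist  = c(12L, 10L, 18L, 4L, 6L, 2L, 2L),
  time  = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)
)

hit  <- df1$time == 13       # rows that match
prev <- c(hit[-1], FALSE)    # TRUE where the next row matches
res  <- df1[hit | prev, ]
res
```

Shifting the mask by one position marks each row that precedes a match, and the first row needs no special handling because nothing precedes it.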
