
# Filter Data.Frame Rows by a Logical Condition

## Filter data.frame rows by a logical condition

To select rows according to one 'cell_type' (e.g. 'hesc'), use `==`:

```r
expr[expr$cell_type == "hesc", ]
```

To select rows matching two or more different 'cell_type' values (e.g. either 'hesc' or 'bj fibroblast'), use `%in%`:

```r
expr[expr$cell_type %in% c("hesc", "bj fibroblast"), ]
```

## R: filter rows based on a condition in one column

You can first establish the indices of the first-pair parts using `which`:

```r
library(dplyr)

inds <- which(df$c == 3 & lead(df$c) == 1 & lead(df$d) - df$d < 10)
```

and then subset your dataframe on the indices plus 1:

```r
df[sort(unique(c(inds, inds + 1))), ]
#      d   b c a
# 2 3403 100 3 1
# 3 3407 100 1 1
# 8 3436 100 3 1
# 9 3445 100 1 1
```

Alternatively, you can do:

```r
library(dplyr)

df1 <- df %>%                # get the first row of each pair
  filter(c == 3 & lead(c) == 1 & lead(d) - d < 10)

df2 <- df %>%                # get the second row of each pair
  filter(lag(c) == 3 & c == 1 & d - lag(d) < 10)

arrange(rbind(df1, df2), d)  # bind the two together and arrange by d
```

## Subset and filter a dataframe by logical operators and select the foregoing rows

As noted in a comment, it does not make sense to filter rows that do not exist (there are none before row #1). Therefore, here's a solution for filtering with slightly different parameters. Say you want to filter target rows where `A == 11 & B == 90` (this value combination also occurs three times in your data) and you want to get the five rows preceding each target row. You can first define a function to get the indices of the rows in question:

```r
Sequ <- function(col1, col2) {
  # get row indices of the target rows with `which`
  inds <- which(col1 == 11 & col2 == 90)
  # sort row indices of the rows before each target row AND the target row itself
  sort(unique(c(inds - 5, inds - 4, inds - 3, inds - 2, inds - 1, inds)))
}
```

Next you can use this function as input for `slice`:

```r
library(dplyr)

Sample_Data %>%
  slice(Sequ(col1 = A, col2 = B))
#     A  B
# 1   6 95
# 2   7 94
# 3   8 93
# 4   9 92
# 5  10 91
# 6  11 90
# 7   6 95
# 8   7 94
# 9   8 93
# 10  9 92
# 11 10 91
# 12 11 90
# 13  6 95
# 14  7 94
# 15  8 93
# 16  9 92
# 17 10 91
# 18 11 90
```

## Subset / filter rows in a data frame based on a condition in a column

Here are the two main approaches. I prefer this one for its readability:

```r
bar <- subset(foo, location == "there")
```

Note that you can string together many conditionals with `&` and `|` to create complex subsets.

The second is the indexing approach. You can index rows in R with either numeric or logical vectors. `foo$location == "there"` returns a vector of `TRUE` and `FALSE` values with one element per row of `foo`; indexing with that vector returns only the rows where the condition is `TRUE`.

```r
foo[foo$location == "there", ]
```
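To make this concrete, here is a small sketch with a made-up `foo` (the `location` and `size` columns are assumptions, not from the original question):

```r
# Made-up stand-in for the 'foo' data frame above
foo <- data.frame(
  location = c("there", "here", "there", "elsewhere"),
  size     = c(10, 25, 40, 5)
)

# The comparison yields a logical vector, one element per row
foo$location == "there"        # TRUE FALSE TRUE FALSE

# Indexing with that vector keeps only the TRUE rows
foo[foo$location == "there", ]

# Conditions combine with & and | inside subset() too
subset(foo, location == "there" & size > 20)
```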

## How to filter via a logical expression that filters via a variable

As an initial matter, it looks like you have a vector instead of a data frame (only one column). If you really do have a data frame and only ran `str()` on one column, the very similar technique at the end will work for you.

The first thing to know is that your dates are stored as character strings, while your `yesterday` object is in the Date format. A character string will not compare meaningfully with a Date, so you need to convert at least one of the two objects.

I suggest converting both to the POSIXct format so that you do not lose any information in your dates column but can still compare it to `yesterday`. Make sure to set the timezone to the same as your system time (mine is "America/New_York").

```r
Dates <- c("2021-09-09T06:04:35.689Z", "2021-09-09T06:04:35.690Z",
           "2021-09-09T06:04:35.260Z", "2021-09-24T06:04:35.260Z")
Dates <- gsub("T", " ", Dates)
Dates <- gsub("Z", "", Dates)
Dates <- as.POSIXct(Dates, format = "%Y-%m-%d %H:%M:%OS", tz = "America/New_York")

yesterday <- Sys.time() - 86400  # the number of seconds in one day
```

Now you can tell R to ignore the time and compare only the dates.

```r
trunc(Dates, units = "days") == trunc(yesterday, units = "days")
```

The other part of your question was about filtering. The easiest way to filter is subsetting. You first ask R for the indices of the matching values in your vector (or column) by wrapping your comparison in the `which()` function.

```r
Indices <- which(trunc(Dates, units = "days") == trunc(yesterday, units = "days"))
```

None of the dates in your `str()` results match yesterday, so I added one at the end that does. Calling `which()` returns `4`, telling you that the fourth item in your vector matches yesterday's date; if more dates matched, it would return more indices. I saved the result in `Indices`.

We can then use the Indices from `which()` to subset your vector or dataframe.

```r
Filtered_Dates <- Dates[Indices]
Filtered_Dataframe <- df[Indices, ]  # note the comma: it filters rows, not columns
```

## Is it possible to subset a data.frame based on a row range AND a logical condition in R?

You can do the subsetting in either of the following ways.

1. Based on a logical vector:

```r
mtcars[seq(nrow(mtcars)) %in% 1:5 & mtcars$cyl == 6, ]
#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
```

2. Based on a row range:

```r
mtcars[intersect(1:5, which(mtcars$cyl == 6)), ]
```

## Filtering rows of a dataframe by column values

Using `==` with `&` is not going to work, because a single cell never contains two different 'Species' values at once; with that approach you would need `|` instead of `&`. But this can be done more easily with `%in%` on a vector of values, e.g.

```r
subset(df1, Species %in% c("Mallard", "Wood-pigeon"))
```

The vector `c("Mallard", "Wood-pigeon")` can be extended to any number of species.
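Because `%in%` accepts any character vector, the species of interest can also be kept in a variable and reused; a small sketch with a made-up `df1` (the values below are illustrative, not from the original data):

```r
# Made-up data frame standing in for df1
df1 <- data.frame(
  Species = c("Mallard", "Wood-pigeon", "Robin", "Blackbird"),
  Count   = c(3, 5, 2, 7)
)

wanted <- c("Mallard", "Wood-pigeon", "Robin")  # any number of species
subset(df1, Species %in% wanted)
```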

## Consecutively filter rows satisfyingly if condition in R dataframe

Create the sequence in the order in which you want to check the values: first 1 to 20, then 0 and -1. `arrange` the data so that it is ordered by that sequence, then keep the rows whose `dt_diff` equals the first `dt_diff` in the arranged data frame.

```r
library(dplyr)

pbl_dt_seq <- c(1:20, 0, -1)

dt_df %>%
  arrange(match(dt_diff, pbl_dt_seq)) %>%
  filter(dt_diff == first(dt_diff))
#   date       ref_date   dt_diff
#   <date>     <date>       <dbl>
# 1 2003-07-24 2003-07-26       2
# 2 2003-07-24 2003-07-26       2
# 3 2003-07-24 2003-07-26       2
# 4 2003-07-24 2003-07-26       2
# 5 2003-07-24 2003-07-26       2
# 6 2003-07-24 2003-07-26       2
# 7 2003-07-24 2003-07-26       2
# 8 2003-07-24 2003-07-26       2
```

## pandas: filter rows of DataFrame with operator chaining

I'm not entirely sure what you want, and your last line of code does not help either, but anyway:

"Chained" filtering is done by "chaining" the criteria in the boolean index.

```python
In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [99]: df[(df.A == 1) & (df.D == 6)]
Out[99]:
   A  B  C  D
d  1  3  9  6
```

If you want to chain methods, you can add your own `mask` method and use that one. (Note that recent versions of pandas already ship a built-in `DataFrame.mask`, so this assignment replaces the built-in; in new code, prefer a different name.)

```python
In [90]: def mask(df, key, value):
   ....:     return df[df[key] == value]
   ....:

In [92]: pandas.DataFrame.mask = mask

In [93]: df = pandas.DataFrame(np.random.randint(0, 10, (4, 4)),
   ....:                       index=list('abcd'), columns=list('ABCD'))

In [95]: df.loc['d', 'A'] = df.loc['a', 'A']

In [96]: df
Out[96]:
   A  B  C  D
a  1  4  9  1
b  4  5  0  2
c  5  5  1  0
d  1  3  9  6

In [97]: df.mask('A', 1)
Out[97]:
   A  B  C  D
a  1  4  9  1
d  1  3  9  6

In [98]: df.mask('A', 1).mask('D', 6)
Out[98]:
   A  B  C  D
d  1  3  9  6
```