Filtering observations in dplyr in combination with grepl
I didn't understand your second regex, but this more basic regex seems to do the trick:
df1 %>% filter(!grepl("^x|xx$", fruit))
###
fruit group
1 apple A
2 orange B
3 banxana A
4 appxxle B
And I assume you know this, but you don't have to use dplyr here at all:
df1[!grepl("^x|xx$", df1$fruit), ]
###
fruit group
1 apple A
2 orange B
7 banxana A
8 appxxle B
The regex is looking for strings that start with x OR end with xx. The ^ and $ are regex anchors for the beginning and end of the string, respectively. | is the OR operator. We're negating the results of grepl with the !, so we're finding strings that don't match what's inside the regex.
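As a quick check of the anchors and the negation, here is a minimal sketch on a made-up vector (the values below are hypothetical, not the original df1):

```r
# Hypothetical sample values, chosen to exercise each part of the pattern
fruit <- c("apple", "xorange", "banxana", "grapexx", "appxxle")

# TRUE where the string starts with "x" OR ends with "xx"
matches <- grepl("^x|xx$", fruit)

# Negate to keep only the strings that do NOT match
kept <- fruit[!matches]
print(kept)
```

An x or xx in the middle of a string, as in banxana and appxxle, matches neither anchored alternative, so those values are kept.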
using grepl * to return NA values in R dplyr
I ended up creating a custom function to do this:
greplna <- function(data, reg = "*", var = "Discount code") {
  # pull the column out as a plain vector
  x <- data[[var]]
  if (reg == "*") {
    # "*" is treated as "match everything", so keep the NA values too
    tmp <- grepl("*", x) | is.na(x)
  } else {
    tmp <- grepl(reg, x)
  }
  return(tmp)
}
You can then use this in a dplyr statement:
df %>% filter(greplna(., search, "Discount code"))
But don't use it after a group_by(), as the . gets the whole dataset, not the grouped datasets.
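One possible way around the grouping issue, sketched under the assumption that you can pass the column itself rather than the whole data frame: a vector-level helper only sees the rows of the current group when it is called inside filter(). The name grepl_or_na and its keep_na argument are made up for this illustration, and unlike greplna above it keeps NA values for any pattern, not just "*".

```r
# Hypothetical helper: works on a vector, not on the data frame
grepl_or_na <- function(x, pattern, keep_na = TRUE) {
  hit <- grepl(pattern, x)          # grepl() returns FALSE for NA input
  if (keep_na) hit | is.na(x) else hit
}

# Made-up discount codes for demonstration
codes <- c("SAVE10", NA, "WELCOME", "SAVE20")

with_na    <- grepl_or_na(codes, "^SAVE")
without_na <- grepl_or_na(codes, "^SAVE", keep_na = FALSE)
print(with_na)
print(without_na)
```

In a dplyr pipeline the call would then take the column directly, for example filter(grepl_or_na(`Discount code`, search)), with search being the pattern variable from the answer above.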
Filtering according to combination of matching data across variables in R
You can filter by the Group column after this:
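df <- as.data.frame(df)
# build an order-independent key for each Word1/Word2 pair
df$v <- sapply(seq_len(nrow(df)), function(x)
  paste(sort(c(df[x, 1], df[x, 2])), collapse = ""))
# map each unique key to a group label
l <- data.frame(v = unique(df$v),
                Group = paste0("Group", seq_along(unique(df$v))))
df <- merge(df, l, by = "v")[, -1]
df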
Word1 Word2 distance speaker session Group
1 WordA WordX 1.40 JB 1 Group1
2 WordX WordA 0.23 JB 1 Group1
3 WordB WordY 2.10 JB 1 Group2
4 WordY WordB 2.30 JB 1 Group2
5 WordC WordZ 4.70 JB 1 Group3
6 WordZ WordC 0.51 JB 1 Group3
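As an alternative sketch of the same idea (not the approach used above), the order-independent key can be built in a vectorized way with pmin() and pmax(), which also work on character vectors. The rows below are copied from the first four lines of the output shown above; the variable name key is an assumption for illustration.

```r
# First four rows of the output above, rebuilt by hand
df <- data.frame(
  Word1    = c("WordA", "WordX", "WordB", "WordY"),
  Word2    = c("WordX", "WordA", "WordY", "WordB"),
  distance = c(1.40, 0.23, 2.10, 2.30),
  stringsAsFactors = FALSE
)

# Same key regardless of which column holds which word
key <- paste(pmin(df$Word1, df$Word2), pmax(df$Word1, df$Word2))

# Number the keys in order of first appearance
df$Group <- paste0("Group", match(key, unique(key)))

# Filtering by group afterwards
group1 <- df[df$Group == "Group1", ]
print(group1)
```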
Conditional filtering using grepl and relative row position in group
Try this:
library(dplyr)
Dataset %>%
group_by(Journal_ref, Journal_type) %>%
summarise(Journal_value = last(Journal_value)) %>%
ungroup() %>% group_by(Journal_ref) %>%
filter(!(n() > 1 & Journal_type == "Rev"))
Output:
Journal_ref Journal_type Journal_value
<fct> <fct> <dbl>
1 1111 Adj 90
2 2222 Adj 12000
3 3333 Rev 500
4 4444 Adj 2500
filtering some strings but some of them not! with grepl
The OP has requested to filter out 'AxxBy' strings but wants to keep 'AxxByy' strings (where 'x' and 'y' denote digits).
Often it is easier to specify what to keep than what to remove. To keep strings which obey the pattern 'AxxByy' the regular expression
"^A\\d{2}B\\d{2}$"
can be used, where ^ denotes the beginning of the string, \\d{2} a sequence of exactly two digits, and $ the end of the string. A and B stand for themselves.
With this regular expression, dplyr and grepl() can be used to filter the input data frame DF:
library(dplyr)
# which rows are kept?
kept <- DF %>%
  filter(grepl("^A\\d{2}B\\d{2}$", pair))
kept
# pair
#1 A10B33
#2 A11B44
# which rows are removed?
removed <- DF %>%
  filter(!grepl("^A\\d{2}B\\d{2}$", pair))
removed
# pair
#1 A1B2
#2 A2B3
#3 A3B4
#4 A4B22
#5 AB
#6 A
#7 B
#8 A1
#9 A12
#10 B1
#11 B12
#12 AA12B34
#13 A12BB34
Note that I've added some edge cases for demonstration.
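To see why the ^ and $ anchors matter for these edge cases, here is a small sketch comparing the anchored pattern with an unanchored variant. The unanchored pattern and the extra value "A12B345" are my own additions for illustration.

```r
# "A12B345" is a hypothetical extra edge case; the other two appear above
edge_cases <- c("A10B33", "AA12B34", "A12B345")

# Anchored: the whole string must be A + two digits + B + two digits
anchored <- grepl("^A\\d{2}B\\d{2}$", edge_cases)

# Unanchored: the pattern may match anywhere inside the string
unanchored <- grepl("A\\d{2}B\\d{2}", edge_cases)

print(data.frame(edge_cases, anchored, unanchored))
```

Without the anchors, strings that merely contain a valid 'AxxByy' substring would be kept as well.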
BTW: dplyr is not required if only the vector pair needs to be filtered. So, in base R the alternative expressions
pair[grepl("^A\\d{2}B\\d{2}$", pair)]
grep("^A\\d{2}B\\d{2}$", pair, value = TRUE)
both return the strings to keep:
[1] "A10B33" "A11B44"
while
pair[!grepl("^A\\d{2}B\\d{2}$", pair)]
returns the removed strings:
[1] "A1B2" "A2B3" "A3B4" "A4B22" "AB" "A" "B" "A1"
[9] "A12" "B1" "B12" "AA12B34" "A12BB34"
Data
As given by the OP but with some edge cases appended:
# create vector of test patterns using paste0() instead of paste(..., sep = "")
pair <- paste0("A", c(1:4, 10, 11), "B", c(2, 3, 4, 22, 33, 44))
# alternatively use sprintf()
pair <- sprintf("A%iB%i", c(1:4, 10, 11), c(2, 3, 4, 22, 33, 44))
# add some edge cases
pair <- append(pair, c("AB", "A", "B", "A1", "A12", "B1", "B12", "AA12B34", "A12BB34"))
# create data frame
DF <- data.frame(pair)
DF
# pair
#1 A1B2
#2 A2B3
#3 A3B4
#4 A4B22
#5 A10B33
#6 A11B44
#7 AB
#8 A
#9 B
#10 A1
#11 A12
#12 B1
#13 B12
#14 AA12B34
#15 A12BB34
Filtering multiple string columns based on 2 different criteria - questions about grepl and starts_with
We can use filter with rowwise, where we loop over the columns using c_across, specifying the column name match with a select helper (starts_with), and get a logical output with grepl checking for either "C18" or (|) a number that starts with (^) 153:
library(dplyr) #1.0.0
library(stringr)
df %>%
# // do a row wise grouping
rowwise() %>%
# // subset the columns that starts with 'DGN' within c_across
# // apply grepl condition on the subset
# // wrap with any for any column in a row meeting the condition
filter(any(grepl("C18|^153", c_across(starts_with("DGN")))))
Or with filter_at
df %>%
# //apply the any_vars along with grepl in filter_at
filter_at(vars(starts_with("DGN")), any_vars(grepl("C18|^153", .)))
data
df <- data.frame(ID = 1:3, DGN1 = c("2_C18", 32, "1532"),
DGN2 = c("24", "C18_2", "23"))
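For comparison, the same "any DGN column matches" logic can be sketched in base R without dplyr. The fourth row below is a hypothetical addition (not in the data above) so that at least one row is dropped; note that "2153" contains 153 but does not start with it.

```r
# Data from above plus one hypothetical non-matching row (ID 4)
df <- data.frame(ID = 1:4,
                 DGN1 = c("2_C18", 32, "1532", "C17"),
                 DGN2 = c("24", "C18_2", "23", "2153"),
                 stringsAsFactors = FALSE)

# Select the columns whose names start with "DGN"
dgn_cols <- df[startsWith(names(df), "DGN")]

# One logical column per DGN column: does the value match the pattern?
hits <- sapply(dgn_cols, function(col) grepl("C18|^153", col))

# Keep rows where at least one DGN column matches
res <- df[rowSums(hits) > 0, ]
print(res)
```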
Find a specific string with grepl across all columns in R dplyr
Use if_any to match a row if any of the columns (i.e. at least one among all) matches the pattern. With if_all, every column would have to match the pattern.
mpg |>
filter(if_any(.cols = everything(), ~ grepl("audi", .)))
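The difference between the two helpers is easiest to see side by side. This is a minimal sketch on a made-up data frame (toy_cars is hypothetical, not the mpg data), assuming a dplyr version that provides if_any() and if_all() (1.0.4 or later):

```r
library(dplyr)

# Hypothetical toy data: only the last row has "audi" in every column
toy_cars <- data.frame(
  manufacturer = c("audi", "ford", "audi"),
  model        = c("a4", "mustang", "audi tt"),
  stringsAsFactors = FALSE
)

# Row is kept if at least one column contains "audi"
any_match <- toy_cars %>%
  filter(if_any(everything(), ~ grepl("audi", .x)))

# Row is kept only if every column contains "audi"
all_match <- toy_cars %>%
  filter(if_all(everything(), ~ grepl("audi", .x)))

print(any_match)
print(all_match)
```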
dplyr slice ifelse grepl filter in r: unexpected outcome
After grouping by 'ID', filter those groups in which either all elements of 'Commnets' contain the substring 'Audited' or (|) all contain 'Unaudited'; otherwise return the first 'Audited' row.
library(dplyr)
df %>%
mutate(Date = as.Date(Date)) %>%
arrange(ID,Commnets,desc(Date)) %>%
group_by(ID = trimws(ID)) %>%
mutate(flag = all(grepl('\\bAudited', Commnets))|all(grepl('\\bUnaudited', Commnets))) %>%
filter(flag| (!flag & grepl('\\bAudited', Commnets))) %>%
filter(if(all(!flag)) row_number() == 1 else TRUE) %>%
ungroup %>%
select(-flag)
# A tibble: 7 x 4
# ID rating Commnets Date
# <chr> <chr> <chr> <date>
#1 H2 D Audited 2018-11-10
#2 H3 C+ Unaudited 2018-10-02
#3 H1 C Audited 2018-12-10
#4 H2 C Audited 2018-11-10
#5 H3 C+ Unaudited 2018-10-02
#6 H3 C Unaudited Co 2018-10-10
#7 H4 C Audited 2020-09-03
Or if we wanted to keep all the 'Audited', just remove the second filter
df %>%
mutate(Date = as.Date(Date)) %>%
arrange(ID,Commnets,desc(Date)) %>%
group_by(ID = trimws(ID)) %>%
mutate(flag = all(grepl('\\bAudited', Commnets))|all(grepl('\\bUnaudited', Commnets))) %>%
filter(flag| (!flag & grepl('\\bAudited', Commnets))) %>%
ungroup %>%
select(-flag)
# A tibble: 8 x 4
# ID rating Commnets Date
# <chr> <chr> <chr> <date>
#1 H2 D Audited 2018-11-10
#2 H3 C+ Unaudited 2018-10-02
#3 H1 C Audited 2018-12-10
#4 H1 C Audited Co 2018-12-10
#5 H2 C Audited 2018-11-10
#6 H3 C+ Unaudited 2018-10-02
#7 H3 C Unaudited Co 2018-10-10
#8 H4 C Audited 2020-09-03
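To make the flag logic easier to follow, here is a base R sketch of just the per-group flag and the first filter, using ave() in place of group_by(). The vectors below are made up for illustration (the original df is not shown), and the mixed group H5 is hypothetical.

```r
# Hypothetical comments and IDs: H1 all audited, H3 all unaudited, H5 mixed
comments <- c("Audited", "Audited Co", "Unaudited", "Unaudited Co",
              "Audited", "Unaudited")
id <- c("H1", "H1", "H3", "H3", "H5", "H5")

is_aud   <- grepl("\\bAudited", comments)
is_unaud <- grepl("\\bUnaudited", comments)

# TRUE for every row of a group that is entirely audited or entirely unaudited
flag <- ave(is_aud, id, FUN = all) | ave(is_unaud, id, FUN = all)

# Keep uniform groups whole; in mixed groups keep only the audited rows
keep <- flag | (!flag & is_aud)
print(data.frame(id, comments, flag, keep))
```

Only the 'Unaudited' row of the mixed group is dropped here; picking the first 'Audited' row per mixed group is what the second filter in the dplyr code adds.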