R - Delete Consecutive (Only) Duplicates

How to delete only consecutive duplicate rows?

You can use the rleid function from data.table, which assigns the same id to every value within a run of consecutive equal values and a new id whenever the value changes. Passing those ids to duplicated then lets you keep only the first row of each run.

res <- df[!duplicated(data.table::rleid(df$Event_type)), ]
res

#   Subject Trial Event_type                    Code    Time
#23 VP02_RP    15    Picture                face01_n  887969
#24 VP02_RP    15      Sound         mpossound_test5  888260
#25 VP02_RP    15    Picture            pospic_test5  906623
#26 VP02_RP    15    Nothing    ev_mnegpos_adj_onset  928623
#27 VP02_RP    15   Response                      15  958962
#28 VP02_RP    18    Picture                face01_p  987666
#29 VP02_RP    18      Sound         mpossound_test6  987668
#30 VP02_RP    18    Picture            negpic_test6 1006031
#31 VP02_RP    18    Nothing ev_mposnegpos_adj_onset 1028031
#32 VP02_RP    18   Response                      15 1076642

The rleid function can be reproduced in base R with rle:

res <- df[!duplicated(with(rle(df$Event_type),rep(seq_along(values), lengths))),]
res
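
To see what the run ids look like, here is a minimal sketch on made-up data (the df below is hypothetical, not the asker's data). It uses only base R. Note that rle needs an atomic vector, so a factor column would have to be converted with as.character first.

```r
# Hypothetical sample data; Event_type is kept as character, not factor
df <- data.frame(
  Event_type = c("Picture", "Picture", "Sound", "Sound", "Picture"),
  Time = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Base R stand-in for data.table::rleid(): one id per run of equal values
run_id <- with(rle(df$Event_type), rep(seq_along(values), lengths))
run_id
# [1] 1 1 2 2 3

# Keep the first row of every run (rows 1, 3 and 5 here)
res <- df[!duplicated(run_id), ]
res
```

The second "Picture" run gets a new id (3), which is why it survives even though "Picture" already appeared earlier.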

R - delete consecutive (ONLY) duplicates

You just need to check that a value is not followed by a duplicate, i.e. x[i+1] != x[i]. The last value is always kept, so this retains the last row of each run.

df[c(df$x[-1] != df$x[-nrow(df)],TRUE),]
  x  y z
3 1 30 3
5 2 49 5
6 4 13 6
8 2 49 8
9 1 30 9
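
As a sketch of how the comparison works, the example below builds both a keep-last and a keep-first mask. The data frame is hypothetical (only the x and z columns are made up to resemble the output above), and it assumes x contains no NA values.

```r
# Hypothetical data: x contains runs of repeated values
df <- data.frame(x = c(1, 1, 1, 2, 2, 4, 2, 2, 1), z = 1:9)
n <- nrow(df)

# TRUE where a value differs from its successor (last row always kept)
keep_last <- c(df$x[-1] != df$x[-n], TRUE)

# TRUE where a value differs from its predecessor (first row always kept)
keep_first <- c(TRUE, df$x[-1] != df$x[-n])

df[keep_last, ]   # rows 3, 5, 6, 8, 9
df[keep_first, ]  # rows 1, 4, 6, 7, 9
```

Moving the TRUE from the end of the mask to the front is all it takes to switch from keeping the last row of each run to keeping the first.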

Remove/collapse consecutive duplicate values in sequence

One easy way is to use rle:

Here's your sample data:

x <- scan(what = character(), text = "a a a b c c d e a a b b b e e d d")
# Read 17 items

rle returns a list with two components: the length of each run ("lengths") and the value repeated in that run ("values").

rle(x)$values
# [1] "a" "b" "c" "d" "e" "a" "b" "e" "d"
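
For reference, a short sketch of both components on the same sample vector, plus the round trip through inverse.rle (quiet = TRUE is added here only to suppress the "Read 17 items" message):

```r
x <- scan(what = character(), quiet = TRUE,
          text = "a a a b c c d e a a b b b e e d d")

r <- rle(x)
r$lengths
# [1] 3 1 2 1 1 2 3 2 2
r$values
# [1] "a" "b" "c" "d" "e" "a" "b" "e" "d"

# inverse.rle() rebuilds the original vector from lengths and values
identical(inverse.rle(r), x)
# [1] TRUE
```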

Update: For a `data.frame`

If you are working with a data.frame, try something like the following:

## Sample data
mydf <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c", "d", "e", 
         "a", "a", "b", "b", "e", "e", "d", "d"),
  V2 = c(1, 2, 3, 2, 4, 1, 3, 9, 
         4, 8, 10, 199, 2, 5, 4, 10)
)

## Use rle, as before
X <- rle(mydf$V1)
## Identify the rows you want to keep
Y <- cumsum(c(1, X$lengths[-length(X$lengths)]))
Y
# [1]  1  4  5  7  8  9 11 13 15
mydf[Y, ]
#    V1 V2
# 1   a  1
# 4   b  2
# 5   c  4
# 7   d  3
# 8   e  9
# 9   a  4
# 11  b 10
# 13  e  2
# 15  d  4
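
If this is needed in several places, the steps above could be wrapped in a small function. This is only a sketch: keep_first_of_run and the demo data frame are hypothetical names, and the as.character call is an assumption added so that factor columns also work with rle.

```r
# Hypothetical helper: keep the first row of each run of equal values in `col`
keep_first_of_run <- function(data, col) {
  if (nrow(data) == 0) return(data)
  # as.character() lets factor columns through, since rle() needs an atomic vector
  lens <- rle(as.character(data[[col]]))$lengths
  # Start index of each run: 1, then 1 + cumulative length of the earlier runs
  data[cumsum(c(1, lens[-length(lens)])), , drop = FALSE]
}

demo <- data.frame(
  V1 = c("a", "a", "b", "a", "a"),
  V2 = 1:5,
  stringsAsFactors = FALSE
)
out <- keep_first_of_run(demo, "V1")
out  # rows 1, 3 and 4
```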

Update 2

The "data.table" package has a function rleid that lets you do this quite easily. Note that the grouping column in the result is the run id rather than V1. Using mydf from above, try:

library(data.table)
as.data.table(mydf)[, .SD[1], by = rleid(V1)]
#    rleid V2
# 1:     1  1
# 2:     2  2
# 3:     3  4
# 4:     4  3
# 5:     5  9
# 6:     6  4
# 7:     7 10
# 8:     8  2
# 9:     9  4

How to remove all consecutive data but keep only the first row

Here are a few options.

First, you can use rle to get the length of each run of consecutive values. The first run starts at index 1, and the cumulative sum of the preceding run lengths gives the starting index of every later run.

lens <- rle(df$x)$lengths
df[cumsum(c(1, lens[-length(lens)])), ]

As an alternative, using the tidyverse you can flag each row where x differs from the previous row (the first row is always flagged) and keep only the flagged rows. Note that diff only works when x is numeric.

library(dplyr)

df %>%
  group_by(grp = c(TRUE, diff(x) != 0)) %>%
  filter(grp) %>%
  ungroup() %>%
  select(-grp)

Or with data.table you can use rleid (a function that generates run-length type group ids). duplicated returns TRUE for every repeat of an id, so negating it keeps only the first row of each run.

library(data.table)

setDT(df)[!duplicated(rleid(x))]

Remove duplicates within consecutive runs of characters

We can use gsub

gsub("([A-Z*]>)\\1+", "\\1", tst)
#[1] "C>B>C>*>C"

In order to get the second result, remove the >

gsub(">", "", gsub("([A-Z*]>)\\1+", "\\1", tst), fixed = TRUE)
#[1] "CBC*C"

Based on the OP's comments below, maybe

gsub("(.)\\1+", "\\1", gsub(">", "", tst))
#[1] "CBC*C"
gsub("(.)\\1+", "\\1", gsub(">", "", "A>"))
#[1] "A"
gsub("(.)\\1+", "\\1", gsub(">", "", "A>A"))
#[1] "A"
gsub("(.)\\1+", "\\1", gsub(">", "", "A>A>A>A"))
#[1] "A"
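
The difference between the two patterns can be seen on a made-up input. This is a sketch only: the tst value below is hypothetical (the original tst is not shown), and the second pattern assumes every token is a single character.

```r
# Hypothetical input; tokens are single characters separated by ">"
tst <- "C>C>C>B>B>C>*>*>C"

# Collapse repeated "token>" pairs, keeping the separators
with_sep <- gsub("([A-Z*]>)\\1+", "\\1", tst)
with_sep
# [1] "C>B>C>*>C"

# Strip the separators first, then collapse repeated characters
no_sep <- gsub("(.)\\1+", "\\1", gsub(">", "", tst, fixed = TRUE))
no_sep
# [1] "CBC*C"

# The first pattern misses a repeat at the very end of the string,
# because the last token has no trailing ">"
tail_case <- gsub("([A-Z*]>)\\1+", "\\1", "A>A")
tail_case
# [1] "A>A"
```

That trailing-token case is what the separator-free version handles correctly.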

Remove consecutive duplicates per row with RLE and check logic of sequence in R

Step 1:

df[,-1] <- data.frame(t(apply(df[,-1], 1, function(z) {
  r <- rle(z)
  c(r$values, rep(NA, length(z) - length(r$values)))
})))
df
#   Patient    Area1       Area2  Area3  Area4   Area5
# 1       1 Arrival1      Area A Area B   Ward    <NA>
# 2       2 Arrival1 Diagnostics   Ward   <NA>    <NA>
# 3       3 Arrival2      Area A Area B   Ward    <NA>
# 4       4 Arrival1      Area B Area A Area C Arrival
# 5       5 Arrival2        <NA>   <NA>   <NA>    <NA>
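
To make the per-row logic easier to follow, here is a reduced sketch on hypothetical data. It differs slightly from the code above: it works on as.matrix(df[, -1]), strips the column names with unname, and stops at the collapsed matrix instead of assigning it back.

```r
# Hypothetical patient pathways, one row per patient
df <- data.frame(
  Patient = 1:2,
  Area1 = c("Arrival", "Arrival"),
  Area2 = c("Arrival", "Ward"),
  Area3 = c("Ward", "Ward"),
  stringsAsFactors = FALSE
)

# For each row: collapse runs, then pad with NA back to the original width
collapsed <- t(apply(as.matrix(df[, -1]), 1, function(z) {
  r <- rle(unname(z))
  c(r$values, rep(NA, length(z) - length(r$values)))
}))
collapsed
```

Both rows collapse to "Arrival", "Ward", NA. The padding matters because apply only returns a matrix when every row yields a vector of the same length.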

Step 2: (tbd, pending "possible pathways")

How to remove consecutive duplicate characters

Here is an option based on strsplit and rle:

x <- c("Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic")
x <- trimws(strsplit(x, ">")[[1]], "both")
paste(rle(x)$values, collapse = " > ")
# output
[1] "Organic > Paid Search > Direct > Organic"

Using a data.frame, try out:

df <- data.frame(Path = c("Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic",
                          "Organic > Paid Search >  Paid Search > Direct > Organic > Direct",
                          "Organic > Organic > Paid Search > Paid Search > Direct > Direct"),
                 conversions = c(6, 5, 3), stringsAsFactors = FALSE)
# Solution: split each path once, trim the pieces, then collapse runs with rle
df$Path2 <- sapply(strsplit(df$Path, ">"),
                   function(x) paste(rle(trimws(x, "both"))$values,
                                     collapse = " > "))
df # output
                                                                           Path conversions                                             Path2
1 Organic > Paid Search > Paid Search > Paid Search > Direct > Direct > Organic           6          Organic > Paid Search > Direct > Organic
2              Organic > Paid Search >  Paid Search > Direct > Organic > Direct           5 Organic > Paid Search > Direct > Organic > Direct
3               Organic > Organic > Paid Search > Paid Search > Direct > Direct           3                    Organic > Paid Search > Direct
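
A possible variation, sketched below, splits on the separator together with any surrounding whitespace so that no separate trimming of the pieces is needed. The function name collapse_path is hypothetical, and vapply is used instead of sapply only to guarantee a character result.

```r
# Hypothetical helper: collapse consecutive repeats in a ">"-separated path
collapse_path <- function(path) {
  # Split on ">" and swallow any whitespace around it in one step
  parts <- strsplit(trimws(path), "\\s*>\\s*")
  vapply(parts, function(p) paste(rle(p)$values, collapse = " > "),
         character(1))
}

paths <- c(
  "Organic > Paid Search >  Paid Search > Direct > Organic > Direct",
  "Organic > Organic > Paid Search > Paid Search > Direct > Direct"
)
collapse_path(paths)
# [1] "Organic > Paid Search > Direct > Organic > Direct"
# [2] "Organic > Paid Search > Direct"
```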


Remove consecutive duplicates from a vector, only if more than 5 consecutive

We can create a logical index to subset both the values and lengths

with(rle(x), rep(values[lengths<=5], lengths[lengths<=5]))
#[1]   1   1   2   1   3 -99 -99   3   1   2   2   0   1 -99

If we want to replace the elements in runs longer than 5 with NA

inverse.rle(within.list(rle(x), values[lengths>5] <- NA))
#[1]   1   1   2   1   3 -99 -99   3  NA  NA  NA  NA  NA  NA  NA  NA  NA   1   2   2   0   1 -99
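
The same two operations can be written out step by step, which may be easier to adapt. This is a sketch on a hypothetical x (the original vector is not shown); it modifies the rle object directly rather than going through within.list.

```r
# Hypothetical input: the value 3 repeats six times in a row
x <- c(1, 1, 2, rep(3, 6), 4)

r <- rle(x)
long <- r$lengths > 5   # TRUE for runs longer than 5

# Drop the long runs entirely
dropped <- rep(r$values[!long], r$lengths[!long])
dropped
# [1] 1 1 2 4

# Or keep their positions but replace the values with NA
r$values[long] <- NA
masked <- inverse.rle(r)
masked
# [1]  1  1  2 NA NA NA NA NA NA  4
```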

How to remove duplicate consecutive text in R separated by :

You can do this with gsub and a regular expression

gsub("\\b(\\w+)(\\:\\1)+\\b", "\\1", DAT$agent)
[1] "A"   "A"   "B"   "C"   "A:C" "A:C" "A:C"

Your Data

DAT = read.table(text="  id  agent    final_col
1  1   A:A         A
2  1   A:A         A
3  2     B         B
4  3     C         C
5  4 A:C:C       A:C
6  4 A:C:C       A:C
7  4 A:C:C       A:C",
header=TRUE, stringsAsFactors=FALSE)
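
For a quick check without building DAT, the same pattern can be run on a plain character vector. The values below are hypothetical but follow the shape of DAT$agent, with one extra case of three repeats.

```r
# Hypothetical values in the same shape as DAT$agent
agent <- c("A:A", "B", "A:C:C", "A:A:A")

# (\\w+) captures a token; (\\:\\1)+ matches one or more ":token" repeats of it
cleaned <- gsub("\\b(\\w+)(\\:\\1)+\\b", "\\1", agent)
cleaned
# [1] "A"   "B"   "A:C" "A"
```

Because the repeat group is quantified with +, any number of consecutive repeats collapses to a single token.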
