Removing Duplicate Values Row-Wise in R

Removing duplicate values row-wise in R

You can flag the rows that contain duplicated person ids using apply (dat[-1] drops the first column, which holds the team number):

dat$dups <- apply(dat[-1], 1, function(i) any(duplicated(i[!is.na(i)])))

or, as Simon O'Hanlon pointed out in the comments:

dat$dups <- apply(dat[-1], 1, function(i) any(duplicated(i, incomparables = NA)))

You could then use this to either find the team numbers that have duplicates or to exclude them:

# Return teams that have duplicate person ids
dat$Team[ dat$dups ]
# Exclude rows with duplicates
dat[ ! dat$dups , ]
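To see the two variants side by side, here is a minimal sketch on made-up data. The Team and Person values below are hypothetical, and the sketch assumes, as the answer does, that the first column is the team id and every other column holds person ids.

```r
# Hypothetical toy data: one team id column followed by person id columns
dat <- data.frame(
  Team    = c(101, 102, 103),
  Person1 = c(1, 4, 7),
  Person2 = c(2, 4, NA),
  Person3 = c(3, NA, NA)
)

# Variant 1: drop NAs first so that repeated NAs never count as duplicates
dups_a <- apply(dat[-1], 1, function(i) any(duplicated(i[!is.na(i)])))

# Variant 2: tell duplicated() that NA values are not comparable
dups_b <- apply(dat[-1], 1, function(i) any(duplicated(i, incomparables = NA)))

dat$dups <- dups_a
dat$Team[dat$dups]   # teams with a repeated person id
dat[!dat$dups, ]     # rows without a repeated person id
```

Team 103 has two NA cells but no repeated id, so neither variant flags it; a plain duplicated(i) without the NA handling would.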

Remove duplicate values across a few columns but keep rows

Base R way using apply:

cols <- grep('z_\\d+', names(dat))
dat[cols] <- t(apply(dat[cols], 1, function(x)  replace(x, duplicated(x), 0)))

#  id z_1 z_2 z_3 z_4 z_5 z_6
#1  1 100  20   0   0  23   0
#2  2 290   0   0   0   0   0
#3  3  38   0   0   0  25   0
#4  4 129   0   0 127   0   0
#5  5   0   0   0  38   0   0
#6  6 290   0  98  78   0   9
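The t() call is easy to overlook, so here is a small sketch on made-up data (the values are hypothetical) showing why it is needed: apply with MARGIN = 1 returns each row's result as a column, so the result has to be transposed before it is assigned back.

```r
# Hypothetical toy data with an id column and three z_ columns
dat <- data.frame(id = 1:2,
                  z_1 = c(100, 290),
                  z_2 = c(20, 290),
                  z_3 = c(100, 0))

cols <- grep("z_\\d+", names(dat))

# Each row's result comes back as a COLUMN of res
res <- apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0))
dim(res)      # 3 x 2: one column per original row

# Transpose so the shape matches dat[cols] again (2 rows, 3 columns)
dat[cols] <- t(res)
dat
```

Without t(), the assignment would either fail or silently put values in the wrong cells, depending on the dimensions.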

A tidyverse way without reshaping can be done using pmap:

library(tidyverse)

dat %>%
  mutate(result = pmap(select(., matches('z_\\d+')), ~{
    x <- c(...)
    replace(x, duplicated(x), 0)
    })) %>%
  select(id, result) %>%
  unnest_wider(result)

Since tests performed by @thelatemail suggest that reshaping is a better option than handling the data row-wise, you might want to consider it:

dat %>%
  pivot_longer(cols = matches('z_\\d+')) %>%
  group_by(id) %>%
  mutate(value = replace(value, duplicated(value), 0)) %>%
  pivot_wider()
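If the tidyverse is not available, the same "go long, group by row, mark repeats" idea can be sketched in base R with ave. This is an illustration on made-up data, not the code that was benchmarked; the matrix name m and the toy values are assumptions.

```r
# Hypothetical toy data
dat <- data.frame(id = 1:3,
                  z_1 = c(100, 290, 38),
                  z_2 = c(20, 290, 38),
                  z_3 = c(100, 0, 25))

cols <- grep("z_\\d+", names(dat))
m <- as.matrix(dat[cols])

# Treat the matrix as one long vector and group its cells by row index.
# ave() writes the logical result of duplicated() back into a numeric
# vector, so repeats come out as 1 and first occurrences as 0.
dup <- ave(as.vector(m), as.vector(row(m)), FUN = duplicated)

# Zero out every cell that repeats an earlier value in the same row
m[dup == 1] <- 0
dat[cols] <- m
dat
```

Because the long vector is in column-major order, the cells of each row are visited from the first z_ column to the last, so the first occurrence is the one that is kept.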

Remove duplicate elements by row in a data frame

Here is a base R option: loop through the rows with apply, replace the duplicated elements with NA, concatenate (c) the non-NA elements followed by the NA elements so that the NAs end up at the right, then transpose (t) and assign the output back to the original dataset.

df1[] <- t(apply(df1, 1, function(x) {
        x1 <- replace(x, duplicated(x), NA)
        c(x1[!is.na(x1)], x1[is.na(x1)])
        }))
df1
# A tibble: 4 x 3
#       x     y     z
#  <dbl> <dbl> <dbl>
#1     1     2     3
#2     1    NA    NA
#3     4     1    NA
#4     2     3    NA
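The input is not shown above, so here is a self-contained sketch with a made-up df1 (an assumption chosen to produce the same kind of output) that can be run as-is.

```r
# Hypothetical input; the original data is not shown
df1 <- data.frame(x = c(1, 1, 4, 2),
                  y = c(2, 1, 1, 3),
                  z = c(3, 1, 4, 2))

df1[] <- t(apply(df1, 1, function(x) {
  # Blank out every value that already appeared earlier in the row
  x1 <- replace(x, duplicated(x), NA)
  # Keep the surviving values in order and push the NAs to the right
  c(x1[!is.na(x1)], x1[is.na(x1)])
}))
df1
```

Using df1[] on the left-hand side keeps the data frame and its column names and only replaces the cell contents.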

Remove duplicate rows ignoring column order in R?

df <- data.frame(
    var1 = c("a", "b", "a", "c", "b", "c"), 
    var2 = c("b", "a", "c", "a", "c", "b"), 
    value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034)
)

A one-liner base-r solution:

df_unique <- df[!duplicated(apply(df[,1:2], 1, function(row) paste(sort(row), collapse=""))),]

df_unique
  var1 var2 value
1    a    b 0.576
3    a    c 0.987
5    b    c 0.034

What it does: it works across the first two columns row-wise (apply with MARGIN = 1), sorts the two values alphabetically, pastes them into a single string, and then drops every row whose string has already occurred (!duplicated).
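One caveat with a pasted key: with collapse = "" two different pairs can produce the same string once values are longer than one character. A possible alternative, sketched here under the assumption that exactly two character columns are compared, builds the order-independent key from pmin and pmax instead of a pasted string. The names lo and hi are illustrative.

```r
# A separator-free key can merge distinct pairs
paste(sort(c("ab", "c")), collapse = "") ==
  paste(sort(c("a", "bc")), collapse = "")   # TRUE: both keys are "abc"

df <- data.frame(
  var1 = c("a", "b", "a", "c", "b", "c"),
  var2 = c("b", "a", "c", "a", "c", "b"),
  value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034),
  stringsAsFactors = FALSE
)

# Element-wise smaller and larger value of each pair, so (a, b) and (b, a)
# both become lo = "a", hi = "b"
lo <- pmin(df$var1, df$var2)
hi <- pmax(df$var1, df$var2)

# duplicated() on a data frame compares whole rows, so no pasting is needed
df_unique <- df[!duplicated(data.frame(lo, hi)), ]
df_unique
```

On the example data this keeps rows 1, 3 and 5, the same rows as the one-liner.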

Another (probably better) approach, stepping back, is to take your original matrix and clear out the lower half, including the diagonal, using lower.tri. This way each combination keeps a non-NA value only once:

mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0), 
              nrow=3, dimnames=list(letters[1:3], letters[1:3]))

mat[lower.tri(mat, diag = TRUE)] <- NA
mat
   a     b     c
a NA 0.576 0.987
b NA    NA 0.034
c NA    NA    NA
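If the long var1/var2/value layout is still needed afterwards, the masked matrix can be turned back into pairs. The following is a sketch, assuming the matrix has row and column names as above; the object name long is illustrative.

```r
mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0),
              nrow = 3, dimnames = list(letters[1:3], letters[1:3]))

# Blank the lower triangle and the diagonal, keeping each pair once
mat[lower.tri(mat, diag = TRUE)] <- NA

# as.table() + as.data.frame() gives one row per matrix cell
long <- as.data.frame(as.table(mat), stringsAsFactors = FALSE)
names(long) <- c("var1", "var2", "value")

# Drop the blanked cells so only the unique combinations remain
long <- long[!is.na(long$value), ]
long
```

The three remaining rows are the pairs a-b, a-c and b-c with their values.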

tidy way to remove duplicates per row

Maybe you want something like this:

library(dplyr)
df %>%
  rowwise() %>% 
  do(data.frame(replace(., duplicated(unlist(.)), NA)))

Output:

# A tibble: 4 × 3
# Rowwise: 
      x     y     z
  <dbl> <dbl> <dbl>
1     1    NA     2
2     2     3    NA
3     3     4     5
4     4     5     6

How to remove duplicated characters from each rows of a column?

Using tidyverse, we can first turn the row names into a column, split the comma-separated string into one row per value with separate_rows, group_by rowname, drop the duplicated values, and collapse each group back into a comma-separated string using toString.

library(tidyverse)

df %>%
  rownames_to_column() %>%
  separate_rows(name, sep = ",") %>%
  group_by(rowname) %>%
  filter(!duplicated(name)) %>%
  summarise(name = toString(name)) %>%
  column_to_rownames()

#        name
#A a, b, c, d
#B    a, b, f
#C          d
#D          a

A base R approach using sapply, which is much the same as @tmfmnk's answer:

sapply(strsplit(as.character(df$name), ","), function(x) toString(unique(x)))
#[1] "a, b, c, d" "a, b, f"    "d"          "a"
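The input data frame is not shown, so here is a runnable sketch with a made-up df (an assumption) that also writes the result back while keeping the row names. Splitting on ",\\s*" instead of "," is an optional tweak in case the values are separated by a comma plus spaces.

```r
# Hypothetical input: comma-separated values with repeats, row names A to D
df <- data.frame(name = c("a,b,c,d,b", "a,b,a,f", "d,d", "a"),
                 row.names = LETTERS[1:4],
                 stringsAsFactors = FALSE)

# Split each string, keep the first occurrence of every value, glue back
df$name_unique <- sapply(
  strsplit(as.character(df$name), ",\\s*"),
  function(x) toString(unique(x))
)
df
```

toString joins with ", ", so the output uses a comma followed by a space even if the input did not.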

Remove duplicate cell in a row

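Try this base R approach using apply, which replaces every repeated, non-missing value in a row with NA and leaves the first occurrence in place: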

data.frame(Team=df1$Team, t(apply(df1[,-1], 1, function(x)
  ifelse(!is.na(x)&duplicated(as.vector(x)),NA,x))))
      Team  Person1  Person2  Person3  Person4 Person5  Person6  Person7
1  6594794 37505959 37469784       NA       NA      NA       NA       NA
2  6595053 30113392 33080042 21537147 32293683      NA       NA       NA
3  6595201   697417 22860111       NA       NA      NA       NA       NA
4  6595380 24432987 32370372 11521625   362790      NA 22312802 32432267
5  6595382 12317669 25645492       NA       NA      NA       NA       NA
6  6595444  8114419   236357 32545314 22247108      NA       NA       NA
7  6595459  2135269 32332907       NA 32436550      NA       NA       NA
8  6595468 33590928 10905322 32319555 10439608      NA       NA       NA
9  6595485 33080810 33162061       NA       NA      NA       NA       NA
10 6595496 36901773 34931641       NA       NA      NA       NA       NA
11 6595523   512193  8747403       NA       NA      NA       NA       NA
12 6595524 32393404   113514       NA       NA      NA       NA       NA
13 6595526 37855554 37855512       NA       NA      NA       NA       NA
14 6595536 18603977  1882599   332261 10969771  712339  2206680   768785
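For a quick check of the behaviour, here is the same call on a small made-up df1 (the values are hypothetical). The first occurrence of an id stays where it is, later repeats become NA, and cells that were already NA are left untouched.

```r
# Hypothetical toy data: a Team column followed by person id columns
df1 <- data.frame(Team    = c(1, 2),
                  Person1 = c(11, 21),
                  Person2 = c(12, 21),
                  Person3 = c(11, NA),
                  Person4 = c(NA, NA))

cleaned <- data.frame(
  Team = df1$Team,
  # Row-wise: blank a cell only if it is non-missing and repeats an earlier one
  t(apply(df1[, -1], 1, function(x)
    ifelse(!is.na(x) & duplicated(x), NA, x)))
)
cleaned
```

Unlike the variant that shifts values to the left, this keeps every surviving id in its original column.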
