Removing Duplicate Values Row-Wise in R

You can flag the rows that contain duplicated person ids using apply:

dat$dups <- apply(dat[-1], 1, function(i) any(duplicated(i[!is.na(i)])))

or, as Simon O'Hanlon pointed out in the comments:

dat$dups <- apply(dat[-1], 1, function(i) any(duplicated(i, incomparables = NA)))

You could then use this to either find the team numbers that have duplicates or to exclude them:

# Return teams that have duplicate person ids
dat$Team[ dat$dups ]
# Exclude rows with duplicates
dat[ ! dat$dups , ]
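
To see why the NA handling matters, here is a minimal sketch on a made-up data frame. The column names and ids are hypothetical and are not the asker's data. A plain duplicated() treats a second NA as a duplicate of the first, so a team padded with several NAs would be flagged by mistake; both variants above avoid that.

```r
# Hypothetical data: one row per team, person ids padded with NA
dat <- data.frame(
  Team    = c(101, 102, 103),
  Person1 = c(1, 4, 7),
  Person2 = c(2, 4, 8),
  Person3 = c(NA, 5, NA),
  Person4 = c(NA, 6, 9)
)
ids <- dat[-1]  # person-id columns only

# Naive check: repeated NAs count as duplicates, so team 101 is flagged too
naive <- apply(ids, 1, function(i) any(duplicated(i)))

# NA-aware checks from the answer above; both ignore missing values
dups_a <- apply(ids, 1, function(i) any(duplicated(i[!is.na(i)])))
dups_b <- apply(ids, 1, function(i) any(duplicated(i, incomparables = NA)))

dat$dups <- dups_a
dat$Team[dat$dups]   # teams with a repeated person id
dat[!dat$dups, ]     # rows without repeated ids
```

In this sketch only team 102 (ids 4, 4, 5, 6) contains a genuinely repeated id.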

Remove duplicate values across a few columns but keep rows

Base R way using apply:

cols <- grep('z_\\d+', names(dat))
dat[cols] <- t(apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0)))

#  id z_1 z_2 z_3 z_4 z_5 z_6
#1  1 100  20   0   0  23   0
#2  2 290   0   0   0   0   0
#3  3  38   0   0   0  25   0
#4  4 129   0   0 127   0   0
#5  5   0   0   0  38   0   0
#6  6 290   0  98  78   0   9
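
The t() in the snippet above is easy to overlook. As a small sketch on invented data (the z_ values below are hypothetical), apply() over rows hands back one column per input row, so the result has to be transposed before it is written back:

```r
# Hypothetical data frame; the z_ values are invented for illustration
dat <- data.frame(
  id  = 1:3,
  z_1 = c(100, 290, 38),
  z_2 = c(20, 290, 38),
  z_3 = c(100, 0, 25),
  z_4 = c(20, 5, 25)
)

cols <- grep("z_\\d+", names(dat))

# apply() over rows returns one COLUMN per input row: 4 rows x 3 columns here
raw <- apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0))
dim(raw)

# Transpose so the shape matches dat[cols] again, then write back
dat[cols] <- t(raw)
dat
```

Note that a value that is already 0 simply stays 0, so repeated zeros are unaffected by the replacement.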

The tidyverse way, without reshaping, can be done using pmap:

library(tidyverse)

dat %>%
  mutate(result = pmap(select(., matches('z_\\d+')), ~{
    x <- c(...)
    replace(x, duplicated(x), 0)
  })) %>%
  select(id, result) %>%
  unnest_wider(result)

Since tests performed by @thelatemail suggest that reshaping is a better option than handling the data row-wise, you might want to consider it:

dat %>%
  pivot_longer(cols = matches('z_\\d+')) %>%
  group_by(id) %>%
  mutate(value = replace(value, duplicated(value), 0)) %>%
  pivot_wider()

Remove duplicate elements by row in a data frame

Here is a base R option: loop through the rows, replace the duplicated elements with NA, concatenate (c) the non-NA elements followed by the NA elements so that the NAs move to the end of the row, then transpose (t) and assign the output back to the original dataset.

df1[] <- t(apply(df1, 1, function(x) {
  x1 <- replace(x, duplicated(x), NA)
  c(x1[!is.na(x1)], x1[is.na(x1)])
}))
df1
# A tibble: 4 x 3
#      x     y     z
#  <dbl> <dbl> <dbl>
#1     1     2     3
#2     1    NA    NA
#3     4     1    NA
#4     2     3    NA
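
The input is not shown above, so the following is only a sketch with a hypothetical df1 chosen to be consistent with the printed result. The helper name dedupe_left is made up for illustration. One thing to keep in mind with this approach: NAs that were already present in a row are moved to the end as well.

```r
# Hypothetical input, chosen so that the result matches the output above
df1 <- data.frame(
  x = c(1, 1, 4, 2),
  y = c(2, 1, 1, 2),
  z = c(3, 1, 4, 3)
)

# Hypothetical helper: blank out repeats, then push the NAs to the right
dedupe_left <- function(x) {
  x1 <- replace(x, duplicated(x), NA)
  c(x1[!is.na(x1)], x1[is.na(x1)])
}

res <- df1
res[] <- t(apply(df1, 1, dedupe_left))  # t() restores the row/column layout
res
```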

Remove duplicate rows ignoring column order in R

df <- data.frame(
  var1 = c("a", "b", "a", "c", "b", "c"),
  var2 = c("b", "a", "c", "a", "c", "b"),
  value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034)
)

A one-liner base R solution:

df_unique <- df[!duplicated(apply(df[,1:2], 1, function(row) paste(sort(row), collapse=""))),]

df_unique
  var1 var2 value
1    a    b 0.576
3    a    c 0.987
5    b    c 0.034

What it does: work across the first 2 columns row-wise (apply with MARGIN = 1), sort (alphabetically) the content, paste it into a single string, and drop every row whose string has already occurred before (!duplicated).
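
One caveat with pasting the sorted values using collapse = "": two different pairs can end up with the same string, for example ("ab", "c") and ("a", "bc"). As a hedged alternative for the two-column case (not part of the original answer), pmin() and pmax() can order each pair element-wise so that no pasting is needed:

```r
df <- data.frame(
  var1  = c("a", "b", "a", "c", "b", "c"),
  var2  = c("b", "a", "c", "a", "c", "b"),
  value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034),
  stringsAsFactors = FALSE
)

# With collapse = "" these two different pairs produce the same key "abc"
collides <- paste(sort(c("ab", "c")), collapse = "") ==
  paste(sort(c("a", "bc")), collapse = "")

# Order each pair element-wise; duplicated() then compares whole rows
key <- data.frame(lo = pmin(df$var1, df$var2),
                  hi = pmax(df$var1, df$var2))
df_unique <- df[!duplicated(key), ]
df_unique
```

This only covers the case of exactly two key columns; for more columns the sort-and-paste idea with an explicit separator is the simpler route.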

Another (probably better) approach, stepping back, is to take your original matrix and blank out the lower triangle, including the diagonal, using lower.tri. This way only the upper triangle, one cell per pair, keeps a non-NA value:

mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0),
              nrow = 3, dimnames = list(letters[1:3], letters[1:3]))

mat[lower.tri(mat, diag = TRUE)] <- NA
mat
   a     b     c
a NA 0.576 0.987
b NA    NA 0.034
c NA    NA    NA
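
If the long format is what you need in the end, the upper triangle can be turned into one row per pair directly. This is a sketch that goes one step beyond the answer; the names idx and pairs are made up for illustration.

```r
mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0),
              nrow = 3, dimnames = list(letters[1:3], letters[1:3]))

# Row/column positions of the cells strictly above the diagonal
idx <- which(upper.tri(mat), arr.ind = TRUE)

# One row per unordered pair, labelled with the matrix dimnames
pairs <- data.frame(
  var1  = rownames(mat)[idx[, "row"]],
  var2  = colnames(mat)[idx[, "col"]],
  value = mat[idx],
  stringsAsFactors = FALSE
)
pairs
```

For this matrix the three rows are the same pairs and values as df_unique above.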

tidy way to remove duplicates per row

Maybe you want something like this:

library(dplyr)
df %>%
  rowwise() %>%
  do(data.frame(replace(., duplicated(unlist(.)), NA)))

Output:

# A tibble: 4 × 3
# Rowwise:
      x     y     z
  <dbl> <dbl> <dbl>
1     1    NA     2
2     2     3    NA
3     3     4     5
4     4     5     6

How to remove duplicated characters from each rows of a column?

Using tidyverse, we can first turn the row names into a column (rownames_to_column), split the comma-separated string into one row per value (separate_rows), group_by rowname, drop the duplicated values, and collapse each group back into a comma-separated string using toString.

library(tidyverse)

df %>%
  rownames_to_column() %>%
  separate_rows(name, sep = ",") %>%
  group_by(rowname) %>%
  filter(!duplicated(name)) %>%
  summarise(name = toString(name)) %>%
  column_to_rownames()

# name
#A a, b, c, d
#B a, b, f
#C d
#D a

A base R approach using sapply, which is much the same as @tmfmnk's approach:

sapply(strsplit(as.character(df$name), ","), function(x) toString(unique(x)))
#[1] "a, b, c, d" "a, b, f" "d" "a"
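
The input column is not shown above, so here is a self-contained sketch with a hypothetical df whose name column was invented to be consistent with the printed result. It also trims whitespace around each value, which the strsplit() call above does not do, so "a, b" and "a,b" are treated the same. The helper name dedupe_tokens is an assumption for illustration.

```r
# Hypothetical input; row names and strings are invented for illustration
df <- data.frame(
  name = c("a,b,a,c,d", "a, b ,f,a", "d,d", "a"),
  row.names = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on commas, trim, keep first occurrences
dedupe_tokens <- function(s) {
  parts <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])
  toString(unique(parts))
}

# vapply() guarantees one string per row
df$name <- vapply(df$name, dedupe_tokens, character(1), USE.NAMES = FALSE)
df
```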

Remove duplicate cell in a row

Try this base R approach using apply:

data.frame(Team = df1$Team, t(apply(df1[, -1], 1, function(x)
  ifelse(!is.na(x) & duplicated(as.vector(x)), NA, x))))
      Team  Person1  Person2  Person3  Person4 Person5  Person6  Person7
1  6594794 37505959 37469784       NA       NA      NA       NA       NA
2  6595053 30113392 33080042 21537147 32293683      NA       NA       NA
3  6595201   697417 22860111       NA       NA      NA       NA       NA
4  6595380 24432987 32370372 11521625   362790      NA 22312802 32432267
5  6595382 12317669 25645492       NA       NA      NA       NA       NA
6  6595444  8114419   236357 32545314 22247108      NA       NA       NA
7  6595459  2135269 32332907       NA 32436550      NA       NA       NA
8  6595468 33590928 10905322 32319555 10439608      NA       NA       NA
9  6595485 33080810 33162061       NA       NA      NA       NA       NA
10 6595496 36901773 34931641       NA       NA      NA       NA       NA
11 6595523   512193  8747403       NA       NA      NA       NA       NA
12 6595524 32393404   113514       NA       NA      NA       NA       NA
13 6595526 37855554 37855512       NA       NA      NA       NA       NA
14 6595536 18603977  1882599   332261 10969771  712339  2206680   768785

