Removing duplicate values row-wise in R
You can identify duplicated person ids across the rows using apply:
dat$dups <- apply(dat[-1], 1, function(i) any(duplicated(i[!is.na(i)])))
or, as Simon O'Hanlon pointed out in the comments:
dat$dups <- apply(dat[-1], 1, function(i) any(duplicated(i, incomparables = NA)))
You could then use this to either find the team numbers that have duplicates or to exclude them:
# Return teams that have duplicate person ids
dat$Team[ dat$dups ]
# Exclude rows with duplicates
dat[ ! dat$dups , ]
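To see why the NA values have to be dropped before calling duplicated, here is a minimal sketch on made-up data. The column names and values are hypothetical, not the asker's; only the apply call is taken from the answer above.

```r
# Hypothetical toy data: one Team column followed by person-id columns
dat <- data.frame(
  Team    = c(101, 102, 103),
  Person1 = c(1, 4, 7),
  Person2 = c(2, 4, NA),
  Person3 = c(3, 5, NA)
)

# duplicated() treats repeated NAs as duplicates of each other,
# so a row padded with NAs would be flagged if they were left in
na_flags <- duplicated(c(7, NA, NA))

# TRUE for rows where a non-missing person id occurs more than once
dups <- apply(dat[-1], 1, function(i) any(duplicated(i[!is.na(i)])))

dat$Team[dups]   # teams with a repeated person id
dat[!dups, ]     # rows without a repeated person id
```

Only team 102 is flagged here; team 103 is padded with NA but contains no repeated id.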
Remove duplicate values across a few columns but keep rows
Base R way using apply:
cols <- grep('z_\\d+', names(dat))
dat[cols] <- t(apply(dat[cols], 1, function(x) replace(x, duplicated(x), 0)))
# id z_1 z_2 z_3 z_4 z_5 z_6
#1 1 100 20 0 0 23 0
#2 2 290 0 0 0 0 0
#3 3 38 0 0 0 25 0
#4 4 129 0 0 127 0 0
#5 5 0 0 0 38 0 0
#6 6 290 0 98 78 0 9
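The t() in the base R version is easy to overlook. A small sketch on a hypothetical two-row data frame (names and values invented for illustration) shows that apply over rows returns each row's result as a column, which is why the result is transposed before being assigned back:

```r
# Hypothetical data: two rows, three z_ columns
m <- data.frame(z_1 = c(5, 7), z_2 = c(5, 8), z_3 = c(9, 7))

# Each row's result comes back as a COLUMN of the output matrix
res <- apply(m, 1, function(x) replace(x, duplicated(x), 0))
dim(res)   # 3 rows, 2 columns: transposed relative to m

# Transposing restores the original orientation;
# the first occurrence of each value is kept, later repeats become 0
fixed <- t(res)
fixed
```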
A tidyverse way without reshaping can be done using pmap:
library(tidyverse)
dat %>%
  mutate(result = pmap(select(., matches('z_\\d+')), ~{
    x <- c(...)
    replace(x, duplicated(x), 0)
  })) %>%
  select(id, result) %>%
  unnest_wider(result)
Since tests performed by @thelatemail suggest that reshaping is a better option than handling the data row-wise, you might want to consider it:
dat %>%
  pivot_longer(cols = matches('z_\\d+')) %>%
  group_by(id) %>%
  mutate(value = replace(value, duplicated(value), 0)) %>%
  pivot_wider()
Remove duplicate elements by row in a data frame
Here is a base R option where we loop through the rows, replace the duplicated elements with NA, concatenate (c) the non-NA elements with the NA elements, transpose (t), and assign the output back to the original dataset:
df1[] <- t(apply(df1, 1, function(x) {
x1 <- replace(x, duplicated(x), NA)
c(x1[!is.na(x1)], x1[is.na(x1)])
}))
df1
# A tibble: 4 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 2 3
#2 1 NA NA
#3 4 1 NA
#4 2 3 NA
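The question's input is not shown here, so the following is a sketch with a hypothetical df1 (a plain data frame rather than a tibble) chosen so that the same steps give the values printed above:

```r
# Hypothetical input; the original df1 is not shown in the answer
df1 <- data.frame(
  x = c(1, 1, 4, 2),
  y = c(2, 1, 4, 3),
  z = c(3, 1, 1, 2)
)

df1[] <- t(apply(df1, 1, function(x) {
  x1 <- replace(x, duplicated(x), NA)   # blank out repeats within the row
  c(x1[!is.na(x1)], x1[is.na(x1)])      # shift the surviving values to the left
}))
df1
```

Because the surviving values are shifted left, a value can end up in a different column than it started in. If the column positions matter, skip the concatenation step and return x1 directly.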
Remove duplicate rows ignoring column order? in R
df <- data.frame(
var1 = c("a", "b", "a", "c", "b", "c"),
var2 = c("b", "a", "c", "a", "c", "b"),
value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034)
)
A one-liner base R solution:
df_unique <- df[!duplicated(apply(df[,1:2], 1, function(row) paste(sort(row), collapse=""))),]
df_unique
var1 var2 value
1 a b 0.576
3 a c 0.987
5 b c 0.034
What it does: work across the first 2 columns row-wise (apply with MARGIN = 1), sort (alphabetically) the content, paste into a single string, and remove all rows where the string has already occurred before (!duplicated).
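One caveat with building the key via paste and collapse = "": with longer labels, different pairs can collapse to the same string (for example "ab" with "c" and "a" with "bc" both give "abc"). A possible variant, offered as a sketch rather than as part of the original answer, keeps the two sorted labels in separate columns using pmin and pmax, which also avoids the row-wise apply:

```r
df <- data.frame(
  var1  = c("a", "b", "a", "c", "b", "c"),
  var2  = c("b", "a", "c", "a", "c", "b"),
  value = c(0.576, 0.576, 0.987, 0.987, 0.034, 0.034),
  stringsAsFactors = FALSE   # pmin/pmax need character, not factor, columns
)

# Order each pair so that (a, b) and (b, a) produce the same key
lo <- pmin(df$var1, df$var2)
hi <- pmax(df$var1, df$var2)

# duplicated() on a data frame compares whole rows of the key
df_unique <- df[!duplicated(data.frame(lo, hi)), ]
df_unique
```

This only covers the two-column case; for more than two columns the apply-and-sort approach (ideally with a separator that cannot occur in the labels) remains the more general option.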
Another (probably better) approach, stepping back, is to take your original matrix and clear out the bottom half using lower.tri. This way only half of the combinations will have non-NA values:
mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0),
nrow=3, dimnames=list(letters[1:3], letters[1:3]))
mat[lower.tri(mat, diag = TRUE)] <- NA
mat
a b c
a NA 0.576 0.987
b NA NA 0.034
c NA NA NA
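If a long table of unique pairs is still needed afterwards, the upper triangle can be pulled out directly. This is a sketch of one way to do it; the object names idx and pairs are made up for illustration:

```r
mat <- matrix(c(0, 0.576, 0.987, 0.576, 0, 0.034, 0.987, 0.034, 0),
              nrow = 3, dimnames = list(letters[1:3], letters[1:3]))

# Row/column positions of the cells strictly above the diagonal
idx <- which(upper.tri(mat), arr.ind = TRUE)

# One row per unordered pair, labelled by the matrix dimnames
pairs <- data.frame(
  var1  = rownames(mat)[idx[, "row"]],
  var2  = colnames(mat)[idx[, "col"]],
  value = mat[idx],
  stringsAsFactors = FALSE
)
pairs
```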
tidy way to remove duplicates per row
Maybe you want something like this:
library(dplyr)
df %>%
  rowwise() %>%
  do(data.frame(replace(., duplicated(unlist(.)), NA)))
Output:
# A tibble: 4 × 3
# Rowwise:
x y z
<dbl> <dbl> <dbl>
1 1 NA 2
2 2 3 NA
3 3 4 5
4 4 5 6
How to remove duplicated characters from each rows of a column?
Using tidyverse we can first add the row names as a column, split the comma-separated string into separate rows with separate_rows, group_by rowname, remove duplicated values, and convert them back to a comma-separated string using toString.
library(tidyverse)
df %>%
  rownames_to_column() %>%
  separate_rows(name, sep = ",") %>%
  group_by(rowname) %>%
  filter(!duplicated(name)) %>%
  summarise(name = toString(name)) %>%
  column_to_rownames()
# name
#A a, b, c, d
#B a, b, f
#C d
#D a
Base R approach using sapply, which is much the same as @tmfmnk's:
sapply(strsplit(as.character(df$name), ","), function(x) toString(unique(x)))
#[1] "a, b, c, d" "a, b, f" "d" "a"
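The input data is not shown in the answer, so here is a self-contained sketch with a hypothetical df whose strings are invented to match the output above. It uses vapply instead of sapply, which is a small substitution so that the result is guaranteed to be a character vector:

```r
# Hypothetical input: comma-separated strings with repeats, row names A to D
df <- data.frame(
  name = c("a,b,c,d,b", "a,a,b,f", "d,d", "a"),
  row.names = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Split on commas, keep the first occurrence of each piece, glue back together
df$name <- vapply(
  strsplit(df$name, ",", fixed = TRUE),
  function(x) toString(unique(x)),
  character(1)
)
df
```

If the real strings have spaces after the commas, the pieces would need trimming (for example with trimws) before unique is applied, otherwise "a" and " a" count as different values.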
Remove duplicate cell in a row
Try this base R approach using apply:
data.frame(Team = df1$Team,
           t(apply(df1[, -1], 1, function(x)
             ifelse(!is.na(x) & duplicated(as.vector(x)), NA, x))))
Team Person1 Person2 Person3 Person4 Person5 Person6 Person7
1 6594794 37505959 37469784 NA NA NA NA NA
2 6595053 30113392 33080042 21537147 32293683 NA NA NA
3 6595201 697417 22860111 NA NA NA NA NA
4 6595380 24432987 32370372 11521625 362790 NA 22312802 32432267
5 6595382 12317669 25645492 NA NA NA NA NA
6 6595444 8114419 236357 32545314 22247108 NA NA NA
7 6595459 2135269 32332907 NA 32436550 NA NA NA
8 6595468 33590928 10905322 32319555 10439608 NA NA NA
9 6595485 33080810 33162061 NA NA NA NA NA
10 6595496 36901773 34931641 NA NA NA NA NA
11 6595523 512193 8747403 NA NA NA NA NA
12 6595524 32393404 113514 NA NA NA NA NA
13 6595526 37855554 37855512 NA NA NA NA NA
14 6595536 18603977 1882599 332261 10969771 712339 2206680 768785
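A reduced version of the same idea on made-up data (two hypothetical teams, three person columns) makes the behaviour easier to check by eye: repeated ids are blanked in place, and existing NA values are left alone.

```r
# Hypothetical miniature of the Team/Person layout
df1 <- data.frame(
  Team    = c(1, 2),
  Person1 = c(10, 20),
  Person2 = c(10, 21),
  Person3 = c(NA, 20)
)

cleaned <- data.frame(
  Team = df1$Team,
  t(apply(df1[, -1], 1, function(x)
    # Only non-missing repeats are set to NA; first occurrences are kept
    ifelse(!is.na(x) & duplicated(x), NA, x)))
)
cleaned
```

Unlike the variant that shifts the surviving values to the left, this keeps each value in its original column.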