Find duplicate values in R
You could use table
, i.e.
n_occur <- data.frame(table(vocabulary$id))
gives you a data frame with a list of id
s and the number of times they occurred.
n_occur[n_occur$Freq > 1,]
tells you which id
s occurred more than once.
vocabulary[vocabulary$id %in% n_occur$Var1[n_occur$Freq > 1],]
returns the records with more than one occurrence.
Find duplicated elements with dplyr
I guess you could use filter
for this purpose:
mtcars %>%
group_by(carb) %>%
filter(n()>1)
Small example (note that I added summarize()
to prove that the resulting data set does not contain rows with duplicate 'carb'. I used 'carb' instead of 'cyl' because 'carb' has unique values whereas 'cyl' does not):
mtcars %>% group_by(carb) %>% summarize(n=n())
#Source: local data frame [6 x 2]
#
# carb n
#1 1 7
#2 2 10
#3 3 3
#4 4 10
#5 6 1
#6 8 1
mtcars %>% group_by(carb) %>% filter(n()>1) %>% summarize(n=n())
#Source: local data frame [4 x 2]
#
# carb n
#1 1 7
#2 2 10
#3 3 3
#4 4 10
How to find duplicate records within the same timeperiod in R
Not sure what is your expected output. Here is one way which will give a unique ID
to every "duplicates".
library(dplyr)
df %>%
tidyr::unite(DateTime, Date, Time, sep = " ") %>%
mutate(DateTime = lubridate::ymd_hms(DateTime)) %>%
group_by(Observer, FocalID) %>%
mutate(grp = floor(difftime(DateTime, first(DateTime), units = 'hour'))) %>%
group_by(grp, .add = TRUE) %>%
mutate(ID = cur_group_id()) %>%
ungroup() %>%
select(-grp)
# A tibble: 4 x 5
# N DateTime Observer FocalID ID
# <int> <dttm> <chr> <chr> <int>
#1 1 2018-05-20 07:05:00 VR JK 2
#2 2 2018-05-20 07:50:00 VR JK 2
#3 3 2018-05-21 07:50:00 JD CJD 1
#4 4 2018-05-21 08:25:00 JD CJD 1
All the rows with similar ID's can be considered as a part of one group.
data
df <- structure(list(N = 1:4, Date = c(20180520L, 20180520L, 20180521L,
20180521L), Time = c("07:05:00", "07:50:00", "07:50:00", "08:25:00"
), Observer = c("VR", "VR", "JD", "JD"), FocalID = c("JK", "JK",
"CJD", "CJD")), class = "data.frame", row.names = c(NA, -4L))
How to get a map of repeated values in R?
in base R you could do something like:
t(do.call(cbind,tapply(df$position,df$names,function(x)if(length(x)>1)combn(x,2))))
[,1] [,2]
[1,] 1 5
[2,] 1 7
[3,] 5 7
[4,] 2 4
Unique case of finding duplicate values flexibly across columns in R
tidyverse
df <- data.frame(animal_1 = c("cat", "dog", "mouse", "squirrel"),
predation_type = c("eats", "eats", "eaten by", "eats"),
animal_2 = c("mouse", "squirrel", "cat", "nuts"))
library(tidyverse)
df %>%
rowwise() %>%
mutate(duplicates = str_c(sort(c_across(c(1, 3))), collapse = "")) %>%
group_by(duplicates) %>%
mutate(duplicates = n() > 1) %>%
ungroup()
#> # A tibble: 4 x 4
#> animal_1 predation_type animal_2 duplicates
#> <chr> <chr> <chr> <lgl>
#> 1 cat eats mouse TRUE
#> 2 dog eats squirrel FALSE
#> 3 mouse eaten by cat TRUE
#> 4 squirrel eats nuts FALSE
Created on 2022-01-17 by the reprex package (v2.0.1)
removing duplicates
library(tidyverse)
df %>%
filter(!duplicated(map2(animal_1, animal_2, ~str_c(sort((c(.x, .y))), collapse = ""))))
#> animal_1 predation_type animal_2
#> 1 cat eats mouse
#> 2 dog eats squirrel
#> 3 squirrel eats nuts
Created on 2022-01-17 by the reprex package (v2.0.1)
R - Finding duplicates in list entries
You can unlist
first:
unlisted <- unlist(examplelist)
unlisted[duplicated(unlisted)]
# b1 c1 c2
# "red" "black" "green"
unlisted[!duplicated(unlisted)]
# a1 a2 a3 b2 b3 c3
# "blue" "red" "yellow" "black" "green" "brown"
If you only want the vector (without the names), use unname
:
unlisted <- unname(unlist(examplelist))
If/Then statement in R based on duplicate values
library(tidyverse)
df %>%
rowid_to_column('rn') %>%
left_join(pivot_longer(.,-c(EntryName, rn)) %>%
group_by(rn, EntryName) %>%
count(value) %>%
filter(n>=5))
rn EntryName Team1 Team2 Team3 Team4 Team5 Team6 Team7 Team8 value n
1 1 a MIN SF SF SF ATL TOR SF SF SF 5
2 2 b MIN SF SF SF SF SF DET MIA SF 5
3 3 c MIN CWS SF MIA ATL MIA TOR SF <NA> NA
4 4 d SF SF SF SF TOR TOR SF MIN SF 5
5 5 e MIN TOR ATL ATL SF CIN DET TB <NA> NA
Look at the value
column. Second to last
How to find duplicate dates within a row in R, and then replace associated values with the mean?
This involved change the structure of the table into the long format, averaging the duplicates and then reformatting back into the desired table:
library(tidyr)
library(dplyr)
df.1 <- data.frame(ID,Gest1,Sys1,Dia1,Gest2,Sys2,Dia2,Gest3,Sys3, Dia3,Gest4,Sys4,Dia4)
#convert data to long format
longdf <- df.1 %>% pivot_longer(!ID, names_to = c(".value", "time"), names_pattern = "(\\D+)(\\d)", values_to="count")
#average duplicate rows
temp<-longdf %>% group_by(ID, Gest) %>% summarize(Sys=mean(Sys), Dia=mean(Dia)) %>% mutate(time = row_number())
#convert back to wide format
answer<-temp %>% pivot_wider(ID, names_from = time, values_from = c("Gest", "Sys", "Dia"), names_glue = "{.value}{time}")
#resort the columns
answer <-answer[ , names(df.1)]
answer
# A tibble: 3 × 13
# Groups: ID [3]
ID Gest1 Sys1 Dia1 Gest2 Sys2 Dia2 Gest3 Sys3 Dia3 Gest4 Sys4 Dia4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 27 27 120 90 29 122 89 32 123 90 33 124 94
2 46 28 126. 83.5 29 122 88 30 123 89 NA NA NA
3 72 29 124 92 30 119 84.5 32 128 80 NA NA NA
Related Topics
Subscript Out of Bounds - General Definition and Solution
How to Send an Email With Attachment from R in Windows
Check If the Number Is Integer
How to Make Consistent-Width Plots in Ggplot (With Legends)
How to Get a Vertical Geom_Vline to an X-Axis of Class Date
Change Bar Plot Colour in Geom_Bar With Ggplot2 in R
How to Subtract/Add Days From/To a Date
Why Do I Get "Warning Longer Object Length Is Not a Multiple of Shorter Object Length"
Format Numbers With Million (M) and Billion (B) Suffixes
Index Values from a Matrix Using Row, Col Indices
Count Nas Per Row in Dataframe
Rcpp Pass by Reference Vs. by Value
As.Date With Dates in Format M/D/Y in R
Get Specific Object from Rdata File
Subsetting R Data Frame Results in Mysterious Na Rows
Putting Mathematical Symbols and Subscripts Mixed With Regular Letters
Using Functions of Multiple Columns in a Dplyr Mutate_At Call