Find how many times duplicated rows repeat in R data frame
Here is a solution using the function ddply() from the plyr package:
library(plyr)
ddply(df,.(a,b),nrow)
a b V1
1 1 2.5 1
2 1 3.5 2
3 2 2.0 2
4 3 1.0 1
5 4 2.2 1
6 4 7.0 1
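For readers who do not have plyr installed, the same per-row count can be sketched in base R. This swaps in aggregate() for ddply(); it is not the answer's method. The data frame below is an assumption, reconstructed so that its counts match the output printed above, and the column name n is chosen here for illustration.

```r
# Hypothetical input, reconstructed to match the counts printed above.
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 3, 4, 4),
  b = c(2.5, 3.5, 3.5, 2.0, 2.0, 1.0, 2.2, 7.0)
)

# Add a helper column of ones, then sum it within each (a, b) combination.
counts <- aggregate(n ~ a + b, data = cbind(df, n = 1), FUN = sum)

# aggregate() does not guarantee the order shown above, so sort explicitly.
counts <- counts[order(counts$a, counts$b), ]
rownames(counts) <- NULL
print(counts)
```

Each row of counts is one distinct (a, b) pair, and n plays the role of the V1 column in the ddply() output.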
Counting Number of Times Each Row is Duplicated in R
With dplyr, we could group by all columns:
dat %>%
group_by(across(everything())) %>%
mutate(n = n())
# # A tibble: 5 x 5
# # Groups: SSN, Name, Age, Gender [3]
# SSN Name Age Gender n
# <dbl> <chr> <dbl> <dbl> <int>
# 1 204 Blossum 7 0 2
# 2 401 Buttercup 8 0 2
# 3 204 Blossum 7 0 2
# 4 666 MojoJojo 43 1 1
# 5 401 Buttercup 8 0 2
(mutate(n = n()) has a shortcut, add_tally(), if you prefer. Use summarize(n = n()) or count() if you want to collapse the data frame to the unique rows while adding counts.)
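The difference between keeping every row and collapsing to unique rows can be sketched as follows. The data frame dat is an assumption rebuilt from the printed tibble above, and the result names with_counts and unique_counts are hypothetical. The sketch assumes dplyr 1.0.0 or later, which provides across() and the .groups argument.

```r
library(dplyr)

# Hypothetical input, rebuilt from the tibble printed above.
dat <- data.frame(
  SSN    = c(204, 401, 204, 666, 401),
  Name   = c("Blossum", "Buttercup", "Blossum", "MojoJojo", "Buttercup"),
  Age    = c(7, 8, 7, 43, 8),
  Gender = c(0, 0, 0, 1, 0)
)

# Keep all five rows and attach the size of each row's group.
with_counts <- dat %>%
  group_by(across(everything())) %>%
  add_tally() %>%
  ungroup()

# Collapse to one row per distinct combination, with its count.
unique_counts <- dat %>%
  group_by(across(everything())) %>%
  summarise(n = n(), .groups = "drop")
```

with_counts has the same number of rows as dat, while unique_counts has one row per distinct record.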
getting a count of how many times a value in a column is duplicated
If we need to create a count column, use add_count
df %>%
add_count(name, name = "new_count")
Output
address name other count new_count
1 123 fake st joey 1 2 2
2 124 fake st rachel 1 1 1
3 125 fake st ross 1 3 3
4 126 fake st chandler 2 1 1
5 123 jerry st monika 2 1 1
6 124 road rd joey 3 2 2
7 125 tiny rd ross 4 3 3
8 126 cool r ross 4 3 3
group_size returns only the size of each group:
group_size(group_by(df,name))
[1] 1 2 1 1 3
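To make the contrast concrete, here is a small sketch that runs both calls on the same input. The data frame is an assumption: only the name column is rebuilt from the output above, and with_count and sizes are hypothetical result names.

```r
library(dplyr)

# Hypothetical input: just the name column from the output above.
df <- data.frame(
  name = c("joey", "rachel", "ross", "chandler",
           "monika", "joey", "ross", "ross")
)

# add_count() keeps every row, in the original order, and adds a column.
with_count <- df %>%
  add_count(name, name = "new_count")

# group_size() returns one integer per group, ordered by the group key
# (chandler, joey, monika, rachel, ross), not by row.
sizes <- group_size(group_by(df, name))
```

Because group_size() drops the group labels, add_count() or count() is usually easier to read when you need to know which value each count belongs to.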
Count the number of duplicates for a column
If we need to count the total number of duplicates:
sum(table(df1$date)-1)
#[1] 5
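A minimal sketch of why subtracting 1 from each table entry counts duplicates: every value's first occurrence is discounted, so only the extra copies are summed. The dates below are made up for illustration and are not the original df1.

```r
# Hypothetical data: three distinct dates, six rows in total.
df1 <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-02",
           "2015-01-03", "2015-01-03", "2015-01-03"),
  stringsAsFactors = FALSE
)

per_date <- table(df1$date)      # occurrences of each date: 2, 1, 3
extra    <- sum(per_date - 1)    # extra copies: 1 + 0 + 2

# duplicated() flags every occurrence after the first, so this agrees.
also_extra <- sum(duplicated(df1$date))
```

Both extra and also_extra count rows beyond the first occurrence of each date.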
Suppose we need the count of each date instead. One option is to group by 'date' and get the number of rows in each group. This can be done with data.table.
library(data.table)
setDT(df1)[, .N, date]
Finding duplicates in a dataframe and returning count of each duplicate record
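We can use group_by_all to group by all columns, count the rows in each group, and then keep only the groups with a count greater than 1, which are the duplicated records. (In dplyr 1.0.0 and later, group_by_all() is superseded by group_by(across(everything())).)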
library(dplyr)
df %>%
group_by_all() %>%
count() %>%
filter(n > 1)
# col1 col2 col3 n
# <fct> <fct> <fct> <int>
#1 A B B 2
#2 A B C 3
Find duplicate values in R
You could use table, i.e.
n_occur <- data.frame(table(vocabulary$id))
gives you a data frame with a list of ids and the number of times they occurred.
n_occur[n_occur$Freq > 1,]
tells you which ids occurred more than once.
vocabulary[vocabulary$id %in% n_occur$Var1[n_occur$Freq > 1],]
returns the records with more than one occurrence.
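The three steps above can be sketched end to end on a toy data frame. The contents of vocabulary here are invented for illustration, and the duplicated() cross-check at the end is an addition that is not part of the original answer.

```r
# Hypothetical data: ids 2 and 4 occur more than once.
vocabulary <- data.frame(
  id   = c(1, 2, 2, 3, 4, 4, 4),
  word = c("a", "b", "c", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)

# Step 1: frequency of each id (columns are named Var1 and Freq).
n_occur <- data.frame(table(vocabulary$id))

# Step 2: ids that occur more than once.
repeated <- n_occur[n_occur$Freq > 1, ]

# Step 3: every record whose id is repeated.
dups <- vocabulary[vocabulary$id %in% n_occur$Var1[n_occur$Freq > 1], ]

# Cross-check without table(): flag duplicates scanning from both ends.
dups2 <- vocabulary[duplicated(vocabulary$id) |
                      duplicated(vocabulary$id, fromLast = TRUE), ]
```

Note that Var1 is a factor; %in% compares it with the numeric id column by value, so the lookup still works.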
R: Repeating row of dataframe with respect to multiple count columns
Here is a tidyverse option. We can use uncount from tidyr to duplicate the rows according to the count in value (i.e., from the var columns) after pivoting to long format.
library(tidyverse)
df %>%
pivot_longer(starts_with("var"), names_to = "class") %>%
filter(value != 0) %>%
uncount(value) %>%
mutate(class = str_extract(class, "\\d+"))
Output
f1 f2 class
<chr> <chr> <chr>
1 a c 1
2 a c 3
3 a c 3
4 a c 3
5 b d 1
6 b d 2
7 b d 2
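The answer does not show its input, so here is a self-contained sketch with an assumed one. The tibble below, with columns var1 to var3, is a guess chosen so that the pipeline reproduces the output above; the name out is hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical input: one count column per class.
df <- tibble(
  f1   = c("a", "b"),
  f2   = c("c", "d"),
  var1 = c(1, 1),
  var2 = c(0, 2),
  var3 = c(3, 0)
)

out <- df %>%
  # One row per (f1, f2, class) with the count in `value`.
  pivot_longer(starts_with("var"), names_to = "class") %>%
  # Classes with a zero count contribute no rows.
  filter(value != 0) %>%
  # Repeat each row `value` times; uncount() drops `value` afterwards.
  uncount(value) %>%
  # Keep only the digits of the former column name.
  mutate(class = str_extract(class, "\\d+"))
```

With this input, row a/c expands to one row of class 1 and three of class 3, and row b/d to one row of class 1 and two of class 2.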
Another slight variation is to use expandRows from splitstackshape in conjunction with tidyverse.
library(splitstackshape)
df %>%
pivot_longer(starts_with("var"), names_to = "class") %>%
filter(value != 0) %>%
expandRows("value") %>%
mutate(class = str_extract(class, "\\d+"))
Repeat rows of a data.frame
df <- data.frame(a = 1:2, b = letters[1:2])
df[rep(seq_len(nrow(df)), each = 2), ]
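As a brief illustration of how the index vector controls the result, the sketch below contrasts each with times in rep(). The names idx, by_each, by_times and by_count are hypothetical.

```r
df <- data.frame(a = 1:2, b = letters[1:2])
idx <- seq_len(nrow(df))

# each = 2 repeats every row in place: rows 1, 1, 2, 2.
by_each <- df[rep(idx, each = 2), ]

# times = 2 repeats the whole data frame: rows 1, 2, 1, 2.
by_times <- df[rep(idx, times = 2), ]

# A vector for times gives a separate repeat count per row: rows 1, 1, 1, 2.
by_count <- df[rep(idx, times = c(3, 1)), ]

# Repeated rows get row names such as "1.1"; reset them if that matters.
rownames(by_each) <- NULL
```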
Repeat rows so that each repeated row follows its original row, and assign a new variable for each row
You can repeat each row twice and repeat c('origin', 'destination') for each row.
In base R, you can do:
transform(df[rep(seq(nrow(df)), each = 2), ], type = c('origin', 'destination'))
Or in tidyverse:
library(dplyr)
library(tidyr)
df %>%
uncount(2) %>%
mutate(type = rep(c('origin', 'destination'), length.out = n()))
# a b type
#1 1 1 origin
#2 1 1 destination
#3 2 2 origin
#4 2 2 destination
#5 3 3 origin
#6 3 3 destination