Count Unique Values of a Column by Pairwise Combinations of Another Column in R

Here is a data.table solution. Use combn() to list every pair of Code values within each ID, then count how many IDs contain each CodeComb:

library(data.table)

# Step 1: list every pair of codes within each ID (sorted, so "A, B" and "B, A" match).
# Assumes each Code appears once per ID and every ID has at least two codes.
pairs <- setDT(df)[, .(CodeComb = sapply(combn(Code, 2, simplify = FALSE),
                                         function(cmb) paste(sort(cmb), collapse = ", "))),
                   by = .(ID)]

# Step 2: count the number of IDs for each code combination
pairs[, .(IdCount = .N), by = .(CodeComb)]

#    CodeComb IdCount
# 1:     A, B       2
# 2:     A, C       2
# 3:     B, C       3
# 4:     B, D       3
# 5:     C, D       2
# 6:     A, D       1
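
Since the question's df is not shown here, the following is a minimal reproducible sketch of the same two steps. The data are hypothetical and made up for illustration, so the counts differ from the output above. It also de-duplicates Code within each ID before calling combn(), which is an extra safeguard and not part of the original answer.

```r
library(data.table)

# Hypothetical data: each row links one ID to one Code
df <- data.table(
  ID   = c(1, 1, 1, 2, 2, 3, 3, 3),
  Code = c("A", "B", "C", "B", "C", "A", "C", "D")
)

# Step 1: every sorted pair of distinct codes within each ID
# (combn() passes collapse = ", " on to paste())
pairs <- df[, .(CodeComb = combn(sort(unique(Code)), 2, FUN = paste, collapse = ", ")),
            by = ID]

# Step 2: number of IDs that contain each pair
result <- pairs[, .(IdCount = .N), by = CodeComb][order(CodeComb)]
print(result)
```

Note that combn() needs at least two codes per ID, so IDs with a single code would have to be filtered out first.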

Count unique values of a column by pairwise combinations of two other columns in R

dt[, .(Count = uniqueN(Analys)), by = .(CUSIP, Fdate)]
#    CUSIP      Fdate Count
# 1:     1 2000-12-31     2
# 2:     1 2001-12-31     2
# 3:     2 2000-12-31     3
# 4:     2 2001-12-31     2

The example you linked in the question was overly complicated because it used pairwise combinations of a single column: it had to match up a column with itself in every possible way. You want the number of unique observations per group, and your group happens to be defined by two columns. That is a much simpler problem.
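
To make the difference concrete, here is a small sketch with made-up data (the values are hypothetical; only the column names CUSIP, Fdate and Analys come from the question). It contrasts the plain row count per group with the number of distinct analysts per group.

```r
library(data.table)

# Hypothetical data: one row per analyst report
dt <- data.table(
  CUSIP  = c(1, 1, 1, 2, 2, 2),
  Fdate  = as.Date(c("2000-12-31", "2000-12-31", "2000-12-31",
                     "2000-12-31", "2000-12-31", "2001-12-31")),
  Analys = c("x", "x", "y", "x", "y", "z")
)

# Rows = number of reports, Count = number of distinct analysts, per (CUSIP, Fdate)
counts <- dt[, .(Rows = .N, Count = uniqueN(Analys)), by = .(CUSIP, Fdate)]
print(counts)
```

In the first group analyst "x" appears twice, so Rows is 3 while Count is 2.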

Count unique combinations in and summarize other columns in new one

We can return the values of d as a list column, alongside the row count for each combination of a, b and c:

library(data.table)
dt[, .(N = .N, new_col = .(d)), by = .(a, b, c)]
#         a      b      c     N     new_col
#    <char> <char> <char> <int>      <list>
# 1:     1a     1b     1c     2       n1,n2
# 2:     2a     2b     2c     4 n1,n2,n3,n4

Count the frequency of all pairwise combinations by group

I joined the data to itself by group to count the groups in which each pair of items occurs together, used expand.grid to build every possible combination, joined the counts onto that grid, and filled the unmatched rows with zero. I also renamed your count to n.

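library(dplyr)

# Count the groups in which each ordered pair of distinct items occurs together
have2 = have %>%
  full_join(have, by = "group") %>%
  group_by(item.x, item.y) %>%
  summarise(n = n_distinct(group), .groups = "drop") %>%
  filter(item.x != item.y) %>%
  mutate(item = paste(item.x, item.y, sep = ", "))

# Build every unordered pair, attach the counts, and fill missing pairs with zero
# (for character items, expand.grid creates factors, so as.numeric() compares level codes)
combos = expand.grid(item.x = unique(have$item),
                     item.y = unique(have$item)) %>%
  filter(as.numeric(item.x) < as.numeric(item.y)) %>%
  mutate(item = paste(item.x, item.y, sep = ", ")) %>%
  arrange(item.x, item.y) %>%
  left_join(have2, by = c("item.x", "item.y", "item")) %>%
  mutate(n = replace(n, is.na(n), 0))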
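
For comparison, the same counts (including the zero rows) can be sketched in base R with an incidence matrix and crossprod(). This is a different technique from the join above, not the original answer's method, and the have data frame below is hypothetical.

```r
# Hypothetical data: which items occur in which group
have <- data.frame(
  group = c(1, 1, 1, 2, 2, 3, 3),
  item  = c("a", "b", "c", "a", "b", "c", "d")
)

# Incidence matrix: 1 if the item occurs in the group, 0 otherwise
inc <- 1 * (unclass(table(have$group, have$item)) > 0)

# crossprod(inc) is t(inc) %*% inc; cell [i, j] counts groups containing both items
co <- crossprod(inc)

# Keep each unordered pair once (upper triangle), including pairs with a zero count
idx <- which(upper.tri(co), arr.ind = TRUE)
combos <- data.frame(
  item.x = rownames(co)[idx[, 1]],
  item.y = colnames(co)[idx[, 2]],
  n      = co[idx]
)
combos$item <- paste(combos$item.x, combos$item.y, sep = ", ")
print(combos)
```

Pairs that never share a group, such as "a, d" here, come out with n equal to 0 without a separate fill step.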

R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs

This can also be done by using pmin/pmax to build an order-independent grouping column, so that A-B and B-A fall into the same group:

library(dplyr)
library(stringr)
df1 %>%
  group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>%
  mutate(Sum = sum(Count)) %>%
  ungroup() %>%
  select(-grp)

Output:

# A tibble: 6 × 5
#   Date  ID1   ID2   Count   Sum
#   <chr> <chr> <chr> <int> <int>
# 1 12-1  A     B         1     2
# 2 12-1  B     A         1     2
# 3 12-1  D     E         1     3
# 4 12-1  E     D         2     3
# 5 12-2  Y     Z         2     5
# 6 12-2  Z     Y         3     5

Data:

df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-2", 
"12-2"), ID1 = c("A", "B", "D", "E", "Y", "Z"), ID2 = c("B",
"A", "E", "D", "Z", "Y"), Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
class = "data.frame", row.names = c(NA,
-6L))
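
If you would rather not load dplyr and stringr, the same idea can be sketched in base R. This is an assumed equivalent rather than part of the original answer: it builds the key with paste() and a separator (the "|" is an arbitrary choice) and uses ave() to repeat the group sum on every row.

```r
# Same data as above, rebuilt so the snippet runs on its own
df1 <- data.frame(
  Date  = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
  ID1   = c("A", "B", "D", "E", "Y", "Z"),
  ID2   = c("B", "A", "E", "D", "Z", "Y"),
  Count = c(1L, 1L, 1L, 2L, 2L, 3L)
)

# pmin/pmax pick the alphabetically smaller/larger ID row by row,
# so A-B and B-A both get the key "A|B"
df1$key <- paste(pmin(df1$ID1, df1$ID2), pmax(df1$ID1, df1$ID2), sep = "|")

# ave() returns the group sum repeated for every row, so both rows of a pair are kept
df1$Sum <- ave(df1$Count, df1$Date, df1$key, FUN = sum)
print(df1)
```

Using a separator in the key avoids accidental collisions when IDs have more than one character, for example "A" with "BC" versus "AB" with "C".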

How to count frequency of unique pair combinations from a column of comma-separated values?

You can try the following.

Input:

df <- read.table(text = "combo startts endts
A,B 02:20 02:23
A,B,D 02:23 02:25
A,C 02:27 02:28", header = TRUE, stringsAsFactors = FALSE)  # combo must be character for strsplit()

Solution:

# user defined functions
pastecollapse <- function(...) paste(..., collapse = "")
sortedcomb2collapse <- function(x) combn(sort(x), m = 2, FUN = pastecollapse)

# get combos
combos <- strsplit(df$combo, split = ",")

# all possible combos
allcombos <- sortedcomb2collapse(unique(unlist(combos)))

# existing combos
mycombos <- unlist(lapply(combos, sortedcomb2collapse))

# count combos (show missing combos)
as.data.frame(table(combo = factor(mycombos, levels = allcombos)), responseName = "count")

#>   combo count
#> 1    AB     2
#> 2    AC     1
#> 3    AD     1
#> 4    BC     0
#> 5    BD     1
#> 6    CD     0

Similarly, with tidyverse:

library(tidyr)
library(dplyr)

df_sep <- df %>% separate_rows(combo)
allcombos <- df_sep %>% pull(combo) %>% unique %>% sortedcomb2collapse

# summarise() returns several rows per group here; with dplyr >= 1.1.0,
# reframe() is the recommended verb for that case
df_sep %>%
  group_by(startts, endts) %>%
  summarise(combo = sortedcomb2collapse(combo), .groups = "drop") %>%
  mutate(combo = factor(combo, levels = allcombos)) %>%
  count(combo, name = "count", .drop = FALSE)
#> # A tibble: 6 x 2
#>   combo count
#>   <fct> <int>
#> 1 AB        2
#> 2 AC        1
#> 3 AD        1
#> 4 BC        0
#> 5 BD        1
#> 6 CD        0

Note: in your expected output one possible combination was missing (CD). Was it a mistake?

Count unique combinations of values

The count() function in the plyr package does this.

> df
  ID value.1 value.2 value.3 value.4
1  1       M       D       F       A
2  2       F       M       G       B
3  3       M       D       F       A
4  4       L       D       E       B
> library(plyr)
> count(df[, -1])
  value.1 value.2 value.3 value.4 freq
1       F       M       G       B    1
2       L       D       E       B    1
3       M       D       F       A    2

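
If you prefer not to load plyr, a rough base R equivalent can be sketched with aggregate(). This is an alternative I am adding for illustration, not part of the original answer: it attaches a helper column of ones (named freq here to mirror plyr's output) and sums it within each unique combination.

```r
# Same data as shown above, rebuilt so the snippet runs on its own
df <- data.frame(
  ID      = 1:4,
  value.1 = c("M", "F", "M", "L"),
  value.2 = c("D", "M", "D", "D"),
  value.3 = c("F", "G", "F", "E"),
  value.4 = c("A", "B", "A", "B")
)

# Drop ID, add a helper column of ones, and sum it per unique combination
combo_counts <- aggregate(freq ~ ., data = transform(df[-1], freq = 1L), FUN = sum)
print(combo_counts)
```

The row order of the result may differ from plyr's, but each unique combination appears once with its frequency.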
