Count unique values of a column by pairwise combinations of another column in R
Here is a data.table way to solve the problem. Use the combn function to generate all pairwise combinations of Code within each ID, and then count the IDs for each unique CodeComb:
library(data.table)
setDT(df)[, .(CodeComb = sapply(combn(Code, 2, simplify = FALSE),
                                function(cmb) paste(sort(cmb), collapse = ", "))),
          by = ID][                       # list all combinations of Code for each ID
  , .(IdCount = .N), by = CodeComb]       # count the IDs for each code combination
# CodeComb IdCount
# 1: A, B 2
# 2: A, C 2
# 3: B, C 3
# 4: B, D 3
# 5: C, D 2
# 6: A, D 1
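The .N count above equals the number of unique IDs only if no ID repeats a Code, and for a character Code column combn() stops with an error ("n < m") when an ID has a single Code. A sketch under those assumptions, using hypothetical toy data, deduplicates Codes within each ID, skips IDs with fewer than two Codes, and counts distinct IDs with uniqueN():

```r
library(data.table)

# Hypothetical toy data: ID 2 repeats a Code, ID 3 has only one Code
df <- data.table(ID   = c(1, 1, 1, 2, 2, 2, 3),
                 Code = c("A", "B", "C", "B", "C", "C", "D"))

res <- df[, .(Code = unique(Code)), by = ID][          # drop repeated Codes within an ID
  , if (.N >= 2)                                         # returning NULL drops IDs with < 2 Codes
      list(CodeComb = combn(sort(Code), 2, paste, collapse = ", ")),
  by = ID][
  , .(IdCount = uniqueN(ID)), by = CodeComb]             # distinct IDs per combination
res
```

Here ID 3 is skipped instead of raising an error, and ID 2 contributes "B, C" only once.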
Count unique values of a column by pairwise combinations of two other columns in R
library(data.table)
dt[, .(Count = uniqueN(Analys)), by = .(CUSIP, Fdate)]
# CUSIP Fdate Count
# 1: 1 2000-12-31 2
# 2: 1 2001-12-31 2
# 3: 2 2000-12-31 3
# 4: 2 2001-12-31 2
The example you linked in the question was overly complicated because it used pairwise combinations of a single column: it had to match up a column with itself in every possible way. You want the number of unique observations per group, and your group happens to be defined by two columns. That is a much simpler problem.
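To make the difference between counting rows and counting unique observations concrete, here is a sketch on hypothetical toy data (the values are invented, not the asker's): .N counts the rows in each CUSIP/Fdate group, while uniqueN(Analys) counts the distinct analysts.

```r
library(data.table)

# Hypothetical toy data: analyst IDs (Analys) per security (CUSIP) and date (Fdate)
dt <- data.table(CUSIP  = c(1, 1, 1, 2, 2, 2),
                 Fdate  = as.Date(c("2000-12-31", "2000-12-31", "2000-12-31",
                                    "2000-12-31", "2000-12-31", "2001-12-31")),
                 Analys = c("a", "a", "b", "a", "b", "c"))

# Rows = rows per group; Count = distinct analysts per group
res <- dt[, .(Rows = .N, Count = uniqueN(Analys)), by = .(CUSIP, Fdate)]
res
```

In the first group analyst "a" appears twice, so Rows is 3 but Count is 2.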
Count unique combinations in and summarize other columns in new one
We could return the values as a list column:
library(data.table)
dt[, .(N = .N, new_col = .(d)), by = .(a, b, c)]
a b c N new_col
<char> <char> <char> <int> <list>
1: 1a 1b 1c 2 n1,n2
2: 2a 2b 2c 4 n1,n2,n3,n4
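A self-contained version on hypothetical data shaped like the output above. list(d), for which .(d) is an alias inside j, keeps each group's values as a list column; if a single string per group is preferred, toString(d) is one option:

```r
library(data.table)

# Hypothetical toy data shaped like the output above
dt <- data.table(a = c("1a", "1a", "2a", "2a", "2a", "2a"),
                 b = c("1b", "1b", "2b", "2b", "2b", "2b"),
                 c = c("1c", "1c", "2c", "2c", "2c", "2c"),
                 d = c("n1", "n2", "n1", "n2", "n3", "n4"))

# list(d) wraps each group's d values, producing a list column
res_list <- dt[, .(N = .N, new_col = list(d)), by = .(a, b, c)]

# toString(d) collapses each group's d values into one "n1, n2" style string
res_str <- dt[, .(N = .N, new_col = toString(d)), by = .(a, b, c)]
```

The list column keeps the individual values accessible (res_list$new_col[[2]] is a character vector), while the string version is easier to export.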
count the frequency of all pairwise combinations by group
I did this by first counting the shared groups for each pair of items with a self-join, then using expand.grid to build every combination of items, joining the counts onto it, and filling in the unmatched rows with zero. I also renamed your count column to n.
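library(dplyr)

# For each ordered pair of items, count the distinct groups they share
# (dplyr >= 1.1.0 may warn that this self-join is many-to-many; that is expected)
have2 = have %>%
  full_join(have, by = "group") %>%
  group_by(item.x, item.y) %>%
  summarise(n = n_distinct(group), .groups = "drop") %>%
  filter(item.x != item.y) %>%
  mutate(item = paste(item.x, item.y, sep = ", "))

# Every unordered pair of items once; pairs never seen together get n = 0
combos = expand.grid(item.x = unique(have$item),
                     item.y = unique(have$item)) %>%
  filter(as.numeric(item.x) < as.numeric(item.y)) %>%   # expand.grid makes factors; compare level positions
  mutate(item = paste(item.x, item.y, sep = ", ")) %>%
  arrange(item.x, item.y) %>%
  left_join(have2, by = c("item.x", "item.y", "item")) %>%
  mutate(n = replace(n, is.na(n), 0))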
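As an alternative (a different technique from the join above, not the original answer), pair counts can be computed in base R from an item-by-group incidence matrix: tcrossprod() multiplies it by its transpose, so entry [i, j] is the number of groups containing both items, and pairs that never co-occur come out as 0 without a separate fill step. The have data below is hypothetical, assuming columns named group and item:

```r
# Hypothetical toy data: which items occur in which groups
have <- data.frame(group = c(1, 1, 2, 2, 2, 3, 3),
                   item  = c("a", "b", "a", "b", "c", "c", "d"))

# 1/0 incidence matrix: rows = items, columns = groups
inc <- 1 * (unclass(table(have$item, have$group)) > 0)

# co[i, j] = number of groups that contain both item i and item j
co <- tcrossprod(inc)

# Keep each unordered pair once (upper triangle, diagonal excluded)
pairs <- which(upper.tri(co), arr.ind = TRUE)
items <- rownames(inc)
res <- data.frame(item = paste(items[pairs[, 1]], items[pairs[, 2]], sep = ", "),
                  n    = co[pairs])
res
```

The incidence step uses > 0, so an item listed twice in the same group still counts that group once, matching length(unique(group)) in the join version.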
R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs
This may also be done with pmin/pmax to create a grouping column that is identical for A-B and B-A:
library(dplyr)
library(stringr)
df1 %>%
  # sep avoids collisions such as "AB" + "C" vs "A" + "BC" for multi-character IDs
  group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2), sep = "_")) %>%
  mutate(Sum = sum(Count)) %>%
  ungroup() %>%
  select(-grp)
Output:
# A tibble: 6 × 5
Date ID1 ID2 Count Sum
<chr> <chr> <chr> <int> <int>
1 12-1 A B 1 2
2 12-1 B A 1 2
3 12-1 D E 1 3
4 12-1 E D 2 3
5 12-2 Y Z 2 5
6 12-2 Z Y 3 5
Data:
df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-2",
"12-2"), ID1 = c("A", "B", "D", "E", "Y", "Z"), ID2 = c("B",
"A", "E", "D", "Z", "Y"), Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
class = "data.frame", row.names = c(NA,
-6L))
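The same pmin/pmax idea also works in base R with ave(), which applies FUN within the groups defined by its extra arguments and returns a vector aligned with the original rows. A sketch, recreating df1 from the data above:

```r
# df1 recreated from the data above
df1 <- data.frame(Date  = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
                  ID1   = c("A", "B", "D", "E", "Y", "Z"),
                  ID2   = c("B", "A", "E", "D", "Z", "Y"),
                  Count = c(1L, 1L, 1L, 2L, 2L, 3L))

# Group by Date plus the order-independent pair (smaller ID, larger ID)
df1$Sum <- ave(df1$Count, df1$Date,
               pmin(df1$ID1, df1$ID2), pmax(df1$ID1, df1$ID2),
               FUN = sum)
df1
```

Passing pmin and pmax as separate grouping arguments avoids building a pasted key at all.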
How to count frequency of unique pair combinations from a column of comma-separated values?
You can try this:
Input:
df <- read.table(text = "combo startts endts
A,B 02:20 02:23
A,B,D 02:23 02:25
A,C 02:27 02:28", header = TRUE, stringsAsFactors = FALSE)
Solution:
# user defined functions
pastecollapse <- function(...) paste(..., collapse = "")
sortedcomb2collapse <- function(x) combn(sort(x), m = 2, FUN = pastecollapse)
# get combos
combos <- strsplit(df$combo, split = ",")
# all possible combos
allcombos <- sortedcomb2collapse(unique(unlist(combos)))
# existing combos
mycombos <- unlist(lapply(combos, sortedcomb2collapse))
# count combos (show missing combos)
as.data.frame(table(combo = factor(mycombos, levels = allcombos)), responseName = "count")
#> combo count
#> 1 AB 2
#> 2 AC 1
#> 3 AD 1
#> 4 BC 0
#> 5 BD 1
#> 6 CD 0
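The sort() inside sortedcomb2collapse matters: without it, the same pair can be spelled in two orders, and spellings that are not among the factor levels are silently dropped by table(). A small illustration, redefining the two helpers from above so it runs on its own:

```r
pastecollapse <- function(...) paste(..., collapse = "")
sortedcomb2collapse <- function(x) combn(sort(x), m = 2, FUN = pastecollapse)

x <- c("D", "A", "B")
unsorted <- combn(x, m = 2, FUN = pastecollapse)  # "DA" "DB" "AB"
sorted   <- sortedcomb2collapse(x)                # "AB" "AD" "BD"

# "DA" and "DB" are not levels, so they become NA and table() drops them
tab <- table(factor(unsorted, levels = sorted))
tab
```

Only "AB" is counted here, even though all three pairs occurred.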
Similarly, with the tidyverse (this reuses sortedcomb2collapse defined above):
library(tidyr)
library(dplyr)
df_sep <- df %>% separate_rows(combo)
allcombos <- df_sep %>% pull(combo) %>% unique %>% sortedcomb2collapse
df_sep %>%
group_by(startts, endts) %>%
summarise(combo = sortedcomb2collapse(combo), .groups = "drop") %>%  # one row per pair; dplyr >= 1.1.0 prefers reframe() for multi-row results
mutate(combo = factor(combo, levels = allcombos)) %>%
count(combo, name = "count", .drop = FALSE)
#> # A tibble: 6 x 2
#> combo count
#> <fct> <int>
#> 1 AB 2
#> 2 AC 1
#> 3 AD 1
#> 4 BC 0
#> 5 BD 1
#> 6 CD 0
Note: one possible combination (CD) was missing from your expected output. Was that a mistake?
count unique combinations of values
count in the plyr package will do that task:
> df
ID value.1 value.2 value.3 value.4
1 1 M D F A
2 2 F M G B
3 3 M D F A
4 4 L D E B
> library(plyr)
> count(df[, -1])
value.1 value.2 value.3 value.4 freq
1 F M G B 1
2 L D E B 1
3 M D F A 2
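plyr is retired, so if dplyr is preferred, dplyr::count() gives the same counts in a column named n instead of freq. A sketch recreating the df shown above; calling it as dplyr::count avoids a name clash if plyr is also loaded:

```r
library(dplyr)

# df recreated from the console output above
df <- data.frame(ID      = 1:4,
                 value.1 = c("M", "F", "M", "L"),
                 value.2 = c("D", "M", "D", "D"),
                 value.3 = c("F", "G", "F", "E"),
                 value.4 = c("A", "B", "A", "B"))

# One row per unique combination of the value columns, with its frequency in n
res <- dplyr::count(df, value.1, value.2, value.3, value.4)
res
```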