Frequency Count of Two Columns in R

Frequency count of two columns in R

If your data is a data frame df with columns y and m:

library(plyr)
# Count the rows in each (y, m) group; ddply names the count column V1
counts <- ddply(df, .(y, m), nrow)
names(counts) <- c("y", "m", "Freq")
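
For comparison, the same kind of two-column count can be sketched in base R with table(). The data frame below is hypothetical (it is not from the original post), and the y and m values are invented purely for illustration.

```r
# Hypothetical example data: y and m values are invented for illustration
df <- data.frame(
  y = c(2020, 2020, 2020, 2021, 2021),
  m = c(1, 1, 2, 1, 1)
)

# Cross-tabulate the two columns, then reshape the table to long form
counts <- as.data.frame(table(y = df$y, m = df$m), responseName = "Freq")

# table() also reports combinations that never occur; drop them if unwanted
counts <- counts[counts$Freq > 0, ]
counts
```

Unlike the ddply call, table() returns y and m as factors, so convert them back if you need the original types.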

Frequency count based on two columns

You could also use the dcast function from the reshape2 package to get the desired result:

library(reshape2)
dat.new <- dcast(dat, cell ~ sport, fun.aggregate = length)

This will result in the following data frame:

> dat.new
  cell football gym tennis
1   A1        1   0      1
2   A2        0   0      1
3   A3        0   1      0

An extended and optimized dcast function is also available in the data.table package.


A dplyr/tidyr alternative:

library(dplyr)
library(tidyr)

dat.new <- dat %>%
  group_by(cell, sport) %>%
  tally() %>%
  spread(sport, n, fill = 0)

giving you the same result.
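
If you would rather avoid extra packages, a base R cross-tabulation gives the same wide layout. This is only a sketch: the input dat below is an assumption, reconstructed so that it is consistent with the output shown above.

```r
# Hypothetical input, reconstructed to be consistent with the output shown above
dat <- data.frame(
  cell  = c("A1", "A1", "A2", "A3"),
  sport = c("football", "tennis", "tennis", "gym")
)

# Cross-tabulate: one row per cell, one column per sport
tab <- table(dat$cell, dat$sport)

# Convert the table to a plain wide data frame
dat.new <- data.frame(cell = rownames(tab), unclass(tab), row.names = NULL)
dat.new
```

The table object itself is often enough; the conversion to a data frame is only needed if later steps expect one.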

Frequency count based on two columns in R

In base R, use ave:

df$freq <- with(df, ave(d, cumsum(d == 0), FUN = length))
df

#    o   d freq
#1   a 0.0    1
#2   a 0.0    2
#3   a 1.0    2
#4   a 0.0    3
#5   a 0.3    3
#6   a 0.6    3
#7   a 0.0    5
#8   a 1.0    5
#9   a 2.0    5
#10  a 3.0    5
#11  a 4.0    5
#12  a 0.0    1
#13  b 0.0    2
#14  b 1.0    2
#15  b 0.0    1

With dplyr:

library(dplyr)
df %>% add_count(grp = cumsum(d == 0))
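
To see why cumsum(d == 0) works as a grouping key, it can help to store the run id in its own column. The sketch below rebuilds the example data from the output shown above; the grp column name is just an illustrative choice.

```r
# Example data reconstructed from the output shown above
df <- data.frame(
  o = rep(c("a", "b"), times = c(12, 3)),
  d = c(0, 0, 1, 0, 0.3, 0.6, 0, 1, 2, 3, 4, 0, 0, 1, 0)
)

# Each zero in d starts a new run, so the running count of zeros is a run id
df$grp <- cumsum(df$d == 0)

# Size of the run that each row belongs to
df$freq <- ave(df$d, df$grp, FUN = length)
df
```

Rows that share a grp value belong to the same run, and freq is simply the number of rows in that run.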

How do I get the sum of frequency count based on two columns?

We can use count:

library(dplyr)
df1 %>%
  filter(!is.na(Medal)) %>%
  count(Team)
# A tibble: 2 x 2
#   Team              n
#   <fct>         <int>
# 1 Australia         2
# 2 United States     2
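
A base R sketch of the same filter-then-count step follows. The df1 here is hypothetical: the original data is not shown, so the rows are invented so that each team ends up with two medals.

```r
# Hypothetical data (not from the original post): one row per result
df1 <- data.frame(
  Team  = c("Australia", "Australia", "Australia",
            "United States", "United States"),
  Medal = c("Gold", NA, "Silver", "Bronze", "Gold")
)

# Keep rows that have a medal, then count the remaining rows per team
medals <- df1[!is.na(df1$Medal), ]
team_counts <- as.data.frame(table(Team = medals$Team), responseName = "n")
team_counts
```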

Count frequency of same value in several columns

You can use unlist() and table() to get the overall counts. Wrapping the result in data.frame() gives you the desired two-column output.

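# stringsAsFactors = TRUE (the default before R 4.0.0) reproduces the row order shown below
clg <- data.frame(date = 1:3,
                  X1 = c("nor", "swe", "alg"),
                  X2 = c("swe", "alg", "jpn"),
                  stringsAsFactors = TRUE)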

data.frame(table(unlist(clg[c("X1", "X2")])))
#   Var1 Freq
# 1  alg    2
# 2  nor    1
# 3  swe    2
# 4  jpn    1

Compare and count the frequency of pairs of entries in two columns

With the tidyverse, you can arrive at this answer using the usual group_by operations.

Sample data

I'm adding column names to make it easier to convert the matrix to a tibble.

set.seed(123)
# Two independently sampled columns of 100 values each
M <- matrix(c(sample(0:5, 100, TRUE),
              sample(0:5, 100, TRUE)),
            ncol = 2,
            nrow = 100,
            dimnames = list(NULL, c("colA", "colB")))

Solution

library("tidyverse")

as_tibble(M) %>%
  arrange(colA, colB) %>%
  group_by(colA, colB) %>%
  summarise(num_pairs = n(), .groups = "drop") %>%
  pivot_wider(names_from = colB, values_from = num_pairs) %>%
  remove_rownames()

Preview (exact counts depend on the sampled data and on the R version)

# A tibble: 6 x 7
   colA   `0`   `1`   `2`   `4`   `5`   `3`
  <int> <int> <int> <int> <int> <int> <int>
1     0     4     4     4     2     4    NA
2     1     2     2     4     6     2    NA
3     2     6     4    NA     2     6    NA
4     3     2    NA    NA     4     6     2
5     4    NA     2     6    NA     2     4
6     5     6     2     4     4     2     2

Comments

You have asked:

I mean that I want to know how many pairs (0,0) , (0,1), ...,
(0,5),... (5,5) are in both columns?

This answer gives you that. The remaining question is how important it is for you to have the results stored as a matrix. If you need one, you can convert the result with as.matrix. I would probably stop after summarise(num_pairs = n(), .groups = "drop"), since the long format is easy to subset, join, and so forth.
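
If a matrix of pair counts is what you are after, a base R cross-tabulation is one possible shortcut. The sketch below assumes two independently sampled integer columns like the sample data; the names pair_counts and m are illustrative, not part of the answer above.

```r
set.seed(123)
# Two independently sampled columns (illustrative data)
colA <- sample(0:5, 100, TRUE)
colB <- sample(0:5, 100, TRUE)

# Fixing the factor levels guarantees a full 6 x 6 grid,
# with 0 (rather than NA) for pairs that never occur
pair_counts <- table(
  colA = factor(colA, levels = 0:5),
  colB = factor(colB, levels = 0:5)
)

# Number of (0, 5) pairs, looked up by level name
pair_counts["0", "5"]

# Drop the table class to get a plain integer matrix
m <- unclass(pair_counts)
```

Because the levels are fixed up front, the columns come out in order 0 to 5 regardless of which pair appears first in the data.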

Get frequency of values for multiple columns using dplyr?

I've assumed that the value .5. in column d, row 3 is a typo for 0.5, in which case you could do the following:


library(tidyr)
library(dplyr)

df %>%
  pivot_longer(everything()) %>%
  group_by(name, value) %>%
  summarise(count = n()) %>%
  arrange(name, desc(value))

# or more succinctly, as pointed out by @LMc
# (name = "count" keeps the count column name used in the output below)

df %>%
  pivot_longer(everything()) %>%
  count(name, value, name = "count") %>%
  arrange(name, desc(value))

#> # A tibble: 15 x 3
#>    name  value count
#>    <chr> <dbl> <int>
#>  1 a       1       2
#>  2 a       0.5     1
#>  3 a       0       2
#>  4 b       1       3
#>  5 b       0       2
#>  6 c       1       2
#>  7 c       0.5     1
#>  8 c       0       2
#>  9 d       1       2
#> 10 d       0.5     1
#> 11 d       0       1
#> 12 d      NA       1
#> 13 e       1       3
#> 14 e       0.5     1
#> 15 e       0       1

data

df <- structure(list(a = c(1, 0.5, 1, 0, 0), b = c(0, 1, 1, 0, 1), 
c = c(0, 1, 0.5, 1, 0), d = c(1, 0, 0.5, NA, 1),
e = c(1, 1, 0, 1, 0.5)), class = "data.frame", row.names = c(NA,
-5L))


Created on 2021-04-13 by the reprex package (v2.0.0)
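
As a package-free sketch of the same idea, base R can build one frequency table per column with lapply() and table(). This returns a list of tables rather than one long tibble, so it is an alternative layout and not a drop-in replacement; the name freqs is an illustrative choice.

```r
# Same data as above, written with data.frame() for brevity
df <- data.frame(
  a = c(1, 0.5, 1, 0, 0),
  b = c(0, 1, 1, 0, 1),
  c = c(0, 1, 0.5, 1, 0),
  d = c(1, 0, 0.5, NA, 1),
  e = c(1, 1, 0, 1, 0.5)
)

# One frequency table per column; useNA = "ifany" keeps NA as its own category
freqs <- lapply(df, table, useNA = "ifany")

# Counts for column d, including the NA entry
freqs$d
```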


