Frequency count of two columns in R
If your data is a data frame df with columns y and m:
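library(plyr)
# Count the rows in each (y, m) group; .() takes bare column names
counts <- ddply(df, .(y, m), nrow)
# ddply() names the count column V1, so rename all three columns
names(counts) <- c("y", "m", "Freq")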
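For comparison, here is a minimal base R sketch of the same count. It assumes a small made-up data frame: only the column names y and m come from the answer above, and the values are hypothetical. Note that table() also lists combinations that never occur, with a frequency of 0, whereas ddply() returns only the observed ones.

```r
# Hypothetical example data; only the column names y and m come from the answer
df <- data.frame(y = c(2020, 2020, 2020, 2021, 2021),
                 m = c(1, 1, 2, 1, 1))

# Cross-tabulate y against m, then flatten to one row per (y, m) combination
counts_base <- as.data.frame(table(y = df$y, m = df$m))

# Drop combinations that never occur, to mirror the ddply() result
counts_base <- counts_base[counts_base$Freq > 0, ]
counts_base
```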
Frequency count based on two columns
You could also use the dcast function from the reshape2 package to get the desired result:
library(reshape2)
dat.new <- dcast(dat, cell ~ sport, fun.aggregate = length)
This will result in the following data frame:
> dat.new
cell football gym tennis
1 A1 1 0 1
2 A2 0 0 1
3 A3 0 1 0
An extended and optimized dcast function is also available in the data.table package.
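A sketch of the data.table variant, assuming a small dat reconstructed by hand to match the output shown above (the original data is not given, so these rows are hypothetical). value.var is set explicitly here so that dcast does not have to guess which column to aggregate.

```r
library(data.table)

# Hypothetical data, chosen to reproduce the table shown above
dat <- data.frame(cell  = c("A1", "A1", "A2", "A3"),
                  sport = c("football", "tennis", "tennis", "gym"))

# Count the rows for each cell/sport combination, one column per sport
dat.dt <- dcast(as.data.table(dat), cell ~ sport,
                fun.aggregate = length, value.var = "sport")
dat.dt
```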
A dplyr/tidyr alternative:
library(dplyr)
library(tidyr)
dat.new <- dat %>%
  group_by(cell, sport) %>%
  tally() %>%
  spread(sport, n, fill = 0)
giving you the same result.
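In current tidyr releases spread() is superseded by pivot_wider(). Here is a sketch of the same reshaping with it, again assuming a small hand-made dat that matches the output shown earlier (hypothetical rows, since the original data is not given):

```r
library(dplyr)
library(tidyr)

# Hypothetical data, chosen to reproduce the table shown earlier
dat <- data.frame(cell  = c("A1", "A1", "A2", "A3"),
                  sport = c("football", "tennis", "tennis", "gym"))

dat.new <- dat %>%
  count(cell, sport) %>%            # one row per observed (cell, sport) pair
  pivot_wider(names_from = sport,
              values_from = n,
              values_fill = 0)      # absent pairs become 0 instead of NA
dat.new
```

By default pivot_wider() orders the new columns by first appearance rather than alphabetically, so the sport columns may come out in a different order than with spread().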
Frequency count based on two columns in R
In base R, use ave:
df$freq <- with(df, ave(d, cumsum(d == 0), FUN = length))
df
# o d freq
#1 a 0.0 1
#2 a 0.0 2
#3 a 1.0 2
#4 a 0.0 3
#5 a 0.3 3
#6 a 0.6 3
#7 a 0.0 5
#8 a 1.0 5
#9 a 2.0 5
#10 a 3.0 5
#11 a 4.0 5
#12 a 0.0 1
#13 b 0.0 2
#14 b 1.0 2
#15 b 0.0 1
With dplyr:
library(dplyr)
df %>% add_count(grp = cumsum(d == 0))
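Both versions rely on cumsum(d == 0) as a run identifier: the running total goes up by one at every zero, so each zero and the non-zero values that follow it share one id. A small sketch on a made-up vector (hypothetical values, not the question's data):

```r
# Hypothetical values; every 0 starts a new run
d <- c(0, 1, 2, 0, 0, 3)

# Running count of zeros seen so far, used as a group id
grp <- cumsum(d == 0)
grp

# Length of each run, repeated for every member of that run
freq <- ave(d, grp, FUN = length)
freq
```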
How do I get the sum of frequency count based on two columns?
We can use count:
library(dplyr)
df1 %>%
filter(!is.na(Medal)) %>%
count(Team)
# A tibble: 2 x 2
# Team n
# <fct> <int>
#1 Australia 2
#2 United States 2
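To make this runnable end to end, here is a sketch with a made-up df1. Only the column names Team and Medal come from the answer; the rows are hypothetical and were chosen to give the same counts.

```r
library(dplyr)

# Hypothetical data; only the column names Team and Medal come from the answer
df1 <- data.frame(
  Team  = c("Australia", "Australia", "Australia",
            "United States", "United States"),
  Medal = c("Gold", NA, "Silver", "Bronze", "Gold")
)

medals <- df1 %>%
  filter(!is.na(Medal)) %>%   # keep only rows where a medal was won
  count(Team)                 # one row per Team, with the row count in column n
medals
```

Passing a second column, as in count(Team, Medal), would instead give one row per Team and Medal combination.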
Count frequency of same value in several columns
You can use unlist() and table() to get the overall counts. Wrapping the result in data.frame() will give you the desired two-column output.
clg <- data.frame(date=1:3,
X1=c("nor", "swe", "alg"),
X2=c("swe", "alg", "jpn"))
data.frame(table(unlist(clg[c("X1", "X2")])))
# Var1 Freq
# 1 alg 2
# 2 nor 1
# 3 swe 2
# 4 jpn 1
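The row order of that result depends on whether the columns are factors or character vectors (data.frame() stopped converting strings to factors by default in R 4.0.0), so it is safer not to rely on it. Here is a sketch that converts to character explicitly and sorts by frequency. The sorting step and the names vals and counts are additions, not part of the answer above.

```r
clg <- data.frame(date = 1:3,
                  X1 = c("nor", "swe", "alg"),
                  X2 = c("swe", "alg", "jpn"))

# Flatten both columns into one character vector, regardless of R version
vals <- unlist(lapply(clg[c("X1", "X2")], as.character))

counts <- data.frame(table(vals))
# Most frequent values first; ties keep their alphabetical order
counts <- counts[order(-counts$Freq), ]
counts
```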
Compare and count the frequency of pairs of entries in two columns
With the tidyverse, you can arrive at this answer using the usual group_by operations.
Sample data
I'm creating column names to make it easier to convert to a tibble.
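set.seed(123)
# c() joins the two samples into 200 values. Without it, the second
# sample is matched to matrix()'s byrow argument and the first 100
# values are recycled, which is why every count in the preview below
# (produced by the original call without c()) is even.
M <- matrix(c(sample(0:5, 100, TRUE),
              sample(0:5, 100, TRUE)),
            ncol = 2,
            nrow = 100,
            dimnames = list(NULL, c("colA", "colB")))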
Solution
library("tidyverse")
as_tibble(M) %>%
arrange(colA, colB) %>%
group_by(colA, colB) %>%
summarise(num_pairs = n(), .groups = "drop") %>%
pivot_wider(names_from = colB, values_from = num_pairs) %>%
remove_rownames()
Preview
# A tibble: 6 x 7
colA `0` `1` `2` `4` `5` `3`
<int> <int> <int> <int> <int> <int> <int>
1 0 4 4 4 2 4 NA
2 1 2 2 4 6 2 NA
3 2 6 4 NA 2 6 NA
4 3 2 NA NA 4 6 2
5 4 NA 2 6 NA 2 4
6 5 6 2 4 4 2 2
Comments
You have asked:
I mean that I want to know how many pairs (0,0) , (0,1), ...,
(0,5),... (5,5) are in both columns?
This answer gives you that. The question is how important it is for you to have the results stored as a matrix. You can convert the result into a matrix by calling as.matrix on it. I would most likely stop after summarise(num_pairs = n(), .groups = "drop"), as that already gives very usable results that are easy to subset, join and so forth.
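If a matrix is what you need, a base R sketch with table() gives the full 6 x 6 grid directly, with 0 rather than NA for pairs that never occur. The M below is a hypothetical stand-in built for this example, and pair_counts and pair_matrix are names introduced here.

```r
set.seed(123)
# Hypothetical stand-in for M: 100 random pairs drawn from 0:5
M <- matrix(sample(0:5, 200, TRUE), ncol = 2,
            dimnames = list(NULL, c("colA", "colB")))

# Fix the levels so that every value 0..5 gets a row and a column,
# even if it happens not to be sampled
pair_counts <- table(colA = factor(M[, "colA"], levels = 0:5),
                     colB = factor(M[, "colB"], levels = 0:5))

# Plain integer matrix of counts; rows are colA values, columns are colB values
pair_matrix <- unclass(pair_counts)
pair_matrix["0", "5"]   # number of (0, 5) pairs
```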
Get frequency of values for multiple columns using dplyr?
I've assumed column d row 3 is a typo and .5. really is 0.5, in which case you could do the following:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(everything()) %>%
group_by(name, value) %>%
summarise(count = n()) %>%
arrange(name, desc(value))
# or more succinctly as pointed out by @LMc
df %>%
pivot_longer(everything()) %>%
count(name, value) %>%
arrange(name, desc(value))
#> # A tibble: 15 x 3
#> name value count
#> <chr> <dbl> <int>
#> 1 a 1 2
#> 2 a 0.5 1
#> 3 a 0 2
#> 4 b 1 3
#> 5 b 0 2
#> 6 c 1 2
#> 7 c 0.5 1
#> 8 c 0 2
#> 9 d 1 2
#> 10 d 0.5 1
#> 11 d 0 1
#> 12 d NA 1
#> 13 e 1 3
#> 14 e 0.5 1
#> 15 e 0 1
data
df <- structure(list(a = c(1, 0.5, 1, 0, 0), b = c(0, 1, 1, 0, 1),
c = c(0, 1, 0.5, 1, 0), d = c(1, 0, 0.5, NA, 1),
e = c(1, 1, 0, 1, 0.5)), class = "data.frame", row.names = c(NA,
-5L))
Created on 2021-04-13 by the reprex package (v2.0.0)
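For a quick look without reshaping, here is a base R sketch that tabulates each column separately. It uses the df from this answer, repeated so the block runs on its own. useNA = "ifany" keeps the NA in column d as its own category, and the name per_column is introduced here.

```r
df <- data.frame(a = c(1, 0.5, 1, 0, 0), b = c(0, 1, 1, 0, 1),
                 c = c(0, 1, 0.5, 1, 0), d = c(1, 0, 0.5, NA, 1),
                 e = c(1, 1, 0, 1, 0.5))

# One frequency table per column; NA is counted where present
per_column <- lapply(df, table, useNA = "ifany")
per_column$d
```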