Using dplyr for frequency counts of interactions, must include zero counts
Here's a simple option, using data.table instead:
library(data.table)
dt = as.data.table(your_df)
setkey(dt, id, date)
# in versions 1.9.3+
dt[CJ(unique(id), unique(date)), .N, by = .EACHI]
# id date N
# 1: Andrew13 2006-08-03 0
# 2: Andrew13 2007-09-11 1
# 3: Andrew13 2008-06-12 0
# 4: Andrew13 2008-10-11 0
# 5: Andrew13 2009-07-03 0
# 6: John12 2006-08-03 1
# 7: John12 2007-09-11 0
# 8: John12 2008-06-12 0
# 9: John12 2008-10-11 0
#10: John12 2009-07-03 0
#11: Lisa825 2006-08-03 0
#12: Lisa825 2007-09-11 0
#13: Lisa825 2008-06-12 0
#14: Lisa825 2008-10-11 0
#15: Lisa825 2009-07-03 1
#16: Tom2993 2006-08-03 0
#17: Tom2993 2007-09-11 0
#18: Tom2993 2008-06-12 1
#19: Tom2993 2008-10-11 1
#20: Tom2993 2009-07-03 0
In versions 1.9.2 or before, the equivalent expression omits the explicit by:
dt[CJ(unique(id), unique(date)), .N]
The idea is to create all possible pairs of id and date (which is what the CJ part does), and then merge it back, counting occurrences.
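Since the question asks about dplyr, the same all-pairs idea can also be sketched with tidyr::complete(). This is a minimal sketch and not part of the original answer; the sample data is reconstructed from the non-zero rows of the output above, so treat it as an assumption about what your_df looks like.

```r
library(dplyr)
library(tidyr)

# Assumed sample data: one row per observed (id, date) interaction
your_df <- data.frame(
  id = c("John12", "Andrew13", "Tom2993", "Tom2993", "Lisa825"),
  date = as.Date(c("2006-08-03", "2007-09-11", "2008-06-12",
                   "2008-10-11", "2009-07-03"))
)

counts <- your_df %>%
  count(id, date) %>%                      # counts for observed pairs only
  complete(id, date, fill = list(n = 0L))  # add missing pairs with n = 0

nrow(counts)  # 4 ids x 5 dates = 20 rows
```

Here complete() plays the role of CJ: it expands to every combination of id and date, and fill supplies the zero for pairs that never occurred.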
Relative frequencies / proportions with dplyr
Try this:
mtcars %>%
group_by(am, gear) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n))
# am gear n freq
# 1 0 3 15 0.7894737
# 2 0 4 4 0.2105263
# 3 1 4 8 0.6153846
# 4 1 5 5 0.3846154
From the dplyr vignette:
When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.
Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check grouping in each step with groups.
The outcome of the peeling is of course dependent on the order of the grouping variables in the group_by call. You may wish to do a subsequent group_by(am), to make your code more explicit.
For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
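As a sketch of how to make the peeling explicit, and assuming dplyr 1.0.0 or later (where summarise() accepts a .groups argument), the grouping left after the summary can be stated and inspected directly. The intermediate names by_am_gear and freqs are only illustrative.

```r
library(dplyr)

by_am_gear <- mtcars %>%
  group_by(am, gear) %>%
  # "drop_last" spells out the default peeling: 'gear' is dropped, 'am' stays
  summarise(n = n(), .groups = "drop_last")

group_vars(by_am_gear)  # "am"

freqs <- by_am_gear %>%
  mutate(freq = n / sum(n)) %>%  # proportions within each level of 'am'
  ungroup()
```

With this grouping, the freq values sum to 1 within each level of am rather than over the whole table.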
Is there a way to show the zero-counts by using dplyr on sample data?
You can do a left join:
library(dplyr)
numbofrunsperside %>%
left_join(
sampledata_hit_counts,
by = c("StartPos", "Direction"),
suffix = c("_runs", "_hits")
) %>%
mutate(
p_test = ifelse(is.na(n_hits), 0, n_hits) / n_runs
) %>%
pull(p_test)
#[1] 0.2000000 0.0000000 0.0000000 0.1666667 0.0000000 0.0000000 0.3333333 0.1428571 0.0000000 0.1250000 0.1666667 0.5000000 0.2000000
#[14] 0.4000000 0.1666667 0.0000000 0.0000000 0.3333333 0.5000000 0.0000000
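The original tables are not shown here, so the following is a self-contained sketch of the same left-join pattern on made-up data. The tables runs and hits and their values are hypothetical; only the column names StartPos, Direction and n follow the answer above. coalesce() is used in place of the ifelse(is.na(...)) construct and should behave the same way for this purpose.

```r
library(dplyr)

# Hypothetical counts: every start/direction combination that was run
runs <- data.frame(
  StartPos  = c(1, 1, 2),
  Direction = c("L", "R", "L"),
  n         = c(5L, 4L, 6L)
)

# Hypothetical hit counts: combinations with zero hits are simply absent
hits <- data.frame(
  StartPos  = c(1, 2),
  Direction = c("L", "L"),
  n         = c(1L, 3L)
)

result <- runs %>%
  left_join(hits, by = c("StartPos", "Direction"),
            suffix = c("_runs", "_hits")) %>%
  # Missing hit counts become NA after the join; treat them as zero
  mutate(p_test = coalesce(n_hits, 0L) / n_runs)
```

Because the join starts from the table that lists every combination, rows without a match survive with NA, which is what allows the zero proportions to appear.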
Create data of frequency of interactions between variables using R
Using data.table, you can probably do something like:
library(data.table)
#convert into data.table
setDT(B1)
#create interaction between animals in the same location & month
ans <- B1[, if (.N > 1L) transpose(combn(unique(Animal), 2L, simplify=FALSE)),
by=.(Location, Month)]
#change column names to desired column names
setnames(ans, paste0("V", 1L:2L), paste0("Animal", 1L:2L))
#sort animals so that A, B and B, A are the same
ans[, paste0("Animal", 1L:2L) := .(pmin(Animal1, Animal2), pmax(Animal1, Animal2))]
#count the number of interactions as requested
ans[, .(NumInteract=.N), by=c(paste0("Animal", 1L:2L))]
output:
Animal1 Animal2 NumInteract
1: A B 1
2: A D 1
3: B D 3
4: C D 2
5: A C 1
6: D E 1
7: B C 1
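To see why the pmin/pmax step is needed, here is a small base R sketch on a made-up vector of animals (the values are hypothetical). combn() returns pairs in the order the animals appear, so the same pair could show up as (D, B) in one group and (B, D) in another unless it is sorted.

```r
# Hypothetical animals seen together in one location and month
animals <- c("D", "B", "A")

# One row per unordered pair, in order of appearance
pairs <- t(combn(animals, 2L))

# Put the alphabetically smaller name first so (D, B) and (B, D) coincide
first  <- pmin(pairs[, 1], pairs[, 2])
second <- pmax(pairs[, 1], pairs[, 2])

data.frame(Animal1 = first, Animal2 = second)
```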
count frequency by year with dplyr (conditional count)
Here is another tidyverse method. Simply speaking, we pivot the data frame from long to wide and then summarize. The first summarization gets rid of all the non-"A"s. The second summarization condenses the result table into unique bins identified by each toolA value and produces a count.
library(dplyr)
library(tidyr)
df %>%
mutate(value = +(Tool == "A")) %>%
pivot_wider(names_from = Year, values_fill = 0L) %>%
group_by(ID) %>%
summarize(across(-Tool, sum)) %>%
group_by(toolA = rowSums(across(-ID))) %>%
summarize(count = n(), across(-c(ID, count), sum))
Output
# A tibble: 4 x 5
toolA count `2000` `2001` `2002`
<dbl> <int> <int> <int> <int>
1 0 1 0 0 0
2 1 2 1 0 1
3 2 1 0 1 1
4 3 1 1 1 1
Using R - frequency counts with variable binwidths and factors
The following snippet should do what you want:
I loaded your sample into df.
library("dplyr")
df %>% group_by(sample.type, leaf.side, canopy, treatment) %>%
dplyr::select(Feret) %>%
do(data.frame(table(cut(.$Feret, breaks=bins, include.lowest=TRUE))))
I refer you to the dplyr documentation. In short, x %>% f is f(x) and x %>% f(a) is f(x, a).
Note that dplyr::select is just select, but I have had namespace issues so many times that now I always specify the package.
table(cut(df$Feret, breaks=bins)) is just a nicer way to do what you did with hist. With cut, you create a factor variable (remember to add include.lowest=TRUE if your values can reach the lower bound) and with table, you count the frequency of each level.
This gives:
sample.type leaf.side canopy treatment Var1 Freq
1 flower upper top green (0.01,0.03] 0
2 flower upper top green (0.03,0.1] 6
3 flower upper top green (0.1,0.3] 1
4 flower upper top green (0.3,1] 0
5 flower upper top green (1,3] 1
6 flower upper top green (3,10] 3
7 flower upper top white (0.01,0.03] 4
8 flower upper top white (0.03,0.1] 4
9 flower upper top white (0.1,0.3] 0
10 flower upper top white (0.3,1] 0
11 flower upper top white (1,3] 0
12 flower upper top white (3,10] 3
13 leaf lower bottom white (0.01,0.03] 5
14 leaf lower bottom white (0.03,0.1] 4
15 leaf lower bottom white (0.1,0.3] 1
16 leaf lower bottom white (0.3,1] 1
17 leaf lower bottom white (1,3] 0
18 leaf lower bottom white (3,10] 0
19 leaf lower top grey (0.01,0.03] 10
20 leaf lower top grey (0.03,0.1] 1
21 leaf lower top grey (0.1,0.3] 0
22 leaf lower top grey (0.3,1] 0
23 leaf lower top grey (1,3] 0
24 leaf lower top grey (3,10] 0
25 leaf upper bottom white (0.01,0.03] 4
26 leaf upper bottom white (0.03,0.1] 6
27 leaf upper bottom white (0.1,0.3] 1
28 leaf upper bottom white (0.3,1] 0
29 leaf upper bottom white (1,3] 0
30 leaf upper bottom white (3,10] 0
31 leaf upper top blue (0.01,0.03] 10
32 leaf upper top blue (0.03,0.1] 0
33 leaf upper top blue (0.1,0.3] 0
34 leaf upper top blue (0.3,1] 0
35 leaf upper top blue (1,3] 1
36 leaf upper top blue (3,10] 0
(Actually, it doesn't print like this since this is a tbl, but you can use print.data.frame to print a tbl the old way.)
From here it should be straightforward to extract the info you want.
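As a small illustration of the include.lowest remark above, here is a sketch on made-up values. The vectors feret and bins are hypothetical and were chosen so that one value sits exactly on the lowest break.

```r
# Hypothetical measurements; 0.01 lies exactly on the lowest break
feret <- c(0.01, 0.02, 0.05, 0.2, 2, 10)
bins  <- c(0.01, 0.03, 0.1, 0.3, 1, 3, 10)

# Default intervals are (a, b], so 0.01 falls outside the first bin
without_lowest <- table(cut(feret, breaks = bins))

# include.lowest = TRUE closes the first interval on the left
with_lowest <- table(cut(feret, breaks = bins, include.lowest = TRUE))

sum(without_lowest)  # 5: the value 0.01 became NA and is not counted
sum(with_lowest)     # 6: every value lands in a bin
```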
How to get frequency counts using column breaks by row?
One more solution, based on base R's rle:
library(dplyr)
dat %>% group_by(name) %>%
summarise(ever_inv = length(with(rle(srvc_inv), lengths[values==1])))
# A tibble: 1 x 2
name ever_inv
<fct> <int>
1 Bob 2
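To make the rle step easier to follow, here is a sketch on a made-up indicator vector (the values are hypothetical). rle() collapses consecutive repeats into runs, so counting the runs whose value is 1 gives the number of separate spells rather than the number of 1s.

```r
# Hypothetical 0/1 indicator for one person, in time order
srvc_inv <- c(1, 1, 0, 0, 1, 0)

runs <- rle(srvc_inv)
runs$values   # 1 0 1 0
runs$lengths  # 2 2 1 1

# Keep only the runs of 1s, then count how many there are
spells <- runs$lengths[runs$values == 1]
length(spells)  # 2
```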
Cross tabulation of co-occurring pairs of variables
Since the columns are binary (1 or 0), you can also do this by multiplying the columns together, which results in 1 only if both columns are equal to 1, and then summing:
out <- sapply(df, function(x) colSums(df*x))
diag(out) <- NA
out
# var.1 var.2 var.3 var.4
# var.1 NA 1 1 1
# var.2 1 NA 2 1
# var.3 1 2 NA 2
# var.4 1 1 2 NA
Or, using matrix multiplication:
out <- t(df) %*% as.matrix(df)
diag(out) <- NA
out
# var.1 var.2 var.3 var.4
# var.1 NA 1 1 1
# var.2 1 NA 2 1
# var.3 1 2 NA 2
# var.4 1 1 2 NA
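The matrix product above can also be written with base R's crossprod(), which computes t(X) %*% X in one call. The following is a minimal sketch on a made-up binary data frame; the name flags and its values are hypothetical.

```r
# Hypothetical binary indicators, one row per observation
flags <- data.frame(
  var.1 = c(1, 0, 1),
  var.2 = c(1, 1, 0),
  var.3 = c(0, 1, 1)
)

# Entry [i, j] counts the rows where both variable i and variable j are 1
co_raw <- crossprod(as.matrix(flags))

# The diagonal holds each variable's own total, which is not a co-occurrence
co <- co_raw
diag(co) <- NA
co
```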