Tidyr how to spread into count of occurrence
These are a few ways of many to go about it:
1) With library dplyr
, you can simply group things and count into the format needed:
library(dplyr)
other %>% group_by(name) %>% summarise(N = sum(result == 'N'), Y = sum(result == 'Y'))
Source: local data frame [4 x 3]
name N Y
<fctr> <int> <int>
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0
2) You can use a combination of table
and tidyr
spread as follows:
library(tidyr)
spread(as.data.frame(table(other)), result, Freq)
name N Y
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0
3) You can use a combination of dplyr
and tidyr
to do as follows:
library(dplyr)
library(tidyr)
spread(count(other, name, result), result, n, fill = 0)
Source: local data frame [4 x 3]
Groups: name [4]
name N Y
<fctr> <dbl> <dbl>
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0
Pivot_wider and count number of occurrences retaining all other columns
You can count them first:
library(dplyr)
library(tidyr)
df = data.frame(product= c(1,1,1,2,2,2,3,3,3),
type = c(rep("a",3),rep("b", 3),rep("a",3)),
issue = c("fall","fall","trap",
"trap", "jump", "fall",
"trap", "jump", "open"))
df %>%
count(product, type, issue) %>%
pivot_wider(names_from = issue,
values_from = n)
#> # A tibble: 3 × 6
#> product type fall trap jump open
#> <dbl> <chr> <int> <int> <int> <int>
#> 1 1 a 2 1 NA NA
#> 2 2 b 1 1 1 NA
#> 3 3 a NA 1 1 1
Created on 2022-05-25 by the reprex package (v2.0.1)
R count instances of a value in one column, put counts in new columns
We can get the frequency with count
and then spread
it to 'wide' format
library(tidyverse)
df %>%
count(agestrat, gender) %>%
spread(gender, n, fill = 0)
Count the number of times two values appear in a column based on the unique values of another column
We can use count
and spread
to get the df format and use fill = 0
in spread
to fill in the 0s:
library(tidyverse)
YG %>%
group_by(year) %>%
count(gender) %>%
spread(gender, n, fill = 0)
Output:
# A tibble: 3 x 3
# Groups: year [3]
year F M
<fct> <dbl> <dbl>
1 2000 1 2
2 2001 0 1
3 2002 1 0
How can I count the total number of occurrences at time step t of an element?
There is a function called rle
which seems like it might be nice fit here. This answer is almost all base
R, but I couldn't resist dipping into the tidyverse
for map_dfr
.
library(purrr)
df_rle <- map_dfr(row.names(df), function(x){
r <- rle(rev(df[x,]))
rev(r$lengths * r$values * 10)
})
tab <- table(stack(df_rle), exclude = c('0', NA))
tab
#---------
ind
values t1 t2 t3
10 2 0 0
20 0 0 1
30 0 2 0
40 1 0 0
If you wanted to represent these as percentages
sweep(tab,2, colSums(tab), '/')
#--------
ind
values t1 t2 t3
10 0.6666667 0.0000000 0.0000000
20 0.0000000 0.0000000 1.0000000
30 0.0000000 1.0000000 0.0000000
40 0.3333333 0.0000000 0.0000000
Break it down a bit
The rle
function when run on a given row gets us pretty close to the desired output. As an example row 5 from df
.
r <- rle(c(1,0,1,1))
r
#-----
Run Length Encoding
lengths: int [1:3] 1 1 2
values : num [1:3] 1 0 1
# Multiply to get time periods
r$lengths * r$values * 10
# -----
[1] 10 0 20
But with a dataframe row as input, values are linked to the last time period not the first. So we reverse the order in which we feed the row to the rle
function, and then un-reverse (?) the results.
r <- rle(df[5,])
r$values
#------
t1 t2 t4
5 1 0 1
# Reverse before we feed into rle(), reverse the output
r <- rle(rev(df[5,]))
rev(r$lengths * r$values * 10)
#-----------
t1 t2 t3
5 10 0 20
Then we need to do this reversed rle function on each row. This answer uses purrr::map_dfr()
, which maps the function to each row, then row binds the results together into a single dataframe.
df_rle <- map_dfr(row.names(df), function(x){
r <- rle(rev(df[x,]))
rev(r$lengths * r$values * 10)
})
#-----
t1 t2 t3
1 10 0 NA
2 40 NA NA
3 0 30 NA
4 0 30 NA
5 10 0 20
From here, we need to count the values by the 10 minute duration categories. There are several ways to do this. Here is one way in which we first convert df_rle
to a long form two column dataframe using stack
, then use the table
function to tabulate by duration levels.
tab <- table(stack(df_rle), exclude = c('0', NA))
#--------
ind
values t1 t2 t3
10 2 0 0
20 0 0 1
30 0 2 0
40 1 0 0
To convert to percentages (assuming you mean duration as a percent of each time period) you can divide by the original number of measurements, which here is equal to colSums
. sweep
can apply the value rowwise (1) or column-wise (2), using the function divide '/'
.
sweep(tab, 2, colSums(tab), '/')
Count occurrences of factors across multiple columns in grouped dataframe
You can stack col1
& col2
together, count the number of each combination, and then transform the table to a wide form.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(col1:col2) %>%
count(grp, name, value) %>%
pivot_wider(grp, names_from = c(name, value), names_sort = TRUE,
values_from = n, values_fill = 0)
# A tibble: 3 x 6
grp col1_A col1_B col2_B col2_C col2_D
<chr> <int> <int> <int> <int> <int>
1 a 1 2 2 0 1
2 b 2 0 0 2 0
3 c 1 2 0 2 1
A base
solution (Thank @GKi to refine the code):
table(cbind(df["grp"], col=do.call(paste0, stack(df[-1])[2:1])))
col
grp col1A col1B col2B col2C col2D
a 1 2 2 0 1
b 2 0 0 2 0
c 1 2 0 2 1
count occurrences in column and group by id
Here's a tidyverse
solution.
# Create data frame
df <- read.table(text = " user_id webpage
1 google
1 bing
2 google
2 google
2 yahoo", header = TRUE)
# Load libraries
library(dplyr)
library(tibble)
library(tidyr)
# Count and restructure
as_tibble(table(df)) %>% spread(webpage, n)
#> # A tibble: 2 x 4
#> user_id bing google yahoo
#> <chr> <int> <int> <int>
#> 1 1 1 1 0
#> 2 2 0 2 1
Created on 2019-05-13 by the reprex package (v0.2.1)
R count occurrences of an element by groups
Using data.table
library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]
The by
argument makes the table be dealt with within the groups of the column called A
. .N
is the number of rows within that group (if the by
argument was empty it would be the number of rows in the table), so for each sub-table, each row is indexed from 1 to the number of rows in that sub-table.
mydata
group letter myorder
1: x A 1
2: x A 2
3: x A 3
4: x B 1
5: y B 1
6: y A 1
7: y A 2
or a dplyr
solution which is pretty much the same
mydata %>%
group_by(group, letter) %>%
mutate(myorder = 1:n())
Count occurrences in column in R
You may apply sort()
and head()
to the output of table()
, e.g.,
head(sort(table(Data$Tags), decreasing=TRUE), 10)
Related Topics
How to Specify a Dynamic Position for the Start of Substring
Tidyr How to Spread into Count of Occurrence
Lm Function in R Does Not Give Coefficients for All Factor Levels in Categorical Data
Subset Xts Object by Time of Day
How to Read CSV File in R Where Some Values Contain the Percent Symbol (%)
Error in New.Session():Could Not Establish Session After 5 Attempts
Scraping with Rvest - Complete with Nas When Tag Is Not Present
Add a Column with Count of Nas and Mean
What Is About the First Column in R's Dataset Mtcars
Set One or More of Coefficients to a Specific Integer
Find Rows in a Data Frame Where Two Columns Are Equal
Count How Many Values in Some Cells of a Row Are Not Na (In R)
Reading Multiple Files into Multiple Data Frames