Tidyr How to Spread into Count of Occurrence

Tidyr how to spread into count of occurrence

These are a few ways of many to go about it:

1) With library dplyr, you can simply group things and count into the format needed:

library(dplyr)
other %>% group_by(name) %>% summarise(N = sum(result == 'N'), Y = sum(result == 'Y'))
Source: local data frame [4 x 3]

name N Y
<fctr> <int> <int>
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0

2) You can use a combination of table and tidyr spread as follows:

library(tidyr)
spread(as.data.frame(table(other)), result, Freq)
name N Y
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0

3) You can use a combination of dplyr and tidyr to do as follows:

library(dplyr)
library(tidyr)
spread(count(other, name, result), result, n, fill = 0)
Source: local data frame [4 x 3]
Groups: name [4]

name N Y
<fctr> <dbl> <dbl>
1 a 0 2
2 b 1 0
3 c 0 1
4 d 1 0

Pivot_wider and count number of occurrences retaining all other columns

You can count them first:

library(dplyr)
library(tidyr)

df = data.frame(product= c(1,1,1,2,2,2,3,3,3),
type = c(rep("a",3),rep("b", 3),rep("a",3)),
issue = c("fall","fall","trap",
"trap", "jump", "fall",
"trap", "jump", "open"))
df %>%
count(product, type, issue) %>%
pivot_wider(names_from = issue,
values_from = n)
#> # A tibble: 3 × 6
#> product type fall trap jump open
#> <dbl> <chr> <int> <int> <int> <int>
#> 1 1 a 2 1 NA NA
#> 2 2 b 1 1 1 NA
#> 3 3 a NA 1 1 1

Created on 2022-05-25 by the reprex package (v2.0.1)

R count instances of a value in one column, put counts in new columns

We can get the frequency with count and then spread it to 'wide' format

library(tidyverse)
df %>%
count(agestrat, gender) %>%
spread(gender, n, fill = 0)

Count the number of times two values appear in a column based on the unique values of another column

We can use count and spread to get the df format and use fill = 0 in spread to fill in the 0s:

library(tidyverse)
YG %>%
group_by(year) %>%
count(gender) %>%
spread(gender, n, fill = 0)

Output:

# A tibble: 3 x 3
# Groups: year [3]
year F M
<fct> <dbl> <dbl>
1 2000 1 2
2 2001 0 1
3 2002 1 0

How can I count the total number of occurrences at time step t of an element?

There is a function called rle which seems like it might be nice fit here. This answer is almost all base R, but I couldn't resist dipping into the tidyverse for map_dfr.

library(purrr)

df_rle <- map_dfr(row.names(df), function(x){
r <- rle(rev(df[x,]))
rev(r$lengths * r$values * 10)
})

tab <- table(stack(df_rle), exclude = c('0', NA))
tab
#---------
ind
values t1 t2 t3
10 2 0 0
20 0 0 1
30 0 2 0
40 1 0 0

If you wanted to represent these as percentages

sweep(tab,2, colSums(tab), '/')

#--------
ind
values t1 t2 t3
10 0.6666667 0.0000000 0.0000000
20 0.0000000 0.0000000 1.0000000
30 0.0000000 1.0000000 0.0000000
40 0.3333333 0.0000000 0.0000000


Break it down a bit

The rle function when run on a given row gets us pretty close to the desired output. As an example row 5 from df.

r <- rle(c(1,0,1,1))
r
#-----
Run Length Encoding
lengths: int [1:3] 1 1 2
values : num [1:3] 1 0 1


# Multiply to get time periods
r$lengths * r$values * 10
# -----
[1] 10 0 20

But with a dataframe row as input, values are linked to the last time period not the first. So we reverse the order in which we feed the row to the rle function, and then un-reverse (?) the results.

r <- rle(df[5,])
r$values
#------
t1 t2 t4
5 1 0 1

# Reverse before we feed into rle(), reverse the output
r <- rle(rev(df[5,]))
rev(r$lengths * r$values * 10)
#-----------
t1 t2 t3
5 10 0 20

Then we need to do this reversed rle function on each row. This answer uses purrr::map_dfr(), which maps the function to each row, then row binds the results together into a single dataframe.

df_rle <- map_dfr(row.names(df), function(x){
r <- rle(rev(df[x,]))
rev(r$lengths * r$values * 10)
})


#-----
t1 t2 t3
1 10 0 NA
2 40 NA NA
3 0 30 NA
4 0 30 NA
5 10 0 20

From here, we need to count the values by the 10 minute duration categories. There are several ways to do this. Here is one way in which we first convert df_rle to a long form two column dataframe using stack, then use the table function to tabulate by duration levels.

tab <- table(stack(df_rle), exclude = c('0', NA))

#--------
ind
values t1 t2 t3
10 2 0 0
20 0 0 1
30 0 2 0
40 1 0 0

To convert to percentages (assuming you mean duration as a percent of each time period) you can divide by the original number of measurements, which here is equal to colSums. sweep can apply the value rowwise (1) or column-wise (2), using the function divide '/'.

sweep(tab, 2, colSums(tab), '/')

Count occurrences of factors across multiple columns in grouped dataframe

You can stack col1 & col2 together, count the number of each combination, and then transform the table to a wide form.

library(dplyr)
library(tidyr)

df %>%
pivot_longer(col1:col2) %>%
count(grp, name, value) %>%
pivot_wider(grp, names_from = c(name, value), names_sort = TRUE,
values_from = n, values_fill = 0)

# A tibble: 3 x 6
grp col1_A col1_B col2_B col2_C col2_D
<chr> <int> <int> <int> <int> <int>
1 a 1 2 2 0 1
2 b 2 0 0 2 0
3 c 1 2 0 2 1

A base solution (Thank @GKi to refine the code):

table(cbind(df["grp"], col=do.call(paste0, stack(df[-1])[2:1])))

col
grp col1A col1B col2B col2C col2D
a 1 2 2 0 1
b 2 0 0 2 0
c 1 2 0 2 1

count occurrences in column and group by id

Here's a tidyverse solution.

# Create data frame
df <- read.table(text = " user_id webpage
1 google
1 bing
2 google
2 google
2 yahoo", header = TRUE)

# Load libraries
library(dplyr)
library(tibble)
library(tidyr)

# Count and restructure
as_tibble(table(df)) %>% spread(webpage, n)
#> # A tibble: 2 x 4
#> user_id bing google yahoo
#> <chr> <int> <int> <int>
#> 1 1 1 1 0
#> 2 2 0 2 1

Created on 2019-05-13 by the reprex package (v0.2.1)

R count occurrences of an element by groups

Using data.table

library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]

The by argument makes the table be dealt with within the groups of the column called A. .N is the number of rows within that group (if the by argument was empty it would be the number of rows in the table), so for each sub-table, each row is indexed from 1 to the number of rows in that sub-table.

mydata
group letter myorder
1: x A 1
2: x A 2
3: x A 3
4: x B 1
5: y B 1
6: y A 1
7: y A 2

or a dplyr solution which is pretty much the same

mydata %>% 
group_by(group, letter) %>%
mutate(myorder = 1:n())

Count occurrences in column in R

You may apply sort() and head() to the output of table(), e.g.,

head(sort(table(Data$Tags), decreasing=TRUE), 10)


Related Topics



Leave a reply



Submit