Dplyr: Put Count Occurrences into New Variable

All you need to do is group your data by both columns, "group" and "var1":

df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
#   group var1 count
#1      1    1     4
#2      1    1     4
#3      1    1     4
#4      1    1     4
#5      1    2     1
#6      2    1     1
#7      2    2     3
#8      2    2     3
#9      2    2     3
#10     2    3     1
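For reference, here is a minimal, self-contained sketch of the same idea. The data frame is hypothetical, made up to match the output above, and the add_count() call is shown as a one-step alternative that returns ungrouped data.

```r
library(dplyr)

# Hypothetical data, chosen to reproduce the counts shown above
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# Grouped mutate: n() is the number of rows in the current group
with_mutate <- df %>%
  group_by(group, var1) %>%
  mutate(count = n()) %>%
  ungroup()

# add_count() groups, counts and ungroups in a single call
with_add_count <- df %>% add_count(group, var1, name = "count")
```

Both results carry the same count column; the second form simply avoids leaving the data grouped.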

Edit after comment

Here's an example of how you SHOULD NOT DO IT:

df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))

The dplyr approach with n() is faster, cleaner and shorter, and should be preferred over implementations like the one above.

Count the number of times a value appears in a column using dplyr

Using the n() function:

x %>%
  group_by(Code) %>%
  mutate(Code_frequency = n()) %>%
  ungroup()
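To illustrate why the final ungroup() matters, here is a small sketch on made-up data (the values in x and the share column are assumptions for illustration only). After ungroup(), a later n() refers to the whole table rather than to each Code group.

```r
library(dplyr)

# Hypothetical input with a single Code column
x <- data.frame(Code = c("A", "B", "A", "C", "A"))

res <- x %>%
  group_by(Code) %>%
  mutate(Code_frequency = n()) %>%    # rows per Code
  ungroup() %>%
  mutate(share = Code_frequency / n())  # n() is now the total row count
```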

Putting rowwise counts of value occurrences into new variables, how to do that in R with dplyr?

This uses rowwise() and do() from dplyr but it's definitely ugly.

I'm not sure whether this can be modified so that do() returns a data.frame directly, as shown at https://github.com/hadley/dplyr/releases.

interim_res <- df %>%
  rowwise() %>%
  do(out = sapply(min(df):max(df), function(i) sum(i == .)))

interim_res <- interim_res[[1]] %>% do.call(rbind, .) %>% as.data.frame(.)

Then to get intended result:

res <- cbind(df,interim_res)
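As a possible alternative to rowwise() and do(), the same per-row counts can be sketched in base R with rowSums(). This is not the approach from the answer above; the data frame and the n_ column names are made up for illustration, and the sketch assumes all columns are numeric and that there is more than one row (so sapply() returns a matrix).

```r
# Hypothetical all-numeric data frame
df <- data.frame(a = c(1, 2, 3), b = c(1, 1, 3), c = c(2, 2, 3))

vals <- seq(min(df), max(df))   # every value between the smallest and largest

# For each value, count how often it appears in each row
counts <- sapply(vals, function(v) rowSums(df == v))
colnames(counts) <- paste0("n_", vals)

res <- cbind(df, counts)
```

Here res gains one column per value (n_1, n_2, n_3), each holding the number of times that value occurs in the row.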

Counting occurrences based on condition in R (using dplyr ?)

If you also want to see the combinations where the count is 0, you need a final complete() (from tidyr) as well.

library(dplyr)
library(tidyr)  # complete() comes from tidyr

df <- structure(
  list(
    Year = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L, 2003L),
    ID = c("A", "A", "A", "B", "B", "C", "C"),
    Type = c(0L, 0L, 1L, 0L, 0L, 1L, 1L)
  ),
  class = "data.frame",
  row.names = c(NA, -7L)
)

df %>%
  group_by(Year, Type) %>%
  count() %>%
  ungroup() %>%
  complete(Year, Type, fill = list(n = 0))

To return

   Year  Type     n
  <int> <int> <dbl>
1  2000     0     2
2  2000     1     0
3  2001     0     2
4  2001     1     0
5  2002     0     0
6  2002     1     2
7  2003     0     0
8  2003     1     1
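A slightly shorter variant, sketched here under the same data, passes the columns straight to count() (which returns ungrouped data) and fills with the integer 0L so that n should stay an integer column rather than becoming a double.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Year = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L, 2003L),
  ID   = c("A", "A", "A", "B", "B", "C", "C"),
  Type = c(0L, 0L, 1L, 0L, 0L, 1L, 1L)
)

counts <- df %>%
  count(Year, Type) %>%                       # group, count and ungroup in one call
  complete(Year, Type, fill = list(n = 0L))   # add missing Year/Type pairs with n = 0
```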

adding a column to df that counts occurrence of a value in another column

You can use the following solution:

library(dplyr)

df %>%
  group_by(id) %>%
  add_count(name = "id_occurrence")

# A tibble: 10 x 3
# Groups:   id [5]
       id places   id_occurrence
    <dbl> <chr>            <int>
 1 204850 kitchen              3
 2 204850 kitchen              3
 3 204850 garden               3
 4 312512 salon                2
 5 312512 salon                2
 6 452452 salon                1
 7 285421 bathroom             1
 8 758412 garden               3
 9 758412 bathroom             3
10 758412 garden               3
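If you would rather not leave the result grouped, add_count() also accepts the column directly. The sketch below rebuilds the data from the printed output above (so the input itself is an assumption) and skips group_by() altogether.

```r
library(dplyr)

# Data reconstructed from the printed output above
df <- data.frame(
  id = c(204850, 204850, 204850, 312512, 312512,
         452452, 285421, 758412, 758412, 758412),
  places = c("kitchen", "kitchen", "garden", "salon", "salon",
             "salon", "bathroom", "garden", "bathroom", "garden")
)

# Count rows per id and return an ungrouped data frame
res <- df %>% add_count(id, name = "id_occurrence")
```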

dplyr solution to group and count occurrence of selected values

Without having a complete output, it's hard to tell exactly what you're looking for but this should be close:

input %>%
  mutate_at(vars(rank), as.numeric) %>%
  group_by(id) %>%
  mutate(
    gamewin = gamewin == "game-restart",
    game = lag(cumsum(gamewin) + 1) %>% ifelse(is.na(.), 1, .)
  ) %>%
  group_by(id, game) %>%
  mutate(
    'gamewin_rank<10' = case_when(
      gamewin ~ length(unique(genre[which(rank <= 10)])),
      TRUE ~ NA_integer_),
    gamewin_rank1 = case_when(
      gamewin ~ length(unique(genre[which(rank == 1)])),
      TRUE ~ NA_integer_)
  )

A variable "game", indicating the game number for each ID. Each new game is identified by the "gameWin" flag: whenever gameWin=="game-restart", a new game is initiated. Although there is only one ID in the sample dataset, there are many more in the real one.

We can do this by converting your gamewin column to a Boolean flag and running cumsum along it, since each TRUE counts as 1. (In general, a Boolean column takes less memory than one that mixes 0s with a string, so I recommend using this elsewhere too; TRUE equalling 1 is also convenient for arithmetic.) Because the game number should only increase on the row after "game-restart" appears, we apply lag, and the ifelse(is.na(.)) part replaces the NA that lagging leaves in the first row with 1.
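The game counter can be checked on a small vector. The values below are made up; the second form, using the default argument of dplyr's lag(), is offered only as a possible shorthand for the lag-then-ifelse step.

```r
library(dplyr)

# Hypothetical gamewin values for a single id
gamewin <- c("0", "0", "game-restart", "0", "game-restart", "0")
is_restart <- gamewin == "game-restart"

# As in the answer: shift the running count down one row, then fill the first NA
game_a <- lag(cumsum(is_restart) + 1) %>% ifelse(is.na(.), 1, .)

# Same numbering, filling the first row through lag()'s default argument
game_b <- lag(cumsum(is_restart), default = 0L) + 1
```

In both versions the row containing "game-restart" still belongs to the game it ends, and the next row starts the new game.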

A variable "gamewin_rank<10", that counts the cases where the variable rank equals 1 or rank<=10 for unique genres. So if "rank=1" for hip hop, twice, within the same game, it is only counted once.

I implemented what you asked for, but I'm not sure this is actually what you meant because you got a value of 3 and there are definitely five unique genres where the rank <= 10 (we don't need to evaluate rank == 1 because that's included in rank <= 10). Perhaps you could rephrase and I can edit this to help?

A variable "gamewin_rank1", that counts the cases where rank=1 for unique genres. So within a game-ID (grouped by game,id) if rank=1 for rap and rank=1 for rock, the field would output 2. Likewise, within a game-ID, rank=1 for rap, rank=1 for rock and another rank=1 for rock, the field would still output two.

Same implementation as above, restricting to rank == 1.

Edit:
Here's what I think you're actually after re: summary statistics using summarize instead of mutate as discussed in the comments:

input %>%
  mutate_at(vars(rank), as.numeric) %>%
  group_by(id) %>%
  mutate(
    gamewin = gamewin == "game-restart",
    game = lag(cumsum(gamewin) + 1) %>% ifelse(is.na(.), 1, .)
  ) %>%
  group_by(id, game) %>%
  summarize(
    'gamewin_rank<10' = length(unique(genre[which(rank <= 10)])),
    gamewin_rank1 = length(unique(genre[which(rank == 1)])),
    hiphop = sum(genre == "Hip.Hop" & rank == 1)
  )
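To see the distinct-genre counting in isolation, here is a small sketch on invented data (the plays data frame and its column values are assumptions). It uses n_distinct() as a stand-in for length(unique(...)), and keeps which() so that rows with a missing rank are dropped rather than counted.

```r
library(dplyr)

# Hypothetical plays: game 1 has "rock" at rank 1 twice
plays <- data.frame(
  game  = c(1, 1, 1, 1, 2, 2),
  genre = c("rap", "rock", "rock", "pop", "rap", "pop"),
  rank  = c(1, 1, 1, 12, 5, 30)
)

per_game <- plays %>%
  group_by(game) %>%
  summarise(
    top10_genres = n_distinct(genre[which(rank <= 10)]),  # unique genres in the top 10
    rank1_genres = n_distinct(genre[which(rank == 1)])    # unique genres at rank 1
  )
```

For game 1 the repeated "rock" at rank 1 is counted once, so both columns are 2; game 2 has no rank-1 rows, which gives 0.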

R count how many elements in a group occur in a dataframe

df %>%
  filter(Class != "f") %>%
  group_by(Subject) %>%
  summarise(`# of occurrences` = n_distinct(Class)) %>%
  group_by(`# of occurrences`) %>%
  summarise(count = length(Subject),
            count.from.subject = paste(Subject, collapse = ","))
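A worked example may make the two summarise steps easier to follow. The data and the simplified column names (n_classes, subjects) are made up for illustration; the logic is the same as in the pipeline above.

```r
library(dplyr)

# Hypothetical data: class "f" is to be ignored
df <- data.frame(
  Subject = c("s1", "s1", "s2", "s2", "s3", "s3"),
  Class   = c("a", "b", "a", "a", "f", "c")
)

res <- df %>%
  filter(Class != "f") %>%
  group_by(Subject) %>%
  summarise(n_classes = n_distinct(Class)) %>%   # step 1: distinct classes per subject
  group_by(n_classes) %>%
  summarise(count = n(),                          # step 2: subjects per class count
            subjects = paste(Subject, collapse = ","))
```

Here s2 and s3 each have one distinct class and s1 has two, so the result has one row per value of n_classes.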


Edit:

You can also use mutate with group_by instead of summarise, which appends the same value to each element in the group
(and with complete you can fill in the missing values):

df %>%
  mutate(Class = na_if(Class, "f")) %>%
  group_by(Subject) %>%
  summarise(`# of occurrences` = n_distinct(na.omit(Class)),
            count.class = na_if(paste(sort(unique(na.omit(Class))), collapse = ","), "")) %>%
  group_by(`# of occurrences`) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  complete(`# of occurrences` = 0:5, fill = list(count = 0)) %>%
  transmute(`# of occurrences`, count, count.from.subject = Subject, count.class)

R/dplyr: How do I count the number of unique occurrences of an observation over time without double counting?

A third option, since I'm not sure what the correct output should look like:

library(tidyverse)

df %>%
  group_by(`Company Name`) %>%
  distinct(`Machine Name`) %>%
  mutate(count = n())
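Since the expected output is unclear, here is a sketch on invented data showing what this pipeline returns, next to a summarise() variant that gives one row per company instead. The company and machine names are hypothetical.

```r
library(dplyr)

# Hypothetical data: machine "m1" is recorded twice for Acme
df <- data.frame(
  `Company Name` = c("Acme", "Acme", "Acme", "Bolt"),
  `Machine Name` = c("m1", "m1", "m2", "m3"),
  check.names = FALSE   # keep the spaces in the column names
)

# One row per distinct company/machine pair, with the per-company count attached
per_machine <- df %>%
  group_by(`Company Name`) %>%
  distinct(`Machine Name`) %>%
  mutate(count = n()) %>%
  ungroup()

# One row per company
per_company <- df %>%
  group_by(`Company Name`) %>%
  summarise(count = n_distinct(`Machine Name`))
```

Either way the duplicated machine is counted only once.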

Count occurrence of string values per row in dataframe in R (dplyr)

You can use across with rowSums -

library(dplyr)

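# A lambda is used because passing extra arguments through across() is deprecated in dplyr 1.1.0 and later
df %>% mutate(d9 = rowSums(across(all_of(cols), ~ .x %in% bcde)))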

# d1 d2 d3 d4 d5 d6 d7 d8 d9
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#1 b a a a a a a a 0
#2 a a a a c a a a 1
#3 a b a a a a a a 1
#4 a a c a a b a a 2
#5 a a a a a a a a 0
#6 a a b a a a a a 1
#7 a a a a a d a a 1
#8 a a a d a a a a 1

This can also be written in base R -

df$d9 <- rowSums(sapply(df[cols], `%in%`, bcde))
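Neither df, cols nor bcde is defined in the answer, so the following sketch fills them in with made-up values to show both versions side by side. Note that d1 is deliberately left out of cols, so a match there does not count.

```r
library(dplyr)

# Hypothetical inputs
df <- data.frame(
  d1 = c("b", "a", "a"),
  d2 = c("a", "c", "a"),
  d3 = c("a", "d", "e")
)
cols <- c("d2", "d3")            # columns to search
bcde <- c("b", "c", "d", "e")    # values to count

# dplyr: one logical column per searched column, summed across each row
res <- df %>% mutate(d9 = rowSums(across(all_of(cols), ~ .x %in% bcde)))

# base R equivalent
base_d9 <- rowSums(sapply(df[cols], `%in%`, bcde))
```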

