dplyr: put count occurrences into new variable
All you need to do is group your data by both columns, "group" and "var1":
df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
# group var1 count
#1 1 1 4
#2 1 1 4
#3 1 1 4
#4 1 1 4
#5 1 2 1
#6 2 1 1
#7 2 2 3
#8 2 2 3
#9 2 2 3
#10 2 3 1
Edit after comment
Here's an example of how NOT to do it:
df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))
The dplyr implementation with n() is faster, cleaner, and shorter, and should be preferred over implementations like the one above.
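As a self-contained sketch of the recommended approach: the data frame below is reconstructed from the printed output above, since the original input is not shown. It also compares the grouped mutate with add_count(), which should give the same counts without leaving the result grouped.

```r
library(dplyr)

# Sample data reconstructed from the printed output (an assumption)
df <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# Grouped mutate: every row receives the size of its (group, var1) group
res_mutate <- df %>%
  group_by(group, var1) %>%
  mutate(count = n()) %>%
  ungroup()

# add_count() computes the same column in one step
res_add_count <- df %>%
  add_count(group, var1, name = "count")
```

Both results keep all ten rows; only the count column is added.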
Count the number of times a value appears in a column using dplyr
Using the n() function:
x %>%
group_by(Code) %>%
mutate(Code_frequency = n()) %>%
ungroup()
Putting rowwise counts of value occurrences into new variables, how to do that in R with dplyr?
This uses rowwise() and do() from dplyr, but it's definitely ugly.
I'm not sure whether this can be modified so that do() returns a data.frame directly, as shown at https://github.com/hadley/dplyr/releases.
interim_res <- df %>%
rowwise() %>%
do(out = sapply(min(df):max(df), function(i) sum(i==.)))
interim_res <- interim_res[[1]] %>% do.call(rbind,.) %>% as.data.frame(.)
Then, to get the intended result:
res <- cbind(df,interim_res)
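A possibly simpler alternative, sketched in base R rather than with rowwise() and do(): compare the whole data frame against each candidate value and take row sums. The data frame and the n_ column prefix below are hypothetical, and the sketch assumes all columns are numeric with no missing values.

```r
# Hypothetical data: each cell holds a small integer code
df <- data.frame(a = c(1, 2, 3), b = c(1, 3, 3), c = c(2, 2, 3))

# Candidate values, as in the answer above
vals <- min(df):max(df)

# For each candidate value, count per row how many cells equal it
counts <- sapply(vals, function(v) rowSums(df == v))
colnames(counts) <- paste0("n_", vals)

res <- cbind(df, counts)
```

Here res gains one column per value (n_1, n_2, n_3), each holding the per-row count of that value.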
Counting occurrences based on condition in R (using dplyr ?)
If you want to see the combinations where the count is 0, you also need a final complete() (from tidyr).
df <- structure(
list(Year = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L,2003L),
ID = c("A", "A", "A", "B", "B", "C", "C"),
Type = c(0L, 0L, 1L, 0L, 0L, 1L, 1L)
),
class = "data.frame",
row.names = c(NA,-7L)
)
library(dplyr)
library(tidyr)
df %>%
group_by(Year, Type) %>%
count() %>%
ungroup() %>%
complete(Year, Type, fill = list(n = 0))
This returns:
Year Type n
<int> <int> <dbl>
1 2000 0 2
2 2000 1 0
3 2001 0 2
4 2001 1 0
5 2002 0 0
6 2002 1 2
7 2003 0 0
8 2003 1 1
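The same result can probably be reached a little more compactly, because count() accepts the grouping columns directly and returns an ungrouped result. A minimal sketch using the df defined above; filling with 0L instead of 0 is a choice made here so that n stays an integer column.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Year = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L, 2003L),
  ID   = c("A", "A", "A", "B", "B", "C", "C"),
  Type = c(0L, 0L, 1L, 0L, 0L, 1L, 1L)
)

# count() groups, tallies, and ungroups in one call;
# complete() then adds the missing Year/Type combinations with n = 0
res <- df %>%
  count(Year, Type) %>%
  complete(Year, Type, fill = list(n = 0L))
```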
adding a column to df that counts occurrence of a value in another column
You can use the following solution:
library(dplyr)
df %>%
group_by(id) %>%
add_count(name = "id_occurrence")
# A tibble: 10 x 3
# Groups: id [5]
id places id_occurrence
<dbl> <chr> <int>
1 204850 kitchen 3
2 204850 kitchen 3
3 204850 garden 3
4 312512 salon 2
5 312512 salon 2
6 452452 salon 1
7 285421 bathroom 1
8 758412 garden 3
9 758412 bathroom 3
10 758412 garden 3
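Note that the result above is still grouped by id. If that is not wanted, a sketch of an alternative is to pass the column to add_count() directly. The data frame below is reconstructed from the printed output, so treat it as an assumption about the original input.

```r
library(dplyr)

# Data reconstructed from the printed output (an assumption)
df <- data.frame(
  id = c(204850, 204850, 204850, 312512, 312512,
         452452, 285421, 758412, 758412, 758412),
  places = c("kitchen", "kitchen", "garden", "salon", "salon",
             "salon", "bathroom", "garden", "bathroom", "garden")
)

# add_count() with an explicit column: same counts, no grouping left behind
res <- df %>%
  add_count(id, name = "id_occurrence")
```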
dplyr solution to group and count occurrence of selected values
Without having a complete output, it's hard to tell exactly what you're looking for but this should be close:
input %>%
mutate_at(vars(rank), as.numeric) %>%
group_by(id) %>%
mutate(
gamewin = gamewin == "game-restart",
game = lag(cumsum(gamewin) + 1) %>% ifelse(is.na(.),1, .)
) %>%
group_by(id,game) %>%
mutate(
'gamewin_rank<10' = case_when(
gamewin ~ length(unique(genre[which(rank <= 10)])),
TRUE ~ NA_integer_),
gamewin_rank1 = case_when(
gamewin ~ length(unique(genre[which(rank == 1)])),
TRUE ~ NA_integer_)
)
A variable "game", indicating the game number for each ID. Each new game is identified by the "gameWin" flag: whenever gameWin=="game-restart", a new game is initiated. Although there is only one ID in the sample dataset, there are many more in the real one.
We can do this easily by transforming your gamewin column to a Boolean flag and then running cumsum along it. (In general, storing 0s mixed with a string is going to take more memory than a Boolean column, so I recommend using Booleans elsewhere too; you also get the nice advantage of TRUE counting as 1 for math purposes.) Since you want the game number to increase only on the row after "game-restart" appears, we add lag, and the ifelse(is.na(.)) bit handles the first value, which is NA because of the lagging.
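To make the cumsum-plus-lag step concrete, here is a small sketch on a hypothetical event log (the input data and the restart column name are made up for illustration). It uses lag()'s default argument in place of the ifelse(is.na(.)) patch, which should yield the same numbering.

```r
library(dplyr)

# Hypothetical event log for a single id
input <- data.frame(
  id = 1,
  gamewin = c("0", "0", "game-restart", "0", "game-restart", "0")
)

res <- input %>%
  group_by(id) %>%
  mutate(
    # TRUE on the rows that end a game
    restart = gamewin == "game-restart",
    # Number of restarts seen before the current row, plus one
    game = lag(cumsum(restart), default = 0L) + 1L
  ) %>%
  ungroup()
```

The restart row itself still belongs to the game it ends; the counter only increases on the following row.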
A variable "gamewinrank<10", that counts the cases where the variable rank equals 1 or rank<=10 for unique genres. So if "rank=1" for hip hop, twice, within the same game, it is only counted once.
I implemented what you asked for, but I'm not sure this is actually what you meant because you got a value of 3 and there are definitely five unique genres where the rank <= 10 (we don't need to evaluate rank == 1 because that's included in rank <= 10). Perhaps you could rephrase and I can edit this to help?
A variable "gamewin_rank1", that counts the cases where rank=1 for unique genres. So within a game-ID (grouped by game,id) if rank=1 for rap and rank=1 for rock, the field would output 2. Likewise, within a game-ID, rank=1 for rap, rank=1 for rock and another rank=1 for rock, the field would still output two.
Same implementation as above, restricting to rank == 1.
Edit:
Here's what I think you're actually after re: summary statistics using summarize instead of mutate as discussed in the comments:
input %>%
mutate_at(vars(rank), as.numeric) %>%
group_by(id) %>%
mutate(
gamewin = gamewin == "game-restart",
game = lag(cumsum(gamewin) + 1) %>% ifelse(is.na(.),1, .)
) %>%
group_by(id,game) %>%
summarize(
'gamewin_rank<10' = length(unique(genre[which(rank <= 10)])),
gamewin_rank1 = length(unique(genre[which(rank == 1)])),
hiphop = sum(genre=="Hip.Hop" & rank == 1)
)
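For the unique-genre counts, n_distinct() can stand in for length(unique(...)). A minimal sketch on hypothetical data for one game (the rows and the column names rank_le_10 and rank_eq_1 are invented for illustration):

```r
library(dplyr)

# Hypothetical rows for a single id and game
input <- data.frame(
  id = 1,
  game = 1,
  genre = c("rap", "rock", "rock", "pop", "jazz"),
  rank = c(1, 1, 1, 5, 20)
)

res <- input %>%
  group_by(id, game) %>%
  summarise(
    # Distinct genres with rank <= 10: rap, rock, pop
    rank_le_10 = n_distinct(genre[rank <= 10]),
    # Distinct genres with rank == 1: rap, rock (rock counted once)
    rank_eq_1 = n_distinct(genre[rank == 1]),
    .groups = "drop"
  )
```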
R count how many elements in a group occur in a dataframe
df %>% filter(Class != "f") %>%
group_by(Subject) %>%
summarise(`# of occurrences` = n_distinct(Class)) %>%
group_by(`# of occurrences`) %>%
summarise(count = length(Subject),
count.from.subject = paste(Subject, collapse = ","))
Edit:
You can also use mutate with group_by instead of summarise, which will append the same value to each element in the group (with complete you can extend the missing values):
df %>%
mutate(Class = na_if(Class, "f")) %>%
group_by(Subject) %>%
summarise(`# of occurrences` = n_distinct(na.omit(Class)),
count.class = na_if(paste(sort(unique(na.omit(Class))), collapse = ","), "")) %>%
group_by(`# of occurrences`) %>%
mutate(count = n()) %>% ungroup() %>%
complete(`# of occurrences` = 0:5, fill = list(count = 0)) %>%
transmute(`# of occurrences`, count, count.from.subject = Subject, count.class)
R/dplyr: How do I count the number of unique occurrences of an observation over time without double counting?
A third option, since I'm not sure what the correct output should look like:
library(tidyverse)
df %>%
group_by(`Company Name`) %>%
distinct(`Machine Name`) %>%
mutate(count=n())
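If one row per company is what is wanted, a possible variant is to deduplicate first and then count. The data below is hypothetical; only the two column names come from the answer above.

```r
library(dplyr)

# Hypothetical log with repeated company/machine observations
df <- data.frame(
  `Company Name` = c("Acme", "Acme", "Acme", "Globex", "Globex"),
  `Machine Name` = c("m1", "m1", "m2", "m3", "m3"),
  check.names = FALSE
)

# Keep each company/machine pair once, then count machines per company
res <- df %>%
  distinct(`Company Name`, `Machine Name`) %>%
  count(`Company Name`, name = "count")
```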
Count occurrence of string values per row in dataframe in R (dplyr)
You can use across with rowSums:
library(dplyr)
df %>% mutate(d9 = rowSums(across(all_of(cols), `%in%`, bcde)))
# d1 d2 d3 d4 d5 d6 d7 d8 d9
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#1 b a a a a a a a 0
#2 a a a a c a a a 1
#3 a b a a a a a a 1
#4 a a c a a b a a 2
#5 a a a a a a a a 0
#6 a a b a a a a a 1
#7 a a a a a d a a 1
#8 a a a d a a a a 1
This can also be written in base R:
df$d9 <- rowSums(sapply(df[cols], `%in%`, bcde))
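Because df, cols, and bcde are not defined in the answer, here is a runnable sketch with hypothetical values for all three. It uses a lambda inside across() instead of passing bcde as an extra argument, since dplyr 1.1.0 deprecated forwarding extra arguments through across().

```r
library(dplyr)

# Hypothetical inputs
df <- data.frame(
  d1 = c("a", "b", "a"),
  d2 = c("c", "a", "a"),
  d3 = c("d", "a", "a")
)
cols <- c("d1", "d2", "d3")
bcde <- c("b", "c", "d", "e")

# dplyr: per-row count of cells whose value is in bcde
res <- df %>%
  mutate(n_match = rowSums(across(all_of(cols), ~ .x %in% bcde)))

# Base R equivalent
n_match_base <- rowSums(sapply(df[cols], `%in%`, bcde))
```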