How to Count How Many Values Per Level in a Given Factor

How to count how many values per level in a given factor?

Using the dplyr library:

library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))
dat %>% 
  group_by(ID) %>%
  summarise(no_rows = length(ID))

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

The result is:

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
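For comparison, the same per-level counts can be obtained without dplyr. The following is a minimal base R sketch, assuming the same dat as above; the names counts and counts_df are only illustrative. Note that sample() draws different values across R versions, so the exact counts may differ from the output shown above.

```r
# Recreate the example data (values depend on the R version's sampler)
set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# table() counts how often each distinct value occurs
counts <- table(dat$ID)

# Convert to a two-column data frame, mirroring the dplyr result
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("ID", "no_rows")

head(counts_df)
```

table() is handy for a quick look, while the data frame form is easier to join or filter later.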

Counting and listing all factor levels of all factors

I think the most efficient way to do this, in terms of both code length and getting the final output in a tidy format, is:

library(tidyverse)

# example data
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

data %>%
  gather(name, value) %>%  # reshape dataset
  count(name, value)      # count combinations

# # A tibble: 7 x 3
#    name value     n
#   <chr> <chr> <int>
# 1     D   110     3
# 2     D   111     3
# 3     I  2012     3
# 4     I  2013     2
# 5     I  2014     1
# 6     S  1000     3
# 7     S  2000     3

The 1st column gives the name of each factor variable.
The 2nd column holds the unique values of each variable.
The 3rd column is the count.
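If the tidyverse is not available, the same long table of name/value/count triples can be built in base R. This is only a sketch under that assumption: it tabulates each column separately and stacks the results. The name counts is illustrative and not part of the original answer.

```r
# Same example data as above
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

# Tabulate every column, then bind the per-column results into one data frame
counts <- do.call(rbind, lapply(names(data), function(nm) {
  tab <- table(data[[nm]])
  data.frame(name = nm,
             value = names(tab),
             n = as.integer(tab),
             stringsAsFactors = FALSE)
}))

counts
```

In current tidyr releases gather() is superseded by pivot_longer(), so a dependency-free version like this can also serve as a fallback.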

Count number of unique levels of a variable

The following should do the job:

choices <- length(unique(iris$Species))
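One caveat worth illustrating: length(unique(x)) counts the values that actually occur, whereas nlevels(x) counts the levels the factor declares, including unused ones. The sketch below uses the built-in iris data; subset_species is a name introduced only for this example.

```r
species <- iris$Species

length(unique(species))  # 3 values occur
nlevels(species)         # 3 levels declared

# After subsetting, the dropped level is still declared on the factor
subset_species <- species[species != "setosa"]

length(unique(subset_species))       # 2 values occur
nlevels(subset_species)              # 3, "setosa" is kept as an unused level
nlevels(droplevels(subset_species))  # 2 once unused levels are dropped
```

Which of the two is wanted depends on whether empty levels should count.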

Counting the number of factor variables in a data frame

A few problems here:

Your subscript is out of bounds problem arises because df[1:5, ] selects rows 1:5, whereas columns would be df[ ,1:5]. It appears that you only have 3 rows, not 5.

The second error, no applicable method for 'as.quoted' applied to an object of class "function", refers to as.factor, which is a function. It is saying that a function doesn't belong within the function count. You can check exactly what count expects by running ?count in the console.

A third problem that I see is that R will not automatically treat integers as factors; you have to convert numeric columns explicitly. Character columns, by contrast, were converted to factors automatically in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE.

Here is a reproducible example:

> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)

'data.frame':   3 obs. of  5 variables:
 $ var1: num  0.716 1.43 -0.726
 $ var2: int  1 2 3
 $ var3: num  0.238 -0.658 0.492
 $ var4: num  3 1 2
 $ var5: num  1.71 1.54 1.05

Here I used the structure function str() to check what type of data I have. Note that var2 is stored as an integer because I generated it with c(1:3), whereas c(3,1,2) is stored as numeric in var4.

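Here, I will tell R I want two of the columns to be factors, and I will add another column of words, which automatically becomes a factor in R versions before 4.0.0 (from 4.0.0 on, pass stringsAsFactors = TRUE to data.frame() to get the same result).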

> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+                ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"))
> str(df)
'data.frame':   3 obs. of  6 variables:
 $ var1: num  -1.18 1.26 -0.53
 $ var2: Factor w/ 3 levels "1","2","3": 1 2 3
 $ var3: num  1.38 -0.401 -0.924
 $ var4: Factor w/ 3 levels "1","2","3": 3 1 2
 $ var5: num  1.688 0.547 0.727
 $ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1

You can then ask which columns are factors:

> sapply(df, is.factor)
 var1  var2  var3  var4  var5  var6 
FALSE  TRUE FALSE  TRUE FALSE  TRUE

And if you wanted a number for how many are factors something like this would get you there:

> length(which(sapply(df, is.factor)==TRUE))
[1] 3

You have something similar: length(which(vec==as.factor)). The problem is that this asks which elements of the vec object are equal to the function as.factor, which doesn't make sense, so R gives you the error: Error in vec == as.factor : comparison (1) is possible only for atomic and list types.

as.factor is for setting things as factor (as I have shown above), but is.factor is for asking if something is a factor, which will return a logical (TRUE vs FALSE) - also shown above.
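The factor count above can also be written more compactly, because summing a logical vector counts its TRUE values. This is a sketch rather than a replacement for the answer's code: it rebuilds a similar df with stringsAsFactors = TRUE set explicitly (an assumption, so that it behaves the same on R 4.0.0 and later), and the names is_fac, n_factors and factor_names are illustrative.

```r
# Similar data frame to the one above; stringsAsFactors = TRUE makes the
# character column a factor regardless of the R version's default
df <- data.frame(
  var1 = rnorm(3),
  var2 = factor(1:3),
  var3 = rnorm(3),
  var4 = factor(c(3, 1, 2)),
  var5 = rnorm(3),
  var6 = c("Green", "Red", "Blue"),
  stringsAsFactors = TRUE
)

# vapply() guarantees one logical value per column
is_fac <- vapply(df, is.factor, logical(1))

n_factors <- sum(is_fac)           # number of factor columns
factor_names <- names(df)[is_fac]  # which columns they are

n_factors
factor_names
```

vapply() is used instead of sapply() only because it fails loudly if a result is not a single logical value.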

How to count levels of a factor in a data.frame, grouped by another value of that data.frame [GNU R]

There are a large number of ways and this question is undoubtedly a duplicate. What have you tried? You can use dcast in the reshape2 package.

require(reshape2)
dcast( df , Country ~ Year , length )

#  Country 1999 2000 2001
#1     GER    0    2    0
#2      UK    1    0    0
#3     USA    2    2    1
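A base R cross-tabulation gives the same kind of Country by Year count table without reshape2. The original df is not shown, so the data frame below is a hypothetical reconstruction chosen to be consistent with the counts printed above; counts and counts_df are illustrative names.

```r
# Hypothetical data, reconstructed to match the counts shown above
df <- data.frame(
  Country = c("GER", "GER", "UK", "USA", "USA", "USA", "USA", "USA"),
  Year    = c(2000, 2000, 1999, 1999, 1999, 2000, 2000, 2001)
)

# Two-way contingency table: rows are countries, columns are years
counts <- table(df$Country, df$Year)

# Optional: turn the table into a wide data frame, one column per year
counts_df <- as.data.frame.matrix(counts)

counts_df
```

Combinations that never occur appear as 0 rather than being dropped.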

R Count number of times a level occurs in n rows

Here are a couple of options you might find useful:

a) count all entries per 5 rows and return a list:

head(lapply(split(df$test, rep(1:200, each = 5)), table), 2)
# $`1`      # <- result for rows 1:5
# 
# A B C 
# 1 0 4 
# 
# $`2`      # <- result for rows 6:10
# 
# A B C 
# 3 0 2

b) count all entries per 5 rows and return a matrix:

head(t(sapply(split(df$test, rep(1:200, each = 5)), table)), 2)
#   A B C
# 1 1 0 4
# 2 3 0 2

c) count number of As per 5 rows and return a list:

head(lapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
# $`1`
# [1] 1
# 
# $`2`
# [1] 3

d) count number of As per 5 rows and return a vector:

head(sapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
#1 2 
#1 3

Each of the full results (without head()) will be 200 entries long or have 200 rows, since rep(1:200, each = 5) assumes a data frame with 1000 rows.
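The options above hard-code 200 blocks. A sketch of a variant that derives the block index from the number of rows is shown below. The df here is hypothetical random data standing in for the asker's data frame, with test stored as a factor so that every block reports all three levels; block_size and block are illustrative names.

```r
# Hypothetical stand-in data: 1000 rows, one factor column
set.seed(42)
df <- data.frame(
  test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE),
                levels = c("A", "B", "C"))
)

block_size <- 5

# Rows 1-5 get block 1, rows 6-10 get block 2, and so on
block <- ceiling(seq_len(nrow(df)) / block_size)

# One row per block, one column per level
counts <- table(block, df$test)

head(counts, 2)
```

Because the block index is computed from nrow(df), the same code works when the row count is not exactly 1000; a final partial block simply contains fewer rows.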

Count occurrences of factor in R, with zero counts reported

You get this for free if you define your events variable correctly as a factor with the desired three levels:

R> events <- data.frame(type = factor(c('A', 'A', 'B'), c('A','B','C')), 
+                       quantity = c(1, 2, 1))
R> events
  type quantity
1    A        1
2    A        2
3    B        1
R> table(events$type)

A B C 
2 1 0 
R>

Simply calling table() on the factor already does the right thing, and ddply() from the plyr package can too if you tell it not to drop:

R> library(plyr)
R> ddply(events, .(type), summarise, quantity = sum(quantity), .drop=FALSE)
  type quantity
1    A        3
2    B        1
3    C        0
R>
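If plyr is not installed, base R's xtabs() gives the same per-level sums and also reports unused levels as zero. This is a sketch assuming the events data frame defined above; totals and totals_df are illustrative names.

```r
# Same events data as above, with "C" declared as a level but never used
events <- data.frame(
  type = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
  quantity = c(1, 2, 1)
)

# Sum quantity within each level of type; empty levels yield 0
totals <- xtabs(quantity ~ type, data = events)

# Optional: convert to a data frame with a named value column
totals_df <- as.data.frame(totals, responseName = "quantity")

totals_df
```

As with table(), this only works because type is a factor whose levels include the empty category.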
