How to Count How Many Values Per Level in a Given Factor

How to count how many values per level in a given factor?

Using the dplyr library:

library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))
dat %>%
  group_by(ID) %>%
  summarise(no_rows = length(ID))

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

The result is:

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
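
For comparison, the same per-level tally can be sketched in base R. This is not part of the answer above; it assumes the same dat and uses table(), and the column names ID and no_rows are chosen only to mirror the dplyr output.

```r
set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# table() tallies how often each distinct value of ID occurs
counts <- table(dat$ID)

# as.data.frame() turns the tally into a two-column data frame
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("ID", "no_rows")

head(counts_df)
```

The counts always add up to the number of rows in dat, which is a quick sanity check for either approach.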

Count and list all factor levels of all factors

I think the most efficient way to do it, in terms of both code length and storing the final output in a tidy format, is this:

library(tidyverse)

# example data
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

data %>%
  gather(name, value) %>% # reshape dataset to long format
  count(name, value)      # count combinations

# # A tibble: 7 x 3
# name value n
# <chr> <chr> <int>
# 1 D 110 3
# 2 D 111 3
# 3 I 2012 3
# 4 I 2013 2
# 5 I 2014 1
# 6 S 1000 3
# 7 S 2000 3

The 1st column holds the name of each factor variable.
The 2nd column holds the unique values of each variable.
The 3rd column is the count.
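
If the tidyverse is not available, a similar long table can be built in base R. The following is only a sketch under that assumption: it reuses the example data from above, and the helper names per_column and long_counts are made up for illustration.

```r
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3))

# One table per column: names are the values, entries are the counts
per_column <- lapply(data, table)

# Stack the per-column tables into one long data frame
long_counts <- do.call(rbind, lapply(names(per_column), function(nm) {
  data.frame(name  = nm,
             value = names(per_column[[nm]]),
             n     = as.integer(per_column[[nm]]),
             stringsAsFactors = FALSE)
}))

long_counts
```

The result has the same three columns (name, value, n) as the tibble shown above.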

Count number of unique levels of a variable

The following should do the job:

choices <- length(unique(iris$Species))
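
One caveat, sketched here with a made-up factor f: length(unique()) counts the values actually present, while nlevels() counts the declared levels, which may include unused ones. For iris$Species both give 3, because all three levels occur in the data.

```r
# Hypothetical factor with one declared but unused level ("c")
f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))

n_present  <- length(unique(f))      # values that actually occur: 2
n_declared <- nlevels(f)             # declared levels, used or not: 3
n_used     <- nlevels(droplevels(f)) # levels left after dropping unused ones: 2
```

Which of the two numbers is wanted depends on whether empty levels should count.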

Counting the number of factor variables in a data frame

A few problems here:

Your "subscript out of bounds" problem arises because df[1:5, ] selects rows 1:5, whereas columns would be df[, 1:5]. It appears that you only have 3 rows, not 5.

The second error, no applicable method for 'as.quoted' applied to an object of class "function", refers to as.factor, which is a function. It is saying that a function does not belong inside the call to count. You can check exactly what count expects by running ?count in the console.

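A third problem that I see is that R will not automatically treat integers as factors; for numbers you have to request this explicitly with as.factor. Words (character strings) were automatically set as factors when read into a data frame in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE; from 4.0.0 on they stay character unless you ask otherwise.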

Here is a reproducible example:

> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)

'data.frame': 3 obs. of 5 variables:
$ var1: num 0.716 1.43 -0.726
$ var2: int 1 2 3
$ var3: num 0.238 -0.658 0.492
$ var4: num 3 1 2
$ var5: num 1.71 1.54 1.05

Here I used the str() (structure) function to check what type of data I have. Note that var2 is stored as an integer because I generated it as c(1:3), whereas c(3,1,2) is stored as numeric in var4.

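Here, I will tell R I want two of the columns to be factors, and I will make another column of words, which will automatically become a factor (in R before 4.0.0; in later versions, pass stringsAsFactors = TRUE to data.frame to get the same result).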

> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+ ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"))
> str(df)
'data.frame': 3 obs. of 6 variables:
$ var1: num -1.18 1.26 -0.53
$ var2: Factor w/ 3 levels "1","2","3": 1 2 3
$ var3: num 1.38 -0.401 -0.924
$ var4: Factor w/ 3 levels "1","2","3": 3 1 2
$ var5: num 1.688 0.547 0.727
$ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1

You can then ask which are factors:

> sapply(df, is.factor)
var1 var2 var3 var4 var5 var6
FALSE TRUE FALSE TRUE FALSE TRUE

And if you want the number of columns that are factors, something like this will get you there:

> length(which(sapply(df, is.factor)==TRUE))
[1] 3

You have something similar: length(which(vec==as.factor)), but the problem is that you are asking which elements of vec are equal to the function as.factor, which doesn't make sense. So it gives you the error:

Error in vec == as.factor :
comparison (1) is possible only for atomic and list types

as.factor is for setting things as factor (as I have shown above), but is.factor is for asking if something is a factor, which will return a logical (TRUE vs FALSE) - also shown above.
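
The counting step can probably be written more compactly, since summing a logical vector counts its TRUE values. A minimal sketch, assuming a small made-up data frame (not the one from the session above) and passing stringsAsFactors = TRUE explicitly so it behaves the same on any R version:

```r
# Illustrative data: one numeric column, one integer-coded factor, one string factor
df <- data.frame(var1 = c(0.5, 1.2, -0.3),
                 var2 = factor(1:3),
                 var3 = c("Green", "Red", "Blue"),
                 stringsAsFactors = TRUE)

# vapply() returns one TRUE/FALSE per column, guaranteed to be logical
is_fac <- vapply(df, is.factor, logical(1))

n_factors    <- sum(is_fac)        # number of factor columns
factor_names <- names(df)[is_fac]  # which columns they are
```

Here vapply() is used instead of sapply() only because it fails loudly if a result is not a single logical value.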

How to count levels of a factor in a data.frame, grouped by another value of that data.frame [GNU R]

There are a large number of ways to do this, and this question is undoubtedly a duplicate. What have you tried? You can use dcast in the reshape2 package.

require(reshape2)
dcast( df , Country ~ Year , length )

# Country 1999 2000 2001
#1 GER 0 2 0
#2 UK 1 0 0
#3 USA 2 2 1
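
One of the other ways is plain table() from base R. The question's data is not shown here, so the df below is a hypothetical stand-in constructed only to reproduce the counts in the output above.

```r
# Hypothetical data, built to match the counts shown in the dcast output
df <- data.frame(
  Country = c("GER", "GER", "UK", "USA", "USA", "USA", "USA", "USA"),
  Year    = c(2000, 2000, 1999, 1999, 1999, 2000, 2000, 2001)
)

# Cross-tabulate: one row per Country, one column per Year
tab <- table(df$Country, df$Year)
tab
```

Unlike dcast, this returns a table object rather than a data frame; as.data.frame(tab) gives a long-format version if one is needed.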

R Count number of times a level occurs in n rows

Here are a couple of options you might find useful:

a) count all entries per 5 rows and return a list:

head(lapply(split(df$test, rep(1:200, each = 5)), table), 2)
# $`1` # <- result for rows 1:5
#
# A B C
# 1 0 4
#
# $`2` # <- result for rows 6:10
#
# A B C
# 3 0 2

b) count all entries per 5 rows and return a matrix:

head(t(sapply(split(df$test, rep(1:200, each = 5)), table)), 2)
# A B C
# 1 1 0 4
# 2 3 0 2

c) count number of As per 5 rows and return a list:

head(lapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
# $`1`
# [1] 1
#
# $`2`
# [1] 3

d) count number of As per 5 rows and return a vector:

head(sapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
# 1 2
# 1 3

Each of the results will be 200 entries long / have 200 rows.
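
The options above hard-code rep(1:200, each = 5), which assumes exactly 1000 rows. A sketch of a block index that adapts to the number of rows follows; the data here is randomly generated for illustration, and the name block is an assumption, not something from the question.

```r
set.seed(42)
# Illustrative data: 1000 rows with a factor column `test`
df <- data.frame(test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE)))

# Rows 1-5 get block 1, rows 6-10 get block 2, and so on
block <- ceiling(seq_len(nrow(df)) / 5)

# Because `test` is a factor, every block's table has all three levels
# (zeros included), so sapply() can simplify the result to a matrix
counts <- t(sapply(split(df$test, block), table))

dim(counts)
```

If test were a character vector instead of a factor, levels missing from a block would be dropped from that block's table and the result would not simplify to a matrix.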

Count occurrences of factor in R, with zero counts reported

You get this for free if you define your events variable correctly as a factor with the desired three levels:

R> events <- data.frame(type = factor(c('A', 'A', 'B'), c('A','B','C')), 
+ quantity = c(1, 2, 1))
R> events
type quantity
1 A 1
2 A 2
3 B 1
R> table(events$type)

A B C
2 1 0
R>

Simply calling table() on the factor already does the right thing, and ddply() (from the plyr package) can too if you tell it not to drop:

R> ddply(events, .(type), summarise, quantity = sum(quantity), .drop=FALSE)
type quantity
1 A 3
2 B 1
3 C 0
R>
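
If plyr is not available, base R's xtabs() is one possible alternative for the summed version. This is a sketch rather than part of the answer above; it reuses the same events data and relies on type being a factor with all three levels declared.

```r
events <- data.frame(type = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
                     quantity = c(1, 2, 1))

# Occurrences per level; the unused level "C" is reported as 0
occurrences <- table(events$type)

# Sum of quantity per level; the unused level "C" also comes out as 0
totals <- xtabs(quantity ~ type, data = events)
totals
```

Both results keep the level C because the levels are stored on the factor itself, not inferred from the values present.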

