How to count how many values per level in a given factor?
Or using the dplyr library:
library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters,100,rep=TRUE))
dat %>%
group_by(ID) %>%
summarise(no_rows = length(ID))
Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.
The result is:
Source: local data frame [26 x 2]
ID no_rows
1 a 2
2 b 3
3 c 3
4 d 3
5 e 2
6 f 4
7 g 6
8 h 1
9 i 6
10 j 5
11 k 6
12 l 4
13 m 7
14 n 2
15 o 2
16 p 2
17 q 5
18 r 4
19 s 5
20 t 3
21 u 8
22 v 4
23 w 5
24 x 4
25 y 3
26 z 1
See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
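The piping described above can be unpacked into ordinary nested calls. This is a minimal sketch, assuming dplyr is installed; count() is a standard dplyr helper that the answer itself does not use, so treat it as an alternative spelling rather than the original approach.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# Piped form, as in the answer above
piped <- dat %>%
  group_by(ID) %>%
  summarise(no_rows = length(ID))

# The same computation as nested calls: the pipe passes its
# left-hand side as the first argument of the right-hand call
nested <- summarise(group_by(dat, ID), no_rows = length(ID))

# count() is a shortcut for group_by() + summarise(n());
# its count column is named n by default
counted <- count(dat, ID)
```

All three objects hold one row per ID value with the same counts; only the column name differs for count().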
Count and list all factor levels of all factors
I think the most efficient way to do this, in terms of code length and storing the final output in a tidy format, is:
library(tidyverse)
# example data
data <- data.frame(D = rep(c("110", "111"), 3),
I = c(rep("2012", 3), "2014", "2013", "2013"),
S = rep(c("1000", "2000"), 3))
data %>%
gather(name,value) %>% # reshape dataset
count(name, value) # count combinations
# # A tibble: 7 x 3
# name value n
# <chr> <chr> <int>
# 1 D 110 3
# 2 D 111 3
# 3 I 2012 3
# 4 I 2013 2
# 5 I 2014 1
# 6 S 1000 3
# 7 S 2000 3
The 1st column is the name of your factor variable.
The 2nd column has the unique values of each variable.
The 3rd column is the count.
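In current tidyr releases gather() is marked as superseded in favour of pivot_longer(). The following is a sketch of the same reshape-then-count idea using pivot_longer(), assuming tidyr 1.0.0 or later and dplyr are installed; the object name level_counts is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as above, kept as character columns
data <- data.frame(D = rep(c("110", "111"), 3),
                   I = c(rep("2012", 3), "2014", "2013", "2013"),
                   S = rep(c("1000", "2000"), 3),
                   stringsAsFactors = FALSE)

level_counts <- data %>%
  # stack every column into name/value pairs
  pivot_longer(everything(), names_to = "name", values_to = "value") %>%
  # count each (variable, level) combination
  count(name, value)
```

The result has the same three columns (name, value, n) as the gather() version.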
Count number of unique levels of a variable
The following should do the job:
choices <- length(unique(iris$Species))
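Note that length(unique()) counts the values that actually occur, while nlevels() counts the levels declared on the factor. The two agree for iris$Species but can differ when a level is unused. A small sketch in base R, where the factor f is a made-up example:

```r
# Built-in iris data: Species is a factor with three levels, all in use
n_levels_iris <- nlevels(iris$Species)
n_unique_iris <- length(unique(iris$Species))

# Hypothetical factor with a declared but unused level "c"
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
n_declared   <- nlevels(f)               # declared levels, including "c"
n_observed   <- length(unique(f))        # only values that occur
n_after_drop <- nlevels(droplevels(f))   # levels left after dropping unused ones
```

Which one is appropriate depends on whether unused levels should be counted.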
Counting the number of factor variables in a data frame
A few problems here:
Your subscript is out of bounds problem is because df[1:5, ] is rows 1:5, whereas columns would be df[ ,1:5]. It appears that you only have 3 rows, not 5.
The second error, no applicable method for 'as.quoted' applied to an object of class "function", is referring to as.factor, which is a function. It is saying that a function doesn't belong within the function count. You can check exactly what count wants by running ?count in the console.
A third problem that I see is that R will not automatically treat integers as factors; you have to convert numbers explicitly. Words, by contrast, are often read in as factors automatically (this was the default behaviour before R 4.0.0).
Here is a reproducible example:
> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)
'data.frame': 3 obs. of 5 variables:
$ var1: num 0.716 1.43 -0.726
$ var2: int 1 2 3
$ var3: num 0.238 -0.658 0.492
$ var4: num 3 1 2
$ var5: num 1.71 1.54 1.05
Here I used the str() (structure) function to check what type of data I have. Note that var2 is read in as an integer when I generated it as c(1:3), whereas specifying c(3,1,2) was read in as numeric in var4.
Here, I will tell R I want two of the columns to be factors, and I will make another column of words, which becomes a factor because stringsAsFactors=TRUE is set (it was the default before R 4.0.0).
> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+ ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"), stringsAsFactors=TRUE)
> str(df)
'data.frame': 3 obs. of 6 variables:
$ var1: num -1.18 1.26 -0.53
$ var2: Factor w/ 3 levels "1","2","3": 1 2 3
$ var3: num 1.38 -0.401 -0.924
$ var4: Factor w/ 3 levels "1","2","3": 3 1 2
$ var5: num 1.688 0.547 0.727
$ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1
You can then ask which are factors:
> sapply(df, is.factor)
var1 var2 var3 var4 var5 var6
FALSE TRUE FALSE TRUE FALSE TRUE
And if you wanted a number for how many are factors something like this would get you there:
> length(which(sapply(df, is.factor)==TRUE))
[1] 3
You have something similar: length(which(vec==as.factor)), but one problem with this is you are asking which things in the vec object are the same as a function as.factor, which doesn't make sense. So it is giving you the error Error in vec == as.factor : comparison (1) is possible only for atomic and list types.
as.factor is for setting things as factor (as I have shown above), but is.factor is for asking if something is a factor, which will return a logical (TRUE vs FALSE) - also shown above.
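Because is.factor returns a logical, the count of factor columns can also be taken directly with sum(). The following is a more compact sketch of the same idea in base R; the names is_fac, n_factors and factor_cols are only illustrative, and stringsAsFactors is set explicitly so the example behaves the same on R 4.0.0 and later.

```r
set.seed(42)
df <- data.frame(var1 = rnorm(3),
                 var2 = as.factor(1:3),
                 var3 = rnorm(3),
                 var4 = as.factor(c(3, 1, 2)),
                 var5 = rnorm(3),
                 var6 = c("Green", "Red", "Blue"),
                 stringsAsFactors = TRUE)

# vapply() guarantees exactly one logical per column
is_fac <- vapply(df, is.factor, logical(1))

# TRUE counts as 1, so sum() gives the number of factor columns
n_factors <- sum(is_fac)

# Names of the factor columns
factor_cols <- names(df)[is_fac]
```

This avoids the which(... == TRUE) step, since the logical vector can be summed or used for indexing as it is.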
How to count levels of a factor in a data.frame, grouped by another value of that data.frame [GNU R]
There are a large number of ways and this question is undoubtedly a duplicate. What have you tried? You can use dcast in the reshape2 package.
require(reshape2)
dcast( df , Country ~ Year , length )
# Country 1999 2000 2001
#1 GER 0 2 0
#2 UK 1 0 0
#3 USA 2 2 1
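As one of the other ways, base R can produce the same cross-tabulation without extra packages. This is a sketch only: the question's data frame is not shown here, so the df below is hypothetical data constructed to match the counts printed above.

```r
# Hypothetical data chosen to match the counts printed above
df <- data.frame(
  Country = c("GER", "GER", "UK", "USA", "USA", "USA", "USA", "USA"),
  Year    = c(2000, 2000, 1999, 1999, 1999, 2000, 2000, 2001)
)

# Cross-tabulate rows by Country and Year; absent combinations are 0
tab <- table(df$Country, df$Year)

# xtabs() builds the same table through a formula interface
tab2 <- xtabs(~ Country + Year, data = df)
```

Individual cells can be read by name, for example tab["USA", "2000"].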
R Count number of times a level occurs in n rows
Here are a couple of options you might find useful:
a) count all entries per 5 rows and return a list:
head(lapply(split(df$test, rep(1:200, each = 5)), table), 2)
# $`1` # <- result for rows 1:5
#
# A B C
# 1 0 4
#
# $`2` # <- result for rows 6:10
#
# A B C
# 3 0 2
b) count all entries per 5 rows and return a matrix:
head(t(sapply(split(df$test, rep(1:200, each = 5)), table)), 2)
# A B C
# 1 1 0 4
# 2 3 0 2
c) count number of As per 5 rows and return a list:
head(lapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
# $`1`
# [1] 1
#
# $`2`
# [1] 3
d) count number of As per 5 rows and return a vector:
head(sapply(split(df$test == "A", rep(1:200, each = 5)), sum), 2)
#1 2
#1 3
Each of the results will be 200 entries long / have 200 rows.
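The options above hard-code 200 blocks of 5 rows. The sketch below derives the block index from the number of rows instead; the data frame is hypothetical (1000 random rows with a factor column test), and block_size and block are illustrative names.

```r
set.seed(1)
# Hypothetical data: 1000 rows with a factor column `test`
df <- data.frame(test = factor(sample(c("A", "B", "C"), 1000, replace = TRUE),
                               levels = c("A", "B", "C")))

block_size <- 5
# Block index per row: 1,1,1,1,1,2,2,... (a final block may be shorter)
block <- ceiling(seq_len(nrow(df)) / block_size)

# One row per block, one column per level (zeros are kept because
# `test` is a factor)
counts <- t(sapply(split(df$test, block), table))

# Number of "A"s per block as a named vector
a_per_block <- sapply(split(df$test == "A", block), sum)
```

Keeping test as a factor matters here: table() then reports every level in every block, so the per-block results all have the same length and sapply() can bind them into a matrix.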
Count occurrences of factor in R, with zero counts reported
You get this for free if you define your events variable correctly as a factor with the desired three levels:
R> events <- data.frame(type = factor(c('A', 'A', 'B'), c('A','B','C')),
+ quantity = c(1, 2, 1))
R> events
type quantity
1 A 1
2 A 2
3 B 1
R> table(events$type)
A B C
2 1 0
R>
Simply calling table() on the factor already does the right thing, and ddply() can too if you tell it not to drop:
R> ddply(events, .(type), summarise, quantity = sum(quantity), .drop=FALSE)
type quantity
1 A 3
2 B 1
3 C 0
R>
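The same "do not drop empty levels" idea is available in dplyr, which may be convenient if plyr is not already in use. This is a sketch assuming a recent dplyr release in which group_by() and count() accept a .drop argument; the names totals and n_rows are illustrative.

```r
library(dplyr)

events <- data.frame(type = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
                     quantity = c(1, 2, 1))

# Keep empty groups so the unused level "C" still gets a row;
# sum() over an empty group is 0
totals <- events %>%
  group_by(type, .drop = FALSE) %>%
  summarise(quantity = sum(quantity))

# Row counts per level, with zero-count levels included
n_rows <- events %>% count(type, .drop = FALSE)
```

As with table(), this only works when type is a factor that declares all three levels.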