Adding Counts of a Factor to a Dataframe

Using jmsigner's data, you could do:

# seq_along() keeps the result an integer count even if school is a factor or character
dt$count <- ave(seq_along(dt$school), dt$school, FUN = length)
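
To make the per-row count concrete, here is a minimal sketch. The data frame below is made up for illustration (it is not jmsigner's data), and it passes seq_along() as the first argument to ave() so that the counts come back as integers whatever the type of school.

```r
# Hypothetical example data; the original dt is not shown on this page
dt <- data.frame(school = c("A", "B", "A", "C", "A", "B"),
                 stringsAsFactors = FALSE)

# ave() applies FUN within each group of dt$school and returns
# a vector aligned with the original rows
dt$count <- ave(seq_along(dt$school), dt$school, FUN = length)

print(dt)
```

Each row ends up carrying the size of its own group, so no merge step is needed afterwards.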

Matching and Adding Factor Counts in R Data Frames

Load data with stringsAsFactors=FALSE:

df <- read.csv(header=TRUE, text="id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors=FALSE)

# check to see if columns are factors
sapply(df, class)
# id obs country
# "character" "integer" "character"

Remove all rows where country is "null":

df <- df[df$country != "null", ]

Then you can use the plyr package with summarise to get the desired result as follows:

library(plyr)
ddply(df, .(id), summarise, tot_obs=sum(obs), tot_country=length(unique(country)))
#   id tot_obs tot_country
# 1  A      12           2
# 2  B       3           1
# 3  C       5           2
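
If plyr is not available, the same summary can probably be obtained in base R. The following is only a sketch of one possible equivalent using tapply(), not the approach from the answer above; the names res, tot_obs and tot_country are chosen here to mirror the plyr output.

```r
# Same example data as above
df <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null")

# Drop rows whose country is the literal string "null"
df <- df[df$country != "null", ]

# Sum of obs and number of distinct countries per id
tot_obs <- tapply(df$obs, df$id, sum)
tot_country <- tapply(df$country, df$id, function(x) length(unique(x)))

res <- data.frame(id = names(tot_obs),
                  tot_obs = as.vector(tot_obs),
                  tot_country = as.vector(tot_country),
                  stringsAsFactors = FALSE)
print(res)
```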

How to count factors frequency and organize in a new dataframe in R

library(dplyr)

test1$year <- as.numeric(as.character(test1$year))

# The pipe must end each line; a line starting with %>% is a syntax error
test1 %>%
  filter(between(year, 2016, 2018)) %>%
  group_by(id) %>%
  summarize(times = n(),
            year = toString(unique(year)))

  id    times year
  <fct> <int> <chr>
1 FC01      3 2018, 2017, 2016
2 FC03      1 2018

Notes:

  • Getting the times column is easy, we just use the utility function dplyr::n().
  • For the pasted list of (unique) year strings, the approach is the same as in the referenced answer. toString(...) is cleaner code than paste0(as.character(...), collapse=' '); note that toString separates the items with ", ".
  • We must use unique(year), as you might have multiple entries for the same year.
  • In order to be able to filter(between(year, 2016, 2018)), we must first convert year to numeric, since it is a factor. Calling as.numeric() directly on a factor returns the internal level codes (1..4 for levels 2015..2018) rather than the years, which is why the conversion goes through as.character() first.
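
The last note can be illustrated with a small sketch. The factor below is hypothetical and only stands in for the year column of test1.

```r
# Hypothetical factor of years, for illustration only
year <- factor(c("2018", "2017", "2016", "2018"))

# as.numeric() directly on a factor returns the internal level codes
codes <- as.numeric(year)

# Converting via character first recovers the actual year values
values <- as.numeric(as.character(year))

print(codes)
print(values)
```

Only the second form gives values that between(year, 2016, 2018) can compare meaningfully.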

Create columns from factors and count

You only need to make some slight modification to your code. You should use .(Name) instead of c("Name"):

ddply(df1, .(Name), summarise,
Score_1 = sum(Score == 1),
Score_2 = sum(Score == 2),
Score_3 = sum(Score == 3))

gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

Other possibilities include:

1. table(df1), as @alexis_laz mentioned in the comments. This gives:

> table(df1)
      Score
Name   1 2 3
  Ben  1 1 0
  John 1 1 1

2. The dcast function of the reshape2 package (or data.table which has the same dcast function):

library(reshape2) # or library(data.table)
dcast(df1, Name ~ paste0("Score_", Score), fun.aggregate = length)

gives:

  Name Score_1 Score_2 Score_3
1  Ben       1       1       0
2 John       1       1       1

Counting the number of factor variables in a data frame

A few problems here:

Your "subscript out of bounds" problem arises because df[1:5, ] selects rows 1:5, whereas columns would be df[ ,1:5]. It appears that you only have 3 rows, not 5.

The second error,

no applicable method for 'as.quoted' applied to an object of class "function"

refers to as.factor, which is a function. It is saying that a function is not a valid argument to count. You can check exactly what count expects by running ?count in the console.

A third problem is that R will not automatically treat integers as factors; you have to convert them explicitly, for example with as.factor. Strings, by contrast, were converted to factors automatically by data.frame() and read.csv() in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE. Since R 4.0.0 the default is FALSE.

Here is a reproducible example:

> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)

'data.frame': 3 obs. of 5 variables:
$ var1: num 0.716 1.43 -0.726
$ var2: int 1 2 3
$ var3: num 0.238 -0.658 0.492
$ var4: num 3 1 2
$ var5: num 1.71 1.54 1.05

Here I used the structure function str() to check what type of data I have. Note that var2 is stored as an integer because I generated it as c(1:3), whereas var4, specified as c(3,1,2), is stored as numeric.

Here, I will tell R I want two of the columns to be factors, and I will add another column of words. Passing stringsAsFactors=TRUE makes the words factors as well (this was the default before R 4.0.0).

> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+ ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"), stringsAsFactors=TRUE)
> str(df)
'data.frame': 3 obs. of 6 variables:
$ var1: num -1.18 1.26 -0.53
$ var2: Factor w/ 3 levels "1","2","3": 1 2 3
$ var3: num 1.38 -0.401 -0.924
$ var4: Factor w/ 3 levels "1","2","3": 3 1 2
$ var5: num 1.688 0.547 0.727
$ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1

You can then ask which are factors:

> sapply(df, is.factor)
var1 var2 var3 var4 var5 var6
FALSE TRUE FALSE TRUE FALSE TRUE

And if you wanted a number for how many are factors something like this would get you there:

> length(which(sapply(df, is.factor)==TRUE))
[1] 3

You have something similar, length(which(vec==as.factor)), but the problem is that this asks which elements of vec are equal to the function as.factor, which doesn't make sense. That is why you get the error:

Error in vec == as.factor :
  comparison (1) is possible only for atomic and list types

as.factor is for setting things as factor (as I have shown above), but is.factor is for asking if something is a factor, which will return a logical (TRUE vs FALSE) - also shown above.
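
As a compact variant of the counting step, the logical vector returned by sapply(df, is.factor) can be summed directly, since TRUE counts as 1. The sketch below uses a made-up data frame (df_demo is a hypothetical name) and sets stringsAsFactors = FALSE so that the result does not depend on the R version.

```r
# Hypothetical data frame with two factor columns and one character column
df_demo <- data.frame(
  x = c(1.5, 2.5, 3.5),
  g = factor(c("a", "b", "a")),   # created as a factor
  h = as.factor(c(3, 1, 2)),      # numbers converted with as.factor()
  s = c("Green", "Red", "Blue"),  # stays character
  stringsAsFactors = FALSE
)

# is.factor() asks the question; summing the logicals counts the TRUEs
is_fac <- sapply(df_demo, is.factor)
n_factors <- sum(is_fac)

print(is_fac)
print(n_factors)
```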

Count occurrences of factors across multiple columns in grouped dataframe

You can stack col1 & col2 together, count the number of each combination, and then transform the table to a wide form.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(col1:col2) %>%
  count(grp, name, value) %>%
  # id_cols is passed by name; recent tidyr versions deprecate passing it by position
  pivot_wider(id_cols = grp, names_from = c(name, value), names_sort = TRUE,
              values_from = n, values_fill = 0)

# A tibble: 3 x 6
  grp   col1_A col1_B col2_B col2_C col2_D
  <chr>  <int>  <int>  <int>  <int>  <int>
1 a          1      2      2      0      1
2 b          2      0      0      2      0
3 c          1      2      0      2      1

A base solution (thanks to @GKi for refining the code):

table(cbind(df["grp"], col=do.call(paste0, stack(df[-1])[2:1])))

   col
grp col1A col1B col2B col2C col2D
  a     1     2     2     0     1
  b     2     0     0     2     0
  c     1     2     0     2     1

Count the number of times a number (factor) occurs within each group, for each column in the dataframe

Conceptually this is the same as the solution by Ronak Shah, but a little simpler.

library(tidyverse)

dat %>%
pivot_longer(-Bin) %>%
pivot_wider(names_from = value, values_fn = length, names_sort=TRUE)

# A tibble: 8 x 7
    Bin name      `1`   `2`   `3`   `4`   `5`
  <int> <chr>   <int> <int> <int> <int> <int>
1     1 Number     10     7     3    10    20
2     1 Number2    10     6     6     8    20
3     2 Number      2     7     6     8    27
4     2 Number2     2     5     8    13    22
5     3 Number      3     8    13    12    14
6     3 Number2     9     5     6     7    23
7     4 Number      9     6     7     3    25
8     4 Number2     2     7     8    19    14

