Adding counts of a factor to a dataframe
Using jmsigner's data, you could do:
# seq_along() keeps the counts integer even if school is a factor or character column
dt$count <- ave(seq_along(dt$school), dt$school, FUN = length)
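For illustration, here is a minimal self-contained sketch on made-up data. The `dt` below is hypothetical and only stands in for jmsigner's data, which is not shown here. It passes an integer index as the first argument of ave() so the counts come back as integers, and cross-checks them against table():

```r
# Hypothetical data standing in for the original example
dt <- data.frame(school = c("A", "B", "A", "C", "A", "B"),
                 stringsAsFactors = TRUE)

# Count the rows per school and attach that count to every row
dt$count <- ave(seq_along(dt$school), dt$school, FUN = length)

# Cross-check: look each school up in its frequency table
check <- as.vector(table(dt$school)[as.character(dt$school)])

print(dt)
```

The first argument of ave() only supplies the shape and type of the result, so any numeric vector of the right length would do.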
Matching and Adding Factor Counts in R Data Frames
Load the data with stringsAsFactors=FALSE:
df <- read.csv(header=TRUE, text="id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors=FALSE)
# check to see if columns are factors
sapply(df, class)
# id obs country
# "character" "integer" "character"
Remove all rows where country is "null":
df <- df[df$country != "null", ]
Then you can use the plyr package with summarise to get the desired result as follows:
library(plyr)
ddply(df, .(id), summarise, tot_obs=sum(obs), tot_country=length(unique(country)))
# id tot_obs tot_country
# 1 A 12 2
# 2 B 3 1
# 3 C 5 2
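If plyr is not available, the same summary can be sketched in base R with tapply(). This is an alternative I am swapping in for comparison, not part of the original answer; the names `tot_obs`, `tot_country` and `res` are only illustrative:

```r
# Same data as above, read as character rather than factor
df <- read.csv(header = TRUE, text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors = FALSE)

# Drop the rows whose country is the literal string "null"
df <- df[df$country != "null", ]

# Per-id sum of obs and per-id number of distinct countries
tot_obs     <- tapply(df$obs, df$id, sum)
tot_country <- tapply(df$country, df$id, function(x) length(unique(x)))

res <- data.frame(id          = names(tot_obs),
                  tot_obs     = as.vector(tot_obs),
                  tot_country = as.vector(tot_country))
print(res)
```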
How to count factors frequency and organize in a new dataframe in R
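library(dplyr)
# convert the factor via character so the actual years are used, not the level codes
test1$year <- as.numeric(as.character(test1$year))
test1 %>%
  filter(between(year, 2016, 2018)) %>%
  group_by(id) %>%
  summarize(times = n(),
            year = toString(unique(year)))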
id times year
<fct> <int> <chr>
1 FC01 3 2018 2017 2016
2 FC03 1 2018
Notes:
- Getting the times column is easy: we just use the utility function dplyr::n().
- For the pasted list of (unique) string names of years, the approach is the same as in this answer. toString(...) is cleaner code than paste0(as.character(...), collapse=' ').
- Note we must use unique(year), as you might have multiple entries for the same year.
- In order to be able to filter(between(year, 2016, 2018)), we must first fix up year to be numeric, not a factor. Calling as.numeric() directly on a factor returns the internal level codes (1..4 for levels 2015..2018) rather than the years, which is why the code goes through as.character() first.
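The factor-to-numeric pitfall from the last note can be seen in a short sketch. The `year` vector here is made up for illustration and is not the asker's `test1` data:

```r
# Hypothetical factor of years, as it might arrive from an older read.csv()
year <- factor(c("2018", "2017", "2016", "2018"))

# Direct conversion returns the level codes, not the years
codes <- as.numeric(year)

# Going through character recovers the real values
years <- as.numeric(as.character(year))

# toString() collapses the unique years into one comma-separated string
pasted <- toString(unique(years))

print(codes)
print(years)
print(pasted)
```

Note that toString() separates the values with a comma and a space, whereas paste0(..., collapse=' ') would use a plain space.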
Create columns from factors and count
You only need to make a slight modification to your code. You should use .(Name) instead of c("Name"):
ddply(df1, .(Name), summarise,
Score_1 = sum(Score == 1),
Score_2 = sum(Score == 2),
Score_3 = sum(Score == 3))
gives:
Name Score_1 Score_2 Score_3
1 Ben 1 1 0
2 John 1 1 1
Other possibilities include:
1. table(df1), as @alexis_laz mentioned in the comments. This gives:
> table(df1)
Score
Name 1 2 3
Ben 1 1 0
John 1 1 1
2. The dcast function of the reshape2 package (or of data.table, which has the same dcast function):
library(reshape2) # or library(data.table)
dcast(df1, Name ~ paste0("Score_", Score), fun.aggregate = length)
gives:
Name Score_1 Score_2 Score_3
1 Ben 1 1 0
2 John 1 1 1
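To try the table() route end to end, here is a small sketch. The `df1` below is a hypothetical reconstruction chosen to reproduce the counts shown above; the asker's real data is not given. Turning the table into a data frame with prefixed column names is my own addition:

```r
# Hypothetical data consistent with the counts shown above
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# Cross-tabulate Name against Score
tab <- table(df1)

# Convert to a regular data frame with one column per score
counts <- as.data.frame.matrix(tab)
names(counts) <- paste0("Score_", names(counts))

print(counts)
```

Here the names end up as row names of `counts` rather than in a Name column.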
Counting the number of factor variables in a data frame
A few problems here:
Your subscript is out of bounds problem arises because df[1:5, ] selects rows 1:5, whereas columns would be df[ ,1:5]. It appears that you only have 3 rows, not 5.
The second error, no applicable method for 'as.quoted' applied to an object of class "function", refers to as.factor, which is a function. It is saying that a function doesn't belong within the function count. You can check exactly what count wants by running ?count in the console.
A third problem that I see is that R will not automatically treat integers as factors; you have to convert them explicitly with as.factor(). If you read in words, they are set as factors automatically only in R versions before 4.0.0, or when you pass stringsAsFactors = TRUE.
Here is a reproducible example:
> df<-data.frame("var1"=rnorm(3),"var2"=c(1:3),"var3"=rnorm(3),"var4"=c(3,1,2),"var5"=rnorm(3))
> str(df)
'data.frame': 3 obs. of 5 variables:
$ var1: num 0.716 1.43 -0.726
$ var2: int 1 2 3
$ var3: num 0.238 -0.658 0.492
$ var4: num 3 1 2
$ var5: num 1.71 1.54 1.05
Here I used the structure function str() to check what type of data I have. Note that var2 is stored as an integer because I generated it as c(1:3), whereas c(3,1,2) is stored as numeric in var4.
Here, I will tell R I want two of the columns to be factors, and I will make another column of words, which also becomes a factor (automatically before R 4.0.0; stringsAsFactors=TRUE makes this explicit for newer versions).
> df<-data.frame("var1"=rnorm(3),"var2"=as.factor(c(1:3)),"var3"=rnorm(3),"var4"=as.factor(c(3,1,2))
+ ,"var5"=rnorm(3), "var6"=c("Green","Red","Blue"), stringsAsFactors=TRUE)
> str(df)
'data.frame': 3 obs. of 6 variables:
$ var1: num -1.18 1.26 -0.53
$ var2: Factor w/ 3 levels "1","2","3": 1 2 3
$ var3: num 1.38 -0.401 -0.924
$ var4: Factor w/ 3 levels "1","2","3": 3 1 2
$ var5: num 1.688 0.547 0.727
$ var6: Factor w/ 3 levels "Blue","Green",..: 2 3 1
You can then ask which columns are factors:
> sapply(df, is.factor)
var1 var2 var3 var4 var5 var6
FALSE TRUE FALSE TRUE FALSE TRUE
And if you wanted a number for how many are factors something like this would get you there:
> length(which(sapply(df, is.factor)==TRUE))
[1] 3
You have something similar: length(which(vec==as.factor)), but one problem with this is that you are asking which things in the vec object are the same as the function as.factor, which doesn't make sense. So it is giving you the error:
Error in vec == as.factor : comparison (1) is possible only for atomic and list types
as.factor is for setting things as factors (as I have shown above), but is.factor is for asking if something is a factor, which will return a logical (TRUE vs FALSE), also shown above.
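As a compact variant of the count above: because TRUE counts as 1 when summed, sum() over the logical vector gives the number of factor columns directly. A sketch on fixed, made-up data (not the random data used above):

```r
# Hypothetical data frame with two factor columns
df <- data.frame(var1 = c(0.5, 1.2, -0.3),
                 var2 = factor(1:3),
                 var3 = c("Green", "Red", "Blue"),
                 stringsAsFactors = TRUE)

# One logical per column: is it a factor?
is_fac <- vapply(df, is.factor, logical(1))

# Number of factor columns, and their names
n_factors    <- sum(is_fac)
factor_names <- names(df)[is_fac]

print(n_factors)
print(factor_names)
```

vapply() is used here instead of sapply() only because it guarantees a logical vector back.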
Count occurrences of factors across multiple columns in grouped dataframe
You can stack col1 & col2 together, count the number of each combination, and then transform the table to a wide form.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(col1:col2) %>%
count(grp, name, value) %>%
pivot_wider(id_cols = grp, names_from = c(name, value), names_sort = TRUE,
            values_from = n, values_fill = 0)
# A tibble: 3 x 6
grp col1_A col1_B col2_B col2_C col2_D
<chr> <int> <int> <int> <int> <int>
1 a 1 2 2 0 1
2 b 2 0 0 2 0
3 c 1 2 0 2 1
A base solution (thanks to @GKi for refining the code):
table(cbind(df["grp"], col=do.call(paste0, stack(df[-1])[2:1])))
col
grp col1A col1B col2B col2C col2D
a 1 2 2 0 1
b 2 0 0 2 0
c 1 2 0 2 1
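The stack-then-count idea can also be spelled out step by step in base R. This is a sketch on a small hypothetical `df`, not the asker's data, and `long` and `tab` are illustrative names:

```r
# Hypothetical data: one grouping column and two value columns
df <- data.frame(grp  = c("a", "a", "b"),
                 col1 = c("A", "B", "A"),
                 col2 = c("B", "B", "C"))

# Stack col1 and col2, labelling each value with its source column
long <- data.frame(grp = rep(df$grp, times = 2),
                   col = c(paste0("col1_", df$col1),
                           paste0("col2_", df$col2)))

# Count every group / column-value combination
tab <- table(long)

print(tab)
```

Combinations that never occur show up as 0, which matches values_fill = 0 in the tidyr version.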
Count the number of times a number (factor) occurs within each group, for each column in the dataframe
Conceptually this is the same as Ronak Shah's solution, but a little simpler.
library(tidyverse)
dat %>%
pivot_longer(-Bin) %>%
pivot_wider(names_from = value, values_fn = length, names_sort=TRUE)
# A tibble: 8 x 7
Bin name `1` `2` `3` `4` `5`
<int> <chr> <int> <int> <int> <int> <int>
1 1 Number 10 7 3 10 20
2 1 Number2 10 6 6 8 20
3 2 Number 2 7 6 8 27
4 2 Number2 2 5 8 13 22
5 3 Number 3 8 13 12 14
6 3 Number2 9 5 6 7 23
7 4 Number 9 6 7 3 25
8 4 Number2 2 7 8 19 14