Count number of rows in a data frame in R based on group
Here's an example that shows how table(.) (or, more closely matching your desired output, data.frame(table(.))) does what it sounds like you are asking for.
Note also how to share reproducible sample data in a way that others can copy and paste into their session.
Here's the (reproducible) sample data:
mydf <- structure(list(ID = c(110L, 111L, 121L, 131L, 141L),
MONTH.YEAR = c("JAN. 2012", "JAN. 2012",
"FEB. 2012", "FEB. 2012",
"MAR. 2012"),
VALUE = c(1000L, 2000L, 3000L, 4000L, 5000L)),
.Names = c("ID", "MONTH.YEAR", "VALUE"),
class = "data.frame", row.names = c(NA, -5L))
mydf
# ID MONTH.YEAR VALUE
# 1 110 JAN. 2012 1000
# 2 111 JAN. 2012 2000
# 3 121 FEB. 2012 3000
# 4 131 FEB. 2012 4000
# 5 141 MAR. 2012 5000
Here's the calculation of the number of rows per group, in two output display formats:
table(mydf$MONTH.YEAR)
#
# FEB. 2012 JAN. 2012 MAR. 2012
# 2 2 1
data.frame(table(mydf$MONTH.YEAR))
# Var1 Freq
# 1 FEB. 2012 2
# 2 JAN. 2012 2
# 3 MAR. 2012 1
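If the default Var1 and Freq column names are not what you want, they can be set in the same call. This is a small sketch using the same sample data; the column name "count" is just an illustrative choice, not something from the question:

```r
# Same sample data as above, reduced to the grouping column.
mydf <- data.frame(
  MONTH.YEAR = c("JAN. 2012", "JAN. 2012", "FEB. 2012", "FEB. 2012", "MAR. 2012")
)

# Naming the argument to table() names the grouping column;
# responseName renames the Freq column.
counts <- as.data.frame(table(MONTH.YEAR = mydf$MONTH.YEAR),
                        responseName = "count")
counts
```

The grouping column comes back as a factor, so wrap it in as.character() if you need plain strings.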
Count number of rows within each group
A common tidyverse approach is:
library(dplyr)
df1 %>% count(Year, Month)
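The df1 from the question is not shown here, so the following is only a self-contained sketch with made-up data. It assumes df1 has Year and Month columns, as the call above implies:

```r
library(dplyr)

# Hypothetical data; only the column names come from the call above.
df1 <- data.frame(
  Year  = c(2012, 2012, 2012, 2013),
  Month = c("Jan", "Jan", "Feb", "Jan")
)

# count() returns one row per Year/Month combination,
# with the number of rows stored in a column named n.
counts <- df1 %>% count(Year, Month)
counts
```

Unlike table(), count() only returns combinations that actually occur in the data.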
Count number of rows per group and add result to original data frame
Using data.table:
library(data.table)
dt = as.data.table(df)
# or coerce to data.table by reference:
# setDT(df)
dt[ , count := .N, by = .(name, type)]
For an alternative that works with data.table versions before 1.8.2, see the edit history of the original answer.
Using dplyr:
library(dplyr)
df %>%
group_by(name, type) %>%
mutate(count = n())
Or simply:
add_count(df, name, type)
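Using plyr:
# Grouping columns are given as a character vector so that this works
# without attaching plyr (the .() helper is only visible after library(plyr)).
plyr::ddply(df, c("name", "type"), transform, count = length(num))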
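To compare the two dplyr variants side by side, here is a minimal sketch. The data is invented; only the column names name, type and num are taken from the calls above. Note that add_count() names its new column n by default, so name = "count" is passed to match the other approaches:

```r
library(dplyr)

# Hypothetical data using the column names from the examples above.
df <- data.frame(
  name = c("a", "a", "b", "b", "b"),
  type = c("x", "x", "x", "y", "y"),
  num  = 1:5
)

# Explicit version: group, add the group size, then drop the grouping.
via_mutate <- df %>%
  group_by(name, type) %>%
  mutate(count = n()) %>%
  ungroup()

# Shorthand: add_count() does the same in one call.
via_add_count <- add_count(df, name, type, name = "count")
via_add_count
```

Both results keep all five original rows; only the count column is new.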
Add a column that counts the number of rows until the first 1, by group in R
df <- data.frame(Group=c(1,1,1,1,2,2),
var1=c(1,0,0,1,1,1),
var2=c(0,0,1,1,0,0),
var3=c(0,1,0,0,0,1))
This works for any number of variables, as long as the structure is the same as in the example (i.e. Group plus any number of variables that are 0 or 1):
library(dplyr)
library(tidyr)
df %>%
  mutate(rownr = row_number()) %>%
  pivot_longer(-c(Group, rownr)) %>%
  group_by(Group, name) %>%
  # count the rows before the first 1; 0 if the group contains no 1
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
         out = ifelse(max(out) > n(), 0, max(out))) %>%
  pivot_wider(names_from = name, values_from = c(value, out)) %>%
  select(-rownr)
Returns:
Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 1 3 2
2 1 0 0 1 1 3 2
3 1 0 1 0 1 3 2
4 1 1 1 0 1 3 2
5 2 1 0 0 1 0 2
6 2 1 0 1 1 0 2
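To see what the two mutate() steps do, it may help to run them on a single group's values. The sketch below uses var2 of Group 1 from the example data; the variable names before_first_one and first_one are only illustrative. For 0/1 vectors, match() appears to give the same answer in one step:

```r
# var2 for Group 1 in the example data.
value <- c(0, 0, 1, 1)

# TRUE for every row that comes before the first 1.
before_first_one <- value != 1 & cumsum(value) < 1

# Running count of those rows, plus one.
out <- cumsum(before_first_one) + 1

# If no 1 was ever found, max(out) exceeds the group size, so return 0.
first_one <- ifelse(max(out) > length(value), 0, max(out))
first_one

# Shortcut for 0/1 vectors: position of the first 1, or 0 if there is none.
match(1, value, nomatch = 0)
match(1, c(0, 0), nomatch = 0)
```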
Count rows in data table with certain values by group
You can solve it as follows:
cols <- c("number_of_offices", "number_of_apartments")
df[, (cols) := .(sum(Type == "office"), sum(Type == "apartment")), by = Property]
# Property Type number_of_offices number_of_apartments
# 1: 1 apartment 1 1
# 2: 1 office 1 1
# 3: 2 office 2 0
# 4: 2 office 2 0
# 5: 3 apartment 1 2
# 6: 3 apartment 1 2
# 7: 3 office 1 2
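For a version that can be pasted into a fresh session, the input can be rebuilt from the first two columns of the output above. This is a sketch; the original question's data is assumed to look like this:

```r
library(data.table)

# Input reconstructed from the Property and Type columns shown above.
df <- data.table(
  Property = c(1, 1, 2, 2, 3, 3, 3),
  Type = c("apartment", "office", "office", "office",
           "apartment", "apartment", "office")
)

cols <- c("number_of_offices", "number_of_apartments")

# := adds both columns by reference; each sum() is computed per Property
# and repeated on every row of that group.
df[, (cols) := .(sum(Type == "office"), sum(Type == "apartment")), by = Property]
print(df)
```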
Calculate frequency for group-rows within data frame
The problem here is that although df is technically a data frame, it is not well structured. A data frame should have one column per variable and one row per observation. Your data would make more sense if it were transposed first:
library(tibble)
library(dplyr)
df <- rownames_to_column(as.data.frame(t(df)), "sample")
df[2:5] <- lapply(df[2:5], as.numeric)
df
#> sample Species1 Species2 Species3 Species4 Group
#> 1 sample1 2 0 5 0 Gr1
#> 2 sample2 12 13 0 0 Gr1
#> 3 sample3 52 0 0 0 Gr2
#> 4 sample4 221 0 25 0 Gr2
Now we can pivot to make the Species its own column, and it is straightforward to do the calculations you need:
tidyr::pivot_longer(df, 2:5) %>%
group_by(name, Group) %>%
summarise(absolute = sum(value > 0),
Freq = absolute / length(name))
#> # A tibble: 8 x 4
#> # Groups: name [4]
#> name Group absolute Freq
#> <chr> <chr> <int> <dbl>
#> 1 Species1 Gr1 2 1
#> 2 Species1 Gr2 2 1
#> 3 Species2 Gr1 1 0.5
#> 4 Species2 Gr2 0 0
#> 5 Species3 Gr1 1 0.5
#> 6 Species3 Gr2 1 0.5
#> 7 Species4 Gr1 0 0
#> 8 Species4 Gr2 0 0
Count rows with the same pattern in R
dplyr
library(dplyr)
dat %>%
group_by(start_id, end_id, type) %>%
tally() %>%
ungroup()
# # A tibble: 3 x 4
# start_id end_id type n
# <dbl> <dbl> <chr> <int>
# 1 1 2 a 1
# 2 1 3 b 2
# 3 2 5 a 2
base R
aggregate(. ~ start_id + end_id + type, data = dat, FUN = length)
# start_id end_id type id
# 1 1 2 a 1
# 2 2 5 a 2
# 3 1 3 b 2
Data
dat <- structure(list(start_id = c(1, 2, 1, 2, 1), end_id = c(2, 5, 3, 5, 3), type = c("a", "a", "b", "a", "b"), id = 1:5), row.names = c(NA, -5L), class = "data.frame")
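The dplyr pipeline above can also be written with count(), which is shorthand for group_by() followed by tally() and ungroup(). A short sketch on the same data, with dat rebuilt using data.frame() for readability:

```r
library(dplyr)

# Same data as above.
dat <- data.frame(
  start_id = c(1, 2, 1, 2, 1),
  end_id   = c(2, 5, 3, 5, 3),
  type     = c("a", "a", "b", "a", "b"),
  id       = 1:5
)

# One row per distinct start_id/end_id/type pattern, with its row count in n.
res <- count(dat, start_id, end_id, type)
res
```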