How to get proportions and counts of a data frame in R
Try installing plyr and running:
library(plyr)
df <- data.frame(x1=c(1, 1, 0, 0, 1, 0),
label=c("a", "a", "b", "a", "c", "c"))
ddply(df, .(label), summarize, prop = mean(x1), count = length(x1))
# label prop count
# 1 a 0.6666667 3
# 2 b 0.0000000 1
# 3 c 0.5000000 2
which under the hood applies a split/apply/combine method similar to this in base R:
# split on the grouping column 'label' (the data frame has no 'x2' column)
do.call(rbind, lapply(split(df, df$label),
with, list(prop = mean(x1),
count = length(x1))))
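As a rough sketch of the same split/apply/combine idea, the version below uses an explicit anonymous function so that the result is an ordinary data frame rather than a matrix of list elements. The names `pieces` and `res` are only illustrative.

```r
# Same toy data as in the answer above
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 label = c("a", "a", "b", "a", "c", "c"))

# Split by label, summarise each piece, then bind the pieces back together
pieces <- lapply(split(df, df$label), function(d) {
  data.frame(label = d$label[1], prop = mean(d$x1), count = nrow(d))
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```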
Convert data frame of counts to proportions in R
Probably something along these lines:
df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )
If it were a matrix you could have just used prop.table(mat). In this case, however, you need to limit the operation to the numeric columns (by excluding the first one).
Furthermore I think you need to exclude the "total" row:
my.data[-5, -1] <- lapply( my.data[ -5 , -1], function(x){ x/sum(x, na.rm=TRUE)} )
my.data[ -5 , ]
state y1970 y1980 y1990 y2000
1 Alaska 0.02325581 0.03076923 NA 0.02941176
2 Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3 Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4 Ohio 0.29069767 0.30769231 NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849
-------------
Alternate approach:
> my.data[,-1] <-lapply( my.data[ , -1], function(x){ x/x[5] } )
> my.data
state y1970 y1980 y1990 y2000
1 Alaska 0.02325581 0.03076923 NA 0.02941176
2 Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3 Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4 Ohio 0.29069767 0.30769231 NA 0.33613445
5 total 1.00000000 1.00000000 1.00000000 1.00000000
6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849
This shows what prop.table returns when there are missing values, first with no margin (proportions of the grand total) and then by rows and by columns separately, for a very simple matrix:
> prop.table( matrix( c( 1,2,NA, 3),2) )
[,1] [,2]
[1,] NA NA
[2,] NA NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
[,1] [,2]
[1,] NA NA
[2,] 0.4 0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
[,1] [,2]
[1,] 0.3333333 NA
[2,] 0.6666667 NA
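If the goal is column proportions that skip missing values when computing each total, one possible workaround (a sketch, not something prop.table does itself) is to divide by colSums(..., na.rm = TRUE) with sweep. The name `col_prop` is just illustrative.

```r
m <- matrix(c(1, 2, NA, 3), 2)

# Divide each column by its total, ignoring NA when computing the total.
# Cells that were NA stay NA; the remaining cells of that column sum to 1.
col_prop <- sweep(m, 2, colSums(m, na.rm = TRUE), "/")
col_prop
```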
Calculate proportions within groups in a data frame in R
We can group by 'treatment' and 'rep', then calculate the proportion ('prop') by dividing 'cells_alive' by the value of 'cells_alive' where 'Time' is 0:
library(dplyr)
x1 <- x %>%
group_by(treatment, rep) %>%
mutate(prop = cells_alive/cells_alive[Time == 0])
-output
x1
# A tibble: 16 x 5
# Groups: treatment, rep [4]
# treatment rep Time cells_alive prop
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0 500 1
# 2 1 1 30 470 0.94
# 3 1 1 60 100 0.2
# 4 1 1 180 20 0.04
# 5 1 2 0 476 1
# 6 1 2 30 310 0.651
# 7 1 2 60 99 0.208
# 8 1 2 180 2 0.00420
# 9 2 1 0 430 1
#10 2 1 30 420 0.977
#11 2 1 60 300 0.698
#12 2 1 180 100 0.233
#13 2 2 0 489 1
#14 2 2 30 451 0.922
#15 2 2 60 289 0.591
#16 2 2 180 4 0.00818
Or with match
x %>%
group_by(treatment, rep) %>%
mutate(prop = cells_alive/cells_alive[match(0, Time)])
Or, if 'Time' is already ordered within each group:
x %>%
group_by(treatment, rep) %>%
mutate(prop = cells_alive/first(cells_alive))
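For comparison, here is a base R sketch of the same "divide by the first value in each group" idea using ave. It reuses the first eight rows of the output shown above and sorts by 'Time' first, since it assumes the Time 0 row comes first within each group.

```r
x <- data.frame(
  treatment   = 1,
  rep         = rep(c(1, 2), each = 4),
  Time        = rep(c(0, 30, 60, 180), times = 2),
  cells_alive = c(500, 470, 100, 20, 476, 310, 99, 2)
)

# Make sure Time 0 is the first row of every treatment/rep group
x <- x[order(x$treatment, x$rep, x$Time), ]

# ave() applies FUN within each treatment/rep combination
x$prop <- ave(x$cells_alive, x$treatment, x$rep,
              FUN = function(v) v / v[1])
x
```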
How to calculate the proportion in r?
Since you want the ratio of the sum of the Immunised column to the sum of the Eligible column, you could do:
sum(df$Immunised)/sum(df$Eligible)
#[1] 0.770869
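Note that the ratio of sums is generally not the same as the mean of the row-wise ratios. The small sketch below uses made-up numbers (not the data from the question) to show the difference.

```r
# Hypothetical data, chosen only to illustrate the difference
df <- data.frame(Immunised = c(90, 10), Eligible = c(100, 50))

# Overall proportion: every eligible person counts equally
overall <- sum(df$Immunised) / sum(df$Eligible)

# Mean of per-row proportions: every row counts equally, whatever its size
per_row <- mean(df$Immunised / df$Eligible)

c(overall = overall, per_row = per_row)
```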
Calculating the proportion of count variable per group in data.table in R
If you are looking for the ratio, you can do:
library(data.table)
mydata[, prop := count/sum(count) * 100, by = .(startYear, groupSize)]
# groupSize gender startYear count prop
# 1: intermediate F 2014 7546 55.9958445
# 2: small F 2014 3500 31.3395415
# 3: intermediate M 2014 5930 44.0041555
# 4: small M 2014 7668 68.6604585
# 5: huge F 2014 18114 56.7125861
# 6: huge M 2014 13826 43.2874139
# 7: large F 2014 11943 54.2222828
# 8: large M 2014 10083 45.7777172
#....
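The same within-group percentage can be sketched in base R with ave, which may help when checking the data.table result. The rows below are the first four from the output above; the data frame construction itself is an assumption, since the original 'mydata' is not shown.

```r
mydata <- data.frame(
  groupSize = c("intermediate", "small", "intermediate", "small"),
  gender    = c("F", "F", "M", "M"),
  startYear = 2014,
  count     = c(7546, 3500, 5930, 7668)
)

# Group totals, repeated for every row of the group
group_total <- ave(mydata$count, mydata$startYear, mydata$groupSize, FUN = sum)

# Percentage of each row within its startYear/groupSize group
mydata$prop <- mydata$count / group_total * 100
mydata
```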
R percentage of counts from a data.frame
If we are using dplyr/tidyr, the way to get the expected output is
library(dplyr)
library(tidyr)
df %>%
count(group, hight) %>%
group_by(group) %>% # current dplyr returns count() ungrouped, so group explicitly
mutate(percent = n/sum(n)) %>%
ungroup() %>%
select(-n) %>%
spread(hight, percent)
# group short tall
# <fctr> <dbl> <dbl>
#1 A 0.3333333 0.6666667
#2 B 0.6666667 0.3333333
Or as @JoeRoe mentioned in the comments, we could use pivot_wider in the newer versions of tidyr as a replacement for spread
...
pivot_wider(names_from = hight, values_from = percent)
data
df <- data.frame(group, hight)
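A base R cross-check, as a sketch: table plus prop.table with margin = 1 gives the same row-wise proportions. The 'group' and 'hight' vectors are not shown in the answer, so the values below are made up to reproduce the output above.

```r
# Hypothetical vectors that reproduce the proportions shown above
group <- c("A", "A", "A", "B", "B", "B")
hight <- c("short", "tall", "tall", "short", "short", "tall")
df <- data.frame(group, hight)

# Counts per group/hight cell, then proportions within each row (group)
tab <- prop.table(table(df$group, df$hight), margin = 1)
tab
```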
Tidy way to convert numeric columns from counts to proportions
Rephrase to the following:
df %>%
mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))
Output:
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's one other (hacky) solution:
df %>%
group_by(id) %>%
mutate(sum = as.character(rowSums(select(cur_data(), is.numeric)))) %>%
summarise_if(is.numeric, ~ . / as.numeric(sum))
The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.
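If no packages are wanted at all, a base R sketch of the same row-wise conversion is below. The data frame is hypothetical (the original 'df' is not shown) and was chosen to match the output above; `num` is just an illustrative name.

```r
# Hypothetical data matching the output shown above
df <- data.frame(id = c("A", "B", "C", "D"),
                 x  = c(1, 2, 3, 4),
                 y  = c(2, 4, 6, 8))

# Logical index of the numeric columns
num <- vapply(df, is.numeric, logical(1))

# Divide every numeric column by the row totals of the numeric columns
df[num] <- df[num] / rowSums(df[num])
df
```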