Getting both column counts and proportions in the same table in R
Here is one approach. You still need a second step, but it comes before the tabular command, so the result is still a tabular object.
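library(tables)  # provides tabular() and Heading()
n <- 100
x <- sample(letters[1:3], n, replace = TRUE)
y <- sample(letters[1:3], n, replace = TRUE)
# make x and y factors so tabular() treats them as categories (R >= 4.0 no longer converts strings)
d <- data.frame(x = x, y = y, stringsAsFactors = TRUE)
# weight each row by 1 / (number of rows with the same x), so weights sum to 1 within each x
d$z <- 1 / ave(rep(1, n), d$x, FUN = sum)
# length() of z gives the counts, sum() of z gives the row proportions
(t1 <- tabular(x ~ y * Heading() * z * ((n = length) + (p = sum)), d))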
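If a plain matrix is enough and you do not need a tabular object, a base R sketch along the same lines puts the counts and the row proportions side by side. This swaps tabular() for table(), prop.table() and cbind(); the seed and the .n/.p column suffixes are illustrative choices, not part of the original answer.

```r
set.seed(42)  # illustrative seed so the sketch is reproducible
n <- 100
lv <- letters[1:3]
d <- data.frame(x = factor(sample(lv, n, replace = TRUE), levels = lv),
                y = factor(sample(lv, n, replace = TRUE), levels = lv))

counts <- table(d$x, d$y)                 # cell counts: rows = x, columns = y
props  <- prop.table(counts, margin = 1)  # row proportions, like the z weights above

# Put counts and proportions next to each other in one matrix
combined <- cbind(counts, props)
colnames(combined) <- c(paste0(lv, ".n"), paste0(lv, ".p"))
round(combined, 3)
```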
Two by two table with count and percentage in R
library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
df %>%
  count(Gender, OnAntibiotic) %>%     # one row per Gender x OnAntibiotic cell, count in n
  group_by(OnAntibiotic) %>%
  mutate(Per = n / sum(n),             # share of each Gender within an OnAntibiotic level
         np = paste0(n, " (", round(Per * 100, 2), " %)")) %>%
  select(-n, -Per) %>%
  spread(OnAntibiotic, np)
# A tibble: 2 x 3
Gender No Yes
<fct> <chr> <chr>
1 Female 3 (60 %) 8 (57.14 %)
2 Male 2 (40 %) 6 (42.86 %)
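The same table can be sketched in base R with table() and prop.table(), using margin = 2 so that each OnAntibiotic column sums to 100 %. The data frame below is hypothetical, built only to reproduce the counts printed above.

```r
# Hypothetical data chosen to reproduce the counts in the output above
df <- data.frame(
  Gender       = rep(c("Female", "Male", "Female", "Male"), times = c(3, 2, 8, 6)),
  OnAntibiotic = rep(c("No", "No", "Yes", "Yes"), times = c(3, 2, 8, 6))
)

counts <- table(df$Gender, df$OnAntibiotic)               # 2 x 2 counts
pct    <- round(100 * prop.table(counts, margin = 2), 2)  # column percentages

# Paste "count (pct %)" cell by cell, keeping the table's row/column names
out <- matrix(paste0(counts, " (", pct, " %)"),
              nrow = nrow(counts), dimnames = dimnames(counts))
out
```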
calculating the proportion of count variable per group in data.table in R
If you are looking for each gender's share (as a percentage) within every startYear/groupSize combination, you can do:
library(data.table)
mydata[, prop := count/sum(count) * 100, by = .(startYear, groupSize)]
# groupSize gender startYear count prop
# 1: intermediate F 2014 7546 55.9958445
# 2: small F 2014 3500 31.3395415
# 3: intermediate M 2014 5930 44.0041555
# 4: small M 2014 7668 68.6604585
# 5: huge F 2014 18114 56.7125861
# 6: huge M 2014 13826 43.2874139
# 7: large F 2014 11943 54.2222828
# 8: large M 2014 10083 45.7777172
#....
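A base R equivalent (swapping data.table for ave()) attaches each group's total to every row and divides by it. The mydata below is a hypothetical reconstruction of the first eight printed rows only.

```r
# Hypothetical reconstruction of the first eight rows shown above
mydata <- data.frame(
  groupSize = c("intermediate", "small", "intermediate", "small",
                "huge", "huge", "large", "large"),
  gender    = c("F", "F", "M", "M", "F", "M", "F", "M"),
  startYear = 2014,
  count     = c(7546, 3500, 5930, 7668, 18114, 13826, 11943, 10083)
)

# ave() returns, for every row, the sum of count within its startYear x groupSize group
group_total <- ave(mydata$count, mydata$startYear, mydata$groupSize, FUN = sum)
mydata$prop <- mydata$count / group_total * 100
mydata
```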
Tidy way to convert numeric columns from counts to proportions
You can rephrase it as follows:
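library(dplyr)
df %>%
  # divide each numeric column by the row total of all numeric columns (taken from the original df)
  mutate(across(where(is.numeric), ~ . / rowSums(select(df, where(is.numeric)))))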
Output:
id x y
1 A 0.3333333 0.6666667
2 B 0.3333333 0.6666667
3 C 0.3333333 0.6666667
4 D 0.3333333 0.6666667
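Without dplyr, the same row-wise scaling can be sketched in base R. The input df is hypothetical (y is twice x in every row), chosen so that the result matches the output shown above.

```r
# Hypothetical input: y = 2 * x in every row
df <- data.frame(id = c("A", "B", "C", "D"), x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

num <- vapply(df, is.numeric, logical(1))  # which columns are numeric
# Dividing a data frame by a vector of length nrow recycles it down each column,
# so every row is divided by its own total
df[num] <- df[num] / rowSums(df[num])
df
```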
Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's another (hacky) solution:
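df %>%
  group_by(id) %>%
  # store the row total as character so summarise_if() skips it (pick() supersedes cur_data() in dplyr >= 1.1.0)
  mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
  summarise_if(is.numeric, ~ . / as.numeric(sum))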
The usual dplyr ways of referring to the current data within a function (e.g. cur_data()) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this, though, so I'm open to suggestions.
convert data frame of counts to proportions in R
Probably something along these lines:
df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )
If it were a matrix, you could have just used prop.table(mat, 2) (margin 2 divides each column by its own total). In this case, however, you need to limit the operation to the numeric columns (by excluding the first one).
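prop.table() only reproduces the per-column division above when given margin = 2; without a margin it divides every cell by the grand total. The sketch below compares the two routes on a small, made-up data frame.

```r
# Made-up counts: one label column followed by numeric count columns
df <- data.frame(state = c("A", "B", "C"), y1 = c(1, 3, 6), y2 = c(2, 2, 4))

# Column-wise proportions via lapply(), as in the answer
by_lapply <- df
by_lapply[, -1] <- lapply(df[, -1], function(x) x / sum(x, na.rm = TRUE))

# Matrix route: margin = 2 divides each column by its own total
by_prop <- prop.table(as.matrix(df[, -1]), margin = 2)

all.equal(as.matrix(by_lapply[, -1]), by_prop)
```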
Furthermore, I think you need to exclude the "total" row:
my.data[-5, -1] <- lapply( my.data[ -5 , -1], function(x){ x/sum(x, na.rm=TRUE)} )
my.data[ -5 , ]
state y1970 y1980 y1990 y2000
1 Alaska 0.02325581 0.03076923 NA 0.02941176
2 Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3 Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4 Ohio 0.29069767 0.30769231 NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849
-------------
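Alternative approach: divide by the reported total row (row 5) instead of the column sums. For y1990, where Alaska and Ohio are NA, this gives different proportions than the first version: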
> my.data[,-1] <-lapply( my.data[ , -1], function(x){ x/x[5] } )
> my.data
state y1970 y1980 y1990 y2000
1 Alaska 0.02325581 0.03076923 NA 0.02941176
2 Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3 Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4 Ohio 0.29069767 0.30769231 NA 0.33613445
5 total 1.00000000 1.00000000 1.00000000 1.00000000
6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849
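If the position of the total row might change, it can be located by name rather than hard-coding row 5. A sketch using sweep(), assuming (hypothetically) that the total row is labelled "total" in the state column as above; the numbers are made up.

```r
# Made-up data with a labelled total row
my.data <- data.frame(state = c("A", "B", "total"),
                      y1 = c(2, 6, 10), y2 = c(1, 3, 4))

tot_row <- which(my.data$state == "total")  # find the total row by its label
totals  <- unlist(my.data[tot_row, -1])     # named vector of column totals

# Divide every numeric column by the matching entry of the total row
my.data[, -1] <- sweep(as.matrix(my.data[, -1]), 2, totals, "/")
my.data
```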
This shows what prop.table returns when values are missing, first over the whole table (no margin) and then by rows and by columns separately, for a very simple matrix:
> prop.table( matrix( c( 1,2,NA, 3),2) )
[,1] [,2]
[1,] NA NA
[2,] NA NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
[,1] [,2]
[1,] NA NA
[2,] 0.4 0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
[,1] [,2]
[1,] 0.3333333 NA
[2,] 0.6666667 NA
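If you would rather have NAs ignored when computing the totals, one option (a different technique from prop.table()) is sweep() with colSums()/rowSums() and na.rm = TRUE, shown here on the same matrix.

```r
m <- matrix(c(1, 2, NA, 3), 2)

# Divide each column by its total, ignoring NAs when computing the total
col_props <- sweep(m, 2, colSums(m, na.rm = TRUE), "/")
# Same idea for rows
row_props <- sweep(m, 1, rowSums(m, na.rm = TRUE), "/")

col_props
row_props
```

The NA cells themselves stay NA; only the denominators change.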
How to Calculate Percentage Based On Other Row
This is the beginning of a solution:
library(dplyr)
Year <- rep(2000, 6)
State <- c(rep("VA", 4), rep("MA", 2))
Age <- c("<44", "44+", "44+", "<44", "<44", "44+")
Pop <- c(150, 350, 500, 200, 100, 100)
df <- data.frame(State = State, Age = Age, Pop = Pop, Year= Year)
df %>%
  filter(Age != "Total") %>%   # drop a "Total" row if the data has one
  group_by(Year, State) %>%
  summarize(Pop44 = sum(Pop[Age == "<44"]) / sum(Pop))  # share of the population in the "<44" group
You don't have to filter out the "Total" category, but it's usually not a good idea to store totals as a category (better to have a column for that).
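Since the percentages are just ratios of group sums, a base R alternative computes both age shares per state with xtabs() and prop.table(). The block rebuilds the same df as above so it runs on its own.

```r
Year  <- rep(2000, 6)
State <- c(rep("VA", 4), rep("MA", 2))
Age   <- c("<44", "44+", "44+", "<44", "<44", "44+")
Pop   <- c(150, 350, 500, 200, 100, 100)
df <- data.frame(State = State, Age = Age, Pop = Pop, Year = Year)

pop_tab <- xtabs(Pop ~ State + Age, data = df)   # summed Pop per State x Age cell
shares  <- prop.table(pop_tab, margin = 1)       # each state's row sums to 1
shares
```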
Calculating count and proportion of a certain value for a number of variables subsetted by other variables
You don't have to convert columns to factors. In fact, data.table recommends avoiding factors wherever possible, as this also improves speed. However, here is a much easier way to convert columns to factors in the future:
sd_cols = c("Feature1", "Feature2", "Feature3")
DT[, c(sd_cols) := lapply(.SD, as.factor), .SDcols=sd_cols]
Okay, now on to the solution. We need CJ (a cross join) here because you also want the combinations that are absent from the data, so we have to generate all of them first.
uvals = c("no", "yes")
setkey(DT, Feature1, Feature2, Feature3)
DTn = DT[CJ(uvals, uvals, uvals), allow.cartesian=TRUE]
The allow.cartesian=TRUE is necessary because the join will result in more rows than max(nrow(x), nrow(i)) in a join x[i]. Read this post for more on allow.cartesian.
Now that we have all the combinations, we can group/aggregate them to obtain the results in the form you require.
ans = DTn[, { tmp1 = sum(Var1 == "yes", na.rm=TRUE);  # number of "yes" in Var1
              tmp2 = sum(Var2 == "yes", na.rm=TRUE);  # number of "yes" in Var2
              list(Var1.count = tmp1,
                   Var1.prop = tmp1/.N,
                   Var2.count = tmp2,
                   Var2.prop = tmp2/.N)   # same scale as Var1.prop, matching the output below
            }, by=key(DT)]
# Feature1 Feature2 Feature3 Var1.count Var1.prop Var2.count Var2.prop
# 1: no no no 0 0.0000000 1 1
# 2: no no yes 0 0.0000000 0 0
# 3: no yes no 0 0.0000000 0 0
# 4: no yes yes 1 1.0000000 1 1
# 5: yes no no 0 0.0000000 0 0
# 6: yes no yes 0 0.0000000 0 0
# 7: yes yes no 0 0.0000000 0 0
# 8: yes yes yes 2 0.6666667 3 1
You can adjust this to return NA instead of 0 for the absent combinations, if that is really important.
Following the OP's follow-up question in the comments (and the edit to the question), here is a version that works for any set of variables, starting from DTn:
vars = c("Var1", "Var2")
ans = DTn[, c(N=.N, lapply(.SD, function(x) sum(x=="yes", na.rm=TRUE))),
by=key(DTn), .SDcols=vars]
N = ans$N
ans[, N := NULL]
ans[, c(paste(vars, "prop", sep=".")) := .SD/N, .SDcols=vars]
setnames(ans, vars, paste(vars, "count", sep="."))
ans
# Feature1 Feature2 Feature3 Var1.count Var2.count Var1.prop Var2.prop
# 1: no no no 0 1 0.0000000 1
# 2: no no yes 0 0 0.0000000 0
# 3: no yes no 0 0 0.0000000 0
# 4: no yes yes 1 1 1.0000000 1
# 5: yes no no 0 0 0.0000000 0
# 6: yes no yes 0 0 0.0000000 0
# 7: yes yes no 0 0 0.0000000 0
# 8: yes yes yes 2 3 0.6666667 1
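For comparison, the same idea (generate every key combination, left-join the data onto it, then count and divide by the group size) can be sketched in base R with expand.grid() and merge() in place of CJ(). The DT below is hypothetical, constructed to reproduce the counts in the output above, and summarise_group is an illustrative helper name.

```r
# Hypothetical data reproducing the counts shown above
DT <- data.frame(
  Feature1 = c("no", "no", "yes", "yes", "yes"),
  Feature2 = c("no", "yes", "yes", "yes", "yes"),
  Feature3 = c("no", "yes", "yes", "yes", "yes"),
  Var1     = c("no", "yes", "yes", "yes", "no"),
  Var2     = c("yes", "yes", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)
uvals <- c("no", "yes")
keys  <- c("Feature1", "Feature2", "Feature3")
vars  <- c("Var1", "Var2")

# All 2^3 key combinations, playing the role of CJ(uvals, uvals, uvals)
grid <- expand.grid(Feature1 = uvals, Feature2 = uvals, Feature3 = uvals,
                    stringsAsFactors = FALSE)
# Left join: absent combinations get a single row with NA in Var1/Var2
full <- merge(grid, DT, by = keys, all.x = TRUE)

# Count the "yes" values in one group and divide by the group size
summarise_group <- function(g) {
  counts <- vapply(g[vars], function(v) sum(v == "yes", na.rm = TRUE), numeric(1))
  out <- g[1, keys]
  out[paste(vars, "count", sep = ".")] <- as.list(counts)
  out[paste(vars, "prop", sep = ".")]  <- as.list(counts / nrow(g))
  out
}

ans <- do.call(rbind, lapply(split(full, full[keys], drop = TRUE), summarise_group))
ans <- ans[order(ans$Feature1, ans$Feature2, ans$Feature3), ]
rownames(ans) <- NULL
ans
```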
Get the row (or column)-wise tabularized counts (as in table()) of a matrix
We convert the matrix from wide to long format using melt from library(reshape2) and then call table:
library(reshape2)
table(melt(m)[3:2])
# Var2
#value 1 2 3
# a 1 1 3
# b 3 1 0
# c 0 2 0
# d 0 0 1
If we need the proportions, we can use prop.table and change the margin accordingly.
prop.table(table(melt(m)[3:2]),1)
Another convenient function is mtabulate from library(qdapTools):
library(qdapTools)
t(mtabulate(as.data.frame(m)))
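A base R sketch without extra packages: col(m) gives the column index of every cell, so tabulating c(m) against it should reproduce table(melt(m)[3:2]). The matrix m is hypothetical, built to match the counts printed above.

```r
# Hypothetical 4 x 3 character matrix matching the counts shown above
m <- matrix(c("a", "b", "b", "b",
              "a", "b", "c", "c",
              "a", "a", "a", "d"), nrow = 4)

# c(m) flattens column by column; c(col(m)) gives the matching column index
tab <- table(value = c(m), Var2 = c(col(m)))
tab

# Proportions within each value (row), as with prop.table(..., 1) above
prop.table(tab, 1)
```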