Calculating Percentage Within a Group

Calculating percentages with GROUP BY query

WITH t1 AS 
(SELECT User, Rating, Count(*) AS n
FROM your_table
GROUP BY User, Rating)
SELECT User, Rating, n,
(0.0+n)/(COUNT(*) OVER (PARTITION BY User)) -- no integer divide!
FROM t1;

Or

SELECT User, Rating, Count(*) OVER w_user_rating AS n, 
(0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);

I would see if one of these or the other yields a better query plan with the appropriate tool for your RDBMS.

Calculating percentage within a group

You can do it with a sub-select and a join:

SELECT t1.sex, employed, count(*) AS `count`, count(*) / t2.total AS percent
FROM my_table AS t1
JOIN (
SELECT sex, count(*) AS total
FROM my_table
GROUP BY sex
) AS t2
ON t1.sex = t2.sex
GROUP BY t1.sex, employed;

I can't think of other approaches off the top of my head.

SQL - Calculate percentage by group, for multiple groups

SELECT
SIGN(Orders),
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*), 2) OVER (PARTITION BY Month)) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)

Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020

You've stated that it's important for the total to be 100% so you might consider rounding down in the case of no orders and rounding up in the case of has orders for those scenarios where the percentages falls precisely on an odd multiple of 0.5%. Or perhaps rounding toward even or round smallest down would be better options:

WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 10000.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNCATE(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5
AND MOD(TRUNCATE(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA

It's usually not advisable to test floating-point values for equality and I don't know how BigQuery's float64 will work with the comparison against 0.5. One half is nevertheless representable in binary. See these in a case where the breakout is 101 vs 99. I don't have immediate access to BigQuery so be aware that Postgres's rounding behavior is different:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09

Calculate percentage within a subgroup in R

You first group by country to get the sum for each country. Then you group by country and motiv and use the sum for each country to calculate your frequency.

 am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
mutate(freq = number/sum_country,
freq_perc = freq*100 %>% round(2))

ggplot2 example:

df <- am2 %>% 
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
mutate(freq = number/sum_country,
freq_perc = freq*100 %>% round(2))

library(ggplot2)
df %>%
ggplot( aes(x=country, y=freq_perc, fill=motif)) +
geom_bar(stat="identity", position="dodge")

How to calculate percentage in group by with condition?

You can do conditional aggregation. I like to do this with avg():

select provider, avg(status = 'failure') failure_ratio
from mytable
group by provider

For each provider, this gives you a numeric value between 0 and 1 that represents the ratio of records in 'failure' status. You can multiply that by 100 if you want a percentage instead.

Using dplyr function to calculate percentage within groups

library(dplyr)

df %>%
# line below to freeze order of type_n if not ordered factor already
mutate(type_n = forcats::fct_inorder(type_n)) %>%
group_by(type_n) %>%
summarize(n = n(), total = sum(population)) %>%
mutate(new_col = (n / total) %>% scales::percent(decimal.mark = ",", suffix = ""))

# A tibble: 3 x 4
type_n n total new_col
<fct> <int> <int> <chr>
1 small 2 7 28,6
2 medium 2 14 14,3
3 large 3 15 20,0

How to calculate the percentage of values with grouping

If the column type has only the values 0 and 1:

SELECT 
`group`,
round(100.0 * sum(type) / count(*), 0) as percentage
FROM country
GROUP BY `group`;

See the demo.

Results:

| group | percentage |
| ----- | ---------- |
| 1 | 50 |
| 2 | 67 |
| 3 | 0 |


Related Topics



Leave a reply



Submit