How to Quickly Form Groups (Quartiles, Deciles, etc) by Ordering Column(S) in a Data Frame

I use one of the two approaches below, or Hmisc::cut2(value, g=4):

temp$quartile <- with(temp, cut(value, 
                                breaks=quantile(value, probs=seq(0,1, by=0.25), na.rm=TRUE), 
                                include.lowest=TRUE))

An alternative might be:

# na.rm belongs to quantile(); findInterval() has no na.rm argument
temp$quartile <- with(temp, factor(
                            findInterval( value, c(-Inf,
                               quantile(value, probs=c(0.25, .5, .75), na.rm=TRUE), Inf) ), 
                            labels=c("Q1","Q2","Q3","Q4")
      ))

The first version has the side effect of labeling the quartiles with their value ranges, which I consider a "good thing". If that does not suit you, or if the valid problems raised in the comments are a concern, you could go with the second version. Alternatively, you can pass labels= to cut, or add this line to your code:

# labels= renames the four existing levels in order; levels= would turn every value into NA
temp$quartile <- factor(temp$quartile, labels=c("1","2","3","4") )

Or, quicker but slightly more obscure, convert the factor to its integer codes. The result is then a numeric vector rather than a factor:

temp$quartile <- as.numeric(temp$quartile)
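
As a minimal sketch of the labels= option mentioned above, the quartile names can be assigned directly inside cut. The data frame here is made up for illustration (19 random values plus one NA); only the column names temp and value follow the answer.

```r
# Hypothetical example data; `temp` and `value` follow the names used above.
set.seed(42)
temp <- data.frame(value = c(rnorm(19), NA))

# Quartile breakpoints (min, Q1, median, Q3, max), ignoring missing values.
brks <- quantile(temp$value, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# labels= names the bins directly; include.lowest keeps the minimum in Q1.
temp$quartile <- cut(temp$value, breaks = brks,
                     labels = c("Q1", "Q2", "Q3", "Q4"),
                     include.lowest = TRUE)

# Count rows per quartile; the NA row stays NA.
table(temp$quartile, useNA = "ifany")
```

If the data contain many repeated values, the quantile breakpoints may not be unique and cut will stop with an error, so this sketch assumes mostly distinct values.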

R: splitting dataset into quartiles/deciles. What is the right method?

Another way would be ntile() in dplyr.

library(tidyverse)

foo <- data.frame(a = 1:100,
                  b = runif(100, 50, 200),
                  stringsAsFactors = FALSE)

foo %>%
    mutate(quantile = ntile(b, 10)) %>%
    head()   # show the first six rows only

#  a         b quantile
#1 1  93.94754        2
#2 2 172.51323        8
#3 3  99.79261        3
#4 4  81.55288        2
#5 5 116.59942        5
#6 6 128.75947        6
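
One thing worth knowing about ntile() is that it buckets rows by rank, so tied values can land in different groups. The following sketch uses a small made-up vector (not from the answer) to show this.

```r
library(dplyr)

# Hypothetical data with many ties to show how ntile() splits by rank.
scores <- data.frame(b = c(1, 1, 1, 1, 2, 2, 3, 4))

scores <- scores %>%
  mutate(bucket = ntile(b, 4))

# Each bucket holds two rows, so the four rows with b == 1
# are spread over buckets 1 and 2.
scores %>% count(bucket)
```

If tied values must stay together, breaks based on quantile() with cut or findInterval may be the better fit.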

How to set groups by the percentiles of whole sample?

For the first part, subtract 1, integer-divide by 10, and add 1 so that the groups start from 1:

import pandas as pd

df = pd.DataFrame({'a':range(1,101)})

df['b'] = 'group ' + (df.a.sub(1) // 10 + 1).astype(str)
print(df)
      a         b
0     1   group 1
1     2   group 1
2     3   group 1
3     4   group 1
4     5   group 1
..  ...       ...
95   96  group 10
96   97  group 10
97   98  group 10
98   99  group 10
99  100  group 10

EDIT: For deciles, use qcut (with labels=False the bins are numbered 0 to 9):

df['b'] = pd.qcut(df.a, 10, labels=False)

findInterval by group with dplyr

You can do this in a group_by + mutate step:

library(dplyr)

df %>%
  group_by(gr) %>%
  mutate(breakpoints = findInterval(val, 
                       c(-Inf, quantile(val, c(0.25, 0.5, 0.75)), Inf))) %>%
  ungroup()

#      gr    val breakpoints
#   <int>  <dbl>       <int>
# 1     1  0.440           1
# 2     1  0.770           2
# 3     1  2.56            4
# 4     1  1.07            3
# 5     1  1.13            3
# 6     1  2.72            4
# 7     1  1.46            4
# 8     1 -0.265           1
# 9     1  0.313           1
#10     1  0.554           2
# … with 20 more rows

findInterval is applied for each gr separately.
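
For comparison, the same per-group calculation can be sketched in base R with ave(). The data frame below is invented for illustration; only the column names gr and val follow the answer.

```r
# Hypothetical data: two groups of 8 values each.
set.seed(1)
df <- data.frame(gr = rep(1:2, each = 8), val = rnorm(16))

# ave() applies FUN to `val` within each level of `gr` and returns
# the results in the original row order.
df$breakpoints <- ave(df$val, df$gr, FUN = function(v) {
  findInterval(v, c(-Inf, quantile(v, c(0.25, 0.5, 0.75)), Inf))
})

# Cross-tabulate group against quartile index.
table(df$gr, df$breakpoints)
```

Because the breakpoints are recomputed inside FUN, each group gets its own quartiles, just as in the group_by version.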

How can I create a function that computes the median and quartiles for each column of data, for each factor of data?

You can write the function like this:

library(dplyr)

apply_fun <- function(data) {
  data %>%
    group_by(Type) %>%
    summarise(across(starts_with('x'), list(med = median, 
                                            first_quartile = ~quantile(., 0.25), 
                                            second_quartile = ~quantile(., 0.5),
                                            third_quartile = ~quantile(., 0.75))))
}
result <- apply_fun(data1)

You can add/remove functions in the list as per requirement.
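
To illustrate changing the function list, here is a sketch of a variant that drops the second quartile (it duplicates the median) and adds na.rm = TRUE. The name apply_fun_na and the sample data1 are hypothetical; the original question's data is not shown here.

```r
library(dplyr)

# Hypothetical input: a grouping column `Type` and measurement columns x1, x2.
data1 <- data.frame(
  Type = rep(c("A", "B"), each = 4),
  x1   = c(1, 2, 3, 4, 10, 20, 30, 40),
  x2   = c(5, 6, 7, 8, 50, 60, 70, 80)
)

# Variant of the function above; na.rm = TRUE is an added assumption
# so that missing values do not turn the summaries into NA.
apply_fun_na <- function(data) {
  data %>%
    group_by(Type) %>%
    summarise(across(starts_with("x"),
                     list(med            = ~median(.x, na.rm = TRUE),
                          first_quartile = ~quantile(.x, 0.25, na.rm = TRUE),
                          third_quartile = ~quantile(.x, 0.75, na.rm = TRUE))))
}

result <- apply_fun_na(data1)

# Output columns are named <column>_<function>, e.g. x1_med.
names(result)
```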

How to compute quantiles on groups

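Using group_by you can just do:

library(dplyr)
library(lubridate)

temp.all = temp.all %>%
    # lubridate::date(date) might be necessary if you have datetimes
    group_by(date) %>%
    mutate(quartile = cut(value,
                          breaks = quantile(value, probs = seq(0, 1, by = 0.25), na.rm = TRUE),
                          include.lowest = TRUE,
                          labels = paste0("Q", 1:4)))

Note that cut(value, breaks = 4) would create four equal-width bins rather than quartiles, so the breaks are taken from quantile() here. dplyr also has a function ntile, which assigns rows to equal-sized groups by rank; it should give similar results, although rows on a boundary or with tied values may be assigned differently.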
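
It may help to see how cut with breaks = 4 (four equal-width bins) compares with quantile-based breaks on skewed data. The vector below is made up for illustration.

```r
# Hypothetical skewed data: most values are small, one is very large.
value <- c(1, 2, 3, 4, 5, 6, 7, 100)

# breaks = 4 splits the range [1, 100] into four equal-width bins.
equal_width <- cut(value, breaks = 4, labels = paste0("Q", 1:4))

# Quantile breaks put the same number of values in each bin here.
quartile <- cut(value,
                breaks = quantile(value, probs = seq(0, 1, by = 0.25)),
                include.lowest = TRUE,
                labels = paste0("Q", 1:4))

table(equal_width)  # Q1 holds seven values, Q4 holds one
table(quartile)     # two values in each of Q1..Q4
```

With equal-width bins the single outlier stretches the range so that almost everything falls into the first bin, which is usually not what is wanted when the goal is quartiles.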
