How to Calculate the Median on Grouped Dataset

How to calculate the median on a grouped dataset?

Since you already know the formula, it should be easy enough to create a function to do the calculation for you.

Here, I've created a basic function to get you started. The function takes four arguments:

frequencies: A vector of frequencies ("number" in your first example)
intervals: A 2-row matrix with the same number of columns as the length of frequencies, with the first row being the lower class boundary, and the second row being the upper class boundary. Alternatively, "intervals" may be a column in your data.frame, and you may specify sep (and possibly, trim) to have the function automatically create the required matrix for you.
sep: The separator character in your "intervals" column in your data.frame.
trim: A regular expression of characters that need to be removed before trying to coerce to a numeric matrix. One pattern is built into the function: trim = "cut". This sets the regular expression pattern to remove (, ), [, and ] from the input.

Here's the function (with comments showing how I used your instructions to put it together):

GroupedMedian <- function(frequencies, intervals, sep = NULL, trim = NULL) {
  # If "sep" is specified, the function will try to create the 
  #   required "intervals" matrix. "trim" removes any unwanted 
  #   characters before attempting to convert the ranges to numeric.
  if (!is.null(sep)) {
    if (is.null(trim)) pattern <- ""
    else if (trim == "cut") pattern <- "\\[|\\]|\\(|\\)"
    else pattern <- trim
    intervals <- sapply(strsplit(gsub(pattern, "", intervals), sep), as.numeric)
  }

  cf <- cumsum(frequencies)
  Midrow <- findInterval(max(cf)/2, cf) + 1
  L <- intervals[1, Midrow]      # lower class boundary of median class
  h <- diff(intervals[, Midrow]) # size of median class
  f <- frequencies[Midrow]       # frequency of median class
  # cumulative frequency of the classes before the median class
  #   (0 when the median class is the first one)
  cf2 <- if (Midrow > 1) cf[Midrow - 1] else 0
  n_2 <- max(cf)/2               # total observations divided by 2

  unname(L + (n_2 - cf2)/f * h)
}

Here's a sample data.frame to work with:

mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800", 
    "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300", 
    "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L, 
    850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"), 
    class = "data.frame", row.names = c(NA, -10L))
mydf
#       salary number
# 1  1500-1600    110
# 2  1600-1700    180
# 3  1700-1800    320
# 4  1800-1900    460
# 5  1900-2000    850
# 6  2000-2100    250
# 7  2100-2200    130
# 8  2200-2300     70
# 9  2300-2400     20
# 10 2400-2500     10

Now, we can simply do:

GroupedMedian(mydf$number, mydf$salary, sep = "-")
# [1] 1915.294
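
As a cross-check outside R, the same interpolation can be sketched in a few lines of Python. This is only an illustration: the name grouped_median and its three-list signature are my own assumptions, not part of the R function above, and it expects class boundaries that are already numeric.

```python
from itertools import accumulate

def grouped_median(lowers, uppers, freqs):
    """Interpolated median for grouped data (hypothetical helper)."""
    cum = list(accumulate(freqs))   # cumulative frequencies
    half = cum[-1] / 2              # n / 2
    # median class: first class whose cumulative frequency exceeds n / 2
    k = next(i for i, c in enumerate(cum) if c > half)
    cf_before = cum[k - 1] if k > 0 else 0   # 0 if the first class is the median class
    width = uppers[k] - lowers[k]
    return lowers[k] + (half - cf_before) / freqs[k] * width

# the salary table from the sample data.frame
lowers = list(range(1500, 2500, 100))
uppers = list(range(1600, 2600, 100))
freqs = [110, 180, 320, 460, 850, 250, 130, 70, 20, 10]

salary_median = grouped_median(lowers, uppers, freqs)
print(round(salary_median, 3))  # 1915.294
```

The median class here is 1900-2000: 1070 observations fall below it, so the result is 1900 + (1200 - 1070) / 850 * 100.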

Here's an example of the function in action on some made-up data:

set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))
y
#           Var1 Freq
# 1   (1.9,11.7]    8
# 2  (11.7,21.5]    8
# 3  (21.5,31.4]    8
# 4  (31.4,41.2]   15
# 5    (41.2,51]   13
# 6    (51,60.8]    5
# 7  (60.8,70.6]   11
# 8  (70.6,80.5]   15
# 9  (80.5,90.3]   11
# 10  (90.3,100]    6

### Here's GroupedMedian's output on the grouped data.frame...
GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
# [1] 49.49231

### ... and the output of median on the original vector
median(x)
# [1] 49.5

By the way, I think there was a mistake in one of the ranges in the sample data that you provided: all were separated by dashes except one, which was separated by a comma. Since strsplit splits on a regular expression by default, you can pass a pattern that matches either separator and use the function like this:

x<-c(110,180,320,460,850,250,130,70,20,10)
colnames<-c("numbers")
rownames<-c("[1500-1600]","(1600-1700]","(1700-1800]","(1800-1900]",
            "(1900-2000]"," (2000,2100]","(2100-2200]","(2200-2300]",
            "(2300-2400]","(2400-2500]")
y<-matrix(x,nrow=length(x),dimnames=list(rownames,colnames))
GroupedMedian(y[, "numbers"], rownames(y), sep="-|,", trim="cut")
# [1] 1915.294
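
For comparison, the label clean-up that sep = "-|," and trim = "cut" perform can be sketched in Python with the re module. The helper name parse_interval is hypothetical, and the sketch assumes non-negative bounds (a leading minus sign would be mistaken for the separator).

```python
import re

def parse_interval(label):
    """Return (lower, upper) from a label such as '(1600-1700]' (hypothetical helper)."""
    cleaned = re.sub(r"[\[\]()\s]", "", label)   # drop brackets and whitespace
    lower, upper = re.split(r"[-,]", cleaned)    # split on a dash or a comma
    return float(lower), float(upper)

labels = ["[1500-1600]", "(1600-1700]", " (2000,2100]"]
bounds = [parse_interval(s) for s in labels]
print(bounds)  # [(1500.0, 1600.0), (1600.0, 1700.0), (2000.0, 2100.0)]
```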

Find median of interval data in Python

If you have discrete grouped data, you can approximate the median of the entire data set by interpolating within the median interval, using the formula:

median = L + interval * (N / 2 - CF) / F

L = lower limit of the median interval

interval = width of the median interval

N = total number of data points

CF = number of data points below the median interval

F = number of data points in the median interval

# Approximating the median with plain Python and pandas
import pandas as pd

df = pd.DataFrame.from_dict({'low_range':[1,11,21,31,41,51], 'high_range':[10,20,30,40,50,60], 'frequency':[123,350,200,1700,360,60]})

N = df['frequency'].sum()
cumsum = df['frequency'].cumsum()

# the median interval is the first one whose cumulative frequency reaches N/2
index = (cumsum >= N / 2).idxmax()
L1 = df['low_range'][index]
# cumulative frequency below the median interval (0 if it is the first interval)
cumsum_before = cumsum[index - 1] if index > 0 else 0
freq_median = df['frequency'][index]
width = df['high_range'][index] - df['low_range'][index] + 1

median = L1 + (N / 2 - cumsum_before) / freq_median * width

print("L1 = {} , cumsum_before = {}, freq_median = {}, width = {}".format(L1, cumsum_before, freq_median, width))
print("Approximated median = ", median)

L1 = 31 , cumsum_before = 673, freq_median = 1700, width = 10

Approximated median = 35.25588235294118

If you have continuous data, you can use the median_grouped function in the statistics module of the standard library.

# Approximating the median with statistics.median_grouped for continuous values and fixed-width intervals

import statistics as st
import pandas as pd

df = pd.DataFrame.from_dict({'low_range':[1,10,21,31,41,51], 'high_range':[10,21,31,41,51,60], 'frequency':[123,350,200,1700,360,60]})

X = ((df['low_range'] + df['high_range'])/2).tolist()

f = df['frequency'].tolist()

# repeating values based on their frequencies
Y = [item for i,item in enumerate(X)
              for count in range(f[i])]

width = df['high_range'][0] - df['low_range'][0] + 1 

median = st.median_grouped(Y, width)
print("Approximated median = ", median)

Approximated median = 35.25588235294118
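
To see why median_grouped reproduces the interpolation formula, here is a small pandas-free sketch. It reuses the midpoints and frequencies from the example above and writes the formula out by hand next to the library call. The variable names are mine, and the constants 31, 673 and 1700 are L, CF and F for the median interval.

```python
import statistics as st

midpoints = [5.5, 15.5, 26, 36, 46, 55.5]   # class midpoints from the example above
freqs = [123, 350, 200, 1700, 360, 60]
width = 10

# repeat each midpoint as many times as its frequency
expanded = [m for m, f in zip(midpoints, freqs) for _ in range(f)]
via_statistics = st.median_grouped(expanded, width)

# the same interpolation written out by hand: L + interval * (N / 2 - CF) / F
n = sum(freqs)
manual = 31 + width * (n / 2 - 673) / 1700

print(via_statistics, manual)
```

median_grouped treats each repeated value as the midpoint of a class of the given width, so it only matches the formula when all intervals share that width.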

Access VBA: Calculating Median on data using GROUP BY on two columns

Consider an extension of @Fionnuala's great answer to calculate median in MS Access by accommodating an open-ended number of grouping variables.

VBA (save below in a standard module of Access project)

The code builds a dynamic SQL string for a DAO recordset call and then calculates the median from the sorted records. Groupings with 0-2 records and null grouping values need special handling.

Public Function MedianVBA(ParamArray Arr() As Variant) As Double
On Error GoTo ErrHandle
    Dim N As Long
    Dim tblName As String, numCol As String, grpVals As String
    Dim strSQL As String
    Dim db As DAO.Database, rs As DAO.Recordset
    Dim varMedian As Double, fMedian As Double
    
    'BUILD DYNAMIC SQL
    tblName = Arr(0)
    numCol = Arr(1)
    grpVals = " WHERE " & numCol & " IS NOT NULL "
        
    For N = 2 To UBound(Arr) Step 2
        If Arr(N + 1) = "" Or IsNull(Arr(N + 1)) Then
            grpVals = grpVals & " AND " & Arr(N) & " IS NULL"
        ElseIf IsDate(Arr(N + 1)) Then
            grpVals = grpVals & " AND " & Arr(N) & " = #" & Arr(N + 1) & "#"
        Else
            grpVals = grpVals & " AND CStr(" & Arr(N) & ") = '" & Arr(N + 1) & "'"
        End If
    Next N

    strSQL = "SELECT " & numCol _
              & " FROM " & tblName _
              & grpVals _
              & " ORDER BY " & numCol
  
    'CALCULATE MEDIAN
    Set db = CurrentDb
    Set rs = db.OpenRecordset(strSQL, dbOpenDynaset)
    
    'POPULATE RECORDSET SO RecordCount IS ACCURATE
    If Not rs.EOF Then
        rs.MoveLast
        rs.MoveFirst
    End If
    
    If rs.RecordCount = 0 Then
        MedianVBA = fMedian
        GoTo ExitHandle
    ElseIf rs.RecordCount = 1 Then
        MedianVBA = rs.Fields(numCol)
        GoTo ExitHandle
    End If
    
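    'MOVE TO THE MIDDLE RECORD (ODD COUNT) OR THE LOWER MIDDLE RECORD (EVEN COUNT)
    'INTEGER DIVISION AVOIDS ROUNDING HALVES TO THE NEAREST EVEN NUMBER
    rs.Move (rs.RecordCount - 1) \ 2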

    If rs.RecordCount Mod 2 = 0 Then
        varMedian = rs.Fields(numCol)
        If rs.RecordCount = 2 Then
            rs.MoveLast
        Else
            rs.MoveNext
        End If
        fMedian = (varMedian + rs.Fields(numCol)) / 2
    Else
        fMedian = rs.Fields(numCol)
    End If

    rs.Close
    MedianVBA = fMedian
  
ExitHandle:
    Set rs = Nothing: Set db = Nothing
    Exit Function
    
ErrHandle:
    MsgBox Err.Number & ": " & Err.Description, vbCritical, "RUNTIME ERROR"
    Resume ExitHandle
End Function

Note that the VBA function above uses a ParamArray: the first argument is the source table, the second argument is the numeric column, and the remaining arguments are an open-ended list of group column name and value pairs. The call signature is as follows:

=MedianVBA("table_name", 
           "numeric_col", 
           "group1_column", "group1_value",
           "group2_column", "group2_value", 
           ...)

SQL (stored query that calls above VBA function)

Below runs a one-group and two-group median calculation.

SELECT t.typeA, t.typeB
       , MedianVBA('[myTable]', '[total]', '[typeA]', t.typeA) AS MedianGrp1
       , MedianVBA('[myTable]', '[total]', '[typeA]', t.typeA, '[typeB]', t.typeB) AS MedianGrp2
FROM myTable t
GROUP BY t.typeA, t.typeB
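
For readers who want to sanity-check the query results outside Access, a rough Python analogue of the one-group and two-group medians might look like this. The rows are made-up illustration data, group_medians is a hypothetical helper, and None stands in for NULL.

```python
from collections import defaultdict
from statistics import median

# hypothetical rows: (typeA, typeB, total); None plays the role of NULL
rows = [
    ("x", "a", 10), ("x", "a", 30), ("x", "b", 20),
    ("y", "a", 5), ("y", "a", None), ("y", "a", 7), ("y", "a", 9),
]

def group_medians(rows, key):
    """Median of the third field per group, skipping missing totals."""
    groups = defaultdict(list)
    for row in rows:
        if row[2] is not None:   # mirrors "WHERE total IS NOT NULL"
            groups[key(row)].append(row[2])
    return {k: median(v) for k, v in groups.items()}

by_a = group_medians(rows, key=lambda r: r[0])             # one grouping column
by_a_b = group_medians(rows, key=lambda r: (r[0], r[1]))   # two grouping columns
print(by_a)    # {'x': 20, 'y': 7}
print(by_a_b)  # {('x', 'a'): 20.0, ('x', 'b'): 20, ('y', 'a'): 7}
```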

How to get median with frequency table in R?

EDIT:

Here's how you calculate the mean patient age by hospital:

library(dplyr)

df %>%
  group_by(hospital) %>%
  summarise(
    mean_age = sum(patient_age*number_patients)/sum(number_patients)
    )

or simply:

df %>%
  group_by(hospital) %>%
  summarise(
    mean_age = mean(rep(patient_age,number_patients))
  )

Here are the medians:

df %>%
  group_by(hospital) %>%
  summarise(
    median_age = sort(rep(patient_age,number_patients))[ceiling(length(rep(patient_age,number_patients))/2)]
  )

Here, we subset sort(rep(patient_age,number_patients)) at its middle position, which is ceiling(length(rep(patient_age,number_patients))/2). With an even number of patients, this returns the lower of the two middle values rather than their average.

EDIT 2:

or simply:

df %>%
  group_by(hospital) %>%
  summarise(
    median_age = median(rep(patient_age,number_patients))
  )
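
When the frequencies are large, expanding the data with rep() can use a lot of memory. A minimal sketch of the same median computed directly from the value and count pairs is shown below. It is in Python, and weighted_median is a hypothetical helper, not something from dplyr or base R.

```python
def weighted_median(values, counts):
    """Median of a data set in which values[i] occurs counts[i] times."""
    pairs = sorted((v, c) for v, c in zip(values, counts) if c > 0)
    n = sum(c for _, c in pairs)
    if n == 0:
        raise ValueError("no observations")
    lo_rank, hi_rank = (n - 1) // 2, n // 2   # zero-based ranks of the middle item(s)
    seen = 0
    lo = hi = None
    for v, c in pairs:
        seen += c                  # observations covered so far
        if lo is None and lo_rank < seen:
            lo = v
        if hi_rank < seen:
            hi = v
            break
    return (lo + hi) / 2           # the two ranks coincide when n is odd

# made-up ages and patient counts for one hospital
print(weighted_median([30, 40, 50], [2, 1, 3]))  # 45.0
```

The example is equivalent to taking the median of 30, 30, 40, 50, 50, 50, where the two middle values are 40 and 50.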

R - Median of a Frequency distribution, grouped by another variable

We can try with dplyr

library(dplyr)    
Clean1 <- Clean[rep(1:nrow(Clean), Clean$Frequency),]
Clean1 %>%
      group_by(State) %>%
      summarise(Median = median(medicare_average_payment))

Or using data.table

library(data.table)
setDT(Clean)[, .(Median = median(rep(medicare_average_payment, Frequency))) , State]
