# Is There a Weighted.Median() Function

## Is there a weighted.median() function?

The following packages all have a function to calculate a weighted median: 'aroma.light', 'isotone', 'limma', 'cwhmisc', 'ergm', 'laeken', 'matrixStats', 'PSCBS', and 'bigvis' (on GitHub).

To find them I used the invaluable `findFn()` in the 'sos' package, which is an extension of R's built-in help.

```r
findFn('weighted median')
```

Or,

```r
???'weighted median'
```

since `???` is a shortcut for `findFn()`, in the same way that `?some.function` is shorthand for `help(some.function)`.

## weighted median in spatstat package

I believe this is a flaw in the package, and I'll explain why.

Firstly, `weighted.median` actually just calls `weighted.quantile` with the `probs` vector set to `0.5`. But if you call `weighted.quantile` with your data, you get very strange results:

```r
weighted.quantile(x, w)
#>    0%   25%   50%   75%  100% 
#> 10.00 10.00 10.50 11.25 12.00
```

That's not right.

If you look at the body of this function using `body(weighted.quantile)`, and follow the logic through, there seems to be a problem with the way the weights are normalized on line 10 into a variable called `Fx`. To work properly, the normalized weights should be a vector of the same length as `x`, but starting at 0 and ending in 1, with the spacing in between being proportional to the weights.

But if you look at how this is actually calculated:

```r
body(weighted.quantile)[[10]]
#> Fx <- cumsum(w)/sum(w)
```

You can see it doesn't start at 0. In your case, the first element would be 0.3333.
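A quick numeric sketch (in Python, with three equally weighted points as in the question) makes the difference between the two normalizations concrete:

```python
import numpy as np

# Three equally weighted points, as in the question
w = np.array([1.0, 1.0, 1.0])

buggy = np.cumsum(w) / w.sum()                          # starts at 0.333, not 0
fixed = (np.cumsum(w) - w.min()) / (w.sum() - w.min())  # runs from 0 to 1
```

The buggy version's first element is 0.3333 rather than 0, while the fixed version starts at 0 and ends at 1 as required.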

So to show this is the case, let's overwrite this line with the correct expression. (First we need to unlock the binding to gain access to the function.)

```r
unlockBinding("weighted.quantile", asNamespace("spatstat"))
body(weighted.quantile)[[10]] <- substitute(Fx <- (cumsum(w) - min(w))/(sum(w) - min(w)))
```

Now we get the correct result for weighted quantiles (including the correct median):

```r
weighted.quantile(x, w)
#>   0%  25%  50%  75% 100% 
#> 10.0 10.5 11.0 11.5 12.0
```
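For reference, the same corrected computation can be sketched in a few lines of Python. This assumes the data implied by the output above (three points 10, 11, 12 with equal weights) and linear interpolation along the normalized cumulative weights; it is an illustration of the fix, not spatstat's actual implementation:

```python
import numpy as np

# Assumed data: three points with equal weights, as implied by the output above
x = np.array([10.0, 11.0, 12.0])
w = np.array([1.0, 1.0, 1.0])

order = np.argsort(x)            # weighted quantiles need x in ascending order
x, w = x[order], w[order]

# Corrected normalization: runs from 0 to 1
Fx = (np.cumsum(w) - w.min()) / (w.sum() - w.min())

probs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
quantiles = np.interp(probs, Fx, x)   # 10.0, 10.5, 11.0, 11.5, 12.0
```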

## Python: define function to get the weighted median

Try to stack only one level:

```python
wmedian = lambda x: x.loc[x['weight'].cumsum().gt(0.5), 'close'].head(1)
out = df1.stack(level=0).groupby(level=0).apply(wmedian) \
         .reset_index(level=[1, 2], drop=True)
```

Output:

```
>>> out
01-01-2020    23
01-02-2020    21
01-03-2020    44
Name: close, dtype: int64

>>> df1.stack(level=0)
              close  weight
01-01-2020 A     10     0.1
           B     20     0.2
           C     23     0.3
           D     45     0.5
01-02-2020 A     12     0.3
           B     19     0.1
           C     21     0.4
           D     47     0.2
01-03-2020 A     15     0.1
           B     29     0.2
           C      4     0.1
           D     44     0.6
```
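The core of `wmedian` is easier to see on a single group. A minimal sketch, using the 01-02-2020 group from the frame above, and assuming (as holds here) that rows are already sorted by `close` and the group's weights sum to 1:

```python
import pandas as pd

# One group from the frame above (01-02-2020); weights sum to 1
g = pd.DataFrame({'close':  [12, 19, 21, 47],
                  'weight': [0.3, 0.1, 0.4, 0.2]})

# The first row whose cumulative weight exceeds 0.5 holds the weighted median
wmedian = g.loc[g['weight'].cumsum().gt(0.5), 'close'].iloc[0]   # 21
```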

## How to calculate weighted mean and median in python?

First, install the weightedstats library:

```
pip install weightedstats
```

Then do the following.

Weighted Mean

```python
import weightedstats as ws

ws.weighted_mean(state['Murder.Rate'], weights=state['Population'])
# 4.445833981123394
```

Weighted Median

```python
ws.weighted_median(state['Murder.Rate'], weights=state['Population'])
# 4.4
```

It also has special weighted mean and median methods for use with numpy arrays. The methods above will still work, but these are available in case you need them:

```python
my_data = [1, 2, 3, 4, 5]
my_weights = [10, 1, 1, 1, 9]
ws.numpy_weighted_mean(my_data, weights=my_weights)
ws.numpy_weighted_median(my_data, weights=my_weights)
```
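If you would rather avoid the dependency, a lower weighted median can be sketched in plain Python. This follows the common "first value whose cumulative weight reaches half the total" convention, which may differ from weightedstats' tie handling:

```python
def weighted_median(data, weights):
    """Lower weighted median: first value (in ascending order) whose
    cumulative weight reaches half the total weight."""
    pairs = sorted(zip(data, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value

weighted_median([1, 2, 3, 4, 5], [10, 1, 1, 1, 9])  # 2
```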

## KDB: weighted median

For values `v` and weights `w`, `med v where w` gobbles space for larger values of `w`.

Instead, sort `w` into ascending order of `v` and look for where cumulative sums reach half their sum.

```q
q)show v:10?100
17 23 12 66 36 37 44 28 20 30
q)show w:.001*10?1000
0.418 0.126 0.077 0.829 0.503 0.12 0.71 0.506 0.804 0.012
q)med v where "j"$w*1000
36f
q)w iasc v                            / sort w into ascending order of v
0.077 0.418 0.804 0.126 0.506 0.012 0.503 0.12 0.71 0.829
q)0.5 1*(sum;sums)@\:w iasc v         / half the sum and cumulative sums of w
2.0525
0.077 0.495 1.299 1.425 1.931 1.943 2.446 2.566 3.276 4.105
q).[>]0.5 1*(sum;sums)@\:w iasc v     / compared
1111110000b
q)v i sum .[>]0.5 1*(sum;sums)@\:w i:iasc v   / weighted median
36
q)\ts:1000 med v where "j"$w*1000
18 132192
q)\ts:1000 v i sum .[>]0.5 1*(sum;sums)@\:w i:iasc v
2 2576
q)wmed:{x i sum .[>]0.5 1*(sum;sums)@\:y i:iasc x}
```

Some vector techniques worth noticing:

• Applying two functions with Each Left `(sum;sums)@\:` and using Apply `.` and an operator on the result, rather than setting a variable, e.g. `(0.5*sum yi)>sums yi:y i` or defining an inner lambda `{sums[x]<0.5*sum x}y i`
• Grading one list with `iasc` to sort another
• Multiple mappings through juxtaposition: `v i sum ..`
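The same algorithm translates almost line for line into numpy (a sketch, using the sample `v` and `w` from the session above): counting how many cumulative sums fall below half the total weight gives the index of the weighted median in the sorted values.

```python
import numpy as np

# Sample values and weights from the q session above
v = np.array([17, 23, 12, 66, 36, 37, 44, 28, 20, 30])
w = np.array([0.418, 0.126, 0.077, 0.829, 0.503, 0.12,
              0.71, 0.506, 0.804, 0.012])

i = np.argsort(v)                                    # grade: iasc v
wi = w[i]                                            # w in ascending order of v
idx = int(np.sum(0.5 * wi.sum() > np.cumsum(wi)))    # .[>]0.5 1*(sum;sums)@\:
wmed = v[i][idx]                                     # 36
```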

## Python: weighted median algorithm with pandas

If you want to do this in pure pandas, here's a way. It does not interpolate either. (@svenkatesh, you were missing the cumulative sum in your pseudocode)

```python
df.sort_values('impwealth', inplace=True)
cumsum = df.indweight.cumsum()
cutoff = df.indweight.sum() / 2.0
median = df.impwealth[cumsum >= cutoff].iloc[0]
```

This gives a median of 925000.
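A self-contained run of the same steps on made-up data (the column names follow the question; the numbers here are only illustrative):

```python
import pandas as pd

# Hypothetical survey data: wealth values and sampling weights
df = pd.DataFrame({'impwealth': [900000, 925000, 1100000, 300000],
                   'indweight': [1.0, 2.0, 1.0, 1.5]})

df = df.sort_values('impwealth')
cumsum = df['indweight'].cumsum()
cutoff = df['indweight'].sum() / 2.0
# First value whose cumulative weight reaches half the total weight
median = df.loc[cumsum >= cutoff, 'impwealth'].iloc[0]   # 925000
```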

## Calculate median from x, y data R

Without transforming:

```r
lapply(df[,2:3], function(y) median(rep(df$Size, times = y)))
$val1
[1] 49

$val2
[1] 47
```

Data:

```r
set.seed(99)
df <- data.frame(Size = c(1:100),
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))
```
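The same rep-then-median trick carries over directly to Python. A sketch with small made-up counts (not the seeded data above):

```python
import numpy as np

# Hypothetical frequency table: each size was observed `count` times
sizes  = np.array([1, 2, 3])
counts = np.array([1, 2, 3])

# Expand to the raw observations, then take an ordinary median
median = np.median(np.repeat(sizes, counts))   # median of [1, 2, 2, 3, 3, 3]
```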