# Is There a Weighted.Median() Function

## Is there a weighted.median() function?

The following packages all have a function to calculate a weighted median: 'aroma.light', 'isotone', 'limma', 'cwhmisc', 'ergm', 'laeken', 'matrixStats', 'PSCBS', and 'bigvis' (on GitHub).

To find them I used the invaluable `findFn()` in the 'sos' package, which is an extension of R's built-in help.

```r
findFn('weighted median')
```

Or,

```r
???'weighted median'
```

since `???` is a shortcut for `findFn()`, in the same way that `?some.function` is shorthand for `help(some.function)`.

## weighted median in spatstat package

I believe this is a flaw in the package, and I'll explain why.

Firstly, `weighted.median` actually just calls `weighted.quantile` with the `probs` vector set to `0.5`. But if you call `weighted.quantile` with your data, you get very strange results:

```r
weighted.quantile(x, w)
#>    0%   25%   50%   75%  100% 
#> 10.00 10.00 10.50 11.25 12.00
```

That's not right.

If you look at the body of this function using `body(weighted.quantile)`, and follow the logic through, there seems to be a problem with the way the weights are normalized on line 10 into a variable called `Fx`. To work properly, the normalized weights should be a vector of the same length as `x`, but starting at 0 and ending in 1, with the spacing in between being proportional to the weights.

But if you look at how this is actually calculated:

```r
body(weighted.quantile)[[10]]
#> Fx <- cumsum(w)/sum(w)
```

You can see it doesn't start at 0. In your case, the first element would be 0.3333.
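A quick numeric sketch (in Python, with three equally weighted points as in the question) makes the difference between the two normalizations concrete:

```python
import numpy as np

# Three equally weighted points, as in the question
w = np.array([1.0, 1.0, 1.0])

buggy = np.cumsum(w) / w.sum()                          # starts at 0.333, not 0
fixed = (np.cumsum(w) - w.min()) / (w.sum() - w.min())  # runs from 0 to 1
```

The buggy version's first element is 0.3333 rather than 0, while the fixed version starts at 0 and ends at 1 as required.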

So to show this is the case, let's overwrite this line with the correct expression. (First we need to unlock the binding to gain access to the function.)

```r
unlockBinding("weighted.quantile", asNamespace("spatstat"))
body(weighted.quantile)[[10]] <- substitute(Fx <- (cumsum(w) - min(w))/(sum(w) - min(w)))
```

Now we get the correct result for weighted quantiles (including the correct median):

```r
weighted.quantile(x, w)
#>   0%  25%  50%  75% 100% 
#> 10.0 10.5 11.0 11.5 12.0
```
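For reference, the same corrected computation can be sketched in a few lines of Python. This assumes the data implied by the output above (three points 10, 11, 12 with equal weights) and linear interpolation along the normalized cumulative weights; it is an illustration of the fix, not spatstat's actual implementation:

```python
import numpy as np

# Assumed data: three points with equal weights, as implied by the output above
x = np.array([10.0, 11.0, 12.0])
w = np.array([1.0, 1.0, 1.0])

order = np.argsort(x)            # weighted quantiles need x in ascending order
x, w = x[order], w[order]

# Corrected normalization: runs from 0 to 1
Fx = (np.cumsum(w) - w.min()) / (w.sum() - w.min())

probs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
quantiles = np.interp(probs, Fx, x)   # 10.0, 10.5, 11.0, 11.5, 12.0
```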

## Python: define function to get the weighted median

Try to stack only one level:

```python
wmedian = lambda x: x.loc[x['weight'].cumsum().gt(0.5), 'close'].head(1)
out = df1.stack(level=0).groupby(level=0).apply(wmedian) \
         .reset_index(level=[1, 2], drop=True)
```

Output:

```
>>> out
01-01-2020    23
01-02-2020    21
01-03-2020    44
Name: close, dtype: int64

>>> df1.stack(level=0)
              close  weight
01-01-2020 A     10     0.1
           B     20     0.2
           C     23     0.3
           D     45     0.5
01-02-2020 A     12     0.3
           B     19     0.1
           C     21     0.4
           D     47     0.2
01-03-2020 A     15     0.1
           B     29     0.2
           C      4     0.1
           D     44     0.6
```
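The core of `wmedian` is easier to see on a single group. A minimal sketch, using the 01-02-2020 group from the frame above, and assuming (as holds here) that rows are already sorted by `close` and the group's weights sum to 1:

```python
import pandas as pd

# One group from the frame above (01-02-2020); weights sum to 1
g = pd.DataFrame({'close':  [12, 19, 21, 47],
                  'weight': [0.3, 0.1, 0.4, 0.2]})

# The first row whose cumulative weight exceeds 0.5 holds the weighted median
wmedian = g.loc[g['weight'].cumsum().gt(0.5), 'close'].iloc[0]   # 21
```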

## How to calculate weighted mean and median in python?

First, install the weightedstats library:

```
pip install weightedstats
```

Then do the following.

Weighted Mean

```python
import weightedstats as ws

ws.weighted_mean(state['Murder.Rate'], weights=state['Population'])
# 4.445833981123394
```

Weighted Median

```python
ws.weighted_median(state['Murder.Rate'], weights=state['Population'])
# 4.4
```

It also has special weighted mean and median methods for use with numpy arrays. The methods above will still work, but these are available in case you need them:

```python
my_data = [1, 2, 3, 4, 5]
my_weights = [10, 1, 1, 1, 9]
ws.numpy_weighted_mean(my_data, weights=my_weights)
ws.numpy_weighted_median(my_data, weights=my_weights)
```
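If you would rather avoid the dependency, a lower weighted median can be sketched in plain Python. This follows the common "first value whose cumulative weight reaches half the total" convention, which may differ from weightedstats' tie handling:

```python
def weighted_median(data, weights):
    """Lower weighted median: first value (in ascending order) whose
    cumulative weight reaches half the total weight."""
    pairs = sorted(zip(data, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value

weighted_median([1, 2, 3, 4, 5], [10, 1, 1, 1, 9])  # 2
```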

## KDB: weighted median

For values `v` and weights `w`, `med v where w` gobbles space for larger values of `w`.

Instead, sort `w` into ascending order of `v` and look for where cumulative sums reach half their sum.

```q
q)show v:10?100
17 23 12 66 36 37 44 28 20 30
q)show w:.001*10?1000
0.418 0.126 0.077 0.829 0.503 0.12 0.71 0.506 0.804 0.012
q)med v where "j"$w*1000
36f
q)w iasc v                            / sort w into ascending order of v
0.077 0.418 0.804 0.126 0.506 0.012 0.503 0.12 0.71 0.829
q)0.5 1*(sum;sums)@\:w iasc v         / half the sum and cumulative sums of w
2.0525
0.077 0.495 1.299 1.425 1.931 1.943 2.446 2.566 3.276 4.105
q).[>]0.5 1*(sum;sums)@\:w iasc v     / compared
1111110000b
q)v i sum .[>]0.5 1*(sum;sums)@\:w i:iasc v   / weighted median
36
q)\ts:1000 med v where "j"$w*1000
18 132192
q)\ts:1000 v i sum .[>]0.5 1*(sum;sums)@\:w i:iasc v
2 2576
q)wmed:{x i sum .[>]0.5 1*(sum;sums)@\:y i:iasc x}
```

Some vector techniques worth noticing:

• Applying two functions with Each Left `(sum;sums)@\:` and using Apply `.` and an operator on the result, rather than setting a variable, e.g. `(0.5*sum yi)>sums yi:y i` or defining an inner lambda `{sums[x]<0.5*sum x}y i`
• Grading one list with `iasc` to sort another
• Multiple mappings through juxtaposition: `v i sum ..`
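The same algorithm translates almost line for line into numpy (a sketch, using the sample `v` and `w` from the session above): counting how many cumulative sums fall below half the total weight gives the index of the weighted median in the sorted values.

```python
import numpy as np

# Sample values and weights from the q session above
v = np.array([17, 23, 12, 66, 36, 37, 44, 28, 20, 30])
w = np.array([0.418, 0.126, 0.077, 0.829, 0.503, 0.12,
              0.71, 0.506, 0.804, 0.012])

i = np.argsort(v)                                    # grade: iasc v
wi = w[i]                                            # w in ascending order of v
idx = int(np.sum(0.5 * wi.sum() > np.cumsum(wi)))    # .[>]0.5 1*(sum;sums)@\:
wmed = v[i][idx]                                     # 36
```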

## Python: weighted median algorithm with pandas

If you want to do this in pure pandas, here's a way. It does not interpolate either. (@svenkatesh, you were missing the cumulative sum in your pseudocode)

```python
df.sort_values('impwealth', inplace=True)
cumsum = df.indweight.cumsum()
cutoff = df.indweight.sum() / 2.0
median = df.impwealth[cumsum >= cutoff].iloc[0]
```

This gives a median of 925000.
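A self-contained run of the same steps on made-up data (the column names follow the question; the numbers here are only illustrative):

```python
import pandas as pd

# Hypothetical survey data: wealth values and sampling weights
df = pd.DataFrame({'impwealth': [900000, 925000, 1100000, 300000],
                   'indweight': [1.0, 2.0, 1.0, 1.5]})

df = df.sort_values('impwealth')
cumsum = df['indweight'].cumsum()
cutoff = df['indweight'].sum() / 2.0
# First value whose cumulative weight reaches half the total weight
median = df.loc[cumsum >= cutoff, 'impwealth'].iloc[0]   # 925000
```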

## Calculate median from x, y data R

Without transforming:

```r
lapply(df[,2:3], function(y) median(rep(df$Size, times = y)))
$val1
[1] 49

$val2
[1] 47
```

Data:

```r
set.seed(99)
df <- data.frame(Size = c(1:100),
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))
```
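The same rep-then-median trick carries over directly to Python. A sketch with small made-up counts (not the seeded data above):

```python
import numpy as np

# Hypothetical frequency table: each size was observed `count` times
sizes  = np.array([1, 2, 3])
counts = np.array([1, 2, 3])

# Expand to the raw observations, then take an ordinary median
median = np.median(np.repeat(sizes, counts))   # median of [1, 2, 2, 3, 3, 3]
```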