## Is there a weighted.median() function?

The following packages all have a function to calculate a weighted median: 'aroma.light', 'isotone', 'limma', 'cwhmisc', 'ergm', 'laeken', 'matrixStats', 'PSCBS', and 'bigvis' (on GitHub).

To find them I used the invaluable `findFn()` in the 'sos' package, which is an extension of R's built-in help.

`findFn('weighted median')`

Or,

`???'weighted median'`

since `???` is a shortcut in the same way that `?some.function` is for `help(some.function)`.

## weighted median in spatstat package

I believe this is a flaw in the package, and I'll explain why.

Firstly, `weighted.median` actually just calls `weighted.quantile` with the `probs` vector set to `0.5`. But if you call `weighted.quantile` with your data, you get very strange results:

```r
weighted.quantile(x, w)
#>    0%   25%   50%   75%  100%
#> 10.00 10.00 10.50 11.25 12.00
```

That's not right.

If you look at the body of this function using `body(weighted.quantile)` and follow the logic through, there seems to be a problem with the way the weights are normalized on line 10 into a variable called `Fx`. To work properly, the normalized weights should be a vector of the same length as `x`, but starting at 0 and ending at 1, with the spacing in between proportional to the weights.
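To illustrate the property described above, here is a minimal Python sketch (toy weights, not spatstat's code) comparing the plain `cumsum(w)/sum(w)` normalization with the corrected form used later in this answer:

```python
import numpy as np

# Toy weights, assumed already sorted into the order of x.
w = np.array([1.0, 2.0, 1.0])

# The problematic normalization: the first element is w[0]/sum(w), not 0.
fx_buggy = np.cumsum(w) / np.sum(w)            # starts at 0.25, not 0

# The corrected expression (mirrors the R fix): runs from 0 to 1,
# with spacing in between proportional to the weights.
fx_fixed = (np.cumsum(w) - w.min()) / (np.sum(w) - w.min())

print(fx_buggy)   # first element is above 0
print(fx_fixed)   # starts at 0, ends at 1
```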

But if you look at how this is actually calculated:

```r
body(weighted.quantile)[[10]]
#> Fx <- cumsum(w)/sum(w)
```

You can see it doesn't start at 0. In your case, the first element would be 0.3333.

So to show this is the case, let's overwrite this line with the correct expression. (First we need to unlock the binding to get write access to the function.)

```r
unlockBinding("weighted.quantile", asNamespace("spatstat"))
body(weighted.quantile)[[10]] <- substitute(Fx <- (cumsum(w) - min(w))/(sum(w) - min(w)))
```

Now we get the correct result for weighted quantiles (including the correct median):

```r
weighted.quantile(x, w)
#>   0%  25%  50%  75% 100%
#> 10.0 10.5 11.0 11.5 12.0
```

## Python: define function to get the weighted median

Try to stack only one level:

```python
wmedian = lambda x: x.loc[x['weight'].cumsum().gt(0.5), 'close'].head(1)

out = df1.stack(level=0).groupby(level=0).apply(wmedian) \
         .reset_index(level=[1, 2], drop=True)
```

Output:

```python
>>> out
01-01-2020    23
01-02-2020    21
01-03-2020    44
Name: close, dtype: int64

>>> df1.stack(level=0)
              close  weight
01-01-2020 A     10     0.1
           B     20     0.2
           C     23     0.3
           D     45     0.5
01-02-2020 A     12     0.3
           B     19     0.1
           C     21     0.4
           D     47     0.2
01-03-2020 A     15     0.1
           B     29     0.2
           C      4     0.1
           D     44     0.6
```
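The cutoff logic can be hand-checked on the first date's rows from the stacked frame above (a sketch; it assumes, as the lambda does, that each group's weights roughly sum to 1, so 0.5 serves as the half-weight cutoff):

```python
import pandas as pd

# The 01-01-2020 group from the stacked frame shown above.
day = pd.DataFrame({'close':  [10, 20, 23, 45],
                    'weight': [0.1, 0.2, 0.3, 0.5]})

# Cumulative weights are 0.1, 0.3, 0.6, 1.1; the first row past 0.5 is C.
picked = day.loc[day['weight'].cumsum().gt(0.5), 'close'].head(1)
print(picked.iloc[0])  # 23
```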

## How to calculate weighted mean and median in python?

First, install the weightedstats library in python.

```shell
pip install weightedstats
```

Then do the following:

**Weighted Mean**

```python
import weightedstats as ws

ws.weighted_mean(state['Murder.Rate'], weights=state['Population'])
# 4.445833981123394
```

**Weighted Median**

```python
ws.weighted_median(state['Murder.Rate'], weights=state['Population'])
# 4.4
```

It also has special weighted mean and median methods for use with numpy arrays. The methods above will still work, but these are available in case you need them.

```python
my_data = [1, 2, 3, 4, 5]
my_weights = [10, 1, 1, 1, 9]

ws.numpy_weighted_mean(my_data, weights=my_weights)
ws.numpy_weighted_median(my_data, weights=my_weights)
```
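For the weighted mean specifically, plain NumPy already covers this via `np.average` (a standard alternative; the weightedstats library is mainly useful here for the median):

```python
import numpy as np

my_data = [1, 2, 3, 4, 5]
my_weights = [10, 1, 1, 1, 9]

# Weighted mean: (1*10 + 2 + 3 + 4 + 5*9) / (10+1+1+1+9) = 64/22
print(np.average(my_data, weights=my_weights))  # ≈ 2.909
```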

## KDB: weighted median

For values `v` and weights `w`, `med v where w` gobbles space for larger values of `w`.

Instead, sort `w` into ascending order of `v` and look for where the cumulative sums reach half their sum.

```q
q)show v:10?100
17 23 12 66 36 37 44 28 20 30
q)show w:.001*10?1000
0.418 0.126 0.077 0.829 0.503 0.12 0.71 0.506 0.804 0.012
q)med v where "j"$w*1000
36f
q)w iasc v                                    / sort w into ascending order of v
0.077 0.418 0.804 0.126 0.506 0.012 0.503 0.12 0.71 0.829
q)0.5 1*(sum;sums)@\:w iasc v                 / half the sum and cumulative sums of w
2.0525
0.077 0.495 1.299 1.425 1.931 1.943 2.446 2.566 3.276 4.105
q).[>]0.5 1*(sum;sums)@\:w iasc v             / compared
1111110000b
q)v i sum .[>]0.5 1*(sum;sums)@\:w i:iasc v   / weighted median
36
q)\ts:1000 med v where "j"$w*1000
18 132192
q)\ts:1000 v i sum .[>]0.5 1*(sum;sums)@\:w i:iasc v
2 2576
q)wmed:{x i sum .[>]0.5 1*(sum;sums)@\:y i:iasc x}
```

Some vector techniques worth noticing:

- Applying two functions with Each Left, `(sum;sums)@\:`, and using Apply `.` and an operator on the result, rather than setting a variable, e.g. `(0.5*sum yi)>sums yi:y i`, or defining an inner lambda, `{sums[x]<0.5*sum x}y i`
- Grading one list with `iasc` to sort another
- Multiple mappings through juxtaposition: `v i sum ..`
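The same sort-and-cumulate approach translates directly to Python; here is a minimal NumPy sketch of the logic (an assumed translation, not the q code itself):

```python
import numpy as np

def wmed(v, w):
    """Weighted median: sort w into ascending order of v, then count
    how many cumulative sums fall below half the total weight."""
    i = np.argsort(v)
    wi = np.asarray(w, dtype=float)[i]
    below = np.cumsum(wi) < 0.5 * wi.sum()   # True while below the half-sum
    return np.asarray(v)[i][below.sum()]

# Same data as the q session above.
v = [17, 23, 12, 66, 36, 37, 44, 28, 20, 30]
w = [0.418, 0.126, 0.077, 0.829, 0.503, 0.12, 0.71, 0.506, 0.804, 0.012]
print(wmed(v, w))  # 36
```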

## Python: weighted median algorithm with pandas

If you want to do this in pure pandas, here's a way. It does not interpolate either. (@svenkatesh, you were missing the cumulative sum in your pseudocode)

```python
df.sort_values('impwealth', inplace=True)
cumsum = df.indweight.cumsum()
cutoff = df.indweight.sum() / 2.0
median = df.impwealth[cumsum >= cutoff].iloc[0]
```

This gives a median of 925000.
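The four steps above can be wrapped into a reusable helper (a sketch; the column names `impwealth` and `indweight` come from the question's data, and the toy frame here is made up for demonstration):

```python
import pandas as pd

def weighted_median(df, value_col, weight_col):
    # Sort by value, accumulate the weights, and take the first value
    # whose cumulative weight reaches half the total weight.
    df = df.sort_values(value_col)
    cumsum = df[weight_col].cumsum()
    cutoff = df[weight_col].sum() / 2.0
    return df[value_col][cumsum >= cutoff].iloc[0]

# Toy data: cumulative weights after sorting are 1, 3, 4; cutoff is 2.
toy = pd.DataFrame({'impwealth': [30, 10, 20], 'indweight': [1, 1, 2]})
print(weighted_median(toy, 'impwealth', 'indweight'))  # 20
```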

## Calculate median from x, y data R

Without transforming:

```r
lapply(df[,2:3], function(y) median(rep(df$Size, times = y)))
#> $val1
#> [1] 49
#>
#> $val2
#> [1] 47
```

data:

```r
set.seed(99)
df <- data.frame(Size = c(1:100),
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))
```
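The same rep-and-take-the-median trick works in Python via `np.repeat` (a sketch with made-up toy counts, not the question's data):

```python
import numpy as np

# Each Size value is repeated according to its count column, then the
# plain median of the expanded vector is the frequency-weighted median.
size = np.array([1, 2, 3, 4])
val1 = np.array([0, 3, 1, 2])   # toy counts (assumed)

expanded = np.repeat(size, val1)    # -> [2, 2, 2, 3, 4, 4]
print(np.median(expanded))         # 2.5
```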
