Percentiles from Histogram Data

Calculated Percentile from Histogram Data Element in Anylogic

While I cant confirm if your logic is correct there are a few basic mistakes in your code

  1. The input argument type for histoName should be HistogramSmartData not object. That is why "the methods don't get recognized"

You can verify the type of object by using code complete - see example below how I verified that a histogram data object that I dragged from the palette is of type HistogramSmartData. After that you can lookup HistogramSmartData in the help.

Sample Image


  1. You used a do while syntax for your while loop - not how it is used in java ;-)

It is either

do {
// Do stuff
} while (condition);

or

while (condition) {
// Do stuff
}

The code below works.

Sample Image

PS - screenshots are awesome - but code is better - if it was not for the text recognition on iPhones I used to convert your image to text I would not have answered this question ;-)

int index = 0;
int indexStart = roundToInt(histoName.getXMin());
int indexStop = roundToInt(histoName.getXMax());
double temp1 = 0;
double temp2 = 0;
double size = histoName.getIntervalWidth();
double result;

while (temp2 < percentile) {
index = indexStart;
temp1 = histoName.getCDF(index);
temp2 = histoName.getCDF(index+1);
index++;
}

result = index * (percentile-temp1) /(temp2-temp1)+(index-1)*(temp2-percentile)/(temp2-temp1);
return result;

percentiles from histogram data

First you need to unpivot this. We can do that like this...

SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades

name | array
-------+-----------
arun | {1,4,2,1}
neha | {3,2,1,4}
ram | {1,1,3,0}
radha | {0,3,1,4}

Then we need to index into grades... We do that with a CROSS JOIN LATERAL. We have 4 rows with an array of 4. We want 4*4 rows.

SELECT name, grades, gs1.x, grades[gs1.x] AS gradeqty
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
ORDER BY name, x;

name | grades | x | gradeqty
-------+-----------+---+----------
arun | {1,4,2,1} | 1 | 1
arun | {1,4,2,1} | 2 | 4
arun | {1,4,2,1} | 3 | 2
arun | {1,4,2,1} | 4 | 1
neha | {3,2,1,4} | 1 | 3
neha | {3,2,1,4} | 2 | 2
neha | {3,2,1,4} | 3 | 1
neha | {3,2,1,4} | 4 | 4
radha | {0,3,1,4} | 1 | 0
radha | {0,3,1,4} | 2 | 3
radha | {0,3,1,4} | 3 | 1
radha | {0,3,1,4} | 4 | 4
ram | {1,1,3,0} | 1 | 1
ram | {1,1,3,0} | 2 | 1
ram | {1,1,3,0} | 3 | 3
ram | {1,1,3,0} | 4 | 0
(16 rows)

Now what remains, is we need to CROSS JOIN LATERAL again to reproduce x (our grade), over gradeqty

SELECT name,
gs1.x
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
ORDER BY name, gs1.x;

name | x
-------+---
arun | 1
arun | 2
arun | 2
arun | 2
arun | 2
arun | 3
arun | 3
arun | 4
neha | 1
neha | 1
neha | 1
neha | 2
neha | 2
neha | 3
neha | 4
neha | 4
neha | 4
neha | 4
radha | 2
radha | 2
radha | 2
radha | 3
radha | 4
radha | 4
radha | 4
radha | 4
ram | 1
ram | 2
ram | 3
ram | 3
ram | 3
(31 rows)

Now we GROUP BY name and then we use an Ordered-Set Aggregate Functions percent_disc to finish the job..

SELECT name, percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
GROUP BY name ORDER BY name;

name | percentile_disc
-------+-----------------
arun | 2
neha | 2
radha | 3
ram | 3
(4 rows)

Want to go into it further and make it pretty...

SELECT name, (ARRAY['Poor', 'Fair', 'Good', 'Very Good'])[percentile_disc(0.5) WITHIN GROUP (ORDER BY gs1.x)]
FROM (
SELECT name,
ARRAY[grade_poor, grade_fair, grade_good, grade_vgood]
FROM grades
) AS t(name, grades)
CROSS JOIN LATERAL generate_series(1,4) AS gs1(x)
CROSS JOIN LATERAL generate_series(1,grades[gs1.x]) AS gs2(x)
GROUP BY name
ORDER BY name;

name | array
-------+-------
arun | Fair
neha | Fair
radha | Good
ram | Good
(4 rows)

We can get a slightly more varied out put if we jack up a new user.

INSERT INTO grades (name,grade_poor,grade_fair,grade_good,grade_vgood)
VALUES ('Bob', 0,0,0,100);

name | array
-------+-----------
arun | Fair
Bob | Very Good
neha | Fair
radha | Good
ram | Good
(5 rows)

How do I calculate percentiles with python/numpy?

You might be interested in the SciPy Stats package. It has the percentile function you're after and many other statistical goodies.

percentile() is available in numpy too.

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50) # return 50th percentile, e.g median.
print p
3.0

This ticket leads me to believe they won't be integrating percentile() into numpy anytime soon.



Related Topics



Leave a reply



Submit