Pandas Groupby Range of Values

You might be interested in pd.cut:

>>> df.groupby(pd.cut(df["B"], np.arange(0, 1.0 + 0.155, 0.155))).sum()
                      A         B
B
(0, 0.155]     2.775458  0.246394
(0.155, 0.31]  1.123989  0.471618
(0.31, 0.465]  2.051814  1.882763
(0.465, 0.62]  2.277960  1.528492
(0.62, 0.775]  1.577419  2.810723
(0.775, 0.93]  0.535100  1.694955
(0.93, 1.085]       NaN       NaN

[7 rows x 2 columns]
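The original df is not shown, so here is a self-contained sketch of the same pd.cut pattern with made-up data (the seed and column values are assumptions, not from the question); observed=False keeps empty bins in the result:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

# Bin column B into fixed-width intervals, then sum each bin.
bins = np.arange(0, 1.0 + 0.155, 0.155)
out = df.groupby(pd.cut(df["B"], bins), observed=False).sum()
print(out)
```

Because pd.cut returns a Categorical, bins with no rows (like the last interval above) still appear, filled with NaN.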

How to group by a list of value ranges in Python pandas

Use groupby + cut:

bins = [-1, 100, 200, np.inf]
labels = ['0-100', '100-200', 'more than 200']
df = df.groupby(pd.cut(df['value'], bins=bins, labels=labels)).size().reset_index(name='count')
print(df)
           value  count
0          0-100      2
1        100-200      3
2  more than 200      2
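A runnable version of the same idea, with hypothetical values chosen to reproduce the counts above (the data is invented; only the bins/labels technique is from the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two values under 100, three in 100-200, two above 200.
df = pd.DataFrame({"value": [50, 90, 150, 160, 170, 250, 300]})

bins = [-1, 100, 200, np.inf]
labels = ['0-100', '100-200', 'more than 200']
out = (df.groupby(pd.cut(df['value'], bins=bins, labels=labels), observed=False)
         .size()
         .reset_index(name='count'))
print(out)
```

Note the lower edge of -1: pd.cut intervals are open on the left by default, so a bin starting at 0 would otherwise exclude a value of exactly 0.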

Groupby range of numbers in Pandas and extract start and end values

IIUC you can use diff and cumsum to group, then check if the group has more than 1 element:

df["group"] = df["higher_count"].diff().ne(1).cumsum()
print (df.loc[df.groupby("group")["higher_count"].transform(len)>1]
.rename_axis("date")
.reset_index()
.groupby("group")[["date", "price"]].agg(["first", "last"]))

date price
first last first last
group
2 2020-03-19 01:00:00 2020-03-19 04:00:00 8 11
3 2020-03-19 05:00:00 2020-03-19 08:00:00 6 9
6 2020-03-19 11:00:00 2020-03-19 13:00:00 9 11
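Since the question's frame is not shown, here is a small invented frame that exercises the same diff/ne/cumsum trick (all the data below is an assumption; only the technique is from the answer). A new group starts whenever higher_count fails to increase by exactly 1:

```python
import pandas as pd

# Hypothetical frame: hourly timestamps, a running counter, and a price.
idx = pd.date_range("2020-03-19 00:00", periods=8, freq="h")
df = pd.DataFrame({"higher_count": [1, 1, 2, 3, 4, 1, 2, 3],
                   "price": [5, 8, 9, 10, 11, 6, 7, 9]}, index=idx)

# diff() != 1 marks the start of each consecutive run; cumsum() labels the runs.
df["group"] = df["higher_count"].diff().ne(1).cumsum()

# Keep only runs longer than one row, then grab each run's endpoints.
out = (df.loc[df.groupby("group")["higher_count"].transform(len) > 1]
         .rename_axis("date")
         .reset_index()
         .groupby("group")[["date", "price"]]
         .agg(["first", "last"]))
print(out)
```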

Pandas groupby range of values when range is unknown

Would something like this work? (You should modify the print call to write to a file.)

thresh = 10
s = df.groupby('range')['pos1'].diff().gt(thresh).cumsum()

for (r, g), d in df.groupby(['range', s])['pos1']:
    print(r, list(d))

Output:

range1 [1, 2, 3, 4]
range1 [100, 101, 102, 104, 107, 108]
range1 [207, 208, 209, 210]
range2 [10, 11, 12]
range2 [50, 51, 52, 54, 55]
range3 [50, 51, 52, 53]
range3 [107, 108, 109, 110, 111, 112, 113]
range3 [800, 802, 803, 804, 805]
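A self-contained sketch of the gap-threshold grouping, using invented data that reproduces the first three output lines above (the frame contents are assumptions; the diff/gt/cumsum pattern is the answer's):

```python
import pandas as pd

# Hypothetical frame: positions within named ranges, with large gaps
# marking where one run of positions ends and the next begins.
df = pd.DataFrame({
    "range": ["range1"] * 10 + ["range2"] * 3,
    "pos1":  [1, 2, 3, 4, 100, 101, 102, 104, 107, 108, 10, 11, 12],
})

thresh = 10
# Within each 'range', start a new run whenever the gap between
# consecutive pos1 values exceeds the threshold.
s = df.groupby('range')['pos1'].diff().gt(thresh).cumsum()

for (r, g), d in df.groupby(['range', s])['pos1']:
    print(r, list(d))
```

Grouping by the list ['range', s] mixes a column label with an external Series, which pandas aligns on the index.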

How to group by a range of values in pandas?

I am guessing the OP wants to group by categorical variables, then by a numeric variable binned into intervals. In that case you can use np.digitize().

smallest = np.min(df['strike'])
largest = np.max(df['strike'])
num_edges = 3
# np.digitize(input_array, bin_edges)
ind = np.digitize(df['strike'], np.linspace(smallest, largest, num_edges))

then ind should be

array([1, 1, 2, 2, 2, 2, 3], dtype=int64)

which corresponds to binning

 [10, 10, 12, 13, 12, 13, 14]

with bin edges

array([ 10.,  12.,  14.]) # == np.linspace(smallest, largest, num_edges)

Finally, group by all the columns you want, plus this additional bin column:

df['binned_strike'] = ind
for grp in df.groupby(['symbol', 'serie', 'binned_strike']):
    print("group key")
    print(grp[0])
    print("group content")
    print(grp[1])
    print("=============")

This should print

group key
('IP', 'A', 1)
group content
   last  price serie  strike symbol  type  binned_strike
0   1.0     11     A      10     IP  call              1
=============
group key
('IP', 'A', 2)
group content
   last  price serie  strike symbol  type  binned_strike
2   2.5     11     A      12     IP   put              2
4   4.5     11     A      12     IP  call              2
=============
group key
('IP', 'B', 1)
group content
   last  price serie  strike symbol  type  binned_strike
1   2.0     11     B      10     IP   put              1
=============
group key
('IP', 'B', 2)
group content
   last  price serie  strike symbol  type  binned_strike
3   3.0     11     B      13     IP   put              2
5   5.0     11     B      13     IP   put              2
=============
group key
('IP', 'B', 3)
group content
   last  price serie  strike symbol  type  binned_strike
6   6.0     11     B      14     IP  call              3
=============
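Putting the pieces together, here is a runnable sketch with data reconstructed from the printed groups above (the exact frame is an assumption inferred from the output; the np.digitize binning is the answer's technique):

```python
import numpy as np
import pandas as pd

# Hypothetical option data matching the groups printed above.
df = pd.DataFrame({
    "last":   [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 6.0],
    "price":  [11] * 7,
    "serie":  list("ABABABB"),
    "strike": [10, 10, 12, 13, 12, 13, 14],
    "symbol": ["IP"] * 7,
    "type":   ["call", "put", "put", "put", "call", "put", "call"],
})

# Equal-width bin edges over the strike range, then assign each strike a bin.
edges = np.linspace(df["strike"].min(), df["strike"].max(), 3)
df["binned_strike"] = np.digitize(df["strike"], edges)

for key, grp in df.groupby(["symbol", "serie", "binned_strike"]):
    print("group key", key)
    print(grp)
```

np.digitize returns 1-based bin indices here because every strike is at or above the first edge; values equal to the last edge land in an extra overflow bin (index 3).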

Pandas: Group by column and count values in range in another column and add that count to a new column

You could create a binning column indicating whether each position is between 0 and 10, then use a pivot_table with aggfunc set to count:

df['threshold'] = np.where(df['position'].between(0,10),'within 10','outside of 10')
df.pivot_table(index='page', columns='threshold', values='position', aggfunc='count',fill_value=0)

prints:

threshold  outside of 10  within 10
page
url/1                  0          1
url/2                  2          1
url/3                  0          1
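A self-contained version with invented data chosen to reproduce the pivot above (the page/position values are assumptions; the np.where + pivot_table pattern is the answer's):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one position per row for each page URL.
df = pd.DataFrame({
    "page":     ["url/1", "url/2", "url/2", "url/2", "url/3"],
    "position": [3, 5, 12, 30, 7],
})

# Label each row by whether its position falls in [0, 10].
df['threshold'] = np.where(df['position'].between(0, 10),
                           'within 10', 'outside of 10')

# Count rows per page and threshold label; fill_value=0 covers empty cells.
out = df.pivot_table(index='page', columns='threshold',
                     values='position', aggfunc='count', fill_value=0)
print(out)
```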

