Pandas: How to Use pd.cut()

Pandas how to use pd.cut()

import pandas as pd

test = pd.DataFrame({'days': [0, 31, 45]})
test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print(test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]

See the difference between include_lowest=True, right=False and the default behaviour:

test = pd.DataFrame({'days': [0,20,30,31,45,60]})

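# include_lowest=True: the first bin also contains its left edge, so 0 gets a bin
test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
# right=False: bins are closed on the left, so 30 is in the [30, 60) group and 60 is NaN
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
# default: bins are closed on the right, so 30 is in the (0, 30] group and 0 is NaN
test['range3'] = pd.cut(test.days, [0,30,60])
print(test)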
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]
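As a compact check of the table above, the following sketch looks only at the three edge values 0, 30 and 60 and reports which of them end up without a bin (NaN) under each setting. The variable names are illustrative only.

```python
import pandas as pd

days = pd.Series([0, 30, 60])
edges = [0, 30, 60]

# Default: right-closed bins (0, 30], (30, 60]; the lowest edge 0 is left out.
default = pd.cut(days, edges)
# include_lowest=True: the first bin also takes the lowest edge.
lowest = pd.cut(days, edges, include_lowest=True)
# right=False: left-closed bins [0, 30), [30, 60); the highest edge 60 is left out.
left_closed = pd.cut(days, edges, right=False)

print(default.isna().tolist())      # [True, False, False]
print(lowest.isna().tolist())       # [False, False, False]
print(left_closed.isna().tolist())  # [False, False, True]
```

Which variant is appropriate depends on whether the boundary value should belong to the lower or the upper bin.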

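Or use numpy.searchsorted. The array of bin edges has to be sorted; the values of days do not:

import numpy as np

test = pd.DataFrame({'days': [0,20,30,31,45,60]})
arr = np.array([0,30,60])
test['range1'] = arr.searchsorted(test.days)                    # index of first edge >= value
test['range2'] = arr.searchsorted(test.days, side='right') - 1  # index of last edge <= value
print(test)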
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
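For searchsorted, only the edge array needs to be sorted. The sketch below uses deliberately unsorted values (hypothetical data) and compares the side='right' result with the integer codes of pd.cut using right=False. The two agree for values that fall inside the edges.

```python
import numpy as np
import pandas as pd

edges = np.array([0, 30, 60])        # must be sorted ascending
days = pd.Series([45, 0, 30, 20])    # deliberately unsorted

# Index of the last edge that is <= each value.
idx = edges.searchsorted(days, side='right') - 1

# Same grouping expressed as left-closed bins [0, 30), [30, 60).
codes = pd.cut(days, edges, right=False).cat.codes

print(idx.tolist())    # [1, 0, 1, 0]
print(codes.tolist())  # [1, 0, 1, 0]
```

Values outside the edges behave differently: pd.cut gives NaN (code -1), while searchsorted still returns an index.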

Python pandas.cut()

The cut method raises a TypeError if you pass an array of strings instead of numbers. The solution I suggest is to convert the array to a list so that you can handle the different values yourself. In this case you can replace 'nan' and 'Value' with a negative number using a list comprehension. You can then call pd.cut on the list and label the bins.

import numpy as np
import pandas as pd

a = np.array(['10','8', '15', '20','21','22', '27', '28', 'nan', '30', '32', '33', 'Value'])
# Non-numeric strings ('nan', 'Value') become -1
a_list = [int(i) if i.isdigit() else -1 for i in a]
bins = pd.IntervalIndex.from_tuples([(-np.inf, 0), (0, 10), (10, 32), (32, np.inf)])
lab = ['Not a Value', '10 and below', '11 - 32', '33 and above']
a_cut = pd.cut(a_list, bins)
# Replace the interval categories with readable labels
a_cut = a_cut.rename_categories(lab)
print(a_cut.value_counts())
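A different route, not the one used in the answer above, is to let pandas do the conversion with pd.to_numeric(errors='coerce'), which turns unparseable strings into NaN, and then label the NaN rows afterwards. This is a minimal sketch on a shortened, hypothetical version of the array.

```python
import numpy as np
import pandas as pd

raw = pd.Series(['10', '8', '15', '33', 'nan', 'Value'])

# Anything that cannot be parsed as a number becomes NaN.
nums = pd.to_numeric(raw, errors='coerce')

binned = pd.cut(nums, bins=[0, 10, 32, np.inf],
                labels=['10 and below', '11 - 32', '33 and above'])

# NaN rows fall into no bin; register an extra category and fill them with it.
binned = binned.cat.add_categories(['Not a Value']).fillna('Not a Value')
print(binned.value_counts(sort=False))
```

This avoids the sentinel value -1, at the cost of one extra step to name the missing rows.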

Binning in Pandas Cut

Add np.inf to the end of your bin list, so that values above the last finite edge still fall into a bin:

pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000,np.inf])
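To see the effect, here is a small sketch with made-up totals and a shortened bin list. Without np.inf the value above the last edge becomes NaN; with it, the value lands in an open-ended top bin.

```python
import numpy as np
import pandas as pd

totals = pd.Series([50, 1500, 2500])   # hypothetical values

capped = pd.cut(totals, bins=[0, 100, 1000, 2000])
open_ended = pd.cut(totals, bins=[0, 100, 1000, 2000, np.inf])

print(capped.isna().tolist())      # [False, False, True]
print(open_ended.isna().tolist())  # [False, False, False]
```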

How to use pd.cut() across columns of a data frame?

Use apply:

df.apply(pd.cut, bins=[0,0.5,1])

By default (axis=0) the function is applied to each column; pass axis=1 to apply it to each row instead.
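A minimal sketch of the apply approach on a made-up two-column frame. Extra keyword arguments to apply are forwarded to pd.cut, so labels=False can be passed the same way to get integer bin numbers instead of intervals.

```python
import pandas as pd

df = pd.DataFrame({'x': [0.1, 0.7], 'y': [0.9, 0.4]})   # hypothetical data

# Each column is binned independently with the same edges.
intervals = df.apply(pd.cut, bins=[0, 0.5, 1])

# Same call, but returning the bin number for each value.
codes = df.apply(pd.cut, bins=[0, 0.5, 1], labels=False)

print(intervals)
print(codes)
```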

How to give a label in pandas.cut() when a value does not fall within any bin

Use cat.add_categories with Series.fillna:

binned_out = pd.cut(df['a'], bins=bins, labels=labels).cat.add_categories([0]).fillna(0)
print (binned_out)
0    2
1    0
2    5
3    3
4    1
5    6
Name: a, dtype: category
Categories (7, int64): [1 < 2 < 3 < 4 < 5 < 6 < 0]
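The df, bins and labels used above are not shown in the answer. The sketch below substitutes small hypothetical ones to make the pattern reproducible. The fill value has to be added as a category first, because fillna on categorical data only accepts existing categories.

```python
import pandas as pd

# Hypothetical data, bins and labels, chosen only for illustration.
df = pd.DataFrame({'a': [5, 150, 45]})
bins = [0, 10, 50, 100]
labels = [1, 2, 3]

binned = pd.cut(df['a'], bins=bins, labels=labels)

# 150 lies outside every bin and is NaN; register 0 as a category, then fill.
filled = binned.cat.add_categories([0]).fillna(0)
print(filled.tolist())   # [1, 0, 2]
```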

Pandas: replace tuple-like values from pd.cut with an integer

There is no need to use replace; you can use .cat.codes to get the ordinal codes assigned to the corresponding intervals:

t['count'] = pd.cut(t['count'], bins=p_breaks, duplicates='drop', include_lowest=True).cat.codes + 1
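A short sketch of what .cat.codes + 1 produces, using hypothetical counts and breaks. Values that fall outside all bins have code -1, so they come out as 0 after adding 1.

```python
import pandas as pd

counts = pd.Series([1, 5, 12, 99])   # hypothetical values; 99 is outside the bins
breaks = [1, 5, 10, 50]              # hypothetical break points

binned = pd.cut(counts, bins=breaks, include_lowest=True)

# Codes are 0-based positions of the interval; -1 marks "no bin".
numbered = binned.cat.codes + 1
print(numbered.tolist())   # [1, 1, 3, 0]
```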

Sort data ranges with pandas.cut

In addition to the comment by @QuangHoang, you can use value_counts with the bins parameter:

bins : int, optional
Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.

>>> ages.value_counts(bins=[0,20,40,60,80,100,120], sort=False)
(-0.001, 20.0]    334
(20.0, 40.0]      382
(40.0, 60.0]      224
(60.0, 80.0]       54
(80.0, 100.0]       6
(100.0, 120.0]      0
dtype: int64
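The ages data above is not shown, so here is a self-contained sketch with a handful of made-up ages. With sort=False the result stays in bin order and empty bins are reported with a count of 0.

```python
import pandas as pd

ages = pd.Series([5, 18, 25, 39, 61])   # hypothetical data

by_bin = ages.value_counts(bins=[0, 20, 40, 60, 80], sort=False)
print(by_bin)
print(by_bin.tolist())   # [2, 2, 0, 1]
```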

How do I rewrite this pd.cut call to use df.loc and avoid a SettingWithCopyWarning?

You can try assigning through pandas.DataFrame.loc:

labels=['Child', 'Teen', 'Adult', 'Retired']
tdf.loc[:, 'age_group']=pd.cut(tdf['Age'], bins=[0, 12, 18, 65, 86],labels=labels)
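The warning usually appears when the frame being assigned to was itself taken as a slice of another frame. How tdf was created is not shown in the question, so the filtering step below is an assumption. In that situation, taking an explicit copy before assigning is a common way to make the intent clear.

```python
import pandas as pd

people = pd.DataFrame({'Age': [8, 15, 40, 70, 30]})   # hypothetical data

# Assumed origin of tdf: a filtered subset. .copy() makes it an independent frame.
tdf = people[people['Age'] > 10].copy()

labels = ['Child', 'Teen', 'Adult', 'Retired']
tdf.loc[:, 'age_group'] = pd.cut(tdf['Age'], bins=[0, 12, 18, 65, 86], labels=labels)

print(tdf['age_group'].tolist())   # ['Teen', 'Adult', 'Retired', 'Adult']
```

The original people frame is left unchanged; only the copy gains the new column.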
