Pandas: How to Use pd.cut()

Pandas how to use pd.cut()

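import pandas as pd

test = pd.DataFrame({'days': [0, 31, 45]})

# include_lowest=True makes the first interval include its left edge (0)
test['range'] = pd.cut(test.days, [0, 30, 60], include_lowest=True)
print(test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]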

Compare how the options treat values that sit exactly on a bin edge:

test = pd.DataFrame({'days': [0, 20, 30, 31, 45, 60]})

# include_lowest=True: the first interval also includes its left edge, so 0 is binned
test['range1'] = pd.cut(test.days, [0, 30, 60], include_lowest=True)
# right=False: intervals are closed on the left, so 30 falls in [30, 60) and 60 is not binned
test['range2'] = pd.cut(test.days, [0, 30, 60], right=False)
# default (right=True): 30 falls in (0, 30] and 0 is not binned
test['range3'] = pd.cut(test.days, [0, 30, 60])
print(test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]

Alternatively, use numpy.searchsorted to get the bin index directly. The array of bin edges has to be sorted; the values in days can be in any order:

arr = np.array([0, 30, 60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print(test)
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
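
If you want readable labels rather than bare indices, the result of searchsorted can be used to index into an array of labels. This is only a minimal sketch: the label strings, the open-ended last bin, and the assumption that no value is below the first edge are mine, not part of the answer above.

```python
import numpy as np

edges = np.array([0, 30, 60])                 # searchsorted requires these to be sorted
labels = np.array(["0-29", "30-59", "60+"])   # hypothetical labels, one per left edge
days = np.array([45, 0, 60, 20, 31, 30])      # values in arbitrary order

# side='right' minus 1 gives the index of the last edge that is <= each value.
# Assumes every value is >= edges[0]; a smaller value would yield -1.
idx = edges.searchsorted(days, side="right") - 1
named = labels[idx]
print(named.tolist())
```

Each value is assigned to the bin whose left edge it has reached, which matches the right=False behaviour of pd.cut except that the last bin has no upper limit.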

Python pandas.cut()

pd.cut raises a TypeError if you pass it an array with a non-numeric dtype, such as the array of strings below. One solution is to convert the array to a list so that each element can be handled individually. Here a list comprehension replaces 'nan' and 'Value' with a negative number, after which pd.cut can bin the list and the resulting categories can be renamed.

import numpy as np
import pandas as pd

a = np.array(['10', '8', '15', '20', '21', '22', '27', '28', 'nan', '30', '32', '33', 'Value'])
# Non-digit strings ('nan', 'Value') become -1 so they land in the first bin
a_list = [int(i) if i.isdigit() else -1 for i in a]
bins = pd.IntervalIndex.from_tuples([(-np.inf, 0), (0, 10), (10, 32), (32, np.inf)])
lab = ['Not a Value', '10 and below', '11 - 32', '33 and above']
a_cut = pd.cut(a_list, bins)
# labels= is ignored when bins is an IntervalIndex, so rename the categories afterwards
a_cut = a_cut.rename_categories(lab)
print(a_cut.value_counts())
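
A different route, not the one described above, is to let pandas do the conversion with pd.to_numeric and errors='coerce', which turns anything unparseable into NaN instead of a sentinel such as -1. The sketch below assumes you are happy to leave those entries as NaN rather than give them their own category.

```python
import numpy as np
import pandas as pd

raw = pd.Series(['10', '8', '15', '20', '21', '22', '27', '28',
                 'nan', '30', '32', '33', 'Value'])

# Strings that cannot be parsed as numbers become NaN
nums = pd.to_numeric(raw, errors="coerce")

# Three right-closed bins: (-inf, 10], (10, 32], (32, inf]
binned = pd.cut(
    nums,
    bins=[-np.inf, 10, 32, np.inf],
    labels=["10 and below", "11 - 32", "33 and above"],
)
print(binned.value_counts(dropna=False))
```

pd.cut passes NaN through unchanged, so the two invalid entries stay missing and can be counted or filled separately.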

Binning in Pandas Cut

Add np.inf to the end of your bin list so that values above the last finite edge still fall into a bin:

pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000,np.inf])
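
To see the effect, here is a small sketch with made-up totals and a shortened bin list (both are assumptions for illustration, not the asker's data):

```python
import numpy as np
import pandas as pd

totals = pd.Series([50, 150, 2500])   # hypothetical values

# Without an open-ended last bin, 2500 falls outside every interval
closed = pd.cut(totals, bins=[0, 100, 200])

# With np.inf as the final edge, 2500 lands in the last interval
open_ended = pd.cut(totals, bins=[0, 100, 200, np.inf])

print(closed.isna().tolist())
print(open_ended)
```

The first call leaves the largest value as NaN, while the second assigns it to the interval that starts at 200.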

How to use pd.cut() across columns of a data frame?

Use apply:

df.apply(pd.cut, bins=[0, 0.5, 1])

By default (axis=0), apply passes each column to pd.cut in turn. Pass axis=1 if you want it to receive each row instead.
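
As a minimal sketch, with a made-up two-column frame and hypothetical 'low'/'high' labels, extra keyword arguments given to apply are forwarded to pd.cut:

```python
import pandas as pd

# Hypothetical data: every column holds values between 0 and 1
df = pd.DataFrame({"x": [0.1, 0.7, 0.5], "y": [0.9, 0.2, 1.0]})

# bins= and labels= are passed through to pd.cut for each column
binned = df.apply(pd.cut, bins=[0, 0.5, 1], labels=["low", "high"])
print(binned)
```

Because the intervals are closed on the right by default, 0.5 is labelled 'low' and 1.0 is labelled 'high'.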

How to give a label in pandas.cut() when a value does not meet any boundaries

pd.cut returns NaN for values that fall outside every bin. Use cat.add_categories with Series.fillna to give those values a label of their own:

binned_out = pd.cut(df['a'], bins=bins, labels=labels).cat.add_categories([0]).fillna(0)
print(binned_out)
0    2
1    0
2    5
3    3
4    1
5    6
Name: a, dtype: category
Categories (7, int64): [1 < 2 < 3 < 4 < 5 < 6 < 0]
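
The snippet above relies on df, bins and labels from the question. A self-contained sketch with made-up values (all three are assumptions here) looks like this:

```python
import pandas as pd

s = pd.Series([5, 150, 25, -3])   # hypothetical data; 150 and -3 are out of range
bins = [0, 10, 50]
labels = [1, 2]

binned = pd.cut(s, bins=bins, labels=labels)   # NaN where no bin matches

# The fill value must be a known category before fillna can use it
filled = binned.cat.add_categories([0]).fillna(0)
print(filled.tolist())
```

Calling fillna(0) without add_categories first would fail, because 0 is not one of the existing categories.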

Pandas replace tuple-like value from pd.cut with an integer

There is no need to use replace. You can use .cat.codes to get the ordinal code of the interval each value falls into (values that fall outside every bin get -1, so they become 0 after adding 1):

t['count'] = pd.cut(t['count'], bins=p_breaks, duplicates='drop', include_lowest=True).cat.codes + 1
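
Here is a small sketch of what the codes look like, using made-up values and bin edges in place of t['count'] and p_breaks:

```python
import pandas as pd

counts = pd.Series([0, 2, 5, 9, 20])   # hypothetical data; 20 is above the last edge
breaks = [0, 3, 6, 10]                 # hypothetical bin edges

intervals = pd.cut(counts, bins=breaks, include_lowest=True)

# .cat.codes numbers the intervals from 0 and uses -1 for missing values,
# so adding 1 gives 1-based bin numbers with 0 meaning "no bin"
bin_number = intervals.cat.codes + 1
print(bin_number.tolist())
```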

Sort data ranges with pandas.cut

In addition to @QuangHoang's comment, you can use value_counts with the bins parameter:

bins : int, optional

Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.

>>> ages.value_counts(bins=[0, 20, 40, 60, 80, 100, 120], sort=False)
(-0.001, 20.0]    334
(20.0, 40.0]      382
(40.0, 60.0]      224
(60.0, 80.0]       54
(80.0, 100.0]       6
(100.0, 120.0]      0
dtype: int64
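
A self-contained sketch of the same call, with a tiny made-up ages Series standing in for the real data:

```python
import pandas as pd

ages = pd.Series([5, 25, 25, 70, 0])   # hypothetical data

# sort=False keeps the bins in interval order instead of ordering by count
counts = ages.value_counts(bins=[0, 20, 40, 60, 80], sort=False)
print(counts)
```

Empty bins are still listed with a count of 0, and the lowest edge is included, so the age 0 is counted in the first bin.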

How do I rewrite this pd.cut call to use df.loc and avoid a SettingWithCopyWarning?

Assign the new column through pandas.DataFrame.loc:

labels = ['Child', 'Teen', 'Adult', 'Retired']
tdf.loc[:, 'age_group'] = pd.cut(tdf['Age'], bins=[0, 12, 18, 65, 86], labels=labels)
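
The warning usually appears when tdf was itself created by slicing another DataFrame. If that is the case here (an assumption, since the question's setup is not shown), taking an explicit copy before adding the column is another common way to avoid it. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Age": [5, 15, 40, 70, 200]})   # hypothetical data

# .copy() makes tdf an independent frame rather than a view-like slice of df
tdf = df[df["Age"] < 100].copy()

labels = ["Child", "Teen", "Adult", "Retired"]
tdf["age_group"] = pd.cut(tdf["Age"], bins=[0, 12, 18, 65, 86], labels=labels)
print(tdf)
```

The original df is left untouched; only the copy gains the age_group column.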

