Pandas how to use pd.cut()
import pandas as pd
test = pd.DataFrame({'days': [0, 31, 45]})
test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print (test)
days range
0 0 (-0.001, 30.0]
1 31 (30.0, 60.0]
2 45 (30.0, 60.0]
Compare how include_lowest and right treat values that sit on a bin edge:
test = pd.DataFrame({'days': [0,20,30,31,45,60]})
test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
#30 value is in [30, 60) group
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
#30 value is in (0, 30] group
test['range3'] = pd.cut(test.days, [0,30,60])
print (test)
days range1 range2 range3
0 0 (-0.001, 30.0] [0, 30) NaN
1 20 (-0.001, 30.0] [0, 30) (0, 30]
2 30 (-0.001, 30.0] [30, 60) (0, 30]
3 31 (30.0, 60.0] [30, 60) (30, 60]
4 45 (30.0, 60.0] [30, 60) (30, 60]
5 60 (30.0, 60.0] NaN (30, 60]
Or use numpy.searchsorted, but the bin edges in arr have to be sorted:
arr = np.array([0,30,60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print (test)
days range1 range2
0 0 0 0
1 20 1 0
2 30 1 1
3 31 2 1
4 45 2 1
5 60 2 2
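A minimal sketch of the same lookup on data that is not in order. It suggests that searchsorted needs the array being searched (the bin edges in arr) to be ascending, while the values being looked up can come in any order. The days values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Bin edges: this is the array that searchsorted assumes is ascending.
arr = np.array([0, 30, 60])

# Hypothetical, deliberately unsorted sample of days.
days = pd.Series([45, 0, 60, 20])

# Insertion index with ties going to the left of an equal edge.
left = arr.searchsorted(days.to_numpy())
# Index of the last edge that is <= the value.
right = arr.searchsorted(days.to_numpy(), side='right') - 1

print(left.tolist())
print(right.tolist())
```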
Python pandas.cut()
pd.cut raises a TypeError if you pass it a non-numeric array. One workaround is to convert the array to a list so you can handle the mixed values yourself: here a list comprehension replaces 'nan' and 'Value' with a negative number. You can then call pd.cut on the list and label the bins.
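import numpy as np
import pandas as pd
a = np.array(['10','8', '15', '20','21','22', '27', '28', 'nan', '30', '32', '33', 'Value'])
# non-digit strings ('nan', 'Value') become -1
a_list = [int(i) if i.isdigit() else -1 for i in a]
bins = pd.IntervalIndex.from_tuples([(-np.inf, 0), (0, 10), (10, 32), (32, np.inf)])
lab = ['Not a Value', '10 and below', '11 - 32', '33 and above']
# labels are ignored when bins is an IntervalIndex, so rename the categories afterwards
a_cut = pd.cut(a_list, bins).rename_categories(lab)
print(a_cut.value_counts())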
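An alternative sketch, not the approach from the answer above: pd.to_numeric with errors='coerce' turns unparseable strings into NaN, and the NaN entries can then be given their own category. Replacing the IntervalIndex with a plain list of edges is an assumption made here to keep the example short.

```python
import numpy as np
import pandas as pd

a = np.array(['10', '8', '15', '20', '21', '22', '27', '28',
              'nan', '30', '32', '33', 'Value'])

# Anything that cannot be parsed as a number becomes NaN.
s = pd.to_numeric(pd.Series(a), errors='coerce')

lab = ['10 and below', '11 - 32', '33 and above']
a_cut = pd.cut(s, bins=[-np.inf, 10, 32, np.inf], labels=lab)

# NaN falls in no bin, so add a category for it and fill.
a_cut = a_cut.cat.add_categories(['Not a Value']).fillna('Not a Value')

counts = a_cut.value_counts()
print(counts)
```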
Binning in Pandas Cut
Add np.inf to the end of your bin list:
pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000,np.inf])
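A small sketch of what the extra edge changes, using a made-up Series in place of df.TOTAL and a shortened bin list. Without np.inf, a value above the last edge is left as NaN; with it, the value lands in an open-ended final bin.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.TOTAL.
total = pd.Series([50, 150, 5000])

# 5000 lies outside every bin, so it becomes NaN.
capped = pd.cut(total, bins=[0, 100, 200])

# The last bin now runs from 200 to infinity and catches 5000.
open_ended = pd.cut(total, bins=[0, 100, 200, np.inf])

print(capped.isna().sum())
print(open_ended.isna().sum())
```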
How to use pd.cut() across columns of a data frame?
Use apply:
df.apply(pd.cut, bins=[0,0.5,1])
You can specify the axis if you want to apply the function to each column (axis=0) or to each row (axis=1).
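A minimal sketch of the column-wise case, assuming a small made-up DataFrame whose values all fall between 0 and 1. Keyword arguments given to apply are forwarded to pd.cut; labels=False is used here only so the result is plain bin indices that are easy to inspect.

```python
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({'x': [0.1, 0.7], 'y': [0.4, 0.9]})

# axis=0 (the default) passes each column to pd.cut in turn.
binned = df.apply(pd.cut, axis=0, bins=[0, 0.5, 1], labels=False)

print(binned)
```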
How to give label on pandas.cut() when a value does not meet any boundaries
Use cat.add_categories with Series.fillna:
binned_out = pd.cut(df['a'], bins=bins, labels=labels).cat.add_categories([0]).fillna(0)
print (binned_out)
0 2
1 0
2 5
3 3
4 1
5 6
Name: a, dtype: category
Categories (7, int64): [1 < 2 < 3 < 4 < 5 < 6 < 0]
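A self-contained sketch of the same pattern. The df, bins and labels below are made up, since the originals from the question are not shown here. A value outside every bin comes back as NaN, and 0 has to be registered as a category before fillna can use it.

```python
import pandas as pd

# Hypothetical inputs standing in for the question's df, bins and labels.
df = pd.DataFrame({'a': [5, 150, 25]})
bins = [0, 10, 50, 100]
labels = [1, 2, 3]

binned_out = (
    pd.cut(df['a'], bins=bins, labels=labels)
    .cat.add_categories([0])  # 0 must be a known category first
    .fillna(0)                # 150 matched no bin, so it becomes 0
)

print(binned_out.tolist())
```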
Pandas replace tuple like value from pd.cut with an integer
There is no need to use replace; you can use .cat.codes to get the ordinal values assigned to the corresponding intervals:
t['count'] = pd.cut(t['count'], bins=p_breaks, duplicates='drop', include_lowest=True).cat.codes + 1
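A short sketch with made-up values and breaks, to show what the + 1 does. Codes start at 0 for the first interval, and a value that falls in no interval gets code -1, so after adding 1 the bins are numbered from 1 and unmatched values become 0.

```python
import pandas as pd

# Hypothetical data and breaks; t and p_breaks are not shown in the answer.
counts = pd.Series([1, 5, 9, 20])
p_breaks = [0, 3, 6, 10]

codes = pd.cut(counts, bins=p_breaks, include_lowest=True).cat.codes + 1

print(codes.tolist())
```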
Sort data ranges with pandas.cut
In addition to @QuangHoang's comment, you can use value_counts with the bins parameter:
bins : int, optional
Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.
>>> ages.value_counts(bins=[0,20,40,60,80,100,120], sort=False)
(-0.001, 20.0] 334
(20.0, 40.0] 382
(40.0, 60.0] 224
(60.0, 80.0] 54
(80.0, 100.0] 6
(100.0, 120.0] 0
dtype: int64
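A minimal runnable version of the call above, using a tiny made-up ages Series. With sort=False the counts stay in bin order, and a bin with no members is still reported with a count of 0.

```python
import pandas as pd

# Hypothetical ages; the original data is not shown.
ages = pd.Series([5, 25, 30, 70])

binned_counts = ages.value_counts(bins=[0, 20, 40, 60, 80], sort=False)

print(binned_counts)
```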
How do I rewrite this pd.cut call to use df.loc and avoid a SettingWithCopyWarning?
You can try using pandas.DataFrame.loc.
labels=['Child', 'Teen', 'Adult', 'Retired']
tdf.loc[:, 'age_group']=pd.cut(tdf['Age'], bins=[0, 12, 18, 65, 86],labels=labels)
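A sketch of the .loc assignment on made-up data. It assumes tdf was produced by filtering another DataFrame, which is a common source of this warning; that assumption, and the explicit .copy(), are not stated in the original answer.

```python
import pandas as pd

# Hypothetical source frame; the real tdf is not shown.
df = pd.DataFrame({'Age': [5, 15, 30, 70, -1]})

# Assumption: tdf is a filtered subset. An explicit copy makes it
# an independent frame rather than a slice of df.
tdf = df[df['Age'] > 0].copy()

labels = ['Child', 'Teen', 'Adult', 'Retired']
tdf.loc[:, 'age_group'] = pd.cut(tdf['Age'], bins=[0, 12, 18, 65, 86], labels=labels)

print(tdf)
```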