Pandas: Convert Categories to Numbers


Assuming a DataFrame df with a string column cc and a numeric column temp, first change the type of the column:

df.cc = pd.Categorical(df.cc)

Now the data look similar but are stored categorically. To capture the category codes:

df['code'] = df.cc.cat.codes

Now you have:

   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0
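To see which number stands for which label, read the categories back through the .cat accessor: each code is the value's position in cat.categories, which are sorted. A minimal sketch that rebuilds the example DataFrame from the output above (the name code_to_cc is just for illustration):

```python
import pandas as pd

# Rebuild the example DataFrame shown above
df = pd.DataFrame({'cc': ['US', 'CA', 'US', 'AU'],
                   'temp': [37.0, 12.0, 35.0, 20.0]})

df['cc'] = pd.Categorical(df['cc'])
df['code'] = df['cc'].cat.codes

# Categories are sorted; each code is an index into this list
code_to_cc = dict(enumerate(df['cc'].cat.categories))
print(code_to_cc)  # {0: 'AU', 1: 'CA', 2: 'US'}
```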

If you don't want to modify your DataFrame but simply get the codes:

df.cc.astype('category').cat.codes

Or use the categorical column as an index:

df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)
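With a categorical index, the labels stay readable while the integer codes remain available through the index's codes attribute. A short sketch, again rebuilding the example data shown earlier:

```python
import pandas as pd

# Rebuild the example data
df = pd.DataFrame({'cc': ['US', 'CA', 'US', 'AU'],
                   'temp': [37.0, 12.0, 35.0, 20.0]})

df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)

# The index shows the labels; .codes holds the integer codes
print(df2.index.codes.tolist())  # [2, 1, 2, 0]
```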

Convert categorical values to custom numbers in a pandas DataFrame

new_label = {"cat_column": {"low": 1, "high": 0}}
df.replace(new_label, inplace=True)

To do custom label encoding, create a nested dict of mappings ({column: {old_value: new_value}}) and pass it to replace() to swap your categorical values for numerical ones. You can choose whatever number you like for each category.
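An alternative is Series.map with a plain dict. One difference is worth knowing: replace() leaves values that are not in the mapping unchanged, while map() turns them into NaN. A sketch with a hypothetical cat_column that also contains an unmapped 'medium' value:

```python
import pandas as pd

# Hypothetical data: 'medium' has no entry in the mapping
df = pd.DataFrame({'cat_column': ['low', 'high', 'medium', 'low']})
mapping = {'low': 1, 'high': 0}

# replace(): unmapped values ('medium') are kept as they are
replaced = df['cat_column'].replace(mapping)

# map(): unmapped values become NaN
mapped = df['cat_column'].map(mapping)
```

Use map() when an unexpected category should show up as missing, and replace() when it should pass through untouched.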


How to convert categorical data to numerical data?

Try pd.factorize():

train['city'] = pd.factorize(train.city)[0]

Or use the categorical dtype:

train['city'] = train['city'].astype('category').cat.codes

For example:

>>> train
       city
0  city_151
1  city_149
2  city_151
3  city_149
4  city_149
5  city_149
6  city_151
7  city_151
8  city_150
9  city_151

factorize:

train['city'] = pd.factorize(train.city)[0]

>>> train
   city
0     0
1     1
2     0
3     1
4     1
5     1
6     0
7     0
8     2
9     0

Or astype('category'):

train['city'] = train['city'].astype('category').cat.codes

>>> train
   city
0     2
1     0
2     2
3     0
4     0
5     0
6     2
7     2
8     1
9     2
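The two results differ because pd.factorize numbers values in order of first appearance, whereas .cat.codes follows the sorted category order. factorize also returns the unique values, which gives you the mapping back, and sort=True makes its codes match the categorical ones. A sketch using the city values above:

```python
import pandas as pd

city = pd.Series(['city_151', 'city_149', 'city_151', 'city_149', 'city_149',
                  'city_149', 'city_151', 'city_151', 'city_150', 'city_151'])

# Order of first appearance: city_151 -> 0, city_149 -> 1, city_150 -> 2
codes, uniques = pd.factorize(city)

# sort=True numbers the uniques in sorted order, like .cat.codes
sorted_codes, sorted_uniques = pd.factorize(city, sort=True)
cat_codes = city.astype('category').cat.codes

print(list(uniques))  # ['city_151', 'city_149', 'city_150']
print((sorted_codes == cat_codes.to_numpy()).all())  # True
```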

Convert categorical data in pandas dataframe

First, to convert a Categorical column to its numerical codes, you can simply use dataframe['c'].cat.codes.

Further, select_dtypes lets you automatically select all columns of a given dtype in a DataFrame, so you can apply the above operation to several columns at once.

First, make an example DataFrame:

In [75]: df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'),  'col3':list('ababb')})

In [76]: df['col2'] = df['col2'].astype('category')

In [77]: df['col3'] = df['col3'].astype('category')

In [78]: df.dtypes
Out[78]:
col1       int64
col2    category
col3    category
dtype: object

Then use select_dtypes to select the categorical columns and apply .cat.codes to each of them:

In [80]: cat_columns = df.select_dtypes(['category']).columns

In [81]: cat_columns
Out[81]: Index([u'col2', u'col3'], dtype='object')

In [83]: df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

In [84]: df
Out[84]:
   col1  col2  col3
0     1     0     0
1     2     1     1
2     3     2     0
3     4     0     1
4     5     1     1
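One detail to keep in mind when converting with .cat.codes: missing values get the code -1, so they do not stay NaN in the numeric result. A small sketch with a hypothetical column containing a missing entry, plus one way to turn the -1 codes back into NaN if that is what you want:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value
s = pd.Series(['a', np.nan, 'b', 'a'], dtype='category')

# NaN is not a category, so its code is -1
codes = s.cat.codes
print(codes.tolist())  # [0, -1, 1, 0]

# Optional: mask the -1 codes to get NaN back (result becomes float)
codes_with_nan = codes.where(codes != -1)
```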

How to map numeric data into categories / bins in Pandas dataframe

With Pandas, you should avoid row-wise operations, as these usually involve an inefficient Python-level loop. Here are a couple of alternatives.

Pandas: pd.cut

As @JonClements suggests, you can use pd.cut for this; a side benefit is that the new column becomes a Categorical.

You only need to define your boundaries (including np.inf) and category names, then apply pd.cut to the desired numeric column.

import numpy as np
import pandas as pd

bins = [0, 2, 18, 35, 65, np.inf]
names = ['<2', '2-18', '18-35', '35-65', '65+']

df['AgeRange'] = pd.cut(df['Age'], bins, labels=names)

print(df.dtypes)

# Age             int64
# Age_units      object
# AgeRange     category
# dtype: object
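By default pd.cut uses right-closed intervals, so a value equal to a boundary falls into the bin that ends there (an Age of 18 lands in '2-18'). Passing right=False closes the intervals on the left instead, which matches np.digitize's default. A sketch with a few hypothetical ages:

```python
import numpy as np
import pandas as pd

# Hypothetical ages that sit exactly on the boundaries
ages = pd.Series([2, 18, 35, 70])
bins = [0, 2, 18, 35, 65, np.inf]
names = ['<2', '2-18', '18-35', '35-65', '65+']

# Default right=True: intervals (0, 2], (2, 18], ...
right_closed = pd.cut(ages, bins, labels=names)

# right=False: intervals [0, 2), [2, 18), ...
left_closed = pd.cut(ages, bins, labels=names, right=False)

print(right_closed.tolist())  # ['<2', '2-18', '18-35', '65+']
print(left_closed.tolist())   # ['2-18', '18-35', '35-65', '65+']
```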

NumPy: np.digitize

np.digitize provides another clean solution. The idea is to define your boundaries and names, build a dictionary mapping each bin number to its name, then apply np.digitize to your Age column. Finally, use the dictionary to map the bin numbers to category names.

Note that a value equal to a boundary is assigned to the bin that starts at that boundary (np.digitize's default, right=False); for example, an Age of 18 maps to '18-35'.

import pandas as pd, numpy as np

df = pd.DataFrame({'Age': [99, 53, 71, 84, 84],
                   'Age_units': ['Y', 'Y', 'Y', 'Y', 'Y']})

bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']

d = dict(enumerate(names, 1))

df['AgeRange'] = np.vectorize(d.get)(np.digitize(df['Age'], bins))

Result

   Age Age_units AgeRange
0   99         Y      65+
1   53         Y    35-65
2   71         Y      65+
3   84         Y      65+
4   84         Y      65+
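If every Age is at least 0 (the first boundary), np.digitize never returns 0, so you can skip the dictionary and np.vectorize and index a NumPy array of names directly. A sketch under that assumption, using the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [99, 53, 71, 84, 84],
                   'Age_units': ['Y', 'Y', 'Y', 'Y', 'Y']})

bins = [0, 2, 18, 35, 65]
names = np.array(['<2', '2-18', '18-35', '35-65', '65+'])

# Assumes Age >= 0: digitize then returns 1..len(bins), so shift by one
df['AgeRange'] = names[np.digitize(df['Age'], bins) - 1]
print(df['AgeRange'].tolist())  # ['65+', '35-65', '65+', '65+', '65+']
```

Negative ages would give index -1 and silently map to '65+', so the dictionary version is safer when the data is not guaranteed to be non-negative.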

How to convert dtype categorical variable to numerical?

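Use pd.to_numeric, which parses the category labels themselves as numbers. With errors='coerce', any label that cannot be parsed becomes NaN:

data['agebin'] = pd.to_numeric(data['agebin'], errors='coerce')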
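Note that pd.to_numeric and .cat.codes answer different questions: to_numeric parses the labels, while .cat.codes gives each label's position in the sorted categories. With numeric-looking string labels the two can disagree, as this sketch with a hypothetical agebin column shows:

```python
import pandas as pd

# Hypothetical agebin column with string labels
data = pd.DataFrame({'agebin': ['10', '2', '30', '2', 'unknown']})
data['agebin'] = data['agebin'].astype('category')

# Codes follow string-sorted category order: '10' < '2' < '30' < 'unknown'
codes = data['agebin'].cat.codes

# to_numeric parses the labels; 'unknown' cannot be parsed and becomes NaN
values = pd.to_numeric(data['agebin'], errors='coerce')
```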

Python Pandas - Changing some column types to categories

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
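The loop is clear, but DataFrame.astype also accepts a dict mapping column names to dtypes, so the same conversion fits in one call. A sketch with a small hypothetical public DataFrame (the values and the extra budget column are made up):

```python
import pandas as pd

# Hypothetical data; 'budget' should stay numeric
public = pd.DataFrame({
    'parks': ['yes', 'no', 'yes'],
    'playgrounds': ['few', 'many', 'few'],
    'sports': ['a', 'b', 'a'],
    'roading': ['good', 'poor', 'good'],
    'budget': [1.5, 2.0, 3.1],
})

cols = ['parks', 'playgrounds', 'sports', 'roading']

# Convert only the listed columns; other columns keep their dtypes
public = public.astype({col: 'category' for col in cols})
```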

