Pandas: convert categories to numbers
First, change the type of the column:
df.cc = pd.Categorical(df.cc)
Now the data look similar but are stored categorically. To capture the category codes:
df['code'] = df.cc.cat.codes
Now you have:
   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0
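For reference, here is a minimal, self-contained sketch of the steps so far. The sample values are taken from the table above; the name code_to_label is only an illustration of how you might map codes back to labels.

```python
import pandas as pd

# Sample data matching the table above.
df = pd.DataFrame({"cc": ["US", "CA", "US", "AU"],
                   "temp": [37.0, 12.0, 35.0, 20.0]})

df["cc"] = pd.Categorical(df["cc"])   # categories are sorted: AU, CA, US
df["code"] = df["cc"].cat.codes       # position of each value's category

# Illustrative lookup from code back to label.
code_to_label = dict(enumerate(df["cc"].cat.categories))
```

Because the categories are sorted by default, AU gets 0, CA gets 1 and US gets 2, which is why the codes do not follow the row order.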
If you don't want to modify your DataFrame but simply get the codes:
df.cc.astype('category').cat.codes
Or use the categorical column as an index:
df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)
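A small sketch of the index variant, assuming the same sample data as before. The names index_codes and us_rows are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"cc": ["US", "CA", "US", "AU"],
                   "temp": [37.0, 12.0, 35.0, 20.0]})

df2 = pd.DataFrame(df["temp"])
df2.index = pd.CategoricalIndex(df["cc"])

index_codes = df2.index.codes   # the codes are available on the index as well
us_rows = df2.loc["US"]         # label-based selection still works
```

The original df is left untouched here; only df2 carries the categorical index.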
Convert Categorical values to custom number in pandas dataframe
new_label = {"cat_column": {"low": 1, "high": 0}}
df.replace(new_label , inplace = True)
For custom label encoding, create a dict of mappings and use replace() to swap your categorical values for numerical ones. Which number goes with which label is up to you.
Hope this is what you are looking for.
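If you would rather keep the original column and write the numbers to a new one, Series.map is an alternative to replace(). This is a sketch with made-up data; the column name cat_column_num is hypothetical.

```python
import pandas as pd

# Hypothetical data; "cat_column" and its labels follow the snippet above.
df = pd.DataFrame({"cat_column": ["low", "high", "low"]})

mapping = {"low": 1, "high": 0}
df["cat_column_num"] = df["cat_column"].map(mapping)
```

One difference to keep in mind: with map, a label that is missing from the dict becomes NaN, whereas replace() leaves such a label as it is.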
How to convert categorical data to numerical data?
Try pd.factorize():
train['city'] = pd.factorize(train.city)[0]
Or use the categorical dtype:
train['city'] = train['city'].astype('category').cat.codes
For example:
>>> train
city
0 city_151
1 city_149
2 city_151
3 city_149
4 city_149
5 city_149
6 city_151
7 city_151
8 city_150
9 city_151
With factorize:
train['city'] = pd.factorize(train.city)[0]
>>> train
city
0 0
1 1
2 0
3 1
4 1
5 1
6 0
7 0
8 2
9 0
Or with astype('category'):
train['city'] = train['city'].astype('category').cat.codes
>>> train
city
0 2
1 0
2 2
3 0
4 0
5 0
6 2
7 2
8 1
9 2
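The two outputs differ because factorize numbers the labels in order of first appearance, while cat.codes numbers them by sorted category. A short sketch on the same city data shows this, along with the uniques that factorize returns so you can get back to the labels:

```python
import pandas as pd

train = pd.DataFrame({"city": ["city_151", "city_149", "city_151", "city_149",
                               "city_149", "city_149", "city_151", "city_151",
                               "city_150", "city_151"]})

# factorize: codes follow order of first appearance; uniques holds the labels.
codes, uniques = pd.factorize(train["city"])

# cat.codes: codes follow the sorted categories.
cat_codes = train["city"].astype("category").cat.codes

# sort=True makes factorize use the sorted numbering as well.
sorted_codes, sorted_uniques = pd.factorize(train["city"], sort=True)

# Recover the original labels from the codes.
restored = uniques.take(codes)
```

Here the first element of the factorize result is what you assign to the column; keep uniques around if you need to decode later.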
Convert categorical data in pandas dataframe
First, to convert a Categorical column to its numerical codes, you can do this more easily with dataframe['c'].cat.codes.
Further, you can automatically select all columns with a certain dtype in a dataframe using select_dtypes. This way, you can apply the above operation to multiple, automatically selected columns.
First making an example dataframe:
In [75]: df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'), 'col3':list('ababb')})
In [76]: df['col2'] = df['col2'].astype('category')
In [77]: df['col3'] = df['col3'].astype('category')
In [78]: df.dtypes
Out[78]:
col1 int64
col2 category
col3 category
dtype: object
Then, by using select_dtypes to select the columns and applying .cat.codes to each of them, you get the following result:
In [80]: cat_columns = df.select_dtypes(['category']).columns
In [81]: cat_columns
Out[81]: Index([u'col2', u'col3'], dtype='object')
In [83]: df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)
In [84]: df
Out[84]:
   col1  col2  col3
0     1     0     0
1     2     1     1
2     3     2     0
3     4     0     1
4     5     1     1
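If you need this in more than one place, the same steps can be wrapped in a small function that works on a copy. This is only a sketch; encode_categoricals is a made-up helper name, not a pandas function.

```python
import pandas as pd

def encode_categoricals(frame):
    """Return a copy with every category-dtype column replaced by its codes."""
    out = frame.copy()
    cat_columns = out.select_dtypes(["category"]).columns
    for col in cat_columns:
        out[col] = out[col].cat.codes
    return out

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5],
                   "col2": list("abcab"),
                   "col3": list("ababb")})
df["col2"] = df["col2"].astype("category")
df["col3"] = df["col3"].astype("category")

encoded = encode_categoricals(df)   # df itself keeps its category columns
```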
How to map numeric data into categories / bins in Pandas dataframe
With Pandas, you should avoid row-wise operations, as these usually involve an inefficient Python-level loop. Here are a couple of alternatives.
Pandas: pd.cut
As @JonClements suggests, you can use pd.cut for this, the benefit being that your new column becomes a Categorical.
You only need to define your boundaries (including np.inf) and category names, then apply pd.cut to the desired numeric column.
import numpy as np
import pandas as pd
bins = [0, 2, 18, 35, 65, np.inf]
names = ['<2', '2-18', '18-35', '35-65', '65+']
df['AgeRange'] = pd.cut(df['Age'], bins, labels=names)
print(df.dtypes)
# Age int64
# Age_units object
# AgeRange category
# dtype: object
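One detail worth checking is what happens to values that sit exactly on a boundary. By default pd.cut closes each bin on the right, so an age of 18 lands in '2-18'; passing right=False closes the bins on the left instead. A sketch, where the ages and the edge series are made-up sample values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [99, 53, 71, 84, 84]})  # sample ages

bins = [0, 2, 18, 35, 65, np.inf]
names = ["<2", "2-18", "18-35", "35-65", "65+"]
df["AgeRange"] = pd.cut(df["Age"], bins, labels=names)

# Values exactly on a bin edge.
edge = pd.Series([2, 18, 65])
right_closed = pd.cut(edge, bins, labels=names)              # (0, 2], (2, 18], ...
left_closed = pd.cut(edge, bins, labels=names, right=False)  # [0, 2), [2, 18), ...
```

With labels such as '2-18' and '65+', right=False is probably closer to what the names suggest, but that depends on how you want the edges treated.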
NumPy: np.digitize
np.digitize provides another clean solution. The idea is to define your boundaries and names, create a dictionary, then apply np.digitize to your Age column. Finally, use your dictionary to map your category names.
Note that a value equal to a boundary goes into the bin that starts at that boundary, i.e. each boundary acts as the lower bound of its bin.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Age': [99, 53, 71, 84, 84],
                   'Age_units': ['Y', 'Y', 'Y', 'Y', 'Y']})
bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']
d = dict(enumerate(names, 1))
df['AgeRange'] = np.vectorize(d.get)(np.digitize(df['Age'], bins))
Result
   Age Age_units AgeRange
0   99         Y      65+
1   53         Y    35-65
2   71         Y      65+
3   84         Y      65+
4   84         Y      65+
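To see the boundary behaviour concretely, here is a sketch with a few made-up ages that sit on or next to the bin edges. It uses Series.map for the lookup instead of np.vectorize, which is a swapped-in choice on my part, not a requirement.

```python
import numpy as np
import pandas as pd

bins = [0, 2, 18, 35, 65]
names = ["<2", "2-18", "18-35", "35-65", "65+"]
d = dict(enumerate(names, 1))

ages = pd.Series([1, 2, 18, 64, 65])      # hypothetical values at the bin edges
idx = np.digitize(ages, bins)             # i such that bins[i-1] <= x < bins[i]
labels = pd.Series(idx, index=ages.index).map(d)
```

A value below the first boundary (a negative age here) would get index 0, which is not a key in d, so the mapped label would be missing.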
How to convert dtype categorical variable to numerical?
Use pd.to_numeric. With errors='coerce', any value that cannot be parsed as a number becomes NaN:
data.agebin= pd.to_numeric(data.agebin, errors='coerce')
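Keep in mind that this only helps when the labels themselves look like numbers. A sketch with made-up data follows; casting to str before parsing is a conservative step I am assuming here, and the cat.codes line is shown only for comparison.

```python
import pandas as pd

# Hypothetical data: numeric-looking labels plus one that is not numeric.
data = pd.DataFrame({"agebin": pd.Categorical(["10", "20", "20", "unknown"])})

# Parse the labels as numbers; "unknown" cannot be parsed and becomes NaN.
as_numbers = pd.to_numeric(data["agebin"].astype(str), errors="coerce")

# For arbitrary labels, the category codes are usually what you want instead.
codes = data["agebin"].cat.codes
```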
Python Pandas - Changing some column types to categories
Sometimes, you just have to use a for-loop:
for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
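If you prefer to avoid the explicit loop, DataFrame.astype also accepts a dict of column name to dtype. A sketch with a made-up frame; the values and the population column are hypothetical, only the four column names come from the loop above.

```python
import pandas as pd

# Hypothetical frame; the four column names follow the loop above.
public = pd.DataFrame({
    "parks": ["yes", "no", "yes"],
    "playgrounds": ["no", "no", "yes"],
    "sports": ["yes", "yes", "no"],
    "roading": ["good", "poor", "good"],
    "population": [1200, 800, 450],
})

cols = ["parks", "playgrounds", "sports", "roading"]
public = public.astype({col: "category" for col in cols})
```

Columns that are not listed in the dict keep their dtype.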