Replace Invalid Values with None in Pandas Dataframe

Replace invalid values with None in Pandas DataFrame

Actually in later versions of pandas this will give a TypeError:

df.replace('-', None)
TypeError: If "to_replace" and "value" are both None then regex must be a mapping

You can do it by passing either a list or a dictionary:

In [11]: df.replace(['-'], [None])  # or df.replace('-', {0: None})
Out[11]:
      0
0  None
1     3
2     2
3     5
4     1
5    -5
6    -1
7  None
8     9

But I recommend using NaNs rather than None:

In [12]: df.replace('-', np.nan)
Out[12]:
     0
0  NaN
1    3
2    2
3    5
4    1
5   -5
6   -1
7  NaN
8    9
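One reason to prefer NaN is that pandas' missing-data tools and numeric conversions handle it directly. The following is a minimal sketch, assuming a single-column frame like the one above (the data is reconstructed from the printed output, and the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Reconstructed sample: '-' marks the invalid entries.
df = pd.DataFrame({0: ['-', 3, 2, 5, 1, -5, -1, '-', 9]})

# Swap the placeholder for NaN, then count what went missing.
cleaned = df.replace('-', np.nan)
n_missing = int(cleaned[0].isna().sum())

# With NaN in place, the column converts cleanly to a float dtype,
# and aggregations such as mean() skip the missing entries.
numeric = pd.to_numeric(cleaned[0])
print(n_missing, numeric.dtype, numeric.mean())
```

Depending on the pandas version, the replace call may emit a FutureWarning about downcasting; the result is the same either way.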

How to replace a value in pandas, with NaN?

You can replace this just for that column using replace:

df['workclass'].replace('?', np.nan)

or for the whole df:

df.replace('?', np.nan)
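Note that replace returns a new object rather than modifying the original, so the result has to be assigned back. A small sketch, assuming a toy frame with a 'workclass' column (the values and the 'age' column are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; '?' marks an unknown workclass.
df = pd.DataFrame({
    'workclass': ['Private', '?', 'State-gov'],
    'age': [39, 50, 38],
})

# Calling replace without assigning leaves df unchanged.
df['workclass'].replace('?', np.nan)
still_has_marker = bool((df['workclass'] == '?').any())

# Assign the result back to make the change stick.
df['workclass'] = df['workclass'].replace('?', np.nan)
missing_after = int(df['workclass'].isna().sum())
print(still_has_marker, missing_after)
```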

UPDATE

OK, I figured out your problem: if you don't pass a separator character, read_csv uses a comma ',' as the separator by default.

Here is one problematic line from your data:

54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K

The separator is in fact a comma followed by a space, so na_values=['?'] didn't match: every value after the first has a leading space character, which is easy to overlook.

If you change your line to this (a regex separator needs the Python parsing engine):

rawfile = pd.read_csv(filename, header=None, names=DataLabels, sep=r',\s', engine='python', na_values=["?"])

then you should find that it all works:

27      54               NaN  180211  Some-college             10 
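To see the effect without the original file, here is a small self-contained sketch that parses two shortened, made-up rows from an in-memory buffer. It compares the default parse with the regex separator, and also shows skipinitialspace=True, an alternative read_csv option that is not part of the answer above:

```python
import io

import pandas as pd

# Two shortened, illustrative rows; fields are separated by ", ".
raw = "54, ?, 180211, Some-college\n27, Private, 160178, Bachelors\n"

# Default comma separator: the field is ' ?', so na_values never matches.
naive = pd.read_csv(io.StringIO(raw), header=None, na_values=["?"])

# Regex separator that consumes the space after each comma.
fixed = pd.read_csv(io.StringIO(raw), header=None, sep=r",\s",
                    engine="python", na_values=["?"])

# Alternative: keep the comma separator but skip leading spaces.
skipped = pd.read_csv(io.StringIO(raw), header=None,
                      skipinitialspace=True, na_values=["?"])

print(repr(naive.iloc[0, 1]), fixed.iloc[0, 1], skipped.iloc[0, 1])
```

The skipinitialspace route keeps the faster default parser, which may matter for large files.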

How can I replace values in pandas data frame?

Try

df = df.replace('?', "Null")

Or, you probably actually want to use:

import numpy as np    
df = df.replace('?', np.nan)

Replacing values in pandas data frame with None

Prefer pd.NA to None:

>>> mydf.replace([5], pd.NA, inplace=True)
>>> # mydf.replace([5], [None], inplace=True)
>>> mydf
   col1  col2
0  <NA>     2
1  <NA>  <NA>
2     8     6
3     4     7
4  <NA>  <NA>
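pd.NA pairs naturally with pandas' nullable dtypes, which keep an integer column as integers even when values are missing. The sketch below is one possible variant, not the answer's exact method: it assumes converting to the nullable Int64 dtype is acceptable and uses mask instead of replace. The input values are reconstructed from the output shown above.

```python
import pandas as pd

# Input reconstructed from the printed result (5 is the invalid value).
mydf = pd.DataFrame({"col1": [5, 5, 8, 4, 5], "col2": [2, 5, 6, 7, 5]})

# Convert to the nullable integer dtype so missing values do not
# force the columns to float or object.
nullable = mydf.astype("Int64")

# Blank out every cell equal to 5 with pd.NA.
masked = nullable.mask(mydf == 5, pd.NA)

print(masked.dtypes)
print(int(masked.isna().sum().sum()))
```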

Replace None with NaN in pandas dataframe

You can use DataFrame.fillna or Series.fillna which will replace the Python object None, not the string 'None'.

import pandas as pd
import numpy as np

For dataframe:

df = df.fillna(value=np.nan)

For column or series:

df['mycol'] = df['mycol'].fillna(value=np.nan)

Dataframe's replace method behaving weird

Instead of replacing 0 with None, we can use numpy.nan:

>>> import numpy as np
>>> temp["Glucose"] = diabetes_data["Glucose"].replace(0, np.nan)
>>> temp.loc[null_index]
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
75             1      NaN             48             20        0  24.7                     0.140   22        0
182            1      NaN             74             20       23  27.7                     0.299   21        0
342            1      NaN             68             35        0  32.0                     0.389   22        0
349            5      NaN             80             32        0  41.0                     0.346   37        1
502            6      NaN             68             41        0  39.0                     0.727   41        1

What is going on:

The first two arguments to .replace are to_replace and value, both of which default to None in the pandas versions this answer describes.

When you explicitly pass None as the second argument (i.e. for value), those versions cannot tell the call apart from one that omits value altogether. With no usable value, .replace falls back on the method argument, which defaults to 'pad' (forward fill): probably a very undesired effect in this case. (From pandas 1.4.0 onward, an explicit value=None is honored instead of being silently ignored.)

This means the issue isn't to do with the fact you're using int, it's to do with the value you're trying to replace the int with.

This case is explained explicitly in the pandas documentation, which also provides a workaround using a dictionary argument:

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value=None and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The command s.replace('a', None) is actually equivalent to s.replace(to_replace='a', value=None, method='pad'):

s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object
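Because the scalar form depends on the pandas version, the dictionary form is the safer way to ask for None explicitly. A minimal sketch using the same Series as the documentation example (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([10, 'a', 'a', 'b', 'a'])

# The mapping states the replacement value explicitly, so there is
# no fallback to a fill method.
replaced = s.replace({'a': None})

# Every former 'a' is now missing; the other values are untouched.
print(replaced.isna().tolist())
```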

Replacing blank values (white space) with NaN in pandas

I think df.replace() does the job, since pandas 0.13:

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [-0.532681, 'foo', 0],
    [1.490752, 'bar', 1],
    [-1.387326, 'foo', 2],
    [0.814772, 'baz', ' '],
    [-0.222552, ' ', 4],
    [-1.176781, 'qux', ' '],
], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

# replace field that's entirely space (or empty) with NaN
print(df.replace(r'^\s*$', np.nan, regex=True))

Produces:

                   A    B    C
2000-01-01 -0.532681  foo    0
2000-01-02  1.490752  bar    1
2000-01-03 -1.387326  foo    2
2000-01-04  0.814772  baz  NaN
2000-01-05 -0.222552  NaN    4
2000-01-06 -1.176781  qux  NaN

As Temak pointed out, use df.replace(r'^\s+$', np.nan, regex=True) if you only want to replace fields that consist of one or more whitespace characters; unlike r'^\s*$', this pattern leaves empty strings untouched.
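The difference between the two patterns is easiest to see side by side. A short sketch on a made-up Series containing an empty string, whitespace-only strings, and a value with an internal space:

```python
import numpy as np
import pandas as pd

# Illustrative values: normal text, empty string, whitespace-only
# strings, and text containing an internal space.
s = pd.Series(['foo', '', ' ', '  ', 'a b'])

# \s* also matches the empty string.
star = s.replace(r'^\s*$', np.nan, regex=True)

# \s+ requires at least one whitespace character.
plus = s.replace(r'^\s+$', np.nan, regex=True)

print(star.isna().tolist())
print(plus.isna().tolist())
```

In both cases 'a b' is kept, because the anchors require the whole field to be whitespace.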

Replacing Pandas or Numpy Nan with a None to use with MysqlDB

@bogatron has it right: you can use where. It's worth noting that you can do this natively in pandas:

df1 = df.where(pd.notnull(df), None)

Note: this changes the dtype of all columns to object.

Example:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]:
     0
0    1
1  NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]:
      0
0     1
1  None

Note: what you cannot do is recast the DataFrame's dtype with astype to allow all data types and then use the DataFrame fillna method:

df1 = df.astype(object).replace(np.nan, 'None')

Unfortunately, neither this nor using replace works with None; see this (closed) issue.

As an aside, it's worth noting that for most use cases you don't need to replace NaN with None; see this question about the difference between NaN and None in pandas.

However, in this specific case it seems you do (at least at the time of this answer).
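For the database use case, the end goal is usually a list of row tuples in which missing values are None. The sketch below is one way to get there, under the assumption that casting to object first is acceptable; in recent pandas versions, where alone may leave NaN in float columns, so the explicit astype(object) is included. The frame and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one numeric and one text column.
df = pd.DataFrame({'n': [1.0, np.nan], 's': ['x', None]})

# Cast to object so each cell can hold None, then replace every
# missing cell with None.
df1 = df.astype(object).where(df.notna(), None)

# Plain tuples, ready to pass as parameters to a DB-API executemany call.
rows = list(df1.itertuples(index=False, name=None))
print(rows)
```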