Replacing Pandas or Numpy Nan With a None to Use With Mysqldb

@bogatron has it right: you can use where. It's also worth noting that you can do this natively in pandas:

df1 = df.where(pd.notnull(df), None)

Note: this changes the dtype of all columns to object.

Example:

In [1]: df = pd.DataFrame([1, np.nan])

In [2]: df
Out[2]:
     0
0    1
1  NaN

In [3]: df1 = df.where(pd.notnull(df), None)

In [4]: df1
Out[4]:
      0
0     1
1  None

Note: what you cannot do is recast the DataFrame's dtype to allow all datatypes using astype and then use the DataFrame fillna method:

df1 = df.astype(object).replace(np.nan, None)

Unfortunately neither this, nor using replace, works with None; see this (closed) issue.
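That said, in more recent pandas versions (roughly 1.3 onwards; treat the exact version as an assumption) the dict form of replace does produce real None values. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, np.nan])

# The dict form maps NaN to an actual None; the dtype becomes object
df1 = df.replace({np.nan: None})
print(df1.iloc[1, 0])  # None
```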


As an aside, it's worth noting that for most use cases you don't need to replace NaN with None, see this question about the difference between NaN and None in pandas.

However, in this specific case it seems you do (at least at the time of this answer).

how to replace np.nan with blank in numpy array

You can use the np.where() function like this:

import numpy as np

a = np.array([[np.nan, 2], [3, np.nan]])
a = np.where(np.isnan(a), '', a)
print(a)

Output:

[['' '2.0']
['3.0' '']]

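One caveat worth knowing: because '' is a string, np.where promotes the whole result to a fixed-width Unicode dtype, so the array is no longer numeric. A quick sketch:

```python
import numpy as np

a = np.array([[np.nan, 2], [3, np.nan]])
b = np.where(np.isnan(a), '', a)

# The float/str mix is promoted to a string dtype, e.g. <U32
print(b.dtype)
```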

Also, if you want to replace it with a numeric value, you can use the np.nan_to_num() function:

import numpy as np

a = np.array([[np.nan, 2], [3, np.nan]])
a = np.nan_to_num(a, nan=0)
print(a)

Output:

[[0. 2.]
[3. 0.]]


Set list field to None instead of str('nan') in pandas

Have you tried replacing the NaN values before writing to SQL? Note that df.fillna(None) raises a ValueError (fillna requires a value or a method), so use where instead:

df = df.where(df.notnull(), None)
df.astype({...}).to_sql('search_raw', con=self.avails_conn, index='id')

Why does pandas use "NaN" from numpy, instead of its own null value?

A main dependency of pandas is numpy; in other words, pandas is built on top of numpy. Because pandas inherits and uses many of numpy's methods, it makes sense to keep things consistent: missing numeric data are represented with np.NaN.

(This choice to build upon numpy has consequences for other things too. For instance date and time operations are built upon the np.timedelta64 and np.datetime64 dtypes, not the standard datetime module.)


One thing you may not have known is that numpy has always been accessible through pandas:

import pandas as pd
pd.np?
pd.np.nan

Though this might seem convenient since you don't have to import numpy yourself, it is discouraged and will be deprecated in favor of importing numpy directly:

FutureWarning: The pandas.np module is deprecated and will be removed
from pandas in a future version. Import numpy directly instead


Is it conventional to use np.nan (rather than None) to represent null values in pandas?

If the data are numeric, then yes, you should use np.NaN. None requires the dtype to be object, and with pandas you want numeric data stored in a numeric dtype. pandas will generally coerce to the proper null type upon creation or import so that it can use the correct dtype:

pd.Series([1, None])
#0 1.0
#1 NaN <- None became NaN so it can have dtype: float64
#dtype: float64

Why did pandas not have its own null value for most of its lifetime (until last year)? What was the motivation for adding one?

pandas did not have its own null value because it got by with np.NaN, which worked for the majority of circumstances. However, missing data are very common in pandas; an entire section of the documentation is devoted to them. NaN, being a float, does not fit into an integer container, which means that any numeric Series with missing data is upcast to float. This can become problematic because of floating-point math: some integers cannot be represented exactly by a floating-point number, so any joins or merges could fail.

# Gets upcast to float
pd.Series([1,2,np.NaN])
#0 1.0
#1 2.0
#2 NaN
#dtype: float64

# Can safely do merges/joins/math because things are still Int
pd.Series([1,2,np.NaN]).astype('Int64')
#0 1
#1 2
#2 <NA>
#dtype: Int64
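The floating-point concern above is easy to demonstrate. A sketch using an integer just past 2**53, which float64 cannot represent exactly; this is the kind of value a NaN-forced upcast can silently corrupt:

```python
import pandas as pd

s = pd.Series([2**53 + 1])        # 9007199254740993, fits in int64
as_float = s.astype('float64')    # what a NaN in the column would force
print(int(as_float.iloc[0]))      # 9007199254740992, off by one
```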

function returns only None values when replacing pandas column values by regex match

You can use a pattern with a single capturing group and then simply use Series.str.extract and chain .fillna(np.nan) to fill the non-matched values with NaN:

pattern = r'(?s)(?:an|frage(?:\s+ich)?)\s+d[iı]e\s+Staatsreg[iı]erung(.*)'
df2['que_text_new'] = df2['que_text'].astype(str).str.extract(pattern).fillna(np.nan)

Not sure you need .astype(str), but there is str(s) in your code, so it might be safer to keep this part.

Here,

  • Capturing groups with single char alternatives are converted to character classes, e.g. (i|ı) -> [iı]
  • Other capturing groups are converted to non-capturing ones, i.e. ( -> (?:.
  • To make np.nan work do not forget to import numpy as np.
  • (?s) is an in-pattern re.DOTALL option.
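As a quick illustration with made-up sample rows (the column names follow the question; the texts themselves are invented, and expand=False is used so that extract returns a Series):

```python
import numpy as np
import pandas as pd

pattern = r'(?s)(?:an|frage(?:\s+ich)?)\s+d[iı]e\s+Staatsreg[iı]erung(.*)'

df2 = pd.DataFrame({'que_text': [
    'frage ich die Staatsregierung wie hoch die Kosten sind',
    'no matching phrase here',
]})

# Matched rows get the captured tail; non-matched rows become NaN
df2['que_text_new'] = (df2['que_text'].astype(str)
                       .str.extract(pattern, expand=False)
                       .fillna(np.nan))
print(df2['que_text_new'].tolist())
```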

Pandas Dataframe NaN values replace by no values

try this:

df.where(pd.notnull(df), None)

Example:

df = pd.DataFrame(np.eye(3))
df = df.where(lambda x: x==1, np.nan)
df = df.where(pd.notnull(df), None)

Note that df.fillna(None) will not work; it leaves the NaN values untouched.

Source: https://github.com/pandas-dev/pandas/issues/1972

Pandas interpolation function fails to interpolate after replacing values with .nan

If you first find and replace any value that is not a digit, that should fix your issue:

#Import modules
import pandas as pd
import numpy as np

#Import data
df = pd.read_csv('example.csv')

df['example'] = df.example.replace(r'[^\d]', np.nan, regex=True)
df['example'] = pd.to_numeric(df['example'])
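Putting it together with a small invented column (so there is no dependency on example.csv; 'bad' stands in for whatever non-numeric junk the file contains), the cleaned series then interpolates as expected:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'example': ['1', 'bad', '3']})

# Cells containing any non-digit become NaN: since the replacement value
# is not a string, pandas replaces the whole cell when the regex matches
df['example'] = df['example'].replace(r'[^\d]', np.nan, regex=True)
df['example'] = pd.to_numeric(df['example'])
df['example'] = df['example'].interpolate()
print(df['example'].tolist())  # [1.0, 2.0, 3.0]
```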

