Replacing Pandas or Numpy Nan with a None to use with MysqlDB
@bogatron has it right, you can use where. It's worth noting that you can do this natively in pandas:
df1 = df.where(pd.notnull(df), None)
Note: this changes the dtype of all columns to object.
Example:
In [1]: df = pd.DataFrame([1, np.nan])
In [2]: df
Out[2]:
0
0 1
1 NaN
In [3]: df1 = df.where(pd.notnull(df), None)
In [4]: df1
Out[4]:
0
0 1
1 None
Note: what you cannot do is recast the DataFrame's dtype to allow all data types, using astype, and then use the DataFrame fillna method:
df1 = df.astype(object).replace(np.nan, 'None')
Unfortunately neither this, nor using replace, works with None; see this (closed) issue.
As an aside, it's worth noting that for most use cases you don't need to replace NaN with None, see this question about the difference between NaN and None in pandas.
However, in this specific case it seems you do (at least at the time of this answer).
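For the MySQL use case, the idea above can be sketched as follows. This is a minimal sketch, not the original author's code: the column names are made up, and casting to object before calling where is an assumption added here, because some pandas versions are reported to keep NaN in float columns otherwise. The resulting tuples are the kind of rows a DB-API driver's executemany accepts.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing number and a missing string
df = pd.DataFrame({"price": [1.5, np.nan], "name": ["a", None]})

# Cast to object first so that None can be stored, then swap missing values for None
df_none = df.astype(object).where(df.notna(), None)

# Plain Python tuples, e.g. for cursor.executemany(...) with a DB-API driver
rows = list(df_none.itertuples(index=False, name=None))
print(rows)
```

Every column of df_none has dtype object, so this conversion is probably best done right before the insert rather than kept for further numeric work.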
how to replace np.nan with blank in numpy array
You can use the np.where() method to do that in this way:
import numpy as np
a = np.array([[np.nan, 2], [3, np.nan]])
a = np.where(np.isnan(a), '', a)
print(a)
Output:
[['' '2.0']
 ['3.0' '']]
Also, if you want to replace it with a number value, you could use the np.nan_to_num() method:
import numpy as np
a = np.array([[np.nan, 2], [3, np.nan]])
a = np.nan_to_num(a, nan=0)
print(a)
Output:
[[0. 2.]
 [3. 0.]]
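One consequence of the np.where approach is worth seeing directly: filling a float array with '' forces NumPy to turn every element into a string. The sketch below shows that, along with an object-array alternative that keeps the numbers as floats and uses None as the placeholder. The variable names are illustrative only, and the object-array variant is an addition here, not part of the original answer.

```python
import numpy as np

a = np.array([[np.nan, 2], [3, np.nan]])

# Mixing '' with floats makes np.where return a string array
as_text = np.where(np.isnan(a), '', a)
print(as_text.dtype.kind)  # 'U' means a unicode string dtype

# An object array can hold None next to the original float values
as_obj = a.astype(object)
as_obj[np.isnan(a)] = None
print(as_obj)
```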
Set list field to None instead of str('nan') in pandas
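Have you tried replacing the missing values before writing to SQL? Note that df.fillna(None) does not do this (pandas needs an actual fill value), so use where on an object-typed frame instead:
df = df.astype({...})
df = df.astype(object).where(df.notna(), None)
df.to_sql('search_raw', con=self.avails_conn, index='id')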
Why does pandas use "NaN" from numpy, instead of its own null value?
A main dependency of pandas is numpy; in other words, pandas is built on top of numpy. Because pandas inherits and uses many of the numpy methods, it makes sense to keep things consistent, that is, missing numeric data are represented with np.nan.
(This choice to build upon numpy has consequences for other things too. For instance, date and time operations are built upon the np.timedelta64 and np.datetime64 dtypes, not the standard datetime module.)
One thing you may not have known is that numpy has always been there with pandas:
import pandas as pd
pd.np?
pd.np.nan
Though you might think this is convenient since you don't have to import numpy, it is discouraged: pandas.np is deprecated in favor of importing numpy directly.
FutureWarning: The pandas.np module is deprecated and will be removed
from pandas in a future version. Import numpy directly instead
Is it conventional to use np.nan (rather than None) to represent null values in pandas?
If the data are numeric then yes, you should use np.nan. None requires the dtype to be object, and with pandas you want numeric data stored in a numeric dtype. pandas will generally coerce to the proper null type upon creation or import so that it can use the correct dtype:
pd.Series([1, None])
#0 1.0
#1 NaN <- None became NaN so it can have dtype: float64
#dtype: float64
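To make the dtype trade-off concrete, here is a small sketch that contrasts the default coercion with an explicitly object-typed Series. The variable names are illustrative, and forcing dtype=object is shown only for comparison, not as a recommendation.

```python
import numpy as np
import pandas as pd

# With no dtype given, None in numeric data is coerced to NaN (float64)
numeric = pd.Series([1, None])
print(numeric.dtype)

# Forcing dtype=object keeps None itself, at the cost of a non-numeric dtype
as_object = pd.Series([1, None], dtype=object)
print(as_object.iloc[1] is None)
```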
Why did pandas not have its own null value for most of its lifetime (until last year)? What was the motivation for adding?
pandas did not have its own null value because it got by with np.nan, which worked for the majority of circumstances. However, with pandas it's very common to have missing data; an entire section of the documentation is devoted to this. NaN, being a float, does not fit into an integer container, which means that any integer Series with missing data is upcast to float. This can become problematic because of floating point math: some integers cannot be represented exactly by a floating point number. As a result, joins or merges on such values could fail.
# Gets upcast to float
pd.Series([1,2,np.nan])
#0 1.0
#1 2.0
#2 NaN
#dtype: float64
# Can safely do merges/joins/math because things are still Int
pd.Series([1,2,np.nan]).astype('Int64')
#0 1
#1 2
#2 <NA>
#dtype: Int64
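The precision problem mentioned above can be illustrated with a single large integer. This is a sketch under the assumption that the value 2**53 + 1 is a fair stand-in for a large key such as an ID; the variable names are made up for the example.

```python
import pandas as pd

big = 2**53 + 1  # an integer that float64 cannot represent exactly

# The missing value forces float64, which rounds the integer
as_float = pd.Series([big, None])
print(int(as_float.iloc[0]) == big)

# The nullable Int64 dtype keeps the exact integer next to <NA>
as_int = pd.Series([big, None], dtype="Int64")
print(int(as_int.iloc[0]) == big)
```

A key that has been silently rounded like this no longer equals the same key stored as an integer elsewhere, which is how a merge can miss rows.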
function returns only None values when replacing pandas column values by regex match
You can use a pattern with a single capturing group and then simply use Series.str.extract and chain .fillna(np.nan) to fill the non-matched values with NaN:
pattern = r'(?s)(?:an|frage(?:\s+ich)?)\s+d[iı]e\s+Staatsreg[iı]erung(.*)'
df2['que_text_new'] = df2['que_text'].astype(str).str.extract(pattern).fillna(np.nan)
Not sure you need .astype(str), but there is str(s) in your code, so it might be safer with this part.
Here,
- Capturing groups with single char alternatives are converted to character classes, e.g. (i|ı) -> [iı]
- Other capturing groups are converted to non-capturing ones, i.e. ( -> (?:
- To make np.nan work, do not forget to import numpy as np
- (?s) is an in-pattern re.DOTALL option.
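The pattern can be tried on a couple of made-up strings to see what the single capturing group returns. This is only a sketch: the sample sentences are hypothetical, and expand=False is an addition here so that the result is a Series rather than a one-column DataFrame. Rows that do not match come back as missing values from extract itself.

```python
import pandas as pd

# Hypothetical sample data; the second row does not match the pattern
s = pd.Series([
    "Ich frage die Staatsregierung: Wie viele?",
    "Kein Treffer",
])

pattern = r'(?s)(?:an|frage(?:\s+ich)?)\s+d[iı]e\s+Staatsreg[iı]erung(.*)'

# expand=False returns a Series because there is exactly one capturing group
extracted = s.str.extract(pattern, expand=False)
print(extracted)
```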
Pandas Dataframe NaN values replace by no values
Try this:
df.where(pd.notnull(df), None)
Example:
df = pd.DataFrame(np.eye(3))
df = df.where(lambda x: x==1, np.nan)
df = df.where(pd.notnull(df), None)
Note that df.fillna(None) will not work: it does not replace the NaN values with None (recent pandas versions raise a ValueError instead).
Source: https://github.com/pandas-dev/pandas/issues/1972
Pandas interpolation function fails to interpolate after replacing values with .nan
If you first find and replace any value that is not a digit, that should fix your issue.
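#Import modules
import pandas as pd
import numpy as np
#Import data
df = pd.read_csv('example.csv')
#Replace any value containing a non-digit character with NaN
df['example'] = df['example'].replace(r'[^\d]', np.nan, regex=True)
#Convert to a numeric dtype and keep the result so interpolation can work
df['example'] = pd.to_numeric(df['example'])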
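The same clean-then-interpolate idea can be checked on a tiny in-memory Series. Note that this sketch swaps in a different technique from the regex replacement above: it relies on pd.to_numeric with errors="coerce" to turn unparseable entries into NaN. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical column read as text, with one non-numeric entry
raw = pd.Series(["1", "n/a", "3"])

# errors="coerce" turns anything unparseable into NaN and yields a float dtype
numeric = pd.to_numeric(raw, errors="coerce")

# Linear interpolation now fills the gap between 1.0 and 3.0
filled = numeric.interpolate()
print(filled.tolist())
```

Interpolation only operates on numeric data, which is why the conversion step has to come first.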