Prevent pandas from interpreting 'NA' as NaN in a string
You can use the keep_default_na and na_values parameters to set all NA values by hand (see the read_csv docs):
import pandas as pd
from io import StringIO
data = """
PDB CHAIN SP_PRIMARY RES_BEG RES_END PDB_BEG PDB_END SP_BEG SP_END
5d8b N P60490 1 146 1 146 1 146
5d8b NA P80377 _ 126 1 126 1 126
5d8b O P60491 1 118 1 118 1 118
"""
df = pd.read_csv(StringIO(data), sep=' ', keep_default_na=False, na_values=['_'])
In [130]: df
Out[130]:
PDB CHAIN SP_PRIMARY RES_BEG RES_END PDB_BEG PDB_END SP_BEG SP_END
0 5d8b N P60490 1 146 1 146 1 146
1 5d8b NA P80377 NaN 126 1 126 1 126
2 5d8b O P60491 1 118 1 118 1 118
In [144]: df.CHAIN.apply(type)
Out[144]:
0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
Name: CHAIN, dtype: object
EDIT
All default NA values, from the na-values section of the docs (as of pandas 1.0.0):
The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '<NA>', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''].
Prevent Pandas read_csv from interpreting NA as NaN but retaining NaN for empty values
For me, this works:
df = pd.read_csv('file.csv', keep_default_na=False, na_values=[''])
which gives:
region date expenses
0 NA 1/1/2019 53.0
1 EU 1/2/2019 NaN
But I'd rather play it safe, since other columns may contain other NaN values, and do:
df = pd.read_csv('file.csv')
df['region'] = df['region'].fillna('NA')
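The first approach above can be tried without a file on disk. This is a minimal sketch that assumes a small in-memory CSV (the csv_text contents are made up for illustration) in place of file.csv:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-in for file.csv: 'NA' is a real region code,
# and the second row has an empty expenses field.
csv_text = "region,date,expenses\nNA,1/1/2019,53\nEU,1/2/2019,\n"

# Only the empty string is treated as missing; the literal 'NA' is kept.
df = pd.read_csv(StringIO(csv_text), keep_default_na=False, na_values=[''])

print(df['region'].tolist())           # region codes, including 'NA'
print(df['expenses'].isna().tolist())  # True only where the field was empty
```

Note that with this call, other default markers such as 'NULL' or 'n/a' are also kept as plain strings, so add them to na_values if they should still count as missing.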
How to prevent pandas from removing 'NA' character string when reading a csv?
Read the dataframe with keep_default_na=False, possibly using na_values to specify the set of values that you want to treat as "genuine" NaNs:
# custom admissible NaNs values, 'NA' is not in this list
na_values = ['', '#N/A', '#N/A N/A', '#NA', '-1.#IND',
'-1.#QNAN', '-NaN', '-nan', '1.#IND',
'1.#QNAN', 'N/A', 'NULL', 'NaN',
'n/a', 'nan', 'null'
]
data = pd.read_csv('C:\\Users\\User\\Desktop\\' + filename,
                   sep=',',
                   quotechar='"',
                   encoding='mbcs',
                   low_memory=False,
                   na_values=na_values,    # specify custom NaN values
                   keep_default_na=False)  # and use only them
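To see the effect of a custom list without depending on a local file, here is a small sketch. The CSV text and the shortened na_values list are assumptions made for illustration only:

```python
import pandas as pd
from io import StringIO

# Shortened custom list of missing-value markers; 'NA' is deliberately absent.
na_values = ['', '#N/A', 'N/A', 'NULL', 'NaN', 'n/a', 'nan', 'null']

# Hypothetical data: 'NA' is a valid code, 'NULL' and the empty field are not.
csv_text = "code,sku\nMV,Product1\nNA,Product2\nNULL,Product3\n,Product4\n"

df = pd.read_csv(StringIO(csv_text),
                 na_values=na_values,
                 keep_default_na=False)

# 'NA' stays a string; 'NULL' and the empty field become NaN.
print(df['code'].isna().tolist())
```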
Here's a reproducible example of what could be happening here:
# create dataframe with NA and write it to file
import pandas as pd
df = pd.DataFrame({'Line Code':['MV', 'RM', 'NA', 'AB'],
'Product SKU':['Product1', 'Product2', 'Product3', 'Product4']})
df.to_csv("mydf.csv", index = False)
# read it in, in two different fashions
df_problematic = pd.read_csv("mydf.csv")
df_ok = pd.read_csv("mydf.csv", keep_default_na = False)
In df_problematic, the 'NA' value is interpreted as NaN, which is not what you want (refer to the read_csv docs for the options available when reading CSV files in pandas and for the default list of strings interpreted as NaN).
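The difference between the two reads can be checked directly. The sketch below uses the same data as the example, but reads it from an in-memory string (an assumption, to avoid writing mydf.csv):

```python
import pandas as pd
from io import StringIO

# Same rows as the example above, held in memory instead of mydf.csv.
csv_text = ("Line Code,Product SKU\n"
            "MV,Product1\nRM,Product2\nNA,Product3\nAB,Product4\n")

df_problematic = pd.read_csv(StringIO(csv_text))
df_ok = pd.read_csv(StringIO(csv_text), keep_default_na=False)

# Default parsing turns the third code into a missing value ...
print(df_problematic['Line Code'].isna().tolist())
# ... while keep_default_na=False preserves it as the string 'NA'.
print(df_ok['Line Code'].tolist())
```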
Prevent pandas from interpreting 'NA' as NaN in a string : csv file
To mask NaN values after loading:
df[~df.isnull()]
To drop rows that contain missing values:
df.dropna()
String NA conflict with pandas na type
Set the na_filter parameter to False:
df = pd.read_csv("aa.csv", na_filter=False)
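As a quick illustration of what na_filter=False does, here is a sketch with made-up in-memory data standing in for aa.csv. With the filter off, no missing-value detection happens at all, so an empty field comes back as an empty string rather than NaN:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-in for aa.csv; the second row has an empty chain field.
csv_text = "id,chain\n1,NA\n2,\n"

df = pd.read_csv(StringIO(csv_text), na_filter=False)

# Both 'NA' and the empty field are kept as plain strings.
print(df['chain'].tolist())
print(df['chain'].isna().any())
```

This is only appropriate when the file has no values that should be treated as missing.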
Why does pandas identify string NaN (a nitride of sodium) as a missing value?
As per the pandas documentation for read_csv, 'NaN' is one of the default missing-value indicators.
If you're sure there are no missing values in your CSV file, you can simply pass na_filter=False to your read_csv() call to turn off missing-value detection.
Otherwise, you can use keep_default_na=False to exclude the default values and specify your own with the na_values parameter.
how to stop pandas from inferring string value Infinity as inf and changing datatype to float64
You can pass the dtype argument to set explicit column types for particular column names, like so:
pd.read_csv(file_name, dtype={'Vendor': str})
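A minimal sketch of the dtype approach, assuming a made-up in-memory CSV in which the Vendor column holds the literal text Infinity (the data and the Amount column are hypothetical):

```python
import pandas as pd
from io import StringIO

# Hypothetical data: 'Infinity' is a vendor name, not a number.
csv_text = "Vendor,Amount\nInfinity,10\n123,20\n"

# Forcing Vendor to str keeps every value in that column as text.
df = pd.read_csv(StringIO(csv_text), dtype={'Vendor': str})

print(df['Vendor'].tolist())
```

Only the columns named in the dtype mapping are affected; the others are still inferred as usual.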