Replace Empty Strings With None/Null Values in Dataframe

Pandas does not fill nan values with empty string

Indexing with square brackets and a list of columns (df[[...]]) creates a copy, so calling fillna(..., inplace=True) on it modifies a temporary object, not the original dataframe.
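A minimal sketch of the pitfall, assuming a small DataFrame with hypothetical columns buyer, product and price (made-up data, not from the question). The fill lands in the copy returned by df[[...]], and df keeps its NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only
df = pd.DataFrame({
    "buyer": ["Alice", np.nan],
    "product": [np.nan, "Widget"],
    "price": [1.0, 2.0],
})

# df[[...]] returns a new DataFrame, so inplace=True only modifies that copy
subset = df[["buyer", "product"]]
subset.fillna("", inplace=True)

print(df["buyer"].isna().sum())      # the original still contains its NaN
print(subset["buyer"].isna().sum())  # the copy has been filled
```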

There are three possible solutions: loop over the columns, assign the filled columns back, or pass a dict mapping each column to its replacement.

Looping

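for col in (col_buyername, col_product):
    # Assign back rather than df[col].fillna('', inplace=True): that form is chained
    # assignment and does not update df under Copy-on-Write (default in pandas 3.0)
    df[col] = df[col].fillna('')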

Assignment

df[[col_buyername, col_product]] = df[[col_buyername, col_product]].fillna('')

dict

df.fillna({col_buyername: '', col_product: ''}, inplace=True)

The loop and dict approaches should be a little more efficient than reassigning the whole column subset.
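A runnable sketch comparing the three approaches, assuming the hypothetical names buyer and product stand in for col_buyername and col_product, with sample data made up for illustration. The extra numeric column qty shows that each approach touches only the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical column names standing in for col_buyername / col_product
col_buyername, col_product = "buyer", "product"

def make_df():
    # Fresh copy of the sample data for each approach
    return pd.DataFrame({
        col_buyername: ["Alice", np.nan, "Bob"],
        col_product: [np.nan, "Widget", np.nan],
        "qty": [1, 2, np.nan],  # numeric column that should keep its NaN
    })

# 1. Loop, assigning each filled column back (no chained inplace call)
df_loop = make_df()
for col in (col_buyername, col_product):
    df_loop[col] = df_loop[col].fillna("")

# 2. Assign the filled subset back to the same columns
df_assign = make_df()
cols = [col_buyername, col_product]
df_assign[cols] = df_assign[cols].fillna("")

# 3. Dict of column -> replacement; columns not in the dict are left alone
df_dict = make_df()
df_dict.fillna({col_buyername: "", col_product: ""}, inplace=True)

print(df_dict)
```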

For more info on when pandas creates copies and when it does not, see https://stackoverflow.com/a/53954986/3838691

How to replace empty strings in a dataframe with NA (missing value) not NA string

Specify NA without quotes. According to ?NA, "NA is a logical constant of length 1 which contains a missing value."

The class can be checked:

class(NA)
#[1] "logical"
class(NA_character_)
#[1] "character"

Both are recognized as missing by standard functions such as is.na:

is.na(NA)
#[1] TRUE
is.na(NA_character_)
#[1] TRUE

if_else is type-strict, so instead of plain NA, which is logical, use NA_real_, NA_integer_ or NA_character_ depending on the type of the 'boat' column. Assuming 'boat' is of class character, we need NA_character_:

library(dplyr)

titanic %>%
  mutate(boat = if_else(boat == "", NA_character_, boat))
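For readers doing the same thing in pandas instead of dplyr, a rough analogue is sketched below. It assumes a hypothetical titanic DataFrame with a string column boat (made-up values) and uses Series.replace to turn empty strings into np.nan, which isna() then reports as missing, much like is.na in R.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the 'boat' column from the R example
titanic = pd.DataFrame({"name": ["a", "b", "c"], "boat": ["2", "", "C"]})

# Replace empty strings with a real missing value, not the string "NA"
titanic["boat"] = titanic["boat"].replace("", np.nan)

print(titanic["boat"].isna().tolist())  # [False, True, False]
```

Only exact empty strings match here; cells holding only whitespace would need something like replace(r"^\s*$", np.nan, regex=True).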

How to replace None only with empty string using pandas?

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

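# True where the cell holds the None object itself (df.map replaces applymap in pandas >= 2.1)
mask = df.applymap(lambda x: x is None)
# Columns that contain at least one None
cols = df.columns[mask.any()]
for col in cols:
    df.loc[mask[col], col] = ''
df

Output: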
A B C D E
0 A 2014-01-02 02:00:00 A 1
1 B 2014-01-02 03:00:00 B B 2
2 2014-01-02 04:00:00 C C NaN
3 C NaT C 4

So we build a boolean mask of the None values with applymap, keep only the columns that contain at least one None, and for each of those columns use the mask with .loc to set the matching cells to an empty string. NaN and NaT values are left untouched.
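A self-contained sketch of the same idea with made-up sample data. It builds the mask with Series.map applied column by column instead of applymap (deprecated in pandas 2.1 in favour of DataFrame.map), so it should run on older and newer pandas alike. The column names and values are illustrative only; the object dtype is set explicitly so the None objects are kept as None.

```python
import numpy as np
import pandas as pd

# Sample data: object columns holding the None object, plus a float column with NaN
df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "D": pd.Series([None, "B", "C", None], dtype=object),
    "E": [1, 2, np.nan, 4],
})

# True only where the cell is the None object itself; NaN is not None
mask = df.apply(lambda s: s.map(lambda x: x is None))

# Restrict to columns containing at least one None, then blank those cells
for col in df.columns[mask.any().to_numpy()]:
    df.loc[mask[col], col] = ""

print(df)
```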

Replace null with empty string when writing Spark dataframe

You can use when and otherwise:

from pyspark.sql import functions as F

df.show()

#InputDF
# +-------------+----------+
# |UNIQUE_MEM_ID| DATE|
# +-------------+----------+
# | 1156| null|
# | 3787|2016-07-05|
# | 1156| null|
# +-------------+----------+


df.withColumn("DATE", F.when(F.col("DATE").isNull(), '').otherwise(F.col("DATE"))).show()

#OUTPUTDF
# +-------------+----------+
# |UNIQUE_MEM_ID| DATE|
# +-------------+----------+
# | 1156| |
# | 3787|2016-07-05|
# | 1156| |
# +-------------+----------+

To apply the same logic to every column, build one when/otherwise expression per column in a list comprehension over df.columns, keep each column's name with alias, and select them all:

df.select(*[F.when(F.col(column).isNull(), '').otherwise(F.col(column)).alias(column) for column in df.columns]).show()

