One Function to Detect NaN, NA, Inf, -Inf, etc.

Error in a function for replacing all NaN values in a DataFrame

You can specify which strings should be treated as null values when you read the file with pd.read_csv(). Per the docs:

na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

In your case, you can try:

data = pd.read_csv("diabetes.csv", na_values=["_", "-", "?", "", "na", "n/a"])
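
To confirm those sentinel strings were actually parsed as missing values, you can count the NaNs per column after loading (a minimal check; diabetes.csv is just the example file from the question):

import pandas as pd

data = pd.read_csv("diabetes.csv", na_values=["_", "-", "?", "", "na", "n/a"])

# every sentinel string should now show up in the per-column NaN counts
print(data.isna().sum())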

Fill in missing values by group in data.table

There is now a native data.table way of filling missing values (as of 1.12.4).

This question spawned a GitHub issue, which was recently closed with the creation of the functions nafill and setnafill. You can now use:

DT[, value_filled_in := nafill(value, type = "locf")]

It is also possible to fill NA with a constant value or with the next observation carried back.
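
A minimal sketch of those variants, assuming a hypothetical data.table DT with columns group and value (the by-group call at the end matches the original question's setup):

library(data.table)

DT <- data.table(group = c("a", "a", "b", "b"),
                 value = c(1, NA, NA, 4))

DT[, v_const := nafill(value, type = "const", fill = 0)]   # constant fill
DT[, v_nocb  := nafill(value, type = "nocb")]              # next observation carried back
DT[, v_locf  := nafill(value, type = "locf"), by = group]  # LOCF within each group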

One difference from the approach in the question is that these functions currently work only on NA, not NaN, whereas is.na() is TRUE for NaN; this is planned to be fixed in the next release through an extra argument.

I have no involvement with the project, but I noticed that although the GitHub issue links here, there was no link the other way, so I'm answering on behalf of future visitors.

Update: by default, NaN is now treated the same as NA.

pandas replace np.nan based on multiple conditions

Try rewriting your np.where statement:

df['is_less'] = np.where(
    df['A'].isnull() | df['B'].isnull(),        # check if A or B is NaN
    np.nan,
    np.where(df['B'].ge(df['A']), 'no', 'yes')  # check if B >= A
)

prints:

      A     B is_less
0   NaN  10.0     nan
1  10.0   NaN     nan
2   1.0   5.0      no
3   5.0   1.0     yes

Greater than or equal:

  • pandas.ge
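
The outer isnull check matters because comparisons involving NaN evaluate to False rather than propagating, so without it the NaN rows would come out as 'yes'. A quick illustration (my own example, not from the answer):

import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan])
print(s.ge(1.0))
# 0     True
# 1    False   <- NaN compares as False, not NaN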

Is there a Python function to fill missing data with consecutive values

One way is to use loc with an array:

df.loc[df['b'].isnull(), 'b'] = [1, 2]

What you're attempting is possible but cumbersome with fillna:

nulls = df['b'].isnull()
df['b'] = df['b'].fillna(pd.Series([1, 2], index=nulls[nulls].index))

You may be looking for interpolate, but the above solutions are generic given an input list or array.
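
For completeness, a minimal interpolate sketch, assuming b is numeric with known neighbours (linear interpolation is the default):

import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [0.0, np.nan, np.nan, 3.0]})
df['b'] = df['b'].interpolate()  # fills the gaps with 1.0 and 2.0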

If, on the other hand, you want to fill nulls with a sequence 1, 2, 3, etc., you can use cumsum:

# fillna solution
df['b'] = df['b'].fillna(df['b'].isnull().cumsum())

# loc solution
nulls = df['b'].isnull()
df.loc[nulls, 'b'] = nulls.cumsum()
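
As a concrete (hypothetical) example of the cumsum approach, the nth null gets the value n, because cumsum over the boolean mask counts the nulls seen so far:

import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [10.0, np.nan, 20.0, np.nan, np.nan]})
nulls = df['b'].isnull()             # [False, True, False, True, True]
df.loc[nulls, 'b'] = nulls.cumsum()  # cumsum: [0, 1, 1, 2, 3]
print(df['b'].tolist())              # [10.0, 1.0, 20.0, 2.0, 3.0]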

How to find the difference between elements, ignoring NA values

You can filter out the NaN values, then use diff:

import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, '2019-12-11', np.nan,
               '2019-12-14', np.nan, np.nan, '2019-12-20', '2019-12-23'])
s = pd.to_datetime(s)

s[~s.isna()].diff()

# 3      NaT
# 5   3 days
# 8   6 days
# 9   3 days
# dtype: timedelta64[ns]

Another option would be:

s.ffill().diff()

# 0      NaT
# 1      NaT
# 2      NaT
# 3      NaT
# 4   0 days
# 5   3 days
# 6   0 days
# 7   0 days
# 8   6 days
# 9   3 days
# dtype: timedelta64[ns]
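
To recover just the gaps between actual observations from that variant, you can mask by the original non-null positions (my own combination of the two snippets above):

s.ffill().diff()[s.notna()]

# 3      NaT
# 5   3 days
# 8   6 days
# 9   3 days
# dtype: timedelta64[ns]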

Replace NaN Values with the Means of Other Columns Based on a Condition

You could implement the function like this:

import pandas as pd

def replace_missing_with_conditional_mean(df, condition_cols, cols):
    # per-group means, aligned back to the original index
    s = df.groupby(condition_cols)[cols].transform('mean')
    # fill each target column with its aligned group-mean Series
    return df.fillna(s.to_dict('series'))

res = replace_missing_with_conditional_mean(df, ['Col1', 'Col2'], ['Col3'])
print(res)

Output

  Col1 Col2  Col3
0    A    c   1.0
1    A    c   3.0
2    B    c   5.0
3    A    d   6.0
4    A    c   2.0
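
For reference, the output above is consistent with a hypothetical input like this, where row 4's missing Col3 is filled with 2.0, the mean of the other (A, c) rows (1.0 and 3.0):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'A', 'A'],
                   'Col2': ['c', 'c', 'c', 'd', 'c'],
                   'Col3': [1.0, 3.0, 5.0, 6.0, np.nan]})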

Cannot assign NaN/empty value in np.where

I was able to fix it by using pd.NA instead, which fillna, for some reason, recognizes as a missing value to fill with ffill.

Fix:

df['x'] = np.where(df['y'] > 0.05, 1, pd.NA)
df['x'] = df['x'].ffill()  # same as fillna(method="ffill"), which is deprecated in newer pandas
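
A plausible explanation for the "for some reason" (an assumption on my part, since the failing code isn't shown): when np.where mixes np.nan (or an empty string) with string values, NumPy casts the whole result to a string dtype, so nan becomes the literal string 'nan', which ffill does not treat as missing; pd.NA instead forces an object array with a real missing value:

import numpy as np
import pandas as pd

arr = np.where([True, False], 'yes', np.nan)
print(arr)                      # ['yes' 'nan']  <- 'nan' is a plain string here

arr2 = np.where([True, False], 'yes', pd.NA)
print(pd.Series(arr2).ffill())  # 0: yes, 1: yes  <- pd.NA stays a real missing value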

