Pandas Fill Missing Values in Dataframe from Another Dataframe

Pandas fill missing values in dataframe from another dataframe

If you have two DataFrames of the same shape, then:

df[df.isnull()] = d2

Will do the trick.

visual representation

Only locations where df.isnull() evaluates to True (highlighted in green) will be eligible for assignment.

In practice, the DataFrames aren't always the same size / shape, and transforming methods (especially .shift()) are useful.

Data coming in is invariably dirty, incomplete, or inconsistent. Par for the course. There's a pretty extensive pandas tutorial and associated cookbook for dealing with these situations.

How to fill missing data from a dataframe with another dataframe when having a common key

Try this list comprehension:

df['GrossRate'] = [x if x != 0 else y for x, y in zip(df['GrossRate'], dfB['GrossRate'])]

Fill missing values of 1 data frame from another data frame using pandas

try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A":["a", "b", "c", "d", "e"], "B":[1, 2, 0, 0, 0]})
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

mask = df["B"] == 0
df.loc[mask, "B"] = s[df.loc[mask, "A"]].values

df:

   A  B
0 a 1
1 b 2
2 c 0
3 d 0
4 e 0

s:

a    10
b 20
c 30
d 40
dtype: int64

output:

   A     B
0 a 1.0
1 b 2.0
2 c 30.0
3 d 40.0
4 e NaN

Fill Nulls with values in another dataframe in pandas

My standard method is to combine series.replace / series.fillna with series.map(dict).

fill_dict = dataframe2.set_index('Col1')['Col2'].to_dict()
dataframe1['Col2'] = dataframe1['Col2'].replace('Null', dataframe1['Col1'].map(fill_dict))

How to fill missing values in DataFrame using another DataFrame in Pandas

One way would be to melt and resample your df2 and create a dictionary to map back to df1:

#make sure columns are in datetime format
df1['sprint_created'] = pd.to_datetime(df1['sprint_created'])
df2['sprint_start'] = pd.to_datetime(df2['sprint_start'])
df2['sprint_end'] = pd.to_datetime(df2['sprint_end'])

#melt dataframe of the two date columns and resample by group
new = (df2.melt(id_vars='sprint').drop('variable', axis=1).set_index('value')
.groupby('sprint', group_keys=False).resample('D').ffill().reset_index())

#create dictionary of date and the sprint and map back to df1
dct = dict(zip(new['value'], new['sprint']))
df1['sprint'] = df1['sprint_created'].map(dct)
#or df1['sprint'] = df1['sprint'].fillna(df1['sprint_created'].map(dct))
df1
Out[1]:
sprint sprint_created
0 S100 2020-01-01
1 S101 2020-01-10
2 S102 2020-01-20
3 S103 2020-01-31
4 S101 2020-01-10

replacing values in a pandas dataframe with values from another dataframe based common columns

First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.

Then do a left join based on the new dataframe.

df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])

After that is done, append the rows from df4 back into df1.

Another way is to combine the 2 columns you want to lookup into a single column

df1["types_o"] = df1["types_o"].astype(str) + df1["o_period"].astype(str)

df2["types_o"] = df2["types_o"].astype(str) + df2["o_period"].astype(str)

Then you can do a look up on the missing values.

df1.types_o.replace('Nan', np.NaN, inplace=True)

df1.loc[df1['s_months'].isnull(),'s_months'] = df2['types_o'].map(df1.types_o)

df1.loc[df1['incidents'].isnull(),'incidents'] = df2['types_o'].map(df1.types_o)

You didn't paste any code or examples of your data which is easily reproducible so this is the best I can do.



Related Topics



Leave a reply



Submit