Pandas fill missing values in dataframe from another dataframe
If you have two DataFrames of the same shape, then:
df[df.isnull()] = d2
Will do the trick.
Only locations where df.isnull()
evaluates to True
(highlighted in green) will be eligible for assignment.
In practice, the DataFrames aren't always the same size / shape, and transforming methods (especially .shift()
) are useful.
Data coming in is invariably dirty, incomplete, or inconsistent. Par for the course. There's a pretty extensive pandas tutorial and associated cookbook for dealing with these situations.
How to fill missing data from a dataframe with another dataframe when having a common key
Try this list comprehension:
df['GrossRate'] = [x if x != 0 else y for x, y in zip(df['GrossRate'], dfB['GrossRate'])]
Fill missing values of 1 data frame from another data frame using pandas
try this:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":["a", "b", "c", "d", "e"], "B":[1, 2, 0, 0, 0]})
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
mask = df["B"] == 0
df.loc[mask, "B"] = s[df.loc[mask, "A"]].values
df:
A B
0 a 1
1 b 2
2 c 0
3 d 0
4 e 0
s:
a 10
b 20
c 30
d 40
dtype: int64
output:
A B
0 a 1.0
1 b 2.0
2 c 30.0
3 d 40.0
4 e NaN
Fill Nulls with values in another dataframe in pandas
My standard method is to combine series.replace / series.fillna with series.map(dict).
fill_dict = dataframe2.set_index('Col1')['Col2'].to_dict()
dataframe1['Col2'] = dataframe1['Col2'].replace('Null', dataframe1['Col1'].map(fill_dict))
How to fill missing values in DataFrame using another DataFrame in Pandas
One way would be to melt
and resample
your df2
and create a dictionary to map
back to df1
:
#make sure columns are in datetime format
df1['sprint_created'] = pd.to_datetime(df1['sprint_created'])
df2['sprint_start'] = pd.to_datetime(df2['sprint_start'])
df2['sprint_end'] = pd.to_datetime(df2['sprint_end'])
#melt dataframe of the two date columns and resample by group
new = (df2.melt(id_vars='sprint').drop('variable', axis=1).set_index('value')
.groupby('sprint', group_keys=False).resample('D').ffill().reset_index())
#create dictionary of date and the sprint and map back to df1
dct = dict(zip(new['value'], new['sprint']))
df1['sprint'] = df1['sprint_created'].map(dct)
#or df1['sprint'] = df1['sprint'].fillna(df1['sprint_created'].map(dct))
df1
Out[1]:
sprint sprint_created
0 S100 2020-01-01
1 S101 2020-01-10
2 S102 2020-01-20
3 S103 2020-01-31
4 S101 2020-01-10
replacing values in a pandas dataframe with values from another dataframe based common columns
First separate the rows where you have NaN values out into a new dataframe called df3 and drop the rows where there are NaN values from df1.
Then do a left join based on the new dataframe.
df4 = pd.merge(df3,df2,how='left',on=['types','o_period'])
After that is done, append the rows from df4 back into df1.
Another way is to combine the 2 columns you want to lookup into a single column
df1["types_o"] = df1["types_o"].astype(str) + df1["o_period"].astype(str)
df2["types_o"] = df2["types_o"].astype(str) + df2["o_period"].astype(str)
Then you can do a look up on the missing values.
df1.types_o.replace('Nan', np.NaN, inplace=True)
df1.loc[df1['s_months'].isnull(),'s_months'] = df2['types_o'].map(df1.types_o)
df1.loc[df1['incidents'].isnull(),'incidents'] = df2['types_o'].map(df1.types_o)
You didn't paste any code or examples of your data which is easily reproducible so this is the best I can do.
Related Topics
Rename Specific Column(S) in Pandas
How to Get Around Declaring an Unused Variable in a for Loop
Python: Platform Independent Way to Modify Path Environment Variable
How to Run Pygame or Pyglet in a Browser
Add 'Decimal-Mark' Thousands Separators to a Number
What Can Multiprocessing and Dill Do Together
How to Select Elements of an Array Given Condition
How to Patch a Python Decorator Before It Wraps a Function
It Is More Efficient to Use If-Return-Return or If-Else-Return
How to Pass Extra Arguments to a Python Decorator
Runtimeerror: Main Thread Is Not in Main Loop
Matplotlib Savefig() Plots Different from Show()
Can Existing Virtualenv Be Upgraded Gracefully
How to Group a List of Tuples/Objects by Similar Index/Attribute in Python