Creating and filling empty dates with zeroes
Use:
import pandas as pd
# parse_dates converts the Date column to datetimes
df = pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/x_restock.csv',
                 parse_dates=['Date'])
The first solution adds the complete range of datetimes, from the minimal to the maximal datetime in the data, to every product by passing a MultiIndex.from_product to DataFrame.reindex:
mux = pd.MultiIndex.from_product([df['Product_ID'].unique(),
pd.date_range(df.Date.min(), df.Date.max())],
names=['Product_ID','Dates'])
df1 = df.set_index(['Product_ID','Date']).reindex(mux, fill_value=0).reset_index()
print (df1)
Product_ID Dates restocking_events
0 1004746 2021-11-13 0
1 1004746 2021-11-14 0
2 1004746 2021-11-15 0
3 1004746 2021-11-16 1
4 1004746 2021-11-17 0
... ... ...
3379 976460 2021-11-26 1
3380 976460 2021-11-27 0
3381 976460 2021-11-28 0
3382 976460 2021-11-29 0
3383 976460 2021-11-30 0
[3384 rows x 3 columns]
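The same pattern can be tried without downloading the CSV. The following is a minimal sketch on made-up data; the frame `toy` and the names `full_index` and `toy_filled` are illustrative and do not come from the original answer.

```python
import pandas as pd

# Hypothetical toy data standing in for the CSV used above.
toy = pd.DataFrame({
    "Product_ID": [1, 1, 2],
    "Date": pd.to_datetime(["2021-11-13", "2021-11-15", "2021-11-14"]),
    "restocking_events": [1, 2, 1],
})

# Pair every product with every day between the global min and max date.
full_index = pd.MultiIndex.from_product(
    [toy["Product_ID"].unique(),
     pd.date_range(toy["Date"].min(), toy["Date"].max(), freq="D")],
    names=["Product_ID", "Date"],
)

# Rows that did not exist before get restocking_events = 0.
toy_filled = (
    toy.set_index(["Product_ID", "Date"])
       .reindex(full_index, fill_value=0)
       .reset_index()
)
print(toy_filled)
```

Because fill_value=0 is supplied to reindex directly, no NaN is ever introduced and the column keeps its integer dtype.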
Another idea is to build a helper DataFrame with every combination of Product_ID and date, then merge the original data into it:
from itertools import product
dfdate=pd.DataFrame(product(df['Product_ID'].unique(),
pd.date_range(df.Date.min(), df.Date.max())),
columns=['Product_ID','Date'])
print (dfdate)
Product_ID Date
0 1004746 2021-11-13
1 1004746 2021-11-14
2 1004746 2021-11-15
3 1004746 2021-11-16
4 1004746 2021-11-17
... ...
3379 976460 2021-11-26
3380 976460 2021-11-27
3381 976460 2021-11-28
3382 976460 2021-11-29
3383 976460 2021-11-30
[3384 rows x 2 columns]
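# the left merge keeps every (Product_ID, Date) pair; missing counts become NaN and are then set to 0
# the downcast= argument of fillna is deprecated in recent pandas versions, so cast back to int explicitly
df = dfdate.merge(df, how='left').fillna({'restocking_events':0}).astype({'restocking_events':int})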
print (df)
Product_ID Date restocking_events
0 1004746 2021-11-13 0
1 1004746 2021-11-14 0
2 1004746 2021-11-15 0
3 1004746 2021-11-16 1
4 1004746 2021-11-17 0
... ... ...
3379 976460 2021-11-26 1
3380 976460 2021-11-27 0
3381 976460 2021-11-28 0
3382 976460 2021-11-29 0
3383 976460 2021-11-30 0
[3384 rows x 3 columns]
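A small sketch of the merge variant on made-up data may help show why the integer cast is needed: the left merge introduces NaN, which turns the column into floats. The names `toy`, `grid`, `merged` and `merged_filled` are illustrative assumptions, not part of the original answer.

```python
from itertools import product

import pandas as pd

# Hypothetical toy data standing in for the CSV used above.
toy = pd.DataFrame({
    "Product_ID": [1, 1, 2],
    "Date": pd.to_datetime(["2021-11-13", "2021-11-15", "2021-11-14"]),
    "restocking_events": [1, 2, 1],
})

# Helper frame: the Cartesian product of all products and all days.
grid = pd.DataFrame(
    product(toy["Product_ID"].unique(),
            pd.date_range(toy["Date"].min(), toy["Date"].max(), freq="D")),
    columns=["Product_ID", "Date"],
)

# Left merge keeps every row of the helper frame; unmatched rows get NaN,
# so restocking_events is a float column at this point.
merged = grid.merge(toy, how="left", on=["Product_ID", "Date"])

# Replace NaN with 0 and restore an integer dtype.
merged_filled = (
    merged.fillna({"restocking_events": 0})
          .astype({"restocking_events": "int64"})
)
print(merged_filled)
```

Passing on=[...] explicitly is a design choice in this sketch; the original relies on merge picking the shared column names automatically.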
Or, if you need consecutive datetimes per group (from each group's own first date to its own last date), use asfreq within each group:
df2 = (df.set_index('Date')
.groupby('Product_ID')['restocking_events']
.apply(lambda x: x.asfreq('D', fill_value=0))
.reset_index())
print (df2)
Product_ID Date restocking_events
0 112714 2021-11-15 1
1 112714 2021-11-16 1
2 112714 2021-11-17 0
3 112714 2021-11-18 1
4 112714 2021-11-19 0
... ... ...
2209 3630918 2021-11-25 0
2210 3630918 2021-11-26 0
2211 3630918 2021-11-27 0
2212 3630918 2021-11-28 0
2213 3630918 2021-11-29 1
[2214 rows x 3 columns]
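The difference from the first two solutions is that each group is only filled between its own first and last date, which is why the result above has fewer rows. A minimal sketch on made-up data, with `toy` and `per_group` as illustrative names:

```python
import pandas as pd

# Hypothetical toy data: product 1 spans 4 days, product 2 spans 2 days.
toy = pd.DataFrame({
    "Product_ID": [1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2021-11-13", "2021-11-16", "2021-11-20", "2021-11-21"]
    ),
    "restocking_events": [1, 2, 1, 1],
})

# asfreq('D') inserts the missing days inside each group's own date span.
per_group = (
    toy.set_index("Date")
       .groupby("Product_ID")["restocking_events"]
       .apply(lambda s: s.asfreq("D", fill_value=0))
       .reset_index()
)
print(per_group)
```

Product 1 gains the two days between its observations, while product 2 is left unchanged because its dates are already consecutive.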
Pandas filling missing dates and values within group with duplicate index values
Here is one way, reindexing each user
to have a date range from your minimum date to your maximum date:
# set up the DataFrame as in the question
import pandas as pd
x = pd.DataFrame({'user': ['a','a','b','b','a'],
                  'dt': ['2016-01-01','2016-01-02','2016-01-05','2016-01-06','2016-01-06'],
                  'val': [1,33,2,1,2]})
x['dt'] = pd.to_datetime(x['dt'])
# reindex each user's rows to the full daily range, then fill the gaps with 0
filled_df = (x.set_index('dt')
              .groupby('user')[['val']]
              .apply(lambda d: d.reindex(pd.date_range(x['dt'].min(),
                                                       x['dt'].max(),
                                                       freq='D')))
              .reset_index('user')
              .fillna(0))
>>> filled_df
user val
2016-01-01 a 1.0
2016-01-02 a 33.0
2016-01-03 a 0.0
2016-01-04 a 0.0
2016-01-05 a 0.0
2016-01-06 a 2.0
2016-01-01 b 0.0
2016-01-02 b 0.0
2016-01-03 b 0.0
2016-01-04 b 0.0
2016-01-05 b 2.0
2016-01-06 b 1.0
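The sample data above has no repeated date within a user. If the same user and date can occur more than once, reindex raises an error on the duplicate labels, so the duplicates have to be collapsed first. The following is a sketch under the assumption that summing the duplicate rows is the desired behaviour; the frame `dup` and the names `collapsed`, `all_days` and `filled` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: user 'a' has two rows for 2016-01-01.
dup = pd.DataFrame({
    "user": ["a", "a", "a", "b"],
    "dt": pd.to_datetime(
        ["2016-01-01", "2016-01-01", "2016-01-03", "2016-01-02"]
    ),
    "val": [1, 4, 2, 7],
})

# Collapse duplicate (user, dt) pairs; summing is an assumption here,
# 'first', 'last' or 'mean' may suit other data better.
collapsed = dup.groupby(["user", "dt"])["val"].sum()

# Full grid of users and days, then fill the gaps with 0.
all_days = pd.date_range(dup["dt"].min(), dup["dt"].max(), freq="D")
full_index = pd.MultiIndex.from_product(
    [collapsed.index.get_level_values("user").unique(), all_days],
    names=["user", "dt"],
)
filled = collapsed.reindex(full_index, fill_value=0).reset_index()
print(filled)
```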
Pandas filling missing date values with a constant date
Convert the values to datetimes, turning anything that cannot be parsed into NaT, so that the missing values can then be replaced with fillna:
df['termination_date'] = (pd.to_datetime(df['termination_date'], errors='coerce')
.fillna(pd.to_datetime('2020-07-31')))
# the time part is not displayed because every value is at 00:00:00
print (df)
termination_date
0 2020-06-28
1 2020-07-31
2 2020-07-13
3 2020-08-11
4 2020-07-31
5 2020-08-11
print(df['termination_date'].tolist())
[Timestamp('2020-06-28 00:00:00'), Timestamp('2020-07-31 00:00:00'),
Timestamp('2020-07-13 00:00:00'), Timestamp('2020-08-11 00:00:00'),
Timestamp('2020-07-31 00:00:00'), Timestamp('2020-08-11 00:00:00')]
print (df.termination_date.dtypes)
datetime64[ns]
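The input column is not shown in the answer, so here is a self-contained sketch with made-up values. The frame `raw` and its contents are assumptions chosen to include one unparseable string and one missing value.

```python
import pandas as pd

# Hypothetical input: one unparseable string and one missing value.
raw = pd.DataFrame(
    {"termination_date": ["2020-06-28", "not available", None, "2020-08-11"]}
)

# errors='coerce' turns anything that cannot be parsed into NaT.
parsed = pd.to_datetime(raw["termination_date"], errors="coerce")

# Replace every NaT with the constant date.
raw["termination_date"] = parsed.fillna(pd.Timestamp("2020-07-31"))
print(raw)
```

Doing the conversion and the fill in two steps makes it easy to inspect `parsed.isna()` first and see which rows are about to receive the constant.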
Pandas fill missing values in dataframe from another dataframe
If you have two DataFrames of the same shape, with matching index and column labels, then:
df[df.isnull()] = d2
will do the trick.
Only the locations where df.isnull() evaluates to True are eligible for assignment; every other value in df is left unchanged.
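A minimal sketch of that assignment on two small frames with made-up values. The call to fillna with a DataFrame argument is shown as an alternative that returns a new frame instead of modifying df in place; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical shape, index and columns.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
d2 = pd.DataFrame({"a": [10.0, 20.0, 30.0], "b": [40.0, 50.0, 60.0]})

# Alternative that leaves df untouched and returns a filled copy.
filled_copy = df.fillna(d2)

# In-place version from the text: only the NaN cells take values from d2.
df[df.isnull()] = d2
print(df)
```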
In practice, the DataFrames aren't always the same size or shape, and transforming methods (especially .shift()) are useful.
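As a rough illustration of both points, the sketch below fills from a smaller frame, relying on label alignment, and then uses .shift() to fill a remaining gap from the previous row. The frames `main` and `lookup` and their values are invented, and filling from the previous row is only one possible use of .shift().

```python
import numpy as np
import pandas as pd

# Hypothetical frames of different sizes.
main = pd.DataFrame({"a": [1.0, np.nan, np.nan]}, index=[0, 1, 2])
lookup = pd.DataFrame({"a": [20.0]}, index=[1])

# fillna aligns on index and column labels, so only row 1 can be filled.
partly_filled = main.fillna(lookup)

# Fill what is still missing from the previous row's value via shift().
fully_filled = partly_filled["a"].fillna(partly_filled["a"].shift())
print(fully_filled)
```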
Data coming in is invariably dirty, incomplete, or inconsistent. Par for the course. There's a pretty extensive pandas tutorial and associated cookbook for dealing with these situations.