Fill Missing Combinations in a Dataframe

Adding values for missing data combinations in Pandas

Create a MultiIndex with MultiIndex.from_product(), then chain set_index(), reindex(), and reset_index():

import io

import pandas as pd

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
# io.StringIO, not io.BytesIO: the literal is a str in Python 3
df = pd.read_csv(io.StringIO("""person_id   status    year    count
0           pass    1980    4
0           fail    1982    1
1           pass    1981    2"""), sep=r"\s+")
names = ["person_id", "status", "year"]

# every (person_id, status, year) combination, present in the data or not
mind = pd.MultiIndex.from_product(
    [all_person_ids, all_statuses, all_years], names=names)
# combinations missing from df get count=0
df.set_index(names).reindex(mind, fill_value=0).reset_index()

Fill missing combinations with ones in a groupby object

Use pivot_table to spread the groups into columns, fill the gaps with 1, then stack back to long format:

out = (df.pivot_table(index='date', columns='group', values='ret', aggfunc='mean')
         .fillna(1)
         .stack()
         .reset_index(name='value'))
         date  group  value
0  1986-01-31      1    1.1
1  1986-01-31      2    1.5
2  1986-01-31      3    1.1
3  1986-02-28      1    1.0
4  1986-02-28      2    1.2
5  1986-02-28      3    1.0
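
For reference, here is a minimal self-contained sketch of the same chain. The input frame is an assumption, chosen so that groups 1 and 3 have no row for 1986-02-28; the original question's data is not shown here.

```python
import pandas as pd

# Assumed input: groups 1 and 3 are missing for 1986-02-28
df = pd.DataFrame({
    "date": ["1986-01-31", "1986-01-31", "1986-01-31", "1986-02-28"],
    "group": [1, 2, 3, 2],
    "ret": [1.1, 1.5, 1.1, 1.2],
})

# wide: one row per date, one column per group, NaN where a pair is absent
wide = df.pivot_table(index="date", columns="group", values="ret", aggfunc="mean")

# fill the gaps with 1, then go back to one row per (date, group)
out = wide.fillna(1).stack().reset_index(name="value")
print(out)
```

Filling before stacking matters here: the missing pairs must already hold a value when the frame is reshaped back to long format.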

Complete dataframe with missing combinations of values

You can use the tidyr::complete function:

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))

# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.

or slightly shorter:

complete(df, distance, years = 1:7, fill = list(area = 0))
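
For readers working in pandas rather than R, a rough analogue of the complete() call above might look like the following. This is a sketch, not a drop-in equivalent: the input frame is reconstructed from the non-zero rows of the output shown, and the full year sequence is built with range() in place of full_seq().

```python
import pandas as pd

# Input reconstructed from the non-zero rows of the tibble above (an assumption)
df = pd.DataFrame({
    "distance": ["100", "100", "100", "NPR", "NPR", "NPR"],
    "years": [1, 5, 6, 3, 4, 7],
    "area": [40, 50, 60, 10, 20, 30],
})

# every year between the observed minimum and maximum, like full_seq(years, 1)
full_years = range(int(df["years"].min()), int(df["years"].max()) + 1)

# all (distance, years) pairs
idx = pd.MultiIndex.from_product(
    [df["distance"].unique(), full_years], names=["distance", "years"])

# pairs absent from df get area=0
out = df.set_index(["distance", "years"]).reindex(idx, fill_value=0).reset_index()
print(out)
```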

Pandas: Create missing combination rows with zero values

Another way is to unstack with fill_value=0, then stack and reset_index:

df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()

Out[311]:
   col1 col2  value
0     1    A      2
1     1    B      4
2     1    C      0
3     2    A      6
4     2    B      8
5     2    C     10
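
A self-contained version of this one-liner, with an input frame assumed from the output above (the pair (1, C) is the missing one):

```python
import pandas as pd

# Assumed input: every pair except (1, 'C') is present
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 2],
    "col2": ["A", "B", "A", "B", "C"],
    "value": [2, 4, 6, 8, 10],
})

out = (
    df.set_index(["col1", "col2"])
      .unstack(fill_value=0)  # col2 moves to the columns; absent pairs become 0
      .stack()                # col2 moves back into the index
      .reset_index()
)
print(out)
```

Note that this only fills combinations of values that already occur somewhere in col1 and col2. A value that never appears in the data cannot be created this way.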

Fill missing combinations in a dataframe

Using complete from tidyr:

library(tidyr)
as.data.frame(complete(df,REGION,CATEGORY,fill=list(VALUE1=0,VALUE2=0)))

Output:

    REGION CATEGORY VALUE1 VALUE2
1 REGION A        A      2      1
2 REGION A        B      3      2
3 REGION B        A      0      0
4 REGION B        B      4      3

If there are many variables, you could also just run as.data.frame(complete(df, REGION, CATEGORY)) and replace the NAs afterwards.

Hope this helps!

Fill a list/pandas.dataframe with all the missing data combinations (like complete() in R)

You could use a reindex.

First you'll need a list of the valid (type, food) pairs. I'll get it from the data itself, rather than writing them out.

In [88]: kinds = list(df[['Type', 'Food']].drop_duplicates().itertuples(index=False))

In [89]: kinds
Out[89]:
[('Fruit', 'Banana'),
 ('Fruit', 'Apple'),
 ('Vegetable', 'Broccoli'),
 ('Vegetable', 'Lettuce'),
 ('Vegetable', 'Peppers'),
 ('Vegetable', 'Corn'),
 ('Seasoning', 'Olive Oil'),
 ('Seasoning', 'Vinegar')]

Now we'll generate every combination of those kinds with the houses using itertools.product.

In [93]: from itertools import product

In [94]: houses = ['House-%s' % x for x in range(1, 8)]

In [95]: idx = [(x.Type, x.Food, house) for x, house in product(kinds, houses)]

In [96]: idx[:2]
Out[96]: [('Fruit', 'Banana', 'House-1'), ('Fruit', 'Banana', 'House-2')]

And now you can use set_index and reindex to get the missing observations.

In [98]: df.set_index(['Type', 'Food', 'Loc']).reindex(idx, fill_value=0)
Out[98]:
                           Num
Type      Food    Loc
Fruit     Banana  House-1   15
                  House-2    4
                  House-3    0
                  House-4    0
                  House-5    0
...                        ...
Seasoning Vinegar House-3    0
                  House-4    0
                  House-5    0
                  House-6    0
                  House-7    2

[56 rows x 1 columns]
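
The whole recipe can be condensed into one runnable sketch. The three-row frame and the two-house list below are made up for illustration; only the column names come from the answer above.

```python
from itertools import product

import pandas as pd

# Hypothetical miniature of the data: 3 observed (Type, Food) pairs, 2 houses
df = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Vegetable"],
    "Food": ["Banana", "Apple", "Corn"],
    "Loc": ["House-1", "House-2", "House-1"],
    "Num": [15, 4, 7],
})
houses = ["House-1", "House-2"]

# only the (Type, Food) pairs that actually occur, as plain tuples
kinds = list(df[["Type", "Food"]].drop_duplicates().itertuples(index=False, name=None))

# cross the observed pairs with every house
idx = pd.MultiIndex.from_tuples(
    [(t, f, h) for (t, f), h in product(kinds, houses)],
    names=["Type", "Food", "Loc"],
)

out = df.set_index(["Type", "Food", "Loc"]).reindex(idx, fill_value=0).reset_index()
print(out)
```

Because kinds is taken from the data, impossible pairs such as ('Fruit', 'Corn') are never generated, which is what a plain MultiIndex.from_product over all three columns would do.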

How to complete data frame missing combinations while accounting for the missing ones

Here is a tidyverse solution. First create a copy of num, then use complete together with nesting:

library(dplyr)
library(tidyr)

df %>% 
  mutate(num_new = num) %>% 
  complete(lttrs, nesting(num_new)) %>% 
  data.frame()

   lttrs num_new num
1      a       1   1
2      a       2   2
3      a       3  NA
4      a       4   4
5      a       5   5
6      a       6  NA
7      a       7   7
8      a       8  NA
9      a       9  NA
10     a      10  NA
11     b       1   1
12     b       2   2
13     b       3   3
14     b       4  NA
15     b       5  NA
16     b       6  NA
17     b       7   7
18     b       8  NA
19     b       9   9
20     b      10  NA
21     c       1  NA
22     c       2  NA
23     c       3   3
24     c       4  NA
25     c       5   5
26     c       6   6
27     c       7   7
28     c       8  NA
29     c       9  NA
30     c      10  10
31     d       1  NA
32     d       2   2
33     d       3  NA
34     d       4   4
35     d       5   5
36     d       6  NA
37     d       7  NA
38     d       8   8
39     d       9   9
40     d      10  NA
41     e       1   1
42     e       2   2
43     e       3   3
44     e       4  NA
45     e       5  NA
46     e       6  NA
47     e       7  NA
48     e       8   8
49     e       9   9
50     e      10  NA
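
A comparable result can be sketched in pandas, assuming a small made-up frame with the same column names (lttrs, num). The copy num_new becomes part of the completed index, while the original num is left as NaN wherever the combination was absent.

```python
import pandas as pd

# Hypothetical miniature of the data
df = pd.DataFrame({
    "lttrs": ["a", "a", "b", "b"],
    "num": [1, 2, 2, 3],
})

# every letter crossed with every num value observed anywhere in the data
full = pd.MultiIndex.from_product(
    [df["lttrs"].unique(), sorted(df["num"].unique())],
    names=["lttrs", "num_new"],
)

out = (
    df.assign(num_new=df["num"])        # keep a copy to index on
      .set_index(["lttrs", "num_new"])
      .reindex(full)                    # absent pairs get NaN in num
      .reset_index()
)
print(out)
```

The num column is converted to float because it now has to hold NaN.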

Pandas: fill missing value based on combination in dataframe

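Replace the 'missing' placeholder with NaN, sort, then forward fill within each postal code:

import numpy as np

df = df.replace('missing', np.nan).sort_values(['postal code', 'district'])
df.groupby('postal code').ffill().sort_index()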

   postal code district
0        10001    North
1        10002     West
2        10001    North

I sort because sort_values places np.nan last within each postal code, so every missing district sits below a known one and is ready to be forward filled.
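
Depending on the pandas version, groupby(...).ffill() may return only the non-grouping columns, so the 'postal code' column might not appear in the result as shown. A variant that writes the filled values back into the original frame avoids relying on that behaviour. This is a sketch with an input frame assumed from the output above:

```python
import numpy as np
import pandas as pd

# Assumed input: row 2 has a placeholder instead of a district
df = pd.DataFrame({
    "postal code": [10001, 10002, 10001],
    "district": ["North", "West", "missing"],
})

df = df.replace("missing", np.nan)

# sort so NaN comes last within each postal code, forward fill per group,
# and let index alignment put the values back in the original row order
df["district"] = (
    df.sort_values(["postal code", "district"])
      .groupby("postal code")["district"]
      .ffill()
)
print(df)
```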

Filling Missing Dates for a combination of columns

You can use:

from itertools import product

import pandas as pd

# get all unique combinations of the two columns
COLS_COMBO = df_1[['COL1', 'COL2']].drop_duplicates().values.tolist()
# drop the time component, then build a month-start (MS) date range
dates = df_1['Date'].dt.floor('d')
months_range = pd.date_range(dates.min(), dates.max(), freq='MS')
print(COLS_COMBO)
print(months_range)

# create every (Date, COL1, COL2) combination
df = pd.DataFrame([(c, a, b) for (a, b), c in product(COLS_COMBO, months_range)],
                  columns=['Date', 'COL1', 'COL2'])
print(df)
         Date COL1 COL2
0  2018-01-01    A    1
1  2018-02-01    A    1
2  2018-03-01    A    1
3  2018-04-01    A    1
4  2018-05-01    A    1
5  2018-01-01    A    2
6  2018-02-01    A    2
7  2018-03-01    A    2
8  2018-04-01    A    2
9  2018-05-01    A    2
10 2018-01-01    B    1
11 2018-02-01    B    1
12 2018-03-01    B    1
13 2018-04-01    B    1
14 2018-05-01    B    1
15 2018-01-01    B    2
16 2018-02-01    B    2
17 2018-03-01    B    2
18 2018-04-01    B    2
19 2018-05-01    B    2

# append to the original df_1 and remove duplicates
df_1 = pd.concat([df_1, df], ignore_index=True).drop_duplicates()
print(df_1)
         Date COL1 COL2
0  2018-01-01    A    1
1  2018-02-01    A    2
2  2018-03-01    B    1
3  2018-05-01    B    2
4  2018-05-01    A    1
6  2018-02-01    A    1
7  2018-03-01    A    1
8  2018-04-01    A    1
10 2018-01-01    A    2
12 2018-03-01    A    2
13 2018-04-01    A    2
14 2018-05-01    A    2
15 2018-01-01    B    1
16 2018-02-01    B    1
18 2018-04-01    B    1
19 2018-05-01    B    1
20 2018-01-01    B    2
21 2018-02-01    B    2
22 2018-03-01    B    2
23 2018-04-01    B    2
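
If df_1 also carries value columns, concat followed by drop_duplicates no longer works, because the generated rows differ from the originals in those columns. One alternative, which is a different technique from the answer above, is a cross join followed by a left merge. The VALUE column below is hypothetical and is there only to show how existing values are kept.

```python
import pandas as pd

# First three columns taken from the output above; VALUE is a made-up extra column
df_1 = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-03-01", "2018-05-01", "2018-05-01"]),
    "COL1": ["A", "A", "B", "B", "A"],
    "COL2": [1, 2, 1, 2, 1],
    "VALUE": [10, 20, 30, 40, 50],
})

# observed (COL1, COL2) pairs and the full month-start range
combos = df_1[["COL1", "COL2"]].drop_duplicates()
months = pd.DataFrame(
    {"Date": pd.date_range(df_1["Date"].min(), df_1["Date"].max(), freq="MS")})

# every month paired with every combination (how="cross" needs pandas 1.2+)
full = months.merge(combos, how="cross")

# bring the existing values back in; missing combinations get 0
out = full.merge(df_1, how="left", on=["Date", "COL1", "COL2"])
out["VALUE"] = out["VALUE"].fillna(0)
print(out.sort_values(["COL1", "COL2", "Date"]))
```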