Complete Dataframe With Missing Combinations of Values

Complete dataframe with missing combinations of values

You can use the tidyr::complete function:

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))

# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.

Or, slightly shorter, if the range of years is known in advance:

complete(df, distance, years = 1:7, fill = list(area = 0))
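
For comparison, a rough pandas counterpart of the complete() call above might look like the sketch below. The original df is not shown, so the input rows here are an assumption reconstructed from the non-zero rows of the printed output.

```python
import pandas as pd

# Assumed input: only the non-zero rows from the output shown above.
df = pd.DataFrame({
    "distance": ["100", "100", "100", "NPR", "NPR", "NPR"],
    "years": [1, 5, 6, 3, 4, 7],
    "area": [40, 50, 60, 10, 20, 30],
})

# Every (distance, years) pair, with years covering the full observed range.
full_index = pd.MultiIndex.from_product(
    [df["distance"].unique(), range(df["years"].min(), df["years"].max() + 1)],
    names=["distance", "years"],
)

# Reindex onto the full grid; combinations that were missing get area = 0.
completed = (
    df.set_index(["distance", "years"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(completed)
```

The range(...) call plays the role of full_seq(years, period = 1) here; passing a fixed range(1, 8) would mirror the years = 1:7 variant.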

How to complete data frame missing combinations while accounting for the missing ones

Here is a tidyverse solution. First we create a copy of num, then we use complete together with nesting:

library(dplyr)
library(tidyr)

df %>% 
  mutate(num_new = num) %>% 
  complete(lttrs, nesting(num_new)) %>% 
  data.frame()

   lttrs num_new num
1      a       1   1
2      a       2   2
3      a       3  NA
4      a       4   4
5      a       5   5
6      a       6  NA
7      a       7   7
8      a       8  NA
9      a       9  NA
10     a      10  NA
11     b       1   1
12     b       2   2
13     b       3   3
14     b       4  NA
15     b       5  NA
16     b       6  NA
17     b       7   7
18     b       8  NA
19     b       9   9
20     b      10  NA
21     c       1  NA
22     c       2  NA
23     c       3   3
24     c       4  NA
25     c       5   5
26     c       6   6
27     c       7   7
28     c       8  NA
29     c       9  NA
30     c      10  10
31     d       1  NA
32     d       2   2
33     d       3  NA
34     d       4   4
35     d       5   5
36     d       6  NA
37     d       7  NA
38     d       8   8
39     d       9   9
40     d      10  NA
41     e       1   1
42     e       2   2
43     e       3   3
44     e       4  NA
45     e       5  NA
46     e       6  NA
47     e       7  NA
48     e       8   8
49     e       9   9
50     e      10  NA

Adding values for missing data combinations in Pandas

Create a MultiIndex of all combinations with MultiIndex.from_product(), then chain set_index(), reindex() and reset_index():

import pandas as pd
import io

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
# io.StringIO (not io.BytesIO) because the literal is a str
df = pd.read_csv(io.StringIO("""person_id   status    year    count
0           pass    1980    4
0           fail    1982    1
1           pass    1981    2"""), sep=r"\s+")
names = ["person_id", "status", "year"]

mind = pd.MultiIndex.from_product(
    [all_person_ids, all_statuses, all_years], names=names)
df.set_index(names).reindex(mind, fill_value=0).reset_index()
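
One caveat: reindex() raises an error when the key columns contain duplicate combinations. A left merge against the full grid is one possible workaround. In the sketch below, complete() is a hypothetical helper (it is not a pandas function), and the data is the same three rows as above.

```python
import pandas as pd

def complete(df, levels, fill_value=0):
    """Hypothetical helper: one row per combination of the key levels.

    `levels` maps each key column name to all values it may take.
    """
    keys = list(levels)
    # Build the full grid of key combinations as a plain DataFrame.
    grid = pd.MultiIndex.from_product(
        list(levels.values()), names=keys
    ).to_frame(index=False)
    # A left merge keeps every grid row; unmatched rows get NaN values.
    out = grid.merge(df, on=keys, how="left")
    value_cols = [c for c in df.columns if c not in keys]
    out[value_cols] = out[value_cols].fillna(fill_value)
    return out

df = pd.DataFrame({
    "person_id": [0, 0, 1],
    "status": ["pass", "fail", "pass"],
    "year": [1980, 1982, 1981],
    "count": [4, 1, 2],
})

result = complete(df, {
    "person_id": [0, 1, 2],
    "status": ["pass", "fail"],
    "year": [1980, 1981, 1982],
})
print(result)
```

Note that the merge route turns count into a float column because of the intermediate NaN values; cast it back with astype(int) if that matters.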

Fill missing combinations in a dataframe

Using complete from tidyr:

library(tidyr)
as.data.frame(complete(df,REGION,CATEGORY,fill=list(VALUE1=0,VALUE2=0)))

Output:

    REGION CATEGORY VALUE1 VALUE2
1 REGION A        A      2      1
2 REGION A        B      3      2
3 REGION B        A      0      0
4 REGION B        B      4      3

If there are many variables, you could also call as.data.frame(complete(df,REGION,CATEGORY)) without the fill argument and replace the NAs afterwards.

Hope this helps!

How to fill rows with missing combinations pandas

Set the index of the dataframe to time, reindex time per id so that it runs from 1 to that id's maximum, and fill the NaN values in the val column with 'b':

(
    foo
    .set_index('time').groupby('id')
    .apply(lambda g: g.reindex(range(1, g.index.max() + 1))) 
    .drop('id', axis=1).fillna({'val': 'b'}).reset_index()
)

If you want to try something fancy, here is another solution:

(
    foo.groupby('id')['time'].max()
      .map(range).explode().add(1).reset_index(name='time')
      .merge(foo, how='left').fillna({'val': 'b'})
)

    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
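
A variant that sidesteps groupby().apply() is to build the (id, time) index explicitly and reindex once. This is only a sketch: the original foo is not shown, so the small frame below is hypothetical.

```python
import pandas as pd

# Hypothetical input: id 1 is missing time 1, id 2 is already complete.
foo = pd.DataFrame({
    "id": [1, 1, 2],
    "time": [2, 3, 1],
    "val": ["a", "a", "a"],
})

# Largest time observed for each id.
last = foo.groupby("id")["time"].max()

# All (id, time) pairs with time running from 1 to that id's maximum.
full = pd.MultiIndex.from_tuples(
    [(i, t) for i, m in last.items() for t in range(1, m + 1)],
    names=["id", "time"],
)

# Rows that did not exist before get val = 'b'.
out = foo.set_index(["id", "time"]).reindex(full, fill_value="b").reset_index()
print(out)
```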
16   3     8   a

Complete a data.frame with new values by group

You can complete the missing observations per id:

library(dplyr)

df %>% group_by(id) %>% tidyr::complete(year = min(year):max(year), semester)

#      id  year semester
#   <dbl> <dbl>    <dbl>
# 1     1  2000        1
# 2     1  2000        2
# 3     1  2001        1
# 4     1  2001        2
# 5     2  1999        1
# 6     2  1999        2
# 7     2  2000        1
# 8     2  2000        2
# 9     2  2001        1
#10     2  2001        2
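
A pandas sketch of the same per-group completion is below. The input frame is hypothetical (the original df is not shown), and unlike the grouped tidyr call it takes the semester values from the whole frame rather than from each group.

```python
import pandas as pd
from itertools import product

# Hypothetical input with gaps in both year and semester.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "year": [2000, 2001, 1999, 2001],
    "semester": [1, 2, 1, 2],
})

semesters = sorted(df["semester"].unique())

# For each id, cross its own min..max year range with every semester.
rows = [
    (id_, year, sem)
    for id_, years in df.groupby("id")["year"]
    for year, sem in product(range(years.min(), years.max() + 1), semesters)
]
grid = pd.DataFrame(rows, columns=["id", "year", "semester"])

# indicator=True adds a _merge column: 'both' for rows that already
# existed, 'left_only' for rows introduced by the completion.
completed = grid.merge(df, how="left", indicator=True)
print(completed)
```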

Fill a list/pandas.dataframe with all the missing data combinations (like complete() in R)

You could use reindex.

First you'll need a list of the valid (type, food) pairs. I'll get it from the data itself, rather than writing them out.

In [88]: kinds = list(df[['Type', 'Food']].drop_duplicates().itertuples(index=False))

In [89]: kinds
Out[89]:
[('Fruit', 'Banana'),
 ('Fruit', 'Apple'),
 ('Vegetable', 'Broccoli'),
 ('Vegetable', 'Lettuce'),
 ('Vegetable', 'Peppers'),
 ('Vegetable', 'Corn'),
 ('Seasoning', 'Olive Oil'),
 ('Seasoning', 'Vinegar')]

Now we'll generate all the pairs for those kinds with the houses using itertools.product.

In [93]: from itertools import product

In [94]: houses = ['House-%s' % x for x in range(1, 8)]

In [95]: idx = [(x.Type, x.Food, house) for x, house in product(kinds, houses)]

In [96]: idx[:2]
Out[96]: [('Fruit', 'Banana', 'House-1'), ('Fruit', 'Banana', 'House-2')]

And now you can use set_index and reindex to get the missing observations.

In [98]: df.set_index(['Type', 'Food', 'Loc']).reindex(idx, fill_value=0)
Out[98]:
                           Num
Type      Food    Loc
Fruit     Banana  House-1   15
                  House-2    4
                  House-3    0
                  House-4    0
                  House-5    0
...                        ...
Seasoning Vinegar House-3    0
                  House-4    0
                  House-5    0
                  House-6    0
                  House-7    2

[56 rows x 1 columns]

Dataframe to fill in with missing values - complete() function

Try complete as follows:

df2 <- tidyr::complete(df2, ID = unique(df$ID), fill = list(dim = 0))

Fill missing combinations with ones in a groupby object

We can use pivot_table, fill the missing cells with 1, then stack back to long format:

out = (
    df.pivot_table(index='date', columns='group', values='ret', aggfunc='mean')
      .fillna(1)
      .stack()
      .reset_index(name='value')
)

         date  group  value
0  1986-01-31      1    1.1
1  1986-01-31      2    1.5
2  1986-01-31      3    1.1
3  1986-02-28      1    1.0
4  1986-02-28      2    1.2
5  1986-02-28      3    1.0
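
To see what the pivot_table route does with duplicates and gaps, here is a small self-contained sketch. The input is hypothetical: it has two rows for the same (date, group) pair, which aggfunc='mean' averages, and one missing pair, which is filled with 1.

```python
import pandas as pd

# Hypothetical input: (1986-01-31, 1) appears twice, (1986-02-28, 1) is missing.
df = pd.DataFrame({
    "date": ["1986-01-31", "1986-01-31", "1986-01-31", "1986-02-28"],
    "group": [1, 1, 2, 2],
    "ret": [1.0, 1.2, 1.5, 1.2],
})

# Wide table: one row per date, one column per group, NaN where no data exists.
wide = df.pivot_table(index="date", columns="group", values="ret", aggfunc="mean")

# Fill the gaps, then go back to one row per (date, group).
out = wide.fillna(1).stack().reset_index(name="value")
print(out)
```

Only dates and groups that occur at least once in the data show up in the result, so a group that is missing everywhere would still be absent.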

Pandas: Create missing combination rows with zero values

Another way is to use unstack with fill_value=0, followed by stack and reset_index:

df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()

Out[311]:
   col1 col2  value
0     1    A      2
1     1    B      4
2     1    C      0
3     2    A      6
4     2    B      8
5     2    C     10
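
A self-contained version of the unstack/stack round trip is sketched below. The input frame is an assumption reconstructed from the output above, with the zero row (1, C) taken to be the missing combination.

```python
import pandas as pd

# Assumed input: every row of the output above except the filled (1, 'C') row.
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 2],
    "col2": ["A", "B", "A", "B", "C"],
    "value": [2, 4, 6, 8, 10],
})

# unstack spreads col2 into columns and fills the gaps with 0;
# stack folds those columns back into rows.
out = (
    df.set_index(["col1", "col2"])
      .unstack(fill_value=0)
      .stack()
      .reset_index()
)
print(out)
```

This relies on each (col1, col2) pair being unique; unstack raises an error on duplicate index entries, in which case aggregating first (for example with groupby or pivot_table) is one option.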
