
# Complete Dataframe With Missing Combinations of Values

## Complete dataframe with missing combinations of values

You can use the `tidyr::complete` function:

```r
library(tidyr)

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))
# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.
```

or slightly shorter:

```r
complete(df, distance, years = 1:7, fill = list(area = 0))
```
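For readers working in pandas, the same idea can be sketched with `MultiIndex.from_product` and `reindex`. The input frame below is an assumption: it is inferred from the non-zero rows of the output above, not taken from the original question.

```python
import pandas as pd

# Assumed input, inferred from the non-zero rows of the tibble above.
df = pd.DataFrame({
    "distance": ["100", "100", "100", "NPR", "NPR", "NPR"],
    "years": [1, 5, 6, 3, 4, 7],
    "area": [40, 50, 60, 10, 20, 30],
})

# Every year from the smallest to the largest observed one
# (roughly what full_seq(years, period = 1) produces in tidyr).
full_years = range(int(df["years"].min()), int(df["years"].max()) + 1)

# Cross every distance with every year.
full_index = pd.MultiIndex.from_product(
    [df["distance"].unique(), full_years], names=["distance", "years"]
)

# Rows that were absent from df are created with area = 0.
completed = (
    df.set_index(["distance", "years"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(completed)
```

With this input the result has 14 rows, one per distance and year, like the tibble above.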

## How to complete data frame missing combinations while accounting for the missing ones

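Here is a tidyverse solution. First we create a copy of `num` (called `num_new`), then we use `complete` together with `nesting`, so the original `num` column is left as `NA` for every combination that was missing: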

```r
library(dplyr)
library(tidyr)

df %>%
  mutate(num_new = num) %>%
  complete(lttrs, nesting(num_new)) %>%
  data.frame()
```

```
   lttrs num_new num
1      a       1   1
2      a       2   2
3      a       3  NA
4      a       4   4
5      a       5   5
6      a       6  NA
7      a       7   7
8      a       8  NA
9      a       9  NA
10     a      10  NA
11     b       1   1
12     b       2   2
13     b       3   3
14     b       4  NA
15     b       5  NA
16     b       6  NA
17     b       7   7
18     b       8  NA
19     b       9   9
20     b      10  NA
21     c       1  NA
22     c       2  NA
23     c       3   3
24     c       4  NA
25     c       5   5
26     c       6   6
27     c       7   7
28     c       8  NA
29     c       9  NA
30     c      10  10
31     d       1  NA
32     d       2   2
33     d       3  NA
34     d       4   4
35     d       5   5
36     d       6  NA
37     d       7  NA
38     d       8   8
39     d       9   9
40     d      10  NA
41     e       1   1
42     e       2   2
43     e       3   3
44     e       4  NA
45     e       5  NA
46     e       6  NA
47     e       7  NA
48     e       8   8
49     e       9   9
50     e      10  NA
```
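In pandas, one way to keep track of which combinations were missing is a cross join followed by a left merge with `indicator=True`. This is a different technique from the `nesting` approach above, shown only as a minimal sketch; the tiny input frame and the `was_missing` column name are hypothetical.

```python
import pandas as pd

# Hypothetical input: each letter has only some of the numbers 1..3.
df = pd.DataFrame({"lttrs": ["a", "a", "b"], "num": [1, 3, 2]})

# All letter/number combinations, built as a cross join
# (merge(how="cross") needs pandas 1.2 or later).
letters = df[["lttrs"]].drop_duplicates()
numbers = pd.DataFrame({"num": range(1, 4)})
grid = letters.merge(numbers, how="cross")

# indicator=True adds a `_merge` column telling where each row came from:
# "both" for rows present in df, "left_only" for rows that exist only in grid.
flagged = grid.merge(df, on=["lttrs", "num"], how="left", indicator=True)
flagged["was_missing"] = flagged["_merge"] == "left_only"
flagged = flagged.drop(columns="_merge")
print(flagged)
```

The boolean flag plays the role of the `NA` in the `num` column of the R output: it marks rows that were added by the completion step.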

## Adding values for missing data combinations in Pandas

Create a `MultiIndex` with `MultiIndex.from_product()`, then chain `set_index()`, `reindex()`, and `reset_index()`.

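```python
import io

import pandas as pd

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]

# io.StringIO because the literal is a str (io.BytesIO would require bytes);
# sep=r"\s+" splits on runs of whitespace.
df = pd.read_csv(io.StringIO("""person_id   status    year    count
0           pass    1980    4
0           fail    1982    1
1           pass    1981    2"""), sep=r"\s+")

names = ["person_id", "status", "year"]
# Every combination of person, status and year
mind = pd.MultiIndex.from_product(
    [all_person_ids, all_statuses, all_years], names=names)
# Combinations absent from df get count = 0
df.set_index(names).reindex(mind, fill_value=0).reset_index()
```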

## Fill missing combinations in a dataframe

Using `complete` from tidyr:

```r
library(tidyr)
as.data.frame(complete(df, REGION, CATEGORY, fill = list(VALUE1 = 0, VALUE2 = 0)))
```

Output:

```
    REGION CATEGORY VALUE1 VALUE2
1 REGION A        A      2      1
2 REGION A        B      3      2
3 REGION B        A      0      0
4 REGION B        B      4      3
```

If there are many variables, you could also just call `as.data.frame(complete(df, REGION, CATEGORY))` and replace the resulting `NA`s afterwards.

Hope this helps!

## How to fill rows with missing combinations pandas

Set the index of the dataframe to `time`, then `reindex` it over the full `time` range for each `id`, and fill the `NaN` values in the `val` column with `b`:

```python
(
    foo
    .set_index('time').groupby('id')
    .apply(lambda g: g.reindex(range(1, g.index.max() + 1)))
    .drop('id', axis=1).fillna({'val': 'b'}).reset_index()
)
```

If you want to try something fancier, here is another solution:

```python
(
    foo.groupby('id')['time'].max()
      .map(range).explode().add(1).reset_index(name='time')
      .merge(foo, how='left').fillna({'val': 'b'})
)
```

```
    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a
```
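The same per-`id` filling can also be written as an explicit grid plus a left merge, which may be easier to follow step by step. This is a minimal sketch on a small hypothetical `foo` (the original input is not shown above), not a replacement for either solution.

```python
import pandas as pd

# Hypothetical input: id 1 is missing time 1, id 2 is already complete.
foo = pd.DataFrame({
    "id": [1, 1, 2],
    "time": [2, 3, 1],
    "val": ["a", "a", "a"],
})

# Largest observed time per id.
max_time = foo.groupby("id")["time"].max()

# One row per (id, time) for time = 1 .. max time of that id.
grid = pd.DataFrame(
    [(i, t) for i, m in max_time.items() for t in range(1, int(m) + 1)],
    columns=["id", "time"],
)

# Left merge keeps every grid row; rows absent from foo get val = 'b'.
out = grid.merge(foo, on=["id", "time"], how="left").fillna({"val": "b"})
print(out)
```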

## Complete a data.frame with new values by group

You can `complete` the missing observations per `id`:

```r
library(dplyr)

df %>%
  group_by(id) %>%
  tidyr::complete(year = min(year):max(year), semester)

#      id  year semester
#   <dbl> <dbl>    <dbl>
# 1     1  2000        1
# 2     1  2000        2
# 3     1  2001        1
# 4     1  2001        2
# 5     2  1999        1
# 6     2  1999        2
# 7     2  2000        1
# 8     2  2000        2
# 9     2  2001        1
#10     2  2001        2
```
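A rough pandas counterpart of this grouped completion is sketched below. It assumes a hypothetical input chosen so that the result has the same ten `id`/`year`/`semester` rows as the output above; the `grade` column is invented to show what happens to value columns.

```python
import pandas as pd

# Hypothetical input; `grade` is an invented value column for illustration.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "year": [2000, 2001, 1999, 2001],
    "semester": [1, 2, 1, 2],
    "grade": [3.0, 3.5, 2.0, 4.0],
})

keys = ["id", "year", "semester"]
# Semesters are taken from the whole frame, years from each id separately.
semesters = sorted(df["semester"].unique())

pieces = []
for id_, group in df.groupby("id"):
    years = range(int(group["year"].min()), int(group["year"].max()) + 1)
    idx = pd.MultiIndex.from_product([[id_], years, semesters], names=keys)
    # Combinations that were absent for this id get NaN in `grade`.
    pieces.append(group.set_index(keys).reindex(idx).reset_index())

completed = pd.concat(pieces, ignore_index=True)
print(completed)
```

Looping over the groups keeps the year range specific to each `id`, which is what `group_by(id)` does for `min(year):max(year)` in the R version.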

## Fill a list/pandas.dataframe with all the missing data combinations (like complete() in R)

You could use a `reindex`.

First you'll need a list of the valid `(type, food)` pairs. I'll get it from the data itself, rather than writing them out.

```python
In [88]: kinds = list(df[['Type', 'Food']].drop_duplicates().itertuples(index=False))

In [89]: kinds
Out[89]:
[('Fruit', 'Banana'),
 ('Fruit', 'Apple'),
 ('Vegetable', 'Broccoli'),
 ('Vegetable', 'Lettuce'),
 ('Vegetable', 'Peppers'),
 ('Vegetable', 'Corn'),
 ('Seasoning', 'Olive Oil'),
 ('Seasoning', 'Vinegar')]
```

Now we'll generate all the pairs for those `kinds` with the houses using `itertools.product`.

```python
In [93]: from itertools import product

In [94]: houses = ['House-%s' % x for x in range(1, 8)]

In [95]: idx = [(x.Type, x.Food, house) for x, house in product(kinds, houses)]

In [96]: idx[:2]
Out[96]: [('Fruit', 'Banana', 'House-1'), ('Fruit', 'Banana', 'House-2')]
```

And now you can use `set_index` and `reindex` to get the missing observations.

```python
In [98]: df.set_index(['Type', 'Food', 'Loc']).reindex(idx, fill_value=0)
Out[98]:
                           Num
Type      Food    Loc
Fruit     Banana  House-1   15
                  House-2    4
                  House-3    0
                  House-4    0
                  House-5    0
...                        ...
Seasoning Vinegar House-3    0
                  House-4    0
                  House-5    0
                  House-6    0
                  House-7    2

[56 rows x 1 columns]
```
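The session above can be condensed into a self-contained script. The sketch below uses a much smaller hypothetical data set (three foods, two houses) and builds the target index with `pd.MultiIndex.from_tuples`, so that only observed `(Type, Food)` pairs are crossed with the houses.

```python
from itertools import product

import pandas as pd

# Hypothetical data, far smaller than the frame used in the session above.
df = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Vegetable"],
    "Food": ["Banana", "Apple", "Corn"],
    "Loc": ["House-1", "House-2", "House-1"],
    "Num": [15, 4, 7],
})

# Valid (Type, Food) pairs, taken from the data as plain tuples.
kinds = list(
    df[["Type", "Food"]].drop_duplicates().itertuples(index=False, name=None)
)
houses = ["House-1", "House-2"]

# Cross the observed pairs with the houses; pairs such as
# ('Fruit', 'Corn') never appear because they are not in `kinds`.
idx = pd.MultiIndex.from_tuples(
    [(t, f, h) for (t, f), h in product(kinds, houses)],
    names=["Type", "Food", "Loc"],
)

out = (
    df.set_index(["Type", "Food", "Loc"])
      .reindex(idx, fill_value=0)
      .reset_index()
)
print(out)
```

Crossing only the observed pairs is the pandas analogue of `nesting()` in tidyr: `Type` and `Food` are treated as one unit rather than being expanded independently.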

## Dataframe to fill in with missing values - complete() function

Try `complete` as follows:

```r
df2 <- tidyr::complete(df2, ID = unique(df$ID), fill = list(dim = 0))
```

## Fill missing combinations with ones in a groupby object

We can use `pivot_table`, fill the gaps with 1, and then `stack` back to long format:

```python
out = (
    df.pivot_table(index='date', columns='group', values='ret', aggfunc='mean')
      .fillna(1)
      .stack()
      .reset_index(name='value')
)
```

```
         date  group  value
0  1986-01-31      1    1.1
1  1986-01-31      2    1.5
2  1986-01-31      3    1.1
3  1986-02-28      1    1.0
4  1986-02-28      2    1.2
5  1986-02-28      3    1.0
```
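To try this end to end, here is a small runnable sketch. The input frame is an assumption: it treats the rows with value 1.0 in the output above as the filled-in ones, which the original question may or may not match.

```python
import pandas as pd

# Assumed input: groups 1 and 3 have no observation on 1986-02-28.
df = pd.DataFrame({
    "date": ["1986-01-31", "1986-01-31", "1986-01-31", "1986-02-28"],
    "group": [1, 2, 3, 2],
    "ret": [1.1, 1.5, 1.1, 1.2],
})

# Wide table: one row per date, one column per group, NaN where missing.
wide = df.pivot_table(index="date", columns="group", values="ret", aggfunc="mean")

# Fill the gaps with 1, then go back to long format.
out = wide.fillna(1).stack().reset_index(name="value")
print(out)
```

Because the gaps are filled before `stack()`, no combination is dropped when the table is reshaped back to long format.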

## Pandas: Create missing combination rows with zero values

Another way is to use `unstack` with `fill_value=0`, followed by `stack` and `reset_index`:

```python
df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()
Out[311]:
   col1 col2  value
0     1    A      2
1     1    B      4
2     1    C      0
3     2    A      6
4     2    B      8
5     2    C     10
```