Complete Dataframe With Missing Combinations of Values

You can use the tidyr::complete function:

complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))

# A tibble: 14 x 3
distance years area
<fct> <dbl> <dbl>
1 100 1. 40.
2 100 2. 0.
3 100 3. 0.
4 100 4. 0.
5 100 5. 50.
6 100 6. 60.
7 100 7. 0.
8 NPR 1. 0.
9 NPR 2. 0.
10 NPR 3. 10.
11 NPR 4. 20.
12 NPR 5. 0.
13 NPR 6. 0.
14 NPR 7. 30.

Or, slightly shorter, with the range of years written out explicitly:

complete(df, distance, years = 1:7, fill = list(area = 0))
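
For readers working in pandas, roughly the same completion can be sketched with a MultiIndex and reindex. The input frame below is an assumption, rebuilt from the non-zero rows of the output above, and the names full_index and completed are only illustrative.

```python
import pandas as pd

# Assumed input: the six non-zero rows from the output above.
df = pd.DataFrame({
    "distance": ["100", "100", "100", "NPR", "NPR", "NPR"],
    "years": [1, 5, 6, 3, 4, 7],
    "area": [40, 50, 60, 10, 20, 30],
})

# Every observed distance crossed with every year from the smallest
# to the largest observed year (similar in spirit to full_seq).
full_index = pd.MultiIndex.from_product(
    [df["distance"].unique(),
     range(df["years"].min(), df["years"].max() + 1)],
    names=["distance", "years"],
)

# Reindex onto the full grid; combinations not in df get area = 0.
completed = (
    df.set_index(["distance", "years"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(completed)
```

This assumes each (distance, years) pair appears at most once in df, since reindex cannot work on a duplicated index.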

How to complete data frame missing combinations while accounting for the missing ones

Here is a tidyverse solution.
First we copy num into num_new, then we use complete together with nesting. The original num column stays NA in the rows that were filled in:

library(dplyr)
library(tidyr)

df %>%
mutate(num_new = num) %>%
complete(lttrs, nesting(num_new)) %>%
data.frame()
 lttrs num_new num
1 a 1 1
2 a 2 2
3 a 3 NA
4 a 4 4
5 a 5 5
6 a 6 NA
7 a 7 7
8 a 8 NA
9 a 9 NA
10 a 10 NA
11 b 1 1
12 b 2 2
13 b 3 3
14 b 4 NA
15 b 5 NA
16 b 6 NA
17 b 7 7
18 b 8 NA
19 b 9 9
20 b 10 NA
21 c 1 NA
22 c 2 NA
23 c 3 3
24 c 4 NA
25 c 5 5
26 c 6 6
27 c 7 7
28 c 8 NA
29 c 9 NA
30 c 10 10
31 d 1 NA
32 d 2 2
33 d 3 NA
34 d 4 4
35 d 5 5
36 d 6 NA
37 d 7 NA
38 d 8 8
39 d 9 9
40 d 10 NA
41 e 1 1
42 e 2 2
43 e 3 3
44 e 4 NA
45 e 5 NA
46 e 6 NA
47 e 7 NA
48 e 8 8
49 e 9 9
50 e 10 NA

Adding values for missing data combinations in Pandas

Create a MultiIndex with MultiIndex.from_product(), then chain set_index(), reindex() and reset_index():

import io

import pandas as pd

all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
# io.StringIO is needed on Python 3; io.BytesIO does not accept a str
df = pd.read_csv(io.StringIO("""person_id status year count
0 pass 1980 4
0 fail 1982 1
1 pass 1981 2"""), sep=r"\s+")
names = ["person_id", "status", "year"]

# every (person_id, status, year) combination, observed or not
mind = pd.MultiIndex.from_product(
[all_person_ids, all_statuses, all_years], names=names)
# rows missing from df get count = 0
df.set_index(names).reindex(mind, fill_value=0).reset_index()

Fill missing combinations in a dataframe

Using complete from tidyr:

library(tidyr)
as.data.frame(complete(df,REGION,CATEGORY,fill=list(VALUE1=0,VALUE2=0)))

Output:

    REGION CATEGORY VALUE1 VALUE2
1 REGION A A 2 1
2 REGION A B 3 2
3 REGION B A 0 0
4 REGION B B 4 3

If there are many variables, you could also just run as.data.frame(complete(df,REGION,CATEGORY)) and replace the NAs afterwards.
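
The same "complete first, fill afterwards" idea can be sketched in pandas with a left merge against the full grid of combinations. The input frame is an assumption reconstructed from the non-zero rows of the output above, and grid and completed are illustrative names.

```python
import pandas as pd

# Assumed input: the three rows that are non-zero in the output above.
df = pd.DataFrame({
    "REGION": ["REGION A", "REGION A", "REGION B"],
    "CATEGORY": ["A", "B", "B"],
    "VALUE1": [2, 3, 4],
    "VALUE2": [1, 2, 3],
})

# All REGION x CATEGORY combinations as a plain two-column frame.
grid = pd.MultiIndex.from_product(
    [df["REGION"].unique(), df["CATEGORY"].unique()],
    names=["REGION", "CATEGORY"],
).to_frame(index=False)

# The left merge keeps every combination; unmatched rows come back as NaN.
completed = grid.merge(df, on=["REGION", "CATEGORY"], how="left")

# Replace the NaNs afterwards, one fill value per column.
completed = completed.fillna({"VALUE1": 0, "VALUE2": 0})
print(completed)
```

Passing a dict to fillna plays the role of the fill list in complete, so each column can get its own default. The value columns become floats here because they held NaN before the fill.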

Hope this helps!

How to fill rows with missing combinations pandas

Set the index of the dataframe to time, reindex time per id so that it runs from 1 to that id's maximum, and fill the resulting NaN values in the val column with 'b':

(
foo
.set_index('time').groupby('id')
.apply(lambda g: g.reindex(range(1, g.index.max() + 1)))
.drop('id', axis=1).fillna({'val': 'b'}).reset_index()
)

If you want to try something fancy, here is another solution:

(
foo.groupby('id')['time'].max()
.map(range).explode().add(1).reset_index(name='time')
.merge(foo, how='left').fillna({'val': 'b'})
)


    id  time val
0 1 1 b
1 1 2 a
2 1 3 a
3 1 4 b
4 1 5 a
5 2 1 a
6 2 2 b
7 2 3 a
8 2 4 a
9 3 1 a
10 3 2 a
11 3 3 b
12 3 4 b
13 3 5 b
14 3 6 a
15 3 7 a
16 3 8 a
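
For a version that can be run end to end, here is a minimal sketch of the same idea using an explicit grid and a left merge. The foo frame below is hypothetical (the original input is not shown), and the sketch assumes time should run from 1 to each id's maximum.

```python
import pandas as pd

# Hypothetical input; the original foo is not shown in the answer.
foo = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "time": [2, 3, 5, 1, 3],
    "val": ["a", "a", "a", "a", "a"],
})

# Largest observed time per id.
max_time = foo.groupby("id")["time"].max()

# One row per (id, time) for time = 1 .. max time of that id.
grid = pd.DataFrame(
    [(i, t) for i, m in max_time.items() for t in range(1, m + 1)],
    columns=["id", "time"],
)

# Bring the observed values back in and fill the gaps with 'b'.
completed = grid.merge(foo, on=["id", "time"], how="left").fillna({"val": "b"})
print(completed)
```

Building the grid explicitly avoids groupby.apply, whose handling of the grouping column has changed between pandas versions.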

Complete a data.frame with new values by group

You can complete the missing observations per id:

library(dplyr)

df %>% group_by(id) %>% tidyr::complete(year = min(year):max(year), semester)

# id year semester
# <dbl> <dbl> <dbl>
# 1 1 2000 1
# 2 1 2000 2
# 3 1 2001 1
# 4 1 2001 2
# 5 2 1999 1
# 6 2 1999 2
# 7 2 2000 1
# 8 2 2000 2
# 9 2 2001 1
#10 2 2001 2
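
A pandas counterpart of the grouped complete could look like the sketch below. The input frame is hypothetical, chosen so that the result has the same ten rows as the output above, and bounds, grid and completed are illustrative names.

```python
from itertools import product

import pandas as pd

# Hypothetical input: id 1 spans 2000-2001, id 2 spans 1999-2001.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "year": [2000, 2001, 1999, 2001],
    "semester": [1, 2, 1, 2],
})

# Semesters are taken from the whole frame, as in the R call.
semesters = sorted(df["semester"].unique())

# First and last observed year for each id.
bounds = df.groupby("id")["year"].agg(["min", "max"])

# For each id: every year in its own range crossed with every semester.
rows = [
    (i, year, sem)
    for i, first, last in bounds.itertuples(name=None)
    for year, sem in product(range(first, last + 1), semesters)
]
grid = pd.DataFrame(rows, columns=["id", "year", "semester"])

# A left merge keeps all grid rows; any other columns in df would be NaN
# for the combinations that were missing.
completed = grid.merge(df, on=["id", "year", "semester"], how="left")
print(completed)
```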

Fill a list/pandas.dataframe with all the missing data combinations (like complete() in R)

You could use a reindex.

First you'll need a list of the valid (type, food) pairs. I'll get it from the data itself, rather than writing them out.

In [88]: kinds = list(df[['Type', 'Food']].drop_duplicates().itertuples(index=False))

In [89]: kinds
Out[89]:
[('Fruit', 'Banana'),
('Fruit', 'Apple'),
('Vegetable', 'Broccoli'),
('Vegetable', 'Lettuce'),
('Vegetable', 'Peppers'),
('Vegetable', 'Corn'),
('Seasoning', 'Olive Oil'),
('Seasoning', 'Vinegar')]

Now we'll generate all the pairs for those kinds with the houses using itertools.product.

In [93]: from itertools import product

In [94]: houses = ['House-%s' % x for x in range(1, 8)]

In [95]: idx = [(x.Type, x.Food, house) for x, house in product(kinds, houses)]

In [96]: idx[:2]
Out[96]: [('Fruit', 'Banana', 'House-1'), ('Fruit', 'Banana', 'House-2')]

And now you can use set_index and reindex to get the missing observations.

In [98]: df.set_index(['Type', 'Food', 'Loc']).reindex(idx, fill_value=0)
Out[98]:
Num
Type Food Loc
Fruit Banana House-1 15
House-2 4
House-3 0
House-4 0
House-5 0
... ...
Seasoning Vinegar House-3 0
House-4 0
House-5 0
House-6 0
House-7 2

[56 rows x 1 columns]

Dataframe to fill in with missing values - complete() function

Try complete as follows:

df2 <- tidyr::complete(df2, ID = unique(df$ID), fill = list(dim = 0))

Fill missing combinations with ones in a groupby object

We can use pivot_table, fill the missing cells with 1, then stack:

out = df.pivot_table(index='date',columns='group',values='ret',aggfunc = 'mean').fillna(1).stack().reset_index(name='value')
date group value
0 1986-01-31 1 1.1
1 1986-01-31 2 1.5
2 1986-01-31 3 1.1
3 1986-02-28 1 1.0
4 1986-02-28 2 1.2
5 1986-02-28 3 1.0
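
To try this end to end, here is a small runnable sketch. The input frame is an assumption: it keeps only the rows whose value is not 1.0 in the output above, on the guess that the 1.0 rows were the filled ones.

```python
import pandas as pd

# Assumed input: groups 1 and 3 have no row for 1986-02-28.
df = pd.DataFrame({
    "date": ["1986-01-31", "1986-01-31", "1986-01-31", "1986-02-28"],
    "group": [1, 2, 3, 2],
    "ret": [1.1, 1.5, 1.1, 1.2],
})

out = (
    df.pivot_table(index="date", columns="group", values="ret", aggfunc="mean")
      .fillna(1)                    # missing date/group cells become 1
      .stack()                      # back to long form, one row per cell
      .reset_index(name="value")
)
print(out)
```

Because pivot_table aggregates, repeated (date, group) pairs are averaged here rather than raising an error.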

Pandas: Create missing combination rows with zero values

Another way is to use unstack with fill_value=0, followed by stack and reset_index:

df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()

Out[311]:
col1 col2 value
0 1 A 2
1 1 B 4
2 1 C 0
3 2 A 6
4 2 B 8
5 2 C 10
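
A self-contained sketch of the same round trip follows. The input frame is an assumption, rebuilt from the non-zero rows of the output above.

```python
import pandas as pd

# Assumed input: the combination (1, 'C') is missing.
df = pd.DataFrame({
    "col1": [1, 1, 2, 2, 2],
    "col2": ["A", "B", "A", "B", "C"],
    "value": [2, 4, 6, 8, 10],
})

out = (
    df.set_index(["col1", "col2"])
      .unstack(fill_value=0)   # wide: one column per col2, gaps become 0
      .stack()                 # long again, now with every combination
      .reset_index()
)
print(out)
```

Note that unstack needs each (col1, col2) pair to be unique. For data with repeated pairs, the pivot_table approach, which aggregates, is the safer choice.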

