Complete dataframe with missing combinations of values
You can use the tidyr::complete function:
complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))
# A tibble: 14 x 3
distance years area
<fct> <dbl> <dbl>
1 100 1. 40.
2 100 2. 0.
3 100 3. 0.
4 100 4. 0.
5 100 5. 50.
6 100 6. 60.
7 100 7. 0.
8 NPR 1. 0.
9 NPR 2. 0.
10 NPR 3. 10.
11 NPR 4. 20.
12 NPR 5. 0.
13 NPR 6. 0.
14 NPR 7. 30.
Or, slightly shorter, if the range of years is known in advance:
complete(df, distance, years = 1:7, fill = list(area = 0))
How to complete data frame missing combinations while accounting for the missing ones
Here is a tidyverse solution:
First we create a copy of num, then we use complete together with nesting:
library(dplyr)
library(tidyr)
df %>%
mutate(num_new = num) %>%
complete(lttrs, nesting(num_new)) %>%
data.frame()
lttrs num_new num
1 a 1 1
2 a 2 2
3 a 3 NA
4 a 4 4
5 a 5 5
6 a 6 NA
7 a 7 7
8 a 8 NA
9 a 9 NA
10 a 10 NA
11 b 1 1
12 b 2 2
13 b 3 3
14 b 4 NA
15 b 5 NA
16 b 6 NA
17 b 7 7
18 b 8 NA
19 b 9 9
20 b 10 NA
21 c 1 NA
22 c 2 NA
23 c 3 3
24 c 4 NA
25 c 5 5
26 c 6 6
27 c 7 7
28 c 8 NA
29 c 9 NA
30 c 10 10
31 d 1 NA
32 d 2 2
33 d 3 NA
34 d 4 4
35 d 5 5
36 d 6 NA
37 d 7 NA
38 d 8 8
39 d 9 9
40 d 10 NA
41 e 1 1
42 e 2 2
43 e 3 3
44 e 4 NA
45 e 5 NA
46 e 6 NA
47 e 7 NA
48 e 8 8
49 e 9 9
50 e 10 NA
Adding values for missing data combinations in Pandas
Create a MultiIndex with MultiIndex.from_product(), then chain set_index(), reindex(), and reset_index().
import pandas as pd
import io
all_person_ids = [0, 1, 2]
all_statuses = ['pass', 'fail']
all_years = [1980, 1981, 1982]
df = pd.read_csv(io.StringIO("""person_id status year count
0 pass 1980 4
0 fail 1982 1
1 pass 1981 2"""), sep=r"\s+")
names = ["person_id", "status", "year"]
mind = pd.MultiIndex.from_product(
[all_person_ids, all_statuses, all_years], names=names)
df.set_index(names).reindex(mind, fill_value=0).reset_index()
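When the full lists of values are not known in advance, the levels can be taken from the data itself, which is closer to what tidyr's complete does. The following is a minimal sketch under that assumption; complete_combinations is a hypothetical helper name, not a pandas function, and the sample frame simply repeats the three rows used above.

```python
import pandas as pd


def complete_combinations(df, keys, fill_value=0):
    """Add a row for every missing combination of the values observed in `keys`."""
    # Hypothetical helper: each level is the sorted set of values seen in that column.
    levels = [sorted(df[k].unique()) for k in keys]
    full_index = pd.MultiIndex.from_product(levels, names=keys)
    # reindex needs the existing key combinations to be unique.
    return df.set_index(keys).reindex(full_index, fill_value=fill_value).reset_index()


df = pd.DataFrame({
    "person_id": [0, 0, 1],
    "status": ["pass", "fail", "pass"],
    "year": [1980, 1982, 1981],
    "count": [4, 1, 2],
})
result = complete_combinations(df, ["person_id", "status", "year"])
```

Note that a value which never occurs in the data (such as person_id 2 above) is not added by this variant; for that you still need to pass the explicit lists to MultiIndex.from_product().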
Fill missing combinations in a dataframe
Using complete from tidyr:
library(tidyr)
as.data.frame(complete(df,REGION,CATEGORY,fill=list(VALUE1=0,VALUE2=0)))
Output:
REGION CATEGORY VALUE1 VALUE2
1 REGION A A 2 1
2 REGION A B 3 2
3 REGION B A 0 0
4 REGION B B 4 3
If there are many variables, you could also just run as.data.frame(complete(df,REGION,CATEGORY)) and replace the NAs afterwards.
Hope this helps!
How to fill rows with missing combinations pandas
Set the index of the dataframe to time, then reindex the time column per id and fill the NaN values in the val column with b:
(
foo
.set_index('time').groupby('id')
.apply(lambda g: g.reindex(range(1, g.index.max() + 1)))
.drop('id', axis=1).fillna({'val': 'b'}).reset_index()
)
If you want to try something fancy, here is another solution:
(
foo.groupby('id')['time'].max()
.map(range).explode().add(1).reset_index(name='time')
.merge(foo, how='left').fillna({'val': 'b'})
)
id time val
0 1 1 b
1 1 2 a
2 1 3 a
3 1 4 b
4 1 5 a
5 2 1 a
6 2 2 b
7 2 3 a
8 2 4 a
9 3 1 a
10 3 2 a
11 3 3 b
12 3 4 b
13 3 5 b
14 3 6 a
15 3 7 a
16 3 8 a
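The same idea can be written without groupby.apply by building the full (id, time) grid explicitly and left-merging the data onto it. This is only a sketch: the input frame below is hypothetical, chosen so that the result agrees with the rows for ids 1 and 2 in the output above.

```python
import pandas as pd

# Hypothetical input: some (id, time) rows are absent.
foo = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [2, 3, 5, 1, 3, 4],
    "val":  ["a", "a", "a", "a", "a", "a"],
})

# One row per (id, time) for time = 1..max(time) within each id.
max_time = foo.groupby("id")["time"].max()
grid = pd.DataFrame(
    [(i, t) for i, m in max_time.items() for t in range(1, m + 1)],
    columns=["id", "time"],
)

# The left merge keeps every grid row; rows missing from foo get val = "b".
out = grid.merge(foo, on=["id", "time"], how="left").fillna({"val": "b"})
```

Building the grid as a plain list of tuples keeps the time column as integers, so the merge keys have matching dtypes on both sides.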
Complete a data.frame with new values by group
You can complete the missing observations per id:
library(dplyr)
df %>% group_by(id) %>% tidyr::complete(year = min(year):max(year), semester)
# id year semester
# <dbl> <dbl> <dbl>
# 1 1 2000 1
# 2 1 2000 2
# 3 1 2001 1
# 4 1 2001 2
# 5 2 1999 1
# 6 2 1999 2
# 7 2 2000 1
# 8 2 2000 2
# 9 2 2001 1
#10 2 2001 2
Fill a list/pandas.dataframe with all the missing data combinations (like complete() in R)
You could use reindex.
First you'll need a list of the valid (type, food) pairs. I'll get it from the data itself, rather than writing them out.
In [88]: kinds = list(df[['Type', 'Food']].drop_duplicates().itertuples(index=False))
In [89]: kinds
Out[89]:
[('Fruit', 'Banana'),
('Fruit', 'Apple'),
('Vegetable', 'Broccoli'),
('Vegetable', 'Lettuce'),
('Vegetable', 'Peppers'),
('Vegetable', 'Corn'),
('Seasoning', 'Olive Oil'),
('Seasoning', 'Vinegar')]
Now we'll generate all the pairs for those kinds with the houses using itertools.product.
In [93]: from itertools import product
In [94]: houses = ['House-%s' % x for x in range(1, 8)]
In [95]: idx = [(x.Type, x.Food, house) for x, house in product(kinds, houses)]
In [96]: idx[:2]
Out[96]: [('Fruit', 'Banana', 'House-1'), ('Fruit', 'Banana', 'House-2')]
And now you can use set_index and reindex to get the missing observations.
In [98]: df.set_index(['Type', 'Food', 'Loc']).reindex(idx, fill_value=0)
Out[98]:
Num
Type Food Loc
Fruit Banana House-1 15
House-2 4
House-3 0
House-4 0
House-5 0
... ...
Seasoning Vinegar House-3 0
House-4 0
House-5 0
House-6 0
House-7 2
[56 rows x 1 columns]
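For reference, the three steps above can be put together in one script. This is a sketch on a hypothetical miniature of the data (two foods, three houses), and it builds the target index with pd.MultiIndex.from_tuples so that the level names are set explicitly.

```python
from itertools import product

import pandas as pd

# Hypothetical miniature of the data: two foods, three houses.
df = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Vegetable"],
    "Food": ["Banana", "Banana", "Corn"],
    "Loc":  ["House-1", "House-2", "House-3"],
    "Num":  [15, 4, 7],
})
houses = ["House-%s" % x for x in range(1, 4)]

# Only (Type, Food) pairs present in the data are crossed with the houses,
# so invalid pairs such as ("Fruit", "Corn") are never generated.
kinds = list(df[["Type", "Food"]].drop_duplicates().itertuples(index=False, name=None))
idx = pd.MultiIndex.from_tuples(
    [(t, f, house) for (t, f), house in product(kinds, houses)],
    names=["Type", "Food", "Loc"],
)
full = df.set_index(["Type", "Food", "Loc"]).reindex(idx, fill_value=0).reset_index()
```

This is the pandas counterpart of nesting() in tidyr: Type and Food are treated as one unit, and only that unit is crossed with Loc.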
Dataframe to fill in with missing values - complete() function
Try complete as follows:
df2 <- tidyr::complete(df2, ID = unique(df$ID), fill = list(dim = 0))
Fill missing combinations with ones in a groupby object
We can use pivot_table, then stack:
out = df.pivot_table(index='date',columns='group',values='ret',aggfunc = 'mean').fillna(1).stack().reset_index(name='value')
date group value
0 1986-01-31 1 1.1
1 1986-01-31 2 1.5
2 1986-01-31 3 1.1
3 1986-02-28 1 1.0
4 1986-02-28 2 1.2
5 1986-02-28 3 1.0
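To see what each step does, here is a self-contained sketch. The input frame is hypothetical, picked so that the result agrees with the output above: groups 1 and 3 have no row for the second date.

```python
import pandas as pd

# Hypothetical input: groups 1 and 3 have no row for 1986-02-28.
df = pd.DataFrame({
    "date":  ["1986-01-31", "1986-01-31", "1986-01-31", "1986-02-28"],
    "group": [1, 2, 3, 2],
    "ret":   [1.1, 1.5, 1.1, 1.2],
})

# Wide layout: one row per date, one column per group, NaN where a pair is absent.
wide = df.pivot_table(index="date", columns="group", values="ret", aggfunc="mean")

# Fill the gaps with 1, then return to long format.
out = wide.fillna(1).stack().reset_index(name="value")
```

Two things to keep in mind: pivot_table averages duplicate (date, group) rows, and fillna(1) cannot tell a missing combination apart from a ret value that was already NaN in the input.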
Pandas: Create missing combination rows with zero values
Another way is to use unstack with fill_value=0, followed by stack and reset_index:
df.set_index(['col1','col2']).unstack(fill_value=0).stack().reset_index()
Out[311]:
col1 col2 value
0 1 A 2
1 1 B 4
2 1 C 0
3 2 A 6
4 2 B 8
5 2 C 10
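A runnable version of the one-liner, split into two steps. The input frame is an assumption reconstructed from the output above (the row for col1 = 1, col2 = C is the missing one).

```python
import pandas as pd

# Assumed input: every pair except (1, "C") is present.
df = pd.DataFrame({
    "col1":  [1, 1, 2, 2, 2],
    "col2":  ["A", "B", "A", "B", "C"],
    "value": [2, 4, 6, 8, 10],
})

# unstack moves col2 into the columns; absent pairs are filled with 0 instead of NaN.
wide = df.set_index(["col1", "col2"]).unstack(fill_value=0)

# stack moves col2 back into the index, now with a row for every pair.
out = wide.stack().reset_index()
```

unstack raises an error if a (col1, col2) pair occurs more than once, so this approach assumes the key columns identify each row uniquely.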