Adding Row to a Data Frame with Missing Values

How do you add a row to a pandas DataFrame with missing values efficiently?

You could also try pd.concat and combine_first. Your second method isn't working properly (or maybe I missed something). Results:

df1 = pd.DataFrame(a, index=[0])
df2 = pd.DataFrame(b, index=[1])

d = pd.DataFrame()
d = d.append(df1)              # note: DataFrame.append was removed in pandas 2.0
d = d.append(df2).fillna(0)

In [107]: d
Out[107]:
    a     b     c    m
0  10   1.3  0.00  0.0
1   0  32.5  3.14  5.1

column_name = ['a', 'b', 'c', 'm']
d = pd.DataFrame(columns=column_name)
d.add(a)   # DataFrame.add performs element-wise arithmetic addition,
d.add(b)   # not row appending, and returns a new object, so d stays empty

In [113]: d
Out[113]:
Empty DataFrame
Columns: [a, b, c, m]
Index: []

In [115]: pd.concat([df1, df2]).fillna(0)
Out[115]:
    a     b     c    m
0  10   1.3  0.00  0.0
1   0  32.5  3.14  5.1

d = pd.DataFrame()
In [144]: d.combine_first(df1).combine_first(df2).fillna(0)
Out[144]:
    a     b     c    m
0  10   1.3  0.00  0.0
1   0  32.5  3.14  5.1

Benchmarking:

In [86]: %%timeit
d = pd.DataFrame()
d = d.append(df1)
d = d.append(df2).fillna(0)
....:
100 loops, best of 3: 3.29 ms per loop

In [87]: %timeit c = pd.concat([df1, df2]).fillna(0)
100 loops, best of 3: 1.94 ms per loop

In [153]: %%timeit
.....: d = pd.DataFrame()
.....: d.combine_first(df1).combine_first(df2).fillna(0)
.....:
100 loops, best of 3: 3.17 ms per loop

Of these methods, pd.concat is the fastest.
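Since DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, the pd.concat route is the one to use on current pandas. A minimal runnable sketch; the dicts a and b were not given in the question, so the values here are inferred from the output above:

```python
import pandas as pd

# assumed sample inputs, inferred from the printed output
a = {'a': 10, 'b': 1.3}
b = {'b': 32.5, 'c': 3.14, 'm': 5.1}

df1 = pd.DataFrame(a, index=[0])
df2 = pd.DataFrame(b, index=[1])

# pd.concat is the replacement for the removed DataFrame.append;
# fillna(0) fills the cells each row is missing
d = pd.concat([df1, df2]).fillna(0)
print(d)
```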

How to sum up missing values per row in pandas dataframe

A solution for processing all columns except Country: first convert Country to the index, test for missing values with isna, aggregate the sum per group, and finally sum across the columns:

s = df.set_index('Country').isna().groupby('Country').sum().sum(axis=1)
print(s)
Country
Austria    1
Belgium    0
USA        4
dtype: int64

If you need to remove the 0 values, add boolean indexing:

s = s[s.ne(0)]
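A self-contained sketch of the steps above, with hypothetical data chosen to reproduce the counts shown (the column names besides Country are assumptions):

```python
import numpy as np
import pandas as pd

# hypothetical data matching the output above
df = pd.DataFrame({
    'Country': ['Austria', 'Belgium', 'USA', 'USA'],
    'Pop': [np.nan, 1.0, np.nan, np.nan],
    'GDP': [2.0, 3.0, np.nan, np.nan],
})

# isna() gives booleans; summing twice counts missing cells per country
s = df.set_index('Country').isna().groupby('Country').sum().sum(axis=1)
print(s)          # Austria 1, Belgium 0, USA 4
print(s[s.ne(0)]) # boolean indexing drops the zero count for Belgium
```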

Python insert rows into a data-frame when values missing in field

Try this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'seq': [0, 1, 2, 3, 4, 5],
                   'location': ['cal', 'cal', 'cal', 'il', 'il', 'il'],
                   'lat': [29, 29.1, 28.2, 15.2, 15.6, 14],
                   'lon': [-95, -98, -95.6, -88, -87.5, -88.9],
                   'name': ['mike', 'john', 'tyler', 'rob', 'ashley', 'john']})

df_new1 = pd.DataFrame({'location' : ['warehouse'], 'lat': [22], 'lon': [-50]}) # sample data row1
df = pd.concat([df_new1, df], sort=False).reset_index(drop = True)
print(df)

df_new2 = pd.DataFrame({'location' : ['abc'], 'lat': [28], 'name': ['abcd']}) # sample data row2
df = pd.concat([df_new2, df], sort=False).reset_index(drop = True)
print(df)

output:

     lat   location    lon    name  seq
0   22.0  warehouse  -50.0     NaN  NaN
1   29.0        cal  -95.0    mike  0.0
2   29.1        cal  -98.0    john  1.0
3   28.2        cal  -95.6   tyler  2.0
4   15.2         il  -88.0     rob  3.0
5   15.6         il  -87.5  ashley  4.0
6   14.0         il  -88.9    john  5.0

    lat   location    name    lon  seq
0  28.0        abc    abcd    NaN  NaN
1  22.0  warehouse     NaN  -50.0  NaN
2  29.0        cal    mike  -95.0  0.0
3  29.1        cal    john  -98.0  1.0
4  28.2        cal   tyler  -95.6  2.0
5  15.2         il     rob  -88.0  3.0
6  15.6         il  ashley  -87.5  4.0
7  14.0         il    john  -88.9  5.0

Add rows from one dataframe to another based on missing values in a given column pandas

Use concat with the backup rows whose key1 does not exist in target.key1, filtered with Series.isin and boolean indexing:

merged = pd.concat([target, backup[~backup.key1.isin(target.key1)]])
print(merged)
  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
3   K4   A4  B4
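The input frames were not shown in the original answer; a runnable sketch with hypothetical target and backup frames reconstructed from the output above:

```python
import numpy as np
import pandas as pd

# hypothetical frames reconstructed from the printed output
target = pd.DataFrame({'key1': ['K1', 'K2', 'K3', 'K5'],
                       'A': ['A1', 'A2', 'A3', np.nan],
                       'B': ['B1', 'B2', 'B3', 'B5']})
backup = pd.DataFrame({'key1': ['K1', 'K2', 'K3', 'K4'],
                       'A': ['A1', 'A2', 'A3', 'A4'],
                       'B': ['B1', 'B2', 'B3', 'B4']})

# keep target as-is; append only the backup rows whose key1 is new
merged = pd.concat([target, backup[~backup.key1.isin(target.key1)]])
print(merged)
```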

How to add an empty column to a dataframe?

If I understand correctly, assignment should fill:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
   A  B
0  1  2
1  2  3
2  3  4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
   A  B C   D
0  1  2   NaN
1  2  3   NaN
2  3  4   NaN
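If several empty columns are needed at once, reindexing over an extended column list works too (a small sketch, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# add several empty columns at once by reindexing the columns;
# the new columns are filled with NaN
df = df.reindex(columns=df.columns.tolist() + ["C", "D"])
print(df)
```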

How to add the missing rows from one dataframe to another based on condition in Pandas?

  1. You can concat df1 with the records in df2 which are not in df1: df2[~df2.isin(df1)].dropna()
  2. You then sort the values and reset the index.

Long story short, you can do it in one line:

pd.concat([df1, df2[~df2.isin(df1)].dropna()]).sort_values(['index','type','class']).reset_index(drop=True)

Will give the following output:

  index    type class
0   001     red     A
1   001     red     A
2   001     red     A
3   002  yellow     A
4   002     red     A
5   003   green     A
6   003   green     B
7   004    blue     A
8   004    blue     A

Adding rows with value 0 for missing rows in python

You can set column 'id' as the index, then use the reindex method to conform df to a new index running from 1 to 5. reindex places NaN in locations that had no value in the previous index, so use fillna to fill these with 0s, then reset the index and finally cast df to int dtype:

df = df.set_index('id').reindex(range(1,6)).fillna(0).reset_index().astype(int)

Output:

   id  value1  value2
0   1       0       0
1   2      13      33
2   3       0       0
3   4       0       0
4   5      45      24
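A runnable version of the one-liner; the input frame was not shown in the original answer, so this one is reconstructed from the output (only ids 2 and 5 present):

```python
import pandas as pd

# hypothetical input: ids 1, 3 and 4 are missing
df = pd.DataFrame({'id': [2, 5], 'value1': [13, 45], 'value2': [33, 24]})

out = (df.set_index('id')
         .reindex(range(1, 6))   # insert missing ids 1..5 as NaN rows
         .fillna(0)              # missing rows get 0
         .reset_index()
         .astype(int))
print(out)
```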

Missing data, insert rows in Pandas and fill with NAN

set_index and reset_index are your friends.

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})

First move column A to the index:

In [64]: df.set_index("A")
Out[64]:
     B  C
A
0.0  1  3
0.5  4  2
1.0  6  1
3.5  2  0
4.0  4  5
4.5  3  3

Then reindex with a new index; here the missing data is filled in with NaN. We use the Index object since we can name it; this will be used in the next step.

In [66]: new_index = pd.Index(np.arange(0, 5, 0.5), name="A")
In [67]: df.set_index("A").reindex(new_index)
Out[67]:
       B    C
A
0.0  1.0  3.0
0.5  4.0  2.0
1.0  6.0  1.0
1.5  NaN  NaN
2.0  NaN  NaN
2.5  NaN  NaN
3.0  NaN  NaN
3.5  2.0  0.0
4.0  4.0  5.0
4.5  3.0  3.0

Finally move the index back to the columns with reset_index. Since we named the index, it all works magically:

In [69]: df.set_index("A").reindex(new_index).reset_index()
Out[69]:
     A    B    C
0  0.0  1.0  3.0
1  0.5  4.0  2.0
2  1.0  6.0  1.0
3  1.5  NaN  NaN
4  2.0  NaN  NaN
5  2.5  NaN  NaN
6  3.0  NaN  NaN
7  3.5  2.0  0.0
8  4.0  4.0  5.0
9  4.5  3.0  3.0

Inserting rows into data frame when values missing in category

Option 1

Thanks to @Frank for the better solution, using tidyr:

library(tidyr)
complete(df, day, product, fill = list(sales = 0))

Using this approach, you no longer need to worry about selecting product names, etc.

Which gives you:

  day product      sales
1   a       1 0.52042809
2   b       1 0.00000000
3   c       1 0.46373882
4   a       2 0.11155348
5   b       2 0.04937618
6   c       2 0.26433153
7   a       3 0.69100939
8   b       3 0.90596172
9   c       3 0.00000000


Option 2

You can also do this using the tidyr package (together with dplyr):

df %>%
  spread(product, sales, fill = 0) %>%
  gather(`1`:`3`, key = "product", value = "sales")

Which gives the same result

This works by using spread to create a wide data frame, with each product as its own column. The argument fill = 0 will cause all empty cells to be filled with a 0 (the default is NA).

Next, gather works to convert the 'wide' data frame back into the original 'long' data frame. The first argument is the columns of the products (in this case '1':'3'). We then set the key and value to the original column names.

I would suggest option 1, but option 2 may still prove useful in certain circumstances.


Both options work for all days that have at least one recorded sale. If entire days are missing, look into the padr package, then use the tidyr approaches above to do the rest.
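For readers following the Python answers above, tidyr::complete has a close pandas analogue: build the full day/product grid with MultiIndex.from_product and reindex onto it. A sketch with hypothetical data (the day/product/sales names mirror the R example; the values are made up):

```python
import pandas as pd

# hypothetical sales data: pairs (b, 1) and (c, 2) are missing
df = pd.DataFrame({'day': ['a', 'a', 'b', 'c'],
                   'product': [1, 2, 2, 1],
                   'sales': [0.52, 0.11, 0.05, 0.46]})

# full cross of every day with every product, like tidyr::complete(day, product)
full = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]],
                                  names=['day', 'product'])

out = (df.set_index(['day', 'product'])
         .reindex(full, fill_value=0)   # missing pairs get sales = 0
         .reset_index())
print(out)
```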


