How to Remove Blanks/NA's from a DataFrame and Shift the Values Up

You can use apply with dropna:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randn(5, 4))
# Blank out a few cells to simulate missing values
df.iloc[1, 2] = np.nan
df.iloc[0, 1] = np.nan
df.iloc[2, 1] = np.nan
df.iloc[2, 0] = np.nan
print(df)
          0         1         2         3
0 -1.749765       NaN  1.153036 -0.252436
1  0.981321  0.514219       NaN -1.070043
2       NaN       NaN -0.458027  0.435163
3 -0.583595  0.816847  0.672721 -0.104411
4 -0.531280  1.029733 -0.438136 -1.118318

df1 = df.apply(lambda x: pd.Series(x.dropna().values))
print (df1)
          0         1         2         3
0 -1.749765  0.514219  1.153036 -0.252436
1  0.981321  0.816847 -0.458027 -1.070043
2 -0.583595  1.029733  0.672721  0.435163
3 -0.531280       NaN -0.438136 -0.104411
4       NaN       NaN       NaN -1.118318

If you then need to replace the remaining NaN values with empty strings, be aware that this creates columns of mixed types (strings alongside numbers), which can break some functions:

df1 = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
print (df1)
          0         1         2         3
0  -1.74977  0.514219   1.15304 -0.252436
1  0.981321  0.816847 -0.458027 -1.070043
2 -0.583595   1.02973  0.672721  0.435163
3  -0.53128           -0.438136 -0.104411
4                               -1.118318
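The mixed-type caveat above can be seen directly in the column dtypes. The following is a minimal sketch on a small made-up frame (the column names "a" and "b" are illustrative, not from the original question): a column that received an empty string becomes object dtype, and pd.to_numeric with errors="coerce" is one way to get numbers back.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; "a" has one missing value
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Drop NaNs per column and shift the remaining values up
shifted = df.apply(lambda x: pd.Series(x.dropna().values))

# Filling with '' mixes strings and floats in column "a"
filled = shifted.fillna("")
print(filled.dtypes)

# One way to recover a numeric column: '' is coerced back to NaN
restored = pd.to_numeric(filled["a"], errors="coerce")
print(restored)
```

Column "b" has no gaps, so it stays numeric; only columns that actually received an empty string change type.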

Remove NaN values and shift values from the next column

Here is one way:

df_out = df.apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)
df_out = df_out.set_axis(df.columns[:df_out.shape[1]], axis=1).reindex(df.columns, axis=1)
df_out

Output:

       CLIENT ANIMAL_1 ANIMAL_2 ANIMAL_3  ANIMAL_4
ROW_1       1      cow     frog      dog       NaN
ROW_2       2      pig      cat      NaN       NaN

Details: call dropna on each row, then convert the result to a NumPy array to discard the original column labels. Next, assign the original column headers to the result and reindex along the columns so that any all-null columns are restored at the end of the DataFrame.
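The input frame for this answer is not shown, so here is a self-contained sketch with an assumed input that is merely consistent with the output above (the positions of the NaN cells are a guess):

```python
import numpy as np
import pandas as pd

# Assumed input: the NaN positions are hypothetical
df = pd.DataFrame(
    {
        "CLIENT": [1, 2],
        "ANIMAL_1": ["cow", "pig"],
        "ANIMAL_2": ["frog", np.nan],
        "ANIMAL_3": [np.nan, "cat"],
        "ANIMAL_4": ["dog", np.nan],
    },
    index=["ROW_1", "ROW_2"],
)

# Drop NaNs in each row; to_numpy() discards the old column labels
df_out = df.apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)

# Reuse the leading column names, then add back the trailing all-NaN columns
df_out = (df_out.set_axis(df.columns[:df_out.shape[1]], axis=1)
                .reindex(df.columns, axis=1))
print(df_out)
```

The reindex step matters because the intermediate result only has as many columns as the longest row after dropping NaNs.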

Remove NaN values from pandas dataframe and reshape table

Use apply with dropna. The only extra step is to convert each column to a NumPy array and wrap it in a new Series so that the index is reset:

df.apply(lambda x: pd.Series(x.dropna().values))

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [4, np.nan, 4, np.nan, np.nan, 4],
                   'C': [7, np.nan, 9, np.nan, 2, np.nan],
                   'D': [1, 3, np.nan, 7, np.nan, np.nan],
                   'E': [np.nan, 3, np.nan, 9, 2, np.nan]})

print (df)
     B    C    D    E
0  4.0  7.0  1.0  NaN
1  NaN  NaN  3.0  3.0
2  4.0  9.0  NaN  NaN
3  NaN  NaN  7.0  9.0
4  NaN  2.0  NaN  2.0
5  4.0  NaN  NaN  NaN

df1 = df.apply(lambda x: pd.Series(x.dropna().values))
print (df1)
     B    C    D    E
0  4.0  7.0  1.0  3.0
1  4.0  9.0  3.0  9.0
2  4.0  2.0  7.0  2.0

Remove NaNs from Dataframe?

Another possible solution is to use dropna(), reset_index() and concat().

pd.concat([df[x].dropna().reset_index(drop=True) for x in df.columns], axis=1)

Code

import pandas as pd
import numpy as np
li=[['A',np.nan],['P',np.nan],[np.nan,'E'],[np.nan,'R'],['U',np.nan],[np.nan,'Y']]
df=pd.DataFrame(li,columns=['col1','col2'])
df2=pd.concat([df[x].dropna().reset_index(drop=True) for x in df.columns], axis=1)
print(df2)

Output

  col1 col2
0    A    E
1    P    R
2    U    Y

How to resize dataframe by dropping NaN in individual cells?

You can forward-fill missing values with ffill, drop the rows that still contain NaN, and then remove duplicates:

df = df.ffill().dropna().drop_duplicates()
print (df)
   column1  column2  column3  column4
1        1        2      3.0      4.0

Or, if you need the first non-missing value per group, with the groups defined by one or more columns:

df = df.groupby(['column1','column2'], as_index=False).first()
print (df)

   column1  column2  column3  column4
0        1        2      3.0      4.0
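The question's input frame is not reproduced here, so the following sketch uses an assumed two-row input that is consistent with both outputs above. It runs the two approaches side by side:

```python
import numpy as np
import pandas as pd

# Assumed input: one logical record spread over two rows
df = pd.DataFrame({
    "column1": [1, 1],
    "column2": [2, 2],
    "column3": [3.0, np.nan],
    "column4": [np.nan, 4.0],
})

# Approach 1: forward-fill, drop incomplete rows, drop duplicates
collapsed = df.ffill().dropna().drop_duplicates()
print(collapsed)

# Approach 2: first non-missing value per (column1, column2) group
grouped = df.groupby(["column1", "column2"], as_index=False).first()
print(grouped)
```

Note the difference in the resulting index: the ffill version keeps the label of the surviving row (1 here), while the groupby version produces a fresh 0-based index. Forward-filling also carries values across unrelated records, so the groupby variant is likely the safer choice when the frame holds more than one record.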

Pandas: Remove all NaN values in all columns

As you can see, each column would end up with a different number of rows.

A DataFrame is a tabular data structure: you look up an index label and a column, and find the value. If the number of rows differs from column to column, the index becomes meaningless and misleading. A dict might be a better alternative:

{c: df[c].dropna().values for c in df.columns}

or

{c: df[c].dropna().tolist() for c in df.columns}
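As a small illustration of the dict approach, here is a sketch on a made-up frame (the column names and values are illustrative only) in which the columns end up with different lengths:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "B" keeps two values, "C" keeps one
df = pd.DataFrame({"B": [4, np.nan, 4], "C": [7, np.nan, np.nan]})

# One plain list per column, with NaNs removed
ragged = {c: df[c].dropna().tolist() for c in df.columns}
print(ragged)
```

Because each value is an independent list, no padding with NaN is needed and no index is implied.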

Move column values up and delete rows in a pandas DataFrame

You can shift each column up by its number of leading missing values, which first_valid_index gives you:

df.apply(lambda s: s.shift(-s.first_valid_index()))

to get

     A    B    C    D    E    F    G    H
0  a.1  b.1  c.1  d.1  e.1  f.1  g.1  h.1
1  NaN  NaN  c.2  d.2  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

To drop the rows full of NaNs and fill the rest with empty string:

out = (df.apply(lambda s: s.shift(-s.first_valid_index()))
         .dropna(how="all")
         .fillna(""))

to get

>>> out
     A    B    C    D    E    F    G    H
0  a.1  b.1  c.1  d.1  e.1  f.1  g.1  h.1
1            c.2  d.2

Note: this assumes your index is 0..N-1. If it is not, you can store it beforehand and restore it afterwards:

index = df.index
df = df.reset_index(drop=True)
df = (df.apply(lambda s: s.shift(-s.first_valid_index()))
        .dropna(how="all")
        .fillna(""))
df.index = index[:len(df)]
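Since the original input frame is not shown, here is a self-contained sketch of the store-and-restore pattern on an assumed two-column frame with a non-default index (the labels 10..13 and the cell positions are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index does not start at 0
df = pd.DataFrame(
    {
        "A": [np.nan, "a.1", np.nan, np.nan],
        "C": [np.nan, np.nan, "c.1", "c.2"],
    },
    index=[10, 11, 12, 13],
)

index = df.index                # remember the original labels
tmp = df.reset_index(drop=True) # first_valid_index() now returns a position

out = (tmp.apply(lambda s: s.shift(-s.first_valid_index()))
          .dropna(how="all")
          .fillna(""))
out.index = index[:len(out)]    # reuse the leading original labels
print(out)
```

Without the reset, first_valid_index() would return labels such as 11 or 12, and shifting by that many rows would push every value out of the frame.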

To make the pulling up specific to some columns:

def pull_up(s):
    # position of this column within df; `s.name` is the column name
    col_index = df.columns.get_loc(s.name)

    # for example: if `col_index` is either 7 or 8, pull up by 4
    if col_index in (7, 8):
        return s.shift(-4)
    else:
        # otherwise, pull up by the number of leading NaNs
        return s.shift(-s.first_valid_index())

# applying
df.apply(pull_up)
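To see the column-specific behaviour in isolation, here is a minimal sketch on an assumed two-column frame. The rule used (column position 0 is pulled up by a fixed 1, everything else by its leading NaNs) is purely illustrative, and it looks up the position with get_loc, which returns a plain integer when column names are unique:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: both columns start with two NaNs
df = pd.DataFrame({
    "A": [np.nan, np.nan, "a.1"],
    "B": [np.nan, np.nan, "b.1"],
})

def pull_up(s):
    # integer position of the column being processed
    col_index = df.columns.get_loc(s.name)
    if col_index == 0:
        # illustrative rule: the first column moves up by exactly one row
        return s.shift(-1)
    # all other columns move up past their leading NaNs
    return s.shift(-s.first_valid_index())

out = df.apply(pull_up)
print(out)
```

Here "A" ends up with its value in the middle row, while "B" is pulled all the way to the top.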

