How to remove blanks/NA's from dataframe and shift the values up
You can use apply with dropna:
import numpy as np
import pandas as pd
np.random.seed(100)
df = pd.DataFrame(np.random.randn(5,4))
df.iloc[1,2] = np.nan
df.iloc[0,1] = np.nan
df.iloc[2,1] = np.nan
df.iloc[2,0] = np.nan
print (df)
0 1 2 3
0 -1.749765 NaN 1.153036 -0.252436
1 0.981321 0.514219 NaN -1.070043
2 NaN NaN -0.458027 0.435163
3 -0.583595 0.816847 0.672721 -0.104411
4 -0.531280 1.029733 -0.438136 -1.118318
df1 = df.apply(lambda x: pd.Series(x.dropna().values))
print (df1)
0 1 2 3
0 -1.749765 0.514219 1.153036 -0.252436
1 0.981321 0.816847 -0.458027 -1.070043
2 -0.583595 1.029733 0.672721 0.435163
3 -0.531280 NaN -0.438136 -0.104411
4 NaN NaN NaN -1.118318
And then, if you need to replace NaN with empty strings, be aware that this creates mixed values (strings together with numbers) in the same column, so some functions can break:
df1 = df.apply(lambda x: pd.Series(x.dropna().values)).fillna('')
print (df1)
           0         1          2          3
0   -1.74977  0.514219    1.15304  -0.252436
1   0.981321  0.816847  -0.458027  -1.070043
2  -0.583595   1.02973   0.672721   0.435163
3   -0.53128            -0.438136  -0.104411
4                                  -1.118318
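To see how the blanks can break things, here is a small sketch on a hypothetical two-column frame (the names a and b are made up). After fillna('') the column that had gaps becomes object dtype, plain arithmetic on it raises TypeError, and pd.to_numeric(..., errors='coerce') is one way to get numbers (and NaN) back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'a' has a gap, column 'b' is complete
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

shifted = df.apply(lambda x: pd.Series(x.dropna().values))
blank = shifted.fillna('')

print(shifted.dtypes)  # both columns are still float64
print(blank.dtypes)    # 'a' is now object: it mixes floats and ''

try:
    blank['a'] + 1     # '' + 1 is not defined, so this raises
except TypeError as exc:
    print('arithmetic failed:', exc)

# Convert back to numbers; the empty strings become NaN again
restored = pd.to_numeric(blank['a'], errors='coerce')
print(restored.tolist())  # [1.0, 3.0, nan]
```

If the blanks are only needed for display, it may be simpler to keep the NaNs in the data and apply fillna('') only when printing or exporting.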
Remove NaN values and shift values from the next column
Here is one way:
df_out = df.apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)
df_out = df_out.set_axis(df.columns[:df_out.shape[1]], axis=1).reindex(df.columns, axis=1)
df_out
Output:
CLIENT ANIMAL_1 ANIMAL_2 ANIMAL_3 ANIMAL_4
ROW_1 1 cow frog dog NaN
ROW_2 2 pig cat NaN NaN
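Details: use dropna on each row, then convert the result to a NumPy array to discard the original column labels (otherwise the new Series would realign to them and the gaps would come back). Then assign the leading column headers of the original dataframe with set_axis, and reindex along the columns to add back the all-null columns at the end of the dataframe.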
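The input dataframe isn't shown above, so here is a self-contained sketch with a hypothetical input, made up only so that the two lines of the answer reproduce the output shown (CLIENT and ANIMAL_1..ANIMAL_4 come from that output; the cell values are assumptions).

```python
import numpy as np
import pandas as pd

# Hypothetical input chosen to reproduce the output shown in the answer
df = pd.DataFrame(
    {'CLIENT': [1, 2],
     'ANIMAL_1': ['cow', np.nan],
     'ANIMAL_2': [np.nan, 'pig'],
     'ANIMAL_3': ['frog', np.nan],
     'ANIMAL_4': ['dog', 'cat']},
    index=['ROW_1', 'ROW_2'])

# Drop NaNs per row; to_numpy() discards the old column labels,
# so each row's surviving values are renumbered 0, 1, 2, ...
df_out = df.apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)

# Give the surviving positions the leading column names,
# then reindex to restore the trailing columns as all-NaN
df_out = (df_out.set_axis(df.columns[:df_out.shape[1]], axis=1)
                .reindex(df.columns, axis=1))
print(df_out)
```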
Remove NaN values from pandas dataframe and reshape table
You need apply with dropna; it is only necessary to create a NumPy array and build a new Series from it, so that the index is reset:
df.apply(lambda x: pd.Series(x.dropna().values))
Sample:
df = pd.DataFrame({'B':[4,np.nan,4,np.nan,np.nan,4],
'C':[7,np.nan,9,np.nan,2,np.nan],
'D':[1,3,np.nan,7,np.nan,np.nan],
'E':[np.nan,3,np.nan,9,2,np.nan]})
print (df)
B C D E
0 4.0 7.0 1.0 NaN
1 NaN NaN 3.0 3.0
2 4.0 9.0 NaN NaN
3 NaN NaN 7.0 9.0
4 NaN 2.0 NaN 2.0
5 4.0 NaN NaN NaN
df1 = df.apply(lambda x: pd.Series(x.dropna().values))
print (df1)
B C D E
0 4.0 7.0 1.0 3.0
1 4.0 9.0 3.0 9.0
2 4.0 2.0 7.0 2.0
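Why the detour through .values? A quick comparison on a small hypothetical frame: returning x.dropna() directly keeps the original row labels, so apply realigns every column on the union of those labels and the gaps reappear; building a new Series from the bare array numbers the values 0..n-1, so they land in the top rows.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with gaps in different rows
df = pd.DataFrame({'B': [4, np.nan, 4], 'C': [np.nan, 9, 2]})

# Keeps the old labels (B: 0 and 2, C: 1 and 2); apply aligns them
# on the union 0, 1, 2, so the NaN gaps come back and nothing moves
kept = df.apply(lambda x: x.dropna())

# .values drops the labels; each new Series is indexed 0..n-1,
# so the remaining values are packed into the top rows
shifted = df.apply(lambda x: pd.Series(x.dropna().values))

print(kept)
print(shifted)
```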
Remove NaNs from Dataframe?
Another possible solution is to use dropna(), reset_index() and concat().
pd.concat([df[x].dropna().reset_index(drop=True) for x in df.columns], axis=1)
Code
import pandas as pd
import numpy as np
li=[['A',np.nan],['P',np.nan],[np.nan,'E'],[np.nan,'R'],['U',np.nan],[np.nan,'Y']]
df=pd.DataFrame(li,columns=['col1','col2'])
df2=pd.concat([df[x].dropna().reset_index(drop=True) for x in df.columns], axis=1)
print(df2)
Output
col1 col2
0 A E
1 P R
2 U Y
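For larger frames, a vectorized variant may be worth trying. This is a different technique from the answers above (not from them): a stable argsort on the missing-value mask puts the present values (False) before the missing ones (True) in each column while keeping their original order. The helper name push_up is hypothetical, and because it goes through a single NumPy array, mixed-dtype frames pass through object dtype, so infer_objects() is used to recover numeric columns.

```python
import numpy as np
import pandas as pd

def push_up(df):
    """Hypothetical helper: move non-missing values to the top of each column."""
    mask = df.isna().to_numpy()
    # kind='stable' keeps the original order among the present values
    order = np.argsort(mask, axis=0, kind='stable')
    values = np.take_along_axis(df.to_numpy(), order, axis=0)
    out = pd.DataFrame(values, columns=df.columns).infer_objects()
    # After pushing up, fully missing rows can only be at the bottom
    return out.dropna(how='all').reset_index(drop=True)

li = [['A', np.nan], ['P', np.nan], [np.nan, 'E'],
      [np.nan, 'R'], ['U', np.nan], [np.nan, 'Y']]
print(push_up(pd.DataFrame(li, columns=['col1', 'col2'])))

numeric = pd.DataFrame({'B': [4, np.nan, 4, np.nan, np.nan, 4],
                        'C': [7, np.nan, 9, np.nan, 2, np.nan],
                        'D': [1, 3, np.nan, 7, np.nan, np.nan],
                        'E': [np.nan, 3, np.nan, 9, 2, np.nan]})
print(push_up(numeric))
```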
How to resize dataframe by dropping NaN in individual cells?
You can forward fill missing values with ffill, remove rows containing NaNs and remove duplicates:
df = df.ffill().dropna().drop_duplicates()
print (df)
column1 column2 column3 column4
1 1 2 3.0 4.0
Or, if you need the first non-missing values per group, with the groups specified by some column(s):
df = df.groupby(['column1','column2'], as_index=False).first()
print (df)
column1 column2 column3 column4
0 1 2 3.0 4.0
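The question's input isn't shown, so here is a sketch on a hypothetical frame, made up so that both snippets reproduce the outputs above: one record is split across two rows, each row holding part of the values.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one logical record spread over two rows
df = pd.DataFrame({'column1': [1, 1],
                   'column2': [2, 2],
                   'column3': [3, np.nan],
                   'column4': [np.nan, 4]})

# Option 1: forward fill, keep only complete rows, drop repeated rows
filled = df.ffill().dropna().drop_duplicates()
print(filled)

# Option 2: first non-missing value of each column within each group
first = df.groupby(['column1', 'column2'], as_index=False).first()
print(first)
```

Note that ffill works down the whole frame regardless of groups, so with several stacked records it can carry a value from one record into the next; groupby(...).first() only looks at rows within the same group.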
Pandas: Remove all NaN values in all columns
As you can see, each column has a different number of rows.
A DataFrame is a tabular data structure: you can look up an index and a column, and find the value. If the number of rows differs per column, then the index is meaningless and misleading. A dict might be a better alternative:
{c: df[c].dropna().values for c in df.columns}
or
{c: df[c].dropna().tolist() for c in df.columns}
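A short sketch of the dict approach on a hypothetical frame (columns x and y are made up), plus one way to go back to a DataFrame if that is needed later: wrapping each list in a Series lets pandas pad the shorter ones with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns have different numbers of valid values
df = pd.DataFrame({'x': [1.0, np.nan, 2.0, np.nan],
                   'y': [np.nan, 5.0, 6.0, 7.0]})

# One list per column, NaNs removed, lengths may differ
ragged = {c: df[c].dropna().tolist() for c in df.columns}
print(ragged)  # {'x': [1.0, 2.0], 'y': [5.0, 6.0, 7.0]}

# Back to a DataFrame: shorter columns are padded with NaN at the bottom
back = pd.DataFrame({c: pd.Series(v) for c, v in ragged.items()})
print(back)
```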
move column above and delete rows in pandas python dataframe
You can shift each column up by the number of leading missing values, which is found with first_valid_index:
df.apply(lambda s: s.shift(-s.first_valid_index()))
to get
A B C D E F G H
0 a.1 b.1 c.1 d.1 e.1 f.1 g.1 h.1
1 NaN NaN c.2 d.2 NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN
To drop the rows full of NaNs and fill the rest with empty strings:
out = (df.apply(lambda s: s.shift(-s.first_valid_index()))
.dropna(how="all")
.fillna(""))
to get
>>> out
     A    B    C    D    E    F    G    H
0  a.1  b.1  c.1  d.1  e.1  f.1  g.1  h.1
1            c.2  d.2
Note: this assumes your index is 0..N-1; if it's not, you can store it beforehand and restore it afterwards:
index = df.index
df = df.reset_index(drop=True)
df = (df.apply(lambda s: s.shift(-s.first_valid_index()))
.dropna(how="all")
.fillna(""))
df.index = index[:len(df)]
To make the pulling up specific to some columns:
def pull_up(s):
    # position of the column; `s.name` is the column name
    col_index = df.columns.get_loc(s.name)
    # for example: if `col_index` is either 7 or 8, pull up by 4
    if col_index in (7, 8):
        return s.shift(-4)
    else:
        # otherwise, pull up by the number of leading NaNs
        return s.shift(-s.first_valid_index())
# applying
df.apply(pull_up)
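A variation worth considering, sketched under assumptions: instead of first_valid_index() (a label, which is why a 0..N-1 index is required, and which returns None for an all-missing column), it uses the position of the first non-missing value, so no index reset or restore is needed. The helper name pull_up_all and the input frame are hypothetical; the frame is made up so that the result matches the output shown earlier.

```python
import numpy as np
import pandas as pd

def pull_up_all(df):
    """Hypothetical helper: shift each column up past its leading NaNs,
    then drop rows that end up completely empty."""
    def shift_col(s):
        present = s.notna().to_numpy()
        if not present.any():
            # all-missing column: nothing to shift
            return s
        # argmax gives the position of the first True (first present value)
        return s.shift(-int(present.argmax()))
    return df.apply(shift_col).dropna(how='all')

# Hypothetical input with a non-default index
df = pd.DataFrame({
    'A': ['a.1', np.nan, np.nan, np.nan],
    'B': [np.nan, 'b.1', np.nan, np.nan],
    'C': [np.nan, 'c.1', 'c.2', np.nan],
    'D': [np.nan, np.nan, 'd.1', 'd.2'],
    'E': [np.nan, np.nan, 'e.1', np.nan],
    'F': [np.nan, np.nan, np.nan, 'f.1'],
    'G': [np.nan, np.nan, np.nan, 'g.1'],
    'H': [np.nan, np.nan, np.nan, 'h.1'],
}, index=[10, 11, 12, 13])

print(pull_up_all(df).fillna(''))
```

The surviving rows keep the labels of the first rows of the original index (10 and 11 here), which is what the store-and-restore trick above does by hand.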