First Non-Null Value Per Row from a List of Pandas Columns

First non-null value per row from a list of Pandas columns

This is a really messy way to do it: first use first_valid_index to get the first valid column for each row, convert the returned Series to a DataFrame so we can call apply row-wise, and use the result to index back into the original df:

In [160]:
def func(x):
    if x.values[0] is None:
        return None
    else:
        return df.loc[x.name, x.values[0]]

pd.DataFrame(df.apply(lambda x: x.first_valid_index(), axis=1)).apply(func, axis=1)

Out[160]:
0 1
1 3
2 4
3 NaN
dtype: float64

EDIT

A slightly cleaner way:

In [12]:
def func(x):
    if x.first_valid_index() is None:
        return None
    else:
        return x[x.first_valid_index()]

df.apply(func, axis=1)

Out[12]:
0 1
1 3
2 4
3 NaN
dtype: float64
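The original df is never shown in this answer; the following is an illustrative guess at a minimal frame consistent with the Out[] blocks above (first non-null values 1, 3, 4, and an all-NaN last row), run through the cleaner version of func:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one frame consistent with the outputs above
df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, 4, np.nan]})

def func(x):
    # Return None for all-NaN rows, else the value at the first valid column
    if x.first_valid_index() is None:
        return None
    else:
        return x[x.first_valid_index()]

result = df.apply(func, axis=1)
print(result)
```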

Get first non-null value per row

Back fill the NaNs along the rows first, then select the first column with iloc:

df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')

Or:

df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')

print (df)
ID c1 c2 c3 c4 result
0 1 a b a NaN a
1 2 NaN cc dd cc cc
2 3 NaN ee ff ee ee
3 4 NaN NaN gg gg gg
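The input frame is not constructed in the answer; a sketch reconstructed from the printed table above (reading the column values off the rows) is:

```python
import numpy as np
import pandas as pd

# Reconstructed from the printed table above (not shown in the original answer)
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'c1': ['a', np.nan, np.nan, np.nan],
                   'c2': ['b', 'cc', 'ee', np.nan],
                   'c3': ['a', 'dd', 'ff', 'gg'],
                   'c4': [np.nan, 'cc', 'ee', 'gg']})

# Back fill along rows, take the first column, default to 'unknown'
df['result'] = df[['c1', 'c2', 'c3', 'c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
print(df)
```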

Performance:

df = pd.concat([df] * 1000, ignore_index=True)

In [220]: %timeit df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.78 ms per loop

In [221]: %timeit df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.7 ms per loop

#jpp solution
In [222]: %%timeit
...: cols = df.iloc[:, 1:].T.apply(pd.Series.first_valid_index)
...:
...: df['result'] = [df.loc[i, cols[i]] for i in range(len(df.index))]
...:
1 loop, best of 3: 180 ms per loop

# cᴏʟᴅsᴘᴇᴇᴅ's solution
In [223]: %timeit df['result'] = df.stack().groupby(level=0).first()
1 loop, best of 3: 606 ms per loop

First column name with non null value by row pandas

You can apply first_valid_index to each row in the dataframe using a lambda expression with axis=1 to specify rows.

>>> df.apply(lambda row: row.first_valid_index(), axis=1)
ID
0 Y2
1 Y3
2 None
3 Y1
dtype: object

To apply it to your dataframe:

df = df.assign(first=df.apply(lambda row: row.first_valid_index(), axis=1))

>>> df
Y1 Y2 Y3 first
ID
0 NaN 8 4 Y2
1 NaN NaN 1 Y3
2 NaN NaN NaN None
3 5 3 NaN Y1
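For larger frames, a vectorized alternative (a different technique, not from the original answer; the frame is reconstructed from the output above) uses notna plus argmax, guarding the all-NaN row, which argmax would otherwise map to the first column:

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the output above
df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})

mask = df.notna().to_numpy()
# argmax returns the position of the first True per row, but also 0 for
# all-False rows, so all-NaN rows are replaced with None explicitly
first_col = np.where(mask.any(axis=1), df.columns[mask.argmax(axis=1)], None)
print(pd.Series(first_col, index=df.index))
```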

Pandas - find first non-null value in column

You can use first_valid_index and then select with loc:

s = pd.Series([np.nan,2,np.nan])
print (s)
0 NaN
1 2.0
2 NaN
dtype: float64

print (s.first_valid_index())
1

print (s.loc[s.first_valid_index()])
2.0

# If your Series contains ALL NaNs, you'll need to check as follows:

s = pd.Series([np.nan, np.nan, np.nan])
idx = s.first_valid_index() # Will return None
first_valid_value = s.loc[idx] if idx is not None else None
print(first_valid_value)
None
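An equivalent sketch that sidesteps the index lookup entirely is to drop the NaNs and take the first remaining element (same guard needed for all-NaN input):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2, np.nan])

# dropna() removes the NaNs; iloc[0] is then the first non-null value.
# Check for emptiness first so an all-NaN Series yields None, not an error.
valid = s.dropna()
first_value = valid.iloc[0] if not valid.empty else None
print(first_value)  # 2.0
```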

How to take the first non null element, row-wise, from a column that consists of lists?

You can do this directly, without needing fifth_column. Just stack the data frame: since you want the first non-null element per row, your group is the first index level (level=0), so take the first value within each group.

x['sixth_col'] = x.stack().groupby(level=0).first()

col_1 col_2 col_3 col_4 sixth_col
0 NaN 15.0 12.0 NaN 15.0
1 35.0 12.0 15.0 NaN 35.0
2 27.0 NaN 40.0 NaN 27.0
3 50.0 NaN NaN 5.0 50.0
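The frame x is not constructed in the answer; a sketch reconstructed from the table above verifies the stack/groupby approach end to end:

```python
import numpy as np
import pandas as pd

# Reconstructed from the printed table above
x = pd.DataFrame({'col_1': [np.nan, 35.0, 27.0, 50.0],
                  'col_2': [15.0, 12.0, np.nan, np.nan],
                  'col_3': [12.0, 15.0, 40.0, np.nan],
                  'col_4': [np.nan, np.nan, np.nan, 5.0]})

# stack() drops NaNs; grouping by level=0 (the original row label)
# and taking first() yields each row's first non-null value
x['sixth_col'] = x.stack().groupby(level=0).first()
print(x)
```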

Keep only the 1st non-null value in each row (and replace others with NaN)

One way to go would be:

import pandas as pd

data = {'a': {0: 3.0, 1: 2.0, 2: None}, 'b': {0: 10.0, 1: 9.0, 2: 8.0}}

df = pd.DataFrame(data)

def keep_first_valid(x):
    first_valid = x.first_valid_index()
    return x.mask(x.index != first_valid)

df = df.apply(lambda x: keep_first_valid(x), axis=1)
df

a b
0 3.0 NaN
1 2.0 NaN
2 NaN 8.0
  • So, the first x passed to the function is pd.Series([3.0, 10.0], index=['a','b']).
  • Inside the function, first_valid = x.first_valid_index() stores 'a'; see Series.first_valid_index.
  • Finally, x.mask(x.index != first_valid) yields pd.Series([3.0, None], index=['a','b']), which apply assigns back to the df.
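For larger frames, the same result can be sketched without apply (a different, vectorized technique, not from the original answer): keep only the cells where the row-wise cumulative count of non-nulls equals 1, i.e. the first non-null in each row.

```python
import pandas as pd

df = pd.DataFrame({'a': [3.0, 2.0, None], 'b': [10.0, 9.0, 8.0]})

# notna().cumsum(axis=1) counts non-nulls left to right in each row;
# the first non-null cell is exactly where that count equals 1,
# and where() replaces every other cell with NaN
out = df.where(df.notna().cumsum(axis=1).eq(1))
print(out)
```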

pandas group by and find first non null value for all columns

Use GroupBy.first:

df1 = df.groupby('id', as_index=False).first()
print (df1)
id age gender country sales_year
0 1 20.0 M India 2016
1 2 23.0 F India 2016
2 3 30.0 M India 2019
3 4 36.0 NaN India 2019

If column sales_year is not sorted:

df2 = df.sort_values('sales_year', ascending=False).groupby('id', as_index=False).first()
print (df2)
id age gender country sales_year
0 1 20.0 M India 2016
1 2 23.0 F India 2016
2 3 30.0 M India 2019
3 4 36.0 NaN India 2019
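The input frame is not shown; the following is one possible input consistent with the grouped output above (an illustrative guess), demonstrating that GroupBy.first skips NaNs per column within each group:

```python
import numpy as np
import pandas as pd

# Hypothetical input consistent with the grouped output above
df = pd.DataFrame({'id': [1, 1, 2, 3, 3, 4],
                   'age': [np.nan, 20.0, 23.0, 30.0, np.nan, 36.0],
                   'gender': ['M', np.nan, 'F', np.nan, 'M', np.nan],
                   'country': ['India'] * 6,
                   'sales_year': [2016, 2016, 2016, 2019, 2019, 2019]})

# first() returns the first non-null value per column within each group,
# so id=1 picks age from its second row but gender from its first
df1 = df.groupby('id', as_index=False).first()
print(df1)
```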

