Coalesce Values from 2 Columns into a Single Column in a Pandas Dataframe


Use combine_first():

In [16]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))

In [17]: df.loc[::2, 'a'] = np.nan

In [18]: df
Out[18]:
     a  b
0  NaN  0
1  5.0  5
2  NaN  8
3  2.0  8
4  NaN  3
5  9.0  4
6  NaN  7
7  2.0  0
8  NaN  6
9  2.0  5

In [19]: df['c'] = df.a.combine_first(df.b)

In [20]: df
Out[20]:
     a  b    c
0  NaN  0  0.0
1  5.0  5  5.0
2  NaN  8  8.0
3  2.0  8  2.0
4  NaN  3  3.0
5  9.0  4  9.0
6  NaN  7  7.0
7  2.0  0  2.0
8  NaN  6  6.0
9  2.0  5  2.0
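For this two-column case, a fillna on the first column is an equivalent, arguably more explicit spelling (a small sketch with made-up values):

```python
import numpy as np
import pandas as pd

# same shape of data as above: 'a' has NaNs on some rows
df = pd.DataFrame({'a': [np.nan, 5.0, np.nan, 2.0],
                   'b': [0, 5, 8, 8]})

# take 'a' where it is not NaN, otherwise fall back to 'b'
df['c'] = df['a'].fillna(df['b'])
print(df)
```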

How to convert multiple sets of columns to a single column in pandas?

You are essentially asking how to coalesce the values of certain DataFrame columns into one column. You can do it like this:

from random import choice
import pandas as pd

# all azimuth names
azi_names = [f"Azi_{i}" for i in range(5)]

# all distance names
dist_names = [f"Dist_{i}" for i in range(5)]

# put some values in; DataFrame.append was removed in pandas 2.0,
# so collect the rows first and build the frame in one go
rows = []
for i in range(20):
    k = choice(range(5))
    rows.append({f"Azi_{k}": i, f"Dist_{k}": i})

df = pd.DataFrame(rows, columns=azi_names + dist_names)

print(df)

which randomly creates:

    Azi_0  Azi_1  Azi_2  Azi_3  Azi_4  Dist_0  Dist_1  Dist_2  Dist_3  Dist_4
0     NaN    NaN    NaN    0.0    NaN     NaN     NaN     NaN     0.0     NaN
1     NaN    1.0    NaN    NaN    NaN     NaN     1.0     NaN     NaN     NaN
2     2.0    NaN    NaN    NaN    NaN     2.0     NaN     NaN     NaN     NaN
3     NaN    NaN    3.0    NaN    NaN     NaN     NaN     3.0     NaN     NaN
4     NaN    4.0    NaN    NaN    NaN     NaN     4.0     NaN     NaN     NaN
5     NaN    NaN    NaN    NaN    5.0     NaN     NaN     NaN     NaN     5.0
6     6.0    NaN    NaN    NaN    NaN     6.0     NaN     NaN     NaN     NaN
7     NaN    7.0    NaN    NaN    NaN     NaN     7.0     NaN     NaN     NaN
8     NaN    8.0    NaN    NaN    NaN     NaN     8.0     NaN     NaN     NaN
9     9.0    NaN    NaN    NaN    NaN     9.0     NaN     NaN     NaN     NaN
10    NaN    NaN   10.0    NaN    NaN     NaN     NaN    10.0     NaN     NaN
11   11.0    NaN    NaN    NaN    NaN    11.0     NaN     NaN     NaN     NaN
12   12.0    NaN    NaN    NaN    NaN    12.0     NaN     NaN     NaN     NaN
13    NaN    NaN   13.0    NaN    NaN     NaN     NaN    13.0     NaN     NaN
14    NaN   14.0    NaN    NaN    NaN     NaN    14.0     NaN     NaN     NaN
15    NaN    NaN    NaN   15.0    NaN     NaN     NaN     NaN    15.0     NaN
16    NaN    NaN    NaN    NaN   16.0     NaN     NaN     NaN     NaN    16.0
17    NaN    NaN   17.0    NaN    NaN     NaN     NaN    17.0     NaN     NaN
18    NaN    NaN    NaN    NaN   18.0     NaN     NaN     NaN     NaN    18.0
19    NaN    NaN    NaN   19.0    NaN     NaN     NaN     NaN    19.0     NaN

To coalesce this and keep only the filled values, use:

df2 = pd.DataFrame()

# propagates values and chooses first
df2["AZI"] = df[azi_names].bfill(axis=1).iloc[:, 0]
df2["DIS"] = df[dist_names].bfill(axis=1).iloc[:, 0]

print(df2)

to get a coalesced new df:

     AZI   DIS
0    0.0   0.0
1    1.0   1.0
2    2.0   2.0
3    3.0   3.0
4    4.0   4.0
5    5.0   5.0
6    6.0   6.0
7    7.0   7.0
8    8.0   8.0
9    9.0   9.0
10  10.0  10.0
11  11.0  11.0
12  12.0  12.0
13  13.0  13.0
14  14.0  14.0
15  15.0  15.0
16  16.0  16.0
17  17.0  17.0
18  18.0  18.0
19  19.0  19.0

Attribution: inspired by Erfan's answer to "Coalesce values from 2 columns into a single column in a pandas dataframe".

You may need to replace blank values (white space) with NaN in pandas for the data you have shown.
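A minimal sketch of that cleanup step, with made-up column names: whitespace-only strings become real NaN so that bfill can skip them:

```python
import numpy as np
import pandas as pd

# made-up data where "missing" cells are empty or whitespace strings
df = pd.DataFrame({'Azi_0': ['1', ' ', ''],
                   'Azi_1': ['', '2', '3']})

# turn empty/whitespace-only cells into proper NaN
df = df.replace(r'^\s*$', np.nan, regex=True)
print(df)
```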

Pandas combine/coalesce multiple columns into 1

Assuming there is always only one value per row across those three columns, as in your example, you could use df.sum(), which skips any NaN by default:

desired_dataframe = pd.DataFrame(base_dataframe['Name'])
desired_dataframe['Mark'] = base_dataframe.iloc[:, 1:4].sum(axis=1)

In case of potentially more values per row, it would perhaps be safer to use e.g. df.max() instead, which works in the same way.
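A small sketch with hypothetical mark columns m1/m2 illustrates the difference: on rows holding a single value the two agree, but where a row holds two values, sum adds them up while max keeps just one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['x', 'y', 'z'],
                   'm1': [7.0, np.nan, 3.0],
                   'm2': [np.nan, 5.0, 3.0]})

# rows 'x' and 'y' hold one value each: sum and max agree;
# row 'z' holds two, where sum double-counts (6.0) but max keeps 3.0
print(df.iloc[:, 1:3].sum(axis=1).tolist())
print(df.iloc[:, 1:3].max(axis=1).tolist())
```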

How to Coalesce datetime values from 3 columns into a single column in a pandas dataframe?

You are right, a mix of bfill and ffill along the columns axis should do it:

df.assign(ACTUAL_START_DATE=df.filter(like='DATE')
                              .bfill(axis=1)
                              .ffill(axis=1)
                              .min(axis=1))

   CLIENT_ID DATE_BEGIN DATE_START DATE_REGISTERED ACTUAL_START_DATE
0          1 2020-01-01 2020-01-01      2020-01-01        2020-01-01
1          2 2020-01-02 2020-02-01      2020-01-01        2020-01-01
2          3        NaN 2020-05-01      2020-04-01        2020-04-01
3          4 2020-01-01 2020-01-01             NaN        2020-01-01
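For completeness, a reproducible sketch of the example above (the input frame is reconstructed here from the printed output):

```python
import pandas as pd

df = pd.DataFrame({
    'CLIENT_ID': [1, 2, 3, 4],
    'DATE_BEGIN': pd.to_datetime(['2020-01-01', '2020-01-02', None, '2020-01-01']),
    'DATE_START': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-05-01', '2020-01-01']),
    'DATE_REGISTERED': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-04-01', None]),
})

# fill gaps across the date columns, then take the earliest per row
out = df.assign(ACTUAL_START_DATE=df.filter(like='DATE')
                                    .bfill(axis=1)
                                    .ffill(axis=1)
                                    .min(axis=1))
print(out)
```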

Creating another column in pandas df based on partially empty columns

Backfill values from id2 to id1. Extract the numbers. Convert to int then str.

Given:

    id1   id2
0  ID01  ID01
1   NaN  ID03
2  ID07   NaN
3  ID08  ID08

Doing:

df['college_name'] = 'College' + (df.bfill(axis=1)['id1']
                                    .str.extract(r'(\d+)', expand=False)
                                    .astype(int)
                                    .astype(str))

Output:

    id1   id2 college_name
0  ID01  ID01     College1
1   NaN  ID03     College3
2  ID07   NaN     College7
3  ID08  ID08     College8

To check for rows where the ids are different:

Given:

    id1   id2
0  ID01  ID01
1   NaN  ID03
2  ID07   NaN
3  ID08  ID98

Doing:

print(df[df.id1.ne(df.id2) & df.id1.notna() & df.id2.notna()])

Output:

    id1   id2
3  ID08  ID98

Is there a better, more readable way to coalesce columns in pandas

You could use pd.isnull to find the null -- in this case None -- values:

In [169]: pd.isnull(df)
Out[169]:
   first second  third
0  False  False  False
1   True  False  False
2   True   True  False
3   True   True   True
4  False   True  False

and then use np.argmin to find the index of the first non-null value. If all the values are null, np.argmin returns 0:

In [186]: np.argmin(pd.isnull(df).values, axis=1)
Out[186]: array([0, 1, 2, 0, 0])

Then you could select the desired values from df using NumPy integer-indexing:

In [193]: df.values[np.arange(len(df)), np.argmin(pd.isnull(df).values, axis=1)]
Out[193]: array(['A', 'C', 'B', None, 'A'], dtype=object)

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame([{'third': 'B', 'first': 'A', 'second': 'C'},
                   {'third': 'B', 'first': None, 'second': 'C'},
                   {'third': 'B', 'first': None, 'second': None},
                   {'third': None, 'first': None, 'second': None},
                   {'third': 'B', 'first': 'A', 'second': None}],
                  columns=['first', 'second', 'third'])

mask = pd.isnull(df).values
df['combo1'] = df.values[np.arange(len(df)), np.argmin(mask, axis=1)]
order = np.array([1,2,0])
mask = mask[:, order]
df['combo2'] = df.values[np.arange(len(df)), order[np.argmin(mask, axis=1)]]

yields

  first second third combo1 combo2
0     A      C     B      A      C
1  None      C     B      C      C
2  None   None     B      B      B
3  None   None  None   None   None
4     A   None     B      A      B

Using argmin instead of a row-wise apply(coalesce, ...) (where coalesce is the Python function from the question) is significantly quicker if the DataFrame has a lot of rows:

df2 = pd.concat([df]*1000)

In [230]: %timeit mask = pd.isnull(df2).values; df2.values[np.arange(len(df2)), np.argmin(mask, axis=1)]
1000 loops, best of 3: 617 µs per loop

In [231]: %timeit df2.apply(coalesce, axis=1)
10 loops, best of 3: 84.1 ms per loop

Apply Coalesce after grouping on two columns in pandas

It looks like you want to groupby consecutive blocks of ID. If so:

blocks = df['ID'].ne(df['ID'].shift()).cumsum()

agg_dict = {k: 'first' if k != 'end-time' else 'last'
            for k in df.columns}
df.groupby(blocks).agg(agg_dict)
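A small worked example with made-up data: consecutive runs of the same ID collapse into one row, taking the first value of every column except end-time, which takes the last:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['a', 'a', 'b', 'a'],
                   'start-time': [1, 2, 5, 9],
                   'end-time': [2, 3, 6, 10]})

# a new block starts whenever ID differs from the previous row
blocks = df['ID'].ne(df['ID'].shift()).cumsum()

agg_dict = {k: 'first' if k != 'end-time' else 'last'
            for k in df.columns}
out = df.groupby(blocks).agg(agg_dict)
print(out)
```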

Pandas Coalesce Multiple Columns, NaN

The chained fillna calls for cusip are overly complicated. You can replace them with a single bfill:

final['join_key'] = (final['book'].astype('str') +
                     final['bdr'] +
                     final[['cusip', 'isin', 'Deal', 'Id']].bfill(axis=1)['cusip'].astype(str))
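The effect of that bfill can be seen in a minimal sketch with made-up identifiers: each row ends up with its first non-null id in the cusip position:

```python
import numpy as np
import pandas as pd

# hypothetical identifier columns: each row has a value in at most one
final = pd.DataFrame({'cusip': ['037833100', np.nan, np.nan],
                      'isin':  [np.nan, 'US0378331005', np.nan],
                      'Deal':  [np.nan, np.nan, 'D-42']})

# back-fill across the row, then read the 'cusip' column, which now
# holds the first non-null value of each row
key = final[['cusip', 'isin', 'Deal']].bfill(axis=1)['cusip']
print(key.tolist())
```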

Coalesce Pandas DataFrame DOWN Columns

Try:

print(df.bfill().head(1))

Prints:

   Col A  Col B  Col C  Col D Col E
0  Row 1   20.0    4.0    1.0  text

