Find First Non-Zero Value in Each Column of Pandas Dataframe

Find first non-zero value in each column of pandas DataFrame

You're looking for idxmax which gives you the first position of the maximum. However, you need to find the max of "not equal to zero"

df.ne(0).idxmax()

first     2
second    1
third     3
dtype: int64

We can couple this with lookup and assign

df.ne(0).idxmax().to_frame('pos').assign(val=lambda d: df.lookup(d.pos, d.index))

        pos  val
first     2    4
second    1   10
third     3    3

Same answer packaged slightly differently.

m = df.ne(0).idxmax()
pd.DataFrame(dict(pos=m, val=df.lookup(m, m.index)))

        pos  val
first     2    4
second    1   10
third     3    3

Mark first non-zero element in each column with pandas

Try with broadcasting:

Given Series of Thresholds:

threshold = pd.Series([0.02, 0.06, 0.1])

m = df < threshold.values[:, None]
new_df = pd.DataFrame(
    m & (m.index.values[:, None] == m.idxmax().values),
    columns=df.columns,
    dtype=int
)

Explaination:

Find where values are less than the array values:

df < threshold.values[:, None]

       A      B      C
0  False  False   True
1  False   True  False
2   True   True   True

Then use idxmax to get the first instance of true per column:

m.idxmax()

A    2
B    1
C    0
dtype: int64

Then broadcast again to find where value is both True and max index for the column:

m & (m.index.values[:, None] == m.idxmax().values)

[[False False  True]
 [False  True False]
 [ True False False]]

Lastly turn back into a DataFrame:

new_df = pd.DataFrame(
    m & (m.index.values[:, None] == m.idxmax().values),
    columns=df.columns,
    dtype=int
)

Find First Non-zero Value in Each Row of Pandas DataFrame

try this:

res = df[df != 0.0].bfill(axis=1)['A']

all i'm doing is replacing all non-zeros with nans and then filling them in from the right, which forces all resulting values in the first column to be the first non-zero value in the row.

or, a quicker way, as suggested by @piRSquared:

df.replace(0, np.nan).bfill(1).iloc[:, 0]

Finding first non-zero entry across columns in Pandas

Use filter for select columns with info and other, then add new column called NaN for general solution - this value is in new column if only 0 rows, last compare values by ne (!=) and get column of first True by idxmax:

print (df)
  entry_name  entry_id  052018 info_1  062018 info_2  052018 other_1  \
0          a         1              0              0               1   
1          b         2              0              1               0   
2          c         4              0              0               0   
3          d         5              2              3               4   

   062018 other_2  
0               2  
1               0  
2               0  
3               4  

df1 = df.filter(like='info').copy()
df1[np.nan] = 1
df['info'] = df1.ne(0).idxmax(axis=1)

df2 = df.filter(like='other').copy()
df2[np.nan] = 1
df['other'] = df2.ne(0).idxmax(axis=1)
print (df)
  entry_name  entry_id  052018 info_1  062018 info_2  052018 other_1  \
0          a         1              0              0               1   
1          b         2              0              1               0   
2          c         4              0              0               0   
3          d         5              2              3               4   

   062018 other_2           info           other  
0               2            NaN  052018 other_1  
1               0  062018 info_2             NaN  
2               0            NaN             NaN  
3               4  052018 info_1  052018 other_1

Find first and last non-zero column in each row of a pandas dataframe

`first_valid_index` and `last_valid_index`

d = df.mask(df == 0).drop(['Name', 'count'], 1)
df.assign(
    Start=d.apply(pd.Series.first_valid_index, 1),
    Finish=d.apply(pd.Series.last_valid_index, 1)
)

    Name  Jan17  Jun18  Dec18  Apr19  count  Start Finish
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19

`stack` then `groupby`

d = df.mask(df == 0).drop(['Name', 'count'], 1)
def fl(s): return s.xs(s.name).index[[0, -1]]
s, f = d.stack().groupby(level=0).apply(fl).str
df.assign(Start=s, Finish=f)

    Name  Jan17  Jun18  Dec18  Apr19  count  Start Finish
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19

Identify the first and all non-zero values in every row in Pandas DataFrame

For first you can select only numeric columns and replace non 0 value by 1 in DataFrame.mask, then for second add cumulative sum per axis=1 with compare first 1 values by DataFrame.eq and boolean mask convert to integers by DataFrame.astype:

df1, df2 = data.copy(), data.copy()
cols = df1.select_dtypes(include=np.number).columns
df1[cols] = df1[cols].mask(data[cols].ne(0), 1)

df2[cols] = df1[cols].cumsum(axis=1).eq(1).astype(int)
print(df1)
    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0      1      1      1
1   Luis      1      1      1      1      1
2  Maria      0      1      1      1      1
3  Laura      0      0      0      1      1
4   Lina      0      0      0      0      1

print(df2)
    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0      1      0      0
1   Luis      1      0      0      0      0
2  Maria      0      1      0      0      0
3  Laura      0      0      0      1      0
4   Lina      0      0      0      0      1

Get the column name for the first non-zero value in that row with pandas

It seems like you can use idxmax here:

df.set_index('cnum').idxmax(axis=1).reset_index(drop=True)

0    sup1
1    sup1
2    sup3
3    sup2
dtype: object

df['output'] = df.set_index('cnum').idxmax(axis=1).reset_index(drop=True) 
# Slightly faster,
# df['output'] = df.set_index('cnum').idxmax(axis=1).to_numpy() 

df
         cnum  sup1  sup2  sup3  sup4 output
0   285414459     1     0     1     1   sup1
1   445633709     1     0     0     0   sup1
2   556714736     0     0     1     0   sup3
3  1089852074     0     1     0     1   sup2

Another option with dot (will give you all non-zero columns):

d = df.set_index('cnum') 
d.dot(d.columns + ',').str.rstrip(',').reset_index(drop=True)

0    sup1,sup3,sup4
1              sup1
2              sup3
3         sup2,sup4
dtype: object

Or,

(d.dot(d.columns + ',')
  .str.rstrip(',')
  .str.split(',', 1).str[0] 
  .reset_index(drop=True))

0    sup1
1    sup1
2    sup3
3    sup2
dtype: object

Find first non-zero element within a group in pandas

In the groupby, set sort to False, get the cumsum, then filter for rows not equal to 0 :

df.loc[df.groupby(["group"], sort=False).target.cumsum() != 0]

    group   value   target
2      1       3    2
3      1       4    0
4      1       5    1
8      2       4    1
9      2       5    3

Pandas dataframe column forward fill from first non-zero value

Use .values attribute:

df['c']=df.groupby('ID',as_index = False)['c'].apply(lambda x: x.replace(to_replace=0, method='ffill')).values

Now if you print df you will get your desired output:

    ID  b   c
0   1   0   0
1   1   5   1
2   1   8   1
3   2   4   0
4   2   8   1
5   2   81  1

Find First Non-Zero Value in Each Column of Pandas Dataframe