Find first non-zero value in each column of pandas DataFrame
You're looking for idxmax
which gives you the first position of the maximum. However, you need to find the max of "not equal to zero"
df.ne(0).idxmax()
first 2
second 1
third 3
dtype: int64
We can couple this with lookup
and assign
df.ne(0).idxmax().to_frame('pos').assign(val=lambda d: df.lookup(d.pos, d.index))
pos val
first 2 4
second 1 10
third 3 3
Same answer packaged slightly differently.
m = df.ne(0).idxmax()
pd.DataFrame(dict(pos=m, val=df.lookup(m, m.index)))
pos val
first 2 4
second 1 10
third 3 3
Mark first non-zero element in each column with pandas
Try with broadcasting
:
Given Series of Thresholds:
threshold = pd.Series([0.02, 0.06, 0.1])
m = df < threshold.values[:, None]
new_df = pd.DataFrame(
m & (m.index.values[:, None] == m.idxmax().values),
columns=df.columns,
dtype=int
)
A B C
0 0 0 1
1 0 1 0
2 1 0 0
Explaination:
Find where values are less than the array values:
df < threshold.values[:, None]
A B C
0 False False True
1 False True False
2 True True True
Then use idxmax
to get the first instance of true per column:
m.idxmax()
A 2
B 1
C 0
dtype: int64
Then broadcast again to find where value is both True
and max index for the column:
m & (m.index.values[:, None] == m.idxmax().values)
[[False False True]
[False True False]
[ True False False]]
Lastly turn back into a DataFrame:
new_df = pd.DataFrame(
m & (m.index.values[:, None] == m.idxmax().values),
columns=df.columns,
dtype=int
)
A B C
0 0 0 1
1 0 1 0
2 1 0 0
Find First Non-zero Value in Each Row of Pandas DataFrame
try this:
res = df[df != 0.0].bfill(axis=1)['A']
all i'm doing is replacing all non-zeros with nan
s and then filling them in from the right, which forces all resulting values in the first column to be the first non-zero value in the row.
or, a quicker way, as suggested by @piRSquared:
df.replace(0, np.nan).bfill(1).iloc[:, 0]
Finding first non-zero entry across columns in Pandas
Use filter
for select columns with info
and other
, then add new column called NaN
for general solution - this value is in new column if only 0
rows, last compare values by ne
(!=
) and get column of first True
by idxmax
:
print (df)
entry_name entry_id 052018 info_1 062018 info_2 052018 other_1 \
0 a 1 0 0 1
1 b 2 0 1 0
2 c 4 0 0 0
3 d 5 2 3 4
062018 other_2
0 2
1 0
2 0
3 4
df1 = df.filter(like='info').copy()
df1[np.nan] = 1
df['info'] = df1.ne(0).idxmax(axis=1)
df2 = df.filter(like='other').copy()
df2[np.nan] = 1
df['other'] = df2.ne(0).idxmax(axis=1)
print (df)
entry_name entry_id 052018 info_1 062018 info_2 052018 other_1 \
0 a 1 0 0 1
1 b 2 0 1 0
2 c 4 0 0 0
3 d 5 2 3 4
062018 other_2 info other
0 2 NaN 052018 other_1
1 0 062018 info_2 NaN
2 0 NaN NaN
3 4 052018 info_1 052018 other_1
Find first and last non-zero column in each row of a pandas dataframe
first_valid_index
and last_valid_index
d = df.mask(df == 0).drop(['Name', 'count'], 1)
df.assign(
Start=d.apply(pd.Series.first_valid_index, 1),
Finish=d.apply(pd.Series.last_valid_index, 1)
)
Name Jan17 Jun18 Dec18 Apr19 count Start Finish
0 Nick 0.0 1.7 3.7 0.0 2 Jun18 Dec18
1 Jack 0.0 0.0 2.8 3.5 2 Dec18 Apr19
2 Fox 0.0 1.7 0.0 0.0 1 Jun18 Jun18
3 Rex 1.0 0.0 3.0 4.2 3 Jan17 Apr19
4 Snack 0.0 0.0 2.8 4.4 2 Dec18 Apr19
5 Yosee 0.0 0.0 0.0 4.3 1 Apr19 Apr19
6 Petty 0.5 1.3 2.8 3.5 4 Jan17 Apr19
stack
then groupby
d = df.mask(df == 0).drop(['Name', 'count'], 1)
def fl(s): return s.xs(s.name).index[[0, -1]]
s, f = d.stack().groupby(level=0).apply(fl).str
df.assign(Start=s, Finish=f)
Name Jan17 Jun18 Dec18 Apr19 count Start Finish
0 Nick 0.0 1.7 3.7 0.0 2 Jun18 Dec18
1 Jack 0.0 0.0 2.8 3.5 2 Dec18 Apr19
2 Fox 0.0 1.7 0.0 0.0 1 Jun18 Jun18
3 Rex 1.0 0.0 3.0 4.2 3 Jan17 Apr19
4 Snack 0.0 0.0 2.8 4.4 2 Dec18 Apr19
5 Yosee 0.0 0.0 0.0 4.3 1 Apr19 Apr19
6 Petty 0.5 1.3 2.8 3.5 4 Jan17 Apr19
Identify the first and all non-zero values in every row in Pandas DataFrame
For first you can select only numeric columns and replace non 0
value by 1
in DataFrame.mask
, then for second add cumulative sum per axis=1
with compare first 1
values by DataFrame.eq
and boolean mask convert to integers by DataFrame.astype
:
df1, df2 = data.copy(), data.copy()
cols = df1.select_dtypes(include=np.number).columns
df1[cols] = df1[cols].mask(data[cols].ne(0), 1)
df2[cols] = df1[cols].cumsum(axis=1).eq(1).astype(int)
print(df1)
Name Date1 Date2 Date3 Date4 Date5
0 Juan 0 0 1 1 1
1 Luis 1 1 1 1 1
2 Maria 0 1 1 1 1
3 Laura 0 0 0 1 1
4 Lina 0 0 0 0 1
print(df2)
Name Date1 Date2 Date3 Date4 Date5
0 Juan 0 0 1 0 0
1 Luis 1 0 0 0 0
2 Maria 0 1 0 0 0
3 Laura 0 0 0 1 0
4 Lina 0 0 0 0 1
Get the column name for the first non-zero value in that row with pandas
It seems like you can use idxmax
here:
df.set_index('cnum').idxmax(axis=1).reset_index(drop=True)
0 sup1
1 sup1
2 sup3
3 sup2
dtype: object
df['output'] = df.set_index('cnum').idxmax(axis=1).reset_index(drop=True)
# Slightly faster,
# df['output'] = df.set_index('cnum').idxmax(axis=1).to_numpy()
df
cnum sup1 sup2 sup3 sup4 output
0 285414459 1 0 1 1 sup1
1 445633709 1 0 0 0 sup1
2 556714736 0 0 1 0 sup3
3 1089852074 0 1 0 1 sup2
Another option with dot
(will give you all non-zero columns):
d = df.set_index('cnum')
d.dot(d.columns + ',').str.rstrip(',').reset_index(drop=True)
0 sup1,sup3,sup4
1 sup1
2 sup3
3 sup2,sup4
dtype: object
Or,
(d.dot(d.columns + ',')
.str.rstrip(',')
.str.split(',', 1).str[0]
.reset_index(drop=True))
0 sup1
1 sup1
2 sup3
3 sup2
dtype: object
Find first non-zero element within a group in pandas
In the groupby, set sort
to False
, get the cumsum
, then filter for rows not equal to 0
:
df.loc[df.groupby(["group"], sort=False).target.cumsum() != 0]
group value target
2 1 3 2
3 1 4 0
4 1 5 1
8 2 4 1
9 2 5 3
Pandas dataframe column forward fill from first non-zero value
Use .values
attribute:
df['c']=df.groupby('ID',as_index = False)['c'].apply(lambda x: x.replace(to_replace=0, method='ffill')).values
Now if you print df
you will get your desired output:
ID b c
0 1 0 0
1 1 5 1
2 1 8 1
3 2 4 0
4 2 8 1
5 2 81 1
Related Topics
Filtering Dataframe Using the Length of a Column
Using Continue in a Try and Except Inside While-Loop
Package Only Binary Compiled .So Files of a Python Library Compiled With Cython
How to Use Ffmpeg in a Python Function
_Corrupt_Record Error When Reading a Json File into Spark
How to Locate the Input Within Div
Get the Last Sunday and Saturday'S Date in Python
Finding the Two Closest Numbers in a List Using Sorting
How to Set the Default Python Path for Anaconda on Linux
How Does the Code Prints 1 2 6 24 as Output and Not 24 6 2 1
How to Remove Commas and Dots of Individual Word in Two Dimensional List
How to Expand Input Buffer Size of Pyserial
Test If Dictionary Key Exists, Is Not None and Isn't Blank
How to Pass a Dictionary Object as Parameter for a Function in Python
If-Condition With Multiple Actions in Robot Framework