Find the max of two or more columns with pandas
You can get the maximum like this:
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
>>> df
A B
0 1 -2
1 2 8
2 3 1
>>> df[["A", "B"]]
A B
0 1 -2
1 2 8
2 3 1
>>> df[["A", "B"]].max(axis=1)
0 1
1 8
2 3
and so:
>>> df["C"] = df[["A", "B"]].max(axis=1)
>>> df
A B C
0 1 -2 1
1 2 8 8
2 3 1 3
If you know that "A" and "B" are the only columns, you could even get away with
>>> df["C"] = df.max(axis=1)
You could also use .apply(max, axis=1), although it calls Python's max once per row and is typically slower than .max(axis=1).
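As a quick check, here is a minimal sketch comparing the two approaches on the same frame. The idxmax call is an extra that is not part of the answer above; it reports which column supplied each row's maximum.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})

# Vectorized row-wise maximum over the selected columns.
vectorized = df[["A", "B"]].max(axis=1)

# Same values via apply, which calls the built-in max once per row.
applied = df[["A", "B"]].apply(max, axis=1)

# Extra (not in the answer above): the column label holding each row's maximum.
source_col = df[["A", "B"]].idxmax(axis=1)

print(vectorized.tolist(), applied.tolist(), source_col.tolist())
```

Both approaches give the same values here; the vectorized form is usually the one to prefer on large frames.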
Select Pandas dataframe row where two or more columns have their maximum value together
You can do it with slicing:
output = df.loc[(df['Feat1'] + df['Feat3']).to_frame().idxmax(),:]
This outputs:
Institution Feat1 Feat2 Feat3
1 ID2 322.12 1 0.94
Alternatively, you can create a helper column and filter on it, at the cost of an extra column in the frame:
df['filter'] = df['Feat1'] + df['Feat3']
output = df[df['filter'] == df['filter'].max()]
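To try both variants without the original data, here is a small sketch. The ID1 and ID3 rows are made up for illustration; only the ID2 row comes from the output shown above. It also skips the helper column by keeping the sum in a separate Series.

```python
import pandas as pd

# Hypothetical data: only the ID2 row is taken from the output shown above.
df = pd.DataFrame({
    "Institution": ["ID1", "ID2", "ID3"],
    "Feat1": [100.50, 322.12, 210.00],
    "Feat2": [0, 1, 1],
    "Feat3": [0.50, 0.94, 0.10],
})

combined = df["Feat1"] + df["Feat3"]

# idxmax returns the label of the first row with the largest combined value;
# wrapping it in a list keeps the result a one-row DataFrame.
output = df.loc[[combined.idxmax()]]

# A boolean mask keeps every row that ties for the maximum.
output_all_ties = df[combined == combined.max()]

print(output)
```

The idxmax version returns a single row even when several rows tie, while the mask version returns all of them.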
Create Pandas column with the max of two calculated values from other columns
Use np.maximum, which compares two arrays element by element:
import numpy as np
df['max'] = np.maximum(df['A']*3, df['B']+df['A'])
Output:
A B max
0 1 -2 3
1 2 8 10
2 3 1 9
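Here is a self-contained version of the above, using the same A and B values. np.maximum takes exactly two inputs, so the last part sketches one way to handle three or more calculated values; the third expression (B times 2) is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})

# Element-wise maximum of two calculated Series.
df["max"] = np.maximum(df["A"] * 3, df["B"] + df["A"])

# For more than two calculated values, one option is to line them up as
# columns and take the row-wise max. The third expression is hypothetical.
candidates = pd.concat(
    [df["A"] * 3, df["B"] + df["A"], df["B"] * 2], axis=1
)
max_of_three = candidates.max(axis=1)

print(df)
```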
Second and third largest values within multiple columns in Pandas
To find the second-largest value in each row, apply nlargest(2) to each row and take the last of the two values it returns:
df['2nd_largest'] = df[["A1", "B1", "C1", "D1", "E1", "F1"]].apply(lambda row: row.nlargest(2).iat[-1], axis=1)
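The same pattern extends to the third-largest value by asking for three values instead of two. A minimal sketch with made-up numbers (only the column names come from the snippet above):

```python
import pandas as pd

# Hypothetical values; the column names follow the snippet above.
df = pd.DataFrame({
    "A1": [5, 1], "B1": [9, 7], "C1": [3, 7],
    "D1": [8, 2], "E1": [1, 4], "F1": [6, 0],
})
cols = ["A1", "B1", "C1", "D1", "E1", "F1"]

# nlargest(n) returns the n largest values in descending order,
# so the last one is the n-th largest.
df["2nd_largest"] = df[cols].apply(lambda row: row.nlargest(2).iat[-1], axis=1)
df["3rd_largest"] = df[cols].apply(lambda row: row.nlargest(3).iat[-1], axis=1)

print(df[["2nd_largest", "3rd_largest"]])
```

Note that duplicates count separately: in the second row the two largest values are both 7, so the second-largest is also 7.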
Get Max value comparing multiple columns and return specific values
Try the following short code, based mainly on NumPy:
vv = df.iloc[:, 1::2].values
iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)
iCol = iCol * 2 + 1
result = df.iloc[iRow, [0, iCol, iCol + 1]]
The result is a Series:
Sequence 1008
Duration3 981
Value3 82
Name: 7, dtype: int64
If you want to reshape it into a one-row DataFrame (the index labels become the column names, followed by the actual values), you can run:
pd.DataFrame([result.values], columns=result.index)
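The original frame is not shown, so here is a sketch on a made-up frame that assumes the same layout: an ID column followed by alternating Duration/Value pairs. The names i_row, i_col and wide are just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one ID column followed by (Duration, Value) pairs.
df = pd.DataFrame({
    "Sequence": [1001, 1002, 1003],
    "Duration1": [10, 40, 25],
    "Value1": [7, 8, 9],
    "Duration2": [30, 981, 5],
    "Value2": [1, 82, 3],
})

vv = df.iloc[:, 1::2].values            # Duration columns only
# argmax works on the flattened array; unravel_index turns the flat
# position back into (row, column) within vv.
i_row, i_col = np.unravel_index(vv.argmax(), vv.shape)
i_col = i_col * 2 + 1                   # map back to a column position in df

result = df.iloc[i_row, [0, i_col, i_col + 1]]
wide = pd.DataFrame([result.values], columns=result.index)

print(wide)
```

The step of 2 and the offset of 1 depend entirely on the assumed column layout, so they would need adjusting for a frame arranged differently.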
Python: get max and min values across multiple columns while grouping a DataFrame
You can melt the DataFrame so that both 'actual' and 'budget' are considered when calculating the min or max. Then group the melted DataFrame and merge the result back.
id_vars = ['measure', 'measure_group', 'route']
df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
         .groupby(id_vars)['value']
         .agg(['min', 'max']))
df = df.merge(df1, how='left', on=id_vars)
measure measure_group route year actual budget min max
0 AC electrification A 20182019 103 99 99 122
1 AC electrification A 20192020 110 122 99 122
2 AC electrification B 20182019 9 10 9 55
3 AC electrification B 20192020 55 50 9 55
4 HV electrification A 20182019 2 10 2 15
5 HV electrification A 20192020 7 15 2 15
6 HV electrification B 20182019 67 10 10 115
7 HV electrification B 20192020 100 115 10 115
8 cat1 track A 20182019 10 15 10 111
9 cat1 track A 20192020 111 25 10 111
10 cat1 track B 20182019 55 16 16 175
11 cat1 track B 20192020 75 175 16 175
12 cat2 track A 20182019 84 5 5 1005
13 cat2 track A 20192020 125 1005 5 1005
14 cat2 track B 20182019 7 4 4 25
15 cat2 track B 20192020 15 25 4 25
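For a runnable version, here is a sketch that rebuilds only the first four rows of the table above (the AC electrification rows) and applies the same melt, group and merge steps. The names stats and out are illustrative.

```python
import pandas as pd

# First four rows of the table above, rebuilt by hand.
df = pd.DataFrame({
    "measure": ["AC"] * 4,
    "measure_group": ["electrification"] * 4,
    "route": ["A", "A", "B", "B"],
    "year": [20182019, 20192020, 20182019, 20192020],
    "actual": [103, 110, 9, 55],
    "budget": [99, 122, 10, 50],
})

id_vars = ["measure", "measure_group", "route"]

# Stack 'actual' and 'budget' into one 'value' column, then aggregate per group.
stats = (
    df.melt(id_vars=id_vars, value_vars=["actual", "budget"])
      .groupby(id_vars)["value"]
      .agg(["min", "max"])
)

# stats is indexed by the group keys; merge matches them by level name.
out = df.merge(stats, how="left", on=id_vars)

print(out)
```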
Python Pandas add column for row-wise max value of selected columns
>>> frame['HighScore'] = frame[['test1','test2','test3']].max(axis=1)
>>> frame
name test1 test2 test3 HighScore
0 bill 85 35 51 85
1 joe 75 45 61 75
2 steve 85 83 45 85
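If the score columns share a prefix, you may not need to list them by hand. The sketch below rebuilds the frame shown above and selects the columns with filter(like="test"); this is an alternative to the explicit list, and it assumes no other column name contains "test".

```python
import pandas as pd

frame = pd.DataFrame({
    "name": ["bill", "joe", "steve"],
    "test1": [85, 75, 85],
    "test2": [35, 45, 83],
    "test3": [51, 61, 45],
})

# Keep every column whose label contains "test" (assumes 'name' and any
# other non-score columns do not contain that substring).
score_cols = frame.filter(like="test")
frame["HighScore"] = score_cols.max(axis=1)

print(frame)
```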
Python/ Pandas: calculate 1. minimum, 2. max of columns to left of minimum and 3. max of columns to right of minimum
You can use .iloc[:, 1:] to select every column after the first, and then combine pandas methods such as .min, .max, .idxmin, and .idxmax:
df['nadir'] = df.iloc[:,1:].min(axis=1)
df['nadir_qtr'] = df.iloc[:,1:].idxmin(axis=1).apply(lambda x: df.columns.get_loc(x))
df['new'] = [df.iloc[i].values for i in df.index]
df['pre_peak'] = df.apply(lambda x: max(x['new'][0:x['nadir_qtr']]), axis=1)
df['post_peak'] = df.apply(lambda x: max(x['new'][x['nadir_qtr']:]), axis=1)
df['pre_peak_qtr'] = pd.Series([s[i] for i, s in zip(df.index, df['pre_peak'].apply(
    lambda x: [i for i in (df.iloc[:,0:-6] == x)
               .idxmax(axis=1)]))]).apply(lambda x: df.columns.get_loc(x))
df['post_peak_qtr'] = pd.Series([s[i] for i, s in zip(df.index, df['post_peak'].apply(
    lambda x: [i for i in (df.iloc[:,0:-6] == x)
               .idxmax(axis=1)]))]).apply(lambda x: df.columns.get_loc(x))
df_new = df[['nadir', 'nadir_qtr', 'pre_peak', 'pre_peak_qtr', 'post_peak', 'post_peak_qtr']]
df_new
Out[1]:
nadir nadir_qtr pre_peak pre_peak_qtr post_peak post_peak_qtr
idx
0 4039370.0 7 4114911.0 1 4254681.0 11
1 21566.0 1 21226.0 0 23232.0 5
2 95958.0 7 103054.0 5 123064.0 9
3 22080.0 11 24186.0 2 22080.0 11
4 6722.0 7 7906.0 1 8326.0 11
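The same idea can be sketched more compactly with NumPy masks. This is a simplified alternative, not the code above: it uses made-up data, assumes every column is a numeric quarter, and excludes the minimum's own column from both sides (the code above includes it on the right-hand side).

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly values; every column is assumed to be numeric.
df = pd.DataFrame(
    [[5.0, 7.0, 3.0, 6.0, 9.0],
     [8.0, 2.0, 4.0, 1.0, 3.0]],
    columns=["q0", "q1", "q2", "q3", "q4"],
)

vals = df.to_numpy()
pos = np.arange(vals.shape[1])          # column positions 0..n-1

nadir_qtr = vals.argmin(axis=1)         # position of each row's minimum
nadir = vals.min(axis=1)

# Keep only the columns strictly left of the minimum; hide the rest with -inf.
left = np.where(pos < nadir_qtr[:, None], vals, -np.inf)
# Keep only the columns strictly right of the minimum.
right = np.where(pos > nadir_qtr[:, None], vals, -np.inf)

summary = pd.DataFrame({
    "nadir": nadir,
    "nadir_qtr": nadir_qtr,
    "pre_peak": left.max(axis=1),
    "pre_peak_qtr": left.argmax(axis=1),
    "post_peak": right.max(axis=1),
    "post_peak_qtr": right.argmax(axis=1),
}, index=df.index)

print(summary)
```

If a row's minimum sits in the first or last column, the corresponding side is empty and the peak comes out as -inf, so that case would need separate handling.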