Find the Max of Two or More Columns with Pandas


You can get the row-wise maximum of the selected columns with .max(axis=1):

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
>>> df
   A  B
0  1 -2
1  2  8
2  3  1
>>> df[["A", "B"]]
   A  B
0  1 -2
1  2  8
2  3  1
>>> df[["A", "B"]].max(axis=1)
0    1
1    8
2    3
dtype: int64

and so you can store the result in a new column:

>>> df["C"] = df[["A", "B"]].max(axis=1)
>>> df
   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3

If you know that "A" and "B" are the only columns, you could even get away with:

>>> df["C"] = df.max(axis=1)

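You could also use .apply(max, axis=1), although it calls Python's built-in max once per row and is much slower than the vectorized .max(axis=1) on large frames.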
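If the frame also holds non-numeric columns, calling df.max(axis=1) on everything may raise a TypeError or compare columns you did not intend. A minimal sketch, assuming a hypothetical extra text column "name", restricts the row-wise maximum to numeric columns with select_dtypes:

```python
import pandas as pd

# Same numbers as above, plus a hypothetical non-numeric "name" column
df = pd.DataFrame({"name": ["x", "y", "z"], "A": [1, 2, 3], "B": [-2, 8, 1]})

# Keep only the numeric columns, then take the maximum across each row
df["C"] = df.select_dtypes("number").max(axis=1)
print(df)
```

Selecting by dtype keeps the code working if more numeric columns are added later; list the columns explicitly, as above, when only some of them should count.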

Select Pandas dataframe row where two or more columns have their maximum value together

You can do it with idxmax on the summed columns and .loc:

# Label of the row with the largest Feat1 + Feat3; double brackets keep a DataFrame
output = df.loc[[(df['Feat1'] + df['Feat3']).idxmax()]]

This outputs:

   Institution   Feat1  Feat2  Feat3
1          ID2  322.12      1   0.94

Alternatively, you can create a helper column and filter on it. This takes an extra step, but it returns every row that ties for the maximum, whereas idxmax returns only the first one.

df['filter'] = df['Feat1'] + df['Feat3']
output = df[df['filter'] == df['filter'].max()]
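The two approaches differ only when several rows tie for the largest sum. A small sketch with hypothetical Feat1/Feat3 values, chosen so that two rows tie, shows the difference:

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 both have Feat1 + Feat3 == 323.0
df = pd.DataFrame({
    "Institution": ["ID1", "ID2", "ID3"],
    "Feat1": [100.0, 322.0, 321.0],
    "Feat2": [0, 1, 2],
    "Feat3": [0.5, 1.0, 2.0],
})

total = df["Feat1"] + df["Feat3"]

# idxmax: only the first row with the largest sum
first_only = df.loc[[total.idxmax()]]

# Boolean mask: every row whose sum equals the maximum
all_ties = df[total == total.max()]

print(first_only)
print(all_ties)
```

Computing the sum into a local Series, as here, also avoids leaving a helper column in df.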

Create Pandas column with the max of two calculated values from other columns

Use np.maximum, which compares two arrays (or Series) element by element:

import numpy as np
df['max'] = np.maximum(df['A'] * 3, df['B'] + df['A'])  # element-wise max of the two expressions

Output:

   A  B  max
0  1 -2    3
1  2  8   10
2  3  1    9
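np.maximum compares exactly two inputs. For three or more calculated values, one option is to put the expressions side by side as columns and take the row-wise max, which keeps the DataFrame index aligned. A sketch, where df['B'] * 2 is a hypothetical third expression:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})

# Each candidate expression becomes one column of a temporary frame
candidates = pd.concat(
    [df["A"] * 3, df["B"] + df["A"], df["B"] * 2],  # df["B"] * 2 is hypothetical
    axis=1,
)

# Row-wise maximum over all candidate expressions
df["max"] = candidates.max(axis=1)
print(df)
```

If you prefer to stay in NumPy, np.maximum.reduce over a list of the expressions returns the same values as a plain array, without the index.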

Second and third largest values within multiple columns in Pandas

To find the second-largest value in each row, apply a function that calls nlargest to each row (use nlargest(3) for the third-largest):

df['2nd_largest'] = df[["A1", "B1", "C1", "D1", "E1", "F1"]].apply(lambda row: row.nlargest(2).iat[-1], axis=1)
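apply with a lambda runs Python code once per row, which can be slow on large frames. A vectorized sketch sorts each row with NumPy and picks positions from the end. It assumes the selected columns contain no NaN (np.sort places NaN last, which would shift the positions); the column names are the placeholders from the question and the data is hypothetical:

```python
import numpy as np
import pandas as pd

cols = ["A1", "B1", "C1", "D1", "E1", "F1"]

# Hypothetical data with two rows
df = pd.DataFrame([[5, 1, 9, 3, 7, 2],
                   [4, 4, 8, 6, 0, 1]], columns=cols)

# Sort each row ascending; the largest values end up in the last positions
ordered = np.sort(df[cols].to_numpy(), axis=1)

df["2nd_largest"] = ordered[:, -2]
df["3rd_largest"] = ordered[:, -3]
print(df)
```

Like nlargest, this counts duplicates separately, so a row such as [9, 9, 1] has 9 as its second-largest value.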

Get Max value comparing multiple columns and return specific values

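Try the following short code, based mainly on NumPy. It assumes that column 0 is Sequence and that the remaining columns alternate DurationN, ValueN, so it finds the largest Duration and returns it together with its Sequence and matching Value: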

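import numpy as np

vv = df.iloc[:, 1::2].to_numpy()  # Duration columns at positions 1, 3, 5, ...
# Row and column (within vv) of the overall largest Duration
iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)
iCol = iCol * 2 + 1  # map back to the column position in df
result = df.iloc[iRow, [0, iCol, iCol + 1]]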

The result is a Series:

Sequence     1008
Duration3     981
Value3         82
Name: 7, dtype: int64

If you want to reshape it into a one-row DataFrame (index values as column names, then the actual values), run:

pd.DataFrame([result.values], columns=result.index)
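A self-contained sketch of the whole pipeline, with hypothetical numbers laid out as Sequence followed by (DurationN, ValueN) pairs:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Sequence, then (DurationN, ValueN) pairs
df = pd.DataFrame({
    "Sequence":  [1001, 1002, 1003],
    "Duration1": [120, 300, 45],
    "Value1":    [10, 20, 30],
    "Duration2": [500, 80, 90],
    "Value2":    [40, 50, 60],
})

# Duration columns are at positions 1, 3, ...
vv = df.iloc[:, 1::2].to_numpy()
iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)
iCol = iCol * 2 + 1  # position of that Duration column in df

result = df.iloc[iRow, [0, iCol, iCol + 1]]
print(result)

# One-row DataFrame: index values become the column names
reshaped = pd.DataFrame([result.values], columns=result.index)
print(reshaped)
```

The largest Duration here is 500 in row 0, so the result holds Sequence 1001, Duration2 500 and Value2 40.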

python get max and min values across multiple columns while grouping a dataframe

You can melt the DataFrame so that values from both 'actual' and 'budget' are considered when calculating the min and max, then group the melted DataFrame and merge the result back:

id_vars = ['measure', 'measure_group', 'route']

df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
         .groupby(id_vars)['value']
         .agg(['min', 'max']))

df = df.merge(df1, how='left', on=id_vars)


    measure    measure_group  route      year  actual  budget  min   max
0        AC  electrification      A  20182019     103      99   99   122
1        AC  electrification      A  20192020     110     122   99   122
2        AC  electrification      B  20182019       9      10    9    55
3        AC  electrification      B  20192020      55      50    9    55
4        HV  electrification      A  20182019       2      10    2    15
5        HV  electrification      A  20192020       7      15    2    15
6        HV  electrification      B  20182019      67      10   10   115
7        HV  electrification      B  20192020     100     115   10   115
8      cat1            track      A  20182019      10      15   10   111
9      cat1            track      A  20192020     111      25   10   111
10     cat1            track      B  20182019      55      16   16   175
11     cat1            track      B  20192020      75     175   16   175
12     cat2            track      A  20182019      84       5    5  1005
13     cat2            track      A  20192020     125    1005    5  1005
14     cat2            track      B  20182019       7       4    4    25
15     cat2            track      B  20192020      15      25    4    25
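If you would rather avoid the melt and merge, a different technique (swapped in here, not the method above) is to take the row-wise min and max of actual and budget first, then broadcast the group-wise extremes back to every row with groupby(...).transform. A sketch using the first four rows of the data above:

```python
import pandas as pd

# First four rows of the example data
df = pd.DataFrame({
    "measure": ["AC", "AC", "AC", "AC"],
    "measure_group": ["electrification"] * 4,
    "route": ["A", "A", "B", "B"],
    "year": [20182019, 20192020, 20182019, 20192020],
    "actual": [103, 110, 9, 55],
    "budget": [99, 122, 10, 50],
})

keys = ["measure", "measure_group", "route"]

# Row-wise extremes over the two value columns
row_min = df[["actual", "budget"]].min(axis=1)
row_max = df[["actual", "budget"]].max(axis=1)

# Group-wise extremes, broadcast back to every row of the group
df["min"] = row_min.groupby([df[k] for k in keys]).transform("min")
df["max"] = row_max.groupby([df[k] for k in keys]).transform("max")
print(df)
```

Because transform returns a result aligned to the original rows, no merge is needed and the year column is kept.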

Python Pandas add column for row-wise max value of selected columns

>>> frame['HighScore'] = frame[['test1','test2','test3']].max(axis=1)
>>> frame
    name  test1  test2  test3  HighScore
0   bill     85     35     51         85
1    joe     75     45     61         75
2  steve     85     83     45         85
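If the score columns share a name pattern, you can select them with DataFrame.filter instead of listing them, and idxmax(axis=1) reports which column held the maximum. A sketch with the scores above, assuming every score column name contains "test":

```python
import pandas as pd

frame = pd.DataFrame({
    "name": ["bill", "joe", "steve"],
    "test1": [85, 75, 85],
    "test2": [35, 45, 83],
    "test3": [51, 61, 45],
})

# All columns whose name contains "test"
scores = frame.filter(like="test")

frame["HighScore"] = scores.max(axis=1)
# Label of the column holding each row's maximum (the first one on ties)
frame["BestTest"] = scores.idxmax(axis=1)
print(frame)
```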

Python/ Pandas: calculate 1. minimum, 2. max of columns to left of minimum and 3. max of columns to right of minimum

You can use .iloc[:, 1:] to select every column after the first, and combine pandas methods such as .min, .max, .idxmin and .idxmax:

import pandas as pd

# Minimum of every column after the first, and that column's position in df
df['nadir'] = df.iloc[:, 1:].min(axis=1)
df['nadir_qtr'] = df.iloc[:, 1:].idxmin(axis=1).apply(lambda x: df.columns.get_loc(x))
# Each row's values as an array, so they can be sliced by position
df['new'] = [df.iloc[i].values for i in df.index]
# Largest value before the nadir, and from the nadir onwards
df['pre_peak'] = df.apply(lambda x: max(x['new'][0:x['nadir_qtr']]), axis=1)
df['post_peak'] = df.apply(lambda x: max(x['new'][x['nadir_qtr']:]), axis=1)
# Column positions of those peaks, found by matching the peak value in each row
df['pre_peak_qtr'] = pd.Series([s[i] for i, s in zip(df.index, df['pre_peak'].apply(
    lambda x: [i for i in (df.iloc[:, 0:-6] == x)
               .idxmax(axis=1)]))]).apply(lambda x: df.columns.get_loc(x))
df['post_peak_qtr'] = pd.Series([s[i] for i, s in zip(df.index, df['post_peak'].apply(
    lambda x: [i for i in (df.iloc[:, 0:-6] == x)
               .idxmax(axis=1)]))]).apply(lambda x: df.columns.get_loc(x))
df_new = df[['nadir', 'nadir_qtr', 'pre_peak', 'pre_peak_qtr', 'post_peak', 'post_peak_qtr']]
df_new
Out[1]:
         nadir  nadir_qtr   pre_peak  pre_peak_qtr  post_peak  post_peak_qtr
idx
0    4039370.0          7  4114911.0             1  4254681.0             11
1      21566.0          1    21226.0             0    23232.0              5
2      95958.0          7   103054.0             5   123064.0              9
3      22080.0         11    24186.0             2    22080.0             11
4       6722.0          7     7906.0             1     8326.0             11
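The code above builds several helper columns and then searches for the peak values again to recover their positions. A more compact NumPy sketch computes everything per row from positions. It assumes the frame holds only the quarterly numeric columns (hypothetical names q0 to q4 below, hypothetical helper nadir_and_peaks) with no NaN. Unlike the code above, it looks strictly to the left and right of the minimum and returns NaN when a side is empty:

```python
import numpy as np
import pandas as pd

def nadir_and_peaks(frame):
    """Hypothetical helper: per row, the minimum and the maxima strictly left/right of it.

    Positions are column indices within `frame`.
    """
    values = frame.to_numpy(dtype=float)
    rows = []
    for row in values:
        pos = int(row.argmin())              # first position of the row minimum
        left, right = row[:pos], row[pos + 1:]
        rows.append({
            "nadir": row[pos],
            "nadir_qtr": pos,
            "pre_peak": left.max() if left.size else np.nan,
            "pre_peak_qtr": int(left.argmax()) if left.size else np.nan,
            "post_peak": right.max() if right.size else np.nan,
            "post_peak_qtr": pos + 1 + int(right.argmax()) if right.size else np.nan,
        })
    return pd.DataFrame(rows, index=frame.index)

# Hypothetical quarterly data
df = pd.DataFrame([[5, 9, 2, 7, 4],
                   [1, 3, 6, 2, 8]],
                  columns=["q0", "q1", "q2", "q3", "q4"])
out = nadir_and_peaks(df)
print(out)
```

In the second row the minimum sits in the first column, so pre_peak and pre_peak_qtr are NaN there.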

