How to Select Percentage of Rows in Pandas Dataframe

Calculate percentage of rows with a column above a certain value Pandas

You can compare values for greater like 30 with aggregate mean:

df = (df.B > 30).groupby(df['A']).mean().mul(100).reset_index(name='C')
print (df)
A C
0 r 60.0

Or:

df = df.assign(C = df.B > 30).groupby('A')['C'].mean().mul(100).reset_index()

How to calculate the percentage of rows in pandas?

import pandas as pd 

data = {'Conditions ': ['NormalBlink1400', 'NormalBlink2000', 'NormalBlink3000', 'NormalBlink4000','NormalNoBlink1400'],
'-1': [48, 74, 77, 67,40],
'1': [108, 124, 147, 150,119]}

df = pd.DataFrame(data)

df['perc_one '] = (df["1"] / (df["1"] + df["-1"]))
df["perc_minus_one"] = (df["-1"] / (df["1"] + df["-1"]))

Output:

df.head()

enter image description here

Compute row percentages in pandas DataFrame?

div + sum

For a vectorised solution, divide the dataframe along axis=0 by its sum over axis=1. You can use set_index + reset_index to ignore the identifier column.

df = df.set_index('cat')
res = df.div(df.sum(axis=1), axis=0)

print(res.reset_index())

cat val1 val2 val3 val4
0 A 0.194444 0.277778 0.000000 0.527778
1 B 0.370370 0.074074 0.037037 0.518519
2 C 0.119048 0.357143 0.142857 0.380952

Pandas - get first n-rows based on percentage

I want to pop first 5% of record

There is no built-in method but you can do this:

You can multiply the total number of rows to your percent and use the result as parameter for head method.

n = 5
df.head(int(len(df)*(n/100)))

So if your dataframe contains 1000 rows and n = 5% you will get the first 50 rows.

Calculate percentage change between values of column in Pandas dataframe

pct_change is computing a change relative to the previous value (which is why 2017 is NaN), and this doesn't seem to be what you want. If you want to compute a percentage change relative to 2019, as 2019 is already normalized to 100, simply subtract 100:

df['Percentage_Change'] = df['Index'].sub(100)

output:

  Country     Industry  Year  Index  Percentage_Change
0 US Agriculture 2017 83.0 -17.0
1 US Agriculture 2018 97.2 -2.8
2 US Agriculture 2019 100.0 0.0
3 US Agriculture 2020 112.0 12.0
4 US Agriculture 2021 108.0 8.0
5 Japan Mining 2017 88.0 -12.0
6 Japan Mining 2018 93.0 -7.0
7 Japan Mining 2019 100.0 0.0
8 Japan Mining 2020 104.0 4.0
9 Japan Mining 2021 112.0 12.0

bidirectional pct_change

If you want a bidirectional pct_change "centered" on 100, you can use masks to compute the pct_change both ways:

df['Percentage_Change'] = (df
.assign(ref=df['Year'].eq(2019))
.groupby(['Country', 'Industry'], group_keys=False)
.apply(lambda g: g['Index'].where(g['ref'].cummax()).pct_change()
.fillna(g['Index'][::-1].pct_change().mask(g['ref'].cummax(), 0))
)
)

output:

  Country     Industry  Year  Index  Percentage_Change
0 US Agriculture 2017 83.0 -0.146091
1 US Agriculture 2018 97.2 -0.028000
2 US Agriculture 2019 100.0 0.000000
3 US Agriculture 2020 112.0 0.120000
4 US Agriculture 2021 108.0 -0.035714
5 Japan Mining 2017 88.0 -0.053763
6 Japan Mining 2018 93.0 -0.070000
7 Japan Mining 2019 100.0 0.000000
8 Japan Mining 2020 104.0 0.040000
9 Japan Mining 2021 112.0 0.076923

Python Pandas – How to calculate percentage difference row wise for a entire dataframe with NaNs

Use pct_change:

>>> df[['route']].join(df.filter(like='col').pct_change(axis=1).mul(100).round(2))

route col1 col2 col3 col4 col5
0 1 NaN NaN 0.00 0.00 0.00
1 2 NaN 433.33 0.00 145.83 0.00
2 3 NaN 2.94 68.57 -10.17 0.00
3 4 NaN 0.00 0.00 0.00 433.33


Related Topics



Leave a reply



Submit