How to Select Percentage of Rows in Pandas Dataframe

Calculate percentage of rows with a column above a certain value Pandas

You can compare values for greater like 30 with aggregate mean:

df = (df.B > 30).groupby(df['A']).mean().mul(100).reset_index(name='C')
print (df)
   A     C
0  r  60.0

Or:

df = df.assign(C = df.B > 30).groupby('A')['C'].mean().mul(100).reset_index()

How to calculate the percentage of rows in pandas?

import pandas as pd 

data = {'Conditions ': ['NormalBlink1400', 'NormalBlink2000', 'NormalBlink3000', 'NormalBlink4000','NormalNoBlink1400'], 
        '-1': [48, 74, 77, 67,40], 
        '1': [108, 124, 147, 150,119]} 

df = pd.DataFrame(data) 

df['perc_one '] = (df["1"] / (df["1"] + df["-1"]))
df["perc_minus_one"] = (df["-1"] / (df["1"] + df["-1"]))

Output:

df.head()

enter image description here

Compute row percentages in pandas DataFrame?

div + sum

For a vectorised solution, divide the dataframe along axis=0 by its sum over axis=1. You can use set_index + reset_index to ignore the identifier column.

df = df.set_index('cat')
res = df.div(df.sum(axis=1), axis=0)

print(res.reset_index())

  cat      val1      val2      val3      val4
0   A  0.194444  0.277778  0.000000  0.527778
1   B  0.370370  0.074074  0.037037  0.518519
2   C  0.119048  0.357143  0.142857  0.380952

Pandas - get first n-rows based on percentage

I want to pop first 5% of record

There is no built-in method but you can do this:

You can multiply the total number of rows to your percent and use the result as parameter for head method.

n = 5
df.head(int(len(df)*(n/100)))

So if your dataframe contains 1000 rows and n = 5% you will get the first 50 rows.

Calculate percentage change between values of column in Pandas dataframe

pct_change is computing a change relative to the previous value (which is why 2017 is NaN), and this doesn't seem to be what you want. If you want to compute a percentage change relative to 2019, as 2019 is already normalized to 100, simply subtract 100:

df['Percentage_Change'] = df['Index'].sub(100)

output:

  Country     Industry  Year  Index  Percentage_Change
0      US  Agriculture  2017   83.0              -17.0
1      US  Agriculture  2018   97.2               -2.8
2      US  Agriculture  2019  100.0                0.0
3      US  Agriculture  2020  112.0               12.0
4      US  Agriculture  2021  108.0                8.0
5   Japan       Mining  2017   88.0              -12.0
6   Japan       Mining  2018   93.0               -7.0
7   Japan       Mining  2019  100.0                0.0
8   Japan       Mining  2020  104.0                4.0
9   Japan       Mining  2021  112.0               12.0

bidirectional `pct_change`

If you want a bidirectional pct_change "centered" on 100, you can use masks to compute the pct_change both ways:

df['Percentage_Change'] = (df
 .assign(ref=df['Year'].eq(2019))
 .groupby(['Country', 'Industry'], group_keys=False)
 .apply(lambda g: g['Index'].where(g['ref'].cummax()).pct_change()
                  .fillna(g['Index'][::-1].pct_change().mask(g['ref'].cummax(), 0))
       )
)

output:

  Country     Industry  Year  Index  Percentage_Change
0      US  Agriculture  2017   83.0          -0.146091
1      US  Agriculture  2018   97.2          -0.028000
2      US  Agriculture  2019  100.0           0.000000
3      US  Agriculture  2020  112.0           0.120000
4      US  Agriculture  2021  108.0          -0.035714
5   Japan       Mining  2017   88.0          -0.053763
6   Japan       Mining  2018   93.0          -0.070000
7   Japan       Mining  2019  100.0           0.000000
8   Japan       Mining  2020  104.0           0.040000
9   Japan       Mining  2021  112.0           0.076923

Python Pandas – How to calculate percentage difference row wise for a entire dataframe with NaNs

Use pct_change:

>>> df[['route']].join(df.filter(like='col').pct_change(axis=1).mul(100).round(2))

   route  col1    col2   col3    col4    col5
0      1   NaN     NaN   0.00    0.00    0.00
1      2   NaN  433.33   0.00  145.83    0.00
2      3   NaN    2.94  68.57  -10.17    0.00
3      4   NaN    0.00   0.00    0.00  433.33

How to Select Percentage of Rows in Pandas Dataframe

Calculate percentage of rows with a column above a certain value Pandas

How to calculate the percentage of rows in pandas?

Compute row percentages in pandas DataFrame?

div + sum

Pandas - get first n-rows based on percentage

Calculate percentage change between values of column in Pandas dataframe

bidirectional `pct_change`

Python Pandas – How to calculate percentage difference row wise for a entire dataframe with NaNs

Related Topics

Leave a reply

Calculate percentage of rows with a column above a certain value Pandas

How to calculate the percentage of rows in pandas?

Compute row percentages in pandas DataFrame?

div + sum

Pandas - get first n-rows based on percentage

Calculate percentage change between values of column in Pandas dataframe

bidirectional pct_change

Python Pandas – How to calculate percentage difference row wise for a entire dataframe with NaNs

Related Topics

Leave a reply

bidirectional `pct_change`