Normalize Data in Pandas

Normalize columns of a dataframe

You can use the scikit-learn package and its preprocessing utilities to normalize the data.

import pandas as pd
from sklearn import preprocessing

x = df.values  # convert the DataFrame to a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()  # scales each column to [0, 1] by default
x_scaled = min_max_scaler.fit_transform(x)
df = pd.DataFrame(x_scaled)
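
Note that the last line above rebuilds the DataFrame without the original column labels and index. A minimal tweak, assuming you want to keep both, is to pass them back in (replacing that last line):

df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)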

For more information, see the scikit-learn documentation on preprocessing data, in particular the section on scaling features to a range.

Normalize data in pandas

In [92]: df
Out[92]:
            a         b           c          d
A   -0.488816  0.863769    4.325608  -4.721202
B  -11.937097  2.993993  -12.916784  -1.086236
C   -5.569493  4.672679   -2.168464  -9.315900
D    8.892368  0.932785    4.535396   0.598124

In [93]: df_norm = (df - df.mean()) / (df.max() - df.min())

In [94]: df_norm
Out[94]:
           a          b          c          d
A   0.085789  -0.394348   0.337016  -0.109935
B  -0.463830   0.164926  -0.650963   0.256714
C  -0.158129   0.605652  -0.035090  -0.573389
D   0.536170  -0.376229   0.349037   0.426611

In [95]: df_norm.mean()
Out[95]:
a -2.081668e-17
b 4.857226e-17
c 1.734723e-17
d -1.040834e-17

In [96]: df_norm.max() - df_norm.min()
Out[96]:
a 1
b 1
c 1
d 1
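
The expression above gives zero-mean columns with unit range. If you want the classic 0-1 min-max scaling instead, the same pandas-only pattern works (a sketch using the df from In [92]):

df_minmax = (df - df.min()) / (df.max() - df.min())

Each column of df_minmax then runs from 0 to 1.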

Min-max normalization of a dataframe in pandas

Use MinMaxScaler.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'A': [1, 2, 5, 3], 'B': [10, 0, 3, 7], 'C': [100, 200, 50, 500]})

scaler = MinMaxScaler()
scaler.fit(df)        # learn each column's min and max
scaler.transform(df)

Results

array([[0.        , 1.        , 0.11111111],
       [0.25      , 0.        , 0.33333333],
       [1.        , 0.3       , 0.        ],
       [0.5       , 0.7       , 1.        ]])

Now apply the same fitted scaler to new data:

df_new = pd.DataFrame({'A': [10, 15, 20], 'B': [18, 17, 15], 'C': [250, 300, 150]})
scaler.transform(df_new)

Results

array([[2.25      , 1.8       , 0.44444444],
       [3.5       , 1.7       , 0.55555556],
       [4.75      , 1.5       , 0.22222222]])
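
scaler.transform returns a plain NumPy array. If you prefer to keep the column labels, a small sketch that wraps the result back into a DataFrame (continuing the example above):

df_new_scaled = pd.DataFrame(scaler.transform(df_new),
                             columns=df_new.columns,
                             index=df_new.index)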

How can I normalize data in a pandas dataframe to the starting value of a time series?

If I understand correctly, use GroupBy.transform('first') to get each patient's first Parameter value and divide by it:

df['Normalized'] = df['Parameter'].div(df.groupby('Patient')['Parameter']
.transform('first'))
print(df)
  Patient  Visit  Parameter  Normalized
0       A      1         44    1.000000
1       A      2         47    1.068182
2       A      3         64    1.454545
3       B      1         67    1.000000
4       B      2         67    1.000000
5       B      3          9    0.134328
6       C      1         83    1.000000
7       C      2         21    0.253012
8       C      3         36    0.433735


df['Normalized'] = df['Parameter'].div(df.groupby('Patient')['Parameter']
.transform('first')).round(2)
print(df)
  Patient  Visit  Parameter  Normalized
0       A      1         44        1.00
1       A      2         47        1.07
2       A      3         64        1.45
3       B      1         67        1.00
4       B      2         67        1.00
5       B      3          9        0.13
6       C      1         83        1.00
7       C      2         21        0.25
8       C      3         36        0.43

If you need to create a new DataFrame:

df2 = df.assign(Normalized = df['Parameter'].div(df.groupby('Patient')['Parameter'].transform('first')))

A lambda inside transform would also work; see the sketch after the next snippet.

Or:

df2 = df.copy()
df2['Normalized'] = df['Parameter'].div(df.groupby('Patient')['Parameter']
.transform('first'))
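
The lambda variant mentioned above could look like this (a sketch; transform passes each patient's Parameter values as a Series, so .iloc[0] is that group's first value):

df['Normalized'] = df.groupby('Patient')['Parameter'].transform(lambda s: s / s.iloc[0])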

Normalize/scale dataframe in a certain range

We can use MinMaxScaler to perform feature scaling. MinMaxScaler supports a feature_range parameter, which lets us specify the desired range of the transformed data.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0.6, 8.4))
df['normalized'] = scaler.fit_transform(df['wind power [W]'].values[:, None])

Alternatively, if you don't want to use MinMaxScaler, here is a way to scale the data in pandas only:

w = df['wind power [W]'].agg(['min', 'max'])
norm = (df['wind power [W]'] - w['min']) / (w['max'] - w['min'])
df['normalized'] = norm * (8.4 - 0.6) + 0.6


print(df)

              DateTime  wind power [W]  normalized
0  2022-02-08 00:00:00            83.9    8.400000
1  2022-02-08 00:10:00            57.2    2.598886
2  2022-02-08 00:20:00            58.2    2.816156
3  2022-02-08 00:30:00            48.0    0.600000
4  2022-02-08 00:40:00            69.5    5.271309
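
The manual formula generalizes to any target range. A small helper (hypothetical name rescale, same math as above):

def rescale(s, lo=0.6, hi=8.4):
    # map a numeric Series linearly onto [lo, hi]
    return (s - s.min()) / (s.max() - s.min()) * (hi - lo) + lo

df['normalized'] = rescale(df['wind power [W]'])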

Un-Normalise Data Frame in Pandas

Transformers in sklearn have an inverse_transform method that does exactly that. However, you seem to normalize the features and the target together, so it can't be used as is. Instead, scale them separately:

from sklearn import preprocessing

# prepare two scalers
X_scaler = preprocessing.MinMaxScaler()
y_scaler = preprocessing.MinMaxScaler()

# features are everything but the target
X = df.drop(columns="target")
y = df["target"]

# scale them separately (MinMaxScaler expects 2-D input, so reshape the 1-D target)
X_scaled = X_scaler.fit_transform(X)
y_scaled = y_scaler.fit_transform(y.values.reshape(-1, 1))

# training..
# ...

# prediction time (preds must have shape (n_samples, 1))
preds = ...
unnormalized_preds = y_scaler.inverse_transform(preds)
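
A minimal round-trip sketch with made-up numbers, showing why the target gets its own scaler and why it must stay 2-D with shape (n_samples, 1):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

y = np.array([[10.0], [20.0], [40.0]])         # target as an (n_samples, 1) array
y_scaler = MinMaxScaler()
y_scaled = y_scaler.fit_transform(y)           # scaled into [0, 1]
y_back = y_scaler.inverse_transform(y_scaled)  # recovers the original values
print(np.allclose(y, y_back))                  # True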

I want to normalize data by dividing every row by the price in the first row

Use:

bova['Norm Close'] = bova['Close'] / bova['Close'].iloc[0]  # divide by the first row's close
print(bova[['Close', 'Norm Close']])

# Output
                 Close  Norm Close
Date
2014-01-02   49.080002    1.000000
2014-01-03   49.259998    1.003667
2014-01-06   49.840000    1.015485
2014-01-07   49.230000    1.003056
2014-01-08   49.279999    1.004075
...                ...         ...
2021-12-23  100.849998    2.054808
2021-12-27  101.599998    2.070090
2021-12-28  101.059998    2.059087
2021-12-29  100.250000    2.042583
2021-12-30  100.800003    2.053790

[1986 rows x 2 columns]
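
The same idea extends to every numeric column at once: dividing the whole frame by its first row rebases each series to 1.0 on the first date (a sketch, assuming all columns are prices):

bova_norm = bova / bova.iloc[0]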

Normalization of a single column of a DataFrame

Try:

df1['Normalize'] = df1.groupby('Symbol')['Close'].transform(lambda x: x / x.iloc[0]).fillna(1)

As commented by Shubham, you can also divide each group by its first value in place:

df1['Close'] /= df1.groupby('Symbol')['Close'].transform('first')


df1:

          Date      Symbol    Close  Normalize
 0  2020-11-23   APLAPOLLO  3247.45   1.000000
 1  2020-11-24   APLAPOLLO  3219.95   0.991532
 2  2020-11-25   APLAPOLLO  3220.45   0.991686
 3  2020-11-26   APLAPOLLO  3178.95   0.978907
 4  2020-11-27   APLAPOLLO  3378.90   1.040478
 5  2020-12-01   APLAPOLLO  3446.85   1.061402
 6  2020-12-02   APLAPOLLO  3514.55   1.082249
 7  2020-12-03   APLAPOLLO  3545.80   1.091872
 8  2020-12-04   APLAPOLLO  3708.60   1.142004
 9  2020-12-07   APLAPOLLO  3868.55   1.191258
10  2020-12-08   APLAPOLLO  3750.30   1.154845
11  2020-12-09   APLAPOLLO  3801.35   1.170565
12  2020-12-10   APLAPOLLO  3766.65   1.159879
13  2020-12-11   APLAPOLLO  3674.30   1.131442
14  2020-12-14   APLAPOLLO  3814.80   1.174706
15  2020-12-15   APLAPOLLO   780.55   0.240358
16  2020-12-16   APLAPOLLO   790.20   0.243329
17  2020-12-17   APLAPOLLO   791.20   0.243637
18  2020-12-18   APLAPOLLO   769.70   0.237017
19  2020-12-21   APLAPOLLO   726.60   0.223745
20  2020-12-22   APLAPOLLO   744.30   0.229195
21  2020-11-23      AUBANK   869.65   1.000000
22  2020-11-24      AUBANK   874.35   1.005404
23  2020-11-25      AUBANK   856.25   0.984592
24  2020-11-26      AUBANK   861.05   0.990111
25  2020-11-27      AUBANK   839.05   0.964813
26  2020-12-01      AUBANK   872.90   1.003737
27  2020-12-02      AUBANK   886.65   1.019548
28  2020-12-03      AUBANK   880.30   1.012246
29  2020-12-04      AUBANK   880.45   1.012419
30  2020-12-07      AUBANK   898.65   1.033347
31  2020-12-08      AUBANK   907.80   1.043868
32  2020-12-09      AUBANK   918.90   1.056632
33  2020-12-10      AUBANK   911.05   1.047605
34  2020-12-11      AUBANK   920.30   1.058242
35  2020-12-14      AUBANK   929.45   1.068763
36  2020-12-15      AUBANK   922.60   1.060887
37  2020-12-16      AUBANK   915.80   1.053067
38  2020-12-17      AUBANK   943.15   1.084517
39  2020-12-18      AUBANK   897.00   1.031449
40  2020-12-21      AUBANK   840.45   0.966423
41  2020-12-22      AUBANK   856.00   0.984304
42  2020-11-23  AARTIDRUGS   711.70   1.000000
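
A quick sanity check on df1: the first Normalize value within each symbol should be exactly 1.0 (a sketch using the frame above):

print(df1.groupby('Symbol')['Normalize'].first())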

