## Python: Scaling numbers column by column with pandas

You could subtract the column minimum, then divide by the column maximum (beware 0/0 when a column is constant). Note that after subtracting the min, the new max equals the original max minus the original min.

    In [11]: df
    Out[11]:
        a    b
    A  14  103
    B  90  107
    C  90  110
    D  96  114
    E  91  114

    In [12]: df -= df.min()  # equivalent to df = df - df.min()

    In [13]: df /= df.max()  # equivalent to df = df / df.max()

    In [14]: df
    Out[14]:
              a         b
    A  0.000000  0.000000
    B  0.926829  0.363636
    C  0.926829  0.636364
    D  1.000000  1.000000
    E  0.939024  1.000000

To reverse the direction of a column (from 1 to 0 rather than 0 to 1):

    In [15]: df['b'] = 1 - df['b']

*An alternative method is to negate the `b` column before scaling (`df['b'] = -df['b']`).*
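The "beware 0/0" caveat can be handled explicitly. The following is a minimal sketch, not part of the original answer: the helper name `min_max_scale` and the constant column `k` are made up for illustration, and mapping constant columns to 0 is just one possible convention.

```python
import pandas as pd


def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Scale every column to [0, 1]; constant columns become 0 instead of NaN."""
    shifted = frame - frame.min()
    span = shifted.max()  # equals original max - original min, per column
    # A constant column has span 0; divide by 1 there so 0/0 never happens.
    return shifted / span.where(span != 0, 1)


df = pd.DataFrame(
    {
        "a": [14, 90, 90, 96, 91],
        "b": [103, 107, 110, 114, 114],
        "k": [5, 5, 5, 5, 5],  # hypothetical constant column
    },
    index=list("ABCDE"),
)

scaled = min_max_scale(df)
scaled["b"] = 1 - scaled["b"]  # reverse the direction of column b
print(scaled.round(6))
```

Without the `where` guard, column `k` would come out as all `NaN`.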

## How to scale columns by a column from another pandas DataFrame

Assuming the columns are unique, and there are no duplicates in `scaling`, you could use `map`:

    df.mul(df.columns.map(scaling.set_index("id").scaling))

         A    B    C
    0  0.2  1.2  2.8
    1  0.4  1.5  3.2
    2  0.6  1.8  3.6
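The input frames are not shown above, so here is a self-contained sketch with assumed inputs: the values in `df` and `scaling` are guesses chosen to reproduce the output shown, and the explicit check for missing ids is an addition, not part of the original answer.

```python
import pandas as pd

# Assumed inputs (not given in the original answer).
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
scaling = pd.DataFrame({"id": ["A", "B", "C"], "scaling": [0.2, 0.3, 0.4]})

# One factor per column name, looked up by id.
factors = scaling.set_index("id")["scaling"]

# Fail loudly if a column has no factor, instead of silently producing NaN.
missing = df.columns.difference(factors.index)
if len(missing) > 0:
    raise KeyError(f"no scaling factor for columns: {list(missing)}")

# Align the factors with the column labels and multiply column-wise.
result = df.mul(factors.reindex(df.columns), axis="columns")
print(result)
```

Aligning on labels with `reindex` avoids depending on the order of rows in `scaling`.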

## Scaling pandas column to be between specified min and max numbers

Just change `a, b = 10, 50` to `a, b = 0, 1` in the linked answer to set the lower and upper bounds of the scale:

    a, b = 0, 1
    x, y = df.Frequency.min(), df.Frequency.max()
    df['normal'] = (df.Frequency - x) / (y - x) * (b - a) + a

    print (df)
       Frequency    normal
    0         20  1.000000
    1         14  0.684211
    2         10  0.473684
    3          8  0.368421
    4          6  0.263158
    5          2  0.052632
    6          1  0.000000
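Since only `a` and `b` change between use cases, the formula can be wrapped in a small function. This is a sketch under assumptions: the name `rescale` and the choice to raise on a constant column are illustrative and not from the linked answer.

```python
import pandas as pd


def rescale(values: pd.Series, a: float = 0.0, b: float = 1.0) -> pd.Series:
    """Map values linearly so that min -> a and max -> b."""
    x, y = values.min(), values.max()
    if x == y:
        # (y - x) would be zero, so the mapping is undefined.
        raise ValueError("cannot rescale a constant column")
    return (values - x) / (y - x) * (b - a) + a


df = pd.DataFrame({"Frequency": [20, 14, 10, 8, 6, 2, 1]})
df["normal"] = rescale(df["Frequency"])        # range [0, 1]
df["size"] = rescale(df["Frequency"], 10, 50)  # range [10, 50]
print(df)
```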

## Scaling / Normalizing pandas column

**Option 1** `sklearn`

This error comes up time and time again, and the message itself indicates what to do: scikit-learn transformers expect 2-D input, and `df["TOTAL"]` is a 1-D Series. Change `df["TOTAL"]` to `df[["TOTAL"]]`, which selects a single-column DataFrame.

    df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])
    df

       TOTAL   Name       SIZE
    0   3232   Jane  24.413959
    1    382   Jack  10.000000
    2   8291  Jones  50.000000

**Option 2** `pandas`

Preferably, I would bypass sklearn and just do the min-max scaling myself.

    a, b = 10, 50
    x, y = df.TOTAL.min(), df.TOTAL.max()
    df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a
    df

       TOTAL   Name       SIZE
    0   3232   Jane  24.413959
    1    382   Jack  10.000000
    2   8291  Jones  50.000000

This is essentially what the min-max scaler does, but without the overhead of importing scikit-learn (don't do it unless you have to, it's a heavy library).
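To see that the two options agree, here is a side-by-side sketch. The definition of `scaler` is not shown in the answer, so `MinMaxScaler(feature_range=(10, 50))` is an assumption inferred from the output; the column names `SIZE_sklearn` and `SIZE_pandas` are made up for the comparison.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})

# Option 1: scikit-learn. Assumed scaler settings (not shown in the answer).
scaler = MinMaxScaler(feature_range=(10, 50))
# df[["TOTAL"]] is 2-D; the result has shape (3, 1), so flatten it for assignment.
df["SIZE_sklearn"] = scaler.fit_transform(df[["TOTAL"]]).ravel()

# Option 2: plain pandas arithmetic with the same target range.
a, b = 10, 50
x, y = df["TOTAL"].min(), df["TOTAL"].max()
df["SIZE_pandas"] = (df["TOTAL"] - x) / (y - x) * (b - a) + a

print(df)
```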

## pandas dataframe columns scaling with sklearn

I am not sure if previous versions of `pandas` prevented this, but now the following snippet works perfectly for me and produces exactly what you want without having to use `apply`:

    >>> import pandas as pd
    >>> from sklearn.preprocessing import MinMaxScaler
    >>> scaler = MinMaxScaler()
    >>> dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
    ...                        'B':[103.02,107.26,110.35,114.23,114.68],
    ...                        'C':['big','small','big','small','small']})
    >>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])
    >>> dfTest
              A         B      C
    0  0.000000  0.000000    big
    1  0.926219  0.363636  small
    2  0.935335  0.628645    big
    3  1.000000  0.961407  small
    4  0.938495  1.000000  small
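For comparison, the same result can be sketched without scikit-learn. Selecting the columns with `select_dtypes` is an assumption about what "the numeric columns" should mean here; the answer itself names `A` and `B` explicitly.

```python
import pandas as pd

dfTest = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})

# Pick up the numeric columns automatically; the string column C is left alone.
num_cols = dfTest.select_dtypes(include="number").columns

col_min = dfTest[num_cols].min()
col_range = dfTest[num_cols].max() - col_min

# Column-wise min-max scaling; pandas aligns the per-column Series by label.
dfTest[num_cols] = (dfTest[num_cols] - col_min) / col_range
print(dfTest)
```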

## Scaling euler number in Pandas Column

This is the scientific notation pandas uses, and it is its way of displaying very large or very small floats.

Although not necessary, several methods exist if you wish to convert your floats to another format:

**1.** Use `apply()` on a single column (here `col` stands in for your column name). Note that this returns strings:

    df['col'].apply(lambda x: '%.5f' % x)

**2.** Set the global display option of pandas:

    pd.set_option('display.float_format', lambda x: '%.5f' % x)

**3.** Use `df.round()`. This only helps when the values have many decimal places, and it rounds the data itself rather than changing how it is displayed:

    df.round(2)
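The first two methods can be tried on a small frame. This is a sketch with made-up data: the column name `value` and its contents are hypothetical, and `pd.option_context` is used here instead of `pd.set_option` so the display setting only applies inside the `with` block.

```python
import pandas as pd

# Hypothetical data: one very small and one fairly large float.
df = pd.DataFrame({"value": [0.00012, 12345678.9]})

# Method 1 on a single column: fixed-point strings (the dtype becomes object,
# so this is for display or export, not for further arithmetic).
as_text = df["value"].map("{:.5f}".format)
print(as_text.tolist())

# Method 2, scoped: change the float display format temporarily.
with pd.option_context("display.float_format", "{:.5f}".format):
    rendered = df.to_string()
print(rendered)

# The underlying data is untouched by either method.
print(df["value"].dtype)
```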

## Normalize/scale dataframe in a certain range

We can use `MinMaxScaler` to perform feature scaling. `MinMaxScaler` supports a parameter called `feature_range`, which allows us to specify the desired range of the transformed data:

    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler(feature_range=(0.6, 8.4))
    df['normalized'] = scaler.fit_transform(df['wind power [W]'].values[:, None])

Alternatively, if you don't want to use `MinMaxScaler`, here is a way to scale the data in pandas only:

    w = df['wind power [W]'].agg(['min', 'max'])
    norm = (df['wind power [W]'] - w['min']) / (w['max'] - w['min'])
    df['normalized'] = norm * (8.4 - 0.6) + 0.6

    print(df)

                  DateTime  wind power [W]  normalized
    0  2022-02-08 00:00:00            83.9    8.400000
    1  2022-02-08 00:10:00            57.2    2.598886
    2  2022-02-08 00:20:00            58.2    2.816156
    3  2022-02-08 00:30:00            48.0    0.600000
    4  2022-02-08 00:40:00            69.5    5.271309

## Normalize columns of pandas data frame

You can use the scikit-learn package and its preprocessing utilities to normalize the data.

    import pandas as pd
    from sklearn import preprocessing

    x = df.values  # returns a NumPy array
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    # pass columns and index so the labels of the original frame are kept
    df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)

For more information, see the scikit-learn documentation on preprocessing data: scaling features to a range.

## How to re-scale a column by percentage change and start from a given number

    c[1] = 100
    for i in range(2, 5):
        c[i] = c[i-1] * (1+b[i])

The way you allocate/assign to `c` is incorrect.

You first need to allocate `c` with the appropriate length, then assign the first element, which has index `0`, not `1`, and the loop should start from 1. Arrays/lists in Python are 0-indexed, meaning an array of length 5 is indexed from 0 to 4.

Try this:

    import pandas as pd

    a = pd.Series([4, 5, 6, 3, 2])
    # no need for the fillna, as the first element is never used
    # it is better to leave it as NaN to avoid confusion with no change
    b = a.pct_change()
    c = pd.Series([0] * len(a))
    c[0] = 100
    for i in range(1, len(a)):
        c[i] = c[i-1] * (1 + b[i])

For the chosen `a`, you get the following `c`:

    0    100
    1    125
    2    150
    3     75
    4     50
    dtype: int64

Note that the loop is written this way because the calculation has a sequential dependence (each element depends on the previous one), so it cannot be vectorised element by element as it stands. If someone else has a vectorised solution, I would be happy to know.
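One possible vectorised form, offered as a sketch rather than as part of the original answer: the recurrence `c[i] = c[i-1] * (1 + b[i])` is a running product, so `cumprod` can replace the loop. The names `c_loop`, `c_cumprod` and `c_ratio` are made up for the comparison.

```python
import pandas as pd

a = pd.Series([4, 5, 6, 3, 2])
b = a.pct_change()

# Loop version, kept for comparison (float Series to avoid dtype surprises).
c_loop = pd.Series(0.0, index=a.index)
c_loop.iloc[0] = 100
for i in range(1, len(a)):
    c_loop.iloc[i] = c_loop.iloc[i - 1] * (1 + b.iloc[i])

# Running product: treat the first (NaN) change as "no change", then accumulate.
c_cumprod = 100 * (1 + b.fillna(0)).cumprod()

# Shortcut: the product of successive ratios telescopes to a[i] / a[0].
c_ratio = 100 * a / a.iloc[0]

print(pd.DataFrame({"loop": c_loop, "cumprod": c_cumprod, "ratio": c_ratio}))
```

The `fillna(0)` is needed here, unlike in the loop, because `cumprod` does touch the first element.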

## Python: Scale columns in pandas dataframe

Multiply the DataFrame by the dictionary; this works well if the keys are the same as the column names:

    df = df.mul(scalingDictionary)
    print (df)
          a     b    c
    0  20.0  15.0  0.1
    1  40.0  30.0  0.2
    2  60.0  45.0  0.3
    3  80.0  60.0  0.4

If some columns do not match:

    import pandas as pd

    scalingDictionary = {'a': 10, 'b': 5}
    df = pd.DataFrame({'a':[2,4,6,8], 'b':[3,6,9,12], 'c':[1,2,3,4]})

    # columns missing from the dictionary get a factor of 1
    df = df.mul(pd.Series(scalingDictionary).reindex(df.columns, fill_value=1))
    print (df)
        a   b  c
    0  20  15  1
    1  40  30  2
    2  60  45  3
    3  80  60  4

Or:

    df = df.mul({**dict.fromkeys(df.columns, 1), **scalingDictionary})
    print (df)
        a   b  c
    0  20  15  1
    1  40  30  2
    2  60  45  3
    3  80  60  4
