Python: Scaling numbers column by column with pandas
You could subtract the min, then divide by the max (beware 0/0, which occurs when a column is constant). Note that after subtracting the min, the new max is the original max - min.
In [11]: df
Out[11]:
a b
A 14 103
B 90 107
C 90 110
D 96 114
E 91 114
In [12]: df -= df.min() # equivalent to df = df - df.min()
In [13]: df /= df.max() # equivalent to df = df / df.max()
In [14]: df
Out[14]:
a b
A 0.000000 0.000000
B 0.926829 0.363636
C 0.926829 0.636364
D 1.000000 1.000000
E 0.939024 1.000000
To switch the order of a column (from 1 to 0 rather than 0 to 1):
In [15]: df['b'] = 1 - df['b']
An alternative method is to negate the b column first (df['b'] = -df['b']).
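The 0/0 caveat above bites when a column is constant, because its range is zero after subtracting the min. Here is a minimal sketch of one way to guard against that. The helper name min_max_scale, the extra const column, and the convention of mapping constant columns to 0 are assumptions for illustration, not part of the answer above.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column to [0, 1]; constant columns become 0 (an assumed convention)."""
    shifted = df - df.min()
    # After the shift, each column's max equals the original max - min (its range).
    col_range = shifted.max()
    # Replace a zero range with 1 so constant columns give 0/1 = 0 instead of 0/0 = NaN.
    return shifted / col_range.replace(0, 1)


df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114], "const": [5, 5, 5, 5, 5]},
    index=list("ABCDE"),
)
scaled = min_max_scale(df)
print(scaled)
```

If you would rather see NaN for constant columns, drop the replace call and divide by col_range directly.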
How to scale columns by a column from another pandas DataFrame
Assuming the columns are unique, and there are no duplicates in scaling, you could use map:
df.mul(df.columns.map(scaling.set_index("id").scaling))
A B C
0 0.2 1.2 2.8
1 0.4 1.5 3.2
2 0.6 1.8 3.6
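For a self-contained run, here is a sketch with assumed inputs: the df and scaling frames below are not given in the answer and were chosen only because they are consistent with the output shown. It uses a label-aligned variant (multiplying by a Series indexed by column name) rather than map, so the factors are matched to columns by name instead of by position.

```python
import pandas as pd

# Assumed inputs, chosen so the result matches the output shown above.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
scaling = pd.DataFrame({"id": ["A", "B", "C"], "scaling": [0.2, 0.3, 0.4]})

# Index the factors by column name so pandas can align them on labels.
factors = scaling.set_index("id")["scaling"]

# Multiply each column of df by the factor that carries the same label.
scaled = df.mul(factors, axis="columns")
print(scaled)
```

With label alignment, a column that has no matching id in scaling comes out as NaN, which makes a missing factor easy to spot.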
Scaling pandas column to be between specified min and max numbers
Just change a, b = 10, 50 to a, b = 0, 1 in the linked answer to set the lower and upper bounds of the scale:
a, b = 0, 1
x, y = df.Frequency.min(), df.Frequency.max()
df['normal'] = (df.Frequency - x) / (y - x) * (b - a) + a
print (df)
Frequency normal
0 20 1.000000
1 14 0.684211
2 10 0.473684
3 8 0.368421
4 6 0.263158
5 2 0.052632
6 1 0.000000
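The same formula can be wrapped in a small reusable function so the target range becomes a parameter. This is a sketch: the function name rescale and the explicit check for a constant column are additions for illustration, not part of the answer.

```python
import pandas as pd


def rescale(s: pd.Series, a: float = 0.0, b: float = 1.0) -> pd.Series:
    """Linearly map s so that its min becomes a and its max becomes b."""
    x, y = s.min(), s.max()
    if x == y:
        # A constant column has zero range, so the division below would be 0/0.
        raise ValueError("cannot rescale a constant column (max equals min)")
    return (s - x) / (y - x) * (b - a) + a


df = pd.DataFrame({"Frequency": [20, 14, 10, 8, 6, 2, 1]})
df["normal"] = rescale(df["Frequency"])          # target range [0, 1]
df["scaled"] = rescale(df["Frequency"], 10, 50)  # target range [10, 50]
print(df)
```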
Scaling / Normalizing pandas column
Option 1: sklearn
You see this problem time and time again, and the error message really should be indicative of what you need to do. You're basically missing a superfluous dimension on the input. Change df["TOTAL"] to df[["TOTAL"]].
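# scaler is not defined in this excerpt; the output below is consistent with MinMaxScaler(feature_range=(10, 50))
df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])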
df
TOTAL Name SIZE
0 3232 Jane 24.413959
1 382 Jack 10.000000
2 8291 Jones 50.000000
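scikit-learn transformers expect 2-D input of shape (n_samples, n_features), which is why the double brackets matter. The difference between the two selections can be checked with pandas alone. A small sketch, using the data from the output above:

```python
import pandas as pd

df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})

one_d = df["TOTAL"]    # Series: a 1-D sequence of values
two_d = df[["TOTAL"]]  # DataFrame: a single column, but still 2-D

print(one_d.shape)
print(two_d.shape)

# Reshaping the underlying array is another way to get the same 2-D layout.
reshaped = one_d.to_numpy().reshape(-1, 1)
print(reshaped.shape)
```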
Option 2: pandas
Preferably, I would bypass sklearn and just do the min-max scaling myself.
a, b = 10, 50
x, y = df.TOTAL.min(), df.TOTAL.max()
df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a
df
TOTAL Name SIZE
0 3232 Jane 24.413959
1 382 Jack 10.000000
2 8291 Jones 50.000000
This is essentially what the min-max scaler does, but without the overhead of importing scikit-learn (don't do it unless you have to; it's a heavy library).
pandas dataframe columns scaling with sklearn
I am not sure whether previous versions of pandas prevented this, but the following snippet now works perfectly for me and produces exactly what you want without having to use apply:
>>> import pandas as pd
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
...                        'B':[103.02,107.26,110.35,114.23,114.68],
...                        'C':['big','small','big','small','small']})
>>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])
>>> dfTest
A B C
0 0.000000 0.000000 big
1 0.926219 0.363636 small
2 0.935335 0.628645 big
3 1.000000 0.961407 small
4 0.938495 1.000000 small
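For comparison, here is a pandas-only sketch of the same column-wise scaling. It picks the numeric columns with select_dtypes instead of naming them, which is a choice made for this illustration, so the string column C is left untouched.

```python
import pandas as pd

dfTest = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})

# Select the numeric columns so the string column C is not touched.
num_cols = dfTest.select_dtypes(include="number").columns
col_min = dfTest[num_cols].min()
col_max = dfTest[num_cols].max()

# Column-wise min-max scaling; the min/max Series align on column labels.
dfTest[num_cols] = (dfTest[num_cols] - col_min) / (col_max - col_min)
print(dfTest)
```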
Scaling euler number in Pandas Column
This is pandas' scientific notation, which is its way of displaying very large or very small floats.
Although not necessary, multiple methods exist if you wish to convert your floats to another format:
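1. Use apply(), mapping the format over each column (the format string expects one scalar at a time, and the result contains strings)
df.apply(lambda col: col.map(lambda x: '%.5f' % x))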
2. set the global options of pandas
pd.set_option('display.float_format', lambda x: '%.5f' %x)
3. Use df.round(). This only helps if you have very small numbers with a lot of decimals
df.round(2)
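The options above can be tried on a small frame. The sketch below uses made-up values and a hypothetical column name, value. It also uses pd.option_context rather than pd.set_option so that the display setting applies only inside the with-block, which is a variation on option 2, not what the answer itself shows.

```python
import pandas as pd

# Made-up values: one very small float and one very large float.
df = pd.DataFrame({"value": [0.000012345, 123456789.123]})

# Option A: convert to fixed-point strings (the column is no longer numeric).
as_text = df["value"].map("{:.5f}".format)
print(as_text.tolist())

# Option B: change only how floats are displayed, and only inside the
# with-block; the previous display setting is restored on exit.
with pd.option_context("display.float_format", "{:.5f}".format):
    shown = df.to_string()
print(shown)
```

Option A changes the data itself to strings, while option B leaves the floats intact and affects printing only.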
Normalize/scale dataframe in a certain range
We can use MinMaxScaler to perform feature scaling. MinMaxScaler supports a parameter called feature_range, which allows us to specify the desired range of the transformed data:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0.6, 8.4))
df['normalized'] = scaler.fit_transform(df['wind power [W]'].values[:, None])
Alternatively, if you don't want to use MinMaxScaler, here is a way to scale the data in pandas only:
w = df['wind power [W]'].agg(['min', 'max'])
norm = (df['wind power [W]'] - w['min']) / (w['max'] - w['min'])
df['normalized'] = norm * (8.4 - 0.6) + 0.6
print(df)
DateTime wind power [W] normalized
0 2022-02-08 00:00:00 83.9 8.400000
1 2022-02-08 00:10:00 57.2 2.598886
2 2022-02-08 00:20:00 58.2 2.816156
3 2022-02-08 00:30:00 48.0 0.600000
4 2022-02-08 00:40:00 69.5 5.271309
Normalize columns of pandas data frame
You can use the package sklearn and its associated preprocessing utilities to normalize the data.
import pandas as pd
from sklearn import preprocessing
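x = df.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# pass the original labels back in; a bare pd.DataFrame(x_scaled) would lose them
df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)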
For more information look at the scikit-learn documentation on preprocessing data: scaling features to a range.
How to re-scale a column by percentage change and start from a given number
c[1] = 100
for i in range(2, 5):
c[i] = c[i-1] * (1+b[i])
The way you allocate/assign to c is incorrect.
You first need to allocate c with the appropriate length, then assign the first element, which is at index 0, not 1, and the loop should start from 1. Arrays/lists in Python are 0-indexed, meaning an array of length 5 is indexed from 0 to 4.
Try this:
a = pd.Series([4, 5, 6, 3, 2])
# no need for the fillna, as the first element is never used
# it is better to leave it as NaN to avoid confusion with no change
b = a.pct_change()
c = pd.Series([0] * len(a))
c[0] = 100
for i in range(1, len(a)):
c[i] = c[i-1] * (1+ b[i])
For the chosen a, you get the following c:
0 100
1 125
2 150
3 75
4 50
dtype: int64
Note that the for-loop has a sequential dependence: each element depends on the previous one, so the elements cannot be computed independently as written. That said, the recurrence is just a running product of the factors 1 + b[i], so a cumulative product (Series.cumprod) gives a vectorised equivalent.
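One candidate for a loop-free version: since c[i] = c[i-1] * (1 + b[i]), each c[i] is the starting value times the product of all factors up to i. A minimal sketch, assuming the missing first percentage change should count as "no change" (the name start is introduced here for illustration):

```python
import pandas as pd

a = pd.Series([4, 5, 6, 3, 2])
b = a.pct_change()

# Treat the missing first change as 0 so its growth factor is 1,
# then accumulate the factors and scale by the starting value.
start = 100
c = start * (1 + b.fillna(0)).cumprod()
print(c)

# The factors telescope (a[i] / a[i-1]), so this equals rescaling a directly.
c_direct = start * a / a.iloc[0]
print(c_direct)
```

Both results are float Series, unlike the int64 output of the loop version, and may differ from it by floating-point rounding.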
Python : Scale columns in pandas dataframe
Multiply the DataFrame by the dictionary; this works well if the keys are the same as the column names:
df = df.mul(scalingDictionary)
print (df)
a b c
0 20.0 15.0 0.1
1 40.0 30.0 0.2
2 60.0 45.0 0.3
3 80.0 60.0 0.4
If some columns do not match:
scalingDictionary = {'a': 10, 'b': 5}
df = pd.DataFrame({'a':[2,4,6,8], 'b':[3,6,9,12], 'c':[1,2,3,4]})
df = df.mul(pd.Series(scalingDictionary).reindex(df.columns, fill_value=1))
print (df)
a b c
0 20 15 1
1 40 30 2
2 60 45 3
3 80 60 4
Or:
df = df.mul({**dict.fromkeys(df.columns, 1), **scalingDictionary})
print (df)
a b c
0 20 15 1
1 40 30 2
2 60 45 3
3 80 60 4