How to implement the Softmax function in Python
They're both correct, but yours is preferred from the point of view of numerical stability.
You start with
e^(x - max(x)) / sum(e^(x - max(x)))
Using the fact that a^(b - c) = a^b / a^c, we have
= (e^x / e^max(x)) / sum(e^x / e^max(x))
= (e^x / e^max(x)) / (sum(e^x) / e^max(x))
= e^x / sum(e^x)
which is what the other answer computes. You could replace max(x) with any variable and it would cancel out.
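A quick numeric check of this (a minimal sketch; the names softmax_naive and softmax_shifted are just for illustration): both forms agree on ordinary inputs, while only the max-shifted form survives very large ones.

import numpy as np

def softmax_naive(x):
    # direct definition: e^x / sum(e^x)
    e = np.exp(x)
    return e / e.sum()

def softmax_shifted(x):
    # numerically stable variant: subtract max(x) before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax_naive(x), softmax_shifted(x)))  # True: identical results

big = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(big))    # [nan nan nan] -- exp overflows
print(softmax_shifted(big))  # [0.09003057 0.24472847 0.66524096]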
Softmax activation function using math library
The function math.exp only works on scalars; you cannot apply it to a whole array. If you only want to use math, then you need to implement it elementwise:
import math
def soft_max(x):
exponents = []
for element in x:
exponents.append(math.exp(element))
summ = sum(exponents)
for i in range(len(exponents)):
exponents[i] = exponents[i] / summ
return exponents
if __name__=="__main__":
arr = [0.844521, 0.147048]
output = soft_max(arr)
print(output)
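To see why the elementwise loop is needed: math.exp accepts a single number only, so passing the whole list raises a TypeError (a quick check with the same arr as above).

import math

arr = [0.844521, 0.147048]
try:
    math.exp(arr)  # math.exp expects one real number, not a sequence
except TypeError as err:
    print(err)     # must be real number, not list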
However, I still want to emphasize that numpy solves the problem much more simply:
import numpy as np
def soft_max(x):
e = np.exp(x)
return e / np.sum(e)
if __name__=="__main__":
arr = [0.844521, 0.147048]
output = soft_max(arr)
print(output)
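As a sanity check (a small sketch, not part of the original answer), the math-based and numpy-based versions produce the same probabilities:

import math
import numpy as np

arr = [0.844521, 0.147048]

# math-based version, written inline
exps = [math.exp(v) for v in arr]
math_result = [v / sum(exps) for v in exps]

# numpy-based version
e = np.exp(arr)
numpy_result = e / np.sum(e)

print(np.allclose(math_result, numpy_result))  # True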
Python deep learning softmax with numpy
The problem here is that sum(exp(x), axis=1)
returns a 1-D numpy array. Change it to sum(exp(x), axis=1, keepdims=True)
to keep numpy from automatically dropping one dimension.
import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
x = np.array([[1,3,6,-3,1],
[5,2,1,4,3]])
print(softmax(x))
print(f"1:{softmax(x)[0]} sum : {np.sum(softmax(x)[0])}")
print(f"2:{softmax(x)[1]} sum : {np.sum(softmax(x)[1])}")
Softmax function in neural network (Python)
The problem is in your sum: you are summing over axis 0, which should be left untouched.
To sum over all the entries of the same example, i.e. of the same row, you have to use axis 1 instead.
import numpy as np

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)
Use keepdims=True to preserve the shape, so that e can be divided by the sum.
In your example, e evaluates to:
[[ 1.10627664  1.22384801  1.35391446]
 [ 1.49780395  1.65698552  1.83308438]]
then the sum for each example (the denominator in the return line) is:
[[ 3.68403911]
 [ 4.98787384]]
The function then divides each row by its sum and gives the result you have in test_output.
As MaxU pointed out, it is good practice to subtract the row-wise max before exponentiating, in order to avoid overflow:
e = np.exp(A - np.max(A, axis=1, keepdims=True))
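Putting the two points together (a sketch along the lines of the answer, not its verbatim code): subtract the per-row max with the same axis=1, keepdims=True pattern, then normalize.

import numpy as np

def softmax(A):
    # subtract the per-row max for numerical stability, then normalize each row
    e = np.exp(A - np.max(A, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

A = np.array([[0.101, 0.202, 0.303],   # illustrative values only
              [0.404, 0.505, 0.606]])
print(softmax(A))
print(softmax(A).sum(axis=1))  # each row sums to 1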
Implementation of softmax function returns nan for high inputs
By the definition of the softmax function, you need to iterate over all elements in the array, compute the exponential of each individual element, and divide it by the sum of the exponentials of all elements:
import numpy as np
a = [1,3,5]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))
0.015876239976466765
0.11731042782619837
0.8668133321973349
However, if the numbers are too big, the exponentials will blow up (the computer cannot represent such large numbers):
a = [2345,3456,6543]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))
__main__:2: RuntimeWarning: invalid value encountered in double_scalars
nan
nan
nan
To avoid this, first shift the values so that the highest one becomes zero, then compute the softmax. For example, to compute the softmax of [1, 3, 5],
use [1-5, 3-5, 5-5],
which is [-4, -2, 0].
You may also choose to implement it in a vectorized way (as you intended to do in the question):
def softmax(x):
f = np.exp(x - np.max(x)) # shift values
return f / f.sum(axis=0)
softmax([1,3,5])
# prints: array([0.01587624, 0.11731043, 0.86681333])
softmax([2345,3456,6543,-6789,-9234])
# prints: array([0., 0., 1., 0., 0.])
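Because subtracting a constant from every entry cancels out in the ratio, the shifted and unshifted inputs give exactly the same probabilities (assuming the softmax defined above is in scope):

print(np.allclose(softmax([1, 3, 5]), softmax([-4, -2, 0])))  # prints: True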
For detailed information, check out the cs231n course notes; the "Practical issues: Numeric stability" heading covers exactly what I'm trying to explain.
Get the NaN and Infinity when calculating the Softmax
Your number is too large, so its exponential exceeds the range a double can handle (overflow). e^100 is already on the order of 10^43, so e^123456789 goes to infinity. total
becomes double.POSITIVE_INFINITY, and result
is inf / inf, which is NaN.
Try normalizing your input to a fixed range first, for example min-max normalization to transform the input to [-1, 1] or [0, 1]. These ranges are commonly used in machine learning because their exponentials stay well within the range a double can represent.
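Whatever language the original snippet was in, the same fix in Python looks roughly like this (a sketch, assuming a 1-D input, with min-max scaling to [0, 1] before the softmax):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # the max-shift is the usual extra safeguard
    return e / e.sum()

x = np.array([123456789.0, 1.0, 2.0])

# min-max normalization to [0, 1] before applying softmax
scaled = (x - x.min()) / (x.max() - x.min())
print(softmax(scaled))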