How to Implement the Softmax Function in Python

They're both correct, but yours is preferred from the point of view of numerical stability.

You start with

e^(x - max(x)) / sum(e^(x - max(x)))

By using the fact that a^(b - c) = a^b / a^c, we have

= e^x / (e^max(x) * sum(e^x / e^max(x)))

= e^x / sum(e^x)

Which is what the other answer says. You could replace max(x) with any variable and it would cancel out.
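
To see the equivalence numerically, here is a minimal sketch (not from the original answers) comparing the naive and max-shifted formulas; they agree for moderate inputs, but only the shifted one survives large values:

import numpy as np

def softmax_naive(x):
    # direct formula: e^x / sum(e^x)
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # shifted formula: e^(x - max(x)) / sum(e^(x - max(x)))
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1.0, 3.0, 5.0])
print(np.allclose(softmax_naive(x), softmax_stable(x)))  # True: identical results

big = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(big))   # [nan nan nan] -- np.exp overflows to inf
print(softmax_stable(big))  # [0.09003057 0.24472847 0.66524096]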

Softmax activation function using math library

The function math.exp only works on scalars, so you cannot apply it to a whole array. If you only want to use math, then you need to implement it element-wise:

import math

def soft_max(x):
    exponents = []
    for element in x:
        exponents.append(math.exp(element))
    summ = sum(exponents)
    for i in range(len(exponents)):
        exponents[i] = exponents[i] / summ
    return exponents

if __name__ == "__main__":
    arr = [0.844521, 0.147048]
    output = soft_max(arr)
    print(output)
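
For what it's worth, the same pure-math version can also be written with list comprehensions; this is just a compact sketch equivalent to the loop above:

import math

def soft_max(x):
    # identical computation, expressed as list comprehensions
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

print(soft_max([0.844521, 0.147048]))  # approximately [0.6676, 0.3324]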

However, I still want to emphasise that using numpy makes solving the problem a lot easier:

import numpy as np

def soft_max(x):
    e = np.exp(x)
    return e / np.sum(e)

if __name__ == "__main__":
    arr = [0.844521, 0.147048]
    output = soft_max(arr)
    print(output)

Python deep learning softmax with numpy

The problem here is that sum(exp(x), axis=1) returns a 1-D numpy array. Change it to sum(exp(x), axis=1, keepdims=True) to prevent numpy from automatically dropping that dimension.

import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)

x = np.array([[1, 3, 6, -3, 1],
              [5, 2, 1, 4, 3]])

print(softmax(x))
print(f"1: {softmax(x)[0]} sum: {np.sum(softmax(x)[0])}")
print(f"2: {softmax(x)[1]} sum: {np.sum(softmax(x)[1])}")

Softmax function in neural network (Python)

The problem is in your sum. You are summing over axis 0, where axis 0 should be left untouched.

To sum over all the entries in the same example, i.e., in the same row, you have to use axis 1 instead.

import numpy as np

def softmax(A):
    """
    Computes a softmax function row-wise.
    Input: A, an (N, k) ndarray.
    Returns: an (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)

Use keepdims=True to preserve the shape, so that e can be divided by the sum.

In your example, e evaluates to:

[[ 1.10627664  1.22384801  1.35391446]
 [ 1.49780395  1.65698552  1.83308438]]

then the sum for each example (denominator in the return line) is:

[[ 3.68403911]
 [ 4.98787384]]

The function then divides each line by its sum and gives the result you have in test_output.
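
For reference, an input consistent with those intermediate values (an assumed reconstruction, not taken from the original question), passed through the softmax above:

A = np.array([[0.101, 0.202, 0.303],   # assumed input; its exponentials match the e matrix shown above
              [0.404, 0.505, 0.606]])

out = softmax(A)
print(out)
print(out.sum(axis=1))  # [1. 1.] -- each row sums to one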

As MaxU pointed out, it is a good practice to remove the max before exponentiating, in order to avoid overflow:

e = np.exp(A - np.max(A, axis=1, keepdims=True))

Implementation of softmax function returns nan for high inputs

According to the softmax function, you need to iterate over all elements in the array, compute the exponential of each individual element, and then divide it by the sum of the exponentials of all elements:

import numpy as np

a = [1, 3, 5]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))

0.015876239976466765
0.11731042782619837
0.8668133321973349

However, if the numbers are too big, the exponentials will overflow (the computer cannot handle such big numbers):

a = [2345, 3456, 6543]
for i in a:
    print(np.exp(i) / np.sum(np.exp(a)))

__main__:2: RuntimeWarning: invalid value encountered in double_scalars
nan
nan
nan

To avoid this, first shift the highest value in the array to zero, then compute the softmax. For example, to compute the softmax of [1, 3, 5] use [1-5, 3-5, 5-5], which is [-4, -2, 0]. You may also choose to implement it in a vectorized way (as you intended to do in the question):

def softmax(x):
    f = np.exp(x - np.max(x))  # shift values
    return f / f.sum(axis=0)

softmax([1, 3, 5])
# prints: array([0.01587624, 0.11731043, 0.86681333])

softmax([2345, 3456, 6543, -6789, -9234])
# prints: array([0., 0., 1., 0., 0.])

For detailed information check out the cs231n course page; the "Practical issues: Numeric stability" section is exactly what I'm trying to explain.

Get the NaN and Infinity when calculating the Softmax

Your numbers are too large, so their exponentials exceed the range a double can handle (overflow). exp(100) already has an order of magnitude of 43, so exp(123456789) overflows to infinity.
total becomes Double.POSITIVE_INFINITY, and result is inf / inf, which is NaN.

Try to normalize your input to a fixed range, for example with min-max normalization to transform it to [-1, 1] or [0, 1]. These ranges are commonly used in machine learning because the exponentials of values in them stay bounded.
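
Here is a minimal Python sketch of that idea (the function names are illustrative); note that the max-subtraction trick from the earlier answers avoids the overflow without changing the result, whereas min-max normalization rescales the inputs and therefore also changes the resulting probabilities:

import numpy as np

def min_max_normalize(x):
    # scale values into [0, 1]; assumes x is not constant
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def softmax(x):
    e = np.exp(x - np.max(x))  # the max shift alone already prevents overflow
    return e / e.sum()

x = [123456789.0, 1.0, 42.0]
print(softmax(min_max_normalize(x)))  # finite probabilities, no NaN or infinity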


