How to Make a Custom Activation Function with Only Python in Tensorflow

How to make a custom activation function with only Python in Tensorflow?

Yes, there is a way!

Credit:
It was hard to find this information and get it working, but here is an example built from the principles and code found here and here.

Requirements:
Before we start, there are two requirements for this to succeed. First, you need to be able to write your activation as a function on numpy arrays. Second, you have to be able to write the derivative of that function, either as a function in Tensorflow (easier) or, in the worst-case scenario, as a function on numpy arrays.

Writing Activation function:

So let's take for example this function, which we would want to use as an activation function:

def spiky(x):
    r = x % 1
    if r <= 0.5:
        return r
    else:
        return 0

Which looks as follows:
Spiky Activation

The first step is making it into a numpy function; this is easy:

import numpy as np
np_spiky = np.vectorize(spiky)
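
A quick sanity check (my addition) of the vectorized function on a few sample points:

print(np_spiky(np.array([0.2, 0.7, 1.2, 1.7])))  # approximately [0.2, 0. , 0.2, 0. ]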

Now we should write its derivative.

Gradient of Activation:
In our case it is easy: the derivative is 1 if x mod 1 <= 0.5 and 0 otherwise. So:

def d_spiky(x):
    r = x % 1
    if r <= 0.5:
        return 1
    else:
        return 0

np_d_spiky = np.vectorize(d_spiky)

Now for the hard part: making a TensorFlow function out of it.

Turning a numpy function into a tensorflow function:
We will start by making np_d_spiky into a tensorflow function. There is a function in tensorflow, tf.py_func(func, inp, Tout, stateful=stateful, name=name) [doc], which transforms any numpy function into a tensorflow function, so we can use it:

import tensorflow as tf
from tensorflow.python.framework import ops

np_d_spiky_32 = lambda x: np_d_spiky(x).astype(np.float32)

def tf_d_spiky(x, name=None):
    with tf.name_scope(name, "d_spiky", [x]) as name:
        y = tf.py_func(np_d_spiky_32,
                       [x],
                       [tf.float32],
                       name=name,
                       stateful=False)
        return y[0]

tf.py_func acts on lists of tensors (and returns a list of tensors), which is why we have [x] (and return y[0]). The stateful option tells tensorflow whether the function always gives the same output for the same input (stateful=False), in which case tensorflow can simplify the graph; this is our case and will probably be the case in most situations. One thing to be careful of at this point is that numpy uses float64 but tensorflow uses float32, so you need to convert your function to use float32 before you can convert it to a tensorflow function, otherwise tensorflow will complain. This is why we need to make np_d_spiky_32 first.
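
As a quick check (my addition, TF1 session style matching the code above), tf_d_spiky can already be evaluated:

with tf.Session() as sess:
    x = tf.constant([0.2, 0.7, 1.2, 1.7])
    print(sess.run(tf_d_spiky(x)))  # expected: [1. 0. 1. 0.]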

What about the Gradients? The problem with only doing the above is that even though we now have tf_d_spiky which is the tensorflow version of np_d_spiky, we couldn't use it as an activation function if we wanted to because tensorflow doesn't know how to calculate the gradients of that function.

Hack to get Gradients: As explained in the sources mentioned above, there is a hack to define the gradients of a function using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc]. Copying the code from harpone, we can wrap tf.py_func so that it defines the gradient at the same time:

def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

Now we are almost done. The only remaining detail is that the grad function we need to pass to the above py_func needs to take a special form: it takes in an operation and the gradients arriving from after the operation, and it propagates the gradients backward through the operation.

Gradient Function: So for our spiky activation function, this is how we would do it:

def spikygrad(op, grad):
    x = op.inputs[0]

    n_gr = tf_d_spiky(x)
    return grad * n_gr

The activation function has only one input, which is why x = op.inputs[0]. If the operation had many inputs, we would need to return a tuple, one gradient for each input. For example, if the operation was a - b, the gradient with respect to a is +1 and with respect to b is -1, so we would have return +1*grad, -1*grad. Notice that we need to return tensorflow functions of the input; that is why we need tf_d_spiky, np_d_spiky would not have worked because it cannot act on tensorflow tensors. Alternatively, we could have written the derivative using tensorflow functions:

def spikygrad2(op, grad):
    x = op.inputs[0]
    r = tf.mod(x, 1)
    n_gr = tf.to_float(tf.less_equal(r, 0.5))
    return grad * n_gr
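
As an aside, for an operation with two inputs such as the a - b example mentioned above, the gradient function would return one gradient per input. A minimal sketch (subtractgrad is just an illustrative name, not part of the original code):

def subtractgrad(op, grad):
    # d(a - b)/da = +1 and d(a - b)/db = -1, so scale the incoming gradient accordingly
    return +1 * grad, -1 * grad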

Combining it all together: Now that we have all the pieces, we can put them together:

np_spiky_32 = lambda x: np_spiky(x).astype(np.float32)

def tf_spiky(x, name=None):
    with tf.name_scope(name, "spiky", [x]) as name:
        y = py_func(np_spiky_32,
                    [x],
                    [tf.float32],
                    name=name,
                    grad=spikygrad)  # <-- here's the call to the gradient
        return y[0]

And now we are done, and we can test it.

Test:

with tf.Session() as sess:
    x = tf.constant([0.2, 0.7, 1.2, 1.7])
    y = tf_spiky(x)
    tf.initialize_all_variables().run()

    print(x.eval(), y.eval(), tf.gradients(y, [x])[0].eval())

[ 0.2 0.69999999 1.20000005 1.70000005] [ 0.2 0. 0.20000005 0.] [ 1. 0. 1. 0.]

Success!
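
For reference, here is a minimal sketch (my addition, same TF1 graph style as above) of plugging tf_spiky in as the activation of a hand-rolled dense layer; the layer sizes are arbitrary:

x_in = tf.placeholder(tf.float32, shape=[None, 4])
W = tf.Variable(tf.random_normal([4, 8]))
b = tf.Variable(tf.zeros([8]))
hidden = tf_spiky(tf.matmul(x_in, W) + b)  # custom activation applied to the pre-activation
# Note: tf.py_func drops static shape information; if downstream ops need it,
# restore it with hidden.set_shape([None, 8]).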

How to make a custom activation function with trainable parameters in Tensorflow

If you create a tf.Variable within your model, Tensorflow will track its state and will adjust it like any other trainable parameter. Such a tf.Variable can be a parameter of your activation function.
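
As a quick illustration of that claim, here is a minimal sketch (my own example, not part of the answer below) showing that a bare tf.Variable created in a model's __init__ is listed among model.trainable_variables:

import tensorflow as tf

class TinyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.alpha = tf.Variable(0.1)           # a free parameter, e.g. an activation slope
        self.dense = tf.keras.layers.Dense(4)

    def call(self, x):
        return self.dense(x) * self.alpha       # alpha participates in the forward pass

m = TinyModel()
m(tf.zeros((1, 3)))                             # build the model once
print([v.name for v in m.trainable_variables])  # alpha appears alongside the Dense kernel/bias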

Let's start with some toy dataset.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
from tensorflow.keras import Model
from sklearn.datasets import load_iris
iris = load_iris(return_X_y=True)

X = iris[0].astype(np.float32)
y = iris[1].astype(np.float32)

ds = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(25).batch(8)

Now, let's create a tf.keras.Model and make a parametric ReLU whose slope is learnable, as well as its minimum value (usually 0 for classical ReLU). Both parameters start at 0.1 for now.

slope_values = list()
min_values = list()

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.prelu_slope = tf.Variable(0.1)
        self.min_value = tf.Variable(0.1)

        self.d0 = Dense(16, activation=self.prelu)
        self.d1 = Dense(32, activation=self.prelu)
        self.d2 = Dense(3, activation='softmax')

    def prelu(self, x):
        return tf.maximum(self.min_value, x * self.prelu_slope)

    def call(self, x, **kwargs):
        slope_values.append(self.prelu_slope.numpy())
        min_values.append(self.min_value.numpy())
        x = self.d0(x)
        x = self.d1(x)
        x = self.d2(x)
        return x

model = MyModel()

Now, let's train the model (in eager mode so we can keep the slope values).

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', run_eagerly=True)

history = model.fit(ds, epochs=500, verbose=0)

Let's look at the slope. Tensorflow is adjusting it to be the best slope for this task. As you will see, it approaches a non-parametric ReLU with a slope of 1.

plt.plot(slope_values, label='Slope Value')
plt.plot(min_values, label='Minimum Value')
plt.legend()
plt.title('Parametric ReLU Parameters Across Time')
plt.show()

Sample Image

How do you create a custom activation function with Keras?

Credits to this Github issue comment by Ritchie Ng.

# Creating a model
from keras.models import Sequential
from keras.layers import Dense

# Custom activation function
from keras.layers import Activation
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects

def custom_activation(x):
    return (K.sigmoid(x) * 5) - 1

get_custom_objects().update({'custom_activation': Activation(custom_activation)})

# Usage
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation(custom_activation, name='SpecialActivation'))
print(model.summary())

Please keep in mind that you have to import this function when you save and restore the model. See the note in keras-contrib.
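
For example, when loading a saved model you would pass the function via custom_objects; a minimal sketch (the file name is just an example):

from keras.models import load_model

model.save('model_with_custom_activation.h5')
restored = load_model('model_with_custom_activation.h5',
                      custom_objects={'custom_activation': custom_activation})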

Making custom activation function in tensorflow 2.0

I suggest you use tf.keras.backend.switch. Here is a dummy example:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras import backend as K

def output_activation(x):
    return K.switch(x >= 0, tf.math.tanh(x + 0.1) * 10, tf.math.tanh(x) + 1)

X = np.random.uniform(0,1, (100,10))
y = np.random.uniform(0,1, 100)

inp = Input((10,))
x = Dense(8, activation=output_activation)(inp)
out = Dense(1)(x)

model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X,y, epochs=3)

Here is the running notebook: https://colab.research.google.com/drive/1T_kRNUphJt9xTjiOheTgoIGpGDZaaRAg?usp=sharing

Keras custom activation function (not training)

If you print out model.get_weights() in your custom_activation cases, you should see that the weights are all NaNs. That's why there is no improvement in accuracy.
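
A quick way to check this (my addition; model is assumed to be the trained Keras model from the question):

import numpy as np

print(any(np.isnan(w).any() for w in model.get_weights()))  # True if any weight has become NaN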

The reason is that K.exp(x) becomes inf for x > 88 or so (and MNIST dataset contains values from 0 to 255). As a result, there will be a 0 * inf = nan calculation encountered during the gradient propagation through K.switch(). Maybe check this related TF issue for more details. It seems that K.switch() (or equivalently, tf.where()) is not smart enough to figure out the fact that K.exp(x) is required only when x < 0 in the custom activation.

I'm not an expert in TensorFlow, but I guess the reason why the built-in ELU activation (which calls tf.nn.elu) works fine is because tf.nn.elu has its own gradient op. The branching of x >= 0 and x < 0 is handled inside the gradient op instead of multiplying the gradients of tf.exp() and tf.where() ops, so the 0 * inf = nan calculation can be avoided.
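
To see that 0 * inf = nan mechanism in isolation, here is a minimal sketch (TF 2 eager style; the value 100.0 is arbitrary, just large enough for exp to overflow):

import tensorflow as tf

x = tf.constant([100.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.where(x > 0, x, tf.exp(x) - 1)  # ELU-like branching written with tf.where
print(tape.gradient(y, x))                 # nan: the unselected exp branch contributes inf * 0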

To solve the problem, you can either normalize your data before training,

x_train = x_train.reshape(x_train.shape[0], 28*28) / 255.
x_test = x_test.reshape(x_test.shape[0], 28*28) / 255.

or cap x at 0 before taking K.exp(), since we don't need to know the actual value of K.exp(x) when x is greater than 0:

from tensorflow.keras import backend as K  # assuming the Keras backend import from the question

def custom_activation(x):
    cond = K.greater(x, 0)
    return K.switch(cond, x, K.exp(K.minimum(x, 0.)) - 1)
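
The patched activation can then be used exactly like the original one; a sketch (the layer sizes are my assumption, not from the question):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation=custom_activation, input_shape=(784,)),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])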

