How to Create a Custom Activation Function with Keras

How do you create a custom activation function with Keras?

Credit goes to this GitHub issue comment by Ritchie Ng.

# Creating a model
from keras.models import Sequential
from keras.layers import Dense

# Custom activation function
from keras.layers import Activation
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects

def custom_activation(x):
    return (K.sigmoid(x) * 5) - 1

get_custom_objects().update({'custom_activation': Activation(custom_activation)})

# Usage
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation(custom_activation, name='SpecialActivation'))
print(model.summary())

Please keep in mind that you have to import this function when you save and restore the model; see the note in keras-contrib.
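
For example, a minimal sketch of saving and reloading a model that uses this activation (the file name here is just a placeholder):

from keras.models import load_model

model.save('model_with_custom_activation.h5')  # placeholder file name

# The custom function must be importable here and passed via custom_objects
# (or already registered through get_custom_objects() as above).
restored = load_model('model_with_custom_activation.h5',
                      custom_objects={'custom_activation': custom_activation})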

Creating a custom activation function in Keras

Warning: the operation you want (a hard-limit step function) has no gradient, so it will not allow any weights before it to be trainable. You will see error messages like "An operation has None for gradient" or "None type not supported".

As a workaround for your activation, I believe the 'relu' activation would be the closest and best option, with the advantage of being very popular and used in most models.

In Keras, you don't usually run sessions. For custom operations, you create a function using backend functions.

So, you'd use a Lambda layer:

import keras.backend as K

def hardlim(x):
    return K.cast(K.greater_equal(x, 0), K.floatx())

You can then use activation=hardlim in layers.
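
For example, a minimal sketch of both options (the layer sizes here are arbitrary):

from keras.models import Sequential
from keras.layers import Dense, Lambda

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Lambda(hardlim))                 # as a Lambda layer
model.add(Dense(10, activation=hardlim))   # or directly as a layer's activation
# Remember the warning above: weights before hardlim will receive no gradient.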

Making a custom activation function in TensorFlow 2.0

I suggest tf.keras.backend.switch. Here is a dummy example:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras import backend as K

def output_activation(x):
    return K.switch(x >= 0, tf.math.tanh(x+0.1)*10, tf.math.tanh(x) + 1)

X = np.random.uniform(0,1, (100,10))
y = np.random.uniform(0,1, 100)

inp = Input((10,))
x = Dense(8, activation=output_activation)(inp)
out = Dense(1)(x)

model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X,y, epochs=3)

Here is the running notebook: https://colab.research.google.com/drive/1T_kRNUphJt9xTjiOheTgoIGpGDZaaRAg?usp=sharing

How to make a custom activation function with trainable parameters in TensorFlow

If you create a tf.Variable within your model, TensorFlow will track its state and adjust it like any other trainable parameter. Such a tf.Variable can be a parameter of your activation function.

Let's start with some toy dataset.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
from tensorflow.keras import Model
from sklearn.datasets import load_iris
iris = load_iris(return_X_y=True)

X = iris[0].astype(np.float32)
y = iris[1].astype(np.float32)

ds = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(25).batch(8)

Now, let's create a tf.keras.Model with a parametric ReLU activation in which both the slope and the minimum value (usually 0 for classical ReLU) are learnable. Both parameters start at 0.1.

slope_values = list()
min_values = list()

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.prelu_slope = tf.Variable(0.1)
        self.min_value = tf.Variable(0.1)

        self.d0 = Dense(16, activation=self.prelu)
        self.d1 = Dense(32, activation=self.prelu)
        self.d2 = Dense(3, activation='softmax')

    def prelu(self, x):
        return tf.maximum(self.min_value, x * self.prelu_slope)

    def call(self, x, **kwargs):
        slope_values.append(self.prelu_slope.numpy())
        min_values.append(self.min_value.numpy())
        x = self.d0(x)
        x = self.d1(x)
        x = self.d2(x)
        return x

model = MyModel()

Now, let's train the model (in eager mode so we can keep the slope values).

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', run_eagerly=True)

history = model.fit(ds, epochs=500, verbose=0)

Let's look at the slope. TensorFlow adjusts it to be the best slope for this task; as you will see, the activation approaches a non-parametric ReLU with a slope of 1.

plt.plot(slope_values, label='Slope Value')
plt.plot(min_values, label='Minimum Value')
plt.legend()
plt.title('Parametric ReLU Parameters Across Time')
plt.show()

[Figure: Parametric ReLU parameters (slope and minimum value) across training]

Custom activation function dependent on other output nodes in Keras

The keras-team answered a question on GitHub about how to make a custom activation function.

There is also a question with code for a custom activation function.

These pages may help you.


Additional comment

Those pages were not enough for this question, so I add the comment below.

PyTorch may be better for this kind of customization than Keras. I tried to write such a network, though it is a very simple one, based on the PyTorch tutorials and "Extending PyTorch with Custom Activation Functions".

I made a custom activation function in which element 1 of the output vector (counting from 0) equals twice element 0. A very simple network with one layer was used for the training. After training, I checked that the condition was satisfied.



import torch
import matplotlib.pyplot as plt

# Define the custom activation function
# reference: https://towardsdatascience.com/extending-pytorch-with-custom-activation-functions-2d8b065ef2fa
def silu(input):
    # Note: despite the name (kept from the reference), this is not the SiLU
    # activation; it overwrites column 1 of the output with twice column 0.
    input[:,1] = input[:,0] * 2
    return input

class SiLU(torch.nn.Module):
    def __init__(self):
        super().__init__()  # init the base class

    def forward(self, input):
        return silu(input)  # apply the custom activation defined above

# Training
# reference: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
k = 10
x = torch.rand([k,3])
y = x * 2
model = torch.nn.Sequential(
    torch.nn.Linear(3, 3),
    SiLU()  # custom activation function
)

loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-3
for t in range(2000):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    model.zero_grad()
    loss.backward()

    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# check the behaviour
yy = model(x) # predicted
print('ground truth')
print(y)
print('predicted')
print(yy)

# examples for the first five data
colorlist = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']
plt.figure()
for i in range(5):
    plt.plot(y[i,:].detach().numpy(), linestyle = "solid", label = "ground truth_" + str(i), color=colorlist[i])
    plt.plot(yy[i,:].detach().numpy(), linestyle = "dotted", label = "predicted_" + str(i), color=colorlist[i])
plt.legend()

# check if the custom activation works correctly
plt.figure()
plt.plot(yy[:,0].detach().numpy()*2, label = '0th * 2')
plt.plot(yy[:,1].detach().numpy(), label = '1th')
plt.legend()

print(yy[:,0]*2)
print(yy[:,1])

Keras custom activation function with additional parameter / argument

You can simply do it this way:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

X = np.random.uniform(0, 1, (100, 10))
y = np.random.uniform(0, 1, (100,))

def poly_transfer(x, length):
    # evaluate a polynomial with coefficients x on a grid of points in [0, length]
    a = np.arange(0, length + 0.05, 0.05)

    b = []
    for i in range(x.shape[1]):
        b.append(a**i)

    b = tf.constant(np.asarray(b), dtype=tf.float32)
    c = tf.matmul(x, b)

    return c

inp = Input((10,))
poly = Lambda(lambda x: poly_transfer(x, length=1))(inp)
out = Dense(1)(poly)

model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X, y, epochs=3)

Custom keras activation function for different neurons

import keras.backend as K

def myactivation(x):
    # x is the layer's output, shaped as (batch_size, units)

    # each element in the last dimension is a neuron
    n0 = x[:, 0:1]
    n1 = x[:, 1:2]
    n2 = x[:, 2:3]  # each slice is shaped as (batch_size, 1)

    # apply the activation to each neuron
    x0 = K.relu(n0)
    x1 = K.tanh(n1)
    x2 = K.sigmoid(n2)

    return K.concatenate([x0, x1, x2], axis=-1)  # return to the original shape
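
A minimal usage sketch, assuming a layer with exactly three units so each unit gets its own activation (the input size here is arbitrary):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_dim=8, activation=myactivation))
model.summary()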

Keras custom activation function (not training)

If you print out model.get_weights() in your custom_activation cases, you should see that the weights are all NaNs. That's why there is no improvement in accuracy.

The reason is that K.exp(x) becomes inf for x > 88 or so (and MNIST dataset contains values from 0 to 255). As a result, there will be a 0 * inf = nan calculation encountered during the gradient propagation through K.switch(). Maybe check this related TF issue for more details. It seems that K.switch() (or equivalently, tf.where()) is not smart enough to figure out the fact that K.exp(x) is required only when x < 0 in the custom activation.
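
For reference, the problematic pattern looks roughly like this (a sketch of the ELU-style activation under discussion; the exact original definition may differ):

def custom_activation(x):
    # K.exp(x) is still evaluated where x > 0; for large x it overflows to inf,
    # and the gradient through K.switch() then hits 0 * inf = nan.
    return K.switch(K.greater(x, 0), x, K.exp(x) - 1)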

I'm not an expert in TensorFlow, but I guess the reason why the built-in ELU activation (which calls tf.nn.elu) works fine is because tf.nn.elu has its own gradient op. The branching of x >= 0 and x < 0 is handled inside the gradient op instead of multiplying the gradients of tf.exp() and tf.where() ops, so the 0 * inf = nan calculation can be avoided.
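
For comparison, a minimal sketch of relying on the built-in ELU instead, which sidesteps the issue because tf.nn.elu ships its own gradient op:

from tensorflow.keras.layers import Dense

layer = Dense(64, activation='elu')  # built-in ELU; branching handled inside its gradient op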

To solve the problem, you can either normalize your data before training,

x_train = x_train.reshape(x_train.shape[0], 28*28) / 255.
x_test = x_test.reshape(x_test.shape[0], 28*28) / 255.

or cap x at 0 (via K.minimum) before taking K.exp(), since we don't need the actual values of K.exp(x) when x is greater than 0:

def custom_activation(x):
    cond = K.greater(x, 0)
    return K.switch(cond, x, K.exp(K.minimum(x, 0.)) - 1)
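
As a quick sanity check (a sketch assuming TensorFlow 2 eager execution), the gradients of this capped version stay finite even for large inputs such as raw pixel values:

import tensorflow as tf
from tensorflow.keras import backend as K

def custom_activation(x):
    cond = K.greater(x, 0)
    return K.switch(cond, x, K.exp(K.minimum(x, 0.)) - 1)

x = tf.Variable([255.0, -1.0, 0.5])
with tf.GradientTape() as tape:
    y = custom_activation(x)
print(tape.gradient(y, x).numpy())  # finite values, no NaNs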

