How do you create a custom activation function with Keras?
Credits to this Github issue comment by Ritchie Ng.
# Creating a model
from keras.models import Sequential
from keras.layers import Dense
# Custom activation function
from keras.layers import Activation
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects
def custom_activation(x):
return (K.sigmoid(x) * 5) - 1
get_custom_objects().update({'custom_activation': Activation(custom_activation)})
# Usage
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation(custom_activation, name='SpecialActivation'))
print(model.summary())
Please keep in mind that you have to import this function when you save and restore the model. See the note of keras-contrib.
Creating of custom activation function in keras
Warning: this operation you want has no gradients and will not allow any weights before it to be trainable. You will see error messages like "an operation has None for gradient" or something like "None type not supported".
As a workaround for your activation, I believe the 'relu' activation would be the closest and best option, with the advantage of being very popular and used in most models.
In Keras, you don't usually run sessions. For custom operations, you create a function using backend functions.
So, you'd use a Lambda
layer:
import keras.backend as K
def hardlim(x):
return K.cast(K.greater_equal(x,0), K.floatx())
You can then use activation=hardlim
in layers.
Making custom activation function in tensorflow 2.0
I suggest you tf.keras.backend.switch
. Here a dummy example
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras import backend as K
def output_activation(x):
return K.switch(x >= 0, tf.math.tanh(x+0.1)*10, tf.math.tanh(x) + 1)
X = np.random.uniform(0,1, (100,10))
y = np.random.uniform(0,1, 100)
inp = Input((10,))
x = Dense(8, activation=output_activation)(inp)
out = Dense(1)(x)
model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X,y, epochs=3)
here the running notebook: https://colab.research.google.com/drive/1T_kRNUphJt9xTjiOheTgoIGpGDZaaRAg?usp=sharing
How to make a custom activation function with trainable parameters in Tensorflow
If you create a tf.Variable
within your model, Tensorflow will track its state and will adjust it as any other parameter. Such a tf.Variable
can be a parameter from your activation function.
Let's start with some toy dataset.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
from tensorflow.keras import Model
from sklearn.datasets import load_iris
iris = load_iris(return_X_y=True)
X = iris[0].astype(np.float32)
y = iris[1].astype(np.float32)
ds = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(25).batch(8)
Now, let's create a tf.keras.Model
and make a parametric ReLU function with the slope being learnable, and also the minimum value (usually 0 for classical ReLU). Let's start with a PReLU slope/min value of 0.1 for now.
slope_values = list()
min_values = list()
class MyModel(Model):
def __init__(self):
super(MyModel, self).__init__()
self.prelu_slope = tf.Variable(0.1)
self.min_value = tf.Variable(0.1)
self.d0 = Dense(16, activation=self.prelu)
self.d1 = Dense(32, activation=self.prelu)
self.d2 = Dense(3, activation='softmax')
def prelu(self, x):
return tf.maximum(self.min_value, x * self.prelu_slope)
def call(self, x, **kwargs):
slope_values.append(self.prelu_slope.numpy())
min_values.append(self.min_value.numpy())
x = self.d0(x)
x = self.d1(x)
x = self.d2(x)
return x
model = MyModel()
Now, let's train the model (in eager mode so we can keep the slope values).
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam', run_eagerly=True)
history = model.fit(ds, epochs=500, verbose=0)
Let's look at the slope. Tensorflow is adjusting it to be the best slope for this task. As you will see it approaches non-parametric ReLU with a slope of 1.
plt.plot(slope_values, label='Slope Value')
plt.plot(min_values, label='Minimum Value')
plt.legend()
plt.title('Parametric ReLU Parameters Across Time')
plt.show()
Custom activation function dependant on other output nodes in Keras
The keras-team on the GitHub answered the question about how to make a custom activation function.
There also is a question with a code with a custom activation function.
These pages may help you!
Additional comment
These pages were not enough for this question so I add the comment below;
Maybe PyTorch is better for customization than Keras. I tried to write such a network, though it is a very simple one, based on PyTorch tutorials and "Extending PyTorch with Custom Activation Functions"
I made a custom activation function in which the 1-th(counting from 0) elements of the output vector are equal to twice the 0-th elements. A very simple network with one layer was used for the training. After training, I checked that the condition was satisfied.
import torch
import matplotlib.pyplot as plt
# Define the custom activation function
# reference: https://towardsdatascience.com/extending-pytorch-with-custom-activation-functions-2d8b065ef2fa
def silu(input):
input[:,1] = input[:,0] * 2
return input
class SiLU(torch.nn.Module):
def __init__(self):
super().__init__() # init the base class
def forward(self, input):
return silu(input) # simply apply already implemented SiLU
# Training
# reference: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
k = 10
x = torch.rand([k,3])
y = x * 2
model = torch.nn.Sequential(
torch.nn.Linear(3, 3),
SiLU() # custom activation function
)
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-3
for t in range(2000):
y_pred = model(x)
loss = loss_fn(y_pred, y)
if t % 100 == 99:
print(t, loss.item())
model.zero_grad()
loss.backward()
with torch.no_grad():
for param in model.parameters():
param -= learning_rate * param.grad
# check the behaviour
yy = model(x) # predicted
print('ground truth')
print(y)
print('predicted')
print(yy)
# examples for the first five data
colorlist = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']
plt.figure()
for i in range(5):
plt.plot(y[i,:].detach().numpy(), linestyle = "solid", label = "ground truth_" + str(i), color=colorlist[i])
plt.plot(yy[i,:].detach().numpy(), linestyle = "dotted", label = "predicted_" + str(i), color=colorlist[i])
plt.legend()
# check if the custom activation works correctly
plt.figure()
plt.plot(yy[:,0].detach().numpy()*2, label = '0th * 2')
plt.plot(yy[:,1].detach().numpy(), label = '1th')
plt.legend()
print(yy[:,0]*2)
print(yy[:,1])
Keras custom activation function with additional parameter / argument
you can simply do it in this way...
X = np.random.uniform(0,1, (100,10))
y = np.random.uniform(0,1, (100,))
def poly_transfer(x, lenght):
a = np.arange(0, lenght + 0.05, 0.05)
b = []
for i in range(x.shape[1]):
b.append(a**i)
b = tf.constant(np.asarray(b), dtype=tf.float32)
c = tf.matmul(x, b)
return c
inp = Input((10,))
poly = Lambda(lambda x: poly_transfer(x, lenght=1))(inp)
out = Dense(1)(poly)
model = Model(inp, out)
model.compile('adam', 'mse')
model.fit(X, y, epochs=3)
Custom keras activation function for different neurons
import keras.backend as K
def myactivation(x):
#x is the layer's output, shaped as (batch_size, units)
#each element in the last dimension is a neuron
n0 = x[:,0:1]
n1 = x[:,1:2]
n2 = x[:,2:3] #each N is shaped as (batch_size, 1)
#apply the activation to each neuron
x0 = K.relu(n0)
x1 = K.tanh(n1)
x2 = K.sigmoid(n2)
return K.concatenate([x0,x1,x2], axis=-1) #return to the original shape
Keras custom activation function (not training)
If you print out model.get_weights()
in your custom_activation
cases, you should see that the weights are all nan
s. That's why there are no improvements in accuracy.
The reason is that K.exp(x)
becomes inf
for x > 88
or so (and MNIST dataset contains values from 0 to 255). As a result, there will be a 0 * inf = nan
calculation encountered during the gradient propagation through K.switch()
. Maybe check this related TF issue for more details. It seems that K.switch()
(or equivalently, tf.where()
) is not smart enough to figure out the fact that K.exp(x)
is required only when x < 0
in the custom activation.
I'm not an expert in TensorFlow, but I guess the reason why the built-in ELU activation (which calls tf.nn.elu
) works fine is because tf.nn.elu
has its own gradient op. The branching of x >= 0
and x < 0
is handled inside the gradient op instead of multiplying the gradients of tf.exp()
and tf.where()
ops, so the 0 * inf = nan
calculation can be avoided.
To solve the problem, you can either normalize your data before training,
x_train = x_train.reshape(x_train.shape[0], 28*28) / 255.
x_test = x_test.reshape(x_test.shape[0], 28*28) / 255.
or apply ceiling operation to x
before taking K.exp()
since we don't need to know the actual values of K.exp(x)
when x
is greater than 0.
def custom_activation(x):
cond = K.greater(x, 0)
return K.switch(cond, x, K.exp(K.minimum(x, 0.)) - 1)
Related Topics
Check If a File Is Not Open Nor Being Used by Another Process
Passing Table Name as a Parameter in Psycopg2
How to Use Qscrollarea to Make Scrollbars Appear
How to Change Default Anaconda Python Environment
Find Out How Much Memory Is Being Used by an Object in Python
Pandas: To_Numeric for Multiple Columns
Is It Ever Useful to Use Python's Input Over Raw_Input
Heatmap in Matplotlib with Pcolor
Backporting Python 3 Open(Encoding="Utf-8") to Python 2
Complexity of *In* Operator in Python
How to Get Rid of Double Backslash in Python Windows File Path String
How to Scroll Frame Using Mouse Wheel & Adding Horizontal Scrollbar
How to Unzip a List of Tuples into Individual Lists
Python Nested Functions Variable Scoping
Can Python Pickle Lambda Functions
Why Do I Get Typeerror: Can't Multiply Sequence by Non-Int of Type 'Float'
A Good Way to Get the Charset/Encoding of an Http Response in Python