Custom Loss Function in Keras

Custom loss function in Keras

All you have to do is define a function for that, using Keras backend functions for the calculations. The function must take the true values and the model's predicted values, in that order.

Now, since I'm not sure what g, q, x and y are in your function, I'll just create a basic example here without worrying about what it means or whether it's actually a useful function:

import keras.backend as K

def customLoss(yTrue, yPred):
    return K.sum(K.log(yTrue) - K.log(yPred))

All backend functions can be seen in the Keras backend documentation.

After that, compile your model using that function instead of a regular one:

model.compile(loss=customLoss, optimizer = .....)
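
For instance, with a concrete optimizer (the 'adam' choice here is just an arbitrary example):

model.compile(loss=customLoss, optimizer='adam')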

Should the custom loss function in Keras return a single loss value for the batch or an array of losses for every sample in the training batch?

I opened an issue on GitHub. It's confirmed that a custom loss function is required to return one loss value per sample (i.e. an array of losses for the batch, not a single scalar). The example above will need to be updated to reflect this.
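
As a rough sketch of what that change looks like for the log-difference example above (assuming the same toy loss), reducing only over the feature axis keeps one value per sample:

import keras.backend as K

def customLoss(yTrue, yPred):
    # sum over the last (feature) axis only, so the result has shape (batch_size,)
    return K.sum(K.log(yTrue) - K.log(yPred), axis=-1)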

Custom Loss Function of Keras Model Giving Incorrect Answer

I misunderstood your issue, and I have updated my method. It should work now. I stack the input layer and output layer to get a new layer that I pass as the model's output.


import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense

def tangle_loss3(y_true, y_pred):
    # y_pred is the stacked tensor: index 0 is the input, index 1 the network's prediction
    true_diff = y_true - y_pred[0]
    pred_diff = y_pred[1] - y_pred[0]

    normalized_diff = tf.abs(tf.math.divide(pred_diff, true_diff))
    normalized_diff = tf.reduce_mean(normalized_diff)

    return normalized_diff

input_layer = Input(shape=(384,), name='input')
hl_1 = Dense(64, activation='elu', name='hl_1')(input_layer)
hl_2 = Dense(32, activation='elu', name='hl_2')(hl_1)
hl_3 = Dense(32, activation='elu', name='hl_3')(hl_2)
output_layer = Dense(384, activation=None, name='output')(hl_3)
out = tf.stack([input_layer, output_layer])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

model = tf.keras.models.Model(input_layer, out)
model.compile(loss=tangle_loss3, optimizer=optimizer)

and now when I calculate the loss it works:


X = np.random.rand(1, 384)
y = np.random.rand(1, 384)

np.mean(np.abs((model.predict(X)[1] - X)/(y - X)))
# returns some number

model.test_on_batch(X, y)

Note that I have to use model.predict(X)[1], since we get two outputs: the input layer's and the output layer's results. This is just one hacky solution, but it works.

Custom keras model with custom loss function gives error

You seem to have mixed up a few constructs that don't fit together. I suggest you define your own custom training loop, which will give you the flexibility you need:

First define your model:

import tensorflow as tf
import random
import numpy as np

def initializeOutputWeights(shape, dtype=None):
    # output weights are initialized as 1 or -1 and not changed afterwards
    randoms = np.random.randint(low=2, size=shape)
    new = np.where(randoms == 0, -1, randoms)
    return tf.keras.backend.variable(new, dtype=dtype)

class CustomModel(tf.keras.Model):
    def __init__(self, b, input_dim):
        # originally this was written without a call function, which raised an error,
        # hence the subclassed Model with an explicit call method
        super(CustomModel, self).__init__()

        initializer = tf.keras.initializers.RandomUniform(minval=-1, maxval=1)
        self.dense = tf.keras.layers.Dense(20, name='hidden', kernel_initializer=initializer,
                                           bias_initializer=initializer,
                                           activation=lambda x: tf.tanh(b * x),
                                           input_shape=(input_dim,))
        self._output = tf.keras.layers.Dense(2, activation='linear', name='output',
                                             use_bias=False, trainable=False,
                                             kernel_initializer=lambda shape, dtype: initializeOutputWeights(shape, dtype))

    def call(self, inputs):
        x = self.dense(inputs)
        return self._output(x)

Then define your loss function:

def compute_loss(d, y_pred, y_true):
    # calculate the loss
    N = len(y_true)
    L = len(y_pred[0])
    y_dot = y_pred * y_true
    y_d = y_dot - d
    y_square = y_d * y_d
    index_replace = y_dot > d
    idx_replace = tf.where(index_replace == True)
    y_loss = tf.tensor_scatter_nd_update(y_square, idx_replace, tf.zeros(len(idx_replace)))
    return tf.divide(tf.keras.backend.sum(tf.keras.backend.sum(y_loss, axis=1)), tf.cast(N * L, tf.float32))

Afterwards define your training loop:

@tf.function
def train_step(model, batch, optimizer):
    with tf.GradientTape() as tape:
        x, y = batch
        d = 16
        y_pred = model(x)
        loss = compute_loss(d, y_pred, y)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # update your metrics here how you want.
    acc_metric.update_state(y, y_pred)
    tf.print("Training loss (for one batch): ", loss)

def train(dataset, optimizer, epochs=25):
    b = np.ones(20)
    custom_model = CustomModel(b, 9)
    for epoch in range(epochs):
        for batch in dataset:
            train_step(custom_model, batch, optimizer)

        train_acc = acc_metric.result()
        tf.print("Training acc over epoch: %.4f" % (float(train_acc),))

        # Reset training metrics at the end of each epoch
        acc_metric.reset_states()

And finally define your dataset and other important variables and train your model:

random.seed(1)
tf.random.set_seed(2)
acc_metric = tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")
opt = tf.keras.optimizers.Adam()

#### Dummy data ####
y_train = tf.cast(tf.random.uniform((500, 1), minval=0, maxval=2, dtype=tf.int32), tf.float32)
X_train = tf.random.normal((500, 9))
BATCH_SIZE = 64
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(
    X_train.shape[0]).batch(BATCH_SIZE)

train(train_dataset, opt)

Custom Loss Function - Keras

Any idea what could cause this?

Assuming you set the seed for reproducibility (otherwise it could simply be the random initialization): when you change the loss function, you change the surface over which gradient descent iterates to optimize your network.

And since you have no guarantee that your model will reach the global minimum but, most likely, will stop at a local minimum, it could simply mean that, given the same initialization (set seed), the optimization process stopped at a different local minimum.
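
For reference, a minimal sketch of fixing the seeds before building and training the model (the specific seed values are arbitrary), so that runs with different loss functions start from the same initialization:

import random
import numpy as np
import tensorflow as tf

# fix all relevant random sources so weight initialization is identical across runs
random.seed(0)
np.random.seed(0)
tf.random.set_seed(0)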

Custom loss function in keras? Correntropy. Math/implementation issues

There are three main things I notice in your code:

  1. You are combining math functions from different packages (K, np). Stick to native TensorFlow functions as much as possible (e.g. tf.math.reduce_sum). There's a lot available; check the documentation for an overview.
  2. Custom loss functions should be converted into TensorFlow graph-compatible functions, which is as easy as putting the tf.function decorator in front of them. See the tf.function documentation.
  3. Loops usually don't perform well. Vectorise your functions as much as possible.

Altogether, I reckon something like this should work (I didn't test it):

import numpy as np
import tensorflow as tf

tf_2pi = tf.constant(np.sqrt(2 * np.pi), dtype=tf.float32)

@tf.function
def kernel(x, sigma=1):
    # Gaussian kernel
    return (1 / (tf_2pi * sigma)) * tf.exp(-(x * x) / (2 * sigma * sigma))

@tf.function
def correntropy(y_true, y_pred):
    return -tf.math.reduce_mean(kernel(y_true - y_pred))
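
Assuming you already have a built tf.keras model (the Adam optimizer here is an arbitrary choice), using the loss is then just a matter of passing it to compile:

model.compile(optimizer=tf.keras.optimizers.Adam(), loss=correntropy)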

