Getting gradient of model output w.r.t weights using Keras
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables with respect to which the gradient is taken.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of TensorFlow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And that's it!
How do I get the gradient of a keras model with respect to its inputs?
I ended up getting this to work with a variant of the answer to this question: Get Gradients with Keras Tensorflow 2.0
x_tensor = tf.convert_to_tensor(input_data, dtype=tf.float32)
with tf.GradientTape() as t:
    t.watch(x_tensor)
    output = model(x_tensor)
    result = output
gradients = t.gradient(output, x_tensor)
This allows me to obtain both the output and the gradient without redundant computation.
How to obtain the gradients in keras?
You need to create a symbolic Keras function, taking the input/output as inputs and returning the gradients.
Here is a working example:
import numpy as np
import keras
from keras import backend as K
model = keras.Sequential()
model.add(keras.layers.Dense(20, input_shape = (10, )))
model.add(keras.layers.Dense(5))
model.compile('adam', 'mse')
dummy_in = np.ones((4, 10))
dummy_out = np.ones((4, 5))
dummy_loss = model.train_on_batch(dummy_in, dummy_out)
def get_weight_grad(model, inputs, outputs):
    """ Gets gradient of model for given inputs and outputs for all weights"""
    grads = model.optimizer.get_gradients(model.total_loss, model.trainable_weights)
    symb_inputs = (model._feed_inputs + model._feed_targets + model._feed_sample_weights)
    f = K.function(symb_inputs, grads)
    x, y, sample_weight = model._standardize_user_data(inputs, outputs)
    output_grad = f(x + y + sample_weight)
    return output_grad
def get_layer_output_grad(model, inputs, outputs, layer=-1):
    """ Gets gradient of a layer output for given inputs and outputs"""
    grads = model.optimizer.get_gradients(model.total_loss, model.layers[layer].output)
    symb_inputs = (model._feed_inputs + model._feed_targets + model._feed_sample_weights)
    f = K.function(symb_inputs, grads)
    x, y, sample_weight = model._standardize_user_data(inputs, outputs)
    output_grad = f(x + y + sample_weight)
    return output_grad
weight_grads = get_weight_grad(model, dummy_in, dummy_out)
output_grad = get_layer_output_grad(model, dummy_in, dummy_out)
The first function I wrote returns all the gradients in the model, but it wouldn't be difficult to extend it so it supports layer indexing. However, it's probably dangerous because any layer without weights in the model will be ignored by this indexing and you would end up with different layer indexing in the model and the gradients.
The second function I wrote returns the gradient at a given layer's output and there, the indexing is the same as in the model, so it's safe to use it.
Note: This works with Keras 2.2.0 but not with earlier versions, as this release included a major refactoring of keras.engine.
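With TensorFlow 2.x and eager execution, roughly the same two quantities (loss gradients for all weights and for the last layer's output) can be obtained with tf.GradientTape instead of K.function. The following is only a sketch under assumptions, not the method used in the answer above: it assumes tf.keras, a mean-squared-error loss written out by hand, and a hypothetical helper named loss_and_grads. The layer sizes mirror the dummy model above.

```python
import numpy as np
import tensorflow as tf

# Same shapes as the dummy model above: 10 inputs -> 20 hidden -> 5 outputs.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(20),
    tf.keras.layers.Dense(5),
])

dummy_in = tf.ones((4, 10))
dummy_out = tf.ones((4, 5))


def loss_and_grads(model, inputs, targets):
    """Hypothetical helper: returns (loss, weight gradients, output gradient)."""
    # persistent=True because tape.gradient is called twice below.
    with tf.GradientTape(persistent=True) as tape:
        preds = model(inputs)
        # Hand-written MSE, averaged over every element (an assumption).
        loss = tf.reduce_mean(tf.square(preds - targets))
    weight_grads = tape.gradient(loss, model.trainable_variables)
    # preds was computed inside the tape, so it can be a gradient target too.
    output_grad = tape.gradient(loss, preds)
    del tape
    return loss, weight_grads, output_grad


loss, weight_grads, output_grad = loss_and_grads(model, dummy_in, dummy_out)
print([tuple(g.shape) for g in weight_grads])
print(tuple(output_grad.shape))
```

The list of weight gradients is in the same order as model.trainable_variables, so the two can be zipped together to match each gradient to its variable.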
Keras GradientType: Calculating gradients with respect to the output node
Well, after some research I found the answer myself: it is possible to extract the trainable variables of a given layer based on the layer name. Then we can apply tape.gradient and optimizer.apply_gradients to the extracted set of trainable variables. My current solution is pretty slow, but it works. I just need to figure out how to improve its runtime.
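The approach described above might look roughly like this in TensorFlow 2.x. This is an illustration under assumptions, not the poster's actual code: the model layout, the layer names "hidden" and "head", the SGD optimizer, the hand-written MSE loss and the random data are all made up for the example.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(12, activation="relu", name="hidden"),
    tf.keras.layers.Dense(1, name="head"),  # hypothetical layer name
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = tf.constant(np.random.random((16, 8)), dtype=tf.float32)
y = tf.constant(np.random.random((16, 1)), dtype=tf.float32)

# Select only the trainable variables of the layer with the given name.
layer_vars = model.get_layer("head").trainable_variables

# Snapshots, kept only to show which variables the update touches.
hidden_before = [v.numpy().copy() for v in model.get_layer("hidden").trainable_variables]
head_before = [v.numpy().copy() for v in layer_vars]

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))

# Differentiate the loss w.r.t. that layer only, then update just those variables.
grads = tape.gradient(loss, layer_vars)
optimizer.apply_gradients(zip(grads, layer_vars))
```

Because only the variables of the selected layer are passed to tape.gradient and apply_gradients, the other layers are left unchanged by this step.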
How to compute gradient of output wrt input in Tensorflow 2.0
This should work in TF2:
inp = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)
with tf.GradientTape() as tape:
    preds = model(inp)
grads = tape.gradient(preds, inp)
Basically you do it the same way as TF1, but using GradientTape.
Calculating gradient norm wrt weights with keras
There are several placeholders related to the gradient computation process in Keras:
- Input x
- Target y
- Sample weights: even if you don't provide them in model.fit(), Keras still generates a placeholder for sample weights, and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
- Learning phase: this placeholder will be connected to the gradient tensor only if there's any layer using it (e.g. Dropout).
So, to compute the gradients, you need to feed x, y and sample_weights into the graph. That's the underlying reason for the error.
Inside Model._make_train_function() there are the following lines showing how to construct the necessary inputs to K.function() in this case:
inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
    inputs += [K.learning_phase()]
with K.name_scope('training'):
    ...
    self.train_function = K.function(inputs,
                                     [self.total_loss] + self.metrics_tensors,
                                     updates=updates,
                                     name='train_function',
                                     **self._function_kwargs)
By mimicking this function, you should be able to get the norm value:
def get_gradient_norm_func(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    summed_squares = [K.sum(K.square(g)) for g in grads]
    norm = K.sqrt(sum(summed_squares))
    inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
    func = K.function(inputs, [norm])
    return func
def main():
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='rmsprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    print(get_gradient([x, y, np.ones(len(y))]))
Execution output:
Epoch 1/1
128/128 [==============================] - 0s - loss: 2.0073
[4.4091368]
Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
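To make explicit what get_gradient_norm_func computes: it sums the squared entries of every gradient tensor and takes a single square root at the end, which is the L2 norm of all gradients concatenated into one vector. Here is a small NumPy-only check of that identity. The arrays are random placeholders with the shapes of the weights of the two Dense layers in main(), not real gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "gradients" shaped like Dense(2, input_shape=(1,)) and Dense(1):
# kernel (1, 2), bias (2,), kernel (2, 1), bias (1,).
grads = [rng.normal(size=s) for s in [(1, 2), (2,), (2, 1), (1,)]]

# Same recipe as get_gradient_norm_func: sum of squares per tensor, then one sqrt.
summed_squares = [np.sum(np.square(g)) for g in grads]
norm = np.sqrt(sum(summed_squares))

# Equivalent: L2 norm of all entries flattened into a single vector.
flat_norm = np.linalg.norm(np.concatenate([g.ravel() for g in grads]))

print(norm, flat_norm)
```

The two printed values should agree up to floating-point rounding.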