How to Apply Gradient Clipping in Tensorflow

How to apply gradient clipping in TensorFlow?

Gradient clipping needs to happen after computing the gradients, but before applying them to update the model's parameters. In your example, both of those things are handled by the AdamOptimizer.minimize() method.

In order to clip your gradients you'll need to explicitly compute, clip, and apply them as described in this section in TensorFlow's API documentation. Specifically you'll need to substitute the call to the minimize() method with something like the following:

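# TF 1.x API: compute, clip, then apply the gradients explicitly.
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(cost)
# Skip variables with no gradient (grad is None); tf.clip_by_value cannot handle None.
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs if grad is not None]
train_op = optimizer.apply_gradients(capped_gvs)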
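
To make the compute, clip, apply ordering concrete, here is a minimal pure-Python sketch on a one-parameter quadratic loss. It does not use TensorFlow: the helpers compute_gradient, clip_by_value and sgd_step are hypothetical names made up for this illustration, and plain SGD stands in for Adam.

```python
def compute_gradient(w):
    """Gradient of the toy loss (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)


def clip_by_value(grad, low, high):
    """Clamp a scalar gradient into the interval [low, high]."""
    return max(low, min(high, grad))


def sgd_step(w, learning_rate=0.1, clip=1.0):
    grad = compute_gradient(w)                  # 1. compute
    capped = clip_by_value(grad, -clip, clip)   # 2. clip
    return w - learning_rate * capped           # 3. apply


w = 100.0
unclipped_grad = compute_gradient(w)  # large gradient far from the minimum
w_next = sgd_step(w)                  # the step is bounded by learning_rate * clip
```

Without the clipping step the update would move w by 19.4 in a single step; with it, the move is capped at 0.1.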

How to clip the gradient norm on the grad_and_var tuple in tensorflow-r1.0?

One possible approach that I have seen is to zip clipped_gradients and your variables and to use opt.apply_gradients on the zipped list, like in the code below (taken from here, lines 78-83):

tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),
                args.grad_clip)
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(self.lr)
self.train_op = optimizer.apply_gradients(zip(grads, tvars))
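
For intuition about what clipping by global norm does, here is a pure-Python sketch of the rescaling rule given in the tf.clip_by_global_norm documentation: every gradient is multiplied by clip_norm / max(global_norm, clip_norm). The function below is a simplified stand-in written for this illustration (lists of floats instead of tensors), not TensorFlow's implementation.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm.

    grads is a list of lists of floats, one inner list per variable.
    Returns (clipped_grads, global_norm), the same order tf.clip_by_global_norm uses.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # If global_norm <= clip_norm the scale is exactly 1.0 and nothing changes.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


grads = [[3.0], [4.0]]  # joint L2 norm is 5
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
small, _ = clip_by_global_norm([[0.3], [0.4]], clip_norm=1.0)
```

Unlike tf.clip_by_value, which clamps each element independently, this keeps the direction of the overall gradient and only shrinks its length.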

Can't apply gradients on tf.Variable

Python's zip function expects iterable objects, such as a list or a tuple.

In your calls to tape.gradient and optimizer.apply_gradients, you can wrap your Variable in a list to solve the issue:

with tf.GradientTape() as tape:
  gradients = tape.gradient(loss_value, [self.trainable_variables])
# Apply gradients via optimizer
self.optimizer.apply_gradients(zip(gradients, [self.trainable_variables]))

tape.gradient mirrors the structure of the sources argument it differentiates with respect to, so if you pass it a list, you get a list back. The documentation states:

Returns
a list or nested structure of Tensors (or IndexedSlices, or None), one for each element in sources. Returned structure is the same as the structure of sources.
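
The zip behaviour can be reproduced without TensorFlow. In this small sketch, FakeVariable is a hypothetical stand-in for a single variable object that is not iterable; it is not a TensorFlow class.

```python
class FakeVariable:
    """Hypothetical stand-in for a single variable; deliberately not iterable."""

    def __init__(self, name):
        self.name = name


var = FakeVariable("w")
grad = 0.5

# Passing the bare object to zip fails because zip needs iterables.
try:
    list(zip([grad], var))
    zip_error = None
except TypeError as err:
    zip_error = err

# Wrapping the object in a list gives zip something to iterate over.
pairs = list(zip([grad], [var]))
```

The second call yields a single (gradient, variable) pair, which is the shape apply_gradients expects.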

How to manipulate client gradients in tensorflow federated sgd

build_federated_sgd_process is fully-canned; it is really designed to serve as a reference implementation, not as a point of extensibility.

I believe what you are looking for is the function that build_federated_sgd_process calls under the hood, tff.learning.framework.build_model_delta_optimizer_process. This function allows you to supply your own mapping from a model function (i.e., a zero-arg callable that returns a tff.learning.Model) to a tff.learning.framework.ClientDeltaFn.

Your ClientDeltaFn would look something like:

@tf.function
def _clip_and_noise(grads):
  return ...

class ClippedGradClientDeltaFn(tff.learning.framework.ClientDeltaFn):

  def __init__(self, model, ...):
    self._model = model
    ...

  @tf.function
  def __call__(self, dataset, weights):
    # Compute gradients grads
    return _clip_and_noise(grads)

And you would be able to construct a tff.templates.IterativeProcess by calling:

def clipped_sgd(model_fn: Callable[[], model_lib.Model]) -> ClippedGradClientDeltaFn:
    return ClippedGradClientDeltaFn(
        model_fn(),
        ...)

iterproc = optimizer_utils.build_model_delta_optimizer_process(
      model_fn, model_to_client_delta_fn=clipped_sgd, ...)

much as the body of build_federated_sgd_process does.

It sounds to me like you are interested in differential privacy. TFF is designed to compose with differential privacy through its aggregation processes rather than through custom client updates, though the latter is certainly one approach. See the pointers in the TFF for research documentation for idiomatic ways to wire differential privacy into TFF.
