How to Set Layer-Wise Learning Rate in Tensorflow

How to set layer-wise learning rate in Tensorflow?

It can be achieved quite easily with 2 optimizers:

var_list1 = [variables from first 5 layers]
var_list2 = [the rest of variables]
train_op1 = tf.train.GradientDescentOptimizer(0.00001).minimize(loss, var_list=var_list1)
train_op2 = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=var_list2)
train_op = tf.group(train_op1, train_op2)

One disadvantage of this implementation is that tf.gradients() is computed twice inside the optimizers, so it may not be optimal in terms of execution speed. This can be mitigated by calling tf.gradients() explicitly, splitting the resulting list in two, and passing the corresponding gradients to each optimizer.

Related question: Holding variables constant during optimizer

EDIT: Added a more efficient but longer implementation:

var_list1 = [variables from first 5 layers]
var_list2 = [the rest of variables]
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
grads = tf.gradients(loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)

You can use tf.trainable_variables() to get all trainable variables and select the ones you need from that list (see the sketch below).
The difference is that in the first implementation tf.gradients() is called twice inside the optimizers, which can execute redundant operations: each minimize call builds its own gradient subgraph, so the backpropagation through the later layers is repeated when computing the gradients of the first layers, whereas a single tf.gradients() call shares that work.
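
For example, if the first five layers were built under a common name scope, the two lists could be assembled like this (the 'first_5/' prefix is only an assumption for illustration):

all_vars = tf.trainable_variables()  # every trainable variable in the graph
# split by name prefix; 'first_5/' is a hypothetical scope name
var_list1 = [v for v in all_vars if v.name.startswith('first_5/')]
var_list2 = [v for v in all_vars if not v.name.startswith('first_5/')]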

Layer-specific learning rate in Keras Model

You can use tfa.optimizers.MultiOptimizer from the tensorflow_addons package.

The example below is taken directly from the docs:

import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(32),
])
optimizers = [
    tf.keras.optimizers.Adam(learning_rate=1e-4),
    tf.keras.optimizers.Adam(learning_rate=1e-2)
]
optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)
model.compile(optimizer=optimizer, loss="mse")

Note "Each optimizer will optimize only the weights associated with its paired layer."

In TensorFlow, is it possible to use different learning rates for different parts of the network?

You could use an approach similar to the one mentioned here.

Basically, set a different variable scope around each part of the network that you want to train with a separate learning rate, then:

optimizer1 = tf.train.AdagradOptimizer(0.0001)
optimizer2 = tf.train.AdagradOptimizer(0.01)

first_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     "scope/prefix/for/first/vars")
first_train_op = optimizer1.minimize(cost, var_list=first_train_vars)

second_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                      "scope/prefix/for/second/vars")
second_train_op = optimizer2.minimize(cost, var_list=second_train_vars)
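
For completeness, here is a sketch of how the variables might be created under such scopes; the scope names, layer sizes, and the inputs tensor are placeholders, and the "scope/prefix/for/..." strings above stand for whatever scope names you actually use:

with tf.variable_scope("first"):
    # variables created here are named "first/...", so "first" is what you
    # would pass to tf.get_collection above
    hidden = tf.layers.dense(inputs, 128, activation=tf.nn.relu)

with tf.variable_scope("second"):
    # variables created here are named "second/..."
    logits = tf.layers.dense(hidden, 10)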

How to apply layer-wise learning rate in Pytorch?

Here is the solution:

from torch.optim import Adam

model = Net()

optim = Adam(
    [
        {"params": model.fc.parameters(), "lr": 1e-3},
        {"params": model.agroupoflayer.parameters()},
        {"params": model.lastlayer.parameters(), "lr": 4e-2},
    ],
    lr=5e-4,
)

Parameters that you don't specify in the optimizer will not be optimized, so you should list all layers or groups (or only the layers you actually want to optimize). If you don't specify a learning rate for a group, it uses the global learning rate (5e-4).
The trick is that when you create the model, you should give names to the layers, or group them, as in the sketch below.
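
As a concrete sketch of what such a model could look like, here is a hypothetical Net whose attribute names (fc, agroupoflayer, lastlayer) mirror the snippet above:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 64)          # gets lr=1e-3 from its parameter group
        self.agroupoflayer = nn.Sequential(   # falls back to the global lr=5e-4
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
        )
        self.lastlayer = nn.Linear(32, 10)    # gets lr=4e-2 from its parameter group

    def forward(self, x):
        return self.lastlayer(self.agroupoflayer(self.fc(x)))

You can check the assignment afterwards with [g["lr"] for g in optim.param_groups], which should give [0.001, 0.0005, 0.04].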

How to change the learning rate in TensorFlow at the end of a batch?

ReduceLROnPlateau only adjusts the learning rate at the end of an epoch; it does not do so at the end of a batch. To do that you need to create a custom callback. If I understand you correctly, you want to reduce the learning rate by 5% at the end of each batch. The code below will do that for you. In the callback, model is the name of your compiled model. freq is an integer that determines how often the learning rate is adjusted: if freq=1 it is adjusted at the end of every batch, if freq=2 at the end of every other batch, and so on. reduction_pct is a float giving the percentage by which the learning rate is reduced. verbose is a boolean: if verbose=True, a printout occurs each time the learning rate is adjusted, showing the LR used for the just-completed batch and the LR that will be used for the next batch; if verbose=False no printout is generated.

import tensorflow as tf
from tensorflow import keras

class ADJLR_ON_BATCH(keras.callbacks.Callback):
    def __init__(self, model, freq, reduction_pct, verbose):
        super(ADJLR_ON_BATCH, self).__init__()
        self.model = model
        self.freq = freq
        self.reduction_pct = reduction_pct
        self.verbose = verbose
        self.adj_batch = freq                    # next (1-based) batch at which to adjust the lr
        self.factor = 1.0 - reduction_pct * .01  # multiplicative factor applied to the lr

    def on_train_batch_end(self, batch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))  # get the current learning rate
        if batch + 1 == self.adj_batch:
            new_lr = lr * self.factor
            tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)  # set the learning rate in the optimizer
            self.adj_batch += self.freq
            if self.verbose:
                print('\nat the end of batch ', batch + 1, ' lr was adjusted from ', lr, ' to ', new_lr)

Below is an example of using the callback with the values I believe you want to use:

model=your_model_name # variable name of your model
reduction_pct=5.0 # reduce lr by 5%
verbose=True # print out each time the LR is adjusted
frequency=1 # adjust LR at the end of every batch
callbacks=[ADJLR_ON_BATCH(model, frequency, reduction_pct, verbose)]
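
With those in place, the call to fit might look like this (train_ds stands in for whatever training data you actually use):

history = model.fit(train_ds, epochs=10, callbacks=callbacks)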

Remember to include callbacks=callbacks when you call model.fit, as shown above. Below is a sample of the resulting printout, starting with an LR of .001:

at the end of batch  1  lr was adjusted from  0.0010000000474974513  to  0.0009500000451225787
1/374 [..............................] - ETA: 1:14:55 - loss: 9.3936 - accuracy: 0.3333
at the end of batch 2 lr was adjusted from 0.0009500000160187483 to 0.0009025000152178108

at the end of batch 3 lr was adjusted from 0.0009025000035762787 to 0.0008573750033974647
3/374 [..............................] - ETA: 25:04 - loss: 9.1338 - accuracy: 0.4611
at the end of batch 4 lr was adjusted from 0.0008573749801144004 to 0.0008145062311086804

