How to set layer-wise learning rate in Tensorflow?
It can be achieved quite easily with 2 optimizers:
var_list1 = [...]  # variables from the first 5 layers
var_list2 = [...]  # the rest of the variables
train_op1 = tf.train.GradientDescentOptimizer(0.00001).minimize(loss, var_list=var_list1)
train_op2 = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=var_list2)
train_op = tf.group(train_op1, train_op2)
One disadvantage of this implementation is that tf.gradients(...) is computed twice inside the optimizers, so it may not be optimal in terms of execution speed. This can be mitigated by calling tf.gradients(...) explicitly, splitting the resulting list in two, and passing the corresponding gradients to each optimizer.
Related question: Holding variables constant during optimizer
EDIT: Added a more efficient but longer implementation:
var_list1 = [...]  # variables from the first 5 layers
var_list2 = [...]  # the rest of the variables
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
grads = tf.gradients(loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)
You can use tf.trainable_variables() to get all trainable variables and select from them. The difference is that in the first implementation tf.gradients(...) is called twice inside the optimizers, which may cause some redundant operations to be executed (e.g. the gradients for the first layers can reuse some of the computations done for the gradients of the following layers).
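For illustration, here is a minimal sketch of selecting the variables by name; the "first5/" scope prefix is an assumption about how your model names its variables:
all_vars = tf.trainable_variables()
# Assumed naming convention: the first five layers live under a "first5/" scope.
var_list1 = [v for v in all_vars if v.name.startswith('first5/')]
var_list2 = [v for v in all_vars if not v.name.startswith('first5/')]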
Layer-specific learning rate in Keras Model
You can use tfa.optimizers.MultiOptimizer from the tensorflow_addons package. Straight from the docs:
import tensorflow as tf
import tensorflow_addons as tfa
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(32),
])
optimizers = [
    tf.keras.optimizers.Adam(learning_rate=1e-4),
    tf.keras.optimizers.Adam(learning_rate=1e-2)
]
optimizers_and_layers = [(optimizers[0], model.layers[0]), (optimizers[1], model.layers[1:])]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)
model.compile(optimizer=optimizer, loss="mse")
Note "Each optimizer will optimize only the weights associated with its paired layer."
In TensorFlow, is it possible to use different learning rate for different part of the network?
You could use a similar approach to the one mentioned here.
Basically, set a different variable scope around each part of the network that you want to train with a separate learning rate, then:
optimizer1 = tf.train.AdagradOptimizer(0.0001)
optimizer2 = tf.train.AdagradOptimizer(0.01)
first_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     "scope/prefix/for/first/vars")
first_train_op = optimizer1.minimize(cost, var_list=first_train_vars)
second_train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                      "scope/prefix/for/second/vars")
second_train_op = optimizer2.minimize(cost, var_list=second_train_vars)
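For context, a minimal sketch of how those scopes might be set up; the layer sizes and the inputs tensor are illustrative assumptions:
with tf.variable_scope("scope/prefix/for/first/vars"):
    hidden = tf.layers.dense(inputs, 64, activation=tf.nn.relu)  # optimized with lr 0.0001
with tf.variable_scope("scope/prefix/for/second/vars"):
    logits = tf.layers.dense(hidden, 10)  # optimized with lr 0.01
# Run both training ops together, as in the first answer:
train_op = tf.group(first_train_op, second_train_op)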
How to apply layer-wise learning rate in Pytorch?
Here is the solution:
from torch.optim import Adam
model = Net()
optim = Adam(
    [
        {"params": model.fc.parameters(), "lr": 1e-3},
        {"params": model.agroupoflayer.parameters()},
        {"params": model.lastlayer.parameters(), "lr": 4e-2},
    ],
    lr=5e-4,
)
Parameters that are not passed to the optimizer will not be optimized, so you should list all layers or groups (or at least the layers you want to optimize). If you don't specify a learning rate for a group, it will use the global learning rate (5e-4).
The trick is to give names to the layers (or group them) when you create the model.
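For illustration, a minimal sketch of a model whose attribute names match the parameter groups above; Net, agroupoflayer, lastlayer, and the layer sizes are hypothetical:
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 256)  # per-group lr 1e-3
        self.agroupoflayer = nn.Sequential(  # falls back to the global lr 5e-4
            nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64)
        )
        self.lastlayer = nn.Linear(64, 10)  # per-group lr 4e-2

    def forward(self, x):
        return self.lastlayer(self.agroupoflayer(self.fc(x)))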
How to change Learning rate in Tensorflow after a batch end?
The ReduceLROnPlateau callback only adjusts the learning rate at the end of an epoch; it does not do so at the end of a batch. To do that you need to create a custom callback. If I understand you correctly, you want to reduce the learning rate by 5% at the end of each batch. The code below will do that for you. In the callback, model is the name of your compiled model. freq is an integer that determines how often the learning rate is adjusted: if freq=1 it is adjusted at the end of every batch, if freq=2 on every other batch, and so on. reduction_pct is a float giving the percentage by which the learning rate is reduced. verbose is a boolean: if verbose=True, a printout occurs each time the learning rate is adjusted, showing the LR used for the just-completed batch and the LR that will be used for the next batch; if verbose=False no printout is generated.
import tensorflow as tf
from tensorflow import keras

class ADJLR_ON_BATCH(keras.callbacks.Callback):
    def __init__(self, model, freq, reduction_pct, verbose):
        super(ADJLR_ON_BATCH, self).__init__()
        self.model = model
        self.freq = freq
        self.reduction_pct = reduction_pct
        self.verbose = verbose
        self.adj_batch = freq  # next batch at which to adjust the learning rate
        self.factor = 1.0 - reduction_pct * .01  # multiplier applied at each adjustment

    def on_train_batch_end(self, batch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))  # get the current learning rate
        if batch + 1 == self.adj_batch:
            new_lr = lr * self.factor
            tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)  # set the learning rate in the optimizer
            self.adj_batch += self.freq
            if self.verbose:
                print('\nat the end of batch ', batch + 1, ' lr was adjusted from ', lr, ' to ', new_lr)
Below is an example of using the callback with the values I believe you want to use:
model = your_model_name  # variable name of your model
reduction_pct = 5.0  # reduce the LR by 5%
verbose = True  # print out each time the LR is adjusted
frequency = 1  # adjust the LR at the end of every batch
callbacks = [ADJLR_ON_BATCH(model, frequency, reduction_pct, verbose)]
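For completeness, a hedged sketch of passing the callback to training; x_train and y_train are placeholders for your data:
model.fit(x_train, y_train, epochs=2, callbacks=callbacks)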
Remember to include callbacks=callbacks in model.fit, as shown above. Below is a sample of the resulting printout, starting with an LR of .001:
at the end of batch 1 lr was adjusted from 0.0010000000474974513 to 0.0009500000451225787
1/374 [..............................] - ETA: 1:14:55 - loss: 9.3936 - accuracy: 0.3333
at the end of batch 2 lr was adjusted from 0.0009500000160187483 to 0.0009025000152178108
at the end of batch 3 lr was adjusted from 0.0009025000035762787 to 0.0008573750033974647
3/374 [..............................] - ETA: 25:04 - loss: 9.1338 - accuracy: 0.4611
at the end of batch 4 lr was adjusted from 0.0008573749801144004 to 0.0008145062311086804