What Does model.train() Do in PyTorch

What does model.train() do in PyTorch?

model.train() tells your model that you are training it. This informs layers such as Dropout and BatchNorm, which are designed to behave differently during training and evaluation. For instance, in training mode BatchNorm updates a moving average of the batch statistics on each new batch, whereas in evaluation mode these updates are frozen.
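
To see this concretely, here is a minimal sketch (assuming a bare BatchNorm1d layer and random input, not any model from the answers) showing that the running statistics move only in train mode:

import torch
import torch.nn as nn

# BatchNorm1d updates its running statistics only in training mode.
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4)

bn.train()                      # training mode: forward pass updates running_mean/var
bn(x)
print(bn.running_mean)          # no longer the initial zeros

bn.eval()                       # evaluation mode: running statistics are frozen
frozen = bn.running_mean.clone()
bn(x)
print(torch.equal(bn.running_mean, frozen))  # True: no update happened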

More details:
model.train() sets the mode to train
(see source code). You can call either model.eval() or model.train(mode=False) to indicate that you are testing.
It is somewhat intuitive to expect the train function to train the model, but it does not do that. It just sets the mode.
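
As a quick illustration (a minimal sketch with a made-up two-layer model), calling train() or eval() only flips the training flag; no forward pass or parameter update happens:

import torch.nn as nn

# train()/eval() only toggle the `training` flag on the module and its children.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))

print(model.training)           # True: modules are constructed in training mode
model.eval()
print(model.training)           # False
model.train()                   # equivalent to model.train(mode=True)
print(model.training)           # True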

What does model.eval() do in PyTorch?

model.eval() is a kind of switch for specific layers/parts of the model that behave differently during training and inference (evaluation) time, for example Dropout layers and BatchNorm layers. You need to turn them off during model evaluation, and .eval() will do it for you. In addition, the common practice for evaluation/validation is to use torch.no_grad() together with model.eval() to turn off gradient computation:

# evaluate model:
model.eval()

with torch.no_grad():
    ...
    out_data = model(data)
    ...

But don't forget to switch back to training mode after the eval step:

# training step
...
model.train()
...
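
Putting both together, here is a hedged sketch of the usual per-epoch pattern (model, loss_fn, optimizer, train_loader, val_loader, and num_epochs are assumed names, not taken from the answers above):

import torch

for epoch in range(num_epochs):
    model.train()                        # training mode for the optimization step
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(data), target)
        loss.backward()
        optimizer.step()

    model.eval()                         # eval behavior for Dropout/BatchNorm
    with torch.no_grad():                # also skip gradient bookkeeping
        for data, target in val_loader:
            out_data = model(data)
            # ... compute validation metrics from out_data ...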

Understanding model training and evaluation in PyTorch

  1. For each epoch, you are doing training, followed by validation/test.
    For validation/test you are moving the model to evaluation mode
    using model.eval() and then doing the forward pass under
    torch.no_grad(), which is correct. Likewise, you are switching the
    model back to train mode using model.train() at the start of
    training. There is no issue with the code and you are using the model
    modes correctly.

  2. In your code, if adaptive_lr is False then you are optimizing the
    parameters given by model.parameters(), and when adaptive_lr
    is True then you are optimizing (see the sketch after this list):

    • model.weight
    • model.linear1.parameters()
    • model.linear2.parameters()
    • model.layers.parameters()
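
A hedged sketch of those two branches (the attribute names model.weight, model.linear1, model.linear2, and model.layers come from the question and are hypothetical here; SGD stands in for whatever optimizer the code actually uses):

import torch

if not adaptive_lr:
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
else:
    # Each dict becomes its own parameter group and may carry its own settings.
    optimizer = torch.optim.SGD([
        {'params': [model.weight]},
        {'params': model.linear1.parameters()},
        {'params': model.linear2.parameters()},
        {'params': model.layers.parameters()},
    ], lr=0.01)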

Does model.train() put everything in train mode in PyTorch, even sub-networks?

By the look of it, if you call train() on a module, it will call train() recursively on all children. So model.train(), with model being the model containing the ResNet, will suffice.
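
A minimal sketch (with a made-up wrapper module, not the actual ResNet setup) confirming that children follow the parent's mode:

import torch.nn as nn

# train()/eval() on the parent recurses into every child module.
class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))
        self.head = nn.Linear(8, 2)

model = Wrapper()
model.eval()
print(all(not m.training for m in model.modules()))  # True: all children in eval

model.train()
print(all(m.training for m in model.modules()))      # True: all back in train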

Which PyTorch modules are affected by model.eval() and model.train()?

In addition to the info provided by @iacob:

Base class           Module                                                Criteria
RNNBase              RNN, LSTM, GRU                                        dropout > 0 (default: 0)
Transformer layers   Transformer, TransformerEncoder, TransformerDecoder   dropout > 0 (Transformer default: 0.1)
Lazy variants        LazyBatchNorm (currently nightly, merged PR)          track_running_stats=True
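
As a small illustration of the asymmetry the table describes (a minimal sketch using a bare nn.Dropout layer):

import torch
import torch.nn as nn

# Dropout zeroes activations in train mode but is the identity in eval mode.
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()
print(drop(x))                  # roughly half the entries zeroed, survivors scaled by 2

drop.eval()
print(torch.equal(drop(x), x))  # True: dropout is a no-op in eval mode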

Confusion about model.train()

Train mode or eval mode only matters when you have modules that behave asymmetrically in training and testing (e.g. BatchNorm, Dropout). I would like to emphasize that it does not affect gradient computation at all. Even with asymmetrical modules, one can perfectly well train a model in eval mode; some do this to save memory when training with a pretrained ImageNet model.

If you don't have any asymmetrical modules, it does not matter at all.

By default, all modules start with training=True.
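
A minimal sketch of the point about gradients (using a bare linear layer): eval mode does not disable autograd, so backward still works and a model can indeed be trained in eval mode:

import torch
import torch.nn as nn

# eval() only flips the training flag; it does not touch autograd.
model = nn.Linear(4, 1)
model.eval()

loss = model(torch.randn(2, 4)).sum()
loss.backward()                          # gradients are still computed
print(model.weight.grad is not None)     # True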


