How to Initialize Weights in PyTorch

How do I initialize weights in PyTorch?

Single layer

To initialize the weights of a single layer, use a function from torch.nn.init (the in-place variants end with a trailing underscore). For instance:

conv1 = torch.nn.Conv2d(...)
torch.nn.init.xavier_uniform_(conv1.weight)

Alternatively, you can modify the parameters by writing to conv1.weight.data (which is a torch.Tensor). Example:

conv1.weight.data.fill_(0.01)

The same applies for biases:

conv1.bias.data.fill_(0.01)
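Putting these together, here is a minimal runnable sketch (the layer dimensions are arbitrary):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 16, kernel_size=3)  # arbitrary example dimensions
nn.init.xavier_uniform_(conv1.weight)    # in-place init from torch.nn.init
conv1.bias.data.fill_(0.01)              # constant bias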

nn.Sequential or custom nn.Module

Pass an initialization function to torch.nn.Module.apply. It will initialize the weights in the entire nn.Module recursively.

apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch-nn-init).

Example:

def init_weights(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)
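The same pattern works for a custom nn.Module, since apply recurses through all submodules. A small sketch, reusing init_weights from above (Net is a hypothetical two-layer model):

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 2)
        self.fc2 = nn.Linear(2, 1)

net = Net()
net.apply(init_weights)  # visits net, net.fc1, and net.fc2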

How to initialize weights in a PyTorch model

You are deciding how to initialize the weights by checking whether the class name includes Conv, using classname.find('Conv'). Your class is named upConv, which contains Conv, so the function tries to initialize its .weight attribute, which doesn't exist.

Either rename your class or make the condition stricter, such as classname.find('Conv2d'). The strictest approach is to check whether the module is an instance of nn.Conv2d instead of looking at the name of the class:

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        m.weight.data.normal_(0.0, 0.02)
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
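As with the apply example above, this function is meant to be passed to the model's apply method (model here is a hypothetical network containing Conv2d and BatchNorm2d layers):

model.apply(weights_init)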

How to initialize (and sanity-check) weights efficiently in layers within complex (nested) modules in PyTorch?

You can simply iterate over all submodules, at the end of your __init__ method:

class Generator(nn.Module):
    def __init__(self, ....):
        # all code here
        # ...
        # init weights, at the very bottom of __init__
        for sm in self.modules():
            if isinstance(sm, nn.Conv2d):
                # only conv2d will be initialized in this way
                torch.nn.init.normal_(sm.weight.data, 0.0, 0.02)

done.
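As for the sanity check, one quick way (a sketch, assuming the model is already constructed) is to iterate over the modules again and print per-layer weight statistics:

import torch.nn as nn

def check_init(model):
    for name, sm in model.named_modules():
        if isinstance(sm, nn.Conv2d):
            print(f"{name}: mean={sm.weight.mean().item():.4f}, "
                  f"std={sm.weight.std().item():.4f}")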

Follow-up to: In PyTorch, how are layer weights and biases initialized by default?

Convolutional modules such as nn.Conv1d, nn.Conv2d, and nn.Conv3d inherit from the _ConvNd class. This class has a reset_parameters function implemented just like nn.Linear:

def reset_parameters(self) -> None:
    # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
    # uniform(-1/sqrt(k), 1/sqrt(k)), where k = weight.size(1) * prod(*kernel_size)
    # For more details see:
    # https://github.com/pytorch/pytorch/issues/15314#issuecomment-477448573
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
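Since reset_parameters is an ordinary method, you can also call it yourself to restore a layer to its default initialization; a small sketch:

import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)
conv.weight.data.fill_(0.0)  # clobber the weights
conv.reset_parameters()      # re-run the default Kaiming-uniform init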

As for nn.BatchNorm2d, it has reset_parameters and reset_running_stats functions:

def reset_parameters(self) -> None:
    self.reset_running_stats()
    if self.affine:
        init.ones_(self.weight)
        init.zeros_(self.bias)

def reset_running_stats(self) -> None:
    if self.track_running_stats:
        # running_mean/running_var/num_batches... are registered at runtime depending
        # if self.track_running_stats is on
        self.running_mean.zero_()  # type: ignore[operator]
        self.running_var.fill_(1)  # type: ignore[operator]
        self.num_batches_tracked.zero_()  # type: ignore[operator]
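You can verify these defaults directly after construction; a quick check:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)
assert torch.all(bn.weight == 1) and torch.all(bn.bias == 0)
assert torch.all(bn.running_mean == 0) and torch.all(bn.running_var == 1)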

When does PyTorch initialize parameters?

For the basic layers (e.g., nn.Conv2d, nn.Linear, etc.), the parameters are initialized by the layer's __init__ method.

For example, look at the source code of class _ConvNd(Module) (the class from which all other convolution layers are derived). At the bottom of its __init__ it calls self.reset_parameters(), which initializes the weights.

Therefore, if your nn.Module has no "independent" nn.Parameters of its own, only trainable parameters inside its sub-nn.Modules, then all weights of the submodules are initialized as those submodules are constructed.
That is, once you call h_model = H_model(), the weights of h_model are already initialized to their default values. Calling h_model.load_state_dict(...) then overrides these values with the desired pre-trained weights.
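A quick way to see this (a sketch with a throwaway model; the checkpoint path is hypothetical):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
print(model[0].weight)  # already initialized by the Linear layer's __init__
# loading a state dict afterwards simply overwrites these values:
# model.load_state_dict(torch.load("checkpoint.pt"))  # hypothetical path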

Initialize weights in a PyTorch neural net

type(param) returns torch.nn.parameter.Parameter for every weight in the model, regardless of which layer it belongs to, and the names yielded by named_parameters() are not very informative for an nn.Sequential-based model either. To find the layers that are actually nn.Conv2d instances, inspect the modules with isinstance:

for layer in D.modules():
    if isinstance(layer, nn.Conv2d):
        layer.weight.data.normal_(...)

Or, as recommended by Soumith Chintala himself, just loop through the main module directly:

for layer in D.main:
    if isinstance(layer, nn.Conv2d):
        layer.weight.data.normal_(...)

I actually prefer the first, because you don't have to name the exact nn.Sequential module and it searches every module in the model, but either one should do the job for you.

Create a new model in PyTorch with custom initial values for the weights

You can simply use torch.nn.Parameter() to assign a custom weight to a layer of your network.

In your case:

model.fc1.weight = torch.nn.Parameter(custom_weight)

torch.nn.Parameter: A kind of Tensor that is to be considered a module parameter.

For Example:

# Classifier model
model = Classifier()

# your custom weight, here taken at random
custom_weight = torch.rand(model.fc1.weight.shape)
custom_weight.shape  # torch.Size([128, 784])

# before assigning the custom weight
print(model.fc1.weight)
Parameter containing:
tensor([[ 1.6920e-02, 4.6515e-03, -1.0214e-02, ..., -7.6517e-03,
2.3892e-02, -8.8965e-03],
...,
[-2.3137e-02, 5.8483e-03, 4.4392e-03, ..., -1.6159e-02,
7.9369e-03, -7.7326e-03]])

# assign custom weight to first layer
model.fc1.weight = torch.nn.Parameter(custom_weight)

# after assigning the custom weight
model.fc1.weight
Parameter containing:
tensor([[ 0.1724, 0.7513, 0.8454, ..., 0.8780, 0.5330, 0.5847],
[ 0.8500, 0.7687, 0.3371, ..., 0.7464, 0.1503, 0.7720],
[ 0.8514, 0.6530, 0.6261, ..., 0.7867, 0.9312, 0.3890],
...,
[ 0.5426, 0.7655, 0.1191, ..., 0.4343, 0.2500, 0.6207],
[ 0.2310, 0.4260, 0.4138, ..., 0.1168, 0.5946, 0.2505],
[ 0.4220, 0.5500, 0.6282, ..., 0.5921, 0.7953, 0.9997]])
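A related note: replacing the attribute creates a brand-new Parameter object, so an optimizer already holding a reference to the old one will not see the change. To keep the original Parameter and just overwrite its values, you can copy in place instead; a small sketch:

with torch.no_grad():
    model.fc1.weight.copy_(custom_weight)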

