Tutorial 10: Weight initialization¶
During training, a proper initialization strategy helps speed up training or reach higher performance. MMCV provides some commonly used methods for initializing modules such as nn.Conv2d. Model initialization in MMDetection mainly uses init_cfg. Users can initialize models in the following two steps:
- Define init_cfg for a model or its components in model_cfg. The init_cfg of a child component has higher priority and overrides the init_cfg of its parent modules.
- Build the model as usual, then call the model.init_weights() method explicitly; the model parameters will be initialized as configured.
The high-level workflow of initialization in MMDetection is:
model_cfg(init_cfg) -> build_from_cfg -> model -> init_weights() -> initialize(self, self.init_cfg) -> children's init_weights()
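The priority rule and the workflow above can be sketched with a toy class. This is a simplified stand-in written in plain Python, not MMCV's implementation: ToyModule, apply_cfg and the value attribute are hypothetical names, and only a constant value is tracked instead of real parameters.

```python
class ToyModule:
    """Hypothetical stand-in for a BaseModule-like class (not MMCV code)."""

    def __init__(self, name, init_cfg=None, children=()):
        self.name = name
        self.init_cfg = init_cfg
        self.children = list(children)
        self.value = None  # stands in for the module's parameters

    def apply_cfg(self, cfg):
        # A module's cfg reaches the module itself and everything below it.
        self.value = cfg['val']
        for child in self.children:
            child.apply_cfg(cfg)

    def init_weights(self):
        # Step 1: apply this module's own init_cfg to its whole subtree.
        if self.init_cfg is not None:
            self.apply_cfg(self.init_cfg)
        # Step 2: each child then runs its own init_weights(), so a child's
        # init_cfg is applied last and overrides the parent's.
        for child in self.children:
            child.init_weights()


head = ToyModule('head', init_cfg=dict(type='Constant', val=2))
neck = ToyModule('neck')
model = ToyModule('model',
                  init_cfg=dict(type='Constant', val=1),
                  children=[head, neck])
model.init_weights()
print(model.value, neck.value, head.value)
```

Here neck has no init_cfg of its own and keeps the parent's value, while head ends up with the value from its own init_cfg.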
Description¶
init_cfg is a dict or list[dict] and contains the following keys and values:
- type (str): the initializer name in INITIALIZERS, followed by the arguments of the initializer.
- layer (str or list[str]): the names of basic layers in PyTorch or MMCV with learnable parameters that will be initialized, e.g. 'Conv2d', 'DeformConv2d'.
- override (dict or list[dict]): the sub-modules that do not inherit from BaseModule and whose initialization configuration differs from that of the layers listed under the 'layer' key. The initializer defined in type applies to all layers defined in layer, so if a sub-module is not a derived class of BaseModule but can be initialized in the same way as the layers in layer, it does not need override. override contains:
  - type, followed by the arguments of the initializer;
  - name, to indicate the sub-module that will be initialized.
Initialize parameters¶
Inherit a new model from mmcv.runner.BaseModule or mmdet.models. Here we show an example of FooModel.
import torch.nn as nn
from mmcv.runner import BaseModule

class FooModel(BaseModule):
    def __init__(self,
                 arg1,
                 arg2,
                 init_cfg=None):
        # init_cfg is handed to BaseModule, which keeps it for init_weights()
        super(FooModel, self).__init__(init_cfg)
        ...
- Initialize model by using init_cfg directly in code
import torch.nn as nn
from mmcv.runner import BaseModule
# or directly inherit mmdet models

class FooModel(BaseModule):
    def __init__(self,
                 arg1,
                 arg2,
                 init_cfg=XXX):  # XXX is a placeholder for a default init_cfg
        super(FooModel, self).__init__(init_cfg)
        ...
- Initialize model by using init_cfg directly in mmcv.Sequential or mmcv.ModuleList code
from mmcv.runner import BaseModule, ModuleList

class FooModel(BaseModule):
    def __init__(self,
                 arg1,
                 arg2,
                 init_cfg=None):
        super(FooModel, self).__init__(init_cfg)
        ...
        # XXX is a placeholder for the init_cfg of this container
        self.conv1 = ModuleList(init_cfg=XXX)
- Initialize model by using init_cfg in config file
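As a minimal sketch of this option, a config file might pass init_cfg next to the model's other arguments. The values given for arg1 and arg2 and the choice of initializer are placeholders assumed for illustration; they are not taken from a real MMDetection config.

```python
# Hypothetical config fragment: the arg1/arg2 values and the initializer
# settings are illustrative assumptions only.
model = dict(
    type='FooModel',
    arg1=1,
    arg2=2,
    init_cfg=dict(type='Constant', layer='Conv2d', val=1))
```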
Usage of init_cfg¶
- Initialize model by layer key
If we only define layer, only the layers listed in the layer key are initialized.
NOTE: The value of the layer key must be the class name of a PyTorch layer that has weight and bias attributes, so a layer such as MultiheadAttention is not supported.
- Define layer key for initializing a module with the same configuration.
init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
# initialize whole module with same configuration
- Define layer key for initializing layers with different configurations.
init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
            dict(type='Constant', layer='Conv2d', val=2),
            dict(type='Constant', layer='Linear', val=3)]
# nn.Conv1d will be initialized with dict(type='Constant', val=1)
# nn.Conv2d will be initialized with dict(type='Constant', val=2)
# nn.Linear will be initialized with dict(type='Constant', val=3)
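The mapping from layer names to initializer settings described above can be sketched in plain Python. The helper matching_cfgs is hypothetical and is not part of MMCV; as a simplification it compares exact class names only, and it just reports which entries of init_cfg would apply to a given layer class.

```python
def matching_cfgs(layer_name, init_cfg):
    """Return the initializer settings whose layer key lists layer_name."""
    # Accept both the dict form and the list[dict] form of init_cfg.
    cfgs = [init_cfg] if isinstance(init_cfg, dict) else list(init_cfg)
    matches = []
    for cfg in cfgs:
        layers = cfg.get('layer') or []
        if isinstance(layers, str):
            layers = [layers]
        if layer_name in layers:
            # Keep only the initializer type and its arguments.
            matches.append({k: v for k, v in cfg.items()
                            if k not in ('layer', 'override')})
    return matches


init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
            dict(type='Constant', layer='Conv2d', val=2),
            dict(type='Constant', layer='Linear', val=3)]

print(matching_cfgs('Conv2d', init_cfg))
print(matching_cfgs('MultiheadAttention', init_cfg))
```

A class name that does not appear under any layer key gets an empty result, which corresponds to the layer being left untouched by init_cfg.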
- Initialize model by override key
- When initializing a specific part by its attribute name, we can use the override key; the value in override takes precedence over the value in init_cfg.
# layers:
# self.feat = nn.Conv1d(3, 1, 3)
# self.reg = nn.Conv2d(3, 3, 3)
# self.cls = nn.Linear(1,2)
init_cfg = dict(type='Constant',
                layer=['Conv1d','Conv2d'], val=1, bias=2,
                override=dict(type='Constant', name='reg', val=3, bias=4))
# self.feat (nn.Conv1d) will be initialized with dict(type='Constant', val=1, bias=2)
# self.cls (nn.Linear) is not listed in layer, so this init_cfg does not initialize it
# The module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)
- If layer is None in init_cfg, only the sub-module with the name in override will be initialized, and type and other args in override can be omitted.
# layers:
# self.feat = nn.Conv1d(3, 1, 3)
# self.reg = nn.Conv2d(3, 3, 3)
# self.cls = nn.Linear(1,2)
init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
# self.feat and self.cls will keep PyTorch's default initialization
# The module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)
- If we don't define the layer key or the override key, nothing will be initialized.
- Invalid usage
# Invalid: override does not have a name key
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
                override=dict(type='Constant', val=3, bias=4))
# Also invalid: override has name and other args but no type
init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
                override=dict(name='reg', val=3, bias=4))
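The override rules above (name is required; type may be omitted only when no other args are given) can be sketched as a small plain-Python check. resolve_override is a hypothetical helper written for illustration and is not MMCV's own code; it returns the settings each named sub-module would receive, or raises ValueError for the invalid forms.

```python
import copy


def resolve_override(init_cfg):
    """Return (name, settings) pairs for the override entries of init_cfg."""
    # Outer initializer settings, without the layer and override keys.
    base = {k: v for k, v in init_cfg.items()
            if k not in ('layer', 'override')}
    overrides = init_cfg.get('override', [])
    if isinstance(overrides, dict):
        overrides = [overrides]
    resolved = []
    for override in overrides:
        args = copy.deepcopy(override)
        name = args.pop('name', None)
        if name is None:
            raise ValueError('override must contain the key "name"')
        if not args:
            # Only name was given: reuse the outer type and arguments.
            args = dict(base)
        elif 'type' not in args:
            raise ValueError('override with other args needs a "type" key')
        resolved.append((name, args))
    return resolved


valid = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
print(resolve_override(valid))
```

With only name in override, the sub-module 'reg' receives the outer type, val and bias, which matches the case where layer is None.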
- Initialize model with the pretrained model
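A minimal sketch of this option, assuming MMCV's 'Pretrained' initializer and its checkpoint argument. The checkpoint identifier below is only an illustrative assumption and should be replaced with the checkpoint you actually want to load.

```python
# Assumption: 'torchvision://resnet50' is used purely as an example value
# for the checkpoint argument of the 'Pretrained' initializer.
init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50')
```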
For more details, refer to the documentation in MMCV and MMCV PR #780.