Fine-tuning with frozen weights in nnUNet - PyTorch

Good morning,
I've followed the instructions in this GitHub issue:
https://github.com/MIC-DKFZ/nnUNet/issues/1108
to fine-tune an nnUNet model (PyTorch) starting from a pre-trained one, but this method retrains all of the weights. I would like to freeze all the weights and retrain only the last layer's weights, changing the number of segmentation classes from 3 to 1.
Do you know a way to do that?
Thank you in advance

To freeze the weights you need to set parameter.requires_grad = False.
Example:
from nnunet.network_architecture.generic_UNet import Generic_UNet

model = Generic_UNet(input_channels=3, base_num_features=64, num_classes=4, num_pool=3)
for name, parameter in model.named_parameters():
    if 'seg_outputs' in name:
        print(f"parameter '{name}' will not be frozen")
        parameter.requires_grad = True
    else:
        parameter.requires_grad = False
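After freezing, it is usually worth building the optimizer from the trainable parameters only. A minimal sketch, assuming the model above; the optimizer type and learning rate are placeholders, not taken from the original answer:

import torch

# Keep only the parameters that still require gradients (here: the seg_outputs head)
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=1e-2, momentum=0.9)  # placeholder hyperparameters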
To check parameter names you can use print:
print(model)
which produces:
Generic_UNet(
  (conv_blocks_localization): ModuleList(
    (0): Sequential(
      (0): StackedConvLayers(
        (blocks): Sequential(
          (0): ConvDropoutNormNonlin(
            (conv): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (instnorm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
          )
        )
      )
      (1): StackedConvLayers(
        (blocks): Sequential(
          (0): ConvDropoutNormNonlin(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (instnorm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
          )
        )
      )
    )
  )
  (conv_blocks_context): ModuleList(
    (0): StackedConvLayers(
      (blocks): Sequential(
        (0): ConvDropoutNormNonlin(
          (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (dropout): Dropout2d(p=0.5, inplace=True)
          (instnorm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
        )
        (1): ConvDropoutNormNonlin(
          (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (dropout): Dropout2d(p=0.5, inplace=True)
          (instnorm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
        )
      )
    )
    (1): Sequential(
      (0): StackedConvLayers(
        (blocks): Sequential(
          (0): ConvDropoutNormNonlin(
            (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (dropout): Dropout2d(p=0.5, inplace=True)
            (instnorm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
          )
        )
      )
      (1): StackedConvLayers(
        (blocks): Sequential(
          (0): ConvDropoutNormNonlin(
            (conv): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (dropout): Dropout2d(p=0.5, inplace=True)
            (instnorm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (lrelu): LeakyReLU(negative_slope=0.01, inplace=True)
          )
        )
      )
    )
  )
  (td): ModuleList(
    (0): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (tu): ModuleList(
    (0): Upsample()
  )
  (seg_outputs): ModuleList(
    (0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
Or you can visualize your network with netron:
https://github.com/lutzroeder/netron
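Since the question also asks about changing the number of segmentation classes, one option is to replace the final seg_outputs convolutions with fresh 1x1 convolutions that have the desired number of output channels. A minimal sketch, assuming the Generic_UNet instance from the example above and assuming nnU-Net counts the background as one of the classes (so one foreground class means two output channels); adjust new_num_classes to your setup:

import torch.nn as nn

new_num_classes = 2  # assumption: background + 1 foreground class

for i, old_conv in enumerate(list(model.seg_outputs)):
    # Mirror the original 1x1 segmentation convolutions, only with a new channel count
    model.seg_outputs[i] = nn.Conv2d(old_conv.in_channels, new_num_classes,
                                     kernel_size=1, stride=1, bias=False)
    model.seg_outputs[i].requires_grad_(True)  # the new head stays trainable

The new head is randomly initialized, so it still needs to be trained on your data even though everything else is frozen.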

Related

Can modifying a layer in a pretrained model break the residual connections?

I'm currently working on an image recognition task using SE-ResNet-50 as my backbone network.
The pretrained SE-ResNet-50 model was obtained from Timm (https://paperswithcode.com/model/se-resnet?variant=seresnet50).
My aim is to replace the SE blocks (also known as attention blocks) with alternative attention blocks such as CBAM or dual attention. The script I use is:
model = timm.create_model('seresnet50', pretrained=True)
# search every se blocks in seresnet
seblocks = [name.split('.') for name, _ in model.named_modules() if name.split('.')[-1] == 'se']
for *parent, k in seblocks:
    block = model.get_submodule('.'.join(parent))
    block.se = CBAM(block.se.fc1.in_channels)
before:
(2): Bottleneck(
  ...
  (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-> (se): SEModule(
    (fc1): Conv2d(2048, 128, kernel_size=(1, 1), stride=(1, 1))
    (bn): Identity()
    (act): ReLU(inplace=True)
    (fc2): Conv2d(128, 2048, kernel_size=(1, 1), stride=(1, 1))
    (gate): Sigmoid()
  )
  (act3): ReLU(inplace=True)
)
after:
(2): Bottleneck(
  ...
  (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-> (se): CBAM(
    (ChannelGate): ChannelGate(
      (mlp): Sequential(
        (0): Flatten()
        (1): Linear(in_features=2048, out_features=128, bias=True)
        (2): ReLU()
        (3): Linear(in_features=128, out_features=2048, bias=True)
      )
    )
    (SpatialGate): SpatialGate(
      (compress): ChannelPool()
      (spatial): BasicConv(
        (conv): Conv2d(2, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
        (bn): BatchNorm2d(1, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
      )
    )
  )
  (act3): ReLU(inplace=True)
)
The replacement of the SE block with the CBAM block was successful.
I have two questions:
Will this replacement affect the residual connection?
How do I modify the module name, changing (SE) to (CBAM)?
thanks
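A quick shape sanity check, not from the original post, can at least confirm that the swapped-in blocks are compatible with the tensors the residual additions consume. A minimal sketch, assuming the modified model from the script above:

import torch

model.eval()
with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)  # assumption: standard ImageNet-sized input
    out = model(dummy)
print(out.shape)  # a successful forward pass means every CBAM output fit back into its Bottleneck

As for the renaming question, the Bottleneck's forward typically refers to the block through its attribute name (self.se), so exposing it as (cbam) in the printout would also require changing that forward code; keeping the attribute name se is the simpler route.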

Pytorch Architecture Failure

I am trying to build my own ResNet architecture.
I have the following ResNet architecture:
ResNetClassifier(
(feature_extractor): Sequential(
(0): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(2): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(3): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(4): AvgPool2d(kernel_size=2, stride=2, padding=0)
(5): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(6): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
(classifier): Sequential(
(0): Linear(in_features=518400, out_features=100, bias=True)
(1): LeakyReLU(negative_slope=0.01)
(2): Linear(in_features=100, out_features=100, bias=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Linear(in_features=100, out_features=10, bias=True)
)
)
When sending a tensor of size torch.Size([1, 3, 100, 100])
with the following code:
net = cnn.ResNetClassifier(
    in_size=(3,100,100), out_classes=10, channels=[32, 64]*3,
    pool_every=4, hidden_dims=[100]*2,
    activation_type='lrelu', activation_params=dict(negative_slope=0.01),
    pooling_type='avg', pooling_params=dict(kernel_size=2),
    batchnorm=True, dropout=0.1,
)
print(net)
torch.manual_seed(seed)
test_image = torch.randint(low=0, high=256, size=(3, 100, 100), dtype=torch.float).unsqueeze(0)
test_out = net(test_image)
print('out =', test_out)
it fails with the following error:
"RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[1, 64, 100, 100] to have 3 channels, but got 64 channels instead"
Any clue to solve it will be appreciated!
BTW, what is the recommended way to debug such network errors?
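One common way to debug shape mismatches like this is to print the input and output shapes of every leaf module with forward hooks, so you can see exactly where a 64-channel tensor meets a convolution expecting 3 channels. A minimal sketch, assuming the net and input size from the snippet above:

import torch

def shape_logger(module, inputs, output):
    # Report each leaf module together with its input/output shapes
    in_shapes = [tuple(t.shape) for t in inputs if isinstance(t, torch.Tensor)]
    print(f"{module.__class__.__name__}: in={in_shapes} out={tuple(output.shape)}")

handles = [m.register_forward_hook(shape_logger)
           for m in net.modules() if len(list(m.children())) == 0]  # leaf modules only
try:
    net(torch.randn(1, 3, 100, 100))  # shapes are printed up to the layer that fails
finally:
    for h in handles:
        h.remove()

The last line printed before the RuntimeError points at the layer whose expected input channels do not match what the previous block produced.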

Apply hooks on inner layers of ResNet

The official PyTorch implementation of ResNet results in the following model:
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
)
### Skipping layers 3 and 4
(linear): Linear(in_features=512, out_features=10, bias=True)
)
I tried applying a hook to conv1 in the first BasicBlock of layer2 using
handle = model.layer2[0][0].register_forward_hook(batchout_pre_hook)
but got the following error:
TypeError: 'BasicBlock' object does not support indexing
I am able to apply a hook to the BasicBlock using handle = model.layer2[0].register_forward_hook(batchout_pre_hook), but I cannot apply a hook to the modules inside the BasicBlock.
For attaching a hook to conv1 in layer2's 0th block, you need to use
handle = model.layer2[0].conv1.register_forward_hook(batchout_pre_hook)
This is because inside the 0th block the modules are named conv1, bn1, etc., and are not a list that can be accessed via an index.
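For reference, a minimal sketch of what batchout_pre_hook could look like when registered as a forward hook (the body is illustrative; the original post does not show it):

def batchout_pre_hook(module, inputs, output):
    # A forward hook receives the module, its positional inputs, and its output
    print(module.__class__.__name__, tuple(output.shape))

handle = model.layer2[0].conv1.register_forward_hook(batchout_pre_hook)
# ... run a forward pass, then remove the hook when it is no longer needed
handle.remove()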

How to move data_parallel model to a specific cuda device?

I need to use a pretrained model placed on a specific CUDA device. The pretrained model is defined as below:
DataParallel(
(module): MobileFaceNet(
(conv1): Conv_block(
(conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=64)
)
(conv2_dw): Conv_block(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=64)
)
(conv_23): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(conv_3): Residual(
(model): Sequential(
(0): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(3): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
(conv_34): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(conv_4): Residual(
(model): Sequential(
(0): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(3): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(4): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(5): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
(conv_45): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=512)
)
(conv_dw): Conv_block(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=512, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=512)
)
(project): Linear_block(
(conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(conv_5): Residual(
(model): Sequential(
(0): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
(conv_6_sep): Conv_block(
(conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=512)
)
(conv_6_dw): Linear_block(
(conv): Conv2d(512, 512, kernel_size=(7, 7), stride=(1, 1), groups=512, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv_6_flatten): Flatten()
(linear): Linear(in_features=512, out_features=512, bias=False)
(bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
If I conventionally call
model.to(device)
with device set to cuda:1, then it raises an error when forwarding:
model(imgs)
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
I think this is because the model was previously trained with PyTorch's data parallel utilities.
How can I properly set the model to the device that I specifically want?
You should get the neural network out of DataParallel first.
Assuming your DataParallel is named model, you could do:
device = torch.device("cuda:1")
module = model.module.to(device)
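The inputs then have to live on the same device before calling the unwrapped module, for example:

import torch

device = torch.device("cuda:1")
module = model.module.to(device)  # unwrap DataParallel and move the weights
imgs = imgs.to(device)            # move the batch to the same device
with torch.no_grad():
    output = module(imgs)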

How to freeze selected layers of a model in Pytorch?

I am using MobileNetV2 and I only want to freeze part of the model. I know I can use the following code to freeze the entire model:
MobileNet = models.mobilenet_v2(pretrained=True)
for param in MobileNet.parameters():
    param.requires_grad = False
but I want everything from (15) onward to remain unfrozen. How can I selectively freeze everything before the desired layer?
(15): InvertedResidual(
(conv): Sequential(
(0): ConvBNReLU(
(0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(1): ConvBNReLU(
(0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(16): InvertedResidual(
(conv): Sequential(
(0): ConvBNReLU(
(0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(1): ConvBNReLU(
(0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(17): InvertedResidual(
(conv): Sequential(
(0): ConvBNReLU(
(0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(1): ConvBNReLU(
(0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(2): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
(3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(18): ConvBNReLU(
(0): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
)
(classifier): Sequential(
(0): Dropout(p=0.2, inplace=False)
(1): Linear(in_features=1280, out_features=1000, bias=True)
)
)
PyTorch's model implementations are nicely modularized, so just as you do
for param in MobileNet.parameters():
    param.requires_grad = False
you may also do
for param in MobileNet.features[15].parameters():
    param.requires_grad = True
afterwards to unfreeze the parameters in (15).
Loop from 15 to 18 to unfreeze the last several layers.
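For example, the loop could look like this (a sketch; the indices follow the printout above):

for layer_idx in range(15, 19):  # features (15) through (18)
    for param in MobileNet.features[layer_idx].parameters():
        param.requires_grad = True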
Just adding this here for completeness. You can also freeze parameters in place without iterating over them with requires_grad_ (API).
For example, say you have a RetinaNet and want to fine-tune just the heads:
class RetinaNet(torch.nn.Module):
    def __init__(self, ...):
        self.backbone = ResNet(...)
        self.fpn = FPN(...)
        self.box_head = torch.nn.Sequential(...)
        self.cls_head = torch.nn.Sequential(...)
Then you could freeze the backbone and FPN like this:
# Getting the model
retinanet = RetinaNet(...)
# Freezing backbone and FPN
retinanet.backbone.requires_grad_(False)
retinanet.fpn.requires_grad_(False)
If you want to define some layers by name and then unfreeze them, I propose a variant of @JVGD's answer:
class RetinaNet(torch.nn.Module):
    def __init__(self, ...):
        self.backbone = ResNet(...)
        self.fpn = FPN(...)
        self.box_head = torch.nn.Sequential(...)
        self.cls_head = torch.nn.Sequential(...)

# Getting the model
retinanet = RetinaNet(...)

# The param name is f'{module_name}.weight' or f'{module_name}.bias'.
# Some layers, e.g., batch norm, have additional params.
# In some circumstances, e.g., when using DataParallel(),
# the param name is prefixed by 'module.'.
params_to_train = ['cls_head.weight', 'cls_head.bias']
for name, param in retinanet.named_parameters():
    # Set True only for params in the list 'params_to_train'
    param.requires_grad = True if name in params_to_train else False
...
The advantage is that you can define all layers to unfreeze in one Iterable.
An optimized version of the first answer above is to freeze only the first 15 layers [0-14], because the later layers [15-18] are unfrozen by default (param.requires_grad = True).
Therefore, we only need to freeze features[0:15], which covers indices 0 through 14:
MobileNet = torchvision.models.mobilenet_v2(pretrained=True)
for param in MobileNet.features[0:15].parameters():
    param.requires_grad = False
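A quick way to double-check which parameters ended up trainable (a small sketch):

trainable = [name for name, p in MobileNet.named_parameters() if p.requires_grad]
print(len(trainable), "trainable parameter tensors")
print(trainable[:5])  # these should all come from features.15 and later, or the classifier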
