Related
I'm currently working on an image recognition task using SE-ResNet-50 as my backbone network.
The pretrained SE-ResNet-50 model was obtained from timm (https://paperswithcode.com/model/se-resnet?variant=seresnet50).
My aim is to replace the SE blocks (also known as attention blocks) with alternative attention blocks such as CBAM or dual attention. The script I use is:
import timm

model = timm.create_model('seresnet50', pretrained=True)

# Find every SE block in the network (module names ending in '.se')
se_blocks = [name.split('.') for name, _ in model.named_modules() if name.split('.')[-1] == 'se']
for *parent, k in se_blocks:
    block = model.get_submodule('.'.join(parent))
    # CBAM is defined elsewhere; it takes the number of input channels
    block.se = CBAM(block.se.fc1.in_channels)
before:
(2): Bottleneck(
...
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-> (se): SEModule(
(fc1): Conv2d(2048, 128, kernel_size=(1, 1), stride=(1, 1))
(bn): Identity()
(act): ReLU(inplace=True)
(fc2): Conv2d(128, 2048, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
)
(act3): ReLU(inplace=True)
)
after:
(2): Bottleneck(
...
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
-> (se): CBAM(
(ChannelGate): ChannelGate(
(mlp): Sequential(
(0): Flatten()
(1): Linear(in_features=2048, out_features=128, bias=True)
(2): ReLU()
(3): Linear(in_features=128, out_features=2048, bias=True)
)
)
(SpatialGate): SpatialGate(
(compress): ChannelPool()
(spatial): BasicConv(
(conv): Conv2d(2, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
(bn): BatchNorm2d(1, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
)
)
)
(act3): ReLU(inplace=True)
)
The replacement of the SE block with the CBAM block was successful.
I have two questions:
Will this replacement affect the residual connection?
How do I modify the module name, changing (se) to (cbam)?
Thanks!
I am trying to build my own ResNet architecture.
I have the following ResNet architecture:
ResNetClassifier(
(feature_extractor): Sequential(
(0): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(1): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(2): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(3): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(4): AvgPool2d(kernel_size=2, stride=2, padding=0)
(5): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(6): ResidualBlock(
(main_path): Sequential(
(0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Dropout2d(p=0.1, inplace=False)
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Dropout2d(p=0.1, inplace=False)
(6): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.01)
(8): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): Dropout2d(p=0.1, inplace=False)
(10): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): LeakyReLU(negative_slope=0.01)
(12): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): Dropout2d(p=0.1, inplace=False)
(14): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): LeakyReLU(negative_slope=0.01)
(16): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): Dropout2d(p=0.1, inplace=False)
(18): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): LeakyReLU(negative_slope=0.01)
(20): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(shortcut_path): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
(classifier): Sequential(
(0): Linear(in_features=518400, out_features=100, bias=True)
(1): LeakyReLU(negative_slope=0.01)
(2): Linear(in_features=100, out_features=100, bias=True)
(3): LeakyReLU(negative_slope=0.01)
(4): Linear(in_features=100, out_features=10, bias=True)
)
)
When I send a tensor of size torch.Size([1, 3, 100, 100]) with the following code:
net = cnn.ResNetClassifier(
in_size=(3,100,100), out_classes=10, channels=[32, 64]*3,
pool_every=4, hidden_dims=[100]*2,
activation_type='lrelu', activation_params=dict(negative_slope=0.01),
pooling_type='avg', pooling_params=dict(kernel_size=2),
batchnorm=True, dropout=0.1,
)
print(net)
torch.manual_seed(seed)
test_image = torch.randint(low=0, high=256, size=(3, 100, 100), dtype=torch.float).unsqueeze(0)
test_out = net(test_image)
print('out =', test_out)
it fails with the following error:
"RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[1, 64, 100, 100] to have 3 channels, but got 64 channels instead"
Any clue to solving it would be appreciated!
BTW, what is the recommended way to debug such network errors?
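One way to localize such shape mismatches is to attach forward pre-hooks that print the input shape each Conv2d is about to receive; the last shape printed before the exception identifies the offending layer. A minimal sketch, assuming the net and test_image from the snippet above:
import torch.nn as nn

def print_input_shape(module, inputs):
    # A forward pre-hook fires before the module runs, so even the
    # failing layer reports the shape it is about to receive.
    print(module, '<-', tuple(inputs[0].shape))

handles = [m.register_forward_pre_hook(print_input_shape)
           for m in net.modules() if isinstance(m, nn.Conv2d)]
try:
    net(test_image)
finally:
    for h in handles:
        h.remove()  # always detach the hooks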
The official PyTorch implementation of ResNet results in the following model:
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
)
### Skipping layers 3 and 4
(linear): Linear(in_features=512, out_features=10, bias=True)
)
I tried applying a hook to conv1 in the first BasicBlock of layer2 using
handle = model.layer2[0][0].register_forward_hook(batchout_pre_hook)
but got the following error:
TypeError: 'BasicBlock' object does not support indexing
I am able to apply a hook to the BasicBlock using handle = model.layer2[0].register_forward_hook(batchout_pre_hook), but I cannot apply hooks to the modules inside the BasicBlock.
To attach a hook to conv1 in layer2's 0th block, you need to use
handle = model.layer2[0].conv1.register_forward_hook(batchout_pre_hook)
This is because, inside the 0th block, the submodules are named conv1, bn1, etc.; they are attributes, not a list to be accessed via an index.
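The question does not show batchout_pre_hook itself, so here is a minimal sketch with the signature register_forward_hook expects (the body is an assumption):
def batchout_pre_hook(module, inputs, output):
    # Called after module.forward(); inputs is the tuple of positional
    # inputs, output is the value forward() returned.
    print(module.__class__.__name__, 'output shape:', tuple(output.shape))

handle = model.layer2[0].conv1.register_forward_hook(batchout_pre_hook)
# ... run forward passes; the hook fires on each one ...
handle.remove()  # detach the hook when done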
I need to use a pretrained model on a specific CUDA device. The pretrained model is defined as below:
DataParallel(
(module): MobileFaceNet(
(conv1): Conv_block(
(conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=64)
)
(conv2_dw): Conv_block(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=64)
)
(conv_23): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(conv_3): Residual(
(model): Sequential(
(0): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(3): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(conv_dw): Conv_block(
(conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=128)
)
(project): Linear_block(
(conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
(conv_34): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(conv_4): Residual(
(model): Sequential(
(0): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(3): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(4): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(5): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
(conv_45): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=512)
)
(conv_dw): Conv_block(
(conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=512, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=512)
)
(project): Linear_block(
(conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(conv_5): Residual(
(model): Sequential(
(0): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Depth_Wise(
(conv): Conv_block(
(conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(conv_dw): Conv_block(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
(bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=256)
)
(project): Linear_block(
(conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
)
(conv_6_sep): Conv_block(
(conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(prelu): PReLU(num_parameters=512)
)
(conv_6_dw): Linear_block(
(conv): Conv2d(512, 512, kernel_size=(7, 7), stride=(1, 1), groups=512, bias=False)
(bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(conv_6_flatten): Flatten()
(linear): Linear(in_features=512, out_features=512, bias=False)
(bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
If I conventionally declare
model.to(device)
with device set to cuda:1, it raises an error when forwarding:
model(imgs)
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
I think this is because the model was previously trained with PyTorch's data-parallel utilities.
How can I properly set the model on the specific device I want?
You should get the neural network out of the DataParallel wrapper first.
Assuming your DataParallel object is named model, you could do:
device = torch.device("cuda:1")
module = model.module.to(device)
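module is then the underlying MobileFaceNet living entirely on cuda:1. As a usage sketch (variable names taken from the question), the inputs must be moved to the same device before forwarding:
# Inputs must live on the same device as the module's parameters
imgs = imgs.to(device)
out = module(imgs)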
I am using MobileNetV2 and I only want to freeze part of the model. I know I can use the following code to freeze the entire model:
MobileNet = models.mobilenet_v2(pretrained=True)
for param in MobileNet.parameters():
    param.requires_grad = False
but I want everything from (15) onward to remain unfrozen. How can I selectively freeze everything before layer (15)?
(15): InvertedResidual(
(conv): Sequential(
(0): ConvBNReLU(
(0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(1): ConvBNReLU(
(0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(16): InvertedResidual(
(conv): Sequential(
(0): ConvBNReLU(
(0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(1): ConvBNReLU(
(0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(17): InvertedResidual(
(conv): Sequential(
(0): ConvBNReLU(
(0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(1): ConvBNReLU(
(0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
(2): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
(3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(18): ConvBNReLU(
(0): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU6(inplace=True)
)
)
(classifier): Sequential(
(0): Dropout(p=0.2, inplace=False)
(1): Linear(in_features=1280, out_features=1000, bias=True)
)
)
PyTorch's model implementations are well modularized, so just as you do
for param in MobileNet.parameters():
    param.requires_grad = False
you may also do
for param in MobileNet.features[15].parameters():
    param.requires_grad = True
afterwards to unfreeze the parameters in (15).
Loop from 15 to 18 to unfreeze the last several layers, as sketched below.
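A minimal sketch of that loop, assuming torchvision's mobilenet_v2 as in the question:
import torchvision.models as models

MobileNet = models.mobilenet_v2(pretrained=True)

# Freeze the whole network first ...
for param in MobileNet.parameters():
    param.requires_grad = False

# ... then unfreeze blocks (15) through (18); the slice end is exclusive
for layer in MobileNet.features[15:19]:
    for param in layer.parameters():
        param.requires_grad = True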
Just adding this here for completeness: you can also freeze parameters in place, without iterating over them, using requires_grad_ (API).
For example, say you have a RetinaNet and want to fine-tune just the heads:
class RetinaNet(torch.nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.backbone = ResNet(...)
        self.fpn = FPN(...)
        self.box_head = torch.nn.Sequential(...)
        self.cls_head = torch.nn.Sequential(...)
Then you could freeze the backbone and FPN like this:
# Getting the model
retinanet = RetinaNet(...)
# Freezing backbone and FPN
retinanet.backbone.requires_grad_(False)
retinanet.fpn.requires_grad_(False)
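As a follow-up usage sketch (the optimizer choice and learning rate here are hypothetical), one would then typically pass only the still-trainable parameters to the optimizer:
# Filtering avoids allocating optimizer state for the frozen backbone/FPN
optimizer = torch.optim.SGD(
    (p for p in retinanet.parameters() if p.requires_grad),
    lr=1e-3,  # hypothetical value
)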
If you want to define some layers by name and then unfreeze them, I propose a variant of @JVGD's answer:
class RetinaNet(torch.nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.backbone = ResNet(...)
        self.fpn = FPN(...)
        self.box_head = torch.nn.Sequential(...)
        self.cls_head = torch.nn.Sequential(...)
# Getting the model
retinanet = RetinaNet(...)
# The param name is f'{module_name}.weight' or f'{module_name}.bias'.
# Some layers, e.g., batch norm, have additional params.
# In some circumstances, e.g., when using DataParallel(),
# the param name is prefixed by 'module.'.
params_to_train = ['cls_head.weight', 'cls_head.bias']
for name, param in retinanet.named_parameters():
    # Set True only for params in the list 'params_to_train'
    param.requires_grad = name in params_to_train
...
The advantage is that you can define all the layers to unfreeze in a single iterable.
An optimization of the first answer above is to freeze only the first 15 layers [0-14], because the last layers [15-18] are unfrozen by default (param.requires_grad = True).
Therefore, we only need to code this way:
import torchvision

MobileNet = torchvision.models.mobilenet_v2(pretrained=True)

# Note: the slice end is exclusive, so features[0:15] covers blocks 0-14
for param in MobileNet.features[0:15].parameters():
    param.requires_grad = False
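As an optional sanity check (a sketch using the MobileNet variable above), you can confirm which feature blocks remain trainable:
for idx, layer in enumerate(MobileNet.features):
    trainable = any(p.requires_grad for p in layer.parameters())
    print(idx, 'trainable' if trainable else 'frozen')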