Trying to perform transposed convolution but missing a pixel - python-3.x

def get_unet(input_img, n_filters=16, dropout=0.5, batchnorm=True):
# contracting path
c1 = conv2d_block(input_img, n_filters=n_filters * 1, kernel_size=3, batchnorm=batchnorm)
p1 = MaxPooling2D((2, 2))(c1)
p1 = Dropout(dropout * 0.5)(p1)
c2 = conv2d_block(p1, n_filters=n_filters * 2, kernel_size=3, batchnorm=batchnorm)
p2 = MaxPooling2D((2, 2))(c2)
p2 = Dropout(dropout)(p2)
c3 = conv2d_block(p2, n_filters=n_filters * 4, kernel_size=3, batchnorm=batchnorm)
p3 = MaxPooling2D((2, 2))(c3)
p3 = Dropout(dropout)(p3)
c4 = conv2d_block(p3, n_filters=n_filters * 8, kernel_size=3, batchnorm=batchnorm)
p4 = MaxPooling2D(pool_size=(2, 2))(c4)
p4 = Dropout(dropout)(p4)
c5 = conv2d_block(p4, n_filters=n_filters * 16, kernel_size=3, batchnorm=batchnorm)
# expansive path
u6 = Conv2DTranspose(n_filters * 8, (3, 3), strides=(2, 2), padding='same')(c5)
u6 = concatenate([u6, c4])
u6 = Dropout(dropout)(u6)
c6 = conv2d_block(u6, n_filters=n_filters * 8, kernel_size=3, batchnorm=batchnorm)
u7 = Conv2DTranspose(n_filters * 4, (3, 3), strides=(2, 2), padding='same')(c6)
u7 = concatenate([u7, c3])
u7 = Dropout(dropout)(u7)
c7 = conv2d_block(u7, n_filters=n_filters * 4, kernel_size=3, batchnorm=batchnorm)
u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides=(2, 2), padding='same')(c7)
u8 = concatenate([u8, c2])
u8 = Dropout(dropout)(u8)
c8 = conv2d_block(u8, n_filters=n_filters * 2, kernel_size=3, batchnorm=batchnorm)
u9 = Conv2DTranspose(n_filters * 1, (3, 3), strides=(2, 2), padding='same')(c8)
u9 = concatenate([u9, c1], axis=3)
u9 = Dropout(dropout)(u9)
c9 = conv2d_block(u9, n_filters=n_filters * 1, kernel_size=3, batchnorm=batchnorm)
outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)
model = Model(inputs=[input_img], outputs=[outputs])
return model
I got this model for Keras from here. I seem to be getting the error:
File "train.py", line 87, in get_unet
u8 = concatenate([u8, c2])
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 256, 184, 32), (None, 256, 185, 32)]
So I printed the values of each of these Tensors, and I got:
c1: Tensor("activation_2/Relu:0", shape=(?, 512, 370, 16), dtype=float32)
c2: Tensor("activation_4/Relu:0", shape=(?, 256, 185, 32), dtype=float32)
c3: Tensor("activation_6/Relu:0", shape=(?, 128, 92, 64), dtype=float32)
c4: Tensor("activation_8/Relu:0", shape=(?, 64, 46, 128), dtype=float32)
c5: Tensor("activation_10/Relu:0", shape=(?, 32, 23, 256), dtype=float32)
u6: Tensor("dropout_5/cond/Merge:0", shape=(?, 64, 46, 256), dtype=float32)
u7: Tensor("dropout_6/cond/Merge:0", shape=(?, 128, 92, 128), dtype=float32)
u8: Tensor("conv2d_transpose_3/BiasAdd:0", shape=(?, ?, ?, 32), dtype=float32)
What happened at C2? Why is the second dimension of u8 184, while the second dimension of C2 seems to be 185. Furthermore, C3s second dimension seems to to be maxpooled by a factor of 2 from 184 (probably due to a floor function)
How would I combat this? Do I have to change the size of the images that are being inputted, or do I have to engineer something while doing the transpose convolution? Do I need to perform interpolation for the one extra pixel?

That's happening because your second dimension is not even when you divide it by 2 in your C2 layer.
You are maxpooling 185 by a factor of 2, which gives you 92.5 -> floor to 92
But when you do the operation in the other way, you are upsampling 92 by a factor of 2 which gives you 184.
To avoid this you can simply zeropad U8 to be compatible with C2, like this :
u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides=(2, 2), padding='same')(c7)
u8 = ZeroPadding2D(padding=((0, 0), (0, 1)))(u8)
u8 = concatenate([u8, c2])
If you don't want to zeropad, you can reshape your input images in order to have a dimension corresponding to a power of 2 or a dimension that can be divided by two multiple times without giving an odd number, like 224 (can be divided by two 5 times before giving 7).
Hope that will help you !

Related

Obtaining a specific shape using nn.Conv2d

Starting with an input shape like (64, 1, 103, 8) how should I set the parameters of nn.Conv2d to arrive at a shape of (64, 32, 43, 8)?
Currently I'm using the following
nn.Conv2d(in_channels=1,out_channels=32, stride=(2,1),kernel_size=(3,3),padding=(0,1),dilation=(9,1))
But I'm afraid that dilation parameter may cause bad performance.
You can use padding = (13, 1), stride = (3, 1) and kernel_size = 3:
nn.Conv2d(1, 32, 3, stride = (3, 1), padding = (13, 1))

how to fit the dimension in the autoencoder of Keras

I am using a convolutional autoencoder for the Mnist image data (with dimension 28*28), here is my code
input_img = Input(shape=(28, 28, 1))
x = Convolution2D(16, (5, 5), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Convolution2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Convolution2D(16, (5, 5), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Convolution2D(1, (3, 3), activation='sigmoid', padding='same')(x)
I get an error message (with padding ='same' at each layer)
ValueError: Error when checking target: expected conv2d_148 to have shape (32, 32, 1) but got array
with shape (28, 28, 1)
Here is my model summary
Layer (type) Output Shape Param #
input_20 (InputLayer) (None, 28, 28, 1) 0
conv2d_142 (Conv2D) (None, 28, 28, 16) 416
max_pooling2d_64 (MaxPooling (None, 14, 14, 16) 0
conv2d_143 (Conv2D) (None, 14, 14, 8) 1160
max_pooling2d_65 (MaxPooling (None, 7, 7, 8) 0
conv2d_144 (Conv2D) (None, 7, 7, 8) 584
max_pooling2d_66 (MaxPooling (None, 4, 4, 8) 0
conv2d_145 (Conv2D) (None, 4, 4, 8) 584
up_sampling2d_64 (UpSampling (None, 8, 8, 8) 0
conv2d_146 (Conv2D) (None, 8, 8, 8) 584
up_sampling2d_65 (UpSampling (None, 16, 16, 8) 0
conv2d_147 (Conv2D) (None, 16, 16, 16) 3216
up_sampling2d_66 (UpSampling (None, 32, 32, 16) 0
conv2d_148 (Conv2D) (None, 32, 32, 1) 145
Total params: 6,689
Trainable params: 6,689
Non-trainable params: 0
I know if I change the first layer to
x = Convolution2D(16, (3, 3), activation='relu', padding='same')(input_img)
It works but I want to use a 5*5 convolution.
How it happens?
You can increase your last filter size to (5, 5) to make this work:
from tensorflow.keras.layers import *
from tensorflow.keras import Model, Input
import numpy as np
input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (5, 5), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (5, 5), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (5, 5), activation='sigmoid', padding='valid')(x)
auto = Model(input_img, decoded)
auto.build(input_shape=(1, 28, 28, 1))
auto(np.random.rand(1, 28, 28, 1)).shape
TensorShape([1, 28, 28, 1])
Or, use tf.keras.Conv2DTranspose

PyTorch convolutional block - CIFAR10 - RuntimeError

I am using PyTorch 1.7 and Python 3.8 with CIFAR-10 dataset. I am trying to create a block with: conv -> conv -> pool -> fc. Fully connected layer (fc) has 256 neurons. The code for this is as follows:
# Testing-
conv1 = nn.Conv2d(
in_channels = 3, out_channels = 64,
kernel_size = 3, stride = 1,
padding = 1, bias = True
)
conv2 = nn.Conv2d(
in_channels = 64, out_channels = 64,
kernel_size = 3, stride = 1,
padding = 1, bias = True
)
pool = nn.MaxPool2d(
kernel_size = 2, stride = 2
)
fc1 = nn.Linear(
in_features = 64 * 16 * 16, out_features = 256
bias = True
)
images.shape
# torch.Size([32, 3, 32, 32])
x = conv1(images)
x.shape
# torch.Size([32, 64, 32, 32])
x = conv2(x)
x.shape
# torch.Size([32, 64, 32, 32])
x = pool(x)
x.shape
# torch.Size([32, 64, 16, 16])
# This line of code gives error-
x = fc1(x)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32768x16 and
16384x256)
What is going wrong?
You are nearly there! As you will have noticed nn.MaxPool returns a shape (32, 64, 16, 16) which is incompatible with a nn.Linear's input: a 2D dimensional tensor (batch, in_features). You need to broadcast to (batch, 64*16*16).
I would recommend using a nn.Flatten layer rather than broadcasting yourself. It will act as x.view(x.size(0), -1) but is clearer. By default it preserves the first dimension:
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
flatten = nn.Flatten()
fc1 = nn.Linear(in_features=64*16*16, out_features=256)
x = conv1(images)
x = conv2(x)
x = pool(x)
x = flatten(x)
x = fc1(x)
Alternatively, you could use the functional alternative torch.flatten, where you will have to provide the start_dim as 1: x = torch.flatten(x, start_dim=1).
When you're done debugging, you could assemble your layers with nn.Sequential:
model = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Linear(in_features=64*16*16, out_features=256)
)
x = model(images)
you need to flat the output of nn.MaxPool2d layer for giving input in nn.Linear layer.
try to use x = x.view(x.size(0), -1) before giving input to fc layer for flatten tensor.

RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[4, 32, 6, 7] to have 3 channels, but got 32 channels instead

I am trying to implement such CNN.
This is my implementation:
class Net(BaseFeaturesExtractor):
def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 256):
super(Net, self).__init__(observation_space, features_dim)
n_input_channels = observation_space.shape[0]
print("Observation space shape:"+str(observation_space.shape))
print("Number of channels:" + str(n_input_channels))
self.cnn = nn.Sequential(
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Flatten(),
nn.Linear(in_features=128,out_features=64),
nn.ReLU(),
nn.Linear(in_features=64,out_features=7),
nn.Sigmoid()
)
def forward(self, observations: th.Tensor) -> th.Tensor:
print("Observation shape:"+str(observations[0].shape))
return self.cnn(observations)
When I tried to run the code which uses this CNN, I am getting following log:
Observation space shape:(3, 6, 7)
Number of channels:3
Observation shape:torch.Size([3, 6, 7])
Traceback (most recent call last): File "/Users/joe/Documents/JUPYTER/ConnectX/training3.py", line 250, in <module>
learner.learn(total_timesteps=iterations, callback=eval_callback)
...
RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[4, 32, 6, 7] to have 3 channels, but got 32 channels instead
What is the problem here? How can I solve it?
in_channels of a conv layer should be equal to out_channels of the previous layer. In your case, in_channels of the 2nd and 3rd conv layers don't have the correct values. They should be like below,
self.cnn = nn.Sequential(
nn.Conv2d(n_input_channels, 32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
...
)
Also, you should check in_features of the 1st Linear layer. It depends on the input shape and should be equal to last_conv_out_channels * last_conv_output_height * last_conv_output_width.
For example, for an input=torch.randn(1, 3, 256, 256) last conv layer's output shape would be ([1, 32, 64, 64]), in that case the 1st Linear layer should be,
nn.Linear(in_features=32*64*64,out_features=64)
---- Update after the comment:
Output shape of a conv layer is calculated through the formula here (see under "Shape:" section). Using input = torch.randn(1, 3, 256, 256) as input to the network, here are outputs of each conv layer (I skipped the ReLUs since they don't change the shape),
conv1: (1, 3, 256, 256) -> (1, 32, 256, 256)
conv2: (1, 32, 256, 256) -> (1, 32, 128, 128)
conv3: (1, 32, 128, 128) -> (1, 32, 64, 64)
So how did last_conv_output_height and last_conv_output_width became 64 ? The last conv layer is defined as follows,
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
Data is processed as (num_samples, num_channels, height​, width​) in PyTorch and the default value for dilation is stated as 1 in the conv2d doc. So, for the last conv layer, H_in is 128, padding[0] is 1, dilation[0] is 1, kernel_size[0] is 3 and stride[0] is 2. Therefore, height of its output becomes,
H_out = ⌊(128 + 2 * 1 - 1 * (3 - 1) - 1) / 2⌋ + 1
H_out = 64
Since square-size kernels and equal-size stride, padding and dilation are used, W_out also becomes 64 for the last conv layer.
I think the easiest way to compute in_features for the 1st Linear layer would be run the model for the desired size input until that layer. An example for your architecture,
inp = torch.randn(1, 3, 256, 256)
arch = nn.Sequential(
nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
nn.ReLU(),
nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
)
outp = arch(inp)
print('outp.shape:', outp.shape)
This prints,
outp.shape: torch.Size([1, 32, 64, 64])
Finally, last_conv_out_channels is out_channels of the last conv layer. The last conv layer in your architecture is nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1). Here out_channels is the 2nd parameter, so last_conv_out_channels is 32.

Bidirectional LSTM with attention layer tf.keras

I am trying to add attention mechanism to the bellow model.
Is there really a need for CTC loss for attention model.
How could I implement a BLSTM with attention mechanism for an image OCR problem.
def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
y_pred = y_pred[:, 2:, :]
return tf.keras.backend.ctc_batch_cost(labels, y_pred, input_length, label_length)
def get_Model(training):
input_shape = (img_w, img_h, 1) # (128, 64, 1)
labels = Input(name='the_labels', shape=[max_text_len], dtype='float32') # (None ,8)
input_length = Input(name='input_length', shape=[1], dtype='int64') # (None, 1)
label_length = Input(name='label_length', shape=[1], dtype='int64') # (None, 1)
# Make Networkw
inputs = Input(name='the_input', shape=input_shape, dtype='float32') # (None, 128, 64, 1)
# Convolution layer (VGG)
inner = Conv2D(64, (3, 3), padding='same', name='conv1', kernel_initializer='he_normal')(inputs) # (None, 128, 64, 64)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
inner = MaxPooling2D(pool_size=(2, 2), name='max1')(inner) # (None,64, 32, 64)
inner = Conv2D(128, (3, 3), padding='same', name='conv2', kernel_initializer='he_normal')(inner) # (None, 64, 32, 128)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
inner = MaxPooling2D(pool_size=(2, 2), name='max2')(inner) # (None, 32, 16, 128)
inner = Conv2D(256, (3, 3), padding='same', name='conv3', kernel_initializer='he_normal')(inner) # (None, 32, 16, 256)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
inner = Conv2D(256, (3, 3), padding='same', name='conv4', kernel_initializer='he_normal')(inner) # (None, 32, 16, 256)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
inner = MaxPooling2D(pool_size=(1, 2), name='max3')(inner) # (None, 32, 8, 256)
inner = Conv2D(512, (3, 3), padding='same', name='conv5', kernel_initializer='he_normal')(inner) # (None, 32, 8, 512)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
inner = Conv2D(512, (3, 3), padding='same', name='conv6')(inner) # (None, 32, 8, 512)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
inner = MaxPooling2D(pool_size=(1, 2), name='max4')(inner) # (None, 32, 4, 512)
inner = Conv2D(512, (2, 2), padding='same', kernel_initializer='he_normal', name='con7')(inner) # (None, 32, 4, 512)
inner = BatchNormalization()(inner)
inner = Activation('relu')(inner)
# CNN to RNN
inner = Reshape(target_shape=((32, 2048)), name='reshape')(inner) # (None, 32, 2048)
inner = Dense(64, activation='relu', kernel_initializer='he_normal', name='dense1')(inner) # (None, 32, 64)
# RNN layer
lstm1 = Bidirectional(LSTM(512, return_sequences=True, kernel_initializer='he_normal'), name='biLSTM1') (inner)
lstm1_norm = BatchNormalization()(lstm1)
lstm2 = Bidirectional(LSTM(512, return_sequences=True, kernel_initializer='he_normal'), name='biLSTM2') (lstm1_norm)
lstm2_norm = BatchNormalization()(lstm2)
# transforms RNN output to character activations:
inner = Dense(num_classes, kernel_initializer='he_normal',name='dense2')(lstm2_norm) #(None, 32, 63)
y_pred = Activation('softmax', name='softmax')(inner)
# Keras doesn't currently support loss funcs with extra parameters
# so CTC loss is implemented in a lambda layer
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length]) #(None, 1)
if training:
return Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
else:
return Model(inputs=[inputs], outputs=y_pred)```
Is lstm1 is encoder and lstm2 is decoder?
I didn't find any attention implementation using keras functional api and also it seems there is no keras attention layer.

Resources