I would like to implement operations on the results of two keras conv2d layers (Ix,Iy) in a deep learning architecture for a computer vision task. The operation looks as follows:
G = np.hypot(Ix, Iy)
G = G / G.max() * 255
theta = np.arctan2(Iy, Ix)
I've spent some time looking for operations provided by Keras, but without success so far. Among a few others, there's an "Add" layer that lets the user add the results of two Conv2D layers (tf.keras.layers.Add()([Ix, Iy])). However, I would like a Pythagorean addition (first line) followed by an arctan2 operation (third line).
So ideally, if already implemented by keras it would look as follows:
Does anyone know if it is possible to implement those functionalities within my deep learning architecture? Is it possible to write custom layers that meet my needs?
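For reference, here is what the three NumPy lines in the question compute on a tiny pair of arrays. This is only a sketch: the values of Ix and Iy are made up for illustration and merely stand in for the two Conv2D outputs.

```python
import numpy as np

# Made-up stand-ins for the two convolution outputs (hypothetical values)
Ix = np.array([[3.0, 0.0], [1.0, -1.0]])
Iy = np.array([[4.0, 2.0], [1.0, 0.0]])

# Pythagorean addition: element-wise sqrt(Ix**2 + Iy**2)
G = np.hypot(Ix, Iy)

# Rescale so that the largest magnitude becomes 255
G = G / G.max() * 255

# Direction of the gradient; note the argument order (y first, then x)
theta = np.arctan2(Iy, Ix)

print(G)
print(theta)
```

Any Keras implementation of the same operation can be checked against these NumPy results on a small input.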
You could probably use simple Lambda layers for your use case, although they are not absolutely necessary:
import tensorflow as tf
inputs = tf.keras.layers.Input((16, 16, 1))
x = tf.keras.layers.Conv2D(32, (3, 3), padding='same')(inputs)
y = tf.keras.layers.Conv2D(32, (2, 2), padding='same')(inputs)
hypot = tf.keras.layers.Lambda(lambda z: tf.math.sqrt(tf.math.square(z[0]) + tf.math.square(z[1])))([x, y])
hypot = tf.keras.layers.Lambda(lambda z: z / tf.reduce_max(z) * 255)(hypot)
# np.arctan2(Iy, Ix) takes the y-component first; this assumes x stands for Ix and y for Iy
atan2 = tf.keras.layers.Lambda(lambda z: tf.math.atan2(z[1], z[0]))([x, y])
model = tf.keras.Model(inputs, [hypot, atan2])
model.compile(optimizer='adam', loss='mse')
model.fit(tf.random.normal((64, 16, 16, 1)), [tf.random.normal((64, 16, 16, 32)), tf.random.normal((64, 16, 16, 32))])
Model: "model_1"
Layer (type) Output Shape Param # Connected to
input_3 (InputLayer) [(None, 16, 16, 1)] 0 []
conv2d_2 (Conv2D) (None, 16, 16, 32) 320 ['input_3[0][0]']
conv2d_3 (Conv2D) (None, 16, 16, 32) 160 ['input_3[0][0]']
lambda_2 (Lambda) (None, 16, 16, 32) 0 ['conv2d_2[0][0]', 'conv2d_3[0][0]']
lambda_3 (Lambda) (None, 16, 16, 32) 0 ['lambda_2[0][0]']
lambda_4 (Lambda) (None, 16, 16, 32) 0 ['conv2d_2[0][0]', 'conv2d_3[0][0]']
Total params: 480
Trainable params: 480
Non-trainable params: 0
2/2 [==============================] - 1s 71ms/step - loss: 3006.0469 - lambda_3_loss: 3001.7981 - lambda_4_loss: 4.2489
<keras.callbacks.History at 0x7ffa93dc2890>
Can anyone please help me convert this model to PyTorch? I already tried to convert it from Keras to PyTorch by following the question "How can I convert this keras cnn model to pytorch version", but the training results were different. Thank you.
input_3d = (1, 64, 96, 96)
pool_3d = (2, 2, 2)
model = Sequential()
model.add(Convolution3D(8, 3, 3, 3, name='conv1', input_shape=input_3d, data_format='channels_first'))
model.add(MaxPooling3D(pool_size=pool_3d, name='pool1'))
model.add(Convolution3D(8, 3, 3, 3, name='conv2',data_format='channels_first'))
model.add(MaxPooling3D(pool_size=pool_3d, name='pool2'))
model.add(Convolution3D(8, 3, 3, 3, name='conv3',data_format='channels_first'))
model.add(MaxPooling3D(pool_size=pool_3d, name='pool3'))
model.add(Flatten())
model.add(Dense(2000, activation='relu', name='dense1'))
model.add(Dropout(0.5, name='dropout1'))
model.add(Dense(500, activation='relu', name='dense2'))
model.add(Dropout(0.5, name='dropout2'))
model.add(Dense(3, activation='softmax', name='softmax'))
Layer (type) Output Shape Param #
conv1 (Conv3D) (None, 8, 60, 94, 94) 224
pool1 (MaxPooling3D) (None, 8, 30, 47, 47) 0
conv2 (Conv3D) (None, 8, 28, 45, 45) 1736
pool2 (MaxPooling3D) (None, 8, 14, 22, 22) 0
conv3 (Conv3D) (None, 8, 12, 20, 20) 1736
pool3 (MaxPooling3D) (None, 8, 6, 10, 10) 0
flatten_1 (Flatten) (None, 4800) 0
dense1 (Dense) (None, 2000) 9602000
dropout1 (Dropout) (None, 2000) 0
dense2 (Dense) (None, 500) 1000500
dropout2 (Dropout) (None, 500) 0
softmax (Dense) (None, 3) 1503
Your PyTorch equivalent of the Keras model would look like this:
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool = nn.MaxPool3d((2, 2, 2))
        self.conv1 = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3)
        self.conv2 = nn.Conv3d(in_channels=8, out_channels=8, kernel_size=3)
        self.conv3 = nn.Conv3d(in_channels=8, out_channels=8, kernel_size=3)
        self.linear1 = nn.Linear(4800, 2000)
        # plain Dropout, not Dropout3d: the tensor is 2D after flattening
        self.dropout1 = nn.Dropout(0.5)
        self.linear2 = nn.Linear(2000, 500)
        self.dropout2 = nn.Dropout(0.5)
        self.linear3 = nn.Linear(500, 3)

    def forward(self, x):
        out = self.maxpool(self.conv1(x))
        out = self.maxpool(self.conv2(out))
        out = self.maxpool(self.conv3(out))
        # Flattening process
        b, c, d, h, w = out.size()  # batch_size, channels, depth, height, width
        out = out.view(-1, c * d * h * w)
        # ReLU mirrors activation='relu' on dense1 and dense2 in the Keras model
        out = self.dropout1(torch.relu(self.linear1(out)))
        out = self.dropout2(torch.relu(self.linear2(out)))
        out = self.linear3(out)
        out = torch.softmax(out, 1)
        return out
A driver program to test the model:
inputs = torch.randn(8, 1, 64, 96, 96)
model = CNN()
outputs = model(inputs)
print(outputs.shape) # torch.Size([8, 3])
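The 4800 input features of the first linear layer follow from the input size (64, 96, 96) and the three conv + pool stages. A small sketch of that arithmetic, assuming unpadded 3x3x3 convolutions with stride 1 and 2x2x2 pooling as in the model above (the helper names are made up for illustration):

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of one spatial axis after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output length of one spatial axis after max pooling (floor division)."""
    return size // pool


def flattened_features(depth, height, width, channels=8, n_blocks=3):
    """Number of features after n_blocks of conv + pool, plus the final dims."""
    dims = [depth, height, width]
    for _ in range(n_blocks):
        dims = [pool_out(conv_out(d)) for d in dims]
    return channels * dims[0] * dims[1] * dims[2], tuple(dims)


n_features, dims = flattened_features(64, 96, 96)
print(n_features, dims)  # 4800 (6, 10, 10)
```

If the input size changes, the same helper gives the value to use for the first nn.Linear layer.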
You can save the Keras weights and reload them in PyTorch.
The steps are:
Step 0: Train a Model in Keras. ...
Step 1: Recreate & Initialize Your Model Architecture in PyTorch. ...
Step 2: Import Your Keras Model and Copy the Weights. ...
Step 3: Load Those Weights onto Your PyTorch Model. ...
Step 4: Test and Save Your Pytorch Model.
You can follow the example here: https://gereshes.com/2019/06/24/how-to-transfer-a-simple-keras-model-to-pytorch-the-hard-way/
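Copying the weights (Steps 2 and 3) mostly comes down to reordering axes, because the two frameworks store kernels in different layouts. The sketch below uses NumPy only, with zero-filled placeholder arrays in the shapes of the layers from the question; the helper names are hypothetical and not part of either library.

```python
import numpy as np


def dense_kernel_to_linear_weight(kernel):
    """Keras Dense kernel (in, out) -> PyTorch nn.Linear weight (out, in)."""
    return np.ascontiguousarray(kernel.T)


def conv3d_kernel_to_torch(kernel):
    """Keras Conv3D kernel (kd, kh, kw, in, out) -> nn.Conv3d weight (out, in, kd, kh, kw)."""
    return np.ascontiguousarray(np.transpose(kernel, (4, 3, 0, 1, 2)))


# Placeholder arrays with the shapes of 'conv1' and 'softmax' from the question
keras_conv1 = np.zeros((3, 3, 3, 1, 8), dtype=np.float32)
keras_softmax = np.zeros((500, 3), dtype=np.float32)

torch_conv1 = conv3d_kernel_to_torch(keras_conv1)
torch_linear3 = dense_kernel_to_linear_weight(keras_softmax)

print(torch_conv1.shape)    # (8, 1, 3, 3, 3)
print(torch_linear3.shape)  # (3, 500)
```

In practice the Keras arrays would come from each layer's get_weights() and the converted arrays would be copied into the matching PyTorch parameters; bias vectors need no reordering. Because the Keras model here is channels_first, the flatten order should match PyTorch's, but that is worth verifying on a test input.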
When I train my model, it has a two-dimensional output, (None, 1), corresponding to the time series I'm trying to predict. But whenever I load the saved model to make predictions, it has a three-dimensional output, (None, 40, 1), where 40 corresponds to the n_steps required to fit the Conv1D network. What is wrong?
Here is the code:
df = np.load('Principal.npy')
# Conv1D
#model = load_model('ModeloConv1D.h5')
model = autoencoder_conv1D((2, 20, 17), n_steps=40)
# summarize model.
# load dataset
df = df
# split into input (X) and output (Y) variables
X = f.separar_interface(df, n_steps=40)
# THE X INPUT SHAPE (59891, 17) length and attributes, respectively ##
# conv1D input format
X = X.reshape(X.shape[0], 2, 20, X.shape[2])
# Make predictions
test_predictions = model.predict(X)
## test_predictions.shape = (59891, 40, 1)
test_predictions = model.predict(X).flatten()
## test_predictions.shape = (2395640,), i.e. 59891 * 40 values
In the plot below you can see that it is plotting the input format.
Here is the network architecture:
Layer (type) Output Shape Param #
time_distributed_70 (TimeDis (None, 1, 31, 24) 4104
time_distributed_71 (TimeDis (None, 1, 4, 24) 0
time_distributed_72 (TimeDis (None, 1, 4, 48) 9264
time_distributed_73 (TimeDis (None, 1, 1, 48) 0
time_distributed_74 (TimeDis (None, 1, 1, 64) 12352
time_distributed_75 (TimeDis (None, 1, 1, 64) 0
time_distributed_76 (TimeDis (None, 1, 64) 0
lstm_17 (LSTM) (None, 100) 66000
repeat_vector_9 (RepeatVecto (None, 40, 100) 0
lstm_18 (LSTM) (None, 40, 100) 80400
time_distributed_77 (TimeDis (None, 40, 1024) 103424
dropout_9 (Dropout) (None, 40, 1024) 0
dense_18 (Dense) (None, 40, 1) 1025
Since I found my mistake, and since it may be useful to someone else, I'll answer my own question:
The network output has the same shape as the training labels. The saved model produces an output of shape (None, 40, 1) because that is exactly the shape I gave to the training labels.
The output looked different during training than during prediction most probably because I was using a method such as train_test_split, which shuffles the samples. What I saw at the end of training was therefore computed on a randomized batch.
To correct the problem, change the shape of the dataset labels from (None, 40, 1) to (None, 1), since this is a regression problem on a time series. To fix that in the network above, put a Flatten layer before the Dense output layer. That gives the result I was looking for.
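The effect of the Flatten layer on the output shape can be illustrated with plain NumPy. This is a sketch with random weights, not the actual network: a dense layer applied to a 3D tensor acts on the last axis only, so the 40 time steps survive unless they are flattened first.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_steps, features = 4, 40, 1024

# Stand-in for the (None, 40, 1024) tensor that reaches the last Dense layer
x = rng.normal(size=(batch, n_steps, features))

# Dense(1) on a 3D input: one weight matrix shared across the 40 steps
w_per_step = rng.normal(size=(features, 1))
out_per_step = x @ w_per_step
print(out_per_step.shape)  # (4, 40, 1)

# Flatten first, then Dense(1): a single value per sample
x_flat = x.reshape(batch, -1)
w_flat = rng.normal(size=(n_steps * features, 1))
out_flat = x_flat @ w_flat
print(out_flat.shape)  # (4, 1)
```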
I am trying to implement a binary segmentation model using U-Net in Keras. Here are the upsampling blocks of my networks.
In one model I used a normal transposed convolution, while in the second one I used bilinear resize + Conv2D.
# Convolution block with Transpose Convolution
def deconv_block(tensor, nfilters, size=3, padding='same', kernel_initializer = 'he_normal'):
    y = Conv2DTranspose(filters=nfilters, kernel_size=size, strides=2, padding = padding, kernel_initializer = kernel_initializer)(tensor)
    y = BatchNormalization()(y)
    y = Dropout(0.5)(y)
    y = Activation("relu")(y)
    return y

# Convolution block with Upsampling+Conv2D
def deconv_block_rez(tensor, nfilters, size=3, padding='same', kernel_initializer = 'he_normal'):
    y = UpSampling2D(size = (2,2),interpolation='bilinear')(tensor)
    y = Conv2D(filters=nfilters, kernel_size=(size,size), padding = 'same', kernel_initializer = kernel_initializer)(y)
    y = BatchNormalization()(y)
    y = Dropout(0.5)(y)
    y = Activation("relu")(y)
    return y
Are they equivalent in terms of quality and execution time?
Quality-wise they were found to be almost the same (I used 128*128 input), but in terms of execution time they were different. I read in a blog that upsampling + Conv2D does not suffer from checkerboard artifacts; but does that come at the cost of execution time?
Here is a typical block of my network....
up_sampling2d_2 (UpSampling2D) (None, 16, 16, 800) 0
batch_normalization_2 (Conv2D) (None, 16, 16, 256) 1843456
dropout_2 (Dropout) (None, 16, 16, 256) 0
activation_2 (Activation) (None, 16, 16, 256) 0
Transpose conv:-
conv2d_transpose_2 (Conv2DTrans (None, 16, 16, 256) 1843456
batch_normalization_2 (BatchNor (None, 16, 16, 256) 1024
dropout_2 (Dropout) (None, 16, 16, 256) 0
activation_2 (Activation) (None, 16, 16, 256) 0
Even though both have almost the same number of parameters, the block with upsample + Conv2D takes longer to execute (the bilinear resize itself takes negligible time): conv vs. transpose-conv => 154ms vs 35ms.
Clearly the Conv2D has stride 1 in the resize + normal-conv block and the transpose-conv has stride 2, but both use 3x3 kernels. Recently, people have begun to use resize + conv blocks instead of normal transposed convolution.
Does this speed difference always occur, or can we use some variation of upsample + Conv2D (say stride 2 or a 1x1 kernel) so that the execution time is the same (without degrading quality)? Or is there a bug in my code?
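A rough count of multiply-accumulate operations (MACs) suggests where the gap may come from. This is a back-of-the-envelope estimate, not a profile: it takes the shapes from the layer summaries above (800 input channels, 256 filters, 3x3 kernels, 16x16 output), assumes the transposed convolution starts from an 8x8 map, and ignores border effects and backend-specific kernels. The function names are made up for illustration.

```python
def conv_macs(out_h, out_w, kernel, c_in, c_out):
    """MACs of a stride-1 'same' convolution: every output pixel sees the full kernel."""
    return out_h * out_w * kernel * kernel * c_in * c_out


def transposed_conv_macs(in_h, in_w, kernel, c_in, c_out):
    """MACs of a transposed convolution: every input pixel is multiplied by the full kernel."""
    return in_h * in_w * kernel * kernel * c_in * c_out


# Upsample 8x8 -> 16x16 first, then convolve on the 16x16 map
upsample_conv = conv_macs(16, 16, 3, 800, 256)

# Transposed convolution with stride 2 works directly on the 8x8 map
transpose_conv = transposed_conv_macs(8, 8, 3, 800, 256)

print(upsample_conv)                   # 471859200
print(transpose_conv)                  # 117964800
print(upsample_conv / transpose_conv)  # 4.0
```

Under these assumptions the resize + conv block does about four times the arithmetic with the same parameter count, which is in the same range as the measured 154ms vs 35ms.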
I have a problem applying a masking layer to CNNs in an RNN/LSTM model.
My data is not an original image; I converted it into a shape of (16, 34, 4) (channels_first). The data is sequential, and the longest sequence has 22 steps. To keep the shape fixed, I set the timestep to 22. Since a sequence may be shorter than 22 steps, I fill the remaining steps with np.zeros. However, the zero padding makes up about half of the whole dataset, so with so much useless data the training cannot reach a very good result. I therefore want to add a mask to cancel out the zero-padded data.
Here is my code.
mask = np.zeros((16,34,4), dtype = np.int8)
input_shape = (22, 16, 34, 4)
model = Sequential()
model.add(TimeDistributed(Masking(mask_value=mask), input_shape=input_shape, name = 'mask'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name = 'conv1'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn1'))
model.add(Dropout(0.5, name = 'drop1'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv2'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn2'))
model.add(Dropout(0.5, name = 'drop2'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv3'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn3'))
model.add(Dropout(0.5, name = 'drop3'))
model.add(TimeDistributed(Flatten(), name = 'flatten'))
model.add(GRU(256, activation='tanh', return_sequences=True, name = 'gru'))
model.add(Dropout(0.4, name = 'drop_gru'))
model.add(Dense(35, activation = 'softmax', name = 'softmax'))
Here's the model structure.
Layer (type) Output Shape Param #
mask (TimeDistributed) (None, 22, 16, 34, 4) 0
conv1 (TimeDistributed) (None, 22, 100, 30, 3) 16100
bn1 (TimeDistributed) (None, 22, 100, 30, 3) 12
drop1 (Dropout) (None, 22, 100, 30, 3) 0
conv2 (TimeDistributed) (None, 22, 100, 26, 2) 100100
bn2 (TimeDistributed) (None, 22, 100, 26, 2) 8
drop2 (Dropout) (None, 22, 100, 26, 2) 0
conv3 (TimeDistributed) (None, 22, 100, 22, 1) 100100
bn3 (TimeDistributed) (None, 22, 100, 22, 1) 4
drop3 (Dropout) (None, 22, 100, 22, 1) 0
flatten (TimeDistributed) (None, 22, 2200) 0
gru (GRU) (None, 22, 256) 1886976
drop_gru (Dropout) (None, 22, 256) 0
softmax (Dense) (None, 22, 35) 8995
Total params: 2,112,295
Trainable params: 2,112,283
Non-trainable params: 12
For mask_value, I tried both 0 and this mask structure, but neither works: the model still trains on all the data, half of which is zero padding.
Can anyone help me?
By the way, I used TimeDistributed here to connect to the RNN, and I know of another layer called ConvLSTM2D. Does anyone know the difference? ConvLSTM2D needs many more parameters for the model and trains much more slowly than TimeDistributed...
Unfortunately, masking is not yet supported by the Keras Conv layers. Several issues have been posted about this on the Keras GitHub page; here is the one with the most substantial conversation on the topic. It appears there was a hang-up over implementation details, and the issue was never resolved.
The workaround proposed in the discussion is to have an explicit embedding for the padding character in sequences and do global pooling. Here is another workaround I found (not helpful for my use case, but maybe helpful to you): keeping a mask array and merging it in through multiplication.
You can also check out the conversation around this question which is similar to yours.
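The multiplication workaround can be sketched in NumPy. This is an illustration under assumptions, not the code from the linked discussion: the mask is derived from all-zero time steps of an input shaped like the question's (batch, 22, 16, 34, 4), and it is applied to a made-up per-step loss so that padded steps do not contribute.

```python
import numpy as np


def timestep_mask(x):
    """Return a (batch, timesteps) float mask: 1.0 where a step has any non-zero value."""
    reduce_axes = tuple(range(2, x.ndim))
    return np.any(x != 0, axis=reduce_axes).astype(np.float32)


# Two made-up sequences: the first has 5 real steps, the second all 22
batch = np.zeros((2, 22, 16, 34, 4), dtype=np.float32)
batch[0, :5] = 1.0
batch[1, :22] = 1.0

mask = timestep_mask(batch)
print(mask.shape)  # (2, 22)

# Hypothetical per-step loss: 1.0 on real steps, 100.0 on padded steps
per_step_loss = np.where(mask == 1.0, 1.0, 100.0).astype(np.float32)

# Multiplying by the mask removes the padded steps from the average
masked_mean = (per_step_loss * mask).sum() / mask.sum()
print(masked_mean)  # 1.0
```

The same mask could instead be multiplied into the per-step features before the recurrent layer; which of the two fits better depends on the model.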
I am using Keras 1.1.1 on Windows 7 with the TensorFlow backend.
I am trying to prepend an image downsampler to the stock pretrained ResNet50 model. Below is my code.
from keras.applications.resnet50 import ResNet50
import keras.layers
# this could also be the output of a different Keras model or layer
input = keras.layers.Input(shape=(400, 400, 1)) # this assumes K.image_dim_ordering() == 'tf'
x1 = keras.layers.AveragePooling2D(pool_size=(2,2))(input)
x2 = keras.layers.Flatten()(x1)
x3 = keras.layers.RepeatVector(3)(x2)
x4 = keras.layers.Reshape((200, 200, 3))(x3)
x5 = keras.layers.ZeroPadding2D(padding=(12,12))(x4)
m = keras.models.Model(input, x5)
model = ResNet50(input_tensor=m.output, weights='imagenet', include_top=False)
but I get an error which I am unsure how to fix.
builtins.Exception: Graph disconnected: cannot obtain value for tensor
Output("input_2:0", shape=(?, 400, 400, 1), dtype=float32) at layer
"input_2". The following previous layers were accessed without issue:
You can use both the Functional API and Sequential approaches to solve this. See working example for both approaches below:
from keras.applications.resnet50 import ResNet50
from keras.models import Sequential, Model
from keras.layers import AveragePooling2D, Flatten, RepeatVector, Reshape, ZeroPadding2D, Input, Dense
pretrained = ResNet50(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
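# Sequential method
model_1 = Sequential()
model_1.add(AveragePooling2D(pool_size=(2,2),input_shape=(400, 400, 1)))
model_1.add(Flatten())
model_1.add(RepeatVector(3))
model_1.add(Reshape((200, 200, 3)))
model_1.add(ZeroPadding2D(padding=(12,12)))
model_1.add(pretrained)
model_1.add(Dense(1))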
# functional API method
input = Input(shape=(400, 400, 1))
x = AveragePooling2D(pool_size=(2,2),input_shape=(400, 400, 1))(input)
x = Flatten()(x)
x = RepeatVector(3)(x)
x = Reshape((200, 200, 3))(x)
x = ZeroPadding2D(padding=(12,12))(x)
x = pretrained(x)
preds = Dense(1)(x)
model_2 = Model(input,preds)
The summaries (generated with Xception in place of ResNet50):
Layer (type) Output Shape Param #
average_pooling2d_1 (Average (None, 200, 200, 1) 0
flatten_1 (Flatten) (None, 40000) 0
repeat_vector_1 (RepeatVecto (None, 3, 40000) 0
reshape_1 (Reshape) (None, 200, 200, 3) 0
zero_padding2d_1 (ZeroPaddin (None, 224, 224, 3) 0
xception (Model) (None, 7, 7, 2048) 20861480
dense_1 (Dense) (None, 7, 7, 1) 2049
Total params: 20,863,529
Trainable params: 20,809,001
Non-trainable params: 54,528
Layer (type) Output Shape Param #
input_2 (InputLayer) (None, 400, 400, 1) 0
average_pooling2d_2 (Average (None, 200, 200, 1) 0
flatten_2 (Flatten) (None, 40000) 0
repeat_vector_2 (RepeatVecto (None, 3, 40000) 0
reshape_2 (Reshape) (None, 200, 200, 3) 0
zero_padding2d_2 (ZeroPaddin (None, 224, 224, 3) 0
xception (Model) (None, 7, 7, 2048) 20861480
dense_2 (Dense) (None, 7, 7, 1) 2049
Total params: 20,863,529
Trainable params: 20,809,001
Non-trainable params: 54,528
Both approaches work fine. If you plan on freezing the pretrained model and letting pre/post layers learn -- and afterward finetuning the model, the approach I found to work goes like so:
from keras.models import load_model

# given the same resnet model as before...
model = load_model('modelname.h5')
# pull out the nested model
nested_model = model.layers[5] # assuming the pretrained model sits at index 5
# loop over the nested model to allow training
for l in nested_model.layers:
    l.trainable = True
# insert the trainable pretrained model back into the original
model.layers[5] = nested_model
# recompile the model afterwards so the change to trainable takes effect
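One thing worth checking in the Flatten -> RepeatVector(3) -> Reshape((200, 200, 3)) chain used above is whether it really copies each gray value into three channels. The NumPy sketch below mimics that chain on a made-up 2x2 image, assuming a row-major reshape, and compares it with repeating along a new last axis.

```python
import numpy as np

img = np.array([[1, 2], [3, 4]])  # made-up 2x2 grayscale image

# Mimic Flatten -> RepeatVector(3) -> Reshape((2, 2, 3))
flat = img.reshape(-1)             # shape (4,)
repeated = np.tile(flat, (3, 1))   # shape (3, 4), like RepeatVector(3)
chained = repeated.reshape(2, 2, 3)

# Alternative: add a channel axis and repeat the gray value along it
stacked = np.repeat(img[..., np.newaxis], 3, axis=-1)

print(chained[0, 0])  # [1 2 3]
print(stacked[0, 0])  # [1 1 1]
```

In this sketch the chained version mixes neighbouring pixels into the channel axis, so the shapes match but the image content does not. If three identical channels are the goal, concatenating the single-channel tensor with itself along the channel axis may be closer to the intent.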