I'm trying to implement a binary segmentation model with a U-Net in Keras. Here are the upsampling blocks of my two networks.
In one model I used a regular transposed convolution, while in the second I used bilinear resizing followed by a Conv2D.
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Conv2DTranspose, Dropout, UpSampling2D)

# Convolution block with transposed convolution
def deconv_block(tensor, nfilters, size=3, padding='same', kernel_initializer='he_normal'):
    y = Conv2DTranspose(filters=nfilters, kernel_size=size, strides=2,
                        padding=padding, kernel_initializer=kernel_initializer)(tensor)
    y = BatchNormalization()(y)
    y = Dropout(0.5)(y)
    y = Activation("relu")(y)
    return y

# Convolution block with bilinear upsampling + Conv2D
def deconv_block_rez(tensor, nfilters, size=3, padding='same', kernel_initializer='he_normal'):
    y = UpSampling2D(size=(2, 2), interpolation='bilinear')(tensor)
    y = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding,
               kernel_initializer=kernel_initializer)(y)
    y = BatchNormalization()(y)
    y = Dropout(0.5)(y)
    y = Activation("relu")(y)
    return y
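For reference, both blocks drop into the decoder the same way; here is a quick shape check (just a sketch, assuming the definitions above are in scope):

from tensorflow.keras.layers import Input

t = Input((16, 16, 800))                # an example encoder feature map
print(deconv_block(t, 256).shape)       # (None, 32, 32, 256)
print(deconv_block_rez(t, 256).shape)   # (None, 32, 32, 256)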
Are they equivalent in terms of quality and execution time?
Quality-wise they turned out almost the same (I used 128×128 inputs), but their execution times differed. I read in a blog post that upsampling + Conv2D avoids the checkerboard artifacts of transposed convolution, but does that come at the cost of execution time?
Here is a typical block of my network.

Upsample:

Layer (type)                     Output Shape         Param #   Connected to
up_sampling2d_2 (UpSampling2D)   (None, 16, 16, 800)  0         concatenate_1[0][0]
conv2d_2 (Conv2D)                (None, 16, 16, 256)  1843456   up_sampling2d_2[0][0]
batch_normalization_2 (BatchNor  (None, 16, 16, 256)  1024      conv2d_2[0][0]
dropout_2 (Dropout)              (None, 16, 16, 256)  0         batch_normalization_2[0][0]
activation_2 (Activation)        (None, 16, 16, 256)  0         dropout_2[0][0]

Transpose conv:

Layer (type)                     Output Shape         Param #   Connected to
conv2d_transpose_2 (Conv2DTrans  (None, 16, 16, 256)  1843456   concatenate_1[0][0]
batch_normalization_2 (BatchNor  (None, 16, 16, 256)  1024      conv2d_transpose_2[0][0]
dropout_2 (Dropout)              (None, 16, 16, 256)  0         batch_normalization_2[0][0]
activation_2 (Activation)        (None, 16, 16, 256)  0         dropout_2[0][0]
Even though both blocks have almost the same number of parameters, the upsample + Conv2D block takes considerably longer (the bilinear resize itself takes negligible time): conv vs. transposed conv is roughly 154 ms vs. 35 ms.
Clearly the Conv2D in the resize + conv block has stride 1 while the transposed conv has stride 2, but both use 3×3 kernels. And recently people have begun to prefer resize + conv blocks over plain transposed convolutions.
Does this speed difference always occur, or is there a variation of upsample + Conv2D (say, stride 2 or a 1×1 kernel) that matches the transposed convolution's execution time without degrading quality? Or is this a bug in my code?
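For reference, here is a minimal way to time just the two upsampling paths in isolation (a rough sketch with random data similar to the feature maps above; the 154 ms / 35 ms numbers came from my full network, and absolute timings will vary with hardware and TF version):

import time
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Input, UpSampling2D

x = tf.random.normal((8, 16, 16, 800))   # batch shaped like an encoder feature map

inp = Input((16, 16, 800))
m_tconv = Model(inp, Conv2DTranspose(256, 3, strides=2, padding='same')(inp))
m_upconv = Model(inp, Conv2D(256, 3, padding='same')(
    UpSampling2D((2, 2), interpolation='bilinear')(inp)))

for name, m in [('transpose-conv', m_tconv), ('upsample+conv', m_upconv)]:
    m(x)  # warm-up call so graph building is not timed
    t0 = time.perf_counter()
    for _ in range(50):
        _ = m(x)
    print(name, (time.perf_counter() - t0) / 50, 'seconds per call')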
I would like to apply some operations to the results of two Keras Conv2D layers (Ix, Iy) in a deep learning architecture for a computer vision task. The operations look as follows:
G = np.hypot(Ix, Iy)
G = G / G.max() * 255
theta = np.arctan2(Iy, Ix)
I've spent some time looking for operations provided by Keras but have had no success so far. Among others, there is an Add layer that lets the user add the results of two Conv2D layers (tf.keras.layers.Add()([Ix, Iy])). However, I would like a Pythagorean addition (first line) followed by an arctan2 operation (third line).
So ideally, if Keras already implemented them, it would look as follows:
tf.keras.layers.Hypot(Ix,Iy)
tf.keras.layers.Arctan2(Ix,Iy)
Does anyone know if it is possible to implement those functionalities within my deep learning architecture? Is it possible to write custom layers that meet my needs?
You could probably use simple Lambda layers for your use case, although they are not absolutely necessary:
import tensorflow as tf
inputs = tf.keras.layers.Input((16, 16, 1))
x = tf.keras.layers.Conv2D(32, (3, 3), padding='same')(inputs)
y = tf.keras.layers.Conv2D(32, (2, 2), padding='same')(inputs)
hypot = tf.keras.layers.Lambda(lambda z: tf.math.sqrt(tf.math.square(z[0]) + tf.math.square(z[1])))([x, y])
hypot = tf.keras.layers.Lambda(lambda z: z / tf.reduce_max(z) * 255)(hypot)
atan2 = tf.keras.layers.Lambda(lambda z: tf.math.atan2(z[0], z[1]))([x, y])
model = tf.keras.Model(inputs, [hypot, atan2])
print(model.summary())
model.compile(optimizer='adam', loss='mse')
model.fit(tf.random.normal((64, 16, 16, 1)), [tf.random.normal((64, 16, 16, 32)), tf.random.normal((64, 16, 16, 32))])
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 16, 16, 1)] 0 []
conv2d_2 (Conv2D) (None, 16, 16, 32) 320 ['input_3[0][0]']
conv2d_3 (Conv2D) (None, 16, 16, 32) 160 ['input_3[0][0]']
lambda_2 (Lambda) (None, 16, 16, 32) 0 ['conv2d_2[0][0]',
'conv2d_3[0][0]']
lambda_3 (Lambda) (None, 16, 16, 32) 0 ['lambda_2[0][0]']
lambda_4 (Lambda) (None, 16, 16, 32) 0 ['conv2d_2[0][0]',
'conv2d_3[0][0]']
==================================================================================================
Total params: 480
Trainable params: 480
Non-trainable params: 0
__________________________________________________________________________________________________
None
2/2 [==============================] - 1s 71ms/step - loss: 3006.0469 - lambda_3_loss: 3001.7981 - lambda_4_loss: 4.2489
<keras.callbacks.History at 0x7ffa93dc2890>
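And since the question also asks about custom layers: subclassing tf.keras.layers.Layer works just as well and gives you named, serializable layers. A minimal sketch (Hypot and Arctan2 are made-up names, not Keras built-ins):

import tensorflow as tf

class Hypot(tf.keras.layers.Layer):
    # element-wise sqrt(a^2 + b^2) over a pair of equal-shaped tensors
    def call(self, inputs):
        a, b = inputs
        return tf.math.sqrt(tf.math.square(a) + tf.math.square(b))

class Arctan2(tf.keras.layers.Layer):
    # element-wise atan2(a, b) over a pair of equal-shaped tensors
    def call(self, inputs):
        a, b = inputs
        return tf.math.atan2(a, b)

inputs = tf.keras.layers.Input((16, 16, 1))
ix = tf.keras.layers.Conv2D(32, (3, 3), padding='same')(inputs)
iy = tf.keras.layers.Conv2D(32, (3, 3), padding='same')(inputs)
g = Hypot()([ix, iy])        # np.hypot(Ix, Iy)
theta = Arctan2()([iy, ix])  # np.arctan2(Iy, Ix)
model = tf.keras.Model(inputs, [g, theta])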
When I train my model it has a two-dimensional output, (None, 1), corresponding to the time series I'm trying to predict. But whenever I load the saved model to make predictions, it has a three-dimensional output, (None, 40, 1), where 40 corresponds to the n_steps required to fit the Conv1D network. What is wrong?
Here is the code:
import numpy as np
import matplotlib.pyplot as plt

# load dataset
df = np.load('Principal.npy')

# Conv1D
#model = load_model('ModeloConv1D.h5')
model = autoencoder_conv1D((2, 20, 17), n_steps=40)
model.load_weights('weights_35067.hdf5')

# summarize model
model.summary()

# split into input (X) and output (Y) variables (f is my own helper module)
X = f.separar_interface(df, n_steps=40)
# X input shape: (59891, 17) -- length and attributes, respectively
# reshape to the Conv1D input format
X = X.reshape(X.shape[0], 2, 20, X.shape[2])

# make predictions
test_predictions = model.predict(X)
## test_predictions.shape = (59891, 40, 1)
test_predictions = model.predict(X).flatten()
## test_predictions.shape = (2395640, 1)

plt.figure(3)
plt.plot(test_predictions)
plt.legend('Prediction')
plt.show()
In the resulting plot (not reproduced here) you can see that it is plotting in the input format.
Here is the network architecture:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_70 (TimeDis (None, 1, 31, 24) 4104
_________________________________________________________________
time_distributed_71 (TimeDis (None, 1, 4, 24) 0
_________________________________________________________________
time_distributed_72 (TimeDis (None, 1, 4, 48) 9264
_________________________________________________________________
time_distributed_73 (TimeDis (None, 1, 1, 48) 0
_________________________________________________________________
time_distributed_74 (TimeDis (None, 1, 1, 64) 12352
_________________________________________________________________
time_distributed_75 (TimeDis (None, 1, 1, 64) 0
_________________________________________________________________
time_distributed_76 (TimeDis (None, 1, 64) 0
_________________________________________________________________
lstm_17 (LSTM) (None, 100) 66000
_________________________________________________________________
repeat_vector_9 (RepeatVecto (None, 40, 100) 0
_________________________________________________________________
lstm_18 (LSTM) (None, 40, 100) 80400
_________________________________________________________________
time_distributed_77 (TimeDis (None, 40, 1024) 103424
_________________________________________________________________
dropout_9 (Dropout) (None, 40, 1024) 0
_________________________________________________________________
dense_18 (Dense) (None, 40, 1) 1025
=================================================================
As I've found my own mistake, and since it may be useful to someone else, I'll answer my own question:
The network output has the same shape as the training labels. That is, the saved model generates an output of shape (None, 40, 1) because that is exactly the shape I gave the training labels.
The difference you see between the output during training and during prediction most likely comes from using something like train_test_split while training, which shuffles the data; what you see at the end of training is the output for such a shuffled batch.
To fix the problem, change the shape of the dataset labels from (None, 40, 1) to (None, 1), since this is a regression problem on a time series. In the network above, that means adding a Flatten layer before the final Dense output layer. With that change, I get the result I was looking for.
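For illustration, a sketch of the decoder tail with that fix in place (layer sizes taken from the summary above; the dropout rate is an assumption):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, LSTM, TimeDistributed

decoder_tail = Sequential([
    LSTM(100, return_sequences=True, input_shape=(40, 100)),
    TimeDistributed(Dense(1024)),
    Dropout(0.5),
    Flatten(),   # (None, 40, 1024) -> (None, 40960)
    Dense(1),    # one regression value per sample: (None, 1)
])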
For an image segmentation problem I need to write a custom loss function, and I am getting the error below.
Code Base: https://www.tensorflow.org/tutorials/images/segmentation
Last layer:
Conv2DTrans (128, 128, 2) [note: in my case the output has only 2 channels]

def call(self, y_true, y_pred):
    y_true = y_true.numpy()
    .....
Error:
AttributeError: 'Tensor' object has no attribute 'numpy'
I tried tf.py_function and tf.numpy_function, but both return the same error. I also tried:

with tf.compat.v1.Session() as sess:
    for i, j in enumerate(sess.run(y_true), sess.run(y_pred)):
        ...
Current Model Layers:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_82 (InputLayer) [(None, 128, 128, 3) 0
__________________________________________________________________________________________________
model_80 (Model) [(None, 64, 64, 96), 1841984 input_82[0][0]
__________________________________________________________________________________________________
sequential_160 (Sequential) (None, 8, 8, 512) 1476608 model_80[1][4]
__________________________________________________________________________________________________
concatenate_160 (Concatenate) (None, 8, 8, 1088) 0 sequential_160[0][0]
model_80[1][3]
__________________________________________________________________________________________________
sequential_161 (Sequential) (None, 16, 16, 256) 2507776 concatenate_160[0][0]
__________________________________________________________________________________________________
concatenate_161 (Concatenate) (None, 16, 16, 448) 0 sequential_161[0][0]
model_80[1][2]
__________________________________________________________________________________________________
sequential_162 (Sequential) (None, 32, 32, 128) 516608 concatenate_161[0][0]
__________________________________________________________________________________________________
concatenate_162 (Concatenate) (None, 32, 32, 272) 0 sequential_162[0][0]
model_80[1][1]
__________________________________________________________________________________________________
sequential_163 (Sequential) (None, 64, 64, 64) 156928 concatenate_162[0][0]
__________________________________________________________________________________________________
concatenate_163 (Concatenate) (None, 64, 64, 160) 0 sequential_163[0][0]
model_80[1][0]
__________________________________________________________________________________________________
conv2d_transpose_204 (Conv2DTra (None, 128, 128, 2) 2882 concatenate_163[0][0]
==================================================================================================
I need the NumPy array because I want to focus more on the 1s and not on the 0s; right now the metric and accuracy are overwhelmed by the sheer number of 0s.
def tumor_loss(y_true, y_pred):
    y_true = y_true.reshape((SHAPE, SHAPE))
    y_pred = y_pred.reshape((SHAPE, SHAPE))
    y_true_ind = np.where(y_true == 1)[1]
    y_pred_ind = np.where(y_pred == 1)[1]
    if np.array_equal(y_true_ind, y_pred_ind):
        return 0
    if y_true_ind.shape[0] > y_pred_ind.shape[0]:
        return y_true_ind.shape[0] - np.setdiff1d(y_true_ind, y_pred_ind).shape[0]
    else:
        return y_true_ind.shape[0] - np.setdiff1d(y_pred_ind, y_true_ind).shape[0]
If you are running TF version >= 2.0 and using the Keras API, try compiling with eager execution enabled:

model.compile(loss=custom_loss, optimizer='adam', run_eagerly=True)
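One caveat: even with run_eagerly=True, anything computed with NumPy is invisible to autodiff, so a loss like tumor_loss above will not produce gradients. If the underlying goal is simply to weight the 1s more heavily than the background 0s, a differentiable weighted cross-entropy does that; here is a sketch (the two-channel output shape comes from the model summary above; pos_weight=10.0 and the label shape are assumptions):

import tensorflow as tf

def weighted_scce(y_true, y_pred, pos_weight=10.0):
    # y_pred: (batch, 128, 128, 2) class probabilities
    # y_true: (batch, 128, 128, 1) integer labels in {0, 1}
    y_true = tf.cast(y_true[..., 0], tf.int32)   # -> (batch, 128, 128)
    scce = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    # count each foreground (1) pixel pos_weight times as heavily as background
    weights = tf.where(tf.equal(y_true, 1), pos_weight, 1.0)
    return tf.reduce_mean(weights * scce)

# model.compile(optimizer='adam', loss=weighted_scce)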
I have a problem applying a masking layer to the CNNs in an RNN/LSTM model.
My data is not raw images; I converted each frame into a shape of (16, 34, 4) (channels_first). The data is sequential, and the longest sequence length is 22, so I set the timestep to 22 for a shape-invariant input. Since sequences may be shorter than 22 steps, I pad the rest with np.zeros. About half of the dataset is 0-padding, so with that much useless data the training cannot reach a good result. I therefore want to add a mask to cancel out these 0-padded steps.
Here is my code.
mask = np.zeros((16, 34, 4), dtype=np.int8)
input_shape = (22, 16, 34, 4)

model = Sequential()
model.add(TimeDistributed(Masking(mask_value=mask), input_shape=input_shape, name='mask'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first', activation='relu'), name='conv1'))
model.add(TimeDistributed(BatchNormalization(), name='bn1'))
model.add(Dropout(0.5, name='drop1'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first', activation='relu'), name='conv2'))
model.add(TimeDistributed(BatchNormalization(), name='bn2'))
model.add(Dropout(0.5, name='drop2'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format='channels_first', activation='relu'), name='conv3'))
model.add(TimeDistributed(BatchNormalization(), name='bn3'))
model.add(Dropout(0.5, name='drop3'))
model.add(TimeDistributed(Flatten(), name='flatten'))
model.add(GRU(256, activation='tanh', return_sequences=True, name='gru'))
model.add(Dropout(0.4, name='drop_gru'))
model.add(Dense(35, activation='softmax', name='softmax'))
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['acc'])
Here's the model structure.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mask (TimeDist (None, 22, 16, 34, 4) 0
_________________________________________________________________
conv1 (TimeDistributed) (None, 22, 100, 30, 3) 16100
_________________________________________________________________
bn1 (TimeDistributed) (None, 22, 100, 30, 3) 12
_________________________________________________________________
drop1 (Dropout) (None, 22, 100, 30, 3) 0
_________________________________________________________________
conv2 (TimeDistributed) (None, 22, 100, 26, 2) 100100
_________________________________________________________________
bn2 (TimeDistributed) (None, 22, 100, 26, 2) 8
_________________________________________________________________
drop2 (Dropout) (None, 22, 100, 26, 2) 0
_________________________________________________________________
conv3 (TimeDistributed) (None, 22, 100, 22, 1) 100100
_________________________________________________________________
bn3 (TimeDistributed) (None, 22, 100, 22, 1) 4
_________________________________________________________________
drop3 (Dropout) (None, 22, 100, 22, 1) 0
_________________________________________________________________
flatten (TimeDistributed) (None, 22, 2200) 0
_________________________________________________________________
gru (GRU) (None, 22, 256) 1886976
_________________________________________________________________
drop_gru (Dropout) (None, 22, 256) 0
_________________________________________________________________
softmax (Dense) (None, 22, 35) 8995
=================================================================
Total params: 2,112,295
Trainable params: 2,112,283
Non-trainable params: 12
_________________________________________________________________
For mask_value I tried both 0 and the mask array above, but neither works: the model still trains through all the data, half of which is 0-padding.
Can anyone help me?
By the way, I used TimeDistributed here to connect the CNN to the RNN, and I know there is also ConvLSTM2D. Does anyone know the difference? ConvLSTM2D requires many more parameters for the model and trains much more slowly than TimeDistributed...
Unfortunately, masking is not yet supported by the Keras Conv layers. Several issues have been posted about this on the Keras GitHub page; here is the one with the most substantial conversation on the topic. It appears there were some hang-ups over implementation details, and the issue was never resolved.
The workaround proposed in that discussion is to use an explicit embedding for the padding character in sequences and do global pooling. Here is another workaround I found (not helpful for my use case, but maybe helpful to you): keeping a mask array and merging it in through multiplication, as sketched below.
You can also check out the conversation around this question, which is similar to yours.
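To illustrate the multiplication workaround on the model above: assuming a padded timestep is an all-zero (16, 34, 4) frame, you can recompute a per-step mask from the input and multiply it into the flattened features before the GRU. A rough sketch (note this only zeroes out the padded steps' features; the GRU still iterates over them):

import tensorflow as tf
from tensorflow.keras import Model, layers

inp = layers.Input((22, 16, 34, 4))
# (None, 22, 1): 1.0 for real frames, 0.0 for all-zero padding frames
step_mask = layers.Lambda(lambda t: tf.cast(
    tf.reduce_any(tf.not_equal(t, 0.0), axis=[2, 3, 4]), tf.float32)[..., None])(inp)

x = layers.TimeDistributed(
    layers.Conv2D(100, (5, 2), data_format='channels_first', activation='relu'))(inp)
x = layers.TimeDistributed(layers.Flatten())(x)
x = layers.Multiply()([x, step_mask])   # zero out features of padded steps
x = layers.GRU(256, return_sequences=True)(x)
out = layers.Dense(35, activation='softmax')(x)
model = Model(inp, out)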
I am following a Keras tutorial and want to shadow it in PyTorch, so I am translating as I go. I'm not strongly familiar with either framework, and I am coming unstuck on the input size parameter especially, but also on the final layer: do I need another Linear layer? Can anyone translate the following into a PyTorch Sequential definition?
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
hidden1 = Dense(10, activation='relu')(pool2)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
This is the output of the model:
Layer (type) Output Shape Param #
_________________________________________________________________
input_1 (InputLayer) (None, 64, 64, 1) 0
conv2d_1 (Conv2D) (None, 61, 61, 32) 544
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 32) 0
conv2d_2 (Conv2D) (None, 27, 27, 16) 8208
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
dense_1 (Dense) (None, 13, 13, 10) 170
dense_2 (Dense) (None, 13, 13, 1) 11
Total params: 8,933
Trainable params: 8,933
Non-trainable params: 0
What I have worked out so far lacks a specification of the input shape, and I am also a bit perplexed by how stride translates from the Keras model: it uses stride 2 in the MaxPooling2D but doesn't specify a stride elsewhere. It is perhaps a toy example.
model = nn.Sequential(
    nn.Conv2d(1, 32, 4),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 16, 4),  # in_channels must be 32 to match the previous layer
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Linear(10, 1),
    nn.Sigmoid(),
)
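Working from the shape arithmetic in the summary, a closer translation might look like the sketch below. Three things to note: PyTorch infers the input shape at runtime (channels-first, so images are (N, 1, 64, 64)); MaxPool2d defaults its stride to the kernel size, which matches the Keras pooling; and Keras's Dense applied to a 4D tensor acts on the last (channel) axis, which in NCHW layout corresponds to a 1x1 Conv2d:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=4),   # (1, 64, 64) -> (32, 61, 61)
    nn.ReLU(),
    nn.MaxPool2d(2),                   # -> (32, 30, 30)
    nn.Conv2d(32, 16, kernel_size=4),  # -> (16, 27, 27)
    nn.ReLU(),
    nn.MaxPool2d(2),                   # -> (16, 13, 13)
    nn.Conv2d(16, 10, kernel_size=1),  # Dense(10) over channels -> (10, 13, 13)
    nn.ReLU(),
    nn.Conv2d(10, 1, kernel_size=1),   # Dense(1) over channels -> (1, 13, 13)
    nn.Sigmoid(),
)

x = torch.randn(8, 1, 64, 64)  # batch of 8 single-channel 64x64 images
print(model(x).shape)          # torch.Size([8, 1, 13, 13])

The parameter counts of this sketch then match the Keras summary (544, 8208, 170, 11).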