Keras: How to merge layers sequentially instead of using "concatenate"

I'm trying to build a model that combines a CNN and an LSTM.
I want to feed multiple inputs through the CNN and stack its outputs sequentially as the input to the LSTM. However, I have a problem merging the CNN outputs: if I use concatenate, the outputs are stretched along axis = -1 as shown. Since I'll feed the result into the LSTM, I'd like to stack them along a sequence axis instead, but I couldn't find any merge function other than concatenate. The shape I want is (None, 6, 1904), as in the image below. What can I do?
Below is my build code:
# assumed imports (not shown in the original snippet)
from keras.layers import Input, Conv2D, Flatten, concatenate, pooling
from keras.models import Model
from keras.utils import plot_model

def build_model():
    in_layers, out_layers = [], []
    for i in range(in_len):
        inputs = Input(shape=(row, col, channel))
        conv1 = Conv2D(4, (12, 12), activation='relu')(inputs)
        pool1 = pooling.MaxPooling2D(pool_size=(4, 4))(conv1)
        conv2 = Conv2D(4, (7, 7), activation='relu')(pool1)
        pool2 = pooling.MaxPooling2D(pool_size=(3, 3))(conv2)
        conv3 = Conv2D(8, (5, 5), activation='relu')(pool2)
        pool3 = pooling.MaxPooling2D(pool_size=(2, 2))(conv3)
        flat = Flatten()(pool3)
        # store the input and output of each CNN branch
        in_layers.append(inputs)
        out_layers.append(flat)
    print(type(flat))
    merged = concatenate(out_layers)
    model = Model(inputs=in_layers, outputs=merged)
    plot_model(model, show_shapes=True, to_file='cnn_lstm_real.png')
    return model

What you want is still concatenation, but along a different, new axis. The concatenation layer and function let you specify the axis, so you can do it like this:
from keras.layers import Reshape  # additional import for the new layer

def build_model():
    in_layers, out_layers = [], []
    for i in range(in_len):
        inputs = Input(shape=(row, col, channel))
        conv1 = Conv2D(4, (12, 12), activation='relu')(inputs)
        pool1 = pooling.MaxPooling2D(pool_size=(4, 4))(conv1)
        conv2 = Conv2D(4, (7, 7), activation='relu')(pool1)
        pool2 = pooling.MaxPooling2D(pool_size=(3, 3))(conv2)
        conv3 = Conv2D(8, (5, 5), activation='relu')(pool2)
        pool3 = pooling.MaxPooling2D(pool_size=(2, 2))(conv3)
        flat = Flatten()(pool3)
        flat = Reshape((1, -1))(flat)  # add an explicit sequence axis
        # store layers
        in_layers.append(inputs)
        out_layers.append(flat)
    merged = concatenate(out_layers, axis=1)
    model = Model(inputs=in_layers, outputs=merged)
    plot_model(model, show_shapes=True, to_file='cnn_lstm_real.png')
    return model
The only big difference is that you need to add the new axis explicitly in the output of each branch (hence the Reshape layer), so that concatenation can happen along that axis.
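For context, a minimal sketch of how the merged tensor could then feed an LSTM (hedged: in_len = 6 and the per-branch feature size of 1904 come from the shapes in the question; the unit count and the Dense head are arbitrary placeholders, not part of the original answer):

from keras.layers import LSTM, Dense

cnn = build_model()
# cnn.output has shape (None, in_len, features), e.g. (None, 6, 1904),
# which is exactly the (batch, timesteps, features) layout LSTM expects.
x = LSTM(64)(cnn.output)
out = Dense(1)(x)
full_model = Model(inputs=cnn.inputs, outputs=out)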

Related

How to add an auxiliary head to an intermediary layer of a pretrained keras model?

This is my first question on Stack Overflow. I'm going to try to give as much context as I can. Thank you for taking the time to read my question!
I'm currently using EfficientNet for a classification problem. I want to add an auxiliary head on an intermediary layer. By auxiliary head, I mean another set of layers that produces a second output (two final outputs in total).
Currently I have managed to add an additional head at the end of the model with the following code:
inputs = tf.keras.Input(shape=(img_size, img_size, 3), name='input')
x = efn.EfficientNetB7(input_shape=(img_size, img_size, 3), include_top=False)(inputs)
classification_head = tf.keras.layers.GlobalAveragePooling2D()(x)
classification_head = tf.keras.layers.Dense(4, activation='softmax', name='classification')(classification_head)
aux_head = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same')(x)
aux_head = tf.keras.layers.BatchNormalization()(aux_head)
aux_head = tf.keras.layers.ReLU()(aux_head)
aux_head = tf.keras.layers.Conv2D(1, kernel_size=1, padding='valid', name='aux_head')(aux_head)
model = tf.keras.Model(inputs, [classification_head, aux_head])
I want to do a similar procedure, but with the aux_head attached directly to an intermediary layer (here named block5a_expand_conv). What I've tried is:
inputs = tf.keras.Input(shape=(img_size, img_size, 3), name='input')
x = efn.EfficientNetB7(input_shape=(img_size, img_size, 3), include_top=False)(inputs)
classification_head = tf.keras.layers.GlobalAveragePooling2D()(x)
classification_head = tf.keras.layers.Dense(4, activation='softmax', name='classification')(classification_head)
intermediary_layer = x(
    input_shape=(img_sisze, img_sisze, 3),
    include_top=False).get_layer(name='block5a_expand_conv')
aux_head = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same')(intermediary_layer.output)
aux_head = tf.keras.layers.BatchNormalization()(aux_head)
aux_head = tf.keras.layers.ReLU()(aux_head)
aux_head = tf.keras.layers.Conv2D(1, kernel_size=1, padding='valid', name='aux_head')(aux_head)
model = tf.keras.Model(inputs, [classification_head, aux_head])
But this code produces an error:
Graph disconnected
Does anyone have an idea on what could do the job here?
This error indicates that TensorFlow could not build a graph (model) between the layers you defined as the inputs and outputs: somewhere in your model, the path through the layers is disconnected. So check the path between your input and output layers.
In your case, you defined one input layer as inputs. You feed this input through an EfficientNet and some layers of your own, then take the output as classification_head. So far there is no problem. For the next path, you wanted to take the output of a hidden layer named block5a_expand_conv, feed it through some other layers, and get another output as aux_head.
So the question is: where is the input of this path? There is none, because you defined intermediary_layer as a layer, not as the output of a layer, so the graph cannot connect through the intermediary layer back to the input. This is where the graph gets disconnected.
Here is the modified code:
en_model = efn.EfficientNetB7(input_shape=(img_size, img_size, 3), include_top=False)
classification_head = tf.keras.layers.GlobalAveragePooling2D()(en_model.output)
classification_head = tf.keras.layers.Dense(4, activation='softmax', name='classification')(classification_head)
intermediary_layer = en_model.get_layer('block5a_expand_conv').output
aux_head = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same')(intermediary_layer)
aux_head = tf.keras.layers.BatchNormalization()(aux_head)
aux_head = tf.keras.layers.ReLU()(aux_head)
aux_head = tf.keras.layers.Conv2D(1, kernel_size=1, padding='valid', name='aux_head')(aux_head)
model = tf.keras.Model(en_model.input, [classification_head, aux_head])
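As a follow-up, a hedged sketch of how such a two-output model might be compiled, with one loss per named output (the loss choices and weights here are illustrative assumptions, not part of the original answer):

model.compile(optimizer='adam',
              loss={'classification': 'categorical_crossentropy',
                    'aux_head': 'mse'},  # placeholder loss for the auxiliary map
              loss_weights={'classification': 1.0, 'aux_head': 0.5})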

Stacking fully connected layers on top of two autoencoders for classification

I'm training autoencoders on 2D images using convolutional layers and would like to put fully connected layers on top of the encoder part for classification. My autoencoder is defined as follows (just a simple one for illustration):
def encoder(input_img):
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    return conv2

def decoder(conv2):
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv2)
    conv3 = BatchNormalization()(conv3)
    up1 = UpSampling2D((2, 2))(conv3)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up1)
    return decoded

autoencoder = Model(input_img, decoder(encoder(input_img)))
My input images are of size (64,80,1). Now when stacking fully connected layers on top of the encoder I'm doing the following:
def fc(enco):
    flat = Flatten()(enco)
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out

encode = encoder(input_img)
full_model = Model(input_img, fc(encode))
for l1, l2 in zip(full_model.layers[:19], autoencoder.layers[0:19]):
    l1.set_weights(l2.get_weights())
For a single autoencoder this works, but the problem now is that I have two autoencoders, each trained on a set of images of size (64, 80, 1).
For every label I have two input images of size (64, 80, 1) and one label (0 or 1). I need to feed image 1 into the first autoencoder and image 2 into the second autoencoder. But how can I combine both autoencoders in the full_model above?
Another problem is the input to the fit() method. Until now, with only one autoencoder, the input was just a numpy array of images (e.g. (1000, 64, 80, 1)), but with two autoencoders I have two sets of images as input. How can I feed these into fit() so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
Q: How can I combine both autoencoders in full_model?
A: You could concatenate the bottleneck layers enco_1 and enco_2 of both autoencoders within fc:
def fc(enco_1, enco_2):
    flat_1 = Flatten()(enco_1)
    flat_2 = Flatten()(enco_2)
    flat = Concatenate()([flat_1, flat_2])
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out

encode_1 = encoder_1(input_img_1)
encode_2 = encoder_2(input_img_2)
full_model = Model([input_img_1, input_img_2], fc(encode_1, encode_2))
Note that the last part where you manually set the weights of the encoder is unnecessary - see https://keras.io/getting-started/functional-api-guide/#shared-layers
Q: How can I feed this into the fit method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
A: In the code above, note that the two encoders are fed with different inputs (one for each image set). Now, provided that the model is defined in this way, you can call full_model.fit as follows:
full_model.fit(x=[images_set_1, images_set_2],
               y=label,
               ...)
NOTE: Not tested.
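For completeness, a hedged sketch of the pieces the snippet above assumes but does not show (shapes taken from the question; encoder_1 and encoder_2 are meant as two separate copies of the encoder function so each branch gets its own weights):

input_img_1 = Input(shape=(64, 80, 1))
input_img_2 = Input(shape=(64, 80, 1))
# encoder_1 and encoder_2 can be two functions with the same body as
# encoder() above; calling a single shared encoder on both inputs would
# instead tie the weights of the two branches together.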

Why are Pytorch and Keras implementations giving vastly different results?

I am trying to train a 1-D ConvNet for time series classification, as shown in this paper (refer to the FCN in Fig. 1b): https://arxiv.org/pdf/1611.06455.pdf
The Keras implementation is giving me vastly superior performance. Could someone explain why that is the case?
The code for PyTorch is as follows:
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv1d(x_train.shape[1], 128, 8)
        self.bnorm1 = nn.BatchNorm1d(128)
        self.conv2 = nn.Conv1d(128, 256, 5)
        self.bnorm2 = nn.BatchNorm1d(256)
        self.conv3 = nn.Conv1d(256, 128, 3)
        self.bnorm3 = nn.BatchNorm1d(128)
        self.dense = nn.Linear(128, nb_classes)

    def forward(self, x):
        c1 = self.conv1(x)
        b1 = F.relu(self.bnorm1(c1))
        c2 = self.conv2(b1)
        b2 = F.relu(self.bnorm2(c2))
        c3 = self.conv3(b2)
        b3 = F.relu(self.bnorm3(c3))
        output = torch.mean(b3, 2)
        dense1 = self.dense(output)
        return F.softmax(dense1)

model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.99)
losses = []
for t in range(1000):
    y_pred_1 = model(x_train.float())
    loss_1 = criterion(y_pred_1, y_train.long())
    print(t, loss_1.item())
    optimizer.zero_grad()
    loss_1.backward()
    optimizer.step()
For comparison, I use the following code for Keras:
x = keras.layers.Input(x_train.shape[1:])
conv1 = keras.layers.Conv1D(128, 8, padding='valid')(x)
conv1 = keras.layers.BatchNormalization()(conv1)
conv1 = keras.layers.Activation('relu')(conv1)
conv2 = keras.layers.Conv1D(256, 5, padding='valid')(conv1)
conv2 = keras.layers.BatchNormalization()(conv2)
conv2 = keras.layers.Activation('relu')(conv2)
conv3 = keras.layers.Conv1D(128, 3, padding='valid')(conv2)
conv3 = keras.layers.BatchNormalization()(conv3)
conv3 = keras.layers.Activation('relu')(conv3)
full = keras.layers.GlobalAveragePooling1D()(conv3)
out = keras.layers.Dense(nb_classes, activation='softmax')(full)
model = keras.models.Model(inputs=x, outputs=out)
optimizer = keras.optimizers.SGD(lr=0.5, decay=0.0, momentum=0.99)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
hist = model.fit(x_train, Y_train, batch_size=x_train.shape[0], nb_epoch=2000)
The only difference I see between the two is the initialization, yet the results are just vastly different. For reference, I use the same preprocessing for both datasets, with a subtle difference in input shapes: (Batch_Size, Channels, Length) for PyTorch and (Batch_Size, Length, Channels) for Keras.
The different results are due to different default parameters in the layers and the optimizer. For example, in PyTorch the decay rate of batch norm is effectively 0.9, whereas in Keras it is 0.99. There may be other variations in the default parameters as well.
If you use the same parameters and a fixed random seed for initialization, there won't be much difference in the results between the two libraries.
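A minimal sketch of one such alignment, offered as an illustration rather than part of the original answer: Keras's BatchNormalization defaults to momentum=0.99 and epsilon=1e-3, while PyTorch's BatchNorm1d defaults to momentum=0.1 and eps=1e-5, where PyTorch's momentum argument is one minus the decay rate. To make the PyTorch layer mirror the Keras defaults:

import torch.nn as nn

# inside Net.__init__, replacing the original definition:
# momentum=0.01 in PyTorch corresponds to a running-average decay of 0.99,
# i.e. Keras's BatchNormalization default; eps is matched to Keras's 1e-3.
self.bnorm1 = nn.BatchNorm1d(128, momentum=0.01, eps=1e-3)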

Keras layer asks for different shape than in the summary

I'm writing a U-net CNN in Keras and trying to use fit_generator for training. To make this work, I wrote a generator script that feeds the images and labels to my network (the plain fit function works, but I want to train on a big dataset that cannot fit into memory).
My problem is that the model summary correctly says that the output layer has the shape (None, 288, 512, 4):
https://i.imgur.com/69xG8pO.jpg
but when I try actual training I get this error:
https://i.imgur.com/j7H6sHX.jpg
I don't get why Keras wants (288, 512, 1) when the summary says it expects (288, 512, 4).
I tried it with my own U-net code and also copied a working code from GitHub, but both have the exact same problem, which leads me to believe that my generator script is the weak link. Below is the code I used (the image and label array functions used here already worked when I used them with "fit" in a previous CNN):
def generator(img_path, label_path, batch_size, height, width, num_classes):
    input_pairs = get_pairs(img_path, label_path)  # rewrite if param name changes
    random.shuffle(input_pairs)
    iterate_pairs = itertools.cycle(input_pairs)
    while True:
        X = []
        Y = []
        for _ in range(batch_size):
            im, lab = next(iterate_pairs)
            appended_im = next(iter(im))
            appended_lab = next(iter(lab))
            X.append(input_image_array(appended_im, width, height))
            Y.append(input_label_array(appended_lab, width, height, num_classes, palette))
        yield (np.array(X), np.array(Y))
I tried the generator out, and the provided batches have the following shapes (for a batch size of 15):
(15, 288, 512, 3)
(15, 288, 512, 4)
So I really do not know what could be the problem here.
EDIT: Here is the model code I used:
def conv_block(input_tensor, n_filter, kernel=(3, 3), padding='same', initializer="he_normal"):
    x = Conv2D(n_filter, kernel, padding=padding, kernel_initializer=initializer)(input_tensor)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = Conv2D(n_filter, kernel, padding=padding, kernel_initializer=initializer)(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    return x

def deconv_block(input_tensor, residual, n_filter, kernel=(3, 3), strides=(2, 2), padding='same'):
    y = Conv2DTranspose(n_filter, kernel, strides, padding)(input_tensor)
    y = concatenate([y, residual], axis=3)
    y = conv_block(y, n_filter)
    return y

# NETWORK - n_classes is the desired number of classes, filters are fixed
def Unet(input_height, input_width, n_classes=4, filters=64):
    # Downsampling
    input_layer = Input(shape=(input_height, input_width, 3), name='input')
    conv_1 = conv_block(input_layer, filters)
    conv_1_out = MaxPooling2D(pool_size=(2, 2))(conv_1)
    conv_2 = conv_block(conv_1_out, filters*2)
    conv_2_out = MaxPooling2D(pool_size=(2, 2))(conv_2)
    conv_3 = conv_block(conv_2_out, filters*4)
    conv_3_out = MaxPooling2D(pool_size=(2, 2))(conv_3)
    conv_4 = conv_block(conv_3_out, filters*8)
    conv_4_out = MaxPooling2D(pool_size=(2, 2))(conv_4)
    conv_4_drop = Dropout(0.5)(conv_4_out)
    conv_5 = conv_block(conv_4_drop, filters*16)
    conv_5_drop = Dropout(0.5)(conv_5)
    # Upsampling
    deconv_1 = deconv_block(conv_5_drop, conv_4, filters*8)
    deconv_1_drop = Dropout(0.5)(deconv_1)
    deconv_2 = deconv_block(deconv_1_drop, conv_3, filters*4)
    deconv_2_drop = Dropout(0.5)(deconv_2)
    deconv_3 = deconv_block(deconv_2_drop, conv_2, filters*2)
    deconv_3 = deconv_block(deconv_3, conv_1, filters)
    # Output - mapping each 64-component feature vector to number of classes
    output = Conv2D(n_classes, (1, 1))(deconv_3)
    output = BatchNormalization()(output)
    output = Activation("softmax")(output)
    # embed into functional API
    model = Model(inputs=input_layer, outputs=output, name="Unet")
    return model
Change your loss to categorical_crossentropy. The sparse_categorical_crossentropy loss expects integer class indices with a single channel, i.e. targets of shape (288, 512, 1), which is exactly the shape Keras is asking for in the error. Your generator yields one-hot labels of shape (288, 512, 4), which is what categorical_crossentropy expects.
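For reference, a minimal sketch of the matching compile call (the optimizer choice is an arbitrary assumption; the spatial dimensions come from the shapes in the question):

model = Unet(288, 512, n_classes=4)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])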

Batch Normalization doesn't have gradient in tensorflow 2.0?

I am trying to make a simple GAN to generate digits from the MNIST dataset. However, when I get to training (which is custom), I get an annoying warning that I suspect is the cause of it not training like I'm used to.
Keep in mind this is all in TensorFlow 2.0, using its default eager execution.
GET THE DATA (not that important)
(train_images,train_labels),(test_images,test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
train_images = (train_images - 127.5) / 127.5 # Normalize the images to [-1, 1]
BUFFER_SIZE = 60000
BATCH_SIZE = 256
train_dataset = tf.data.Dataset.from_tensor_slices((train_images,train_labels)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
GENERATOR MODEL (this is where the Batch Normalization is)
def make_generator_model():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size
    model.add(tf.keras.layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)
    return model
DISCRIMINATOR MODEL (likely not that important)
def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same'))
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(1))
    return model
INSTANTIATE THE MODELS (likely not that important)
generator = make_generator_model()
discriminator = make_discriminator_model()
DEFINE THE LOSSES (maybe the generator loss is important since that is where the gradient comes from)
def generator_loss(generated_output):
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(generated_output), logits=generated_output)

def discriminator_loss(real_output, generated_output):
    # [1,1,...,1] with real output since it is true and we want our generated examples to look like it
    real_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(real_output), logits=real_output)
    # [0,0,...,0] with generated images since they are fake
    generated_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.zeros_like(generated_output), logits=generated_output)
    total_loss = real_loss + generated_loss
    return total_loss
MAKE THE OPTIMIZERS (likely not important)
generator_optimizer = tf.optimizers.Adam(1e-4)
discriminator_optimizer = tf.optimizers.Adam(1e-4)
RANDOM NOISE FOR THE GENERATOR (likely not important)
EPOCHS = 50
noise_dim = 100
num_examples_to_generate = 16
# We'll re-use this random vector used to seed the generator so
# it will be easier to see the improvement over time.
random_vector_for_generation = tf.random.normal([num_examples_to_generate,
                                                 noise_dim])
A SINGLE TRAIN STEP (this is where I get the error)
def train_step(images):
    # generating noise from a normal distribution
    noise = tf.random.normal([BATCH_SIZE, noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images[0], training=True)
        generated_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(generated_output)
        disc_loss = discriminator_loss(real_output, generated_output)
    # This line >>>>>
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.variables)
    # <<<<< This line
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.variables))
THE FULL TRAIN (not important except that it calls train_step)
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()
        for images in dataset:
            train_step(images)
        display.clear_output(wait=True)
        generate_and_save_images(generator,
                                 epoch + 1,
                                 random_vector_for_generation)
        # saving (checkpoint) the model every 15 epochs
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)
        print('Time taken for epoch {} is {} sec'.format(epoch + 1,
                                                         time.time() - start))
    # generating after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator,
                             epochs,
                             random_vector_for_generation)
BEGIN TRAINING
train(train_dataset, EPOCHS)
The error I get is as follows:
W0330 19:42:57.366302 4738405824 optimizer_v2.py:928] Gradients does not exist for variables ['batch_normalization_v2_54/moving_mean:0', 'batch_normalization_v2_54/moving_variance:0', 'batch_normalization_v2_55/moving_mean:0', 'batch_normalization_v2_55/moving_variance:0', 'batch_normalization_v2_56/moving_mean:0', 'batch_normalization_v2_56/moving_variance:0'] when minimizing the loss.
And I get an image from the generator (omitted here) which is kinda what I would expect without the normalization: everything clumps to one corner because there are extreme values.
The problem is here:
gradients_of_generator = gen_tape.gradient(gen_loss, generator.variables)
You should only be getting gradients for the trainable variables. So you should change it to
gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
The same goes for the three lines that follow. The variables field includes things like the running averages that batch norm uses during inference. Because they are not used during training, there are no sensible gradients defined for them, and trying to compute them produces the warning you saw.
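Putting it together, the four gradient/apply lines of train_step become:

gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))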
