Understanding `weights` size in Keras Layer - keras

This question is not code related; I am struggling to understand something about Keras layers.
The very first layers of my model have shape:
(126, 126) # this should be the embedding layer of my encoder module
(126, 126)
(126,)
(126, 126)
I was trying to load a deprecated checkpoint with weights but the error message states:
ValueError: Layer #0 (named "encoder") expects 49 weight(s)
Where does this 49 come from?

Related

Data augmentation in Keras model

I am trying to add data augmentation as a layer to a model but I am getting the following error.
TypeError: The added layer must be an instance of class Layer. Found: <tensorflow.python.keras.preprocessing.image.ImageDataGenerator object at 0x7f8c2dea0710>
data_augmentation = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=30, horizontal_flip=True)
model = Sequential()
model.add(data_augmentation)
model.add(Dense(1028, input_shape=(final_features.shape[1],)))
model.add(Dropout(0.7, input_shape=(final_features.shape[1],)))
model.add(Dense(n_classes, activation='softmax', kernel_regularizer='l2'))
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(final_features, y,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.2,
                    callbacks=[lrr, EarlyStop])
I have also tried this way:
data_augmentation = Sequential(
    [
        preprocessing.RandomFlip("horizontal"),
        preprocessing.RandomRotation(0.1),
        preprocessing.RandomZoom(0.1),
    ]
)
model = Sequential()
model.add(data_augmentation)
model.add(Dense(1028, input_shape=(final_features.shape[1],)))
model.add(Dropout(0.7, input_shape=(final_features.shape[1],)))
model.add(Dense(n_classes, activation='softmax', kernel_regularizer='l2'))
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(final_features, y,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.2,
                    callbacks=[lrr, EarlyStop])
It gives an error:
ValueError: Input 0 of layer sequential_7 is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [128, 14272]
Could you please advise how I can use augmentation in Keras?
In your first case, you are using ImageDataGenerator as a layer, which it is not: as the name says, it is just a generator that applies random transformations to images (image augmentation) before feeding them to the network. So the images are augmented on the CPU and then fed to the neural network, which can run on the GPU if you have one.
Generators are also commonly used to avoid loading huge datasets into memory, since they load only the batches that are about to be used.
In the second case, you are correctly using image augmentation as layers of your model. The difference here is that the augmentation runs as part of your model, so if you have a GPU available, for instance, those operations will run on the GPU.
The problem with your second case is in the model itself (in fact, the model is also wrong in the first approach; you only get a different error there because the misuse of ImageDataGenerator fails before execution reaches the model).
Note that you are using images as inputs, so, the input should be of shape (height, width, channels), but then you are starting your model with a dense layer, which expects a single array of shape (n_features,).
If your model needs to start with a Dense layer (unusual, but possibly fine in some cases), then you first need a Flatten layer to convert images of shape (h, w, c) into vectors of shape (h*w*c,). This change should fix the shape mismatch in your second approach, provided the inputs really are images.
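As a rough illustration of the shape bookkeeping described above, here is a plain-Python sketch. It does not call Keras: `flatten_shape` and `dense_kernel_shape` are hypothetical helper names, and the 64x64 RGB image size is an assumed example. It shows what a Flatten step does to a per-sample image shape and what kernel shape a following Dense layer would then need.

```python
from math import prod


def flatten_shape(sample_shape):
    """Per-sample shape after flattening: (h, w, c) -> (h*w*c,)."""
    return (prod(sample_shape),)


def dense_kernel_shape(sample_shape, units):
    """Kernel shape of a dense layer fed with flattened (1-D) samples."""
    if len(sample_shape) != 1:
        raise ValueError("this sketch assumes flattened, 1-D samples")
    return (sample_shape[0], units)


image_shape = (64, 64, 3)                       # assumed: height, width, channels
flat = flatten_shape(image_shape)               # (12288,)
kernel = dense_kernel_shape(flat, units=1028)   # (12288, 1028)
print(flat, kernel)
```

The batch axis is left out on purpose: Flatten only collapses the per-sample axes.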
That said, you don't need to specify the input shape on every single layer: doing it in your first layer should be enough.
Last but not least: are you sure this model is being fed images? According to your fit call, it looks like you are using previously extracted features, which may be vectors (this makes sense with your current model architecture but makes no sense with image augmentation).
Please provide more details about your data to clarify this point.

How are contents of hidden_states tuple in BertModel in the transformers library arranged

model = BertModel.from_pretrained('bert-base-uncased', config=BertConfig.from_pretrained('bert-base-uncased',output_hidden_states=True))
outputs = model(input_ids)
hidden_states = outputs[2]
hidden_states is a tuple of 13 torch.FloatTensors. Each tensor is of size: (batch_size, sequence_length, hidden_size).
According to the documentation, the 13 tensors are the hidden states of the embedding and the 12 encoder layers.
My question:
Is hidden_states[0] the embedding layer while hidden_states[12] is the 12th encoder layer or
Is hidden_states[0] the embedding layer while hidden_states[12] is the 1st encoder layer or
Is hidden_states[0] the 12th encoder layer while hidden_states[12] is the embedding layer or
Is hidden_states[0] the 1st encoder layer while hidden_states[12] is the embedding layer
I haven't found this clearly stated anywhere else.
Looking at the source code for BertModel, it can be concluded that hidden_states[0] contains the outputs of the initial embedding layer, and the remaining elements of the tuple contain the hidden states of each layer in increasing order. Simply put, hidden_states[1] contains the outputs of the first layer of BERT and hidden_states[12] contains the outputs of the last, i.e. 12th, layer.
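The ordering described above can be pictured with a small stand-in. This is a simplified sketch in plain Python, not the library's actual code: `run_encoder` and `make_layer` are hypothetical names, and each toy "layer" just appends its own number so the order is visible.

```python
def run_encoder(embedding_output, layers):
    """Collect hidden states: embedding output first, then each layer in order."""
    hidden = embedding_output
    all_hidden_states = (hidden,)            # index 0: embedding output
    for layer in layers:
        hidden = layer(hidden)
        all_hidden_states += (hidden,)       # index i: output of the i-th layer
    return all_hidden_states


def make_layer(k):
    """Toy stand-in for encoder layer k: records that layer k was applied."""
    return lambda h: h + [k]


layers = [make_layer(k) for k in range(1, 13)]   # 12 toy encoder layers
hidden_states = run_encoder([0], layers)

print(len(hidden_states))      # 13
print(hidden_states[0])        # [0]
print(hidden_states[1])        # [0, 1]
print(hidden_states[12][-1])   # 12
```

Under this ordering, hidden_states[-1] is the output of the final encoder layer.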

Visualizing convolutional layers in autoencoder

I have built a variational autoencoder using 2D convolutions (Conv2D) in the encoder and decoder. I'm using Keras. In total I have 2 layers, with 32 and 64 filters respectively, each with a kernel size of 4x4 and a stride of 2x2. My input images are (64, 80, 1). I'm using the MSE loss. Now, I would like to visualize the individual convolutional layers (i.e. what they learn) as done here.
So, first I load my model using the load_weights() function and then I call visualize_layer(encoder, 'conv2d_1') from the above-mentioned code, where conv2d_1 is the name of the first convolutional layer in my encoder.
When I do so I'm getting the error message
tensorflow.python.framework.errors_impl.UnimplementedError: Fused conv implementation does not support grouped convolutions for now.
[[{{node conv2d_1/BiasAdd}}]]
When I use the VGG16 model as in the example code it works. Does somebody know how I can adapt the code to work for my case?

Maxpooling Layer causes error in Keras

I have created a CNN in Keras with 12 Convolutional layers each followed by BatchNormalization, Activation and MaxPooling. A sample of the layer is:
model.add(Conv2D(256, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2))
I start with 32 feature maps and end with 512. If I add MaxPooling after every Conv Layer like in the code above, I get an error in the final layer:
ValueError: Negative dimension size caused by subtracting 2 from 1 for 'max_pooling2d_11/MaxPool' (op: 'MaxPool') with input shapes: [?,1,1,512].
If I omit one MaxPooling in any layer the model compiles and starts training. I am using Tensorflow as backend and I have the right input shape of the image in the first layer.
Are there any suggestions as to why this may be happening?
If your spatial dimensions are 256x256, then you cannot have more than 8 max-pooling layers in your network. Since 2 ** 8 == 256, after downsampling by a factor of two eight times, your feature maps will be 1x1 in the spatial dimensions, meaning you cannot perform max pooling again, as you would get 0x0 or negative dimensions.
It's just an obvious limitation of max pooling, but one that is not always discussed in papers.
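The arithmetic above can be checked with a short plain-Python sketch. `max_pool_sizes` is a hypothetical helper, and it assumes 'valid' padding with a stride equal to the pool size, which are the Keras MaxPooling2D defaults.

```python
def max_pool_sizes(size, num_pools, pool=2):
    """Spatial size after each pooling step ('valid' padding, stride == pool)."""
    sizes = [size]
    for _ in range(num_pools):
        if size < pool:
            raise ValueError(
                f"cannot pool a {size}x{size} map with a {pool}x{pool} window"
            )
        size = size // pool
        sizes.append(size)
    return sizes


print(max_pool_sizes(256, 8))   # [256, 128, 64, 32, 16, 8, 4, 2, 1]
```

Asking for a ninth pooling step on a 256x256 input raises the ValueError, which mirrors the "negative dimension size" failure in the question.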
This can also be caused by having your input image in the wrong format: if you're using (3, X, Y) and the network expects (X, Y, 3), then the downsampling occurs on the colour channels and causes issues.

Stateful LSTM with Embedding Layer (shapes don't match)

I am trying to build a stateful LSTM with Keras and I don't understand how to add an embedding layer before the LSTM runs. The problem seems to be the stateful flag. If my net is not stateful, adding the embedding layer is quite straightforward and works.
A working stateful LSTM without embedding layer looks at the moment like this:
model = Sequential()
model.add(LSTM(EMBEDDING_DIM,
               batch_input_shape=(batchSize, longest_sequence, 1),
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
When adding the Embedding layer, I move the batch_input_shape parameter into the Embedding layer, i.e. only the first layer needs to know the shape?
Like this:
model = Sequential()
model.add(Embedding(vocabSize+1, EMBEDDING_DIM,
                    batch_input_shape=(batchSize, longest_sequence, 1)))
model.add(LSTM(EMBEDDING_DIM,
               return_sequences=True,
               stateful=True))
model.add(TimeDistributed(Dense(maximal_value)))
model.add(Activation('softmax'))
model.compile(...)
The exception I get now is Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
So I am stuck here at the moment. What is the trick to combine word embeddings into a stateful LSTM?
The batch_input_shape parameter of the Embedding layer should be (batch_size, time_steps), where time_steps is the length of the unrolled LSTM / number of cells and batch_size is the number of examples in a batch.
model = Sequential()
model.add(Embedding(
    input_dim=input_dim,        # e.g. 10 if you have 10 words in your vocabulary
    output_dim=embedding_size,  # size of the embedded vectors
    input_length=time_steps,
    batch_input_shape=(batch_size, time_steps)
))
model.add(LSTM(
    10,
    batch_input_shape=(batch_size, time_steps, embedding_size),
    return_sequences=False,
    stateful=True
))
There is an excellent blog post which explains stateful LSTMs in Keras. Also, I've uploaded a gist which contains a simple example of a stateful LSTM with Embedding layer.
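The ndim=4 error from the question can be traced with a little shape arithmetic. The sketch below is plain Python rather than Keras: `embedding_output_shape` is a hypothetical helper and the batch size, sequence length and embedding size are assumed example values. It relies only on the fact that an embedding lookup replaces each integer index with a vector, so it appends one axis to whatever input shape it receives.

```python
def embedding_output_shape(input_shape, output_dim):
    """Embedding adds one trailing axis of length output_dim to its input shape."""
    return tuple(input_shape) + (output_dim,)


batch_size, time_steps, embedding_size = 32, 10, 8   # assumed example values

# Shape from the question: a trailing axis of 1 is already present.
bad = embedding_output_shape((batch_size, time_steps, 1), embedding_size)
# Shape from the answer: just (batch_size, time_steps).
good = embedding_output_shape((batch_size, time_steps), embedding_size)

print(bad, len(bad))     # (32, 10, 1, 8) 4
print(good, len(good))   # (32, 10, 8) 3
```

With the extra axis the LSTM would receive a 4-D tensor, whereas it expects 3-D input of the form (batch, time steps, features).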
