What is the difference between SeparableConv2D and Conv2D layers? - keras

I didn't find a clearly answer to this question online (sorry if it exists).
I would like to understand the differences between the two functions (SeparableConv2D and Conv2D), step by step with, for example a input dataset of (3,3,3) (as RGB image).
Running this script based on Keras-Tensorflow :
import numpy as np
from keras.layers import Conv2D, SeparableConv2D
from keras.models import Model
from keras.layers import Input
red = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue = np.array([10000]*9).reshape((3,3))
img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)
inputs = Input((3,3,3))
conv1 = SeparableConv2D(filters=1,
strides=1,
padding='valid',
activation='relu',
kernel_size=2,
depth_multiplier=1,
depthwise_initializer='ones',
pointwise_initializer='ones',
bias_initializer='zeros')(inputs)
conv2 = Conv2D(filters=1,
strides=1,
padding='valid',
activation='relu',
kernel_size=2,
kernel_initializer='ones',
bias_initializer='zeros')(inputs)
model1 = Model(inputs,conv1)
model2 = Model(inputs,conv2)
print("Model 1 prediction: ")
print(model1.predict(img))
print("Model 2 prediction: ")
print(model2.predict(img))
print("Model 1 summary: ")
model1.summary()
print("Model 2 summary: ")
model2.summary()
I have the following output :
Model 1 prediction:
[[[[40404.]
[40404.]]
[[40404.]
[40404.]]]]
Model 2 prediction:
[[[[40404.]
[40404.]]
[[40404.]
[40404.]]]]
Model 1 summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3, 3, 3) 0
_________________________________________________________________
separable_conv2d_1 (Separabl (None, 2, 2, 1) 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
Model 2 summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3, 3, 3) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 2, 2, 1) 13
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
I understand how Keras compute the Conv2D prediction of model 2 thanks to this post, but can someone explains the SeperableConv2D computation of model 1 prediction please and its number of parameters (16) ?

As Keras uses Tensorflow, you can check in the Tensorflow's API the difference.
The conv2D is the traditional convolution. So, you have an image, with or without padding, and filter that slides through the image with a given stride.
On the other hand, the SeparableConv2D is a variation of the traditional convolution that was proposed to compute it faster.
It performs a depthwise spatial convolution followed by a pointwise convolution which mixes together the resulting output channels. MobileNet, for example, uses this operation to compute the convolutions faster.
I could explain both operations here, however, this post has a very good explanation using images and videos that I strongly recommend you to read.

Related

Keras LSTM Layer ValueError: Dimensions must be equal, but are 17 and 2

I'm working on a basic RNN model for a multiclass task and I'm facing some issues with output dimensions.
This is my input/output shapes:
input.shape = (50000, 2, 5) # (samples, features, feature_len)
output.shape = (50000, 17, 185) # (samples, features, feature_len) <-- one hot encoded
input[0].shape = (2, 5)
output[0].shape = (17, 185)
This is my model, using Keras functional API:
inp = tf.keras.Input(shape=(2, 5,))
x = tf.keras.layers.LSTM(128, input_shape=(2, 5,), return_sequences=True, activation='relu')(inp)
out = tf.keras.layers.Dense(185, activation='softmax')(x)
model = tf.keras.models.Model(inputs=inp, outputs=out)
This is my model.summary():
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 2, 5)] 0
_________________________________________________________________
lstm (LSTM) (None, 2, 128) 68608
_________________________________________________________________
dense (Dense) (None, 2, 185) 23865
=================================================================
Total params: 92,473
Trainable params: 92,473
Non-trainable params: 0
_________________________________________________________________
Then I compile the model and run fit():
model.compile(optimizer='adam',
loss=tf.nn.softmax_cross_entropy_with_logits,
metrics='accuracy')
model.fit(x=input, y=output, epochs=5)
And I'm getting a dimension error:
ValueError: Dimensions must be equal, but are 17 and 2 for '{{node Equal}} = Equal[T=DT_INT64, incompatible_shape_error=true](ArgMax, ArgMax_1)' with input shapes: [?,17], [?,2].
The error is clear, the model output a dimension 2 and my output has dimension 17, although I understand the issue, I can't find a way of fixing it, any ideas?
I think your output shape is not "output[0].shape = (17, 185)" but "dense (Dense) (None, 2, 185) ".
You need to change your output shape or change your layer structure.
LSTM output is a list of encoder_outputs, when you specify return_sequences=True. hence; I suggest just using the last item of encoder_outputs as the input of your Dense layer. you can see the example section of this link to the documentation. It may help you.

How to see keras.engine.sequential.Sequential

I am new to Keras and deep learning and was working with MNIST on Keras. When I created a model using
model = models.Sequential()
model.add(layers.Dense(512,activation = 'relu',input_shape=(28*28,)))
model.add(layers.Dense(32,activation ='relu'))
model.add(layers.Dense(10,activation='softmax'))
and then I printed it
print(model)
output is
<keras.engine.sequential.Sequential at 0x7f3d554f6710>
My question is that is there any way to see a better result of Keras, meaning if i print model i can see that i have 3 hidden layers with first hidden layer having 512 hidden units and 784 input units, 2nd hidden layer having 512 input units and 32 hidden units and so on.
You can also try plot_model()
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(512,activation = 'relu',input_shape=(28*28,)))
model.add(tf.keras.layers.Dense(32,activation ='relu'))
model.add(tf.keras.layers.Dense(10,activation='softmax'))
model.summary()
from keras.utils.vis_utils import plot_model
plot_model(model, show_shapes=True, show_layer_names=True)
model.summary() will print he entire model for you.
model = Sequential()
model.add(Dense(512,activation = 'relu',input_shape=(28*28,)))
model.add(Dense(32,activation ='relu'))
model.add(Dense(10,activation='softmax'))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 512) 401920
_________________________________________________________________
dense_1 (Dense) (None, 32) 16416
_________________________________________________________________
dense_2 (Dense) (None, 10) 330
=================================================================
Total params: 418,666
Trainable params: 418,666
Non-trainable params: 0
____________________________

Invalid argument: indices[0,0] = -4 is not in [0, 40405)

I have a model that was kinda working on some data. I've added in some tokenized word data in the dataset (somewhat truncated for brevity):
vocab_size = len(tokenizer.word_index) + 1
comment_texts = df.comment_text.values
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(comment_texts)
comment_seq = tokenizer.texts_to_sequences(comment_texts)
maxtrainlen = max_length(comment_seq)
comment_train = pad_sequences(comment_seq, maxlen=maxtrainlen, padding='post')
vocab_size = len(tokenizer.word_index) + 1
df.comment_text = comment_train
x = df.drop('label', 1) # the thing I'm training
labels = df['label'].values # Also known as Y
x_train, x_test, y_train, y_test = train_test_split(
x, labels, test_size=0.2, random_state=1337)
n_cols = x_train.shape[1]
embedding_dim = 100 # TODO: why?
model = Sequential([
Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_shape=(n_cols,)),
LSTM(32),
Dense(32, activation='relu'),
Dense(512, activation='relu'),
Dense(12, activation='softmax'), # for an unknown type, we don't account for that while training
])
model.summary()
model.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['acc'])
# convert the y_train to a one hot encoded variable
encoder = LabelEncoder()
encoder.fit(labels) # fit on all the labels
encoded_Y = encoder.transform(y_train) # encode on y_train
one_hot_y = np_utils.to_categorical(encoded_Y)
model.fit(x_train, one_hot_y, epochs=10, batch_size=16)
Now, I get this error:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 12, 100) 4040500
_________________________________________________________________
lstm (LSTM) (None, 32) 17024
_________________________________________________________________
dense (Dense) (None, 32) 1056
_________________________________________________________________
dense_1 (Dense) (None, 512) 16896
_________________________________________________________________
dense_2 (Dense) (None, 12) 6156
=================================================================
Total params: 4,081,632
Trainable params: 4,081,632
Non-trainable params: 0
_________________________________________________________________
Train on 4702 samples
Epoch 1/10
2020-03-04 22:37:59.499238: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[0,0] = -4 is not in [0, 40405)
I think this must be coming from my comment_text column since that is the only thing I added.
Here is what comment_text looks like before I make the substitution:
And here is after:
My full code (before I made the change) is here:
https://colab.research.google.com/drive/1y8Lhxa_DROZg0at3VR98fi5WCcunUhyc#scrollTo=hpEoqR4ne9TO
You should be training with comment_train, not with x which is taking whatever is in the unknown df.
The embedding_dim=100 is free to choose. It's like the number of units in a hidden layer. You can tune this parameter to find which is best for your model as well as you can tune the number of units in hidden layers.
In your case, you will need a model with two or more inputs:
One input for the comments, passing through the embedding and processing text
Another input for the rest of the data, passing probably through a standard netork.
At some point you will concatenate these two branches and keep on going.
This link has a good tutorial about the functional API models and shows a model that has two text inputs and an extra input: https://www.tensorflow.org/guide/keras/functional

How to embed 3d input in keras?

I am trying to make an Embedding layer in Keras.
My input size is 3d: (batch, 8, 6), and I want to have an embedding for the last dimension.
So the embedding should work as (batch*8, 6) -> embedding output
But I don't want to keep this batchsize for all the learning step, just for the embedding layer.
I think one of the solution is seperating 8 inputs and applying the embedding to each input.
But then this embedding layer is not the same as one big embedding layer.
Is there any possible solution? Thanks!
The solution is very simple:
input_shape = (8,6)
And pass through embedding. You will get exactly what you want.
A complete working example:
from keras.layers import *
from keras.models import *
ins = Input((8,6))
out = Embedding(10, 15)(ins)
model = Model(ins, out)
model.summary()
Where 10 is the dictionary size (number of words or similars) and 15 is the embedding size (the resulting dimension).
Resulting summary:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 8, 6) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 8, 6, 15) 150
=================================================================
Total params: 150
Trainable params: 150
Non-trainable params: 0
_________________________________________________________________

How to model Convolutional recurrent network ( CRNN ) in Keras

I was trying to port CRNN model to Keras.
But, I got stuck while connecting output of Conv2D layer to LSTM layer.
Output from CNN layer will have a shape of ( batch_size, 512, 1, width_dash) where first one depends on batch_size, and last one depends on input width of input ( this model can accept variable width input )
For eg: an input with shape [2, 1, 32, 829] was resulting output with shape of (2, 512, 1, 208)
Now, as per Pytorch model, we have to do squeeze(2) followed by permute(2, 0, 1)
it will result a tensor with shape [208, 2, 512 ]
I was trying to implement this is Keras, but I was not able to do that because, in Keras we can not alter batch_size dimension in a keras.models.Sequential model
Can someone please guide me how to port above part of this model to Keras?
Current state of ported CNN layer
You don't need to permute the batch axis in Keras. In a pytorch model you need to do it because a pytorch LSTM expects an input shape (seq_len, batch, input_size). However in Keras, the LSTM layer expects (batch, seq_len, input_size).
So after defining the CNN and squeezing out axis 2, you just need to permute the last two axes. As a simple example (in 'channels_first' Keras image format),
model = Sequential()
model.add(Conv2D(512, 3, strides=(32, 4), padding='same', input_shape=(1, 32, None)))
model.add(Reshape((512, -1)))
model.add(Permute((2, 1)))
model.add(LSTM(32))
You can verify the shapes with model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 512, 1, None) 5120
_________________________________________________________________
reshape_3 (Reshape) (None, 512, None) 0
_________________________________________________________________
permute_4 (Permute) (None, None, 512) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 32) 69760
=================================================================
Total params: 74,880
Trainable params: 74,880
Non-trainable params: 0
_________________________________________________________________

Resources