How to visualize input and neurons in a Keras Embedding layer?

I realized I'm having problems understanding and visualizing the Embedding layer as inputs, edges, and nodes representing neurons and the connections between them. I'm not entirely sure what the number of inputs and neurons in the layer is.
In this toy example for sentiment analysis, my vocab size is 18, the max sentence length is 4, and the embedding size is 4:
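from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense
# vocab_size, embedding_size and max_seq_len are set earlier in the script
model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_size, input_length=max_seq_len, name='embedding_lay'))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()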
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_lay (Embedding) (None, 4, 4) 76
_________________________________________________________________
flatten_1 (Flatten) (None, 16) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 93
Trainable params: 93
Non-trainable params: 0
I always thought that my embedding comes from 18 inputs (vocab size) times 4 neurons in the first hidden layer, plus 4 biases, hence 18*4+4 = 76. I also thought the embeddings are the weights between the inputs and the 4 neurons (the embedding). Is this correct? If so, what happens at Flatten, so that I arrive at 'dense_1' with my 4x4 flattened to 16, plus 1, hence 17 trainable parameters? To be clear, I more or less understand this from the matrix-math perspective. I just can't map it onto the very basic neuron-connection (node-edge) schema that is used, for instance, to visualize how a Dense layer works.
Thanks in advance for any help.
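Here is a minimal NumPy sketch of one node-edge reading of this model. Keras's Embedding layer stores a single weight matrix of shape (input_dim, output_dim) and has no bias. So 76 = 19 × 4, which suggests that input_dim was actually 19 when the layer was built, rather than 18 × 4 + 4. That is an assumption, since the post says 18; one possibility is 18 words plus a reserved padding index 0.

Looking up row i of that matrix gives the same result as feeding a one-hot vector of length input_dim through a bias-free Dense layer. Each of the 4 sequence positions can therefore be drawn as 19 input nodes fully connected to 4 output nodes, with the same edge weights reused at every position. The token ids and random weights below are made up for illustration.

```python
import numpy as np

vocab_size = 19      # assumption: 76 params / 4 = 19 rows in the embedding table
embedding_size = 4   # matches the (None, 4, 4) output shape
max_seq_len = 4

rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embedding_size))  # the only Embedding weights; no bias

tokens = np.array([3, 7, 7, 0])  # hypothetical token ids for one 4-word sentence

# Embedding layer as a table lookup: one row of E per token.
looked_up = E[tokens]                     # shape (4, 4)

# The same result as a bias-free Dense layer applied to one-hot inputs:
# 19 input nodes per position, each connected to 4 output nodes by the edges in E.
one_hot = np.eye(vocab_size)[tokens]      # shape (4, 19)
as_dense = one_hot @ E                    # shape (4, 4)
assert np.allclose(looked_up, as_dense)

# Flatten has no weights: it concatenates the 4 position vectors into 16 nodes.
flat = looked_up.reshape(-1)              # shape (16,)

# Dense(1): 16 weights (one edge per flattened node) plus 1 bias, then sigmoid.
w = rng.normal(size=16)
b = 0.0
logit = flat @ w + b
prob = 1.0 / (1.0 + np.exp(-logit))

embedding_params = vocab_size * embedding_size        # 76
dense_params = max_seq_len * embedding_size * 1 + 1   # 17
print(embedding_params, dense_params, embedding_params + dense_params)
```

In this picture the "neurons" of the Embedding layer are the 4 output nodes per position, and the embedding vectors are the rows of E (the outgoing edges of each one-hot input node).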

Related

Conv1D for matching string finder

I am using a Keras Conv1D model to build a multi-class classification ML model.
The objective is to classify strings by the substrings they contain.
For example, the training data will be like this:
String Category
"roomba" "robot"
"camera" "cam"
"camcorder" "device"
"washer" "machine"
...
The expected output is that strings containing a substring from the training data are classified the same way as that training string.
String Category
"Xroomba" "robot"
"camera-C" "cam"
"washerV" "machine"
...
I have used the following Keras model; however, the predictions are very disappointing.
The model predicts correctly only for strings that exactly match a training string of the same length, such as "roomba", "camera", and "washer". It fails on extended strings such as "roombaA", "roombaAB", "cameraA", "Bcamera", etc.
I thought that a substring matching the kernel_size was the most important factor in the decision, so I expected "roombaA" to be predicted correctly, since it contains an exact match of length 6.
I have tried various kernel sizes and different numbers of layers, but the accuracy is still very poor.
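from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense
n_output = 36  # number of categories, per the dense_24 output shape below
model2 = Sequential()
model2.add(Conv1D(filters=256, kernel_size=6, activation='relu', input_shape=(128, 1)))
model2.add(Flatten())
model2.add(Dense(n_output, activation='softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])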
Model: "sequential_26"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_65 (Conv1D) (None, 123, 256) 1792
_________________________________________________________________
flatten_24 (Flatten) (None, 31488) 0
_________________________________________________________________
dense_24 (Dense) (None, 36) 1133604
=================================================================
Total params: 1,135,396
Trainable params: 1,135,396
Non-trainable params: 0
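The Param # column above can be reproduced by hand, which also shows that almost all the weights sit in the final Dense layer.

The second half of the sketch rests on an assumption that is not in the original question: it encodes the string as raw character codes, zero-padded, and uses a single made-up width-6 filter that matches "roomba". It illustrates that a 'valid' Conv1D is shift-equivariant. Prepending one character shifts the whole feature map by one position. After Flatten, the same substring feature therefore reaches a different set of Dense weights.

```python
import numpy as np

# Parameter counts for the summary above.
seq_len, channels = 128, 1
filters, kernel_size = 256, 6
n_output = 36  # read off the dense_24 output shape

conv_out_len = seq_len - kernel_size + 1                   # 'valid' padding -> 123
conv_params = kernel_size * channels * filters + filters   # kernel + one bias per filter
flat_len = conv_out_len * filters                          # Flatten output length
dense_params = flat_len * n_output + n_output              # kernel + bias
print(conv_out_len, conv_params, flat_len, dense_params, conv_params + dense_params)


def valid_conv1d(x, k):
    """Cross-correlation (no kernel flip, as Conv1D computes it), 'valid' padding."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])


def encode(s, length=16):
    """Hypothetical encoding: character codes, zero-padded to a fixed length."""
    codes = np.zeros(length)
    codes[:len(s)] = [ord(c) for c in s]
    return codes


pattern = encode("roomba")[:6]
k = pattern / np.linalg.norm(pattern)   # a single filter tuned to "roomba"

a = valid_conv1d(encode("roomba"), k)
b = valid_conv1d(encode("Xroomba"), k)

# The feature map for "Xroomba" is the "roomba" map shifted right by one step.
print(np.allclose(b[1:], a[:-1]), np.argmax(a), np.argmax(b))
```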

Keras LSTM Layer ValueError: Dimensions must be equal, but are 17 and 2

I'm working on a basic RNN model for a multiclass task and I'm facing some issues with output dimensions.
These are my input/output shapes:
input.shape = (50000, 2, 5) # (samples, features, feature_len)
output.shape = (50000, 17, 185) # (samples, features, feature_len) <-- one hot encoded
input[0].shape = (2, 5)
output[0].shape = (17, 185)
This is my model, using the Keras functional API:
import tensorflow as tf
inp = tf.keras.Input(shape=(2, 5,))
x = tf.keras.layers.LSTM(128, input_shape=(2, 5,), return_sequences=True, activation='relu')(inp)
out = tf.keras.layers.Dense(185, activation='softmax')(x)
model = tf.keras.models.Model(inputs=inp, outputs=out)
This is my model.summary():
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 2, 5)] 0
_________________________________________________________________
lstm (LSTM) (None, 2, 128) 68608
_________________________________________________________________
dense (Dense) (None, 2, 185) 23865
=================================================================
Total params: 92,473
Trainable params: 92,473
Non-trainable params: 0
_________________________________________________________________
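As a quick check of this summary, the counts follow from the standard LSTM layout. There are four gates, each with an input kernel, a recurrent kernel and a bias, giving 4 × (units × (input_dim + units) + units).

The Dense layer is applied independently at each of the 2 timesteps and shares its weights across them, which is why its output keeps the timestep axis. A small arithmetic sketch:

```python
def lstm_params(input_dim, units):
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Kernel plus bias; on a 3D input the same kernel is reused at every timestep.
    return input_dim * units + units


timesteps, features = 2, 5
units, n_classes = 128, 185

lstm = lstm_params(features, units)
dense = dense_params(units, n_classes)
print(lstm, dense, lstm + dense)   # 68608 23865 92473

# With return_sequences=True the LSTM emits one vector per input timestep,
# so the sequence length of the model output is the input's 2 timesteps.
lstm_output_shape = (None, timesteps, units)
model_output_shape = (None, timesteps, n_classes)
print(lstm_output_shape, model_output_shape)
```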
Then I compile the model and run fit():
model.compile(optimizer='adam',
loss=tf.nn.softmax_cross_entropy_with_logits,
metrics='accuracy')
model.fit(x=input, y=output, epochs=5)
And I'm getting a dimension error:
ValueError: Dimensions must be equal, but are 17 and 2 for '{{node Equal}} = Equal[T=DT_INT64, incompatible_shape_error=true](ArgMax, ArgMax_1)' with input shapes: [?,17], [?,2].
The error is clear: the model outputs a dimension of 2, while my target has a dimension of 17. Although I understand the issue, I can't find a way to fix it. Any ideas?
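The shapes in the error message can be reproduced with plain NumPy. As the two ArgMax nodes in the message suggest, the accuracy metric here compares the argmax over the class axis of the target and of the prediction. A (batch, 17, 185) target becomes (batch, 17) and a (batch, 2, 185) prediction becomes (batch, 2), which is the [?,17] vs [?,2] mismatch.

This sketch uses random arrays and a made-up batch of 8 in place of the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 8  # hypothetical small batch instead of the 50000 samples

# One-hot targets, shape (8, 17, 185), and model outputs, shape (8, 2, 185).
y_true = np.eye(185)[rng.integers(0, 185, size=(batch, 17))]
y_pred = rng.random((batch, 2, 185))

true_classes = np.argmax(y_true, axis=-1)   # (8, 17)
pred_classes = np.argmax(y_pred, axis=-1)   # (8, 2)
print(true_classes.shape, pred_classes.shape)

# An element-wise comparison needs matching shapes; 17 vs 2 cannot be broadcast.
try:
    np.equal(true_classes, pred_classes)
    shapes_compatible = True
except ValueError as err:
    shapes_compatible = False
    print("ValueError:", err)
```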
I think the problem is that your target shape is output[0].shape = (17, 185), while your model's output is (None, 2, 185), as the dense (Dense) line of your summary shows.
You need to either change your target shape or change your layer structure so that the two match.
When you specify return_sequences=True, the LSTM returns its output at every timestep (the encoder outputs), so the model output keeps your input's 2 timesteps. Hence, I suggest using just the last item of the encoder outputs as the input of your Dense layer. You can see the example section of this link to the documentation. It may help you.

How to embed 3d input in keras?

I am trying to make an Embedding layer in Keras.
My input shape is 3D: (batch, 8, 6), and I want to have an embedding for the last dimension.
So the embedding should work as (batch*8, 6) -> embedding output.
But I don't want to keep this merged batch size for all the learning steps, just for the embedding layer.
I think one solution is to separate the 8 inputs and apply the embedding to each input.
But then these separate embedding layers are not the same as one big embedding layer.
Is there any possible solution? Thanks!
The solution is very simple:
Use input_shape = (8,6) and pass the input through the Embedding layer. You will get exactly what you want.
A complete working example:
from keras.layers import Input, Embedding
from keras.models import Model
ins = Input((8,6))
out = Embedding(10, 15)(ins)
model = Model(ins, out)
model.summary()
Here, 10 is the dictionary size (the number of words or similar tokens) and 15 is the embedding size (the resulting dimension).
Resulting summary:
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 8, 6) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 8, 6, 15) 150
=================================================================
Total params: 150
Trainable params: 150
Non-trainable params: 0
_________________________________________________________________
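On the concern that separate layers would not share weights: a single Embedding layer is one lookup table of shape (input_dim, output_dim), applied element-wise to every integer in the input, whatever the input's rank.

The NumPy sketch below uses random ids and weights with the same 10 and 15 as the example above. It shows that looking up a (batch, 8, 6) array directly gives the same values as flattening to (batch*8, 6), looking up, and reshaping back. It also shows that the table holds 10 × 15 = 150 parameters, matching the summary.

```python
import numpy as np

vocab_size, embed_dim = 10, 15
batch = 2  # hypothetical batch size

rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, embed_dim))       # the Embedding weights
ids = rng.integers(0, vocab_size, size=(batch, 8, 6))  # integer token ids

# Direct lookup on the 3D input: one 15-d vector per id.
direct = table[ids]                                    # (2, 8, 6, 15)

# Lookup after merging the batch and the 8 groups, then restoring the shape.
merged = table[ids.reshape(batch * 8, 6)]              # (16, 6, 15)
restored = merged.reshape(batch, 8, 6, embed_dim)

assert np.allclose(direct, restored)
print(direct.shape, table.size)                        # (2, 8, 6, 15) 150
```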

Understanding keras model.summary()

I am trying to understand model.summary() in Keras. I have the following code:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
Dense(3,activation='relu',input_shape=(6,)),
Dense(3,activation='relu'),
Dense(1),
])
model.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['mae','mape','mse','cosine']
)
When I run print(model.summary()), I get this output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_16 (Dense) (None, 3) 21
_________________________________________________________________
dense_17 (Dense) (None, 3) 12
_________________________________________________________________
dense_18 (Dense) (None, 1) 4
=================================================================
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________
None
I cannot understand the meaning of dense_16, dense_17 and dense_18 with respect to the layers I defined in my model.
Those are just the names of the layers, autogenerated by Keras. To name layers manually, pass the keyword argument name='my_custom_name' to each layer you want to name. Note that layer names must be unique within a model.
Layer names are useful for debugging and to get specific layers in code, for example using model.get_layer(layer_name).
These are just the names of your layers. If you do not explicitly specify the layer names, they will just be named and numbered automatically.
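Separately from the names, the Param # column in the summary above follows from inputs × units + units for each Dense layer: one weight per input-unit edge plus one bias per unit. The trailing None in the question's output is the return value of model.summary(). That method prints the table itself and returns None, so print(model.summary()) prints None afterwards.

A short arithmetic check in plain Python:

```python
def dense_params(n_inputs, units):
    # One weight per input-unit connection plus one bias per unit.
    return n_inputs * units + units


layer_units = [3, 3, 1]
n_inputs = 6  # input_shape=(6,)

counts = []
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units  # the next layer's inputs are this layer's outputs

print(counts, sum(counts))  # [21, 12, 4] 37
```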

What is the difference between SeparableConv2D and Conv2D layers?

I didn't find a clear answer to this question online (sorry if one exists).
I would like to understand the differences between the two layers (SeparableConv2D and Conv2D) step by step, using for example an input of shape (3,3,3) (an RGB image).
Running this script based on Keras with the TensorFlow backend:
import numpy as np
from keras.layers import Conv2D, SeparableConv2D
from keras.models import Model
from keras.layers import Input
red = np.array([1]*9).reshape((3,3))
green = np.array([100]*9).reshape((3,3))
blue = np.array([10000]*9).reshape((3,3))
img = np.stack([red, green, blue], axis=-1)
img = np.expand_dims(img, axis=0)
inputs = Input((3,3,3))
conv1 = SeparableConv2D(filters=1,
strides=1,
padding='valid',
activation='relu',
kernel_size=2,
depth_multiplier=1,
depthwise_initializer='ones',
pointwise_initializer='ones',
bias_initializer='zeros')(inputs)
conv2 = Conv2D(filters=1,
strides=1,
padding='valid',
activation='relu',
kernel_size=2,
kernel_initializer='ones',
bias_initializer='zeros')(inputs)
model1 = Model(inputs,conv1)
model2 = Model(inputs,conv2)
print("Model 1 prediction: ")
print(model1.predict(img))
print("Model 2 prediction: ")
print(model2.predict(img))
print("Model 1 summary: ")
model1.summary()
print("Model 2 summary: ")
model2.summary()
I get the following output:
Model 1 prediction:
[[[[40404.]
[40404.]]
[[40404.]
[40404.]]]]
Model 2 prediction:
[[[[40404.]
[40404.]]
[[40404.]
[40404.]]]]
Model 1 summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3, 3, 3) 0
_________________________________________________________________
separable_conv2d_1 (Separabl (None, 2, 2, 1) 16
=================================================================
Total params: 16
Trainable params: 16
Non-trainable params: 0
_________________________________________________________________
Model 2 summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 3, 3, 3) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 2, 2, 1) 13
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
I understand how Keras computes the Conv2D prediction of model 2 thanks to this post, but can someone explain the SeparableConv2D computation behind the model 1 prediction, and its number of parameters (16)?
Since Keras uses TensorFlow, you can check the difference in TensorFlow's API documentation.
Conv2D is the traditional convolution: you have an image, with or without padding, and a filter that slides over the image with a given stride.
SeparableConv2D, on the other hand, is a variation of the traditional convolution that was proposed to compute it faster.
It performs a depthwise spatial convolution followed by a pointwise convolution that mixes together the resulting output channels. MobileNet, for example, uses this operation to compute its convolutions faster.
I could explain both operations here; however, this post has a very good explanation using images and videos that I strongly recommend reading.
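This NumPy sketch reproduces both predictions and both parameter counts from the question. It is written only for this setup (stride 1, 'valid' padding, all-ones kernels, zero bias, depth_multiplier=1) and is not a general implementation of either layer.

- **SeparableConv2D (16 parameters):**
  - The depthwise step convolves each input channel with its own 2×2 kernel (2·2·3 = 12 weights).
  - The pointwise step is a 1×1 convolution that mixes the 3 resulting channels into 1 output (3 weights).
  - Adding 1 bias gives 16.
- **Conv2D (13 parameters):** one 2×2×3 kernel (12 weights) plus 1 bias gives 13.

With all-ones weights, both reduce to summing each 2×2 window over all channels: 4·(1 + 100 + 10000) = 40404.

```python
import numpy as np

# The question's 3x3 RGB image: channels filled with 1, 100 and 10000.
img = np.stack([np.full((3, 3), 1.0),
                np.full((3, 3), 100.0),
                np.full((3, 3), 10000.0)], axis=-1)   # (3, 3, 3)

k, C = 2, img.shape[-1]
out_hw = img.shape[0] - k + 1                         # 'valid', stride 1 -> 2


def conv2d(x, kernel, bias):
    """Standard conv: kernel (k, k, C_in, C_out); each filter spans all channels."""
    out = np.zeros((out_hw, out_hw, kernel.shape[-1]))
    for i in range(out_hw):
        for j in range(out_hw):
            patch = x[i:i + k, j:j + k, :]            # (k, k, C_in)
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return np.maximum(out, 0)                         # relu


def separable_conv2d(x, depthwise, pointwise, bias):
    """Depthwise kernel (k, k, C_in, 1), then pointwise kernel (1, 1, C_in, C_out)."""
    depth = np.zeros((out_hw, out_hw, x.shape[-1]))
    for i in range(out_hw):
        for j in range(out_hw):
            patch = x[i:i + k, j:j + k, :]            # (k, k, C_in)
            # Each channel is convolved only with its own kernel slice.
            depth[i, j] = np.sum(patch * depthwise[..., 0], axis=(0, 1))
    out = depth @ pointwise[0, 0] + bias              # 1x1 conv mixes the channels
    return np.maximum(out, 0)                         # relu


# All-ones weights and zero bias, as set by the question's initializers.
conv_kernel = np.ones((k, k, C, 1))
depthwise = np.ones((k, k, C, 1))
pointwise = np.ones((1, 1, C, 1))
bias = np.zeros(1)

print(conv2d(img, conv_kernel, bias)[..., 0])
print(separable_conv2d(img, depthwise, pointwise, bias)[..., 0])

conv_params = conv_kernel.size + bias.size                 # 12 + 1
sep_params = depthwise.size + pointwise.size + bias.size   # 12 + 3 + 1
print(conv_params, sep_params)
```

The two layers agree here only because every weight is 1. With trained weights, the separable version is restricted to a per-channel spatial filter followed by a channel mix, which is where its parameter savings come from.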
