Pad vectors in tf.keras for LSTM - keras

Keras has a preprocessing util to pad sequences, but it assumes that the sequences are integer numbers.
My sequences are vectors (my own embeddings, I do not want to use Keras embeddings), is there any way in which I can pad them to use in a LSTM?
Sequences can be made equal in Python, but the padding methods in Keras provide additional metainformation for layers like LSTM to consider for masking.

this is a possibility to pad an array of float of different length with zeros
to mask the zeros you can use the masking layer (otherwise remove it)
I initialize your embeddings in a list because numpy can't handle array of different lenght. in the example, I use 4 samples of different lengths. the relative embeddings are stored in this list list([1,300],[2,300],[3,300],[4,300])
# recreate your embed
emb = []
for i in range(1,5):
emb.append(np.random.uniform(0,1, (i,300)))
# custom padding function
def pad(x, max_len):
new_x = np.zeros((max_len,x.shape[-1]))
new_x[:len(x),:] = x # post padding
return new_x
# pad own embeddings
emb = np.stack(list(map(lambda x: pad(x, max_len=100), emb)))
emb_model = tf.keras.Sequential()
emb_model.add(tf.keras.layers.Masking(mask_value=0., input_shape=(100, 300)))
emb_model.add(tf.keras.layers.LSTM(32))
emb_model(emb)

Related

numpy array for sequential network: varying sequence length

I have a recurrent network (RNN) whose task is to learn to classify vectors (float32) in two classes. My model is really simple so far:
model = Sequential([
SimpleRNN(units=10, input_shape=(None, len_vector)),
Dense(1, activation="relu")
])
model.compile(loss='mse', optimizer='Adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=30)
To train this network, I create a dataset with 1000 instances of sequences of vectors. When I create sequences with the same length each, the training works perfectly and the dataset has shape:
[<number of sequences>, <number of vectors in each sequence>, <number of floats in each vector>]
The problem is that my model must be able to work on sequence with various length. I don't know how (or even if it is possible) to create a numpy array where one dimension is not constant.
While searching a solution, I saw that setting the array dtype=object made it possible to assign list of different shapes to element of a numpy array, but the keras model will only accept dtype="float32".
Is there a way I can make this numpy array dataset? Or should I change the algorithm to train the model? Or is the only solution to pad sequences with nul vectors to unify their length?
(Thanks for the help. I'm fairly new to deep learning so I apologize if I'm asking for something obvious.)
Use Ragged Tensors, they provide you to make variable length inputs,
import numpy as np
_input = tf.keras.layers.Input(shape=(None, 100))
lstm = tf.keras.layers.LSTM(20,)(_input)
func = tf.keras.backend.function(inputs=_input, outputs=lstm)
rt = tf.ragged.constant([np.random.randn(1,34,100),
np.random.randn(1,55,100) ,
np.random.randn(1,60,100) ,
np.random.randn(1,70,100)])
func(rt[1])

Keras SimpleRNN - Shape MFCC vectors

I'm currently trying to implement a Recurrent Neural Network in Keras. The data consists of a collection of 45.000 whereby each entry is a collection (of variable length) of MFCC vectors with each 13 coefficients:
spoken = numpy.load('spoken.npy')
print(spoken[0]) # Gives:
example_row = [
[
5.67170000e-01 -1.79430000e-01 -7.27360000e+00 -9.59300000e-02
-9.30140000e-02 -1.62960000e-01 4.11620000e-01 3.00590000e-01
6.86360000e-02 1.07130000e+00 1.07090000e-01 5.00890000e-01
7.51750000e-01],
[.....]
]
print(spoken.shape) # Gives: (45000,0)
print(spoken[0].shape) # Gives (N, 13) --> N amount of MFCC vectors
I'm struggling to understand how I need to reshape this Numpy array in order to feed it to the SimpleRNN of Keras:
model = Sequential()
model_spoken.add(SimpleRNN(units=10, activation='relu', input_shape=?))
.....
Therefore, my question is how do I need to reshape a collection of variable length MFCC vectors so that I can feed it to the SimpleRNN object of Keras?
It was actually quite simple since Keras has built in function for reformatting the array and padding zeros to get a static length:
spoken_train = pad_sequences(spoken_train, maxlen=100)
See github issue

Does 1D Convolutional layer support variable sequence lengths?

I have a series of processed audio files I am using as input into a CNN using Keras. Does the Keras 1D Convolutional layer support variable sequence lengths? The Keras documentation makes this unclear.
https://keras.io/layers/convolutional/
At the top of the documentation it mentions you can use (None, 128) for variable-length sequences of 128-dimensional vectors. Yet at the bottom it declares that the input shape must be a
3D tensor with shape: (batch_size, steps, input_dim)
Given the following example how should I input sequences of variable length into the network
Lets say I have two examples (a and b) containing X 1 dimensional vectors of length 100 that I want to feed into the 1DConv layer as input
a.shape = (100, 100)
b.shape = (200, 100)
Can I use an input shape of (2, None, 100)? Do I need to concatenate these tensors into c where
c.shape = (300, 100)
Then reshape it to be something
c_reshape.shape = (3, 100, 100)
Where 3 is the batch size, 100, is the number of steps, and the second 100 is the input size? The documentation on the input vector is not very clear.
Keras supports variable lengths by using None in the respective dimension when defining the model.
Notice that often input_shape refers to the shape without the batch size.
So, the 3D tensor with shape (batch_size, steps, input_dim) suits perfectly a model with input_shape=(steps, input_dim).
All you need to make this model accept variable lengths is use None in the steps dimension:
input_shape=(None, input_dim)
Numpy limitation
Now, there is a numpy limitation about variable lengths. You cannot create a numpy array with a shape that suits variable lengths.
A few solutions are available:
Pad your sequences with dummy values until they all reach the same size so you can put them into a numpy array of shape (batch_size, length, input_dim). Use Masking layers to disconsider the dummy values.
Train with separate numpy arrays of shape (1, length, input_dim), each array having its own length.
Group your images by sizes into smaller arrays.
Be careful with layers that don't support variable sizes
In convolutional models using variable sizes, you can't for instance, use Flatten, the result of the flatten would have a variable size if this were possible. And the following Dense layers would not be able to have a constant number of weights. This is impossible.
So, instead of Flatten, you should start using GlobalMaxPooling1D or GlobalAveragePooling1D layers.

Keras conv1d layer parameters: filters and kernel_size

I am very confused by these two parameters in the conv1d layer from keras:
https://keras.io/layers/convolutional/#conv1d
the documentation says:
filters: Integer, the dimensionality of the output space (i.e. the number output of filters in the convolution).
kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.
But that does not seem to relate to the standard terminologies I see on many tutorials such as https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/ and https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
Using the second tutorial link which uses Keras, I'd imagine that in fact 'kernel_size' is relevant to the conventional 'filter' concept which defines the sliding window on the input feature space. But what about the 'filter' parameter in conv1d? What does it do?
For example, in the following code snippet:
model.add(embedding_layer)
model.add(Dropout(0.2))
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
suppose the embedding layer outputs a matrix of dimension 50 (rows, each row is a word in a sentence) x 300 (columns, the word vector dimension), how does the conv1d layer transforms that matrix?
Many thanks
You're right to say that kernel_size defines the size of the sliding window.
The filters parameters is just how many different windows you will have. (All of them with the same length, which is kernel_size). How many different results or channels you want to produce.
When you use filters=100 and kernel_size=4, you are creating 100 different filters, each of them with length 4. The result will bring 100 different convolutions.
Also, each filter has enough parameters to consider all input channels.
The Conv1D layer expects these dimensions:
(batchSize, length, channels)
I suppose the best way to use it is to have the number of words in the length dimension (as if the words in order formed a sentence), and the channels be the output dimension of the embedding (numbers that define one word).
So:
batchSize = number of sentences
length = number of words in each sentence
channels = dimension of the embedding's output.
The convolutional layer will pass 100 different filters, each filter will slide along the length dimension (word by word, in groups of 4), considering all the channels that define the word.
The outputs are shaped as:
(number of sentences, 50 words, 100 output dimension or filters)
The filters are shaped as:
(4 = length, 300 = word vector dimension, 100 output dimension of the convolution)
Below code from the explanation can help do this. I went similar question and answered it myself.
from tensorflow.keras.layers import MaxPool1D
import tensorflow.keras.backend as K
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv1D
tf.random.set_seed(1) # nowadays instead of tf.set_random_seed(1)
batch,rows,cols = 3,8,3
m, n, k = batch, rows, cols
input_shape = (batch,rows,cols)
np.random.seed(132) # nowadays instead of np.set_random_seed = 132
data = np.random.randint(low=1,high=6,size=input_shape,dtype='int32')
data = np.float32(data)
data = tf.constant(data)
print("Data:")
print(K.eval(data))
print()
print(f'm,n,k:{input_shape}')
from tensorflow.keras.layers import Conv1D
#############################
# Understandin filters and kernel_size
##############################
num_filters=5
kernel_size= 3
'''
Few Notes about Kernel_size:
1. max_kernel_size == max_rows
2. since Conv1D, we are creating 1D Matrix of 1's with kernel_size
if kernel_size = 1, [[1,1,1..]]
if kernel_size = 2, [[1,1,1..][1,1,1,..]]
if kernel_size = 3, [[1,1,1..][1,1,1,..]]
I have chosen tf.keras.initializers.constant(1) to create a matrix of Ones.
Size of matrix is Kernel_Size
'''
y= Conv1D(filters=num_filters,kernel_size=kernel_size,
kernel_initializer=tf.keras.initializers.constant(1),
#glorot_uniform(seed=12)
input_shape=(k,n)
)(data)
#########################
# Checking the out outcome
#########################
print(K.eval(y))
print(f' Resulting output_shape == (batch_size, num_rows-kernel_size+1,num_filters): {y.shape}')
# # Verification
K.eval(tf.math.reduce_sum(data,axis=(2,1), # Sum along axis=2, and then along
axis=1,keep_dims=True)
###########################################
# Understanding MaxPool and Strides in
##########################################
pool = MaxPool1D(pool_size=3,strides=3)(y)
print(K.eval(pool))
print(f'Shape of Pool: {pool.shape}')

What is an Embedding in Keras?

Keras documentation isn't clear what this actually is. I understand we can use this to compress the input feature space into a smaller one. But how is this done from a neural design perspective? Is it an autoenocder, RBM?
As far as I know, the Embedding layer is a simple matrix multiplication that transforms words into their corresponding word embeddings.
The weights of the Embedding layer are of the shape (vocabulary_size, embedding_dimension). For each training sample, its input are integers, which represent certain words. The integers are in the range of the vocabulary size. The Embedding layer transforms each integer i into the ith line of the embedding weights matrix.
In order to quickly do this as a matrix multiplication, the input integers are not stored as a list of integers but as a one-hot matrix. Therefore the input shape is (nb_words, vocabulary_size) with one non-zero value per line. If you multiply this by the embedding weights, you get the output in the shape
(nb_words, vocab_size) x (vocab_size, embedding_dim) = (nb_words, embedding_dim)
So with a simple matrix multiplication you transform all the words in a sample into the corresponding word embeddings.
The Keras Embedding layer is not performing any matrix multiplication but it only:
1. creates a weight matrix of (vocabulary_size)x(embedding_dimension) dimensions
2. indexes this weight matrix
It is always useful to have a look at the source code to understand what a class does. In this case, we will have a look at the class Embedding which inherits from the base layer class called Layer.
(1) - Creating a weight matrix of (vocabulary_size)x(embedding_dimension) dimensions:
This is occuring at the build function of Embedding:
def build(self, input_shape):
self.embeddings = self.add_weight(
shape=(self.input_dim, self.output_dim),
initializer=self.embeddings_initializer,
name='embeddings',
regularizer=self.embeddings_regularizer,
constraint=self.embeddings_constraint,
dtype=self.dtype)
self.built = True
If you have a look at the base class Layer you will see that the function add_weight above simply creates a matrix of trainable weights (in this case of (vocabulary_size)x(embedding_dimension) dimensions):
def add_weight(self,
name,
shape,
dtype=None,
initializer=None,
regularizer=None,
trainable=True,
constraint=None):
"""Adds a weight variable to the layer.
# Arguments
name: String, the name for the weight variable.
shape: The shape tuple of the weight.
dtype: The dtype of the weight.
initializer: An Initializer instance (callable).
regularizer: An optional Regularizer instance.
trainable: A boolean, whether the weight should
be trained via backprop or not (assuming
that the layer itself is also trainable).
constraint: An optional Constraint instance.
# Returns
The created weight variable.
"""
initializer = initializers.get(initializer)
if dtype is None:
dtype = K.floatx()
weight = K.variable(initializer(shape),
dtype=dtype,
name=name,
constraint=constraint)
if regularizer is not None:
with K.name_scope('weight_regularizer'):
self.add_loss(regularizer(weight))
if trainable:
self._trainable_weights.append(weight)
else:
self._non_trainable_weights.append(weight)
return weight
(2) - Indexing this weight matrix
This is occuring at the call function of Embedding:
def call(self, inputs):
if K.dtype(inputs) != 'int32':
inputs = K.cast(inputs, 'int32')
out = K.gather(self.embeddings, inputs)
return out
This functions returns the output of the Embedding layer which is K.gather(self.embeddings, inputs). What tf.keras.backend.gather exactly does is to index the weights matrix self.embeddings (see build function above) according to the inputs which should be lists of positive integers.
These lists can be retrieved for example if you pass your text/words inputs to the one_hot function of Keras which encodes a text into a list of word indexes of size n (this is NOT one hot encoding - see also this example for more info: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/).
Therefore, that's all. There is no matrix multiplication.
On the contrary, the Keras Embedding layer is only useful because exactly it avoids performing a matrix multiplication and hence it economizes on some computational resources.
Otherwise, you could just use a Keras Dense layer (after you have encoded your input data) to get a matrix of trainable weights (of (vocabulary_size)x(embedding_dimension) dimensions) and then simply do the multiplication to get the output which will be exactly the same with the output of the Embedding layer.
In Keras, the Embedding layer is NOT a simple matrix multiplication layer, but a look-up table layer (see call function below or the original definition).
def call(self, inputs):
if K.dtype(inputs) != 'int32':
inputs = K.cast(inputs, 'int32')
out = K.gather(self.embeddings, inputs)
return out
What it does is to map each a known integer n in inputs to a trainable feature vector W[n], whose dimension is the so-called embedded feature length.
In simple words (from the functionality point of view), it is a one-hot encoder and fully-connected layer. The layer weights are trainable.

Resources