Does 1D Convolutional layer support variable sequence lengths? - keras

I have a series of processed audio files I am using as input into a CNN using Keras. Does the Keras 1D Convolutional layer support variable sequence lengths? The Keras documentation makes this unclear.
https://keras.io/layers/convolutional/
At the top of the documentation it mentions you can use (None, 128) for variable-length sequences of 128-dimensional vectors. Yet at the bottom it declares that the input shape must be a
3D tensor with shape: (batch_size, steps, input_dim)
Given the following example, how should I input sequences of variable length into the network?
Let's say I have two examples (a and b), each containing X one-dimensional vectors of length 100, that I want to feed into the Conv1D layer as input
a.shape = (100, 100)
b.shape = (200, 100)
Can I use an input shape of (2, None, 100)? Do I need to concatenate these tensors into c where
c.shape = (300, 100)
Then reshape it to something like
c_reshape.shape = (3, 100, 100)
where 3 is the batch size, 100 is the number of steps, and the second 100 is the input size? The documentation on the input shape is not very clear.

Keras supports variable lengths by using None in the respective dimension when defining the model.
Notice that often input_shape refers to the shape without the batch size.
So, the 3D tensor with shape (batch_size, steps, input_dim) suits perfectly a model with input_shape=(steps, input_dim).
All you need to make this model accept variable lengths is to use None in the steps dimension:
input_shape=(None, input_dim)
Numpy limitation
Now, there is a numpy limitation about variable lengths. You cannot create a numpy array with a shape that suits variable lengths.
A few solutions are available:
Pad your sequences with dummy values until they all reach the same size so you can put them into a numpy array of shape (batch_size, length, input_dim). Use a Masking layer to ignore the dummy values.
Train with separate numpy arrays of shape (1, length, input_dim), each array having its own length.
Group your sequences by length into smaller arrays.
Be careful with layers that don't support variable sizes
In convolutional models using variable sizes you can't, for instance, use Flatten: if it were possible, the result of the flatten would itself have a variable size, and the following Dense layers would not be able to have a constant number of weights.
So, instead of Flatten, you should use GlobalMaxPooling1D or GlobalAveragePooling1D layers, as in the sketch below.
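As a minimal sketch (the 100-dimensional vectors and the two sample lengths come from the question; the layer sizes are made up), a variable-length Conv1D model trained one sequence at a time could look like this:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense

# None in the steps dimension lets the model accept any sequence length
model = Sequential()
model.add(Conv1D(64, kernel_size=3, activation='relu', input_shape=(None, 100)))
model.add(GlobalMaxPooling1D())  # collapses the variable-length steps dimension
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

# Separate arrays of shape (1, length, input_dim), each with its own length
a = np.random.rand(1, 100, 100)
b = np.random.rand(1, 200, 100)
for x, y in [(a, np.array([0.])), (b, np.array([1.]))]:
    model.train_on_batch(x, y)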

Related

PyTorch high-dimensional tensor through linear layer

I have a tensor of size (32, 128, 50) in PyTorch. These are 50-dim word embeddings with a batch size of 32. That is, the three indices in my size correspond to number of batches, maximum sequence length (with 'pad' token), and the size of each embedding. Now, I want to pass this through a linear layer to get an output of size (32, 128, 1). That is, for every word embedding in every sequence, I want to make it one dimensional. I tried adding a linear layer to my network going from 50 to 1 dimension, and my output tensor is of the desired shape. So I think this works, but I would like to understand how PyTorch deals with this issue, since I did not explicitly tell it which dimension to apply the linear layer to. I played around with this and found that:
If I input a tensor of shape (32, 50, 50) -- thus creating ambiguity by having two dimensions along which the linear layer could be applied to (two 50s) -- it only applies it to the last dim and gives an output tensor of shape (32, 50, 1).
If I input a tensor of shape (32, 50, 128) it does NOT output a tensor of shape (32, 1, 128), but rather gives me an error.
This suggests that a linear layer in PyTorch applies the transformation to the last dimension of your tensor. Is that the case?
In the nn.Linear docs, it is specified that the input of this module can be any tensor of size (*, H_in) and the output will be a tensor of size (*, H_out), where:
* means any number of dimensions
H_in is the number of in_features
H_out is the number of out_features
To understand this better: a tensor of size (n, m, 50) can be processed by a Linear module with in_features=50, while a tensor of size (n, 50, m) needs a Linear module with in_features=m (in your case 128).
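The (*, H_in) rule is easy to check directly (a minimal sketch using the shapes from the question):
import torch
import torch.nn as nn

linear = nn.Linear(in_features=50, out_features=1)
x = torch.randn(32, 128, 50)        # (batch, max_seq_len, embedding_dim)
print(linear(x).shape)              # torch.Size([32, 128, 1])

# The transformation is always applied to the last dimension, so a
# (32, 50, 128) tensor needs in_features=128 instead:
y = torch.randn(32, 50, 128)
print(nn.Linear(128, 1)(y).shape)   # torch.Size([32, 50, 1])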

Audio signal processing using LSTM [duplicate]

This seems to be one of the most common questions about LSTMs in PyTorch, but I am still unable to figure out what should be the input shape to PyTorch LSTM.
Even after following several posts (1, 2, 3) and trying out the solutions, it doesn't seem to work.
Background: I have encoded text sequences (variable length) in a batch of size 12, and the sequences are padded and packed using the pack_padded_sequence functionality. MAX_LEN for each sequence is 384 and each token (or word) in the sequence has a dimension of 768. Hence my batch tensor could have one of the following shapes: [12, 384, 768] or [384, 12, 768].
The batch will be my input to the PyTorch rnn module (lstm here).
According to the PyTorch documentation for LSTMs, its input dimensions are (seq_len, batch, input_size) which I understand as following.
seq_len - the number of time steps in each input stream (feature vector length).
batch - the size of each batch of input sequences.
input_size - the dimension for each input token or time step.
lstm = nn.LSTM(input_size=?, hidden_size=?, batch_first=True)
What should be the exact input_size and hidden_size values here?
You have explained the structure of your input, but you haven't made the connection between your input dimensions and the LSTM's expected input dimensions.
Let's break down your input (assigning names to the dimensions):
batch_size: 12
seq_len: 384
input_size / num_features: 768
That means the input_size of the LSTM needs to be 768.
The hidden_size does not depend on your input; it is how many features the LSTM should create, which is then used for the hidden state as well as for the output, since the output is the last hidden state. You have to decide how many features you want the LSTM to use.
Finally, for the input shape, setting batch_first=True requires the input to have the shape [batch_size, seq_len, input_size], in your case that would be [12, 384, 768].
import torch
import torch.nn as nn
# Size: [batch_size, seq_len, input_size]
input = torch.randn(12, 384, 768)
lstm = nn.LSTM(input_size=768, hidden_size=512, batch_first=True)
output, _ = lstm(input)
output.size() # => torch.Size([12, 384, 512])
When an image is passed through CNN layers and then an LSTM layer, the feature map shape changes like this:
BCHW -> BCHW (B x C x 1 x W): the CNN's output should have a height of 1.
Then squeeze the height dimension: BCHW -> BCW.
In an RNN the dimension names change: [batch, seq_len, input_size] in RNN terms corresponds to [batch, width, channel] in image terms.
BCW -> BWC: this is the batch-first tensor an LSTM layer expects (e.g. in PyTorch).
Finally:
BWC is [batch, seq_len, channel].
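In PyTorch code, that reshaping looks like this (a minimal sketch; the channel count 256 and width 32 are made up, and the CNN is assumed to already output height-1 feature maps):
import torch

features = torch.randn(16, 256, 1, 32)    # BCHW coming out of the CNN, height = 1
features = features.squeeze(2)            # BCHW -> BCW
features = features.permute(0, 2, 1)      # BCW  -> BWC == [batch, seq_len, input_size]
# features now fits nn.LSTM(input_size=256, hidden_size=..., batch_first=True)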

Keras conv1d layer parameters: filters and kernel_size

I am very confused by these two parameters in the conv1d layer from keras:
https://keras.io/layers/convolutional/#conv1d
the documentation says:
filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.
But that does not seem to relate to the standard terminologies I see on many tutorials such as https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/ and https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
Using the second tutorial link, which uses Keras, I'd imagine that 'kernel_size' corresponds to the conventional 'filter' concept, which defines the sliding window on the input feature space. But what about the 'filters' parameter in conv1d? What does it do?
For example, in the following code snippet:
model.add(embedding_layer)
model.add(Dropout(0.2))
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
suppose the embedding layer outputs a matrix of dimension 50 (rows, each row is a word in a sentence) x 300 (columns, the word vector dimension), how does the conv1d layer transform that matrix?
Many thanks
You're right to say that kernel_size defines the size of the sliding window.
The filters parameter is just how many different windows you will have (all of them with the same length, which is kernel_size): how many different results or channels you want to produce.
When you use filters=100 and kernel_size=4, you are creating 100 different filters, each of them with length 4. The result will bring 100 different convolutions.
Also, each filter has enough parameters to consider all input channels.
The Conv1D layer expects these dimensions:
(batchSize, length, channels)
I suppose the best way to use it is to have the number of words in the length dimension (as if the words in order formed a sentence), and the channels be the output dimension of the embedding (numbers that define one word).
So:
batchSize = number of sentences
length = number of words in each sentence
channels = dimension of the embedding's output.
The convolutional layer will pass 100 different filters, each filter will slide along the length dimension (word by word, in groups of 4), considering all the channels that define the word.
The outputs are shaped as:
(number of sentences, 50 words, 100 output dimension or filters)
The filters are shaped as:
(4 = length, 300 = word vector dimension, 100 output dimension of the convolution)
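A quick way to verify these shapes (a minimal sketch using the sizes from the example above: 50 words, 300-dimensional embeddings):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D

model = Sequential()
model.add(Conv1D(filters=100, kernel_size=4, padding='same',
                 activation='relu', input_shape=(50, 300)))

print(model.output_shape)                      # (None, 50, 100): padding='same' keeps 50 steps
print(model.layers[0].get_weights()[0].shape)  # (4, 300, 100): (kernel_size, input_dim, filters)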
I had a similar question and answered it myself; the code below, built from the explanation above, can help illustrate this.
from tensorflow.keras.layers import Conv1D, MaxPool1D
import tensorflow.keras.backend as K
import numpy as np
import tensorflow as tf

tf.random.set_seed(1)  # nowadays instead of tf.set_random_seed(1)
np.random.seed(132)

batch, rows, cols = 3, 8, 3
input_shape = (batch, rows, cols)
data = np.random.randint(low=1, high=6, size=input_shape).astype('float32')
data = tf.constant(data)
print("Data:")
print(K.eval(data))
print()
print(f'batch,rows,cols: {input_shape}')

#############################
# Understanding filters and kernel_size
#############################
num_filters = 5
kernel_size = 3
# A few notes about kernel_size:
# 1. The maximum kernel_size equals the number of rows (time steps).
# 2. Since this is Conv1D, each filter is a block of kernel_size x cols
#    weights. With kernel_initializer=constant(1) every weight is 1, so each
#    output value is simply the sum of the inputs under the sliding window.
y = Conv1D(filters=num_filters, kernel_size=kernel_size,
           kernel_initializer=tf.keras.initializers.constant(1))(data)

#########################
# Checking the outcome
#########################
print(K.eval(y))
print(f'Resulting output_shape == (batch_size, rows - kernel_size + 1, num_filters): {y.shape}')

# Verification: sum the channels first; summing kernel_size consecutive rows
# of this result reproduces each value the all-ones convolution produced.
print(K.eval(tf.math.reduce_sum(data, axis=2, keepdims=True)))

##########################################
# Understanding MaxPool1D and strides
##########################################
pool = MaxPool1D(pool_size=3, strides=3)(y)
print(K.eval(pool))
print(f'Shape of pool: {pool.shape}')

Convolutional NN for text input in PyTorch

I am trying to implement a text classification model using a CNN. As far as I know, for text data we should use 1D convolutions. I saw an example in PyTorch using Conv2d, but I want to know how I can apply Conv1d for text. Or is it actually not possible?
Here is my model scenario:
Number of in-channels: 1, Number of out-channels: 128
Kernel size : 3 (only want to consider trigrams)
Batch size : 16
So, I will provide tensors of shape, <16, 1, 28, 300> where 28 is the length of a sentence. I want to use Conv1d which will give me 128 feature maps of length 26 (as I am considering trigrams).
I am not sure how to define nn.Conv1d() for this setting. I can use Conv2d, but I want to know whether it is possible to achieve the same using Conv1d.
This example of Conv1d and Pool1d layers into an RNN resolved my issue.
So, I need to consider the embedding dimension as the number of in-channels while using nn.Conv1d as follows.
import torch
import torch.nn as nn

m = nn.Conv1d(200, 10, 2)        # in_channels = 200, out_channels = 10, kernel_size = 2
input = torch.randn(10, 200, 5)  # batch = 10, 200 = embedding dim (channels), 5 = seq length
feature_maps = m(input)
print(feature_maps.size())       # torch.Size([10, 10, 4])
Although I don't work with text data, the input tensor in its current form would only work with conv2d. One possible way to use conv1d would be to concatenate the embeddings into a tensor of shape e.g. <16, 1, 28*300>. You can reshape the input with view in PyTorch.
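Applied to the shapes in the question (a minimal sketch; the 300-dimensional embeddings are used as in-channels, per the first answer above):
import torch
import torch.nn as nn

x = torch.randn(16, 1, 28, 300)   # (batch, 1, sentence_len, embedding_dim)
x = x.squeeze(1).transpose(1, 2)  # -> (16, 300, 28): embedding dim as channels

conv = nn.Conv1d(in_channels=300, out_channels=128, kernel_size=3)
print(conv(x).shape)              # torch.Size([16, 128, 26]): 128 trigram feature maps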

Keras: LSTM with class weights

My question is quite closely related to this question but also goes beyond it.
I am trying to implement the following LSTM in Keras where
the number of timesteps is nb_tsteps=10
the number of input features is nb_feat=40
the number of LSTM cells at each time step is 120
the LSTM layer is followed by TimeDistributedDense layers
From the question referenced above I understand that I have to present the input data as
nb_samples, 10, 40
where I get nb_samples by rolling a window of length nb_tsteps=10 across the original timeseries of shape (5932720, 40). The code is hence
model = Sequential()
model.add(LSTM(120, input_shape=(X_train.shape[1], X_train.shape[2]),
               return_sequences=True, consume_less='gpu'))
model.add(TimeDistributed(Dense(50, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(10, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(3, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
Now to my question (assuming the above is correct so far):
The binary responses (0/1) are heavily imbalanced and I need to pass a class_weight dictionary like cw = {0: 1, 1: 25} to model.fit(). However I get an exception class_weight not supported for 3+ dimensional targets. This is because I present the response data as (nb_samples, 1, 1). If I reshape it into a 2D array (nb_samples, 1) I get the exception Error when checking model target: expected timedistributed_5 to have 3 dimensions, but got array with shape (5932720, 1).
Thanks a lot for any help!
I think you should use sample_weight with sample_weight_mode='temporal'.
From the Keras docs:
sample_weight: Numpy array of weights for the training samples, used
for scaling the loss function (during training only). You can either
pass a flat (1D) Numpy array with the same length as the input samples
(1:1 mapping between weights and samples), or in the case of temporal
data, you can pass a 2D array with shape (samples, sequence_length),
to apply a different weight to every timestep of every sample. In this
case you should make sure to specify sample_weight_mode="temporal" in
compile().
In your case you would need to supply a 2D array with the same shape as your labels.
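A minimal sketch of how that could look here (model, X_train and y_train as in the question; y_train is assumed to have shape (nb_samples, timesteps, 1) to match the TimeDistributed output):
import numpy as np

# One weight per sample and per timestep, shape (nb_samples, timesteps):
# class 1 gets weight 25, class 0 gets weight 1, mirroring cw = {0: 1, 1: 25}
sample_weight = np.where(y_train[:, :, 0] == 1, 25.0, 1.0)

model.compile(loss='binary_crossentropy', optimizer='adam',
              sample_weight_mode='temporal')
model.fit(X_train, y_train, sample_weight=sample_weight)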
If this is still an issue: I think the TimeDistributed layer expects and returns a 3D array (similar to having return_sequences=True in the regular LSTM layer). Try adding a Flatten() layer or another LSTM layer at the end before the prediction layer.
d = TimeDistributed(Dense(10))(input_from_previous_layer)
lstm_out = Bidirectional(LSTM(10))(d)
output = Dense(1, activation='sigmoid')(lstm_out)
Using sample_weight_mode='temporal' is a workaround. Check out this Stack Overflow thread. The issue is also documented on GitHub.
