Arbitrary length inputs for CNNs in sequential learning - conv-neural-network

In An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, the authors state that TCN networks, a specific type of 1D CNNs applied to sequential data, "can also take in inputs of arbitrary lengths by sliding the 1D convolutional kernels", just like Recurrent Nets. I am asking myself how this can be done.
For an RNN, it is straight-forward that the same function would be applied as often as is the input length. However, for CNNs (or any feed-forward NN in general), one must prespecify the number of input neurons. So the only way I can see TCNs dealing with arbitrary length inputs is by specifying a fixed length input neuron space and then adding zero padding to the arbitrary length inputs.
Am I correct in my understanding?

If you have a fully convolutional neural network, there is no reason to have a fully specified input shape. You definitely need a fixed rank, and the last dimension probably should be the same but otherwise, you can definitely specify an input shape which in tensorflow would look like Input((None, 10)) in the case of 1D-CNNs.
Indeed the shape of the convolution kernel doesn't depend on the length of the input in the temporal dimension (it can depend on the last dimension though typically in convolutional neural networks), and you can apply it to any input with the same rank (and same last dimension).
For example let's say that you are applying only a single 1D convolution, with a kernel that's doing the sum of 2 neighbouring elements (kernel = (1, 1)). This operation could be applied to any input length given it's always 1D.
However, when being confronted with a sequence-to-label task and requiring further operations in the stack such as a fully-connected layer, the inputs must be of fixed length (or must be made so through zero padding).

Related

Merging several convolutional layers into one

How can I compose several convolutional layers into one layer. I mean if there is no non-linear activations in between. How do I write a code for it in pytorch?
I want the code to account for different padding and strides. I thought about having a template image and run the conv layers on it to obtain one kernel, but can't really come up with a meaningful way to do it
Here there are detailed instructions for collapsing 2 convolution layers into 1.
You can use the code to merge the first two and then to merge the outcome with the third.
Conceptually, you can vision the process in a simpler way by using the 'Toeplitz matrix' represantaion of each convolution operation, then use matrix multiplication to multiply all three, and then return to one convolution operation (since it is more efficiently implemented, and the Toeplitz representation is very sparse).
The convolution operation can be constructed as a matrix multiplication, where one of the inputs is converted into a Toeplitz matrix.
You can see an example of this approach here.

What's the difference between torch.mean and torch.nn.avg_pool?

Taking a tensor with shape [4,8,12] as an example, what's the difference between the two lines:
torch.mean(x, dim=2)
torch.nn.functional.avg_pool1d(x, kernel_size=12)
With the very example you provided the result is the same, but only because you specified dim=2 and kernel_size equal to the dimensionality of the third (index 2) dimension.
But in principle, you are applying two different functions, that sometimes just happen to collide with specific choices of hyperparameters.
torch.mean is effectively a dimensionality reduction function, meaning that when you average all values across one dimension, you effectively get rid of that dimension.
On the other hand, average 1-dimensional pooling is more powerful in this regard, as it gives you a lot more flexibility in choosing kernel size, padding and stride like you would normally do when using a convolutional layer.
You can see the first function as a specific case of 1-d pooling.

Extracting hidden representations for each token - PyTorch LSTM

I am currently working on a NLP project involving recurrent neural networks. I implemented a LSTM with PyTorch, following the tutorial here.
For my project, I need to extract the hidden representation for every token of an input text. I thought that the easiest way would be to test using a batch size and sequence length of 1, but when I do that the loss gets orders of magnitude larger than in training phase (during training I used a batch size of 64 and a sequence length of 35).
Is there any other way I can easily access these word-level hidden representations? Thank you.
Yes, that is possible with nn.LSTM as long as it is a single layer LSTM. If u check the documentation (here), for the output of an LSTM, you can see it outputs a tensor and a tuple of tensors. The tuple contains the hidden and cell for the last sequence step. What each dimension means of the output depends on how u initialized your network. Either the first or second dimension is the batch dimension and the rest is the sequence of word embeddings you want.
If u use a packed sequence as input, it is a bit of a different story.

LSTM input longer than output

I am not sure if I understand how exactly Keras version of LSTM works.
Let's say I have vector of len=20 as input and I specify keras.layers.LSTM(units=10)
So in this example does the network finish after processing 50% of input or it precess the rest from start (I mean from first cell)?
Units are never related to the input size.
Units are related only to the output size (units = output features or channels).
An LSTM layer will always process the entire data and optionally return either the "same length (all steps)" or "no length (only last step)".
In terms of shapes
You must have an input tensor with shape (batch, len=20, input_features).
And it will output:
For return_sequences=False: (batch, output_features=10) - no length
For return_sequences=True: (batch, len=20, output_features=10) - same length
Output features is always equal to units.
See a full comprehension of the LSTM layers here: Understanding Keras LSTMs

Keras LSTM: first argument

In Keras, if you want to add an LSTM layer with 10 units, you use model.add(LSTM(10)). I've heard that number 10 referred to as the number of hidden units here and as the number of output units (line 863 of the Keras code here).
My question is, are those two things the same? Is the dimensionality of the output the same as the number of hidden units? I've read a few tutorials (like this one and this one), but none of them state this explicitly.
The answers seems to refer to multi-layer perceptrons (MLP) in which the hidden layer can be of different size and often is. For LSTMs, the hidden dimension is the same as the output dimension by construction:
The h is the output for a given timestep and the cell state c is bound by the hidden size due to element wise multiplication. The addition of terms to compute the gates would require that both the input kernel W and the recurrent kernel U map to the same dimension. This is certainly the case for Keras LSTM as well and is why you only provide single units argument.
To get a good intuition for why this makes sense. Remember that the LSTM job is to encode a sequence into a vector (maybe a Gross oversimplification but its all we need). The size of that vector is specified by hidden_units, the output is:
seq vector RNN weights
(1 X input_dim) * (input_dim X hidden_units),
which has 1 X hidden_units (a row vector representing the encoding of your input sequence). And thus, the names in this case are used synonymously.
Of course RNNs require more than one multiplication and keras implements RNNs as a sequence of matrix-matrix multiplications instead vector-matrix shown above.
The number of hidden units is not the same as the number of output units.
The number 10 controls the dimension of the output hidden state (source code for the LSTM constructor method can be found here. 10 specifies the units argument). In one of the tutorial's you have linked to (colah's blog), the units argument would control the dimension of the vectors ht-1 , ht, and ht+1: RNN image.
If you want to control the number of LSTM blocks in your network, you need to specify this as an input into the LSTM layer. The input shape to the layer is (nb_samples, timesteps, input_dim) Keras documentation. timesteps controls how many LSTM blocks your network contains. Referring to the tutorial on colah's blog again, in RNN image, timesteps would control how many green blocks the network contains.

Resources