How can I compose several convolutional layers into one layer. I mean if there is no non-linear activations in between. How do I write a code for it in pytorch?
I want the code to account for different padding and strides. I thought about having a template image and run the conv layers on it to obtain one kernel, but can't really come up with a meaningful way to do it
Here there are detailed instructions for collapsing 2 convolution layers into 1.
You can use the code to merge the first two and then to merge the outcome with the third.
Conceptually, you can vision the process in a simpler way by using the 'Toeplitz matrix' represantaion of each convolution operation, then use matrix multiplication to multiply all three, and then return to one convolution operation (since it is more efficiently implemented, and the Toeplitz representation is very sparse).
The convolution operation can be constructed as a matrix multiplication, where one of the inputs is converted into a Toeplitz matrix.
You can see an example of this approach here.
Related
I have a 5-dimensional dataset and I'm interested in using a neural network to model the posterior distributions from which the data was drawn. I decided to implement a GAN to do this, and have been familiarizing myself with PyTorch.
I'm wondering how one should go about restricting what values the generator can produce for the parameters. For one of the parameters, the values must be nonnegative real values. For another case, the values must be nonnegative integer values. For the other three cases, the parameters can take on any real value.
My first idea was to control this through the transfer function applied to the nodes in the output layer of my neural network. But all of the PyTorch examples I've seen so far apply the same transfer function to all of the output nodes, which is not what I want to do. Is there a way to apply a different transfer function to each output node? Or is there maybe a better way to approach this problem?
[Yolo model summary][1]
Also can someone explain the values in arguments column
[1]: https://i.stack.imgur.com/weBPt.png
I am studying the yolov5 architecture right now, so do not take my answer as absolute truth, but for my understanding the C3 Layer is a CSP bottleneck that includes 3 convolutional layers. Essentially it does a Conv on the input tensor and it concats the result to the same tensor passed through a convolution AND a series of bottleneck layers with e=1. Then the whole thing is passed again through a Convolution layer. CSP stands for Cross Stage Partial layer.
As per the first column, it is used in the forward function of the model to understand which tensor to use as the input value of each layer. The majority of the layers has '-1', meaning they take the last layer's output before them as their input, but there are Concat layers that take different levels as input to recreate the PANet architecture in the neck.
For further questions, I suggest you to ask in the Yolov5 github issues section, as they are often quick to give you answers.
I am implementing a custom CNN with some custom modules in it. I have implemented only the forward pass for the custom modules and left their backward pass to autograd.
I have manually computed the correct formulae for backpropagation through the parameters of the custom modules, and I wished to see whether they match with the formulae used internally by autograd to compute the gradients.
Is there any way to see this?
Thanks
Edit (To add a test case) :-
I have a complex affine layer where the weights and inputs are complex-valued matrices, and the operation is a matrix multiplication of the weight and input matrices.
The multiplication of two complex numbers is given by -
(a+ib)(c+id) = (ac-bd)+i(ad+bc)
I computed the backpropagation formula for this layer given we have the incoming gradient from the higher layer.
It comes out to be dL/dI(n) = (hermitian(W(n))).matmul(dL/dI(n+1))
where I(n) and W(n) are the input and weight of nth layer and I(n+1) is input of (n+1)th layer.
So I wished to check whether autograd is also computing dL/dI(n) using the same formula that I derived.
(Since Pytorch doesn't support complex-valued tensors backpropagation as for now, I have created my own representation of complex numbers by dealing with separate real and imaginary tensors)
I don't believe there is such a feature in pytorch, even because it would be quite unreadable. What you can do is to implement a custom backward method for your layer with the formula you derived, then know by design that the backpropagation is what you want.
In An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, the authors state that TCN networks, a specific type of 1D CNNs applied to sequential data, "can also take in inputs of arbitrary lengths by sliding the 1D convolutional kernels", just like Recurrent Nets. I am asking myself how this can be done.
For an RNN, it is straight-forward that the same function would be applied as often as is the input length. However, for CNNs (or any feed-forward NN in general), one must prespecify the number of input neurons. So the only way I can see TCNs dealing with arbitrary length inputs is by specifying a fixed length input neuron space and then adding zero padding to the arbitrary length inputs.
Am I correct in my understanding?
If you have a fully convolutional neural network, there is no reason to have a fully specified input shape. You definitely need a fixed rank, and the last dimension probably should be the same but otherwise, you can definitely specify an input shape which in tensorflow would look like Input((None, 10)) in the case of 1D-CNNs.
Indeed the shape of the convolution kernel doesn't depend on the length of the input in the temporal dimension (it can depend on the last dimension though typically in convolutional neural networks), and you can apply it to any input with the same rank (and same last dimension).
For example let's say that you are applying only a single 1D convolution, with a kernel that's doing the sum of 2 neighbouring elements (kernel = (1, 1)). This operation could be applied to any input length given it's always 1D.
However, when being confronted with a sequence-to-label task and requiring further operations in the stack such as a fully-connected layer, the inputs must be of fixed length (or must be made so through zero padding).
I have a dataset for detecting fake news that i got from kaggle( https://www.kaggle.com/c/fake-news/data ).
I want to use LSTM for the classification
The mean length of words in a single article is about 750 words. I have tried to remove punctuation, stop words, removed numbers. Preprocessing the text is also taking a very long time.
I'd like a method to feed large text into the LSTM using keras. What should i do to reduce computation time and not lose a lot of accuracy.
There are some things you could try to speed things up:
1. Use CUDNN version of LSTM
It is usually faster, check available layers here keras.layers.CuDNNLSTM is what you are after.
2. Use Conv1d to create features
You can use 1 dimensional convolution with kernel_size specifying how many words should be taken into account and stride specifying the jump of moving window. For kernel_size=3 and stride=3, padding="SAME" it would drop your dimensionality three times.
You may stack more convolutional layers.
On top of that you can still employ LSTM normally.
3. Drop LSTM altogether
You may go with 1d convolutions and pooling for classification, RNNs are not the only way.
On the upside: you will not encounter vanishing gradients (could be mitigated a little by Bidirectional LSTM as well).
On the downside: you will lose strict dependence between words, though it shouldn't be much of a problem for binary classification (I suppose it's your goal).