Backprop not getting to layers in PyTorch

I had some trouble getting layers in an nn.Module to work. I had a bunch of layers whose outputs I combined into another layer's input. I combined them this way:
previous_out = torch.tensor([previousLayer1Out, previousLayer2Out])

I found out that doing this broke PyTorch's connection between this layer and the previous ones. This fixed it:
previous_out = torch.cat((previousLayer1Out, previousLayer2Out), 0)
I think this is because PyTorch tracks the inputs and outputs of each layer (the autograd graph) in order to perform backpropagation. Before, I was creating a brand-new tensor from the values; now I'm concatenating the original tensors, so the graph stays intact.
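A minimal sketch of the difference (toy tensors standing in for the layer outputs): building a brand-new tensor copies the values and has no grad_fn, while torch.cat keeps the originals in the autograd graph:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
out1, out2 = a * 2, b * 3            # stand-ins for the previous layers' outputs

# Building a new tensor from the values drops the autograd history:
broken = torch.tensor([out1.tolist(), out2.tolist()])
print(broken.grad_fn)                # None -> backward() never reaches a or b

# Concatenating the original tensors keeps the graph intact:
combined = torch.cat((out1, out2), dim=0)
print(combined.grad_fn)              # <CatBackward0 ...>

combined.sum().backward()
print(a.grad, b.grad)                # gradients flow back to both inputs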

Related

effect of masking layer

I am trying to build a neural network model where the input data has many missing values which are hard to fill in by any means in advance. Therefore, the idea is to train a neural network with only the observed data. The data vector fed into the input layer is then a vector with missing values in various positions, and the positions of the missing values are not fixed.
After some search, I found Tensorflow has a masking layer for use. Therefore, I inserted a masking layer right after the input layer,
inputs = keras.Input(shape=(inputDim,))
maskingLayer = keras.layers.Masking(mask_value=-999)(inputs)
where the missing values are replaced with -999 during preprocessing. After that, several Dense layers are inserted, and the model is compiled and fit in the usual way.
The problem is that I don't see much effect from the masking layer. I am wondering whether the masking layer really masks out all the input nodes with value -999, as well as the weights and biases connected to them?
I found this post, which asks a similar question:
Not fully connected layer in tensorflow
However, in that question the unwanted connections are fixed, while in my case I would like to build a layer (right after the input layer) that only connects to the unmasked nodes of the input layer. Is it possible to do that?
Thanks.
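One thing worth noting: the mask produced by keras.layers.Masking flags whole timesteps and is only consumed by mask-aware layers such as LSTM or GRU; plain Dense layers ignore it, which would explain seeing little effect in a Dense-only stack. A minimal sketch of the intended use (TensorFlow 2.x Keras, hypothetical shapes):

import numpy as np
from tensorflow import keras

timesteps, inputDim = 5, 4                          # hypothetical dimensions

inputs = keras.Input(shape=(timesteps, inputDim))
masked = keras.layers.Masking(mask_value=-999.0)(inputs)
x = keras.layers.LSTM(8)(masked)                    # mask-aware: skips masked timesteps
outputs = keras.layers.Dense(1)(x)                  # Dense itself ignores the mask
model = keras.Model(inputs, outputs)

data = np.random.rand(2, timesteps, inputDim).astype("float32")
data[:, 2, :] = -999.0                              # mark timestep 2 as "missing"
print(model.predict(data).shape)                    # (2, 1)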

Keras: change LSTM argument return_sequences=True after compile

I want to use a (2-layer) pretrained LSTM model, and I want to add a new LSTM layer before the last Dense layer, so the layer I add will be the third one. Since the second LSTM of the pretrained model does not have return_sequences=True, I am unable to add a third LSTM layer. How can I change the configuration of a layer in the pretrained model so that I can add another LSTM layer? I am not keen on making another copy of the model and copying the weights; I want to change the existing model itself.
I am trying it as:
model.layers[-1].return_sequences=True
This statement does not generate any error, but the layer configuration still shows return_sequences=False.
I also tried changing the configuration of the layer explicitly:
config=model.layers[-1].get_config()
config['layers']['config']['return_sequences']=True
This changes the value of return_sequences in the config dictionary, but I do not know how to apply it back to the layer. Something like the following is not working:
model.layers[-1]=LSTM.from_config(config)
It gives: __init__ takes at least two arguments.
Note that layers[-1] of the model is actually a Bidirectional-wrapped LSTM.
I think making another copy of the model and copying the weights is your best bet. If you study the source code you'd probably be able to figure out a way to hack another layer on, but that would take effort, would potentially not work, and might break in the future.
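A rough sketch of that copy-the-weights route (TensorFlow 2.x Keras; the function name, unit count, and softmax activation are hypothetical). Since return_sequences does not change weight shapes, set_weights() transfers the trained parameters cleanly:

from tensorflow import keras
from tensorflow.keras import layers

def extend_with_lstm(pretrained, new_units=64):
    # Rebuild the pretrained model with return_sequences=True on its
    # recurrent layers, copy the trained weights, then stack a new LSTM
    # before a fresh final Dense layer.
    inp = keras.Input(shape=pretrained.input_shape[1:])
    x = inp
    for layer in pretrained.layers[:-1]:               # skip the final Dense
        if isinstance(layer, layers.InputLayer):
            continue
        cfg = layer.get_config()
        if isinstance(layer, layers.Bidirectional):
            cfg["layer"]["config"]["return_sequences"] = True
        elif "return_sequences" in cfg:
            cfg["return_sequences"] = True
        new_layer = layer.__class__.from_config(cfg)
        x = new_layer(x)
        new_layer.set_weights(layer.get_weights())     # copy trained weights
    x = layers.LSTM(new_units)(x)                      # the new third LSTM
    # The final Dense must be re-initialised: its input size changed.
    out = layers.Dense(pretrained.layers[-1].units, activation="softmax")(x)
    return keras.Model(inp, out)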

loading weights keras LSTM not working

I am trying to load the weights from a Keras 1.0 model into a Keras 2.0 model I created. I am sure the model architecture is exactly the same. The issue I am having is that the load_weights() function is not loading all the weights.
When I print the weights to a text file from the original model (loaded via load_model) and from the new model after load_weights(), the latter is missing many entries and the values are actually different. This also shows up when making predictions, as the accuracy is lower.
This problem only occurs in my LSTM layers. The Embedding layer is fine and the Dense layer is also fine.
Any thoughts? I cannot use load_model(), as the original model was saved with Keras 1.0 and I need to use Keras 2.0.
EDIT MORE:
I should note that I think the issue is the internal states not being loaded. Let me explain: when I use get_weights() on each layer and print it to the terminal or a file, the original model outputs a much larger matrix.
After using load_weights() and then get_weights(), the printed weight matrix is missing many elements. I'm thinking it's the internal states.
The problem was that parameters for a compiled graph had been saved along with the weights. I think it's safe to just port over the weights and continue training to let the model catch up (maybe 1-2 epochs) if you can.
Good luck.
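When debugging this kind of mismatch, a quick sanity check is to compare the per-layer shapes of the rebuilt model after load_weights() with the arrays actually stored in the weight file. A small diagnostic sketch (new_model and the file name are hypothetical):

import h5py

new_model.load_weights("keras1_weights.h5")          # hypothetical file name

# Shapes the Keras 2 model ended up with after loading:
for layer in new_model.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])

# Shapes actually stored in the HDF5 file, for comparison:
with h5py.File("keras1_weights.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))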

CNN architecture

First off, this question is less about programming itself and more about the logic behind the CNN architecture.
I do understand how every layer works, but my only question is: does it make sense to separate the ReLU and the convolution layer? I mean, can a conv layer exist, work, and update its weights via backpropagation without having a ReLU behind it?
I thought so. This is why I created the following independent layers:
ConvLayer
ReLU
Fully Connected
Pooling
Transformation (transform the 3D output into one dimension) for ConvLayer -> Fully Connected.
I am thinking about merging layers 1 and 2 into one. What should I go for?
Can it exist?
Yes, it can. There is nothing that stops neural networks from working without non-linearity modules in the model. The thing is, skipping the non-linearity module between two adjacent layers is equivalent to just a linear combination of the inputs at layer 1 to get the output at layer 2:
M1 : Input =====> L1 ====> ReLU ====> L2 =====> Output
M2 : Input =====> L1 ====> ......... ====> L2 =====> Output
M3 : Input =====> L1 =====> Output
M2 and M3 are equivalent, since the parameters adjust themselves over the training period to generate the same output. If there is any pooling involved in between this may not hold, but as long as the layers are consecutive, the network structure is just one large linear combination (think PCA).
There is nothing that prevents gradient updates and back-propagation throughout such a network.
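A minimal PyTorch sketch of the M2 ≡ M3 point, using bias-free Linear layers for brevity: two stacked linear layers with no activation in between compute the same function as a single linear layer whose weight is the product of the two.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 10)

# M2: two linear layers with no non-linearity in between.
l1 = nn.Linear(10, 20, bias=False)
l2 = nn.Linear(20, 3, bias=False)
m2_out = l2(l1(x))

# M3: a single linear layer whose weight is the composed map W2 @ W1.
l3 = nn.Linear(10, 3, bias=False)
with torch.no_grad():
    l3.weight.copy_(l2.weight @ l1.weight)
m3_out = l3(x)

print(torch.allclose(m2_out, m3_out, atol=1e-6))     # True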
What should you do?
Keep some form of non-linearity between distinct layers. You may create convolution blocks that contain more than one convolution layer, but you should include a non-linear function at the end of these blocks, and definitely after the dense layers. For dense layers, not using an activation function is completely equivalent to using a single layer.
Have a look here: Quora: Role of activation functions
The short answer is: ReLU (or another activation mechanism) should be added after each of your convolution or fully connected layers.
CNNs, and neural networks in general, use activation functions like ReLU to introduce non-linearity into the model.
Activation functions are usually not a layer themselves; they are an additional computation applied to each node of a layer. You can see them as an implementation of the mechanism that decides between finding vs. not finding a specific pattern.
See this post.
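To make that concrete: in Keras the two formulations below describe the same structure; the activation can either be an argument of the convolution layer or a separate Activation layer right after it (a small sketch with hypothetical shapes):

from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(28, 28, 1))

# Merged: ReLU folded into the convolution layer as an argument.
merged = layers.Conv2D(16, (3, 3), activation="relu")(inp)

# Separated: convolution first, then ReLU as its own layer.
# Structurally the same computation: convolution followed by element-wise ReLU.
separated = layers.Activation("relu")(layers.Conv2D(16, (3, 3))(inp))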
From the TensorFlow perspective, all of your computations are nodes in a graph (which is usually executed within a session). So if you want to separate the layers, which means adding nodes to your computation graph, go ahead, but I don't see any practical reason for it. You can backpropagate through it of course, since you are just calculating the gradient of every function by differentiation.

Weights in Convolution Layers in Keras

I want to know whether the filters' weights in, for example, a 2D convolution layer in Keras are shared along the spatial dimensions by default. If yes, is there any way to have unshared weights?
I found that LocallyConnected2D does what I am looking for.
The LocallyConnected2D layer works similarly to the Conv2D layer, except that weights are unshared, that is, a different set of filters is applied at each different patch of the input.
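A small sketch contrasting the two layers (assuming a TensorFlow 2.x version that still ships LocallyConnected2D; the shapes are hypothetical). The unshared filters show up directly in the parameter count:

from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(32, 32, 3))

shared = layers.Conv2D(8, (3, 3))(inp)                  # one filter bank, reused at every position
unshared = layers.LocallyConnected2D(8, (3, 3))(inp)    # a separate filter set per output location

print(keras.Model(inp, shared).count_params())          # 224
print(keras.Model(inp, unshared).count_params())        # roughly 200k: nothing is shared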
I'm not clear on what you're asking, but:
The weights within a single convolutional layer are shared: each filter applies the same weights at every position as it strides over the input.
However, the weights between two convolutional layers are not shared by default in Keras.
There is no getting around the shared weights of the filters within a conv layer, since the execution of the convolution is offloaded to C++ libraries.
See this answer for further reference, in particular:
The implementation of tf.nn.conv2d() is written in C++, which invokes optimized code using either Eigen (on CPU) or the cuDNN library (on GPU). You can find the implementation here.
