I'm very new to keras and I'm facing a problem.
Suppose I have two input nodes A and B. If A is connected to a node C in a hidden layer with a weight w, I want B to be connected to C with exactly the same weight w, and vice versa.
In other words, I want the input nodes A and B to share exactly the same weights with the hidden layer.
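To make the intent concrete, this is roughly what I have in mind (an untested sketch: reusing one Dense layer object means A and B are multiplied by the same kernel; the hidden size of 10 is just for illustration):

from tensorflow import keras
from tensorflow.keras import layers

input_a = keras.Input(shape=(1,), name="A")
input_b = keras.Input(shape=(1,), name="B")

# one Dense layer object = one weight matrix, so applying it to both inputs ties the weights
shared = layers.Dense(10, use_bias=False, name="shared_weights")
hidden = layers.Activation("relu")(layers.Add()([shared(input_a), shared(input_b)]))  # relu(w*A + w*B)

output = layers.Dense(1)(hidden)
model = keras.Model([input_a, input_b], output)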
Do you know if this is possible?
Thank you
I am trying to follow a tutorial.
The building of the model starts with
netw <- keras_model_sequential()
### one input layer, one output layer and one hidden layer
netw %>%
  layer_dense(units = 500, activation = "relu", input_shape = c(6)) %>%
  layer_dense(units = 300, activation = "relu") %>%
  layer_dense(units = 2, activation = "softmax")
units defines the number of nodes in a layer. I am confused about the number of layers. The input layer should have 6 nodes, since there are 6 features. Why does the first layer_dense call have units = 500? Does this really specify an input layer with 6 nodes and a second layer with 500 nodes? So would there be four layers instead of the three stated in the comment?
The input shape does refer to the size of the input layer: it is the number of features in each data point being fed into the network, while units = 500 is the number of nodes that data is passed to in the first hidden layer.
Here is a visualization of your model:
As you can see, there are indeed four layers in the model. The input layer is usually not counted, since every network has one; more precisely, it performs no computation, so there are only three layers of computation being carried out.
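If it helps, writing what I assume is the equivalent model in Python Keras makes the counting explicit: summary() lists only the three Dense layers, and the 6-feature input only shows up through the first layer's parameter count (6 * 500 + 500 = 3500).

from tensorflow import keras
from tensorflow.keras import layers

# assumed Python equivalent of the R model above
model = keras.Sequential([
    layers.Dense(500, activation="relu", input_shape=(6,)),  # 6 features in, 500 nodes out
    layers.Dense(300, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.summary()  # lists only the three Dense layers; the 6-node input is implicit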
I am trying to build a neural network model where the input data has many missing values that are hard to fill in by any means in advance. The idea is therefore to train the network using only the observed data. The vector fed into the input layer then has missing values at various positions, and those positions are not fixed.
After some searching, I found that TensorFlow has a masking layer for this. I therefore inserted a masking layer right after the input layer:
inputs = keras.Input(shape=(inputDim,))
maskingLayer = keras.layers.Masking(mask_value=-999)(inputs)
where the missing values are replaced with -999 in the preprocessing. After that, several dense layers are inserted, and the model is compiled and fit in the usual way.
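For context, the preprocessing step amounts to something like this (the array here is a toy example; NaN marks a missing value):

import numpy as np

X_raw = np.array([[0.1, np.nan, 0.5],
                  [np.nan, 0.3, 0.2]])        # toy data; NaN marks a missing value
X = np.where(np.isnan(X_raw), -999.0, X_raw)  # sentinel value that the Masking layer looks for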
The problem is that I don't see much effect from the masking layer. I am wondering whether the masking layer really masks out the input nodes with value -999, together with the weights and biases connected to them.
I found this post, which asks a similar question:
Not fully connected layer in tensorflow
However, in that case the unwanted links are fixed, whereas in my case I would like to build a layer (next to the input layer) that only connects to the unmasked nodes of the input layer. Is it possible to do this?
Thanks.
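One way to check what the Masking layer actually computes is to inspect the mask it produces on a toy batch (a rough sketch; the numbers are made up). The mask turns out to have one flag per sample rather than one per feature, so a -999 that sits next to observed values passes through unchanged:

import numpy as np
import tensorflow as tf

masking = tf.keras.layers.Masking(mask_value=-999.0)
x = np.array([[1.0, -999.0, 3.0],          # one missing feature
              [-999.0, -999.0, -999.0]],   # all features missing
             dtype="float32")

y = masking(x)
print(masking.compute_mask(x))  # one boolean per sample, e.g. [ True False]
print(y.numpy())                # only the all-missing row is zeroed; the single -999 is untouched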
I'm trying to build an autoencoder-like network, but I don't know how to specify the output shape of the network. I have an input of size m×n and a paired expected output of size p×q. I've seen
Calculate the Output size in Convolution layer
Getting the output shape of deconvolution layer using tf.nn.conv2d_transpose in tensorflow
but is there a way to force an output shape without having to work out a bunch of math for every input shape?
I really don't think there is a way to do this (though I will be happy to learn otherwise), since the output shape of any kind of conv layer is the result of a mathematical operation (convolution) with several parameters. Because of that, the resulting shape has to be one of the possible shapes determined by the input tensor and the parameters (stride, kernel size and so on).
This is in contrast to a dense (fully connected) layer, where you can get any shape you want as long as it's a single number (4, 60 or 5000, but not (60, 60)).
One small trick that can sometimes help in this kind of situation is to get the shape of the previous layer and print it, so you know what parameters you need for the next layer and can make sure your calculations are correct:
import keras.backend as K
from keras.layers import Input, Conv2D

x = Input(shape=(64, 64, 3))       # example input; use your own shape here
x = Conv2D(32, (3, 3))(x)          # or any other layer
shape = K.int_shape(x)             # static shape as a plain tuple
print(shape)                       # e.g. (None, 62, 62, 32)
x = Conv2D(16, (3, 3))(x)
I would like to know how to take gradient steps for the following mathematical operation in PyTorch (A, B and C are PyTorch modules whose parameters do not overlap)
This is somewhat different from the cost function of a Generative Adversarial Network (GAN), so I cannot use GAN examples off the shelf, and I got stuck while trying to adapt them to the above cost.
One approach I thought of is to construct two optimizers. Optimizer opt1 has the parameters of modules A and B, and optimizer opt2 has the parameters of module C. One can then:
take a step for minimizing the cost function for C
run the network again with the same input to get the costs (and intermediate outputs) again
take a step with respect to A and B.
I am sure there must be a better way to do this with PyTorch (maybe using some detach operations), possibly without running the network again. Any help is appreciated.
Yes, it is possible without going through the network two times, which would both waste resources and be mathematically wrong, since the weights would have changed and so would the loss; you would be introducing a delay, which may be interesting but is not what you are trying to achieve.
First, create two optimizers, just as you said. Compute the loss and then call backward. At this point, the gradients for the parameters of A, B and C have been filled in, so you just have to call the step method of the optimizer that minimizes the loss, but not of the one that maximizes it. For the latter, you need to reverse the sign of the gradients of the leaf parameter tensors of C.
import torch

def d(y, x):
    return torch.pow(y.abs(), x + 1)

A = torch.nn.Linear(1, 2)
B = torch.nn.Linear(2, 3)
C = torch.nn.Linear(2, 3)

optimizer1 = torch.optim.Adam((*A.parameters(), *B.parameters()))
optimizer2 = torch.optim.Adam(C.parameters())

x = torch.rand((10, 1))
loss = (d(B(A(x)), x) - d(C(A(x)), x)).sum()

optimizer1.zero_grad()
optimizer2.zero_grad()
loss.backward()

for p in C.parameters():
    if p.grad is not None:        # in general, C is a NN, with requires_grad=False for some layers
        p.grad.data.mul_(-1)      # update of grad.data is not tracked in the computation graph

optimizer1.step()
optimizer2.step()
NB: I have not checked mathematically whether the result is correct, but I assume it is.
In this article, I've come across the following network structure:
Figure 1(b). https://wx4.sinaimg.cn/mw690/5396ee05ly1fg9vi5phcbj20vj0kb0ty.jpg
Each layer is a fully connected one.
The weights shared by the two parts are denoted by Wc.
The pairs of top fully connected layers of dimension 500 are concatenated to create a layer of dimension 1000, which is then used directly to reconstruct the input of size 784.
I want to implement it with Keras, but I am not very skilled with Keras.
Any ideas on how to implement this?
Thank you very much!
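A minimal sketch of the wiring described above, using the Keras functional API: the shared weights Wc come from reusing the same layer objects on both branches, the two 500-dimensional tops are concatenated into 1000 units, and a final Dense layer reconstructs the 784-dimensional input. The intermediate size and the activations are assumptions, not taken from the article.

from tensorflow import keras
from tensorflow.keras import layers

input_1 = keras.Input(shape=(784,))
input_2 = keras.Input(shape=(784,))

# the same layer objects (and hence the same weights Wc) are applied to both branches
shared_hidden = layers.Dense(1000, activation="relu", name="Wc_hidden")  # intermediate size is a guess
shared_top = layers.Dense(500, activation="relu", name="Wc_top")         # the 500-d top of each branch

top_1 = shared_top(shared_hidden(input_1))
top_2 = shared_top(shared_hidden(input_2))

merged = layers.Concatenate()([top_1, top_2])                      # 500 + 500 = 1000
reconstruction = layers.Dense(784, activation="sigmoid")(merged)   # reconstruct the 784-d input

model = keras.Model([input_1, input_2], reconstruction)
model.compile(optimizer="adam", loss="binary_crossentropy")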