How can I jointly optimize the parameters of a model comprising two distinct neural networks with a single optimizer? This is what I tried:
optim_global = optim.Adam(zip(model1.parameters(), model2.parameters()))
but I get this error
TypeError: optimizer can only optimize Tensors, but one of the params is tuple
These are generators; you can combine them either with the unpacking operator *:
>>> optim.Adam([*model1.parameters(), *model2.parameters()])
Or use itertools.chain (importing it first):
>>> from itertools import chain
>>> optim.Adam(chain(model1.parameters(), model2.parameters()))
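For context, here is a minimal runnable sketch (with two toy linear models standing in for model1 and model2) showing that a single optimizer then updates both networks:

import torch
import torch.nn as nn
import torch.optim as optim
from itertools import chain

# Hypothetical toy models standing in for model1 and model2
model1 = nn.Linear(10, 5)
model2 = nn.Linear(5, 1)

# One optimizer over the chained parameter iterators of both models
optim_global = optim.Adam(chain(model1.parameters(), model2.parameters()), lr=1e-3)

x = torch.randn(4, 10)
loss = model2(model1(x)).pow(2).mean()  # dummy loss flowing through both networks

optim_global.zero_grad()
loss.backward()
optim_global.step()  # updates the parameters of both models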
I am printing the summary of a tensorflow.keras.Model instance whose type is tensorflow.python.keras.engine.functional.Functional.
The model's layers have activations and batch normalization attached. When I print the list of parameters, I see:
weights
bias
4 items co-dimensional with the bias
These four items are (I guess) the batch normalization and activations.
My question is: why do we have parameters associated with batch normalization and activations? And what could be the other two items?
My aim is to transpose this Keras model to a PyTorch counterpart, so I need to know the order of the parameters and what they represent.
There are no parameters associated with activations; those are simply element-wise nonlinear functions, so no matter how many activations you have, they don't add to the parameter count. Your guess about batch normalization is right, though: each BatchNorm layer carries two trainable parameters per feature, the scale (gamma) and the shift (beta). The other two items you see are the moving mean and moving variance that the layer tracks during training; they are stored alongside the weights but are not trainable. So BatchNorm layers do add additional parameters to your network.
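You can verify this by inspecting the weights of a standalone BatchNormalization layer; a minimal sketch (the feature size 8 is arbitrary):

import tensorflow as tf

# Build a BatchNormalization layer for 8 features and list its weights:
# gamma and beta are trainable; moving_mean and moving_variance are not.
bn = tf.keras.layers.BatchNormalization()
bn.build((None, 8))

for w in bn.weights:
    print(w.name, w.shape, w.trainable)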
When generating adversarial examples, one typically uses the logits as the output of the neural network and then trains the network with cross-entropy.
However, I found that the cleverhans tutorial uses log softmax, then converts the PyTorch model to a TensorFlow model, and finally trains the model.
https://github.com/tensorflow/cleverhans/blob/master/cleverhans_tutorials/mnist_tutorial_pytorch.py#L65
I am wondering if anyone has an idea about whether using logits instead of log_softmax makes any difference?
As you said, when we get logits from a neural network, we train it using CrossEntropyLoss. An alternative is to compute the log_softmax and then train the network by minimizing the negative log-likelihood (NLLLoss).
Both approaches are equivalent for classification tasks: in PyTorch, CrossEntropyLoss is exactly LogSoftmax followed by NLLLoss. However, if you have a different objective function, you may find one of the two forms particularly useful in your scenario.
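A quick sanity check of this equivalence (a minimal sketch with random logits):

import torch
import torch.nn as nn

logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes
targets = torch.tensor([1, 0, 3, 9])  # arbitrary class labels

# Cross-entropy on raw logits ...
ce = nn.CrossEntropyLoss()(logits, targets)
# ... equals NLL on log-softmax outputs.
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True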
Reference
CrossEntropyLoss
NLLLoss
I am trying to merge two networks. I can accomplish this by doing the following:
merged = Merge([CNN_Model, RNN_Model], mode='concat')
But I get a warning:
merged = Merge([CNN_Model, RNN_Model], mode='concat')
__main__:1: UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
So I tried this:
merged = Concatenate([CNN_Model, RNN_Model])
model = Sequential()
model.add(merged)
and got this error:
ValueError: The first layer in a Sequential model must get an `input_shape` or `batch_input_shape` argument.
Can anyone show me the syntax to get this to work?
Don't use sequential models for models with branches.
Use the Functional API:
from keras.models import Model
from keras.layers import Concatenate
You're right to use the Concatenate layer, but you must pass tensors to it. First you create the layer, then you call it on the input tensors (that's why there are two pairs of parentheses):
concatOut = Concatenate()([CNN_Model.output, RNN_Model.output])
For creating a model out of that, you need to define the path from inputs to outputs:
model = Model([CNN_Model.input, RNN_Model.input], concatOut)
This answer assumes your existing models have only one input and output each.
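Putting it together, a minimal end-to-end sketch (with two hypothetical toy models standing in for CNN_Model and RNN_Model, each with a single input and output):

from keras.models import Model
from keras.layers import Input, Dense, Concatenate

# Hypothetical stand-ins for CNN_Model and RNN_Model
inp_a = Input(shape=(32,))
inp_b = Input(shape=(16,))
CNN_Model = Model(inp_a, Dense(8, activation='relu')(inp_a))
RNN_Model = Model(inp_b, Dense(8, activation='relu')(inp_b))

# Concatenate the output tensors, then define the path from inputs to outputs
concatOut = Concatenate()([CNN_Model.output, RNN_Model.output])
out = Dense(1, activation='sigmoid')(concatOut)

model = Model([CNN_Model.input, RNN_Model.input], out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()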
I've implemented a neural network using Keras. Having trained it and tested its final accuracy on a matrix of feature rows (plus corresponding labels), I now have a model which I should be able to use for prediction.
How can I feed a single unseen example, meaning a feature vector to the model, to obtain a class prediction?
I've looked at their documentation here but could not find a method for it.
What you want is the predict method. It takes a batch of input samples and produces predictions, which are the outputs computed by your network. To feed a single example, just wrap it in a NumPy array with a batch dimension of one.
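A minimal sketch (the feature vector and its length are hypothetical; model is your trained Keras model):

import numpy as np

single_example = np.random.rand(5)     # hypothetical feature vector of 5 features
batch = single_example.reshape(1, -1)  # shape (1, 5): a batch with one sample

prediction = model.predict(batch)
print(prediction)                      # the model's output for that one sample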
I am looking for a way to implement the following network structure (currently using Keras, might be theano however):
Assume we're given some simple network, but it is not possible to compute the desired loss directly from its output; instead, another operation is needed, and the loss will be defined based on the output of that operation. However, this operation needs not only the output of the network but the full network object (e.g. its gradient).
How can this be done? I think the operation could be performed either in a custom layer on top of the network or in a custom loss function, but in neither case do I see a way to access the full network. Any suggestions?
Say, you have the following model.
import keras.applications.vgg16 as vgg16
model = vgg16.VGG16(weights='imagenet')
model.summary()
For example, suppose you now want to delete the last layer of this model, the one that predicts a category for the input image (a vector of length 1000, because ImageNet has 1000 categories).
# Remove the last Dense layer (a known hack for older Keras versions).
model.layers.pop()
# Re-point the model's outputs to what is now the final layer.
model.outputs = [model.layers[-1].output]
# Clear the new last layer's outbound nodes so the old head is fully detached.
model.layers[-1].outbound_nodes = []
model.summary()
Now, let's add a dense layer (with output size 10) on top and use the output of the modified model. Note that this VGG16 model was built with the functional API, so it has no add() method; attach the new layer functionally instead:
from keras.layers import Dense
from keras.models import Model

x = Dense(10, activation='softmax')(model.layers[-1].output)
model = Model(model.input, x)
model.summary()
You will get a vector (of length 10) as an output from this model.
You can compile and train the model using the model.compile() and model.fit() functions, setting whatever loss function you want to train with.
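For instance, a minimal sketch (x_train and y_train are hypothetical training images and one-hot labels):

# Compile with a categorical loss and train on the hypothetical data.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)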