Keras/Theano loss or op that uses the full network

I am looking for a way to implement the following network structure (currently using Keras, though Theano would also be an option):
Assume we're given some simple network, but the desired loss cannot be computed directly from its output. Instead, another operation is needed, and the loss is defined on the output of that operation. However, this operation needs not only the output of the network but the full network object (e.g. its gradient).
How can this be done? I think the operation could be performed either in a custom layer on top of the network or in a custom loss function, but in neither case do I see a way to access the full network. Any suggestions?

Say you have the following model.
import keras.applications.vgg16 as vgg16
model = vgg16.VGG16(weights='imagenet')
model.summary()
For example, suppose you now want to delete the last layer of this model, which predicts a category for the input image (a vector of length 1000, because ImageNet has 1000 categories).
# Remove last Linear/Dense layer.
model.layers.pop()
model.outputs = [model.layers[-1].output]
model.layers[-1].outbound_nodes = []
model.summary()
Now, let's add a dense layer (with output size 10 and a softmax activation) to this model and use the output of the modified neural network model.
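# VGG16 is a functional Model (no add() method), so build a new Model with the extra layer.
import keras
model = keras.models.Model(model.input, keras.layers.Dense(10, activation='softmax')(model.layers[-1].output))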
model.summary()
You will get a vector (of length 10) as an output from this model.
You can compile and train the model using model.compile() and model.fit(). The loss argument of model.compile() sets the loss function used to train the model.

Related

Do I need to apply the Softmax Function ANYWHERE in my multi-class classification Model?

I am currently turning my binary classification model into a multi-class classification model. Bear with me, I am very new to PyTorch and machine learning.
Most of what I state here, I know from the following video.
https://www.youtube.com/watch?v=7q7E91pHoW4&t=654s
What I read / know is that CrossEntropyLoss already has the softmax function built in, so my output layer is linear.
What I then read / saw is that I can choose my model's prediction by taking torch.max() of the model output (which comes from my last linear layer). This feels weird because I have some negative outputs and I thought I needed to apply the softmax function first, but it seems to work fine without it.
So now the big confusing question I have is: when would I use the softmax function? Would I only use it when my loss doesn't have it built in? But then I would choose my prediction based on the outputs of the softmax layer, which wouldn't be the same as with the linear output layer.
Thank you guys for every answer this gets.
To calculate the loss with CrossEntropyLoss you do not need softmax, because CrossEntropyLoss already includes it. However, to turn the model outputs into probabilities you still need to apply softmax.
Let's say you did not apply softmax at the end of your model and trained it with cross-entropy. When you then evaluate the model on new data and want to use its outputs for classification, you can apply softmax to the outputs manually. There is no problem with that; this is how it is usually done.
Training()
MODEL ----> FC LAYER ---> raw outputs ---> Crossentropy Loss
Eval()
MODEL ----> FC LAYER ---> raw outputs ---> Softmax ---> Probabilities
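To make the point above concrete, here is a small framework-free sketch in plain Python (it is not PyTorch code, and the logits are made-up values). It illustrates that softmax does not change which class has the largest score, so taking the maximum of the raw outputs picks the same class, and that the cross-entropy can be computed straight from the raw outputs. In PyTorch, the corresponding calls would be torch.softmax and torch.argmax.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one sample, computed directly from raw scores.

    Equals -log(softmax(logits)[target]); the softmax is folded into the loss.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [-1.2, 0.3, 2.5]  # hypothetical raw outputs; negative values are fine
probs = softmax(logits)

predicted_from_logits = argmax(logits)
predicted_from_probs = argmax(probs)
loss = cross_entropy_from_logits(logits, target=2)

print(predicted_from_logits, predicted_from_probs)
print(probs, loss)
```

Softmax is only needed when the probabilities themselves matter (for example, to report a confidence); for picking the class, the raw outputs are enough.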
Yes, you need to apply softmax to the output layer to obtain probabilities. When you are doing binary classification you are free to use an activation function such as relu, sigmoid, or tanh. But when you are doing multi-class classification, softmax is required because it distributes the probability across the output nodes, so you can easily conclude that the output node with the highest probability corresponds to the predicted class. Note that PyTorch's CrossEntropyLoss applies log-softmax internally, so with that loss the softmax belongs at prediction time rather than in front of the loss. Thank you. Hope this is useful!

Keras Embedding layer activation function?

In the fully connected hidden layer of a Keras embedding, which activation function is used? I'm either misunderstanding the concept of this class or unable to find the documentation. I understand that it encodes a word as a real-valued vector of dimension d, based on answers like the one below on Stack Overflow:
Embedding layers in Keras are trained just like any other layer in your network architecture: they are tuned to minimize the loss function by using the selected optimization method. The major difference with other layers, is that their output is not a mathematical function of the input. Instead the input to the layer is used to index a table with the embedding vectors [1]. However, the underlying automatic differentiation engine has no problem to optimize these vectors to minimize the loss function...
In my network, I have a word embedding portion that is then linked to a larger network that is predicting a binary outcome (e.g., click yes/no). I understand that this Keras embedding is not operating like word2vec because here my embedding is being trained and updated against my end cross-entropy function. But, there is no mention of how the embedding fully-connected layer is activated. Thanks!
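The table lookup described in the quoted answer can be illustrated with a short NumPy sketch (this is not Keras's implementation; the vocabulary size, dimension, and token ids are made up). Indexing the table gives the same result as multiplying a one-hot vector by the weight matrix, with no bias and no nonlinearity applied afterwards, which is why no activation function is mentioned for the layer.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embedding_dim = 5, 3          # hypothetical sizes
table = rng.normal(size=(vocab_size, embedding_dim))  # the trainable weights

token_ids = np.array([3, 0, 3])           # hypothetical input word indices

# What an embedding layer does conceptually: pick rows of the table.
looked_up = table[token_ids]

# The same thing written as a dense layer on one-hot inputs,
# with no bias and an identity (linear) activation.
one_hot = np.eye(vocab_size)[token_ids]
via_matmul = one_hot @ table

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```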

Keras - Process single image to single Layer

I want to take a single Conv2D layer in Keras and give it a single image so I can see the output activations. (I'm doing some custom layer research)
In other words, I do not want to use the fit function, do backprop, or pass a batch of images.
How is this done?
You define a Model instance which has the layer of your interest as an output (a bit similar to what you would do with an autoencoder to get the encoding) and you call model.predict() with a single image.
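A minimal sketch of that approach, assuming TensorFlow's bundled Keras is installed; the input size, filter count, and layer name are made up for illustration. A single image still needs a leading batch dimension of size 1 before it is passed to predict.

```python
import numpy as np
from tensorflow import keras

# Hypothetical setup: one untrained Conv2D layer on 32x32 RGB input.
inputs = keras.Input(shape=(32, 32, 3))
conv = keras.layers.Conv2D(filters=4, kernel_size=3, activation="relu",
                           name="conv_under_test")
outputs = conv(inputs)

# A Model whose output is the layer of interest.
probe = keras.Model(inputs=inputs, outputs=outputs)

# One random stand-in image; add a batch axis so the shape is (1, 32, 32, 3).
image = np.random.rand(32, 32, 3).astype("float32")
activations = probe.predict(image[np.newaxis, ...], verbose=0)

print(activations.shape)
```

With the default "valid" padding, a 3x3 kernel reduces each spatial dimension by 2, so the activations here have one 30x30 map per filter.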

How to get keras layer's output in a multiple-input model?

When a Keras model accepts multiple inputs, its layers behave as if there were just one input. It might be a bug.
model = vgg19.VGG19(weights='imagenet', include_top=False, pooling='avg')
model(image1)
model(image2)
model.get_output_at(0)
model.get_output_at(1)
#no error here
outputs_0 = [layer.get_output_at(0) for layer in model.layers]
#no error here
outputs_1 = [layer.get_output_at(1) for layer in model.layers]
#error "Asked to get output at node 1, but the layer has only 1 inbound nodes."
I'm really not sure what outputs_0 is: since the model has two inputs, image1 and image2, which input does a layer's returned output correspond to?
In Keras, given a model:
1. print the model summary to find the layer name;
2. wrap the layer in a new model;
3. get the output.
from keras.models import Model
<your_model>.summary()  # summary() prints the table itself and returns None
<new_model> = Model(inputs=<your_model>.input, outputs=<your_model>.get_layer('your layer_name').get_output_at(<index_number>))
<your_output> = <new_model>.predict(<your_input>)
Regardless of the model's inputs and outputs, there is no rule about how the layers behave inside a model. A model may have many internal branches and may or may not reuse the same layer with different inputs, thus yielding different outputs. A layer will only have an "output at 1 (or more)" if that layer was used more than once.
The only certain things are:
the input layers will match the model's input (see 1),
and the output layers will match the model's output (see 1).
But anything is possible in between (see 2).
(1) - However, a model that has many inputs/outputs actually has many "input/output layers". Each output layer has a single output. If you check the model's outputs, you will find many, but if you check the layers' outputs, you will find several output layers, each yielding a single output (output at 0 only). The same holds for the model's inputs versus its input layers.
(2) - That said, the most common case is for each layer to be used only once, and thus to have only an "output at 0", without additional outputs.

Feed an unseen example to a pre-trained model made in Keras

I've implemented a neural network using Keras. Once trained and tested for final test accuracy, using a matrix with a bunch of rows containing features (plus corresponding labels), I have a model which I should be able to use for prediction.
How can I feed a single unseen example, meaning a feature vector to the model, to obtain a class prediction?
I've looked at their documentation here but could not find a method for it.
What you want is the predict method. It takes a batch of input samples and produces predictions, which are the outputs computed by your network. To feed a single example, wrap it in a NumPy array with a leading batch dimension of size 1.
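The reshaping step can be sketched with NumPy alone. The feature values and the probabilities below are made-up stand-ins, and the predict call is left as a comment because it needs your own trained model.

```python
import numpy as np

# One unseen example with 4 features (hypothetical values), shape (4,).
features = np.array([0.2, 1.5, -0.7, 3.1])

# predict expects a batch, so add a leading axis: shape becomes (1, 4).
batch = np.expand_dims(features, axis=0)

# With a trained Keras model you would now call:
#     probabilities = model.predict(batch)
# Stand-in for the result of a 3-class softmax model (made-up numbers).
probabilities = np.array([[0.1, 0.7, 0.2]])

# Class prediction: index of the largest probability in the single row.
predicted_class = int(np.argmax(probabilities, axis=1)[0])

print(batch.shape, predicted_class)
```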

Resources