Why do we use two losses when compiling the combined GAN (SRGAN) network?

I'm working on SRGAN (super-resolution GAN) and came across some code in which the author uses an MSE loss when compiling the discriminator, and two losses, binary cross-entropy and MSE, when compiling the combined GAN model. I don't understand the purpose of these loss functions. Here is the code.
The code for compiling the discriminator is:
discriminator = build_discriminator()
discriminator.compile(loss='mse', optimizer=common_optimizer, metrics=['accuracy'])
And the code for compiling the combined GAN model is:
adversarial_model = Model([input_low_resolution, input_high_resolution], [probs, features])
adversarial_model.compile(loss=['binary_crossentropy', 'mse'], loss_weights=[1e-3, 1],
                          optimizer=common_optimizer)
One more thing: for the line of code below I get the output shown underneath it, and I don't understand what this output means.
g_loss = adversarial_model.train_on_batch([low_resolution_images, high_resolution_images],
                                          [real_labels, image_features])
Result of the above-mentioned code:

The combined model has two outputs, so it needs one loss per output. The MSE is the content (perceptual) loss: it compares the features extracted from the generated image against the features extracted from the real high-resolution image. The binary cross-entropy is the adversarial loss: it is applied to the discriminator's prediction on the generated image, pushing the generator toward images the discriminator judges to be real. The loss weights [1e-3, 1] make the content loss dominate, with the adversarial term acting as a small correction. Compiling the discriminator itself with MSE instead of binary cross-entropy is effectively a least-squares-style choice for the discriminator objective; it still just measures how far the discriminator's predictions are from the real/fake labels.
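In other words, Keras expects one loss per output, and train_on_batch then returns one number per loss plus the weighted total (model.metrics_names tells you the order). Below is a minimal, self-contained sketch of that mechanism with a toy two-output model, not the SRGAN itself; the layer names 'probs' and 'features' only mirror the variable names in the question.

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Toy stand-in for the combined model: one "adversarial" output and one
# "feature" output, compiled exactly like the SRGAN code above.
inp = Input(shape=(8,))
probs = Dense(1, activation='sigmoid', name='probs')(inp)
features = Dense(4, name='features')(inp)
model = Model(inp, [probs, features])

model.compile(loss=['binary_crossentropy', 'mse'],
              loss_weights=[1e-3, 1],
              optimizer='adam')

x = np.random.rand(16, 8)
y_probs = np.ones((16, 1))          # "real" labels for the adversarial output
y_features = np.random.rand(16, 4)  # stand-in feature targets

losses = model.train_on_batch(x, [y_probs, y_features])
print(model.metrics_names)  # ['loss', 'probs_loss', 'features_loss']
print(losses)               # [weighted total, binary cross-entropy term, mse term]

So in the SRGAN case, g_loss is a list: the weighted total loss followed by the adversarial (binary cross-entropy) loss and the content (MSE on features) loss, in the order of the model's outputs.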

Related

Do I need to apply the Softmax Function ANYWHERE in my multi-class classification Model?

I am currently turning my binary classification model into a multi-class classification model. Bear with me, I am very new to PyTorch and machine learning.
Most of what I state here, I know from the following video.
https://www.youtube.com/watch?v=7q7E91pHoW4&t=654s
What I read / know is that CrossEntropyLoss already has the softmax function implemented, so my output layer is linear.
What I then read / saw is that I can just choose my model prediction by taking torch.max() of my model output (which comes from my last linear layer). This feels weird because I have some negative outputs and I thought I needed to apply the softmax function first, but it seems to work right without it.
So now the big confusing question I have is: when would I use the softmax function? Would I only use it when my loss doesn't have it implemented? But then I would choose my prediction based on the outputs of the softmax layer, which wouldn't be the same as with the linear output layer.
Thank you guys for every answer this gets.
For calculating the loss with CrossEntropyLoss you do not need softmax, because CrossEntropyLoss already includes it. However, if you want to turn the model outputs into probabilities, you still need to apply softmax yourself.
Let's say you didn't apply softmax at the end of your model and trained it with cross-entropy. Then, when you evaluate the model on new data and want to use the outputs for classification, you can manually apply softmax to those outputs, and there will be no problem. This is how it is usually done.
Training()
MODEL ----> FC LAYER ---> raw outputs ---> CrossEntropyLoss
Eval()
MODEL ----> FC LAYER ---> raw outputs ---> Softmax ---> Probabilities
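A minimal PyTorch sketch of that pattern (the model, sizes, and data here are made up purely for illustration): raw logits go straight into CrossEntropyLoss during training, and softmax is applied only when you actually want probabilities; for the predicted class, argmax on the raw logits already gives the same answer, since softmax does not change the ordering.

import torch
import torch.nn as nn

# Toy 3-class model: the last layer is linear, no softmax.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
criterion = nn.CrossEntropyLoss()   # applies log-softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training step: raw logits go straight into the loss.
x = torch.randn(16, 10)
y = torch.randint(0, 3, (16,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Evaluation: softmax only if you need probabilities to report;
# the predicted class is just the argmax of the raw logits.
model.eval()
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    preds = logits.argmax(dim=1)    # equals probs.argmax(dim=1)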
You need softmax at the output if you want class probabilities. When you are doing binary classification you are free to use relu, sigmoid, tanh, etc. as the activation function, but for multi-class classification softmax is what distributes the probability across the output nodes, so you can conclude that the node with the highest probability corresponds to the predicted class. Just don't apply it before CrossEntropyLoss, since that loss already applies it internally. Thank you, hope this is useful!

In GAN, is it necessary to compile generator

I have been working on GANs, and the thing that leaves me scratching my head is why we have to compile the generator model. If we compile the combined GAN model anyway, why compile the generator separately?
def create_generator():
    generator = Sequential()
    generator.add(Dense(256, input_dim=noise_dim))
    generator.add(LeakyReLU(0.2))
    generator.add(Dense(512))
    generator.add(LeakyReLU(0.2))
    generator.add(Dense(1024))
    generator.add(LeakyReLU(0.2))
    generator.add(Dense(img_rows*img_cols*channels, activation='tanh'))
    generator.compile(loss='binary_crossentropy', optimizer=optimizer)
    return generator

def create_descriminator():
    discriminator = Sequential()
    discriminator.add(Dense(1024, input_dim=img_rows*img_cols*channels))
    discriminator.add(LeakyReLU(0.2))
    discriminator.add(Dense(512))
    discriminator.add(LeakyReLU(0.2))
    discriminator.add(Dense(256))
    discriminator.add(LeakyReLU(0.2))
    discriminator.add(Dense(1, activation='sigmoid'))
    discriminator.compile(loss='binary_crossentropy', optimizer=optimizer)
    return discriminator
discriminator = create_descriminator()
generator = create_generator()
# Make the discriminator untrainable when we are training the generator.
# This doesn't affect the standalone discriminator, which was already compiled above.
discriminator.trainable = False
# Link the two models to create the GAN
gan_input = Input(shape=(noise_dim,))
fake_image = generator(gan_input)
gan_output = discriminator(fake_image)
gan = Model(gan_input, gan_output)
gan.compile(loss='binary_crossentropy', optimizer=optimizer)
As one can see in this code, all three models (generator, discriminator, and gan, the combined model) are compiled. According to my understanding we should only be compiling the discriminator (to train the discriminator) and gan (the combined model, to train the generator), because the discriminator's weights are frozen during GAN training, so only the generator gets trained. So why compile the generator?
During training, the generator and the discriminator have opposite goals: the discriminator tries to tell fake images from real images, while the generator tries to produce images that look real enough to trick the discriminator. Because the GAN is composed of two networks with different objectives, it cannot be trained like a regular neural network. Each training iteration is divided into two phases:

In the first phase, we train the discriminator. A batch of real images is sampled from the training set and is completed with an equal number of fake images produced by the generator. The labels are set to 0 for fake images and 1 for real images, and the discriminator is trained on this labeled batch for one step, using the binary cross-entropy loss. Importantly, backpropagation only optimizes the weights of the discriminator during this phase.

In the second phase, we train the generator. We first use it to produce another batch of fake images, and once again the discriminator is used to tell whether the images are fake or real. This time we do not add real images to the batch, and all the labels are set to 1 (real): in other words, we want the generator to produce images that the discriminator will (wrongly) believe to be real. Crucially, the weights of the discriminator are frozen during this step, so backpropagation only affects the weights of the generator.

Next, we need to compile these models. The generator will only be trained through the gan model, so we do not need to compile it at all. Importantly, the discriminator should not be trained during the second phase, so we make it non-trainable before compiling the gan model.
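Putting the two phases next to the code above, a typical training loop looks roughly like the sketch below. batch_size, the number of steps, and X_train (real images flattened to img_rows*img_cols*channels and scaled to the generator's tanh range) are placeholders; generator, discriminator, gan and noise_dim come from the question's code. Note that the generator is only ever updated through gan.train_on_batch, which is why its own compile call is never needed.

import numpy as np

batch_size = 64
for step in range(10000):
    # Phase 1: train the discriminator on real images (label 1) and fakes (label 0).
    noise = np.random.normal(0, 1, (batch_size, noise_dim))
    fake_images = generator.predict(noise)
    real_images = X_train[np.random.randint(0, X_train.shape[0], batch_size)]
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

    # Phase 2: train the generator through the combined model. The discriminator
    # inside gan was frozen before gan.compile, so only the generator's weights
    # are updated here, and the labels are all 1 because we want the fakes
    # to be judged as real.
    noise = np.random.normal(0, 1, (batch_size, noise_dim))
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))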

Understanding choice of loss and activation in deep autoencoder?

I am following this keras tutorial to create an autoencoder using the MNIST dataset. Here is the tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.
However, I am confused by the choice of activation and loss for the simple one-layer autoencoder (the first example in the link). Is there a specific reason a sigmoid activation was used for the decoder part as opposed to something such as relu? I am trying to understand whether this is a choice I can play around with, or whether it should indeed be sigmoid, and if so why. Similarly, I understand the loss is computed by comparing the original and predicted digits pixel by pixel, but I am unsure why the loss is binary_crossentropy as opposed to something like mean squared error.
I would love clarification on this to help me move forward! Thank you!
MNIST images are typically normalized to the range [0, 1], so the autoencoder should output images in the same range for easier learning. This is why a sigmoid activation is used at the output.
The mean squared error penalizes large errors much more heavily than small ones, which tends to make the network converge toward the mean of the plausible solutions (blurry reconstructions) rather than a more accurate one. Binary cross-entropy suffers less from this problem and is therefore preferred here. It works because both the model output and the targets are in the [0, 1] range, and the loss is applied to every pixel independently.
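As a concrete reference point, the single-layer example boils down to roughly the following sketch (784 is a flattened 28x28 MNIST digit, 32 is the bottleneck size, and the optimizer is an arbitrary choice here). Swapping loss='binary_crossentropy' for 'mse', or the sigmoid for another activation, is exactly the kind of experiment you can run to see the difference for yourself.

from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))                       # flattened MNIST digit in [0, 1]
encoded = Dense(32, activation='relu')(input_img)     # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)   # outputs in [0, 1], like the inputs

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Alternative to compare: loss='mse' (tends to give blurrier reconstructions).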

How to create a custom keras loss function with opencv?

I'm developing a machine learning model using Keras and I notice that the available loss functions are not giving the best results on my test set.
I am using a U-Net architecture, where I input a (16, 16, 3) image and the net also outputs a (16, 16, 3) picture (auto-encoder). I noticed that one way to improve the model might be a loss function that compares the gradients (Laplacian) between the net output and the ground truth, pixel by pixel. However, I did not find any tutorial that handles this kind of application, because it would need to use the OpenCV Laplacian function on each output image from the net.
The loss function would be something like this:
def laplacian_loss(y_true, y_pred):
    # y_true already is the calculated gradients, only needs to compute on the y_pred
    # calculates the gradients for each predicted image
    y_pred_lap = []
    for img in y_pred:
        laplacian = cv2.Laplacian(np.float64(img), cv2.CV_64F)
        y_pred_lap.append(laplacian)
    y_pred_lap = np.array(y_pred_lap)
    # mean squared error, according to keras losses documentation
    return K.mean(K.square(y_pred_lap - y_true), axis=-1)
Has anyone done something like that for loss calculation?
Given the code above, it seems this would be equivalent to using a Lambda() layer as the output layer that applies that transformation to the image, before computing the mean squared error.
Regardless of whether it is implemented as a Lambda() layer or in the loss function, the transformation needs to be such that TensorFlow understands how to calculate the gradients. The simplest way to do this would probably be to reimplement the cv2.Laplacian computation using TensorFlow math operations.
In order to use the cv2 library directly, you would need to define the gradients yourself for whatever happens inside the cv2 library; that seems significantly more error-prone.
Gradient-descent optimization relies on being able to compute gradients from the inputs to the loss, and back. Any operation in the middle must be differentiable, and TensorFlow must understand the math operations for automatic differentiation to work, or you need to add the gradients manually.
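Here is a hedged sketch of that first suggestion: reimplement the Laplacian as a fixed depthwise convolution using TensorFlow ops inside the loss, so the whole computation stays differentiable. The 3x3 kernel below is the standard 4-neighbour Laplacian (the default aperture in cv2.Laplacian); filtering each channel independently is an assumption about what you want to reproduce.

import numpy as np
import tensorflow as tf
from keras import backend as K

channels = 3  # matches the (16, 16, 3) network output described in the question

# 3x3 Laplacian kernel, replicated per channel for a depthwise convolution.
lap_kernel = np.array([[0,  1, 0],
                       [1, -4, 1],
                       [0,  1, 0]], dtype=np.float32)
lap_kernel = K.constant(np.tile(lap_kernel[:, :, None, None], (1, 1, channels, 1)))

def laplacian_loss(y_true, y_pred):
    # y_true is assumed to already contain the precomputed gradients,
    # as stated in the question; only y_pred is filtered here.
    y_pred_lap = tf.nn.depthwise_conv2d(y_pred, lap_kernel,
                                        strides=[1, 1, 1, 1], padding='SAME')
    return K.mean(K.square(y_pred_lap - y_true), axis=-1)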
I managed to reach an easy solution. The key point is that the gradient calculation is actually a 2D filter; for more information, please follow the link about the Laplacian kernel. So the output of my network has to be filtered by the Laplacian kernel. For that, I created an extra convolutional layer with fixed weights set exactly to the Laplacian kernel. After that, the network has two outputs (one being the desired image, and the other the gradient image), so it is also necessary to define both losses.
To make it clearer, I'll give an example. At the end of the network you'll have something like:
channels = 3  # number of channels of the network output
# net_output is assumed to come from a layer named 'enhanced' (see the losses below)
lap = Conv2D(channels, (3, 3), padding='same', name='laplacian')(net_output)
model = Model(inputs=[net_input], outputs=[net_output, lap])
Define how you want to calculate the losses for each output:
# losses for the two outputs: the enhanced image and its laplacian
losses = {
    "enhanced": "mse",
    "laplacian": "mse"
}
lossWeights = {"enhanced": 1.0, "laplacian": 0.6}
Compile the model:
model.compile(optimizer=Adam(), loss=losses, loss_weights=lossWeights)
Define the Laplacian kernel, assign its values to the weights of the convolutional layer above, and set trainable to False (so it won't be updated):
# laplacian kernel
l = np.asarray([
    [[[1, 1, 1],
      [1, -8, 1],
      [1, 1, 1]]] * channels
] * channels).astype(np.float32)
bias = np.asarray([0] * channels).astype(np.float32)
wl = [l, bias]
model.get_layer('laplacian').set_weights(wl)
model.get_layer('laplacian').trainable = False
When training, remember that you need two values for the ground truth:
model.fit(x=X, y={"enhanced": y_out, "laplacian": y_lap})
Observation: do not use a BatchNormalization layer! If you do, the weights in the laplacian layer will be updated!

Keras "acc" metrics - an algorithm

In Keras I often see people compile a model with the mean squared error loss and "acc" as a metric.
model.compile(optimizer=opt, loss='mse', metrics=['acc'])
I have been reading about acc and I cannot find an algorithm for it anywhere.
What if I changed my loss function to binary cross-entropy, for example, and used 'acc' as the metric? Would this be the same metric as in the first case, or does Keras change acc based on the loss function, i.e. binary cross-entropy in this case?
Check the source code from line 375. The metric_fn changes depending on the loss function, so it is handled automatically by Keras.
If you want to compare models that use different loss functions, it can in some cases be necessary to specify which accuracy method you want to grade your models with, so that the models are actually evaluated with the same metric.
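For example, you can name the accuracy variant explicitly when compiling instead of relying on 'acc'; a minimal sketch (the model here is just a placeholder):

from keras.layers import Input, Dense
from keras.models import Model

# Tiny binary classifier, only to illustrate the compile calls.
inp = Input(shape=(10,))
out = Dense(1, activation='sigmoid')(inp)
model = Model(inp, out)

# With metrics=['acc'], Keras picks the accuracy variant based on the loss and
# the output shape. Naming it explicitly removes that ambiguity, so models
# trained with different losses are graded by the same metric.
model.compile(optimizer='adam', loss='mse', metrics=['binary_accuracy'])
# model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])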
