Keras, binary segmentation, add weight to loss function - python-3.x

I'm solving a binary segmentation problem with Keras (TensorFlow backend). How can I add more weight to the center of each mask region?
I've tried a dice coefficient with cv2.erode() added, but it doesn't work:
def dice_coef_eroded(y_true, y_pred):
    kernel = (3, 3)
    y_true = cv2.erode(y_true.eval(), kernel, iterations=1)
    y_pred = cv2.erode(y_pred.eval(), kernel, iterations=1)
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + 1) / (K.sum(y_true_f) + K.sum(y_pred_f) + 1)
Keras 2.1.3, tensorflow 1.4

All right, the solution I found is the following:
1) Create a method in your iterator that retrieves the weight matrix (with shape = mask shape); a weight-map sketch is shown after the example below. The output must contain [image, mask, weights].
2) Create a Lambda layer containing the loss function.
3) Create an identity loss function.
Example:
def weighted_binary_loss(X):
    import keras.backend as K
    y_pred, weights, y_true = X
    # K.binary_crossentropy expects (target, output) in Keras 2
    loss = K.binary_crossentropy(y_true, y_pred)
    # scale the per-pixel loss by the weight map
    loss = loss * weights
    return loss
def identity_loss(y_true, y_pred):
    return y_pred
def get_unet_w_lambda_loss(input_shape=(1024, 1024, 3), mask_shape=(1024, 1024, 1)):
    images = Input(input_shape)
    mask_weights = Input(mask_shape)
    true_masks = Input(mask_shape)
    ...
    y_pred = Conv2D(1, (1, 1), activation='sigmoid')(up1)  # output of the original U-Net
    loss = Lambda(weighted_binary_loss, output_shape=(1024, 1024, 1))([y_pred, mask_weights, true_masks])
    model = Model(inputs=[images, mask_weights, true_masks], outputs=loss)
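For step 1, one way to build a weight matrix that emphasizes the center of each mask region is a distance transform. Below is a rough sketch of such a helper (make_center_weights is a hypothetical name, and it assumes SciPy is available; the w_min/w_max scaling is purely illustrative):

import numpy as np
from scipy.ndimage import distance_transform_edt

def make_center_weights(mask, w_min=1.0, w_max=5.0):
    # mask: binary array of shape (H, W) or (H, W, 1)
    mask_2d = np.squeeze(mask) > 0.5
    dist = distance_transform_edt(mask_2d)    # 0 on background, grows towards region centers
    if dist.max() > 0:
        dist = dist / dist.max()              # normalize to [0, 1]
    weights = w_min + (w_max - w_min) * dist  # background keeps w_min, centers approach w_max
    return weights[..., np.newaxis].astype('float32')

The iterator would then yield the image together with this weight map and the mask, so the three inputs of the model above can be fed together.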

I'm implementing this solution, but I wonder what ground truth we should give to the network. That is, the output is now the loss, and we want the loss to be 0, so should we train the network as follows?
model = get_unet_w_lambda_loss()
model.fit([inputs, weights, masks], zero_images)
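For reference, a loss-as-output model like this is usually compiled with the identity loss and fitted against a dummy target whose shape matches the loss output; since identity_loss ignores y_true, an all-zeros array acts purely as a placeholder. A minimal sketch (inputs, weights and masks are assumed to be the arrays produced by your iterator; names are illustrative):

import numpy as np

model = get_unet_w_lambda_loss()
model.compile(optimizer='adam', loss=identity_loss)

# dummy target: identity_loss returns y_pred (the per-pixel weighted loss),
# so these zeros never influence the gradient
zero_target = np.zeros((len(inputs), 1024, 1024, 1), dtype='float32')
model.fit([inputs, weights, masks], zero_target, batch_size=1, epochs=10)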

Related

Implementation of mean and log_variance in variational autoencoder keras

As in the Keras implementation of the VAE (https://keras.io/examples/generative/vae/), we have to pass a mean and a log variance to parameterize the distribution in the latent space:
class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
# flatten layer
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")
encoder.summary()
I don't understand how two dense layers can represent the mean and log variance without doing any special calculation. The code above simply creates dense layers that receive the result of the previous flatten layer.
The dense output layers are trained to output mean and log variance for the input using the Kullback–Leibler divergence loss function.
In the Keras example VAE model it is calculated in the custom train_step using the output of the dense layers:
kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
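In the full example the KL term is added to the reconstruction loss inside train_step, and the gradients of that combined loss are what push the two dense layers toward producing a usable mean and log variance. A condensed sketch of that step, paraphrasing the linked example (it assumes import tensorflow as tf and from tensorflow import keras):

def train_step(self, data):
    with tf.GradientTape() as tape:
        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)
        reconstruction_loss = tf.reduce_mean(
            tf.reduce_sum(
                keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
    grads = tape.gradient(total_loss, self.trainable_weights)
    self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
    return {"loss": total_loss, "reconstruction_loss": reconstruction_loss, "kl_loss": kl_loss}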

Why is the output of a convolutional network so exponentially large?

I am trying to reproduce a U-Net result on the Carvana dataset using TernausNet in PyTorch with Lightning.
I am using Dice loss with a sigmoid activation function. I think I am running into a vanishing-gradient issue, because all weight gradients are 0, and the minimum value of the network output is of the order of 10^8.
What could be the issue here? How can I address the vanishing gradient? Also, if I use a different criterion, the loss goes negative without stopping (for BCE with logits, for instance).
Here is the code for my Dice loss:
class DiceLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, logits, targets, eps=0, threshold=None):
        # comment out if your model contains a sigmoid or
        # equivalent activation layer
        proba = torch.sigmoid(logits)
        proba = proba.view(proba.shape[0], 1, -1)
        targets = targets.view(targets.shape[0], 1, -1)
        if threshold:
            proba = (proba > threshold).float()
        # flatten label and prediction tensors
        intersection = torch.sum(proba * targets, dim=1)
        summation = torch.sum(proba, dim=1) + torch.sum(targets, dim=1)
        dice = (2.0 * intersection + eps) / (summation + eps)
        # print(intersection, summation, dice)
        return (1 - dice).mean()

Measuring uncertainty using MC Dropout on pytorch

I am trying to implement a Bayesian CNN using MC Dropout in PyTorch.
The main idea is that, by applying dropout at test time and running many forward passes, you get predictions from a variety of different models.
I've found an application of MC Dropout, but I really did not get how they applied this method and how exactly they chose the correct prediction from the list of predictions.
Here is the code:
def mcdropout_test(model):
    model.train()
    test_loss = 0
    correct = 0
    T = 100
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output_list = []
        for i in xrange(T):
            output_list.append(torch.unsqueeze(model(data), 0))
        output_mean = torch.cat(output_list, 0).mean(0)
        test_loss += F.nll_loss(F.log_softmax(output_mean), target, size_average=False).data[0]  # sum up batch loss
        pred = output_mean.data.max(1, keepdim=True)[1]  # get the index of the max log-probability
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nMC Dropout Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

train()
mcdropout_test()
I have replaced
data, target = Variable(data, volatile=True), Variable(target)
by adding
with torch.no_grad(): at the beginning
And this is how I have defined my CNN
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 192, 5, padding=2)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(192, 192, 5, padding=2)
        self.fc1 = nn.Linear(192 * 8 * 8, 1024)
        self.fc2 = nn.Linear(1024, 256)
        self.fc3 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(p=0.3)

        nn.init.xavier_uniform_(self.conv1.weight)
        nn.init.constant_(self.conv1.bias, 0.0)
        nn.init.xavier_uniform_(self.conv2.weight)
        nn.init.constant_(self.conv2.bias, 0.0)
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.constant_(self.fc1.bias, 0.0)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.constant_(self.fc2.bias, 0.0)
        nn.init.xavier_uniform_(self.fc3.weight)
        nn.init.constant_(self.fc3.bias, 0.0)

    def forward(self, x):
        x = self.pool(F.relu(self.dropout(self.conv1(x))))  # recommended to add the relu
        x = self.pool(F.relu(self.dropout(self.conv2(x))))  # recommended to add the relu
        x = x.view(-1, 192 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(self.dropout(x)))
        x = self.fc3(self.dropout(x))  # no activation function needed for the last layer
        return x
Can anyone help me to get the right implementation of the Monte Carlo Dropout method on CNN?
Implementing MC Dropout in PyTorch is easy. All that needs to be done is to set the dropout layers of your model to train mode. This allows different dropout masks to be used during the various forward passes. Below is an implementation of MC Dropout in PyTorch illustrating how multiple predictions from the various forward passes are stacked together and used for computing different uncertainty metrics.
import sys
import numpy as np
import torch
import torch.nn as nn

def enable_dropout(model):
    """ Function to enable the dropout layers during test-time """
    for m in model.modules():
        if m.__class__.__name__.startswith('Dropout'):
            m.train()

def get_monte_carlo_predictions(data_loader,
                                forward_passes,
                                model,
                                n_classes,
                                n_samples):
    """ Function to get the monte-carlo samples and uncertainty estimates
    through multiple forward passes

    Parameters
    ----------
    data_loader : object
        data loader object from the data loader module
    forward_passes : int
        number of monte-carlo samples/forward passes
    model : object
        pytorch model
    n_classes : int
        number of classes in the dataset
    n_samples : int
        number of samples in the test set
    """
    dropout_predictions = np.empty((0, n_samples, n_classes))
    softmax = nn.Softmax(dim=1)
    for i in range(forward_passes):
        predictions = np.empty((0, n_classes))
        model.eval()
        enable_dropout(model)
        for i, (image, label) in enumerate(data_loader):
            image = image.to(torch.device('cuda'))
            with torch.no_grad():
                output = model(image)
                output = softmax(output)  # shape (n_samples, n_classes)
            predictions = np.vstack((predictions, output.cpu().numpy()))

        dropout_predictions = np.vstack((dropout_predictions,
                                         predictions[np.newaxis, :, :]))
        # dropout_predictions - shape (forward_passes, n_samples, n_classes)

    # Calculating mean across multiple MCD forward passes
    mean = np.mean(dropout_predictions, axis=0)  # shape (n_samples, n_classes)

    # Calculating variance across multiple MCD forward passes
    variance = np.var(dropout_predictions, axis=0)  # shape (n_samples, n_classes)

    epsilon = sys.float_info.min
    # Calculating entropy across multiple MCD forward passes
    entropy = -np.sum(mean * np.log(mean + epsilon), axis=-1)  # shape (n_samples,)

    # Calculating mutual information across multiple MCD forward passes
    mutual_info = entropy - np.mean(np.sum(-dropout_predictions * np.log(dropout_predictions + epsilon),
                                           axis=-1), axis=0)  # shape (n_samples,)
Moving on to the implementation posted in the question above: multiple predictions from T different forward passes are obtained by first setting the whole model to train mode (model.train()). Note that this is not desirable, because unwanted stochasticity will be introduced in the predictions if there are layers other than dropout, such as batch norm, in the model. Hence the best way is to just set the dropout layers to train mode, as shown in the snippet above.
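As a usage sketch (all numbers and names here are hypothetical: a 10-class setup, a test_loader with 10,000 samples, and 20 stochastic forward passes):

model = Net().to(torch.device('cuda'))
# ... load trained weights here ...

# the snippet above computes mean, variance, entropy and mutual_info internally;
# add a return statement to hand them back to the caller
get_monte_carlo_predictions(data_loader=test_loader,
                            forward_passes=20,
                            model=model,
                            n_classes=10,
                            n_samples=10000)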

Pytorch customize weight

I have a network
class Net(nn.Module)
and two different weights w0 and w1 (the weights of all layers concatenated into a vector). Now I want to optimize the network on the line connecting w0 and w1, which means the weights will have the form theta * w0 + (1-theta) * w1. So the parameter I want to optimize is no longer the weights themselves, but theta.
How can I implement this? In PyTorch, how can I define the parameter to be theta and set the weights to the form I want? To be specific, if I create a new class
class NetOnLine(nn.Module)
how should I write the forward(self, X) function?
You can define the parameter theta in your net as an nn.Parameter. You'd define the forward function the same way as normal - pass the data through the layers or operations you want and then return it.
Here's a minimal example, where I train a "network" to learn to multiply a Tensor by 2:
import numpy as np
import torch

class SampleNet(torch.nn.Module):
    def __init__(self):
        super(SampleNet, self).__init__()
        self.theta = torch.nn.Parameter(torch.rand(1))

    def forward(self, x):
        x = x * self.theta.expand_as(x)  # expand_as() to match sizes
        return x

train_data = np.random.rand(1000, 10)
train_data[:, 5:] = 2 * train_data[:, :5]
train_data = torch.Tensor(train_data)

sample_net = SampleNet()
optimizer = torch.optim.Adam(params=sample_net.parameters())
mse_loss = torch.nn.MSELoss()

for epoch in range(5):
    for data in train_data:
        x = data[:5]
        y = data[5:]
        optimizer.zero_grad()
        prediction = sample_net(x)
        loss = mse_loss(y, prediction)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch}, Loss {loss.data.item()}")

print(f"Learned theta: {sample_net.theta.data.item()}")
which prints out
Epoch 0, Loss 0.03369491919875145
Epoch 1, Loss 0.0018534092232584953
Epoch 2, Loss 1.2343853995844256e-05
Epoch 3, Loss 2.2044337466553543e-09
Epoch 4, Loss 4.0527581290916714e-12
Learned theta: 1.999994158744812
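Applied to the original question, the same mechanism can interpolate between two fixed weight sets: register theta as the only nn.Parameter, keep w0 and w1 as buffers, and rebuild the effective weights in forward with the functional API. A sketch for a single linear layer (w0/b0 and w1/b1 are assumed to be pre-trained weight and bias tensors; a full network would repeat this per layer):

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearOnLine(nn.Module):
    def __init__(self, w0, b0, w1, b1):
        super().__init__()
        # the two endpoint weight sets are fixed; only theta is trainable
        self.register_buffer('w0', w0)
        self.register_buffer('b0', b0)
        self.register_buffer('w1', w1)
        self.register_buffer('b1', b1)
        self.theta = nn.Parameter(torch.rand(1))

    def forward(self, x):
        t = self.theta
        weight = t * self.w0 + (1 - t) * self.w1
        bias = t * self.b0 + (1 - t) * self.b1
        return F.linear(x, weight, bias)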

What are inputs of Keras layers and custom functions?

Sorry for a noob's question:
Given an NN that is trained in fit_generator mode, with layers such as:
Lambda(...)
or
Dense(...)
and a custom loss function, what are the input tensors?
Am I correct in expecting (batch size, previous layer's output) in the case of a Lambda layer?
Is it going to be the same, (batch size, data), in the case of a custom loss function that looks like:
triplet_loss(y_true, y_pred)
Are y_true and y_pred in the format (batch, previous layer's output) and (batch, true 'expected' data we fed to the NN)?
I would probably duplicate the dense layers. Instead of having 2 layers with 128 units, have 4 layers with 64 units. The result is the same, but you will be able to perform the cross products better.
from keras.models import Model

#create dense layers and store their output tensors; they use the output of models 1 and 2 as input
d1 = Dense(64, ....)(Model_1.output)
d2 = Dense(64, ....)(Model_1.output)
d3 = Dense(64, ....)(Model_2.output)
d4 = Dense(64, ....)(Model_2.output)

cross1 = Lambda(myFunc, output_shape=....)([d1,d4])
cross2 = Lambda(myFunc, output_shape=....)([d2,d3])

#I don't really know what kind of "merge" you want, so I used concatenate; there are
#Add, Multiply and others....
output = Concatenate()([cross1,cross2])
#use the "axis" attribute of the concatenate layer to define better which axis will
#be doubled due to the concatenation

model = Model([Model_1.input,Model_2.input], output)
Now, for the lambda function:
import keras.backend as K
def myFunc(x):
    return x[0] * x[1]
"custom loss function, what are input tensors?"
It depends on how you define your model outputs.
For example, let's define a simple model that returns the input unchanged.
model = Sequential([Lambda(lambda x: x, input_shape=(1,))])
Let's use a dummy input x and label y:
x = [[0]]
x = np.array(x)
y = [[4]]
y = np.array(y)
If our custom loss function looks like this
def mce(y_true, y_pred):
    print(y_true.shape)
    print(y_pred.shape)
    return K.mean(K.pow(K.abs(y_true - y_pred), 3))
model.compile('sgd', mce)
and then we can see the shape of y_true and y_pred will be
y_true: (?, ?)
y_pred: (?, 1)
However, for a triplet loss, the input to the loss function can also be received like this:
ALPHA = 0.2

def triplet_loss(x):
    anchor, positive, negative = x
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), ALPHA)
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
    return loss

# Source: https://github.com/davidsandberg/facenet/blob/master/src/facenet.py

def build_model(input_shape):
    # Standardizing the input shape order
    K.set_image_dim_ordering('th')

    positive_example = Input(shape=input_shape)
    negative_example = Input(shape=input_shape)
    anchor_example = Input(shape=input_shape)

    # Create common network to share the weights along different examples (+/-/Anchor)
    embedding_network = faceRecoModel(input_shape)

    positive_embedding = embedding_network(positive_example)
    negative_embedding = embedding_network(negative_example)
    anchor_embedding = embedding_network(anchor_example)

    loss = merge([anchor_embedding, positive_embedding, negative_embedding],
                 mode=triplet_loss, output_shape=(1,))

    model = Model(inputs=[anchor_example, positive_example, negative_example],
                  outputs=loss)
    model.compile(loss='mean_absolute_error', optimizer=Adam())

    return model
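Since this model's output is the loss itself (as in the first question above), the target passed to fit is again only a placeholder: with loss='mean_absolute_error' and an all-zeros target, minimizing the MAE is the same as minimizing the triplet loss. A sketch of the training call (shapes and array names are illustrative; anchors/positives/negatives are hypothetical arrays of shape (n, 3, 64, 64) for 'th' ordering):

import numpy as np

model = build_model(input_shape=(3, 64, 64))
dummy_target = np.zeros((len(anchors), 1))  # ignored in effect, the model output is the loss
model.fit([anchors, positives, negatives], dummy_target, batch_size=16, epochs=10)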
