Implementing Triplet Loss inside Keras Layers

In this blog post, the author implements the triplet loss outside the Keras layers. He gets anchor_out, pos_out and neg_out from the network and then passes them to the triplet_loss() function he defined.
I wonder if I can calculate the triplet_loss within the Keras layers by defining my own Lambda layers.
Here's my network design:
anchor_input = Input((600, ), name='anchor')
positive_input = Input((600, ), name='positive_input')
negative_input = Input((600, ), name='negative_input')
# Shared embedding layer for positive and negative items
Shared_DNN = Dense(300)
encoded_anchor = Shared_DNN(anchor_input)
encoded_positive = Shared_DNN(positive_input)
encoded_negative = Shared_DNN(negative_input)
DAP = Lambda(lambda tensors:K.sum(K.square(tensors[0] - tensors[1]),axis=1,keepdims=True),name='DAP_loss') #Distance for Anchor-Positive pair
DAN = Lambda(lambda tensors:K.sum(K.square(tensors[0] - tensors[1]),axis=1,keepdims=True),name='DAN_loss') #Distance for Anchor-Negative pair
Triplet_loss = Lambda(lambda loss:K.max([(loss[0] - loss[1] + margin),0],axis=0),name='Triplet_loss') #Distance for Anchor-Negative pair
DAP_loss = DAP([encoded_anchor,encoded_positive])
DAN_loss = DAN([encoded_anchor,encoded_negative])
#call this layer on list of two input tensors.
Final_loss = Triplet_loss([DAP_loss,DAN_loss])
model = Model(inputs=[anchor_input,positive_input, negative_input], outputs=Final_loss)
However, it gives me the error:
Tried to convert 'input' to a tensor and failed. Error: Shapes must be equal rank, but are 2 and 0
From merging shape 0 with other shapes. for 'Triplet_loss_4/Max/packed' (op: 'Pack') with input shapes: [?,1], []
The error is from the Triplet_loss layer. In the K.max() function, the first element, loss[0] - loss[1] + margin, has shape (None, 1), yet the second element, 0, is a scalar (rank 0). The two elements do not have the same shape, so K.max() raises an error.
My question is: how do I solve this error?
I have tried replacing the 0 with K.constant(0,shape=(1,)) and K.constant(0,shape=(None,1)), but they don't work.

Does this work?
Triplet_loss = Lambda(lambda loss: K.maximum(loss[0] - loss[1] + margin, 0.0), name='Triplet_loss')
I think the issue with this line
Triplet_loss = Lambda(lambda loss:K.max([(loss[0] - loss[1] + margin), 0], axis=0), name='Triplet_loss')
is that you are putting the loss[0]-loss[1]+margin tensor and 0 in a list, which Keras interprets as stacking two tensors into one (the 'Pack' op in the error message). This fails because of the rank mismatch: 0 is a scalar with rank 0, while the first element is a 2-D tensor. This is what the error means.
To compare a tensor against a single value element-wise, use K.maximum, which broadcasts automatically when one of the arguments is a scalar.
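As a minimal sketch of the same computation outside Keras, the NumPy version below mirrors the DAP/DAN distances and the element-wise hinge. NumPy's np.maximum broadcasts a scalar the same way K.maximum does. The function name triplet_hinge and the sample arrays are made up for illustration and are not part of the original model.

```python
import numpy as np

def triplet_hinge(anchor, positive, negative, margin=1.0):
    """Per-sample triplet loss; hypothetical helper mirroring the Lambda layers."""
    # Squared Euclidean distances, kept as columns of shape (batch, 1).
    d_ap = np.sum(np.square(anchor - positive), axis=1, keepdims=True)
    d_an = np.sum(np.square(anchor - negative), axis=1, keepdims=True)
    # Element-wise maximum against a scalar: the scalar is broadcast,
    # so no list or stacking of differently shaped tensors is involved.
    return np.maximum(d_ap - d_an + margin, 0.0)

# Made-up embeddings for two triplets.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])
negative = np.array([[3.0, 0.0], [1.0, 1.5]])

loss = triplet_hinge(anchor, positive, negative, margin=1.0)
print(loss.shape)
print(loss)
```

The result keeps the (batch, 1) shape of the distance tensors, which is what the Triplet_loss layer is expected to output.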


Output of the model depends on the shape of the weights tensor

I want to train a model to sum its three inputs, so the task is as simple as possible.
First, the weights are initialized randomly. This produces a poor error (approx. 0.5).
Then I initialize the weights with zeros. There are two options:
the shape of the weights tensor is [1, 3]
the shape of the weights tensor is [3]
When I choose the 1st option, the model still performs badly and can't learn this simple formula.
When I choose the 2nd option, it works perfectly, with an error of 10e-12.
Why does the result depend on the shape of the weights? Why do I need to initialize the model with zeros to solve this simple problem?
import torch
from torch.nn import Sequential as Seq, Linear as Lin
from torch.optim.lr_scheduler import ReduceLROnPlateau
X = torch.rand((1024, 3))
y = (X[:,0] + X[:,1] + X[:,2])
m = Seq(Lin(3, 1, bias=False))
# 1 option
m[0].weight = torch.nn.parameter.Parameter(torch.tensor([[0, 0, 0]], dtype=torch.float))
# 2 option
#m[0].weight = torch.nn.parameter.Parameter(torch.tensor([0, 0, 0], dtype=torch.float))
optim = torch.optim.SGD(m.parameters(), lr=10e-2)
scheduler = ReduceLROnPlateau(optim, 'min', factor=0.5, patience=20, verbose=True)
mse = torch.nn.MSELoss()
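for epoch in range(500):
    optim.zero_grad()
    out = m(X)
    loss = mse(out, y)
    # Standard update step: backpropagate, update weights, adjust the learning rate
    loss.backward()
    optim.step()
    scheduler.step(loss)
    if epoch % 20 == 0:
        print(epoch, loss.item())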
The first option doesn't learn because of unintended broadcasting: out.shape == (1024, 1), while the corresponding targets y have shape (1024, ). MSELoss computes the mean of the tensor (out - y)^2, which in this case broadcasts to shape (1024, 1024), clearly the wrong objective for this task. With the second option, the tensor (out - y)^2 has shape (1024, ) and its mean is the actual MSE. The default approach, without explicitly changing the weight shape (as options 1 and 2 do), works if you give the target the shape (1024, 1), for example with y = y.unsqueeze(-1) after the definition of y.
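The broadcasting effect can be reproduced in a few lines. This is a sketch in NumPy, whose broadcasting rules PyTorch follows; the batch is shrunk to 8 rows and the weights are set to the ideal value (all ones) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 3))
y = X.sum(axis=1)                 # targets, shape (8,)

w = np.ones((1, 3))               # ideal weights with shape [1, 3]
out = X @ w.T                     # predictions, shape (8, 1)

# (8, 1) minus (8,) broadcasts to (8, 8): every prediction is compared
# with every target, not only with its own.
bad = (out - y) ** 2
# Giving the target a trailing axis keeps the comparison element-wise.
good = (out - y[:, None]) ** 2

bad_mse = bad.mean()
good_mse = good.mean()
print(bad.shape, good.shape)
print(bad_mse, good_mse)
```

Even with perfect weights the broadcast version reports a non-zero error, which is why the optimizer cannot reach the intended solution.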

Keras: IoU backend implementation where the inputs are the box corners?

I've seen a few implementations of Intersection over Union in Keras Tensorflow, but they all use inputs that represent the contents of the box regions. The network I'm working with passes the four corners of the boxes themselves. How can I implement the algorithm this way with Keras backend?
My attempt:
def IoU():
    # y_true: Tensor from the generator of shape (B, N, 5). The last value for each box is the state of the anchor (ignore, negative, positive).
    # y_pred: Tensor from the network of shape (B, N, 4).
    # box coordinates (in axis 2): [x1, y1, x2, y2]
    def _IoU(y_true, y_pred):
        anchor_state = y_true[:,:,-1]
        regression = y_pred
        regression_target = y_true[:,:,:-1]
        indices = backend.where(keras.backend.equal(anchor_state, 1))
        regression = backend.gather_nd(regression, indices)
        regression_target = backend.gather_nd(regression_target, indices)
        x1_pred = regression[:,0] # <- fails here
        y1_pred = regression[:,1]
        x2_pred = regression[:,2]
        y2_pred = regression[:,3]
        x1_true = regression_target[:,0]
        y1_true = regression_target[:,1]
        x2_true = regression_target[:,2]
        y2_true = regression_target[:,3]
        xA = keras.backend.maximum(x1_pred, keras.backend.transpose(x1_true))
        yA = keras.backend.maximum(y1_pred, keras.backend.transpose(y1_true))
        xB = keras.backend.maximum(x2_pred, keras.backend.transpose(x2_true))
        yB = keras.backend.maximum(y2_pred, keras.backend.transpose(y2_true))
        interArea = keras.backend.maximum((xB-xA+1),0)*keras.backend.maximum((yB-yA+1),0)
        pred_area = (x2_pred-x1_pred+1)*(y2_pred-y1_pred+1)
        truth_area = (x2_true-x1_true+1)*(y2_true-y1_true+1)
        iou_arr = interArea/(pred_area + keras.backend.transpose(truth_area) - interArea)
        iou = keras.backend.mean(iou_arr)
        return iou
    return _IoU
After some debugging I found that backend.gather_nd reduced regression and regression_target from shapes (?, ?, 4) to shapes (?, 4), hence the indexing for x1_pred.
However, when I try to use this as a metric for a compiled model, I get the following error during runtime:
ValueError: slice index 2 of dimension 1 out of bounds. for 'metrics/_IoU_1/strided_slice_4' (op: 'StridedSlice') with input shapes: [?,2], [2], [2], [2] and with computed input tensors: input[1] = <0 2>, input[2] = <0 3>, input[3] = <1 1>.
If anybody knows a way that I can implement this in Keras, or sees where I've gone wrong, I'd be forever grateful.
EDIT: I figured out that this was because the method was being called twice, once where (y_true, y_pred) were the box prediction/truth (and the last dimension was of shape 4), and then again where (y_true, y_pred) were the class prediction/truth (and I have only 2 classes).
I did get some negative values and some values > 1 for the returned iou values, so maybe I'm still doing something incorrectly?
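One thing worth checking for the out-of-range values: the lower-right corner of an intersection is the minimum of the two x2 (and y2) coordinates, whereas the code above uses maximum for xB and yB. The following is a minimal NumPy sketch of a per-pair IoU on [x1, y1, x2, y2] boxes, keeping the +1 pixel convention from the attempt. The name paired_iou and the sample boxes are hypothetical, and this is not a drop-in Keras metric.

```python
import numpy as np

def paired_iou(pred, true):
    """IoU of pred[i] with true[i]; both arrays have shape (K, 4) as [x1, y1, x2, y2]."""
    # Upper-left corner of the intersection: the larger of the two starts.
    xA = np.maximum(pred[:, 0], true[:, 0])
    yA = np.maximum(pred[:, 1], true[:, 1])
    # Lower-right corner of the intersection: the smaller of the two ends.
    xB = np.minimum(pred[:, 2], true[:, 2])
    yB = np.minimum(pred[:, 3], true[:, 3])
    # Clamp at zero so disjoint boxes contribute no overlap.
    inter = np.maximum(xB - xA + 1, 0) * np.maximum(yB - yA + 1, 0)
    pred_area = (pred[:, 2] - pred[:, 0] + 1) * (pred[:, 3] - pred[:, 1] + 1)
    true_area = (true[:, 2] - true[:, 0] + 1) * (true[:, 3] - true[:, 1] + 1)
    return inter / (pred_area + true_area - inter)

# Made-up boxes: identical, partially overlapping, and disjoint pairs.
pred = np.array([[0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 4, 4]], dtype=float)
true = np.array([[0, 0, 9, 9], [5, 5, 14, 14], [10, 10, 14, 14]], dtype=float)
iou = paired_iou(pred, true)
print(iou)
```

With the minimum in place, the intersection can never exceed either box area, so every value stays within [0, 1].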

Linear regression with pytorch

I tried to run linear regression on the ForestFires dataset.
The dataset is available on Kaggle and a gist of my attempt is here:
I am facing two problems:
The output from prediction has shape 32x1, while the target data has shape 32:
input and target shapes do not match: input [32 x 1], target [32]
Using view, I reshaped the predictions tensor:
y_pred = y_pred.view(inputs.shape[0])
Why is there a mismatch between the shapes of the predicted tensor and the actual tensor?
SGD in PyTorch never converges. I tried to compute the MSE manually using
print(torch.mean((y_pred - labels)**2))
This value does not match
loss = criterion(y_pred,labels)
Can someone point out where the mistake in my code is?
Thank you.
Problem 1
This is the reference for MSELoss from the PyTorch docs:
- Input: (N,∗) where * means, any number of additional dimensions
- Target: (N,∗), same shape as the input
So, you need to expand dims of labels: (32) -> (32,1), by using: torch.unsqueeze(labels, 1) or labels.view(-1,1)
torch.unsqueeze(input, dim, out=None) → Tensor
Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
Problem 2
After reviewing your code, I realized that you have added the size_average param to MSELoss:
criterion = torch.nn.MSELoss(size_average=False)
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
That's why the two computed values did not match: with size_average=False the loss is summed rather than averaged (in current PyTorch, the equivalent is reduction='sum'). Here is some sample code:
import torch
import torch.nn as nn
loss1 = nn.MSELoss()
loss2 = nn.MSELoss(size_average=False)
inputs = torch.randn(32, 1, requires_grad=True)
targets = torch.randn(32, 1)
output1 = loss1(inputs, targets)
output2 = loss2(inputs, targets)
output3 = torch.mean((inputs - targets) ** 2)
print(output1) # tensor(1.0907)
print(output2) # tensor(34.9021)
print(output3) # tensor(1.0907)
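To make the relationship between the two losses concrete, here is a small sketch in NumPy (standing in for PyTorch, with made-up random data). A summed squared error is the mean squared error multiplied by the number of elements, which is consistent with the printed values above (1.0907 × 32 ≈ 34.90).

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((32, 1))
targets = rng.standard_normal((32, 1))

squared_error = (inputs - targets) ** 2
mean_loss = squared_error.mean()   # what an averaged MSE reports
sum_loss = squared_error.sum()     # what a summed loss reports

# The two differ exactly by the number of elements (32 here).
ratio = sum_loss / mean_loss
print(mean_loss, sum_loss, ratio)
```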

Keras "Shapes must be of equal rank" error when trying to slice tensor with another tensor

Building off some of the other questions I've asked, I'm trying to define a custom loss function that allows me to slice the contents of an input tensor using the contents of another tensor:
def innerLoss(z):
    y_pred = z[0]
    patch_x = z[1][0]
    patch_y = z[1][1]
    patch_true = y_pred[patch_y:patch_y+10, patch_x:patch_x+10, 0]
    return 0
originalInputs = Input(shape=(128, 128, 1))
featureInputs = Input(shape=(2,), dtype="int64")
originalOutputs = Input(shape=(128, 128, 1))
loss = Lambda(innerLoss)([originalOutputs, featureInputs])
outerModel = Model(inputs=[originalInputs, featureInputs], outputs=loss)
I'm getting the following error:
ValueError: Shapes must be equal rank, but are 1 and 0
From merging shape 1 with other shapes. for 'lambda_3/strided_slice_2/stack_1' (op: 'Pack') with input shapes: [2], [2], [].
Here, featureInputs will consist of a pair of coordinates telling us where to begin slicing the image in originalInputs.
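The shapes in the error message ([2], [2], []) suggest that z[1][0] selects the first row of the batch, that is, a whole (x, y) pair rather than a scalar x, because tensors passed to a Lambda layer carry a leading batch dimension. The sketch below shows per-sample cropping under that assumption. It is a NumPy analogy, not Keras code, and crop_patches and the sample data are hypothetical.

```python
import numpy as np

def crop_patches(images, coords, size=10):
    """Cut one size x size patch per sample.

    images: (B, H, W, 1) array; coords: (B, 2) integer array of [x, y] starts.
    """
    patches = []
    # Iterate over the batch so each image is sliced with its own scalar x and y.
    for img, (x, y) in zip(images, coords):
        patches.append(img[y:y + size, x:x + size, 0])
    return np.stack(patches)

# Made-up batch of two 128x128 single-channel images.
images = np.arange(2 * 128 * 128).reshape(2, 128, 128, 1)
coords = np.array([[3, 5], [20, 40]])
patches = crop_patches(images, coords)
print(patches.shape)
```

The key point is that the slice bounds must be scalars per sample; indexing only the first axis of a batched coordinate tensor still leaves a vector.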

Input dimension mismatch binary crossentropy Lasagne and Theano

I read all the posts on the net addressing the issue where people forgot to change the target vector to a matrix, and as a problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems show up, and I am thankful for suggestions!
Using a convolutional network setup and binary crossentropy with a sigmoid activation function, I get a dimension mismatch problem, but not on the training data, only during validation / test data evaluation. For some strange reason, one of my validation set vectors gets its dimensions switched, and I have no idea why. Training, as mentioned above, works fine. Code follows below, most of it copied from the Lasagne tutorial example. Thanks a lot for help (and sorry for hijacking the thread, but I saw no reason for creating a new one).
Workarounds and new problems:
Removing "axis=1" in the valAcc definition helps, but validation accuracy remains zero and test classification always returns the same result, no matter how many nodes, layers, filters etc. I have. Even changing the training set size (I have around 350 samples for each class with 48x64 grayscale images) does not change this. So something seems off.
Network creation:
def build_cnn(imgSet, input_var=None):
# As a third model, we'll create a CNN of two convolution + pooling stages
# and a fully-connected hidden layer in front of the output layer.
# Input layer using shape information from training
network = lasagne.layers.InputLayer(shape=(None, \
imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]), input_var=input_var)
# This time we do not apply input dropout, as it tends to work less well
# for convolutional layers.
# Convolutional layer with 32 kernels of size 5x5. Strided and padded
# convolutions are supported as well; see the docstring.
network = lasagne.layers.Conv2DLayer(
network, num_filters=32, filter_size=(5, 5),
# Max-pooling layer of factor 2 in both dimensions:
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
# Another convolution with 16 5x5 kernels, and another 2x2 pooling:
network = lasagne.layers.Conv2DLayer(
network, num_filters=16, filter_size=(5, 5),
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
# A fully-connected layer of 64 units with 25% dropout on its inputs:
network = lasagne.layers.DenseLayer(
lasagne.layers.dropout(network, p=.25),
# And, finally, the 2-unit output layer with 50% dropout on its inputs:
network = lasagne.layers.DenseLayer(
lasagne.layers.dropout(network, p=.5),
return network
Target matrices for all sets are created like this (training target vector as an example)
targetsTrain = np.vstack( (targetsTrain, [[targetClass], ]*numTr) );
...and the theano variables as such
inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])
Finally, here are the two loops, training and test. The first is fine; the second throws the error (excerpts below).
# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
    train_err = 0
    train_batches = 0
    for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
        inputs, targets = batch
        print (inputs.shape)
        train_err += train_fn(inputs, targets)
        train_batches += 1
    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
        [inputs, targets] = batch
        [err, acc] = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
Error (excerpts)
Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']
Again, thanks for help!
So it seems the error is in the evaluation of the validation accuracy.
When you remove the "axis=1" in your calculation, the argmax runs over the whole array and returns a single number.
Then broadcasting steps in, which is why you see the same value for the whole set.
But from the error you have posted, the "T.eq" op throws the error because it has to compare a 52 x 1 with a 1 x 52 vector (a matrix for theano/numpy).
So, I suggest you try to replace the line with:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
I hope this fixes the error, but I haven't tested it myself.
The error lies in the argmax op that is called.
Normally, the argmax is there to determine which of the output units is activated the most.
However, in your setting you only have one output neuron, which means that the argmax over all output neurons will always return 0 (the first index).
This is why you have the impression that your network always gives you 0 as output.
By replacing:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
with:
binaryPrediction = valPrediction > .5
valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))
you should get the desired result.
I'm just not sure whether the transpose is still necessary.
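A minimal NumPy sketch of the thresholding idea, assuming a single sigmoid output of shape (N, 1) and targets stored with the same shape (both arrays are made-up illustration data). With matching shapes the comparison is element-wise and no transpose is needed in this sketch. Whether that carries over to the Theano graph depends on the actual shape of targetVar.

```python
import numpy as np

# Hypothetical sigmoid outputs of a 1-unit output layer, shape (N, 1).
val_prediction = np.array([[0.9], [0.2], [0.7], [0.4]])
# Hypothetical binary targets stored as a column, shape (N, 1).
targets = np.array([[1], [0], [0], [0]])

# argmax over a single output unit is always index 0, whatever the network predicts.
argmax_prediction = np.argmax(val_prediction, axis=1)

# Thresholding at 0.5 gives an actual class decision per sample.
binary_prediction = (val_prediction > 0.5).astype(int)
# Same shapes on both sides, so the comparison is element-wise.
accuracy = np.mean(binary_prediction == targets)
print(argmax_prediction, accuracy)
```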
