tflearn DNN gives zero loss - python-3.x

I am using pandas to extract my data. To get an idea of my data I replicated an example dataset...
data = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
which yields a dataset of shape=(100,4)...
A B C D
0 75 38 81 58
1 36 92 80 79
2 22 40 19 3
... ...
I am using tflearn so I will need a target label as well. So I created a target label by extracting one of the columns from data and then dropped it out of the data variable (I also converted everything to numpy arrays)...
# Target label used for training
labels = np.array(data['A'].values, dtype=np.float32)
# Reshape target label from (100,) to (100, 1)
labels = np.reshape(labels, (-1, 1))
# Data for training minus the target label.
data = np.array(data.drop('A', axis=1).values, dtype=np.float32)
Then I take the data and the labels and feed it into the DNN...
# Deep Neural Network.
net = tflearn.input_data(shape=[None, 3])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net)
# Define model.
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
This seems like it should work, but in the training output the loss stays at 0 for every epoch, so I am definitely doing something wrong. I don't really know what form my data should be in. How can I get my training to work?

Your actual output is in the range 0 to 100, while the softmax activation in the output layer produces values in the range [0, 1]. You need to fix that. Also, the default loss for tflearn.regression is categorical cross-entropy, which is meant for classification problems and makes no sense in your scenario; you should try L2 loss instead. The reason you are getting zero loss in this setting is that your network predicts 0 for all training examples, and if you plug that value into the sigmoid cross-entropy formula, loss = -sum_i [ t[i]*log(o[i]) + (1 - t[i])*log(1 - o[i]) ], the loss is indeed zero; here t[i] denotes the actual probabilities (which don't make sense in your problem) and o[i] the predicted probabilities.
Here is more reasoning about why the default choice of loss function is not suitable for your case.
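As a rough sketch of those two fixes (a linear output unit plus a mean-square loss; the relu hidden activations and the explicit Adam settings are additions of this sketch, not part of the original snippet):
# Deep Neural Network for regression: linear output, L2 (mean square) loss.
net = tflearn.input_data(shape=[None, 3])
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.fully_connected(net, 1, activation='linear')
net = tflearn.regression(net, optimizer='adam', loss='mean_square', learning_rate=0.001)
model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
Rescaling the 0-100 targets (for example dividing labels by 100) before fitting usually makes the optimization better behaved as well.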

Related

NaN loss in Keras with triplet loss

I'm trying to learn an embedding for Paris6k images by combining VGG and Adrian Ung's triplet loss. The problem is that after a small number of iterations in the first epoch, the loss becomes NaN, and then the accuracy and validation accuracy grow to 1.
I've already tried lowering the learning rate, increasing the batch size (only to 16 because of memory), changing the optimizer (Adam and RMSprop), checking for None values in my dataset, changing the data format from 'float32' to 'float64', adding a small bias to the values, and simplifying the model.
Here is my code:
# Imports added for completeness; tl (the triplet loss module), image_list,
# label_list and callbacks_list are defined elsewhere in my code.
from keras.applications import VGG16
from keras.layers import Input, Flatten, concatenate
from keras.models import Model
from keras.optimizers import Adam
import numpy as np

base_model = VGG16(include_top=False, input_shape=(512, 384, 3))
input_images = base_model.input
input_labels = Input(shape=(1,), name='input_label')
embeddings = Flatten()(base_model.output)
labels_plus_embeddings = concatenate([input_labels, embeddings])
model = Model(inputs=[input_images, input_labels], outputs=labels_plus_embeddings)

batch_size = 16
epochs = 2
embedding_size = 64

opt = Adam(lr=0.0001)
model.compile(loss=tl.triplet_loss_adapted_from_tf, optimizer=opt, metrics=['accuracy'])

label_list = np.vstack(label_list)
x_train = image_list[:2500]
x_val = image_list[2500:]
y_train = label_list[:2500]
y_val = label_list[2500:]

# Dummy ground truth: the adapted triplet loss reads the labels from the model
# output (labels_plus_embeddings) rather than from y.
dummy_gt_train = np.zeros((len(x_train), embedding_size + 1))
dummy_gt_val = np.zeros((len(x_val), embedding_size + 1))

H = model.fit(
    x=[x_train, y_train],
    y=dummy_gt_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=([x_val, y_val], dummy_gt_val),
    callbacks=callbacks_list)
There are 3366 images, with values scaled to the range [0, 1].
The network is fitted against dummy values because it tries to learn embeddings such that images of the same class have a small distance while images of different classes have a large distance; the real class labels are fed in as an extra input rather than being the training target.
I've noticed that when I was previously making an incorrect class division (and keeping images that should have been discarded), I didn't have the NaN loss problem.
What should I try to do?
Thanks in advance and sorry for my English.
In some cases, a random NaN loss can be caused by your data: if there are no positive pairs in your batch, you will get a NaN loss.
As you can see in Adrian Ung's notebook (or in the tensorflow addons triplet loss; it's the same code):
semi_hard_triplet_loss_distance = math_ops.truediv(
    math_ops.reduce_sum(
        math_ops.maximum(
            math_ops.multiply(loss_mat, mask_positives), 0.0)),
    num_positives,
    name='triplet_semihard_loss')
There is a division by the number of positive pairs (num_positives), which can lead to NaN when a batch contains none.
I suggest you inspect your data pipeline to ensure there is at least one positive pair in each of your batches. (You can, for example, adapt some of the code in triplet_loss_adapted_from_tf to get the num_positives of your batch and check that it is greater than 0.)
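As a rough, hypothetical illustration of such a check (count_positive_pairs and y_batch are made-up names, not from the answer above):
import numpy as np

def count_positive_pairs(batch_labels):
    # Pairs of distinct samples in the batch that share the same class label.
    labels = np.asarray(batch_labels).reshape(-1)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)   # a sample is not its own positive
    return int(same.sum() // 2)

# e.g. flag a batch whose labels contain no repeats before feeding it to the loss
if count_positive_pairs(y_batch) == 0:
    print("Warning: no positive pairs in this batch; the triplet loss will divide by zero.")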
Try increasing your batch size. It happened to me as well; as mentioned in the previous answer, the network is unable to find any num_positives. I had 250 classes and was getting NaN loss initially; after I increased the batch size to 128/256, there was no issue.
I see that Paris6k has 15 classes (or 12). Increase your batch size to 32, and if you run out of GPU memory try a model with fewer parameters; you can start with EfficientNetB0, which has 5.3M parameters compared to VGG16's 138M.
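A minimal sketch of that backbone swap, assuming a tf.keras version that ships EfficientNetB0 and keeping the rest of the question's model unchanged:
from tensorflow.keras.applications import EfficientNetB0

# Drop-in replacement for the VGG16 base: same include_top/input_shape usage,
# far fewer parameters (~5.3M vs ~138M).
base_model = EfficientNetB0(include_top=False, input_shape=(512, 384, 3))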
I have implemented a package for triplet generation so that every batch is guaranteed to include positive pairs. It is compatible with TF/Keras only.
https://github.com/ma7555/kerasgen (Disclaimer: I am the owner)

Ways to limit the output of a NN regression problem to a certain range (i.e. I want my NN to always predict output values only between -20 and +30)

I am training a NN for a regression problem, so the output layer has a linear activation function. The NN output is supposed to be between -20 and 30. My NN performs well most of the time; however, sometimes it gives an output greater than 30, which is not desirable for my system. Does anyone know of an activation function that can impose this kind of restriction on the output, or have any suggestions on modifying the linear activation function for my application?
I am using Keras with the TensorFlow backend for this application.
What you can do is activate your last layer with a sigmoid, so the result is between 0 and 1, and then add a custom layer to map it into the desired range:
def get_range(input, maxx, minn):
    # Min-max rescale each sample's outputs into [minn, maxx]
    low = K.min(input, axis=1, keepdims=True)
    high = K.max(input, axis=1, keepdims=True)
    return (maxx - minn) * ((input - low) / (high - low)) + minn
and then add this to your network:
out = layers.Lambda(get_range, arguments={'maxx': 30, 'minn': -20})(sigmoid_output)
The output will be rescaled to lie between 'minn' and 'maxx'.
UPDATE
If you want to clip your outputs without rescaling all of them, do this instead:
def clip(input, maxx, minn):
    return K.clip(input, minn, maxx)
out = layers.Lambda(clip, arguments={'maxx': 30, 'minn': -20})(sigmoid_output)
What you should do is normalize your target outputs to the range [-1, 1] or [0, 1], then use a tanh (for [-1, 1]) or sigmoid (for [0, 1]) activation at the output, and train the model on the normalized data.
Then you can denormalize the predictions to get values in your original range at inference time.
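A minimal sketch of that scaling, assuming the -20 to 30 range from the question and a sigmoid output layer (x_train, y_train, x_test and model stand in for the asker's own data and model):
y_min, y_max = -20.0, 30.0

# Scale targets to [0, 1] to match the sigmoid output layer.
y_train_scaled = (y_train - y_min) / (y_max - y_min)
model.fit(x_train, y_train_scaled, epochs=10)

# Undo the scaling on the predictions at inference time.
y_pred = model.predict(x_test) * (y_max - y_min) + y_min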

How to predict a target label using scikit-learn

Let's say I have a dataset, I will provide a toy example in this instance...
data = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
target = "A"
...which generates...
A B C D
0 75 38 81 58
1 36 92 80 79
2 22 40 19 3
... ...
This is clearly not enough data to give good accuracy, but nevertheless, let's say I feed data and target to a random forest algorithm provided by scikit-learn...
def random_forest(target, data):
    # Drop the target label, which we save separately.
    X = data.drop([target], axis=1).values
    y = data[target].values
    # Run Cross Validation on Random Forest Classifier.
    clf_tree = ske.RandomForestClassifier(n_estimators=50)
    unique_permutations_cross_val(X, y, clf_tree)
unique_permutations_cross_val is simply a cross-validation function I made; this is the function (it also prints out the accuracy of the model)...
def unique_permutations_cross_val(X, y, model):
    # Split data 20/80 to be used in a K-Fold Cross Validation with unique permutations.
    shuffle_validator = model_selection.ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
    # Calculate the score of the model after Cross Validation has been applied to it.
    scores = model_selection.cross_val_score(model, X, y, cv=shuffle_validator)
    # Print out the score (mean), as well as the variance.
    print("Accuracy: %0.4f (+/- %0.2f)" % (scores.mean(), scores.std()))
Anyway, my main question is: how can I predict the target label using the model I created? For example, let's say I feed the model [28, 12, 33]. I want the model to predict the value of the target column, which in this case is "A".
The model in the posted code is not fitted yet. You did cross-validation, which tells you how well (or not) the model trains on your data, but it does not fit the model object as you want; cross_val_score() uses clones of the supplied model object to compute the scores.
For predicting new data, you need to explicitly call fit() on the model.
So you can edit your random_forest method to return the fitted model, something like this:
    unique_permutations_cross_val(X, y, clf_tree)
    clf_tree.fit(X, y)
    return clf_tree
And then wherever you are calling the random_forest method, you can do this:
fitted_model = random_forest(target, data)
predictions = fitted_model.predict([data to predict])
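For instance, with the toy DataFrame above and target = "A", a hypothetical call could look like this (predict() expects a 2-D array, and the feature order must match the remaining columns B, C, D):
fitted_model = random_forest("A", data)
print(fitted_model.predict([[28, 12, 33]]))   # one sample with features B, C, D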

How to calculate class scores when batch size changes

My question is at the bottom, but first I will explain what I am attempting to achieve.
I have an example I am trying to implement on my own model. I am creating an adversarial image; in essence, I want to graph how the image's class scores change as the epsilon value changes.
So let's say my model has already been trained, and in this example I am using the following model...
x = tf.placeholder(tf.float32, shape=[None, 784])
...
...
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits) # Softmax
Next, let us assume I extract an array of images of the number 2 from the MNIST data set and save it in the following variable...
# convert into a numpy array of shape [100, 784]
labels_of_2 = np.concatenate(labels_of_2, axis=0)
So now, in the example that I have, the next step is to try different epsilon values on every image...
# random epsilon values from -1.0 to 1.0
epsilon_res = 101
eps = np.linspace(-1.0, 1.0, epsilon_res).reshape((epsilon_res, 1))
labels = [str(i) for i in range(10)]
num_colors = 10
cmap = plt.get_cmap('hsv')
colors = [cmap(i) for i in np.linspace(0, 1, num_colors)]
# Create an empty array for our scores
scores = np.zeros((len(eps), 10))
for j in range(len(labels_of_2)):
    # Pick the image for this iteration
    x00 = labels_of_2[j].reshape((1, 784))
    # Calculate the sign of the derivative,
    # at the image and at the desired class
    # label
    sign = np.sign(im_derivative[j])
    # Calculate the new scores for each
    # adversarial image
    for i in range(len(eps)):
        x_fool = x00 + eps[i] * sign
        scores[i, :] = logits.eval({x: x_fool,
                                    keep_prob: 1.0})
Now we can graph the scores for this image using the following...
# Create a figure
plt.figure(figsize=(10, 8))
plt.title("Image {}".format(j))
# Loop through the score functions for each
# class label and plot them as a function of
# epsilon
for k in range(len(scores.T)):
    plt.plot(eps, scores[:, k],
             color=colors[k],
             marker='.',
             label=labels[k])
plt.legend(prop={'size':8})
plt.xlabel('Epsilon')
plt.ylabel('Class Score')
plt.grid('on')
For the first image the graph would look something like the following...
Now Here Is My Question
Let's say the model I trained used a batch_size of 100, in that case the following line would not work...
scores[i, :] = logits.eval({x: x_fool,
keep_prob: 1.0})
In order for this to work, I would need to pass an array of 100 images to the model, but in this instance x_fool is just one image of shape (1, 784).
I want to graph the effect of different epsilon values on the class scores of any one image, but how can I do so when I need to calculate the scores of 100 images at a time (since my model was trained with a batch_size of 100)?
You can choose not to fix a batch size by setting it to None. That way, any batch size can be used.
However, keep in mind that this non-choice could come with a moderate performance penalty.
This fixes it if you start again from scratch. If you start from an existing network trained with a batch size of 100, you can create a test network that is identical to your starting network except for the batch size: set it to 1, or again, to None.
I realised the problem was not with the batch_size but with the format of the image I was attempting to pass to the model. As user1735003 pointed out, the batch_size does not matter.
The reason I could not pass the image to the model was because I was passing it as so...
x_fool = x00 + eps[i] * sign
scores[i, :] = logits.eval({x: x_fool})
The problem with this is that the shape of the image is simply (784,), whereas the placeholder needs to accept an array of images of shape [None, 784], so the image needs to be reshaped.
x_fool = labels_of_2[0].reshape((1, 784)) + eps[i] * sign
scores[i, :] = logits.eval({x:x_fool})
Now my image is shape (1, 784) which can now be accepted by the placeholder.

Input dimension mismatch binary crossentropy Lasagne and Theano

I read all the posts on the net addressing the issue where people forgot to change the target vector to a matrix, and since the problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems appear, and I am thankful for suggestions!
Using a convolutional network setup and binary cross-entropy with a sigmoid activation function, I get a dimension mismatch problem, but not on the training data, only during validation / test data evaluation. For some strange reason, one of my validation set vectors gets its dimensions switched and I have no idea why. Training, as mentioned above, works fine. Code follows below; thanks a lot for help (and sorry for hijacking the thread, but I saw no reason for creating a new one). Most of it is copied from the Lasagne tutorial example.
Workarounds and new problems:
Removing "axis=1" in the valAcc definition helps, but validation accuracy remains zero and test classification always returns the same result, no matter how many nodes, layers, filters etc. I have. Even changing the training set size (I have around 350 samples for each class, 48x64 grayscale images) does not change this. So something seems off.
Network creation:
def build_cnn(imgSet, input_var=None):
    # As a third model, we'll create a CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.
    # Input layer using shape information from training
    network = lasagne.layers.InputLayer(
        shape=(None, imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]),
        input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.
    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=32, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=16, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # A fully-connected layer of 64 units with 25% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.25),
        num_units=64,
        nonlinearity=lasagne.nonlinearities.rectify)
    # And, finally, the 1-unit sigmoid output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.5),
        num_units=1,
        nonlinearity=lasagne.nonlinearities.sigmoid)
    return network
Target matrices for all sets are created like this (training target vector as an example)
targetsTrain = np.vstack( (targetsTrain, [[targetClass], ]*numTr) );
...and the theano variables as such
inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])
Finally, here are the two loops, training and test. The first is fine; the second throws the error, with excerpts below.
# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
    train_err = 0
    train_batches = 0
    for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
        inputs, targets = batch
        print(inputs.shape)
        print(targets.shape)
        train_err += train_fn(inputs, targets)
        train_batches += 1
    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
        [inputs, targets] = batch
        [err, acc] = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
Error (excerpts):
Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']
Again, thanks for help!
So it seems the error is in the evaluation of the validation accuracy.
When you remove the "axis=1" in your calculation, the argmax runs over everything, returning only a single number.
Then broadcasting steps in, which is why you see the same value for the whole set.
But from the error you have posted, the "T.eq" op throws the error because it has to compare a 52 x 1 with a 1 x 52 vector (a matrix for theano/numpy).
So, I suggest you try replacing the line with:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
I hope this fixes the error, but I haven't tested it myself.
EDIT:
The error lies in the argmax op that is called.
Normally, the argmax is there to determine which of the output units is activated the most.
However, in your setting you only have one output neuron, which means the argmax over all output neurons will always return 0 (its first and only index).
This is why you have the impression your network gives you always 0 as output.
By replacing:
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))
with:
binaryPrediction = valPrediction > .5
valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))
you should get the desired result.
I'm just not sure whether the transpose is still necessary or not.
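As a further variation (mine, not part of the answer above): flattening both tensors sidesteps the transpose question entirely, since the comparison then happens on 1-D vectors of the same length:
binaryPrediction = valPrediction > .5
valAcc = T.mean(T.eq(T.flatten(binaryPrediction), T.flatten(targetVar)),
                dtype=theano.config.floatX)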
