Change learning rate within minibatch - keras - keras

I have a problem with imbalanced labels, for example 90% of the data have the label 0 and the rest 10% have the label 1.
I want to teach the network with minibatches. So I want the optimizer to give the examples labeled with 1 a learning rate (or somehow change the gradients to be) greater by 9 than those with label 0.
is there any way of doing that?
The problem is that the whole training process is done in this line:
history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0)
is there a way to change the fit method in the low level?

You can try using the class_weight parameter of keras.
From keras doc:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only).
Example of using it in imbalance data:
https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#class_weights
class_weights={"class_1": 1, "class_2": 10}
history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0, class_weight=class_weights)
Full example:
# Examine the class label imbalance
# you can use your_df['label_class_column'] or just the trainY values.
neg, pos = np.bincount(your_df['label_class_column'])
total = neg + pos
print('Examples:\n Total: {}\n Positive: {} ({:.2f}% of total)\n'.format(
total, pos, 100 * pos / total))
# Scaling by total/2 helps keep the loss to a similar magnitude.
# The sum of the weights of all examples stays the same.
weight_for_0 = (1 / neg)*(total)/2.0
weight_for_1 = (1 / pos)*(total)/2.0
class_weight = {0: weight_for_0, 1: weight_for_1}

Related

Threshold does not work on numpy array for accuracy metric

I am trying to implement logistic regression from scratch using numpy. I wrote a class with the following methods to implement logistic regression for a binary classification problem and to score it based on BCE loss or Accuracy.
def accuracy(self, true_labels, predictions):
"""
This method implements the accuracy score. Where the accuracy is the number
of correct predictions our model has.
args:
true_labels: vector of shape (1, m) that contains the class labels where,
m is the number of samples in the batch.
predictions: vector of shape (1, m) that contains the model predictions.
"""
counter = 0
for y_true, y_pred in zip(true_labels, predictions):
if y_true == y_pred:
counter+=1
return counter/len(true_labels)
def train(self, score='loss'):
"""
This function trains the logistic regression model and updates the
parameters based on the Batch-Gradient Descent algorithm.
The function prints the training loss and validation loss on every epoch.
args:
X: input features with shape (num_features, m) or (num_features) for a
singluar sample where m is the size of the dataset.
Y: gold class labels of shape (1, m) or (1) for a singular sample.
"""
train_scores = []
dev_scores = []
for i in range(self.epochs):
# perform forward and backward propagation & get the training predictions.
training_predictions = self.propagation(self.X_train, self.Y_train)
# get the predictions of the validation data
dev_predictions = self.predict(self.X_dev, self.Y_dev)
# calculate the scores of the predictions.
if score == 'loss':
train_score = self.loss_function(training_predictions, self.Y_train)
dev_score = self.loss_function(dev_predictions, self.Y_dev)
elif score == 'accuracy':
train_score = self.accuracy((training_predictions==+1).squeeze(), self.Y_train)
dev_score = self.accuracy((dev_predictions==+1).squeeze(), self.Y_dev)
train_scores.append(train_score)
dev_scores.append(dev_score)
plot_training_and_validation(train_scores, dev_scores, self.epochs, score=score)
after testing the code with the following input
model = LogisticRegression(num_features=X_train.shape[0],
Learning_rate = 0.01,
Lambda = 0.001,
epochs=500,
X_train=X_train,
Y_train=Y_train,
X_dev=X_dev,
Y_dev=Y_dev,
normalize=False,
regularize = False,)
model.train(score = 'loss')
i get the following results
however when i swap the scoring metric to measure over time from loss to accuracy ass follows model.train(score='accuracy') i get the following result:
I have removed normalization and regularization to make sure i am using a simple implementation of logistic regression.
Note that i use an external method to visualize the training/validation score overtime in the LogisticRegression.train() method.
The trick you are using to create your predictions before passing into the accuracy method is wrong. You are using (dev_predictions==+1).
Your problem statement is a Logistic Regression model that would generate a value between 0 and 1. Most of the times, the values will NOT be exactly equal to +1.
So essentially, every time you are passing a bunch of False or 0 to the accuracy function. I bet if you check the number of classes in your datasets having the value False or 0 would be :
exactly 51.7 % in validation dataset
exactly 56.2 % in training dataset.
To fix this, you can use a in-between threshold like 0.5 to generate your labels. So use something like dev_predictions>0.5

Nan loss in keras with triplet loss

I'm trying to learn an embedding for Paris6k images combining VGG and Adrian Ung triplet loss. The problem is that after a small amount of iterations, in the first epoch, the loss becomes nan, and then the accuracy and validation accuracy grow to 1.
I've already tried lowering the learning rate, increasing the batch size (only to 16 beacuse of memory), changing optimizer (Adam and RMSprop), checking if there are None values on my dataset, changing data format from 'float32' to 'float64', adding a little bias to them and simplify the model.
Here is my code:
base_model = VGG16(include_top = False, input_shape = (512, 384, 3))
input_images = base_model.input
input_labels = Input(shape=(1,), name='input_label')
embeddings = Flatten()(base_model.output)
labels_plus_embeddings = concatenate([input_labels, embeddings])
model = Model(inputs=[input_images, input_labels], outputs=labels_plus_embeddings)
batch_size = 16
epochs = 2
embedding_size = 64
opt = Adam(lr=0.0001)
model.compile(loss=tl.triplet_loss_adapted_from_tf, optimizer=opt, metrics=['accuracy'])
label_list = np.vstack(label_list)
x_train = image_list[:2500]
x_val = image_list[2500:]
y_train = label_list[:2500]
y_val = label_list[2500:]
dummy_gt_train = np.zeros((len(x_train), embedding_size + 1))
dummy_gt_val = np.zeros((len(x_val), embedding_size + 1))
H = model.fit(
x=[x_train,y_train],
y=dummy_gt_train,
batch_size=batch_size,
epochs=epochs,
validation_data=([x_val, y_val], dummy_gt_val),callbacks=callbacks_list)
The images are 3366 with values scaled in range [0, 1].
The network takes dummy values because it tries to learn embeddings from images in a way that images of the same class should have small distance, while images of different classes should have high distances and than the real class is part of the training.
I've noticed that I was previously making an incorrect class division (and keeping images that should be discarded), and I didn't have the nan loss problem.
What should I try to do?
Thanks in advance and sorry for my english.
In some case, the random NaN loss can be caused by your data, because if there are no positive pairs in your batch, you will get a NaN loss.
As you can see in Adrian Ung's notebook (or in tensorflow addons triplet loss; it's the same code) :
semi_hard_triplet_loss_distance = math_ops.truediv(
math_ops.reduce_sum(
math_ops.maximum(
math_ops.multiply(loss_mat, mask_positives), 0.0)),
num_positives,
name='triplet_semihard_loss')
There is a division by the number of positives pairs (num_positives), which can lead to NaN.
I suggest you try to inspect your data pipeline in order to ensure there is at least one positive pair in each of your batches. (You can for example adapt some of the code in the triplet_loss_adapted_from_tf to get the num_positives of your batch, and check if it is greater than 0).
Try increasing your batch size. It happened to me also. As mentioned in the previous answer, network is unable to find any num_positives. I had 250 classes and was getting nan loss initially. I increased it to 128/256 and then there was no issue.
I saw that Paris6k has 15 classes or 12 classes. Increase your batch size 32 and if the GPU memory occurs you can try with model with less parameters. You can work on Efficient B0 model for starting. It has 5.3M compared to VGG16 which has 138M parameters.
I have implemented a package for triplet generation so that every batch is guaranteed to include postive pairs. It is compatible with TF/Keras only.
https://github.com/ma7555/kerasgen (Disclaimer: I am the owner)

Multi-label classification with class weights in Keras

I have a 1000 classes in the network and they have multi-label outputs. For each training example, the number of positive output is same(i.e 10) but they can be assigned to any of the 1000 classes. So 10 classes have output 1 and rest 990 have output 0.
For the multi-label classification, I am using 'binary-cross entropy' as cost function and 'sigmoid' as the activation function. When I tried this rule of 0.5 as the cut-off for 1 or 0. All of them were 0. I understand this is a class imbalance problem. From this link, I understand that, I might have to create extra output labels.Unfortunately, I haven't been able to figure out how to incorporate that into a simple neural network in keras.
nclasses = 1000
# if we wanted to maximize an imbalance problem!
#class_weight = {k: len(Y_train)/(nclasses*(Y_train==k).sum()) for k in range(nclasses)}
inp = Input(shape=[X_train.shape[1]])
x = Dense(5000, activation='relu')(inp)
x = Dense(4000, activation='relu')(x)
x = Dense(3000, activation='relu')(x)
x = Dense(2000, activation='relu')(x)
x = Dense(nclasses, activation='sigmoid')(x)
model = Model(inputs=[inp], outputs=[x])
adam=keras.optimizers.adam(lr=0.00001)
model.compile('adam', 'binary_crossentropy')
history = model.fit(
X_train, Y_train, batch_size=32, epochs=50,verbose=0,shuffle=False)
Could anyone help me with the code here and I would also highly appreciate if you could suggest a good 'accuracy' metric for this problem?
Thanks a lot :) :)
I have a similar problem and unfortunately have no answer for most of the questions. Especially the class imbalance problem.
In terms of metric there are several possibilities: In my case I use the top 1/2/3/4/5 results and check if one of them is right. Because in your case you always have the same amount of labels=1 you could take your top 10 results and see how many percent of them are right and average this result over your batch size. I didn't find a possibility to include this algorithm as a keras metric. Instead, I wrote a callback, which calculates the metric on epoch end on my validation data set.
Also, if you predict the top n results on a test dataset, see how many times each class is predicted. The Counter Class is really convenient for this purpose.
Edit: If found a method to include class weights without splitting the output.
You need a numpy 2d array containing weights with shape [number classes to predict, 2 (background and signal)].
Such an array could be calculated with this function:
def calculating_class_weights(y_true):
from sklearn.utils.class_weight import compute_class_weight
number_dim = np.shape(y_true)[1]
weights = np.empty([number_dim, 2])
for i in range(number_dim):
weights[i] = compute_class_weight('balanced', [0.,1.], y_true[:, i])
return weights
The solution is now to build your own binary crossentropy loss function in which you multiply your weights yourself:
def get_weighted_loss(weights):
def weighted_loss(y_true, y_pred):
return K.mean((weights[:,0]**(1-y_true))*(weights[:,1]**(y_true))*K.binary_crossentropy(y_true, y_pred), axis=-1)
return weighted_loss
weights[:,0] is an array with all the background weights and weights[:,1] contains all the signal weights.
All that is left is to include this loss into the compile function:
model.compile(optimizer=Adam(), loss=get_weighted_loss(class_weights))

Udacity Deep Learning, Assignment 3, Part 3: Tensorflow dropout function

I am now on assignment 3 of the Udacity Deep Learning class. I have most of it completed and it's working but I noticed that problem 3, which is about using 'dropout' with tensorflow, seems to degrade my performance rather than improve it.
So I think I'm doing something wrong. I'll put my full code here. If someone can explain to me how to properly use dropout, I'd appreciate it. (Or confirm I'm using it correctly and it's just not helping in this case.). It drops accuracy from over 94% (without dropout) down to 91.5%. If you aren't using L2 regularization, the degradation is even larger.
def create_nn(dataset, weights_hidden, biases_hidden, weights_out, biases_out):
# Original layer
logits = tf.add(tf.matmul(tf_train_dataset, weights_hidden), biases_hidden)
# Drop Out layer 1
logits = tf.nn.dropout(logits, 0.5)
# Hidden Relu layer
logits = tf.nn.relu(logits)
# Drop Out layer 2
logits = tf.nn.dropout(logits, 0.5)
# Output: Connect hidden layer to a node for each class
logits = tf.add(tf.matmul(logits, weights_out), biases_out)
return logits
# Create model
batch_size = 128
hidden_layer_size = 1024
beta = 1e-3
graph = tf.Graph()
with graph.as_default():
# Input data. For the training data, we use a placeholder that will be fed
# at run time with a training minibatch.
tf_train_dataset = tf.placeholder(tf.float32,
shape=(batch_size, image_size * image_size))
tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
tf_valid_dataset = tf.constant(valid_dataset)
tf_test_dataset = tf.constant(test_dataset)
# Variables.
weights_hidden = tf.Variable(
#tf.truncated_normal([image_size * image_size, num_labels]))
tf.truncated_normal([image_size * image_size, hidden_layer_size]))
#biases = tf.Variable(tf.zeros([num_labels]))
biases_hidden = tf.Variable(tf.zeros([hidden_layer_size]))
weights_out = tf.Variable(tf.truncated_normal([hidden_layer_size, num_labels]))
biases_out = tf.Variable(tf.zeros([num_labels]))
# Training computation.
#logits = tf.matmul(tf_train_dataset, weights_out) + biases_out
logits = create_nn(tf_train_dataset, weights_hidden, biases_hidden, weights_out, biases_out)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
loss += beta * (tf.nn.l2_loss(weights_hidden) + tf.nn.l2_loss(weights_out))
# Optimizer.
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
# Predictions for the training, validation, and test data.
train_prediction = tf.nn.softmax(logits)
#valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights_out) + biases_out)
#test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights_out) + biases_out)
valid_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_valid_dataset, weights_hidden) + biases_hidden), weights_out) + biases_out)
test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.matmul(tf_test_dataset, weights_hidden) + biases_hidden), weights_out) + biases_out)
num_steps = 10000
with tf.Session(graph=graph) as session:
tf.global_variables_initializer().run()
print("Initialized")
for step in range(num_steps):
# Pick an offset within the training data, which has been randomized.
# Note: we could use better randomization across epochs.
offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
#offset = (step * batch_size) % (3*128 - batch_size)
#print(offset)
# Generate a minibatch.
batch_data = train_dataset[offset:(offset + batch_size), :]
batch_labels = train_labels[offset:(offset + batch_size), :]
# Prepare a dictionary telling the session where to feed the minibatch.
# The key of the dictionary is the placeholder node of the graph to be fed,
# and the value is the numpy array to feed to it.
feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
_, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
if (step % 500 == 0):
print("Minibatch loss at step %d: %f" % (step, l))
print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
You would need to turn off dropout during inference. It may not be obvious at first, but the fact that dropout is hardcoded in the NN architecture means it will affect the test data during inference. You can avoid this by creating a placeholder keep_prob, rather than providing the value 0.5 directly. For example:
keep_prob = tf.placeholder(tf.float32)
logits = tf.nn.dropout(logits, keep_prob)
To turn on dropout during training, set the keep_prob value to 0.5:
feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, keep_prob: 0.5}
During inference/evaluation, you should be able to do something like this to set keep_prob to 1.0 in eval:
accuracy.eval(feed_dict={x: test_prediction, y_: test_labels, keep_prob: 1.0}
EDIT:
Since the issue does not seem to be that dropout is used at inference, the next culprit would be that the dropout is too high for this network size. You can potentially try decreasing the dropout to 20% (i.e. keep_prob=0.8), or increasing the size of the network to give the model an opportunity to learn the representations.
I actually gave it a try with your code, and I'm getting around ~93.5% with 20% dropout with this network size. I have added some additional resources below, including the original Dropout paper to help clarify the intuition behind it, and expands on more tips when using dropout such as increasing the learning rate.
References:
Deep MNIST for Experts: has an example on the above (dropout on/off) using MNIST
Dropout Regularization in Deep Learning Models With Keras
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
2 things I think can cause the problem.
First of all I would not recommend using dropout in first layer (that too 50%, use lower, in range 10-25% if you have to)) as when you use such a high dropout even higher level features are not learnt and propagated to deeper layers. Also try a range of dropouts from 10% to 50% and see how accuracy changes. There is no way to know beforehand what value will work
Secondly, you do not usually use dropout at inference. To fix that pass in keep_prob parameter of dropout as a placeholder and set it to 1 when inferencing.
Also, if the accuracy values you state are training accuracy then there may not even be much of a problem in first place as dropout will usually decrease training accuracy by small amounts as you are not overfitting, its the test/validation accuracy that needs to be closely monitored

Tensorflow- How to display accuracy rate for a linear regression model

I have a linear regression model that seems to work. I first load the data into X and the target column into Y, after that I implement the following...
X_train, X_test, Y_train, Y_test = train_test_split(
X_data,
Y_data,
test_size=0.2
)
rng = np.random
n_rows = X_train.shape[0]
X = tf.placeholder("float")
Y = tf.placeholder("float")
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
pred = tf.add(tf.multiply(X, W), b)
cost = tf.reduce_sum(tf.pow(pred-Y, 2)/(2*n_rows))
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
sess.run([init, init_local])
for epoch in range(FLAGS.training_epochs):
avg_cost = 0
for (x, y) in zip(X_train, Y_train):
sess.run(optimizer, feed_dict={X:x, Y:y})
# display logs per epoch step
if (epoch + 1) % FLAGS.display_step == 0:
c = sess.run(
cost,
feed_dict={X:X_train, Y:Y_train}
)
print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))
print("Optimization Finished!")
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
I cannot figure out how to print out the model's accuracy. For example, in sklearn, it is simple, if you have a model you just print model.score(X_test, Y_test). But I do not know how to do this in tensorflow or if it is even possible.
I think I'd be able to calculate the Mean Squared Error. Does this help in any way?
EDIT
I tried implementing tf.metrics.accuracy as suggested in the comments but I'm having an issue implementing it. The documentation says it takes 2 arguments, labels and predictions, so I tried the following...
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
But this gives me an error...
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value accuracy/count
[[Node: accuracy/count/read = IdentityT=DT_FLOAT, _class=["loc:#accuracy/count"], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
How exactly does one implement this?
Turns out, since this is a multi-class Linear Regression problem, and not a classification problem, that tf.metrics.accuracy is not the right approach.
Instead of displaying the accuracy of my model in terms of percentage, I instead focused on reducing the Mean Square Error (MSE) instead.
From looking at other examples, tf.metrics.accuracy is never used for Linear Regression, and only classification. Normally tf.metric.mean_squared_error is the right approach.
I implemented two ways of calculating the total MSE of my predictions to my testing data...
pred = tf.add(tf.matmul(X, W), b)
...
...
Y_pred = sess.run(pred, feed_dict={X:X_test})
mse = tf.reduce_mean(tf.square(Y_pred - Y_test))
OR
mse = tf.metrics.mean_squared_error(labels=Y_test, predictions=Y_pred)
They both do the same but obviously the second approach is more concise.
There's a good explanation of how to measure the accuracy of a Linear Regression model here.
I didn't think this was clear at all from the Tensorflow documentation, but you have to declare the accuracy operation, and then initialize all global and local variables, before you run the accuracy calculation:
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
# ...
init_global = tf.global_variables_initializer
init_local = tf.local_variables_initializer
sess.run([init_global, init_local])
# ...
# run accuracy calculation
I read something on Stack Overflow about the accuracy calculation using local variables, which is why the local variable initializer is necessary.
After reading the complete code you posted, I noticed a couple other things:
In your calculation of pred, you use
pred = tf.add(tf.multiply(X, W), b). tf.multiply performs element-wise multiplication, and will not give you the fully connected layers you need for a neural network (which I am assuming is what you are ultimately working toward, since you're using TensorFlow). To implement fully connected layers, where each layer i (including input and output layers) has ni nodes, you need separate weight and bias matrices for each pair of successive layers. The dimensions of the i-th weight matrix (the weights between the i-th layer and the i+1-th layer) should be (ni, ni + 1), and the i-th bias matrix should have dimensions (ni + 1, 1). Then, going back to the multiplication operation - replace tf.multiply with tf.matmul, and you're good to go. I assume that what you have is probably fine for a single-class linear regression problem, but this is definitely the way you want to go if you plan to solve a multiclass regression problem or implement a deeper network.
Your weight and bias tensors have a shape of (1, 1). You give the variables the initial value of np.random.randn(), which according to the documentation, generates a single floating point number when no arguments are given. The dimensions of your weight and bias tensors need to be supplied as arguments to np.random.randn(). Better yet, you can actually initialize these to random values in Tensorflow: W = tf.Variable(tf.random_normal([dim0, dim1], seed = seed) (I always initialize random variables with a seed value for reproducibility)
Just a note in case you don't know this already, but non-linear activation functions are required for neural networks to be effective. If all your activations are linear, then no matter how many layers you have, it will reduce to a simple linear regression in the end. Many people use relu activation for hidden layers. For the output layer, use softmax activation for multiclass classification problems where the output classes are exclusive (i.e., where only one class can be correct for any given input), and sigmoid activation for multiclass classification problems where the output classes are not exlclusive.

Resources