TensorFlow: Neural Network accuracy always 100% on train and test sets - python-3.x

I created a TensorFlow neural network that has 2 hidden layers with 10 units each, using ReLU activations and Xavier initialization for the weights. The output layer has 1 unit with a sigmoid activation that outputs a binary classification (0 or 1), i.e. whether it believes a passenger on the Titanic survived, based on the input features.
(The only code omitted is the load_data function which populates the variables X_train, Y_train, X_test, Y_test used later in the program)
Parameters
# Hyperparams
learning_rate = 0.001
lay_dims = [10,10, 1]
# Other params
m = X_train.shape[1]
n_x = X_train.shape[0]
n_y = Y_train.shape[0]
Inputs
X = tf.placeholder(tf.float32, shape=[X_train.shape[0], None], name="X")
norm = tf.nn.l2_normalize(X, 0) # normalize inputs
Y = tf.placeholder(tf.float32, shape=[Y_train.shape[0], None], name="Y")
Initialize Weights & Biases
W1 = tf.get_variable("W1", [lay_dims[0],n_x], initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.get_variable("b1", [lay_dims[0],1], initializer=tf.zeros_initializer())
W2 = tf.get_variable("W2", [lay_dims[1],lay_dims[0]], initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.get_variable("b2", [lay_dims[1],1], initializer=tf.zeros_initializer())
W3 = tf.get_variable("W3", [lay_dims[2],lay_dims[1]], initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.get_variable("b3", [lay_dims[2],1], initializer=tf.zeros_initializer())
Forward Prop
Z1 = tf.add(tf.matmul(W1,X), b1)
A1 = tf.nn.relu(Z1)
Z2 = tf.add(tf.matmul(W2,A1), b2)
A2 = tf.nn.relu(Z2)
Y_hat = tf.add(tf.matmul(W3,A2), b3)
BackProp
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=tf.transpose(Y_hat), labels=tf.transpose(Y)))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Session
# Initialize
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # Initialize
    sess.run(init)

    # Normalize inputs
    sess.run(norm, feed_dict={X: X_train, Y: Y_train})

    # Forward/backprop and update weights
    for i in range(10000):
        c, _ = sess.run([cost, optimizer], feed_dict={X: X_train, Y: Y_train})
        if i % 100 == 0:
            print(c)

    correct_prediction = tf.equal(tf.argmax(Y_hat), tf.argmax(Y))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Training Set:", sess.run(accuracy, feed_dict={X: X_train, Y: Y_train}))
    print("Testing Set:", sess.run(accuracy, feed_dict={X: X_test, Y: Y_test}))
After running 10,000 epochs of training, the cost goes down each iteration, which suggests the learning_rate is okay and the cost function is behaving normally. However, after training, all of my Y_hat values (predictions on the training set) are 1 (predicting that the passenger survived). So the model just outputs y = 1 for every training example.
Also, when I run tf.argmax on Y_hat, the result is a matrix of all 0s. The same thing happens when tf.argmax is applied to Y (the ground-truth labels), which is odd because Y consists of the correct labels for the training examples.
Any help is greatly appreciated. Thanks.

I assume your Y_hat is a (1, m) matrix, where m is the number of training examples. In that case tf.argmax(Y_hat) will give all 0s. According to the TensorFlow documentation, argmax
Returns the index with the largest value across axes of a tensor.
If you do not pass in an axis, it defaults to 0. Because axis 0 has only one entry here, the returned index is always 0.
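For a single sigmoid output unit like this, a minimal sketch of an accuracy computation that avoids argmax entirely (thresholding the sigmoid probability at 0.5, reusing the Y_hat logits and Y labels from the question) would be:
predictions = tf.cast(tf.greater(tf.sigmoid(Y_hat), 0.5), tf.float32)  # 1 if P(survived) > 0.5 else 0
correct_prediction = tf.equal(predictions, Y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Training Set:", sess.run(accuracy, feed_dict={X: X_train, Y: Y_train}))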

I know I am late, but I would also point out that since your label matrix is of shape (n, 1), i.e. there is only one class to predict, cross entropy doesn't make sense here. In such cases you should use something different for calculating the cost (maybe a mean squared error or something similar).
I had a similar problem recently while working on my college project and found a workaround: I turned the binary output into two classes, such as present and absent, so that "present" becomes [1, 0]. I know this is not the best way to do it, but it can be helpful when you need something working instantly.
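For what it's worth, a hypothetical sketch of that two-class workaround (the labels below are stand-ins, not taken from the question) could look like:
import numpy as np

y_binary = np.array([1, 0, 1, 1, 0])   # 0/1 labels, stand-in data
y_onehot = np.eye(2)[y_binary]         # 0 -> [1, 0] ("absent"), 1 -> [0, 1] ("present")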

Related

TF | How to predict from CNN after training is done

I'm trying to work with the framework provided in Stanford's cs231n course, given the code below.
I can see the accuracy getting better and the net is trained; however, after the training process and checking the results on the validation set, how would I go about feeding one image into the model and seeing its prediction?
I have searched around and couldn't find a built-in predict function in TensorFlow like the one in Keras.
Initializing the net and its parameters
# clear old variables
tf.reset_default_graph()

# setup input (e.g. the data that changes every batch)
# the first dim is None, and gets set automatically based on the batch size fed in
X = tf.placeholder(tf.float32, [None, 30, 30, 1])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

def simple_model(X, y):
    # define our weights (e.g. init_two_layer_convnet)
    # setup variables
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 1, 32])  # 7x7 filters with input depth 1; 32 filters
    bconv1 = tf.get_variable("bconv1", shape=[32])
    W1 = tf.get_variable("W1", shape=[4608, 360])  # 4608 = 12x12x32, the output of a 7x7/stride-2 VALID conv on a 30x30 image, flattened
    b1 = tf.get_variable("b1", shape=[360])
    # define our graph (e.g. two_layer_convnet)
    a1 = tf.nn.conv2d(X, Wconv1, strides=[1, 2, 2, 1], padding='VALID') + bconv1
    h1 = tf.nn.relu(a1)
    h1_flat = tf.reshape(h1, [-1, 4608])
    y_out = tf.matmul(h1_flat, W1) + b1
    return y_out

y_out = simple_model(X, y)

# define our loss
total_loss = tf.losses.hinge_loss(tf.one_hot(y, 360), logits=y_out)
mean_loss = tf.reduce_mean(total_loss)

# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4)  # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)
A function that evaluates the model, for either training or validation, and plots the results:
def run_model(session, predict, loss_val, Xd, yd,
              epochs=1, batch_size=64, print_every=100,
              training=None, plot_losses=False):
    # Have tensorflow compute accuracy
    correct_prediction = tf.equal(tf.argmax(predict, 1), y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # shuffle indicies
    train_indicies = np.arange(Xd.shape[0])
    np.random.shuffle(train_indicies)

    training_now = training is not None

    # setting up variables we want to compute and optimize
    # if we have a training function, add that to things we compute
    variables = [mean_loss, correct_prediction, accuracy]
    if training_now:
        variables[-1] = training

    # counter
    iter_cnt = 0
    for e in range(epochs):
        # keep track of losses and accuracy
        correct = 0
        losses = []
        # make sure we iterate over the dataset once
        for i in range(int(math.ceil(Xd.shape[0]/batch_size))):
            # generate indicies for the batch
            start_idx = (i*batch_size) % Xd.shape[0]
            idx = train_indicies[start_idx:start_idx+batch_size]

            # create a feed dictionary for this batch
            feed_dict = {X: Xd[idx, :],
                         y: yd[idx],
                         is_training: training_now}
            # get batch size
            actual_batch_size = yd[idx].shape[0]

            # have tensorflow compute loss and correct predictions
            # and (if given) perform a training step
            loss, corr, _ = session.run(variables, feed_dict=feed_dict)

            # aggregate performance stats
            losses.append(loss*actual_batch_size)
            correct += np.sum(corr)

            # print every now and then
            if training_now and (iter_cnt % print_every) == 0:
                print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"
                      .format(iter_cnt, loss, np.sum(corr)/actual_batch_size))
            iter_cnt += 1
        total_correct = correct/Xd.shape[0]
        total_loss = np.sum(losses)/Xd.shape[0]
        print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"
              .format(total_loss, total_correct, e+1))
        if plot_losses:
            plt.plot(losses)
            plt.grid(True)
            plt.title('Epoch {} Loss'.format(e+1))
            plt.xlabel('minibatch number')
            plt.ylabel('minibatch loss')
            plt.show()
    return total_loss, total_correct
The function calls that train the model:
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print('Training')
    run_model(sess, y_out, mean_loss, x_train, y_train, 1, 64, 100, train_step, True)
    print('Validation')
    run_model(sess, y_out, mean_loss, x_val, y_val, 1, 64)
You do not need to go far: simply feed your new (test) feature matrix X_test into the network and perform a forward pass; the output layer is the prediction. So the code is something like this:
session.run(y_out, feed_dict={X: X_test})
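If you want a class index rather than raw logits, a small follow-up along these lines works (X_new here is a hypothetical batch of new images shaped [batch, 30, 30, 1] to match the placeholder):
import numpy as np

logits = session.run(y_out, feed_dict={X: X_new})   # raw scores, shape [batch, 360]
predicted_classes = np.argmax(logits, axis=1)       # highest-scoring class per image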

TensorFlow Trained Model Predicts Always Zero

I have a simple TensorFlow model, and its accuracy is 1. But when I try to predict some new inputs, it always returns zero (0).
import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()

# generate data
np.random.seed(10)
#inputs = np.random.uniform(low=1.2, high=1.5, size=[5000, 150]).astype('float32')
inputs = np.random.randint(low=50, high=500, size=[5000, 150])
label = np.random.uniform(low=1.3, high=1.4, size=[5000, 1])
# reverse_label = 1 - label
reverse_label = np.random.uniform(
    low=1.3, high=1.4, size=[5000, 1])
reverse_label1 = np.random.randint(
    low=80, high=140, size=[5000, 1])
#labels = np.append(label, reverse_label, 1)
#labels = np.append(labels, reverse_label1, 1)
labels = reverse_label1

print(inputs)
print(labels)

# parameters
learn_rate = 0.001
epochs = 100
n_input = 150
n_hidden = 15
n_output = 1

# set weights/biases
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
b0 = tf.Variable(tf.truncated_normal([n_hidden], stddev=0.2, seed=0))
b1 = tf.Variable(tf.truncated_normal([n_output], stddev=0.2, seed=0))
w0 = tf.Variable(tf.truncated_normal([n_input, n_hidden], stddev=0.2, seed=0))
w1 = tf.Variable(tf.truncated_normal([n_hidden, n_output], stddev=0.2, seed=0))

# step function
def returnPred(x, w0, w1, b0, b1):
    z1 = tf.add(tf.matmul(x, w0), b0)
    a2 = tf.nn.relu(z1)
    z2 = tf.add(tf.matmul(a2, w1), b1)
    h = tf.nn.relu(z2)
    return h  # return the first response vector from the

y_ = returnPred(x, w0, w1, b0, b1)  # predict operation

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=y_, labels=y))  # calculate loss between prediction and actual

model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(
    loss)  # apply gradient descent based on loss

init = tf.global_variables_initializer()
tf.Session = sess
sess.run(init)  # initialize graph

for step in range(0, epochs):
    sess.run([model, loss], feed_dict={x: inputs, y: labels})  # train model

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels}))  # print accuracy

inp = np.random.randint(low=50, high=500, size=[5, 150])
print(sess.run(tf.argmax(y_, 1), feed_dict={x: inp}))  # predict some new inputs
All functions run without errors, and my problem is with the last line of code. I tried just "y_" instead of "tf.argmax(y_, 1)", but that did not work either.
How can I fix that?
Regards,
There are multiple mistakes in your code.
Starting with these lines of code:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: inputs, y: labels})) # print accuracy
You are performing a regression, but you are measuring accuracy as if it were a classification problem. If you want to see how your regression network is performing, print the loss and make sure it decreases after each epoch of training.
To see why that accuracy is always 1, run the following code:
print(y_.get_shape()) # Outputs (?, 1)
There is only one output column, so both tf.argmax(y, 1) and tf.argmax(y_, 1) will always return [0, 0, ...]. As a result, your accuracy will always be 1.0. Delete those three lines of code.
Next, to get the outputs, just run the following code:
print(sess.run(y_, feed_dict={x: inp}))
But since your data is random, don't expect a good set of outputs.
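As a hedged sketch of the change suggested above (reusing the question's y_, y, learn_rate, epochs, inputs, and labels), a regression-style loss that you can actually watch per epoch might look like this:
loss = tf.reduce_mean(tf.square(y_ - y))        # mean squared error between prediction and target
model = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(loss)

sess.run(tf.global_variables_initializer())
for step in range(epochs):
    _, c = sess.run([model, loss], feed_dict={x: inputs, y: labels})
    print("epoch", step, "loss", c)             # the loss should decrease as training progresses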

tf.metrics.accuracy not working as intended

I have a linear regression model that seems to be working fine, but I want to display the accuracy of the model.
First, I initialize the variables and placeholders...
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data,
    Y_data,
    test_size=0.2
)

n_rows = X_train.shape[0]

X = tf.placeholder(tf.float32, [None, 89])
Y = tf.placeholder(tf.float32, [None, 1])

W_shape = tf.TensorShape([89, 1])
b_shape = tf.TensorShape([1])

W = tf.Variable(tf.random_normal(W_shape))
b = tf.Variable(tf.random_normal(b_shape))

pred = tf.add(tf.matmul(X, W), b)
cost = tf.reduce_sum(tf.pow(pred-Y, 2)/(2*n_rows-1))
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
X_train has shape (6702, 89) and Y_train has shape (6702, 1). Next I run the session and I display the cost per epoch as well as the total MSE...
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(FLAGS.training_epochs):
        avg_cost = 0

        for (x, y) in zip(X_train, Y_train):
            x = np.reshape(x, (1, 89))
            y = np.reshape(y, (1, 1))
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # display logs per epoch step
        if (epoch + 1) % FLAGS.display_step == 0:
            c = sess.run(
                cost,
                feed_dict={X: X_train, Y: Y_train}
            )
            y_pred = sess.run(pred, feed_dict={X: X_test})
            test_error = r2_score(Y_test, y_pred)
            print(test_error)
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))

    print("Optimization Finished!")

    pred_y = sess.run(pred, feed_dict={X: X_test})
    mse = tf.reduce_mean(tf.square(pred_y - Y_test))
    print("MSE: %4f" % sess.run(mse))
This all seems to work correctly. However, now I want to see the accuracy of my model, so I want to implement tf.metrics.accuracy. The documentation says it has 2 arguments, labels and predictions. I added the following next...
accuracy, accuracy_op = tf.metrics.accuracy(labels=Y_test, predictions=pred)
init_local = tf.local_variables_initializer()
sess.run(init_local)
print(sess.run(accuracy))
Apparently I need to initialize local variables; however, I think I am doing something wrong, because the accuracy result that gets printed out is 0.0.
I searched everywhere for a working example but I cannot get it to work for my model, what is the proper way to implement it?
I think you are training a regression model, while tf.metrics.accuracy is meant for classification models.
When your model predicts 1.2 but your target value is 1.15, it does not make sense to use accuracy to measure whether this is a correct prediction. Accuracy is for classification problems (e.g., MNIST): when your model predicts a digit to be '9' and your target image is also '9', that is a correct prediction and you get full credit; when your model predicts '9' but your target image is '6', that is a wrong prediction and you get no credit.
For your regression problem, we measure the difference between prediction and target value either by absolute error, |target - prediction|, or by mean squared error, the one you used in your MSE calculation. So tf.metrics.mean_squared_error or tf.metrics.mean_absolute_error is what you should use to measure the prediction error for regression models.
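A minimal sketch of swapping the accuracy metric for a regression metric, reusing the question's X and Y placeholders and the pred tensor (run inside the same session), could be:
mse_metric, mse_update_op = tf.metrics.mean_squared_error(labels=Y, predictions=pred)

sess.run(tf.local_variables_initializer())                    # metric state lives in local variables
sess.run(mse_update_op, feed_dict={X: X_test, Y: Y_test})     # accumulate over the test set
print("Test MSE:", sess.run(mse_metric))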

TensorFlow variable configuration

I successfully implemented a feed-forward algorithm in TensorFlow that looked as follows...
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 digits recognition => 10 classes
# set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits) # Softmax
# minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=1))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
# initializing the variables
init = tf.global_variables_initializer()
...and the training cycle was as follows...
# launch the graph
with tf.Session() as sess:
    sess.run(init)

    # training cycle
    for epoch in range(FLAGS.training_epochs):
        avg_cost = 0
        total_batch = int(mnist.train.num_examples/FLAGS.batch_size)

        # loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(FLAGS.batch_size)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
...the rest of the code is not necessary. Up until this point the code works perfectly. It is important to note that my batch_size is 100. The problem is that I am using tf.placeholder for my values, but in fact I need to change them to use tf.get_variable. The first thing I did was change the following...
# tf Graph Input
x = tf.get_variable("input_image", shape=[100,784], dtype=tf.float32)
y = tf.placeholder(shape=[100,10], name='input_label', dtype=tf.float32) # 0-9 digits recognition => 10 classes
# set model weights
W = tf.get_variable("weights", shape=[784, 10], dtype=tf.float32, initializer=tf.random_normal_initializer())
b = tf.get_variable("biases", shape=[1, 10], dtype=tf.float32, initializer=tf.zeros_initializer())
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits) # Softmax
# minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=1))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
# initializing the variables
init = tf.global_variables_initializer()
...so far so good. But now I am trying to implement the training cycle, and this is where I run into issues. I run the exact same training cycle as above with batch_size = 100 and I get the following error...
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 0 of node GradientDescent/update_input_image/ApplyGradientDescent was passed float from _recv_input_image_0:0 incompatible with expected float_ref.
How can I fix this issue? The error is coming from the following line...
_, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
It's unclear to me why you needed to change x to a tf.Variable when you are continuing to feed a value for it. There are two workarounds (not counting the case where you could just revert x to being tf.placeholder() as in the working code):
The error is being raised because the optimizer is attempting to apply an SGD update to the value that you're feeding (which leads to a confusing runtime type error). You can prevent the optimizer from doing this by passing trainable=False when constructing x:
x = tf.get_variable("input_image", shape=[100, 784], dtype=tf.float32,
trainable=False)
Since x is a variable, you could assign the image to the variable in a separate step before running the optimizer.
x = tf.get_variable("input_image", shape=[100, 784], dtype=tf.float32)
x_placeholder = tf.placeholder(tf.float32, shape=[100, 784])
assign_x_op = x.assign(x_placeholder).op
# ...
for i in range(total_batch):
batch_xs, batch_ys = mnist.train.next_batch(FLAGS.batch_size)
# Assign the contents of `batch_xs` to variable `x`.
sess.run(assign_x_op, feed_dict={x_placeholder: batch_xs})
# N.B. Now you do not need to feed `x`.
_, c = sess.run([optimizer, cost], feed_dict={y: batch_ys})
This latter version would allow you to perform gradient descent on the contents of the image (which might be why you'd want to store it in a variable in the first place).
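As an aside, a hedged sketch of what that could look like once x is a variable: ask the optimizer to update only the image, leaving the model weights untouched.
# Hypothetical: minimize the cost with respect to the image variable only.
image_opt = tf.train.GradientDescentOptimizer(0.1).minimize(cost, var_list=[x])
sess.run(image_opt, feed_dict={y: batch_ys})   # each run nudges the stored image toward lower cost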

tensorflow stop optimizing when activation step is in a function

I'm trying to reproduce the result of https://www.tensorflow.org/tutorials/mnist/beginners/
So I designed several functions to take care of the training step such as these two:
def layer_computation(previous_layer_output, weights, bias, activation):
    return activation(tf.add(tf.matmul(previous_layer_output, weights), bias))

def multilayer_perceptron_forward(x, weights, biaises, activations):
    return reduce(lambda output_layer, args: layer_computation(output_layer, *args),
                  zip(weights, biaises, activations), x)
These two functions are then used for the training step:
def training(session,
             features, labels,
             mlp,
             # cost = (tf.reduce_mean, ),
             optimizer=tf.train.GradientDescentOptimizer,
             epochs=100, learning_rate=0.001, display=100):
    x = tf.placeholder("float")
    y = tf.placeholder("float")
    weights, biases, activations = mlp
    pred = multilayer_perceptron_forward(x, weights, biases, activations)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    opti = optimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    session.run(init)
    for i in range(1, epochs + 1):
        batch_size = 100
        avg_cost = 0
        number_of_bacth = int(features.shape[0]/batch_size)
        for j in range(number_of_bacth):
            my_x = features[j*100:(j+1)*100, :]
            my_y = labels[j*100:(j+1)*100, :]
            _, c = session.run([opti, cost], feed_dict={x: my_x,
                                                        y: my_y})
            avg_cost += c/number_of_bacth
        if i % display == 0:
            print("Epoch {i} cost = {cost}".format(i=i, cost=avg_cost))
The optimization stalls at a cost of about 2.3 and the overall accuracy is 10%, whereas in the example the cost gets close to zero and the accuracy is close to 96%. Does anyone have an explanation for this peculiar behavior?
PS: when I use layer_computation in the example's source code, I also get stuck at a cost of 2.3.
I caught the error: I was applying the activation step to the last layer as well before back-propagation, while tf.nn.softmax_cross_entropy_with_logits expects raw, unscaled logits. This question may have been better suited to Cross Validated.
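For reference, a hedged sketch of the fix (assuming the last entry of activations is the softmax that should be left to the loss function): keep the final layer linear and let tf.nn.softmax_cross_entropy_with_logits apply the softmax internally.
def multilayer_perceptron_forward(x, weights, biaises, activations):
    # apply every hidden layer with its activation...
    hidden = reduce(lambda output_layer, args: layer_computation(output_layer, *args),
                    zip(weights[:-1], biaises[:-1], activations[:-1]), x)
    # ...but keep the last layer linear: softmax_cross_entropy_with_logits
    # expects raw, unscaled logits and applies the softmax itself
    return tf.add(tf.matmul(hidden, weights[-1]), biaises[-1])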
