While trying to run my 3D Convolutional Neural Network, I am getting the following error. What could be the reason ?
ResourceExhaustedError (see above for traceback): OOM when allocating
tensor with shape[54080,1024] [[Node: Variable_10/Adam/Assign =
Assign[T=DT_FLOAT, _class=["loc:#Variable_10"], use_locking=true,
validate_shape=true,
_device="/job:localhost/replica:0/task:0/gpu:0"](Variable_10/Adam, zeros_4)]]
This is the code i have used:
import tensorflow as tf
import numpy as np
IMG_SIZE_PX = 50
SLICE_COUNT = 20
n_classes = 2
batch_size = 10
x = tf.placeholder('float')
y = tf.placeholder('float')
keep_rate = 0.8
def conv3d(x, W):
return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME')
def maxpool3d(x):
return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='SAME')
def convolutional_neural_network(x):
weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,3,1,32])),
'W_conv2':tf.Variable(tf.random_normal([3,3,3,32,64])),
'W_fc':tf.Variable(tf.random_normal([54080,1024])),
'out':tf.Variable(tf.random_normal([1024, n_classes]))}
biases = {'b_conv1':tf.Variable(tf.random_normal([32])),
'b_conv2':tf.Variable(tf.random_normal([64])),
'b_fc':tf.Variable(tf.random_normal([1024])),
'out':tf.Variable(tf.random_normal([n_classes]))}
x = tf.reshape(x, shape=[-1, IMG_SIZE_PX, IMG_SIZE_PX, SLICE_COUNT, 1])
conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1'])
conv1 = maxpool3d(conv1)
conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2'])
conv2 = maxpool3d(conv2)
fc = tf.reshape(conv2,[-1, 54080])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])
fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out'])+biases['out']
return output
much_data = np.load('muchdata-50-50-20.npy')
# If you are working with the basic sample data, use maybe 2 instead of 100 here... you don't have enough data to really do this
train_data = much_data[:-100]
validation_data = much_data[-100:]
def train_neural_network(x):
prediction = convolutional_neural_network(x)
cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y) )
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)
hm_epochs = 10
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
successful_runs = 0
total_runs = 0
for epoch in range(hm_epochs):
epoch_loss = 0
for data in train_data:
total_runs += 1
try:
X = data[0]
Y = data[1]
_, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
epoch_loss += c
successful_runs += 1
except Exception as e:
# I am passing for the sake of notebook space, but we are getting 1 shaping issue from one
# input tensor. Not sure why, will have to look into it. Guessing it's
# one of the depths that doesn't come to 20.
pass
#print(str(e))
print('Epoch', epoch+1, 'completed out of',hm_epochs,'loss:',epoch_loss)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
print('Done. Finishing accuracy:')
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
print('fitment percent:',successful_runs/total_runs)
train_neural_network(x)
I am running this using tensorflow-gpu version. I am using a GTX970M and have installed CUDA and imported the cudnn files properly. When running the last command i get the following error. Kindly help !
You have ran out of memory for some reasons.
It can be that you have some application using your GPU (another tensorflow session still active for example).Check if that's not the case. (you can use nvidia-smi to monitor that).
If that's not, it will mainly be because of the size of your model and the size of your GPU's memory. What you can do is try to launch it in CPU mode, list all your variables with tf.Variables, do the math of how much memory it represents, and see if it fit in your GPU.
Until you have done this, there not much more advice I can provide.
Related
In Python, Tensorflow: I trained and applied a Neural Network using Tensorflow, now I need to save its progress to train it further at a later point in time.
I went through a lot of configurations using
saver = tf.train.Saver()
saver.restore()
tf.train.import_meta_graph('model/model.ckpt.meta')
tf.train.export_meta_graph('model/model.ckpt.meta')
etc...
but it always produces an Error.
Here is my code. It is similar to the Mnist example codes, but uses custom generated input and has a single, continuous output neuron.
x = tf.placeholder('float', [None, 4000]) # 4000 is my structure, just an example
y = tf.placeholder('float') # And I need a single, continuous output
def train_neural_network(x):
testdata_images, testdata_labels = generate_training_data_batch(size = 10)
#Generates test data
data=[]
for i in range(how_many_batches):
data.append(generate_training_data_batch(size = 10))
#Generates training data
prediction = neural_network_model(x)
# neural_network_model() is defined as a 4000x15x15x10x1 neural network
cost = tf.reduce_mean( tf.square( tf.subtract(y, prediction) ) )
optimizer = tf.train.AdamOptimizer(0.01).minimize(cost)
hm_epochs = 10
saver = tf.train.Saver()
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(hm_epochs):
epoch_loss = 0
for i in range(how_many_batches):
epoch_x, epoch_y = data[i]
_, c = sess.run([optimizer, cost], feed_dict = {x: epoch_x, y: epoch_y})
epoch_loss += c
accuracy1 = tf.subtract(y, prediction)
result = sess.run(accuracy1, feed_dict={x: epoch_x, y: epoch_y})
print(result)
# This is just here so I can see what is going on
saver.save(sess, 'model/model.ckpt')
tf.train.export_meta_graph('model/model.ckpt.meta')
tf.reset_default_graph()
Later in the same file I want to use the saved neural network to make some predictions with it:
train_neural_network(x)
X, Y = generate_training_data_batch(size = 1)
prediction = neural_network_model(x)
with tf.Session() as sess:
tf.train.import_meta_graph('model/model.ckpt.meta')
sess.run(tf.global_variables_initializer())
thought = sess.run(prediction, feed_dict={x: X})
print(Y, thought)
With this version i get the Error message
ValueError: Tensor("Variable:0", shape=(4000, 15), dtype=float32_ref) must be from the same graph as Tensor("Placeholder_35:0", shape=(?, 4000), dtype=float32).
I also got Error Messages like
ValueError: At least two variables have the same name: Variable/Adam
I am looking for a solution of this for a few weeks now, so I would be very relieved to finally get this sorted out.
I use the convolutional autoencoder neural network method to train my model and then save it, but when I restore my model to reconstruct the image which is similar to the training image, the reconstruction result is very bad and the loss is large. I am not sure if I am wrong with saving and reading files.
Training model and save it!
#--------------------------------------------------------------------------
x = tf.placeholder(tf.float32, [None, dim], name = "X")
y = tf.placeholder(tf.float32, [None, dim], name = "Y")
keepprob = tf.placeholder(tf.float32, name = "K")
pred = cae(x, weights, biases, keepprob, imgsize)["out"]
cost = tf.reduce_sum(tf.square(cae(x, weights, biases, keepprob,imgsize)["out"] - tf.reshape(y, shape=[-1, imgsize, imgsize, 1])))
learning_rate = 0.01
optm = tf.train.AdamOptimizer(learning_rate).minimize(cost)
#--------------------------------------------------------------------------
sess = tf.Session()
save_model = os.path.join(PATH,'temp_saved_model')
saver = tf.train.Saver()
tf.add_to_collection("COST", cost)
tf.add_to_collection("PRED", pred)
sess.run(tf.global_variables_initializer())
mean_img = np.zeros((dim))
batch_size = 100
n_epochs = 1000
for epoch_i in range(n_epochs):
for batch_i in range(ntrain // batch_size):
trainbatch = np.array(train)
trainbatch = np.array([img - mean_img for img in trainbatch])
sess.run(optm, feed_dict={x: trainbatch, y: trainbatch, keepprob: 1.})
save_path = saver.save(sess, save_model)
print('Model saved in file: %s' %save_path)
sess.close()
Restoring the model and try to reconstruct the image.
tf.reset_default_graph()
save_model = os.path.join(PATH + 'SaveModel/','temp_saved_model.meta')
imgsize = 64
dim = imgsize * imgsize
mean_img = np.zeros((dim))
with tf.Session() as sess:
saver = tf.train.import_meta_graph(save_model)
saver.restore(sess, tf.train.latest_checkpoint(PATH + 'SaveModel/'))
cost = tf.get_collection("COST")[0]
pred = tf.get_collection("PRED")[0]
graph = tf.get_default_graph()
x = graph.get_tensor_by_name("X:0")
y = graph.get_tensor_by_name("Y:0")
k = graph.get_tensor_by_name("K:0")
for i in range(10):
test_xs = np.array(data)
test = load_image(test_xs, imgsize)
test = np.array([img - mean_img for img in test])
print ("[%02d/%02d] cost: %.4f" % (i, 10, sess.run(cost, feed_dict={x: test, y: test, K: 1.})))
The loss value in the training process is 1.321..., but the reconstruction loss is 16545.10441... Is there something wrong in my code?
First make sure that Your Restore and Save functions are in Different files.
There are a few problems that I have debugged so far,
keepprob changes from 'K' to 'k' while building graph after restore.
You are facing same Images as Logits and labels (Doesn't make sense until you are trying to learn an Identity function)
You are calculating training cost before saving the model and validation/test cost after restoring the model.
Your code in saver
recon = sess.run(pred, feed_dict={x: testbatch, keepprob: 1.})
fig, axs = plt.subplots(2, n_examples, figsize=(15, 4))
for example_i in range(5):
axs[0][example_i].matshow(np.reshape(testbatch[example_i, :], (imgsize, imgsize)), cmap=plt.get_cmap('gray'))
axs[1][example_i].matshow(np.reshape(np.reshape(recon[example_i, ...], (dim,)) + mean_img, (imgsize, imgsize)), cmap=plt.get_cmap('gray'))
plt.show()
Your code in restore function
recon = sess.run(pred, feed_dict={x: test, k: 1.})
cost = sess.run(cost, feed_dict={x: test, y: test, k: 1.})
if (i % 2) == 0:
fig, axs = plt.subplots(2, n_examples, figsize=(15, 4))
for example_i in range(n_examples):
axs[0][example_i].matshow(np.reshape(test[example_i, :], (imgsize, imgsize)), cmap=plt.get_cmap('gray'))
axs[1][example_i].matshow(np.reshape(np.reshape(recon[example_i, ...], (dim,)) + mean_img, (imgsize, imgsize)), cmap=plt.get_cmap('gray'))
plt.show()
Also nowhere in your code are you printing/plotting cost even in your recover module you are plotting recon variable
If you are trying to test autoencoder-decoder pair, to generate the original Image, your model is a bit too small(Shallow), If that makes sense, try implementing it, if you are confused, check out this link. https://pgaleone.eu/neural-networks/deep-learning/2016/12/13/convolutional-autoencoders-in-tensorflow/
And in any case, feel free to add comments for further clarification.
I have made a convolutional neural network with tensorflow, i've trained it and tested it (about 98% accuracy)... I saved the model with
saver = tf.train.Saver()
saver.save(sess, 'model.ckpt')
Then i restored with the saver, but i always get an accuracy lower than 50%... why ?
Here's the code:
import tensorflow as tf
import matplotlib.pyplot as plt
import pickle
import numpy as np
with open('X_train.pickle', 'rb') as y:
u = pickle._Unpickler(y)
u.encoding = 'latin1'
X_train = u.load()
with open('X_test.pickle', 'rb') as y:
u = pickle._Unpickler(y)
u.encoding = 'latin1'
X_test = u.load()
X_test = np.array(X_test).reshape(-1, 2500)
with open('y_train.pickle', 'rb') as y:
u = pickle._Unpickler(y)
u.encoding = 'latin1'
y_train = u.load()
with open('y_test.pickle', 'rb') as y:
u = pickle._Unpickler(y)
u.encoding = 'latin1'
y_test = u.load()
n_classes = 3
batch_size = 100
x = tf.placeholder('float', [None, 2500])
y = tf.placeholder('float')
keep_rate = 0.8
keep_prob = tf.placeholder(tf.float32)
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')
def maxpool2d(x):
# size of window movement of window
return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
def convolutional_neural_network(x):
weights = {'W_conv1':tf.Variable(tf.random_normal([5,5,1,32])),
'W_conv2':tf.Variable(tf.random_normal([5,5,32,64])),
'W_fc':tf.Variable(tf.random_normal([13*13*64,1024])),
'out':tf.Variable(tf.random_normal([1024, n_classes]))}
biases = {'b_conv1':tf.Variable(tf.random_normal([32])),
'b_conv2':tf.Variable(tf.random_normal([64])),
'b_fc':tf.Variable(tf.random_normal([1024])),
'out':tf.Variable(tf.random_normal([n_classes]))}
x = tf.reshape(x, shape=[-1, 50, 50, 1])
conv1 = tf.nn.relu(conv2d(x, weights['W_conv1']) + biases['b_conv1'])
conv1 = maxpool2d(conv1)
conv2 = tf.nn.relu(conv2d(conv1, weights['W_conv2']) + biases['b_conv2'])
conv2 = maxpool2d(conv2)
fc = tf.reshape(conv2,[-1, 13*13*64])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])
#fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out'])+biases['out']
return output
def use_neural_network(input_data):
prediction = convolutional_neural_network(x)
sess.run(tf.global_variables_initializer())
result = (sess.run(tf.argmax(prediction.eval(feed_dict={x:[input_data]}),1)))
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:',accuracy.eval({x:X_test, y:y_test}))
return result
with tf.Session() as sess:
c = convolutional_neural_network(x)
saver = tf.train.Saver()
saver.restore(sess, "model.ckpt")
sample = X_train[432].reshape(2500)
res = use_neural_network(sample)
if res == [0]: print('Go straight')
elif res == [1]: print('Turn right')
else: print('Turn left')
img = sample.reshape(50,50)
plt.imshow(img)
plt.show()
sample = X_train[1222].reshape(2500)
res = use_neural_network(sample)
if res == [0]: print('Go straight')
elif res == [1]: print('Turn right')
else: print('Turn left')
img = sample.reshape(50,50)
plt.imshow(img)
plt.show()
sample = X_train[2986].reshape(2500)
res = use_neural_network(sample)
if res == [0]: print('Go straight')
elif res == [1]: print('Turn right')
else: print('Turn left')
img = sample.reshape(50,50)
plt.imshow(img)
plt.show()
The problem can't be overfitting, since i'm testing it with elements of the training dataset ...
I'm quite sure that the problem is the saver, but i can't figure out how to solve it ...
When you train a model using tensorlfow , make sure that your using the tensorflow version 1.0 and above. once you trained model using latest version 3 file will be created named as follows :
modelname.data
It is TensorBundle collection, save the values of all variables.
modelname.index
.index stores the list of variable names and shapes saved.
modelname.meta
this file describes the saved graph structure, includes GraphDef, SaverDef, and so on.
To reload/restore your model use model.load(modelname) it not only loads your model but also the accuracy won't be fluctuated.
Note : Please use TFLearn , TFLearn introduces a High-Level API that makes neural network building and training fast and easy.For more detail visit http://tflearn.org/getting_started/
The simple and Generalized way of building and using CNN using tensorflow is as follows:
Construct Network :
Here your will create n convolution , max-poll layer and fully connected layer then apply whatever activation function you want and return your model object
Train model :
fit your training data into your model using model.fit(X,Y)
Save Model :
Save your model using model.save(modelName)
Reload Model :
Reload your model using model.load(modelName)
This is the generic and simplified way to build and use CNN.
Hope it may help you :)
I'm trying to reproduce the result of https://www.tensorflow.org/tutorials/mnist/beginners/
So I designed several functions to take care of the training step such as these two:
def layer_computation(previous_layer_output, weights, bias, activation):
return activation(tf.add(tf.matmul(previous_layer_output, weights), bias))
def multilayer_perceptron_forward(x, weights, biaises, activations):
return reduce(lambda output_layer, args: layer_computation(output_layer, *args),
zip(weights, biaises, activations), x)
By using these two functions for the training
def training(session,
features, labels,
mlp,
# cost = (tf.reduce_mean, ),
optimizer=tf.train.GradientDescentOptimizer,
epochs=100, learning_rate=0.001, display=100):
x = tf.placeholder("float")
y = tf.placeholder("float")
weights, biases, activations = mlp
pred = multilayer_perceptron_forward(x, weights, biases, activations)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
opti = optimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()
session.run(init)
for i in range(1, epochs + 1):
batch_size = 100
avg_cost = 0
number_of_bacth = int(features.shape[0]/batch_size)
for j in range(number_of_bacth):
my_x = features[j*100:(j+1)*100, :]
my_y = labels[j*100:(j+1)*100, :]
_, c = session.run([opti, cost], feed_dict={x: my_x,
y: my_y})
avg_cost += c/number_of_bacth
if i % display == 0:
print("Epoch {i} cost = {cost}".format(i=i, cost=avg_cost))
The optimization stops at a cost of 2.3... and the overall accuracy is of 10% whereas in the example it get closer to zero and the accuracy is close to 96%. Does anyone have an explanation for this peculiar behavior?
PS when I'm using layer_computation in the source code in the example I also get stuck at a cost of 2.3.
I caught the error, I was trying to perform back-propagation on the last layer. This question may have been better on cross-validated.
I am trying to use a Tensorflow DNN for a Kaggle Competion. The data is about 100 columns of categorical data, 29 columns of numerical data, and 1 column for the output. What I did was I split it into training and testing with X and y using Scikit's train test split function, where X is a list of each rows without the "id" or the value that needs to be predicted, and y is the value that is needed to be predicted. I then built the model, shown below:
import tensorflow as tf
import numpy as np
import time
import pickle
with open('pickle.pickle', 'rb') as f:
trainX, trainy, testX, testy = pickle.load(f)
trainX = np.array(trainX)
trainy = np.array(trainy)
trainy = trainy.reshape(trainy.shape[0], 1)
testX = np.array(testX)
testy = np.array(testy)
print (trainX.shape)
print (trainy.shape)
testX = testX.reshape(testX.shape[0], 130)
testy = testy.reshape(testy.shape[0], 1)
print (testX.shape)
print (testy.shape)
n_nodes_hl1 = 256
n_nodes_hl2 = 256
n_nodes_hl3 = 256
n_classes = 1
batch_size = 100
# Matrix = h X w
X = tf.placeholder('float', [None, len(trainX[0])])
y = tf.placeholder('float')
def model(data):
hidden_1_layer = {'weights':tf.Variable(tf.random_normal([trainX.shape[1], n_nodes_hl1])),
'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}
hidden_2_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}
hidden_3_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}
output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
'biases':tf.Variable(tf.random_normal([n_classes]))}
# (input_data * weights) + biases
l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
l1 = tf.nn.sigmoid(l1)
l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
l2 = tf.nn.sigmoid(l2)
l3 = tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases'])
l3 = tf.nn.sigmoid(l3)
output = tf.matmul(l3, output_layer['weights']) + output_layer['biases']
return output
def train(x):
pred = model(x)
#loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
loss = tf.reduce_mean(tf.square(pred - y))
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)
epochs = 1
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
print ('Beginning Training \n')
for e in range(epochs):
timeS = time.time()
epoch_loss = 0
i = 0
while i < len(trainX):
start = i
end = i + batch_size
batch_x = np.array(trainX[start:end])
batch_y = np.array(trainy[start:end])
_, c = sess.run([optimizer, loss], feed_dict = {x: batch_x, y: batch_y})
epoch_loss += c
i += batch_size
done = time.time() - timeS
print ('Epoch', e + 1, 'completed out of', epochs, 'loss:', epoch_loss, "\nTime:", done, 'seconds\n')
correct = tf.equal(tf.arg_max(pred, 1), tf.arg_max(y, 1))
acc = tf.reduce_mean(tf.cast(correct, 'float'))
print("Accuracy:", acc.eval({x:testX, y:testy}))
train(X)
Output for 1 epoch:
Epoch 1 completed out of 1 loss: 1498498282.5
Time: 1.3765859603881836 seconds
Accuracy: 1.0
I do realize that the loss is very high, and I am using 1 epoch just for testing purposes, and yes, I know my code is quite messy. But all I want to do is print out a prediction. How would I do that? I know that I need to feed a list of features for X, but I just don't understand how to do it. I also don't quite understand why my accuracy is at 1.0, so if you have any suggestions for that, or any ways to change my code, I would be more that happy to listen to any ideas.
Thanks in advance
To get a prediction you just have to evaluate pred, which is the operation that defines the output of the model.
How to do it? With pred.eval(). But you need an input to evalaute its prediction, so you have to provide a feed_dict dictionary to eval() with the sample (or samples) you want to process.
The resulting code looks like:
predictions = pred.eval(feed_dict = {x:testX})
Notice how this is very similar to acc.eval({x:testX, y:testy}), because the idea is the same. You have an operation (acc in this case) which needs some input to be evaluated, and you can evaluate it either by calling acc.eval() or sess.run(acc) with the corresponding feed_dict with the necessary inputs.
The simplest way would be to use the existing session while training (between iterations):
print (sess.run(model, {x:X_example}))
where X_example is some numpy example tensor.
The below line will give you probability scores for every class for example is you 3 classes then the below line will give you a array of shape of 1x3
Considering you want prediction of a single data point X_test you can do the following:
output = sess.run(pred, {x:X_test})
the maximum number in the above variable output will be you prediction so for that we will modify the above statement :
output = sess.run(tf.argmax(pred, 1), {x:X_test})
print("your prediction for X_test is :", output[0])
Other thing you can do is :
output = sess.run(pred, {x:X_test})
output = np.argmax(output)
print("your prediction for X_test is :", output)