.grad returns None in PyTorch

I am trying to write a simple script for parameter estimation (the parameters here play the role of weights). I am running into a problem where .grad returns None. I have gone through this and this link and understood the concept both theoretically and practically. To me the following script should work, but unfortunately it does not.
My first attempt: the following script is my first attempt.
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        x_dt = gamma*input[0][0] + \
               alpha_xy*input[0][0]*input[0][2] + \
               alpha1*input[0][0]
        y0_dt = beta_y*input[0][0] + \
                alpha2*input[0][1]
        y_dt = alpha0*input[0][1] + \
               alpha_y*input[0][2] + \
               alpha3*input[0][0]*input[0][2]
        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]], device=device)
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
The above code throws the error message
element 0 of tensors does not require grad and does not have a grad_fn
at the line loss.backward().
My second attempt, an improvement on the first, is as follows:
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        x_dt = gamma*input[0][0] + \
               alpha_xy*input[0][0]*input[0][2] + \
               alpha1*input[0][0]
        y0_dt = beta_y*input[0][0] + \
                alpha2*input[0][1]
        y_dt = alpha0*input[0][1] + \
               alpha_y*input[0][2] + \
               alpha3*input[0][0]*input[0][2]
        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]], device=device,
                            dtype=torch.float,
                            requires_grad=True)
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
        # with torch.no_grad():
        #     gamma -= learning_rate * gamma.grad
Now the script runs, but apart from pred.grad the other two gradients are None.
I want to update all the parameters after computing loss.backward(), but the update is not possible because their gradients are None. Can anyone suggest how to improve this script? Thanks.

You're breaking the computation graph by declaring a new tensor for pred. Instead you can use torch.stack. Also, x_dt and pred are non-leaf tensors so the gradients aren't retained by default. You can override this behavior by using .retain_grad().
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []
for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        x_dt = gamma*input[0][0] + \
               alpha_xy*input[0][0]*input[0][2] + \
               alpha1*input[0][0]
        # retain the gradient for non-leaf tensors
        x_dt.retain_grad()
        y0_dt = beta_y*input[0][0] + \
                alpha2*input[0][1]
        y_dt = alpha0*input[0][1] + \
               alpha_y*input[0][2] + \
               alpha3*input[0][0]*input[0][2]
        # use stack instead of declaring a new tensor
        pred = torch.stack([x_dt, y0_dt, y_dt], dim=0).unsqueeze(1)
        # pred is also a non-leaf tensor so we need to tell pytorch to retain its grad
        pred.retain_grad()
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
        with torch.no_grad():
            gamma -= learning_rate * gamma.grad
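As a side note (not part of the original answer): once the graph is kept intact with torch.stack, the same per-sample update can be expressed with a built-in optimizer instead of the manual in-place step. A minimal sketch, assuming the parameter tensors, x_train, y_train and device defined above:
params = [gamma, alpha_xy, beta_y, alpha0, alpha_y, alpha1, alpha2, alpha3]
optimizer = torch.optim.SGD(params, lr=1e-4)
for j in range(x_train.size(0)):
    inp = x_train[j:j+1].to(device, non_blocking=True)
    target = y_train[j:j+1].to(device, non_blocking=True)
    x_dt = gamma*inp[0][0] + alpha_xy*inp[0][0]*inp[0][2] + alpha1*inp[0][0]
    y0_dt = beta_y*inp[0][0] + alpha2*inp[0][1]
    y_dt = alpha0*inp[0][1] + alpha_y*inp[0][2] + alpha3*inp[0][0]*inp[0][2]
    pred = torch.stack([x_dt, y0_dt, y_dt], dim=0).unsqueeze(1)
    loss = (pred - target).pow(2).sum()
    optimizer.zero_grad()   # clear gradients accumulated in the previous step
    loss.backward()         # populates .grad on all eight leaf parameters
    optimizer.step()        # SGD update: p -= lr * p.grad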
Closed form solution
Assuming you want to optimize for the parameters defined at the top of the function gamma, alpha_xy, beta_y, etc... Then what you have here is an example of ordinary least squares. See least squares for a slightly friendlier introduction to the topic. Take a look at the components of pred and you'll notice that x_dt, y0_dt, and y_dt are actually independent of each other with respect to the parameters (in this case it's obvious because they each use totally different parameters). This makes the problem much easier because it means we can actually optimize the terms (x_dt - target[0])**2, (y0_dt - target[1])**2 and (y_dt - target[2])**2 separately!
Without getting into the details the solution (without back-propagation or gradient descent) ends up being
# supposing x_train is [N,3] and y_train is [N,3]
x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
y1 = y_train[:, 0].unsqueeze(1)
# avoid inverses using solve to get p1 = inv(x1 . x1^T) . x1 . y1
p1, _ = torch.solve(x1 @ y1, x1 @ x1.transpose(1, 0))
# gamma and alpha1 are redundant. As long as gamma + alpha1 = p1[0] we get the same optimal value for loss
gamma = p1[0] / 2
alpha_xy = p1[1]
alpha1 = p1[0] / 2
x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
y2 = y_train[:, 1].unsqueeze(1)
p2, _ = torch.solve(x2 @ y2, x2 @ x2.transpose(1, 0))
beta_y = p2[0]
alpha2 = p2[1]
x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
y3 = y_train[:, 2].unsqueeze(1)
p3, _ = torch.solve(x3 @ y3, x3 @ x3.transpose(1, 0))
alpha0 = p3[0]
alpha_y = p3[1]
alpha3 = p3[2]
loss_1 = torch.sum((x1.transpose(1, 0) @ p1 - y1)**2 + (x2.transpose(1, 0) @ p2 - y2)**2 + (x3.transpose(1, 0) @ p3 - y3)**2)
mse = loss_1 / x_train.size(0)
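For reference, each torch.solve call above is just solving the ordinary least squares normal equations. With $X_k$ the stacked feature matrix of shape [features, N] and $y_k$ the corresponding targets, the optimal coefficient vector $p_k$ satisfies
\min_{p_k} \lVert X_k^{\top} p_k - y_k \rVert^2 \quad\Longrightarrow\quad (X_k X_k^{\top})\, p_k = X_k\, y_k
which is exactly the system passed to torch.solve (right-hand side first).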
To test that this code works, I generated some fake data for which I knew the underlying model coefficients (some noise is added, so the final result won't exactly match the expected values).
def gen_fake_data(samples=50000):
    x_train = torch.randn(samples, 3)
    # define fake data with known minimal solutions
    x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
    x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
    x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
    y1 = x1.transpose(1, 0) @ torch.tensor([[1.0], [2.0]])  # gamma + alpha1 = 1.0
    y2 = x2.transpose(1, 0) @ torch.tensor([[3.0], [4.0]])
    y3 = x3.transpose(1, 0) @ torch.tensor([[5.0], [6.0], [7.0]])
    y_train = torch.cat((y1, y2, y3), dim=1) + 0.1 * torch.randn(samples, 3)
    return x_train, y_train
x_train, y_train = gen_fake_data()
# optimization code from above
...
print('loss_1:', loss_1.item())
print('MSE:', mse.item())
print('Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0')
print('Actual', gamma.item(), alpha_xy.item(), alpha1.item(), beta_y.item(), alpha2.item(), alpha0.item(), alpha_y.item(), alpha3.item())
which results in
loss_1: 1491.731201171875
MSE: 0.029834624379873276
Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0
Actual 0.50002 2.0011 0.50002 3.0009 3.9997 5.0000 6.0002 6.9994

Related

Which among best stochastic optimizers gives better visualization?

I am trying to make a visual comparison of predictions among the best neural network optimization algorithms [1] implemented from scratch.
The loss for SGD with momentum is: 0.2235
The loss for RMSprop is: 0.2075
The loss for Adam is: 0.6931
Are the results for Adam correct or not?
Here is what I have got as graphs:
Code for SGD with momentum:
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta = 0.05 # learning rate
alpha = 0.9 # momentum
nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    grad = compute_grad(X_expanded, y, w)
    nu = alpha * nu + eta * grad
    w = w - nu
visualize(X, y, w, loss)
plt.clf()
Code for RMSprop:
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.1 # learning rate
alpha = 0.9 # moving average of gradient norm squared
g2 = np.zeros_like(w)
eps = 1e-8
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    grad = compute_grad(X_expanded, y, w)
    grad2 = grad ** 2
    g2 = alpha * g2 + (1-alpha) * grad2
    w = w - eta * grad / np.sqrt(g2 + eps)
visualize(X, y, w, loss)
plt.clf()
Code for Adam:
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.01 # learning rate
beta1 = 0.9 # moving average of gradient norm
beta2 = 0.999 # moving average of gradient norm squared
m = np.zeros_like(w) # Initial 1st moment estimates
nu = np.zeros_like(w) # Initial 2nd moment estimates
eps = 1e-8 # A small constant for numerical stability
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12,5))
for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    grad = compute_grad(X_expanded, y, w)
    grad2 = grad ** 2
    m = ((beta1 * m) + ((1 - beta1) * grad)) / (1 - beta1)
    nu = ((beta2 * nu) + ((1 - beta2) * grad2)) / (1 - beta2)
    w = (w - eta * m) / (np.sqrt(nu) + eps)
visualize(X, y, w, loss)
plt.clf()
I was expecting a lower cost for Adam, i.e. less than the 0.2075 given by RMSprop.
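For comparison, here is a sketch of the textbook Adam step (Kingma & Ba), written against the same compute_grad helper and hyperparameters assumed from the snippets above. It differs from the code above in two places: the moments are bias-corrected with 1 - beta**t rather than a fixed 1 - beta, and only the step is scaled by the second moment, not w itself.
m = np.zeros_like(w)
nu = np.zeros_like(w)
for t in range(1, n_iter + 1):
    grad = compute_grad(X_expanded, y, w)
    m = beta1 * m + (1 - beta1) * grad
    nu = beta2 * nu + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)    # bias-corrected first moment
    nu_hat = nu / (1 - beta2 ** t)  # bias-corrected second moment
    w = w - eta * m_hat / (np.sqrt(nu_hat) + eps)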
[1] https://stackoverflow.com/a/37723962/10543310

RNN_LSTM_TENSORFLOW does not feed updated w, b in new epoch, although it does in the next batch in the same epoch

I have a problem, it may be obvious but I don't know how to fix it.
Although it seems that w and b are updated in each batch, when a new epoch begins w and b are not the last ones (the ones that come from the last batch). Thus, the NN does the same thing in each epoch without getting better.
Here is the code, if you see something please tell me!
if __name__ == '__main__':
    tf.reset_default_graph()  # sos, perfect
    # Training Parameters
    lr = 0.0001
    epochs = 10
    batch_size = 100
    total_series_length = 10000
    training_steps = int(total_series_length / batch_size)  # how many times w, b will change
    display_step = int(training_steps / 4)
    # Network Parameters
    timesteps = 1
    look_back_window = 80  # num of inputs
    num_hidden = 200  # num of features/nodes at hidden layer
    num_output = 1
    # inputs
    a, b, steps = (1, 10*np.pi, total_series_length - 1)
    step = (b - a)/steps
    x = np.array([a + i*step for i in range(steps + 1)], dtype=np.float32)
    sequence = np.sin(x)
    traindata = sequence[:len(sequence)]  # choose what fraction to use for training
    print('traindata.shape = {}'.format(traindata.shape))
    trainX, trainY = create_dataset(traindata, look_back_window)
    print('trainX.shape = {}, trainY.shape = {}'.format(trainX.shape, trainY.shape))
    # Graph input
    X = tf.placeholder(tf.float32, [None, timesteps, look_back_window])
    Y = tf.placeholder(tf.float32, [None])  # , num_output])
    # Define weights
    w = {'out': tf.Variable(tf.random_normal([num_hidden, num_output]), dtype=tf.float32)}
    b = {'out': tf.Variable(tf.random_normal([num_output]), dtype=tf.float32)}
    last_output = RNN(X, w, b, look_back_window, num_hidden)
    prediction_operation = tf.nn.tanh(last_output)  # sigmoid maybe better, check
    # Define loss and optimizer
    loss_operation = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=last_output, labels=Y))
    optimizer = tf.train.AdamOptimizer(lr)
    train_operation = optimizer.minimize(loss_operation)
    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    # Start training
    with tf.Session() as sess:
        # Run the initializers
        sess.run(init)
        for epoch in range(epochs):
            print('\n epoch = {}'.format(epoch))
            for step in range(0, training_steps):  # goes from 0 to 99 = 100 steps
                batch_x, batch_y = trainX[(step * batch_size):(batch_size + step * batch_size), :], trainY[(step * batch_size):(batch_size + step * batch_size)]
                batch_x = batch_x.reshape(batch_x.shape[0], timesteps, batch_x.shape[1])
                sess.run(train_operation, feed_dict={X: batch_x, Y: batch_y})
                loss = sess.run(loss_operation, feed_dict={X: batch_x, Y: batch_y})
                pred_batch = sess.run(prediction_operation, feed_dict={X: batch_x})
                if (step % display_step == 0 or step == training_steps - 1):
                    # Calculate batch loss
                    print('{} training step'.format(step))
                    # print('batch_x.shape = {}, batch_y.shape = {}'.format(batch_x.shape, batch_y.shape))
                    print('loss = {}'.format(loss))
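As an aside, and not a confirmed fix for this question: since the target here is a real-valued sine sample and num_output is 1, softmax_cross_entropy_with_logits over a single logit is constant (softmax of a one-element vector is always 1), so that loss cannot decrease regardless of the weights. A squared-error loss is the usual choice for this kind of regression; a sketch, assuming last_output has shape [None, 1] and Y has shape [None]:
loss_operation = tf.reduce_mean(tf.square(tf.reshape(last_output, [-1]) - Y))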

tensorflow cost function is a tensor, not a scalar - why? Optimization fails

The following code runs, but it does not work: the variable cost is always a tensor full of 1.0, but why? I expected a scalar, because a 1x5 matrix multiplied by a 5x1 matrix is a scalar. The biases and weights also do not change during optimization. What am I doing wrong?
#KI-Model
x = tf.placeholder(tf.float32, [None, 5], name='input') #x_1-x_5
#Init the graph
W = tf.Variable(tf.zeros([5,1]))
b = tf.Variable(tf.zeros([1]))
#activation with sigmoid
y = tf.nn.sigmoid(tf.matmul(x, W) + b)  # computed value for y
#Training
y_tensor = tf.placeholder(tf.float32, [None, 1], name='output')
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_tensor * tf.log(y), reduction_indices=[1]))  # cross-entropy here instead of the least-squares method
loss = y-y_tensor
cost = tf.square(loss)
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
#Start
session = tf.Session()  # Google -> what is this?
init = tf.global_variables_initializer()
session.run(init)
#init first 1000 training_batches
for i in range(1000):
    batch_xs.append([dataA[i], dataB[i], dataC[i], dataD[i],
                     dataE[i]])
    batch_ys.append(dataG[i])
for i in range(10000):
    session.run(optimizer, feed_dict={x: batch_xs, y_tensor: batch_ys})
    print(session.run(cost, feed_dict={x: batch_xs, y_tensor: batch_ys}))
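One common pattern, sketched here under the assumption that the rest of the graph stays as above, is to reduce the per-example squared error to a single scalar before handing it to the optimizer:
# reduce the [None, 1] squared-error tensor to one scalar value
cost = tf.reduce_mean(tf.square(y - y_tensor))
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(cost)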

How to predict new data using a trained simple feed forward neural network in tensorflow

Forgive me if this sounds like a dumb question. Assuming that I have a neural network that is trained with data of shape [m, n], how do I test the trained network with data of shape [1, 3]?
here is the code that I currently have:
n_hidden_1 = 1024
n_hidden_2 = 1024
n = len(test_data[0]) - 1
m = len(test_data)
alpha = 0.005
training_epoch = 1000
display_epoch = 100
train_X = np.array([i[:-1:] for i in test_data]).astype('float32')
train_X = normalize_data(train_X)
train_Y = np.array([i[-1::] for i in test_data]).astype('float32')
train_Y = normalize_data(train_Y)
X = tf.placeholder(dtype=np.float32, shape=[m, n])
Y = tf.placeholder(dtype=np.float32, shape=[m, 1])
weights = {
    'h1': tf.Variable(tf.random_normal([n, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([1])),
}
layer_1 = tf.add(tf.matmul(X, weights['h1']), biases['b1'])
layer_1 = tf.nn.sigmoid(layer_1)
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.sigmoid(layer_2)
activation = tf.matmul(layer_2, weights['out']) + biases['out']
cost = tf.reduce_sum(tf.square(activation - Y)) / (2 * m)
optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(cost)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epoch):
        sess.run([optimizer, cost], feed_dict={X: train_X, Y: train_Y})
        cost_ = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
        if epoch % display_epoch == 0:
            print('Epoch:', epoch, 'Cost:', cost_)
How do I test new data? For linear regression I know I can use something like this for the data [0.4, 0.5, 0.1]:
predict_x = np.array([0.4, 0.5, 0.1], dtype=np.float32).reshape([1, 3])
predict_x = (predict_x - mean) / std
predict_y = tf.add(tf.matmul(predict_x, W), b)
result = sess.run(predict_y).flatten()[0]
How do I do the same with a neural network?
If you use
X = tf.placeholder(dtype=np.float32, shape=[None, n])
Y = tf.placeholder(dtype=np.float32, shape=[None, 1])
the first dimension of those two placeholders will have variable size, i.e. at training time it can be different (e.g. 720) than at test time (e.g. 1). This is often referred to as having "variable batch sizes" as it is quite common to have different batch sizes during training and testing.
On this line:
cost = tf.reduce_sum(tf.square(activation - Y)) / (2 * m)
you are making use of m, which is now variable. To make this line work with variable batch sizes (m is unknown before the graph is executed), you should do something like:
m = tf.shape(X)[0]
# cast the dynamic batch size to float so it can divide the float32 loss
cost = tf.reduce_sum(tf.square(activation - Y)) / tf.cast(2 * m, tf.float32)
tf.shape evaluates the dynamic shape of X, i.e. the shape it has at runtime.
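With the None-shaped placeholders in place, a single example can then be fed straight through the existing graph. A minimal sketch, run inside the same session used for training and reusing the activation node and the question's mean/std normalization (both assumptions on my part):
predict_x = np.array([[0.4, 0.5, 0.1]], dtype=np.float32)  # shape [1, n] with n = 3
predict_x = (predict_x - mean) / std
predict_y = sess.run(activation, feed_dict={X: predict_x})
print(predict_y.flatten()[0])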

Tensor flow, making predictions using a trained network

So I am training a network to classify images in tensor flow. After I trained the network I began work on trying to use it to classify other images. The goal is to import an image, feed it to the classifier and have it print the result. I am having some trouble getting that part off the ground though. Here is what I have so far. I found that having tf.argmax(y,1) gave an error. I found that changing it to 0 fixed that error. However I am not convinced that it is actually working. I tossed 2 images through the classifier and they both got the same class even though they are vastly different. Just need some perspective here. Is this valid? Or is there something wrong here that will always feed me the same class (in this case I got class 0 for both of the images I tried).
Is this even the right way to approach making predictions in tensor flow? This is just the culmination of my debugging, not sure if it is what should be done or not.
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
X_train,X_validation,y_train,y_validation=train_test_split(X_train,y_train, test_size=20,random_state=0)
X_train, y_train = shuffle(X_train, y_train)
def LeNet(x):
    # Arguments used for tf.truncated_normal, randomly defines variables
    # for the weights and biases for each layer
    mu = 0
    sigma = 0.1
    # SOLUTION: Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x6.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 3, 6), mean = mu, stddev = sigma))
    conv1_b = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b
    # SOLUTION: Activation.
    conv1 = tf.nn.relu(conv1)
    # SOLUTION: Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # SOLUTION: Layer 2: Convolutional. Output = 10x10x16.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean = mu, stddev = sigma))
    conv2_b = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    # SOLUTION: Activation.
    conv2 = tf.nn.relu(conv2)
    # SOLUTION: Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # SOLUTION: Flatten. Input = 5x5x16. Output = 400.
    fc0 = flatten(conv2)
    # SOLUTION: Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1 = tf.matmul(fc0, fc1_W) + fc1_b
    # SOLUTION: Activation.
    fc1 = tf.nn.relu(fc1)
    # SOLUTION: Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(120, 84), mean = mu, stddev = sigma))
    fc2_b = tf.Variable(tf.zeros(84))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_b
    # SOLUTION: Activation.
    fc2 = tf.nn.relu(fc2)
    # SOLUTION: Layer 5: Fully Connected. Input = 84. Output = 43.
    fc3_W = tf.Variable(tf.truncated_normal(shape=(84, 43), mean = mu, stddev = sigma))
    fc3_b = tf.Variable(tf.zeros(43))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    return logits
import tensorflow as tf
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, 43)
EPOCHS=10
BATCH_SIZE=128
rate = 0.001
logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, one_hot_y)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()
def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train)
    print("Training...")
    print()
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
        validation_accuracy = evaluate(X_validation, y_validation)
        print("EPOCH {} ...".format(i+1))
        print("Validation Accuracy = {:.3f}".format(validation_accuracy))
        print()
    saver.save(sess, './lenet')
    print("Model saved")
import cv2
image=cv2.imread('File path')
image=cv2.resize(image,(32,32)) #classifier takes 32X32 images
image=np.array(image)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver3 = tf.train.import_meta_graph('./lenet.meta')
    saver3.restore(sess, "./lenet")
    pred = tf.nn.softmax(logits)
    predictions = sess.run(tf.argmax(y, 0), feed_dict={x: image})
    print(predictions)
So what had to happen here was first clear the kernel and outputs. Somewhere along the way my placeholders got muddled up and clearing the kernel fixed that right up. Then I had to realize what really had to get done here: I had to call up the softmax function on my new data.
Like this:
pred = tf.nn.softmax(logits)
classification = sess.run(pred, feed_dict={x: image_array})
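One detail worth checking (an assumption on my part, not from the original answer): the placeholder x expects shape [None, 32, 32, 3], so the single image needs a batch axis, and the class index comes from an argmax over pred rather than over the label placeholder y:
image_batch = image.reshape(1, 32, 32, 3).astype(np.float32)  # add a batch dimension
probs = sess.run(pred, feed_dict={x: image_batch})
predicted_class = np.argmax(probs, axis=1)[0]
print(predicted_class)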
