Which of the best stochastic optimizers gives a better visualization? - python-3.x

I am trying to make a visual comparison of predictions among the best neural network optimization algorithms [1] implemented from scratch.
The loss for SGD with momentum is: 0.2235
The loss for RMSprop is: 0.2075
The loss for Adam is: 0.6931
Are the results for Adam correct or not?
Here are the graphs I got:
Code for SGD with momentum:
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta = 0.05    # learning rate
alpha = 0.9   # momentum
nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    grad = compute_grad(X_expanded, y, w)
    nu = alpha * nu + eta * grad
    w = w - nu

visualize(X, y, w, loss)
plt.clf()
Code for RMSprop:
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.1    # learning rate
alpha = 0.9  # moving average of gradient norm squared
g2 = np.zeros_like(w)
eps = 1e-8
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    grad = compute_grad(X_expanded, y, w)
    grad2 = grad ** 2
    g2 = alpha * g2 + (1 - alpha) * grad2
    w = w - eta * grad / np.sqrt(g2 + eps)

visualize(X, y, w, loss)
plt.clf()
Code for Adam:
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.01     # learning rate
beta1 = 0.9    # decay rate for the moving average of the gradient (1st moment)
beta2 = 0.999  # decay rate for the moving average of the squared gradient (2nd moment)
m = np.zeros_like(w)   # initial 1st moment estimates
nu = np.zeros_like(w)  # initial 2nd moment estimates
eps = 1e-8             # a small constant for numerical stability
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)
    grad = compute_grad(X_expanded, y, w)
    grad2 = grad ** 2
    m = ((beta1 * m) + ((1 - beta1) * grad)) / (1 - beta1)
    nu = ((beta2 * nu) + ((1 - beta2) * grad2)) / (1 - beta2)
    w = (w - eta * m) / (np.sqrt(nu) + eps)

visualize(X, y, w, loss)
plt.clf()
I was expecting Adam to reach a lower cost than RMSprop (0.2075).
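For reference, here is a minimal sketch of the textbook Adam update with bias correction (following the algorithm cited in [1]); note that it differs from the loop body above, which divides the moment estimates by (1 - beta1) and (1 - beta2) instead of (1 - beta1**t) and (1 - beta2**t), and which divides w itself by the denominator rather than only the step:
    # inside the same training loop as above, with t = i + 1 as the 1-based step count
    m = beta1 * m + (1 - beta1) * grad
    nu = beta2 * nu + (1 - beta2) * grad2
    m_hat = m / (1 - beta1 ** (i + 1))    # bias-corrected 1st moment
    nu_hat = nu / (1 - beta2 ** (i + 1))  # bias-corrected 2nd moment
    w = w - eta * m_hat / (np.sqrt(nu_hat) + eps)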
[1] https://stackoverflow.com/a/37723962/10543310

Related

.grad() returns None in pytorch

I am trying to write a simple script for parameter estimation (where the parameters are weights here). I am facing a problem where .grad returns None. I have gone through this and this link as well and understood the concept both theoretically and practically. For me the following script should work, but unfortunately it does not.
My 1st attempt: the following script is my first attempt.
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []

for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        x_dt = gamma * input[0][0] + \
               alpha_xy * input[0][0] * input[0][2] + \
               alpha1 * input[0][0]
        y0_dt = beta_y * input[0][0] + \
                alpha2 * input[0][1]
        y_dt = alpha0 * input[0][1] + \
               alpha_y * input[0][2] + \
               alpha3 * input[0][0] * input[0][2]
        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]], device=device)
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
The above code throws the error message
element 0 of tensors does not require grad and does not have a grad_fn
at the line loss.backward().
My 2nd attempt: the improvement over the 1st attempt is as follows:
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []

for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        x_dt = gamma * input[0][0] + \
               alpha_xy * input[0][0] * input[0][2] + \
               alpha1 * input[0][0]
        y0_dt = beta_y * input[0][0] + \
                alpha2 * input[0][1]
        y_dt = alpha0 * input[0][1] + \
               alpha_y * input[0][2] + \
               alpha3 * input[0][0] * input[0][2]
        pred = torch.tensor([[x_dt],
                             [y0_dt],
                             [y_dt]], device=device,
                            dtype=torch.float,
                            requires_grad=True)
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
        # with torch.no_grad():
        #     gamma -= learning_rate * gamma.grad
Now the script runs, but apart from pred.grad the other two still return None.
I want to update all the parameters after computing loss.backward(), but the update is not happening because those gradients are None. Can anyone suggest how to improve this script? Thanks.
You're breaking the computation graph by declaring a new tensor for pred. Instead you can use torch.stack. Also, x_dt and pred are non-leaf tensors so the gradients aren't retained by default. You can override this behavior by using .retain_grad().
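A minimal standalone illustration of the difference (toy values of my own, not your data):
import torch

a = torch.tensor(2.0, requires_grad=True)
b = a * 3                    # non-leaf tensor, still part of the graph

wrapped = torch.tensor([b])  # copies the value only: no grad_fn, gradient cannot flow back to a
stacked = torch.stack([b])   # keeps the graph: gradient flows back to a

stacked.sum().backward()
print(a.grad)                # tensor(3.)
Applying the same idea to your training loop: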
gamma = torch.tensor(2.0, device=device, dtype=torch.float, requires_grad=True)
alpha_xy = torch.tensor(3.7, device=device, dtype=torch.float, requires_grad=True)
beta_y = torch.tensor(1.5, device=device, dtype=torch.float, requires_grad=True)
alpha0 = torch.tensor(1.1, device=device, dtype=torch.float, requires_grad=True)
alpha_y = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha1 = torch.tensor(0.1, device=device, dtype=torch.float, requires_grad=True)
alpha2 = torch.tensor(0.9, device=device, dtype=torch.float, requires_grad=True)
alpha3 = torch.tensor(0.001, device=device, dtype=torch.float, requires_grad=True)
learning_rate = 1e-4
total_loss = []

for epoch in tqdm(range(500)):
    loss_1 = 0
    for j in range(x_train.size(0)):
        input = x_train[j:j+1]
        target = y_train[j:j+1]
        input = input.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        x_dt = gamma * input[0][0] + \
               alpha_xy * input[0][0] * input[0][2] + \
               alpha1 * input[0][0]
        # retain the gradient for non-leaf tensors
        x_dt.retain_grad()
        y0_dt = beta_y * input[0][0] + \
                alpha2 * input[0][1]
        y_dt = alpha0 * input[0][1] + \
               alpha_y * input[0][2] + \
               alpha3 * input[0][0] * input[0][2]
        # use stack instead of declaring a new tensor
        pred = torch.stack([x_dt, y0_dt, y_dt], dim=0).unsqueeze(1)
        # pred is also a non-leaf tensor so we need to tell pytorch to retain its grad
        pred.retain_grad()
        loss = (pred - target).pow(2).sum()
        loss_1 += loss
        loss.backward()
        print(pred.grad, x_dt.grad, gamma.grad)
        with torch.no_grad():
            gamma -= learning_rate * gamma.grad
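One caveat worth adding here (my note, not part of the original answer): leaf gradients accumulate across backward() calls, so when you update parameters manually you will normally also want to zero the gradient after each step, for example:
with torch.no_grad():
    gamma -= learning_rate * gamma.grad
    gamma.grad.zero_()  # otherwise gradients accumulate across iterations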
Closed form solution
Assuming you want to optimize the parameters defined at the top (gamma, alpha_xy, beta_y, etc.), what you have here is an example of ordinary least squares. See least squares for a slightly friendlier introduction to the topic. Take a look at the components of pred and you'll notice that x_dt, y0_dt, and y_dt are actually independent of each other with respect to the parameters (in this case it's obvious because they each use totally different parameters). This makes the problem much easier, because it means we can optimize the terms (x_dt - target[0])**2, (y0_dt - target[1])**2 and (y_dt - target[2])**2 separately!
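Spelling that out (my own gloss, using the same convention as the code below, where each row of X is one feature built with torch.stack): each sub-problem minimizes
$$\lVert X^\top p - y \rVert^2,$$
whose minimizer satisfies the normal equations
$$(X X^\top)\, p = X y \quad\Longrightarrow\quad p = (X X^\top)^{-1} X y,$$
which is exactly what the torch.solve calls below compute, without forming the inverse explicitly.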
Without getting into the details the solution (without back-propagation or gradient descent) ends up being
# supposing x_train is [N, 3] and y_train is [N, 3]
x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
y1 = y_train[:, 0].unsqueeze(1)
# avoid inverses using solve to get p1 = inv(x1 . x1^T) . x1 . y1
p1, _ = torch.solve(x1 @ y1, x1 @ x1.transpose(1, 0))
# gamma and alpha1 are redundant. As long as gamma + alpha1 = p1[0] we get the same optimal value for loss
gamma = p1[0] / 2
alpha_xy = p1[1]
alpha1 = p1[0] / 2

x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
y2 = y_train[:, 1].unsqueeze(1)
p2, _ = torch.solve(x2 @ y2, x2 @ x2.transpose(1, 0))
beta_y = p2[0]
alpha2 = p2[1]

x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
y3 = y_train[:, 2].unsqueeze(1)
p3, _ = torch.solve(x3 @ y3, x3 @ x3.transpose(1, 0))
alpha0 = p3[0]
alpha_y = p3[1]
alpha3 = p3[2]

loss_1 = torch.sum((x1.transpose(1, 0) @ p1 - y1)**2 + (x2.transpose(1, 0) @ p2 - y2)**2 + (x3.transpose(1, 0) @ p3 - y3)**2)
mse = loss_1 / x_train.size(0)
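A small portability note (my addition): torch.solve has since been deprecated in favor of torch.linalg.solve, which takes its arguments in the opposite order and returns the solution directly, so the same step can be written as
# equivalent to p1, _ = torch.solve(x1 @ y1, x1 @ x1.transpose(1, 0)) on newer PyTorch
p1 = torch.linalg.solve(x1 @ x1.transpose(1, 0), x1 @ y1)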
To test that this code works, I generated some fake data for which I knew the underlying model coefficients (there's some noise added, so the final result won't exactly match the expected values).
def gen_fake_data(samples=50000):
    x_train = torch.randn(samples, 3)
    # define fake data with known minimal solutions
    x1 = torch.stack((x_train[:, 0], x_train[:, 0] * x_train[:, 2]), dim=0)
    x2 = torch.stack((x_train[:, 0], x_train[:, 1]), dim=0)
    x3 = torch.stack((x_train[:, 1], x_train[:, 2], x_train[:, 0] * x_train[:, 2]), dim=0)
    y1 = x1.transpose(1, 0) @ torch.tensor([[1.0], [2.0]])  # gamma + alpha1 = 1.0
    y2 = x2.transpose(1, 0) @ torch.tensor([[3.0], [4.0]])
    y3 = x3.transpose(1, 0) @ torch.tensor([[5.0], [6.0], [7.0]])
    y_train = torch.cat((y1, y2, y3), dim=1) + 0.1 * torch.randn(samples, 3)
    return x_train, y_train
x_train, y_train = gen_fake_data()
# optimization code from above
...
print('loss_1:', loss_1.item())
print('MSE:', mse.item())
print('Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0')
print('Actual', gamma.item(), alpha_xy.item(), alpha1.item(), beta_y.item(), alpha2.item(), alpha0.item(), alpha_y.item(), alpha3.item())
which results in
loss_1: 1491.731201171875
MSE: 0.029834624379873276
Expected 0.5, 2.0, 0.5, 3.0, 4.0, 5.0, 6.0, 7.0
Actual 0.50002 2.0011 0.50002 3.0009 3.9997 5.0000 6.0002 6.9994

RNN_LSTM_TENSORFLOW does not feed updated w, b in new epoch, although it does in the next batch in the same epoch

I have a problem; it may be obvious, but I don't know how to fix it.
Although it seems that w and b are updated in each batch, when a new epoch begins w and b are not the last ones (the ones that come from the last batch of the previous epoch). Thus, the NN does the same thing in each epoch without getting better.
Here is the code; if you see something, please tell me!
if __name__ == '__main__':
    tf.reset_default_graph()

    # Training Parameters
    lr = 0.0001
    epochs = 10
    batch_size = 100
    total_series_length = 10000
    training_steps = int(total_series_length / batch_size)  # how many times w, b will change
    display_step = int(training_steps / 4)

    # Network Parameters
    timesteps = 1
    look_back_window = 80  # num of inputs
    num_hidden = 200       # num of features/nodes at hidden layer
    num_output = 1

    # inputs
    a, b, steps = (1, 10*np.pi, total_series_length - 1)
    step = (b - a)/steps
    x = np.array([a + i*step for i in range(steps + 1)], dtype=np.float32)
    sequence = np.sin(x)
    traindata = sequence[:len(sequence)]  # choose what fraction of the sequence to train on
    print('traindata.shape = {}'.format(traindata.shape))
    trainX, trainY = create_dataset(traindata, look_back_window)
    print('trainX.shape = {}, trainY.shape = {}'.format(trainX.shape, trainY.shape))

    # Graph input
    X = tf.placeholder(tf.float32, [None, timesteps, look_back_window])
    Y = tf.placeholder(tf.float32, [None])  # , num_output])

    # Define weights
    w = {'out': tf.Variable(tf.random_normal([num_hidden, num_output]), dtype=tf.float32)}
    b = {'out': tf.Variable(tf.random_normal([num_output]), dtype=tf.float32)}

    last_output = RNN(X, w, b, look_back_window, num_hidden)
    prediction_operation = tf.nn.tanh(last_output)  # sigmoid maybe better, check

    # Define loss and optimizer
    loss_operation = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=last_output, labels=Y))
    optimizer = tf.train.AdamOptimizer(lr)
    train_operation = optimizer.minimize(loss_operation)

    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()

    # Start training
    with tf.Session() as sess:
        # Run the initializers
        sess.run(init)
        for epoch in range(epochs):
            print('\n epoch = {}'.format(epoch))
            for step in range(0, training_steps):  # goes from step 0 to step 99 = 100 steps
                batch_x, batch_y = trainX[(step * batch_size):(batch_size + step * batch_size), :], trainY[(step * batch_size):(batch_size + step * batch_size)]
                batch_x = batch_x.reshape(batch_x.shape[0], timesteps, batch_x.shape[1])
                sess.run(train_operation, feed_dict={X: batch_x, Y: batch_y})
                loss = sess.run(loss_operation, feed_dict={X: batch_x, Y: batch_y})
                pred_batch = sess.run(prediction_operation, feed_dict={X: batch_x})
                if (step % display_step == 0 or step == training_steps - 1):
                    # Calculate batch loss
                    print('{} training step'.format(step))
                    # print('batch_x.shape = {}, batch_y.shape = {}'.format(batch_x.shape, batch_y.shape))
                    print('loss = {}'.format(loss))

Not found: Key Variable_<x> not found in checkpoint

I am trying to save a trained model and use it later in another instance (function). But somehow this throws me a "variable not found" error. After reading through SO and other forums, I understand the problem is the way I store it.
dictionary, reverse_dictionary = build_dataset(training_data)
vocab_size = len(dictionary)
n_input = 3
n_hidden = 512

# RNN output node weights and biases
weights = {'out': tf.Variable(tf.random_normal([n_hidden, vocab_size]))}
biases = {'out': tf.Variable(tf.random_normal([vocab_size]))}

# tf Graph input
x = tf.placeholder("float", [None, n_input, 1])
y = tf.placeholder("float", [None, vocab_size])

# RNN implementation in Tensorflow
def RNN(x, weights, biases):
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, n_input, 1)
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)

learning_rate = 0.001
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate).minimize(cost)

correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

training_iters = 1000
display_step = 500

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as session:
    session.run(init)
    step = 0
    offset = random.randint(0, n_input+1)
    end_offset = n_input + 1
    acc_total = 0
    loss_total = 0
    while step < training_iters:
        if offset > (len(training_data) - end_offset):
            offset = random.randint(0, n_input+1)
        symbols_in_keys = [[dictionary[str(training_data[i])]] for i in range(offset, offset+n_input)]
        symbols_in_keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, 1])
        symbols_out_onehot = np.zeros([vocab_size], dtype=float)
        symbols_out_onehot[dictionary[str(training_data[offset+n_input])]] = 1.0
        symbols_out_onehot = np.reshape(symbols_out_onehot, [1, -1])
        _, acc, loss, onehot_pred = session.run([optimizer, accuracy, cost, pred],
                                                feed_dict={x: symbols_in_keys, y: symbols_out_onehot})
        loss_total += loss
        acc_total += acc
        if (step+1) % display_step == 0:
            print("Iter= " + str(step+1) + ", Average Loss= " +
                  "{:.6f}".format(loss_total/display_step) + ", Average Accuracy= " +
                  "{:.2f}%".format(100*acc_total/display_step))
            acc_total = 0
            loss_total = 0
            symbols_in = [training_data[i] for i in range(offset, offset + n_input)]
            symbols_out = training_data[offset + n_input]
            symbols_out_pred = reverse_dictionary[int(tf.argmax(onehot_pred, 1).eval())]
            print("%s - [%s] vs [%s]" % (symbols_in, symbols_out, symbols_out_pred))
        step += 1
        offset += (n_input+1)
    saver.save(session, 'userLocation/Model')
The model files are generated, but when I try to restore the model using
saver = tf.train.Saver()
with tf.Session() as restored_session:
    saver.restore(restored_session, 'userLocation/Model')
Error
tensorflow.python.framework.errors_impl.NotFoundError: Key Variable_3 not found in checkpoint
[[Node: save_1/RestoreV2_7 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_7/tensor_names, save_1/RestoreV2_7/shape_and_slices)]]
Any pointers as to what I am missing while saving?
I will explain this in 2 parts:
When you save a model in tensorflow, it saves the graph in one file (usually with the extension .meta) and the variable tensors in other files (the index/data files).
Now, while importing you have to do the same 2-step process: a) import the graph first, b) then create a session and restore the variables.
Here is a sample code:
import tensorflow as tf
import numpy as np

tf.set_random_seed(10)

# define graph location in a variable
meta_file = 'userLocation/Model.meta'

# importing the graph
ns = tf.train.import_meta_graph(meta_file, clear_devices=True)

# create a session
with tf.Session().as_default() as sess:
    # import variables
    ns.restore(sess, meta_file[0:len(meta_file)-5])
    # for example, if you have an 'x' tensor in the graph
    x = tf.get_default_graph().get_tensor_by_name("x:0")
    # ...
    # further processing / prediction etc.
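As a side note (my own suggestion, not part of the original answer): the "Key Variable_3 not found in checkpoint" error typically means the graph built at restore time declares variables in a different number or order than the graph that was saved, so the auto-generated names (Variable, Variable_1, ...) no longer line up. Giving the variables explicit names when building the graph keeps the checkpoint keys stable; the names below are only an example:
# name the variables so the checkpoint keys do not depend on construction order
weights = {'out': tf.Variable(tf.random_normal([n_hidden, vocab_size]), name='weights_out')}
biases = {'out': tf.Variable(tf.random_normal([vocab_size]), name='biases_out')}
# ... build the rest of the graph as before, then
saver = tf.train.Saver()  # the checkpoint now stores 'weights_out', 'biases_out', ...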

Why doesn't a TensorFlow cubic model work when an equivalent quadratic model works?

In this sample code (much like the example code for a linear regression here), TensorFlow is supposed to find the a, b, c, and d values for given points making up a cubic. In this case it should be 0x^3 + 0x^2 + 1x + 0, but instead the coefficients get steadily larger and larger until they hit nan.
The strange thing is that the same code with a modification to the line:
model = a * x * x * x + b * x * x + c * x + d
to
model = a * x * x + b * x + c
will give correct output (for a quadratic instead of cubic, of course). What's the issue?
Code here:
import os
import tensorflow as tf
import numpy as np
# Don't remove this, I need it to mitigate tf build warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Model parameters
a = tf.Variable([1.], tf.float64)
b = tf.Variable([1.], tf.float64)
c = tf.Variable([1.], tf.float64)
d = tf.Variable([1.], tf.float64)
# Model input and output
x = tf.placeholder(tf.float32)
model = a * x * x * x + b * x * x + c * x + d
y = tf.placeholder(tf.float32)
# Loss
squared_deltas = tf.square(model-y)
loss = tf.reduce_sum(squared_deltas)
# Optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# Training Data
x_train = [-2, -1, 0, 1, 2]
y_train = [-2, -1, 0, 1, 2]
# Training Loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
curr_a, curr_b, curr_c, curr_d = sess.run([a, b, c, d], {x: x_train, y: y_train})
print("Formula: %s x^3 + %s x^2 + %s x + %s" % (curr_a, curr_b, curr_c, curr_d))
sess.run([train], {x: x_train, y: y_train})
# Evaluate Training Accuracy
curr_a, curr_b, curr_c, curr_d = sess.run([a, b, c, d], {x: x_train, y: y_train})
print("Formula: %s x^3 + %s x^2 + %s x + %s" % (np.round(curr_a), np.round(curr_b), np.round(curr_c), np.round(curr_d)))
All about the gradient
Now, with the larger possible loss, an update of 0.01 times the summed gradient is too large, and the corrections become unstable.
Also, to accommodate the smaller effective steps (after averaging the loss, see below) you'll need more iterations. Here is the working code.
Code
import os
import tensorflow as tf
import numpy as np

# Don't remove this, I need it to mitigate tf build warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Model parameters
a = tf.Variable([1.], tf.float64)
b = tf.Variable([1.], tf.float64)
c = tf.Variable([1.], tf.float64)
d = tf.Variable([1.], tf.float64)

# Model input and output
x = tf.placeholder(tf.float32)
model = a * x * x * x + b * x * x + c * x + d
y = tf.placeholder(tf.float32)

# Loss
squared_deltas = tf.square(model - y)
loss = tf.reduce_mean(squared_deltas)

# Optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# Training Data
x_train = [-2, -1, 0, 1, 2]
y_train = [-2, -1, 0, 1, 2]

# Training Loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(10000):
    curr_a, curr_b, curr_c, curr_d = sess.run([a, b, c, d], {x: x_train, y: y_train})
    if i % 100 == 0:
        print("Formula: %s x^3 + %s x^2 + %s x + %s" % (curr_a, curr_b, curr_c, curr_d))
    sess.run([train], {x: x_train, y: y_train})

# Evaluate Training Accuracy
curr_a, curr_b, curr_c, curr_d = sess.run([a, b, c, d], {x: x_train, y: y_train})
print("Formula: %s x^3 + %s x^2 + %s x + %s" % (np.round(curr_a), np.round(curr_b), np.round(curr_c), np.round(curr_d)))
Output
...
Formula: [ 3.50048867e-06] x^3 + [ 8.49209730e-11] x^2 + [ 0.99998665] x + [ 7.22413340e-13]
Formula: [ 3.49762831e-06] x^3 + [ 8.49209730e-11] x^2 + [ 0.99998665] x + [ 5.92354182e-13]
Formula: [ 3.50239748e-06] x^3 + [ 8.49209730e-11] x^2 + [ 0.99998665] x + [ 4.85032262e-13]
Formula: [ 0.] x^3 + [ 0.] x^2 + [ 1.] x + [ 0.]
Final comments (updated)
This problem really comes from the lines:
# Loss
squared_deltas = tf.square(model-y)
loss = tf.reduce_sum(squared_deltas)
The gradient of the loss can become huge when we add that x^3 term.
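As a rough standalone illustration (my own sketch, not from the original answer), you can redo the gradient-descent update by hand with the question's settings and watch the cubic coefficient overshoot:
import numpy as np

x = np.array([-2., -1., 0., 1., 2.])
y = x.copy()             # the target cubic is 0x^3 + 0x^2 + 1x + 0
a = b = c = d = 1.0      # initial values, as in the question
lr = 0.01                # same learning rate

for step in range(5):
    err = a * x**3 + b * x**2 + c * x + d - y
    # gradients of sum(err**2); the x**3 feature makes grad_a especially large
    grad_a, grad_b = np.sum(2 * err * x**3), np.sum(2 * err * x**2)
    grad_c, grad_d = np.sum(2 * err * x), np.sum(2 * err)
    a, b, c, d = a - lr * grad_a, b - lr * grad_b, c - lr * grad_c, d - lr * grad_d
    print(step, a, np.sum(err**2))  # |a| grows every step instead of shrinking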
Another solution would be to change the loss function to use tf.reduce_mean. I didn't see this the first time I looked at the code.
# Loss
squared_deltas = tf.square(model-y)
loss = tf.reduce_mean(squared_deltas)
# Optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
If you use tf.reduce_mean, the learning rate will not have to be re-adjusted each time you change your batch size or the number of training points. This is now my preferred solution.
Cheers

How to predict new data using a trained simple feed forward neural network in tensorflow

Forgive me if this sounds like a dumb question. Assuming that I have a neural network trained with data of shape [m, n], how do I test the trained network with data of shape [1, 3]?
Here is the code that I currently have:
n_hidden_1 = 1024
n_hidden_2 = 1024
n = len(test_data[0]) - 1
m = len(test_data)
alpha = 0.005
training_epoch = 1000
display_epoch = 100

train_X = np.array([i[:-1:] for i in test_data]).astype('float32')
train_X = normalize_data(train_X)
train_Y = np.array([i[-1::] for i in test_data]).astype('float32')
train_Y = normalize_data(train_Y)

X = tf.placeholder(dtype=np.float32, shape=[m, n])
Y = tf.placeholder(dtype=np.float32, shape=[m, 1])

weights = {
    'h1': tf.Variable(tf.random_normal([n, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([1])),
}

layer_1 = tf.add(tf.matmul(X, weights['h1']), biases['b1'])
layer_1 = tf.nn.sigmoid(layer_1)
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.sigmoid(layer_2)
activation = tf.matmul(layer_2, weights['out']) + biases['out']

cost = tf.reduce_sum(tf.square(activation - Y)) / (2 * m)
optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epoch):
        sess.run([optimizer, cost], feed_dict={X: train_X, Y: train_Y})
        cost_ = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
        if epoch % display_epoch == 0:
            print('Epoch:', epoch, 'Cost:', cost_)
How do I test new data? For linear regression I know I can use something like this for the data [0.4, 0.5, 0.1]:
predict_x = np.array([0.4, 0.5, 0.1], dtype=np.float32).reshape([1, 3])
predict_x = (predict_x - mean) / std
predict_y = tf.add(tf.matmul(predict_x, W), b)
result = sess.run(predict_y).flatten()[0]
How do I do the same with a neural network?
If you use
X = tf.placeholder(dtype=np.float32, shape=[None, n])
Y = tf.placeholder(dtype=np.float32, shape=[None, 1])
the first dimension of those two placeholders will have variable size, i.e. at training time it can be different (e.g. 720) than at test time (e.g. 1). This is often referred to as having "variable batch sizes" as it is quite common to have different batch sizes during training and testing.
On this line:
cost = tf.reduce_sum(tf.square(activation - Y)) / (2 * m)
you are making use of m, which is now variable. To make this line work with variable batch sizes (as m is unknown before the graph is executed) you should do something like:
m = tf.shape(X)[0]
# cast m to float32 so the division dtypes match
cost = tf.reduce_sum(tf.square(activation - Y)) / tf.cast(tf.multiply(m, 2), tf.float32)
tf.shape evaluates the dynamic shape of X, i.e. the shape it has at runtime.
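With the placeholders declared with shape [None, ...] as above, predicting on a single sample then looks something like this (a sketch, assuming mean and std are the statistics your normalize_data used during training, and that it runs inside the same session after training):
predict_x = np.array([[0.4, 0.5, 0.1]], dtype=np.float32)   # shape [1, 3], i.e. [1, n]
predict_x = (predict_x - mean) / std                        # same normalization as the training data
predict_y = sess.run(activation, feed_dict={X: predict_x})  # Y is not needed for inference
print(predict_y)                                            # shape [1, 1]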
