Why does TensorFlow 1.1 get slower and slower during training? Is it a memory leak or queue starvation?

I trained an ESPCN in TensorFlow 1.1, and the time cost per patch increases nearly linearly during training. The first 100 epochs take only 4-5 seconds, but the 70th epoch takes about half a minute.
I've searched for this question on Google and Stack Overflow and tried the solutions below, but they did not seem to work:
1. adding tf.reset_default_graph() after every sess.run();
2. adding time.sleep(5) to prevent queue starvation.
I know the general idea: reduce the operations inside the Session. But how? Does anyone have a solution?
Here's part of my code:
L3, var_w_list, var_b_list = model_train(IN, FLAGS)
cost = tf.reduce_mean(tf.reduce_sum(tf.square(OUT - L3), reduction_indices=0))
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(FLAGS.base_lr, global_step * FLAGS.batch_size, FLAGS.decay_step, 0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost, global_step=global_step, var_list=var_w_list + var_b_list)
# optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost, var_list=var_w_list + var_b_list)
cnt = 0

with tf.Session() as sess:
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    saver = tf.train.Saver()
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    print('\n\n\n =========== All initialization finished, now training begins ===========\n\n\n')
    t_start = time.time()
    t1 = t_start
    for i in range(1, FLAGS.max_Epoch + 1):
        LR_batch, HR_batch = batch.__next__()
        global_step += 1
        [_, cost1] = sess.run([optimizer, cost], feed_dict={IN: LR_batch, OUT: HR_batch})
        # tf.reset_default_graph()
        if i % 100 == 0 or i == 1:
            print_step = i
            print_loss = cost1 / FLAGS.batch_size
            test_LR_batch, test_HR_batch = test_batch.__next__()
            test_SR_batch = test_HR_batch.copy()
            test_SR_batch[:, :, :, 0:3] = sess.run(L3, feed_dict={IN: test_LR_batch[:, :, :, 0:3]})
            # tf.reset_default_graph()
            psnr_tmp = 0.0
            ssim_tmp = 0.0
            for k in range(test_SR_batch.shape[0]):
                com1 = test_SR_batch[k, :, :, 0]
                com2 = test_HR_batch[k, :, :, 0]
                psnr_tmp += get_psnr(com1, com2, FLAGS.HR_size, FLAGS.HR_size)
                ssim_tmp += get_ssim(com1, com2, FLAGS.HR_size, FLAGS.HR_size)
            psnr[cnt] = psnr_tmp / test_SR_batch.shape[0]
            ssim[cnt] = ssim_tmp / test_SR_batch.shape[0]
            ep[cnt] = print_step
            t2 = time.time()
            print_time = t2 - t1
            t1 = t2
            print(("[Epoch] : {0:d} [Current cost] : {1:5.8f} \t [Validation PSNR] : {2:5.8f} \t [Duration time] : {3:10.8f} s \n").format(print_step, print_loss, psnr[cnt], print_time))
            # tf.reset_default_graph()
            cnt += 1
        if i % 1000 == 0:
            L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)
            output_img = single_HR.copy()
            output_img[:, :, :, 0:3] = sess.run(L3_test, feed_dict={IN_TEST: single_LR[:, :, :, 0:3]})
            tf.reset_default_graph()
            subname = FLAGS.img_save_dir + '/' + str(i) + ".jpg"
            img_gen(output_img[0, :, :, :], subname)
            print(('================= Saving model to {}/model.ckpt ================= \n').format(FLAGS.checkpoint_dir))
            time.sleep(5)
            # saver.save(sess, FLAGS.checkpoint_dir + '/model.ckpt', print_step)
    t_tmp = time.time() - t_start
My configuration is: Windows 10 + TF 1.1 + Python 3.5 + CUDA 8.0 + cuDNN 5.1
================================================================
Besides, I used a pixel-shuffle (PS) layer instead of a deconvolution in the last layer. I copied the PS code from elsewhere; it is shown below:
def _phase_shift(I, r):
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0]  # handle Dimension(None) for an undefined batch dim
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, r, r
    X = tf.split(X, a, 1)  # a, [bsize, b, r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], 2)  # bsize, b, a*r, r
    X = tf.split(X, b, 1)  # b, [bsize, a*r, r]
    X = tf.concat([tf.squeeze(x, axis=1) for x in X], 2)  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))

def PS(X, r, color=False):
    if color:
        Xc = tf.split(X, 3, 3)
        X = tf.concat([_phase_shift(x, r) for x in Xc], 3)
    else:
        X = _phase_shift(X, r)
    return X
Here X is the 4-dimensional image tensor, r is the up-scaling factor, and color determines whether the images have 3 channels (YCbCr format) or 1 channel (grayscale).
Using the layer is as simple as calling tf.nn.relu():
L3_ps = PS(L3, scale, True)
Now I'm wondering whether this layer causes the slowdown, because the program runs fine when using a deconvolution layer. Switching back to deconvolution may be a solution, but I have to use the PS layer for other reasons.
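One quick way to test that hypothesis (a minimal diagnostic sketch, not part of the original code) is to watch whether the default graph's operation count grows between training steps; if it does, something inside the loop is adding nodes, and sess.run() will get slower over time:

g = tf.get_default_graph()
prev_n_ops = len(g.get_operations())
for i in range(1, FLAGS.max_Epoch + 1):
    # ... the existing sess.run(...) training step goes here ...
    n_ops = len(g.get_operations())
    if n_ops != prev_n_ops:
        print('step {}: graph grew from {} to {} ops'.format(i, prev_n_ops, n_ops))
        prev_n_ops = n_ops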

I suspect this line is causing a memory leak (although without seeing the code, I can't say for certain):
L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)
L3_test seems to be a tf.Tensor (because you later pass it to sess.run()), so it seems likely that model_test() is adding new nodes to the graph each time it is called (every 1000 steps), which causes more work to be done over time. (For the same reason, the global_step += 1 line inside the training loop is also suspect: in graph mode, += on a tf.Variable builds a new add op on every iteration.)
The solution is quite simple, though: since model_test() does not depend on anything computed in the training loop, you can move the call outside the training loop, so it is only called once.
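A minimal sketch of that restructuring, reusing the names from the question (the graph.finalize() call is an optional extra: once the graph is finalized, any accidental node creation raises an error instead of silently slowing training down):

# Build the test graph exactly once, before the session starts.
L3_test = model_test(IN_TEST, var_w_list, var_b_list, FLAGS)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    tf.get_default_graph().finalize()  # optional: freeze the graph
    for i in range(1, FLAGS.max_Epoch + 1):
        # ... training step as before ...
        if i % 1000 == 0:
            output = sess.run(L3_test, feed_dict={IN_TEST: single_LR[:, :, :, 0:3]})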

Related

Problem with reinforcement learning in Keras: model not learning

My name is Andy; I am new to Stack Overflow and this is my first question.
I started learning Python about 40 days ago thanks to COVID-19, jumped into machine learning/Q-learning about 3 weeks ago, and have been stuck there since.
Goal:
Have the computer play Rad Racer 2 (an NES racing game) using reinforcement learning.
Plan to make this work:
After various tutorials/sites, I decided to use a double network to train/learn:
2x 256-filter convolutional networks in Keras, since I have watched a few tutorial videos on Keras basics;
3 actions: hold accelerate (J), accelerate left (JA), accelerate right (JD).
I am using DirectInput key codes I found online to send inputs to the game, as sending regular key presses does not work.
I know people use Gym Retro for these types of games, but I wanted to see the inner workings of reward/observation and such, so I used YOLOv5 to detect lines/objects. Based on the results from YOLOv5, I calculate the reward for each step.
My input is a series of 4 grayscale images representing motion, kept in a deque and then stacked with NumPy (see the sketch after this list).
Once I have gathered enough experience/replay memory (1500 transitions), I start training at the end of each episode instead of after each step; I found that training after every step lagged too much.
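A minimal sketch of that frame-stacking idea (the frame size here is hypothetical; the full code below does the same thing inline):

import numpy as np
from collections import deque

HEIGHT, WIDTH = 84, 84  # hypothetical frame size; the real one comes from the game capture

# Keep the last four grayscale frames; appending to a full deque drops the oldest.
frames = deque(maxlen=4)
for _ in range(4):
    frames.append(np.zeros((HEIGHT, WIDTH), dtype=np.float32))

def stacked_input(new_frame):
    # Append the newest frame and stack along the channel axis -> (1, H, W, 4).
    frames.append(new_frame)
    return np.expand_dims(np.stack(frames, axis=2), axis=0)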
Problem:
My biggest problem currently is that the model does not seem to learn properly. It looks slightly okay around episodes 20-30, then gets worse and worse after that, to the point where it performs only one action for hours.
I have tried playing with the learning rate (0.1 to 0.00001), different inputs (1 BGR layer, a grayscale layer, 4 stacked layers, etc.), and different epsilon decay rates. I commented out most of the reward logic and kept only a basic reward for now.
Most of the code is below (everything besides the YOLO parts; I had to remove a few lines due to the character limit):
# Imports reconstructed for completeness; the original post omitted them for
# length. 'gw' is assumed to be pygetwindow, based on getWindowsWithTitle() below.
import random
import time
from collections import deque

import numpy as np
import pygetwindow as gw
from tqdm import tqdm
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import Adam

# parameters
training = True
learning_rate = 0.0001
DISCOUNT = 0.99
REPLAY_MEMORY_SIZE = 50_000  # How many last steps to keep for model training
MIN_REPLAY_MEMORY_SIZE = 1500  # Minimum number of steps in a memory to start training
MINIBATCH_SIZE = 1000  # How many steps (samples) to use for training
batch_size = 32
UPDATE_TARGET_EVERY = 0  # Terminal states (end of episodes)
MODEL_NAME = 'RC'
MIN_REWARD = 0  # For model save
save_every = 5  # save every x episodes
EPISODES = 2_000

# Exploration settings
if training is True:
    epsilon = 1  # not a constant, going to be decayed
else:
    epsilon = 0
MIN_EPSILON = 0.01
START_EPISODE_DECAY = 0
END_EPISODE_DECAY = 20
if epsilon > MIN_EPSILON:
    EPS_DECAY = -(epsilon / ((END_EPISODE_DECAY - START_EPISODE_DECAY) / epsilon))
else:
    EPS_DECAY = 0
# Agent class
class DQNAgent:
    def __init__(self):
        # Main model
        self.model = self.create_model()
        # self.model = self.load_model()
        # Target network
        self.target_model = self.create_model()
        self.target_model.set_weights(self.model.get_weights())
        # An array with last n steps for training
        self.replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)
        # Used to count when to update target network with main network's weights
        self.target_update_counter = 0

    # (Not shown in the original post, but called below; presumably it just
    # appends the transition to the replay memory.)
    def update_replay_memory(self, transition):
        self.replay_memory.append(transition)

    def create_model(self):
        dropout = 0.1
        model = Sequential()
        model.add(Conv2D(256, (2, 2), input_shape=(int(height / resize_ratio), int(width / resize_ratio), img_channels)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(dropout))
        model.add(Conv2D(256, (2, 2)))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(dropout))
        model.add(Flatten())
        model.add(Dense(64))
        model.add(Dense(env.ACTION_SPACE_SIZE, activation='linear'))  # ACTION_SPACE_SIZE = how many choices (9)
        model.compile(loss="mse", optimizer=Adam(lr=learning_rate), metrics=['accuracy'])
        return model

    # Trains main network at end of episode
    def train(self, terminal_state):
        # Start training only if certain number of samples is already saved
        if len(self.replay_memory) < MIN_REPLAY_MEMORY_SIZE:
            return
        minibatch = random.sample(self.replay_memory, MINIBATCH_SIZE)
        current_states = np.array([transition[0] for transition in minibatch])
        # from (MINIBATCH_SIZE, 1, h, w, 4) > (MINIBATCH_SIZE, h, w, 4)
        current_states = current_states.reshape(current_states.shape[0], current_states.shape[2],
                                                current_states.shape[3], current_states.shape[4])
        current_qs_list = self.model.predict(current_states)
        new_current_states = np.array([transition[3] for transition in minibatch])
        new_current_states = new_current_states.reshape(new_current_states.shape[0], new_current_states.shape[2],
                                                        new_current_states.shape[3], new_current_states.shape[4])
        # new_current_states = np.expand_dims(new_current_states, axis=-1)
        future_qs_list = self.target_model.predict(new_current_states)
        X = []
        y = []
        for index, (current_state_img, current_action, current_reward, new_current_img, current_done) in enumerate(minibatch):
            if not current_done:
                max_future_q = np.max(future_qs_list[index])
                new_q = current_reward + (DISCOUNT * max_future_q)
            else:
                new_q = 0.0
            current_qs = current_qs_list[index]
            current_qs[current_action] = new_q
            X.append(np.squeeze(current_state_img, axis=0))
            y.append(current_qs)
        X = np.array(X)
        # X = np.expand_dims(X, axis=-1)
        # X = X.reshape(X.shape[0], X.shape[2], X.shape[3], X.shape[4])
        y = np.array(y)
        self.model.fit(X, y, batch_size=batch_size, verbose=0, shuffle=False)
        # self.model.train_on_batch(X, y)
        if terminal_state:
            self.target_update_counter += 1
        # If counter reaches set value, update target network with weights of main network
        if self.target_update_counter > UPDATE_TARGET_EVERY:
            self.target_model.set_weights(self.model.get_weights())
            self.target_update_counter = 0
            print('target_model trained!')

    # Queries main network for Q values given current observation space (environment state)
    def get_qs(self, state):
        result = agent.model.predict(state)
        result = result[0]
        return result
agent = DQNAgent()
current_img_stack = deque(maxlen=4)

# make the game active
game = gw.getWindowsWithTitle('Mesen')[0]
game.activate()
time.sleep(1)
release_all()

# Iterate over episodes
for episode in tqdm(range(1, EPISODES + 1), ascii=True, unit='episodes'):
    episode_reward = 0
    step = 1
    if episode <= START_EPISODE_DECAY - 1:
        start_epsilon = False
    elif episode >= END_EPISODE_DECAY + 1:
        start_epsilon = False
    else:
        start_epsilon = True

    # Reset environment and get initial state
    # blackscreens followed by the 1st screen starting out
    current_state = env.reset()
    blackscreen = np.zeros_like(current_state)
    current_img_stack.append(blackscreen)
    current_img_stack.append(blackscreen)
    current_img_stack.append(blackscreen)
    current_img_stack.append(current_state)
    stacked_state = np.stack(current_img_stack, axis=2)
    stacked_state = np.ascontiguousarray(stacked_state, dtype=np.float32) / 255
    stacked_state = np.transpose(stacked_state, (1, 0, 2))
    stacked_state = np.expand_dims(stacked_state, axis=0)
    start_time = time.time()

    # Reset flag and start iterating until episode ends
    done = False
    while not done:
        if np.random.random() > epsilon:
            action = np.argmax(agent.get_qs(stacked_state))
        else:
            action = np.random.randint(0, env.ACTION_SPACE_SIZE)
        new_state, reward, done, prediction, preview = env.step(action)
        if done is False:
            next_img_stack = current_img_stack
            next_img_stack.append(new_state)
            next_stack = np.stack(next_img_stack, axis=2)
            next_stack = np.ascontiguousarray(next_stack, dtype=np.float32) / 255
            next_stack = np.transpose(next_stack, (1, 0, 2))
            next_stack = np.expand_dims(next_stack, axis=0)
            # current_state = new_state
            current_img_stack = next_img_stack
            stacked_state = next_stack
        else:
            next_img_stack = current_img_stack
            next_img_stack.append(blackscreen)
            next_stack = np.stack(next_img_stack, axis=2)
            next_stack = np.ascontiguousarray(next_stack, dtype=np.float32) / 255
            next_stack = np.transpose(next_stack, (1, 0, 2))
            next_stack = np.expand_dims(next_stack, axis=0)
        step += 1
        episode_reward += reward
        ep_rewards.append(episode_reward)
        if SHOW_PREVIEW:
            env.render(preview, prediction)
        if training is True:
            agent.update_replay_memory((stacked_state, action, reward, next_stack, done))
        # print(episode_reward)
        if done is True:
            ep_reward_final.append(episode_reward)
            print(' Epsilon(' + str(epsilon) + ') EPtimes(' + str(time.time() - start_time) + ') done('
                  + str(done) + ') step(' + str(step) + ') EPreward(' + str(episode_reward) +
                  ') best_reward_this_session(' + str(max(ep_reward_final)) + ') fps(' +
                  str(step / (time.time() - start_time)) + ')')
            # plot(ep_reward_final)
            if training is True:
                agent.train(done)

    # Decay epsilon
    if show_info is False and epsilon <= MIN_EPSILON:
        print(f"\nEPS_DECAY ended on episode {episode} - epsilon {epsilon}")
        epsilon = MIN_EPSILON
        show_info = True
    elif start_epsilon is True:
        epsilon += EPS_DECAY

How to create my own loss function in Pytorch?

I'd like to create a model that predicts parameters of a circle (coordinates of center, radius).
The input is an array of points (an arc, with noise):
def generate_circle(x0, y0, r, start_angle, phi, N, sigma):
    theta = np.linspace(start_angle*np.pi/180, (start_angle + phi)*np.pi/180, num=N)
    x = np.array([np.random.normal(r*np.cos(t) + x0, sigma, 1)[0] for t in theta])
    y = np.array([np.random.normal(r*np.sin(t) + y0, sigma, 1)[0] for t in theta])
    return x, y

n_x = 1000
start_angle = 0
phi = 90
N = 100
sigma = 0.005
x_full = []
for i in range(n_x):
    x0 = np.random.normal(0, 10, 1)[0]
    y0 = np.random.normal(0, 10, 1)[0]
    r = np.random.normal(0, 10, 1)[0]
    x, y = generate_circle(x0, y0, r, start_angle, phi, N, sigma)
    x_full.append(np.array([[x[i], y[i]] for i in range(len(x))]))

X = torch.from_numpy(np.array(x_full))
print(X.size())  # torch.Size([1000, 100, 2])
Output: [x_c, y_c, r]
As a loss function I need to use the squared circle-equation residual, sum_i ((x_i - x_c)^2 + (y_i - y_c)^2 - r^2)^2 (the original formula was an image; it is reconstructed here from the attempt below).
I tried to implement something like the following:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        x = self.predict(x)
        return x

# It doesn't work, it's just an idea
def my_loss(point, params):
    arr = ((point[:, 0] - params[:, 0])**2 + (point[:, 1] - params[:, 1])**2 - params[:, 2]**2)**2
    loss = torch.sum(arr)
    return loss

# For N pairs (x, y) the model predicts the parameters of a circle
net = Net(n_feature=N*2, n_hidden=10, n_output=3)
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4)
for t in range(1000):
    prediction = net(X.view(n_x, N*2).float())
    loss = my_loss(X, prediction)
    print(f"loss: {loss}")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
So, the question is: how do I correctly implement my own loss function in PyTorch in this case?
Or how should I change the model's structure to get the expected results?
You're trying to create a loss between the predicted outputs and the inputs instead of between the predicted outputs and the true outputs. To do this you need to save the true values of x0, y0, and r when you generate them.
n_x = 1000
start_angle = 0
phi = 90
N = 100
sigma = 0.005
x_full = []
targets = []  # <-- Here
for i in range(n_x):
    x0 = np.random.normal(0, 10, 1)[0]
    y0 = np.random.normal(0, 10, 1)[0]
    r = np.random.normal(0, 10, 1)[0]
    targets.append(np.array([x0, y0, r]))  # <-- Here
    x, y = generate_circle(x0, y0, r, start_angle, phi, N, sigma)
    x_full.append(np.array([[x[i], y[i]] for i in range(len(x))]))

X = torch.from_numpy(np.array(x_full))
Y = torch.from_numpy(np.array(targets))  # <-- Here
print(X.size())  # torch.Size([1000, 100, 2])
print(Y.size())  # torch.Size([1000, 3])
Now, when you call my_loss you should use:
loss = my_loss(Y, prediction)
You are passing in all your data points on every iteration of your for loop. I would split your data into smaller sections so that your model doesn't just learn to output the same values every time; e.g. you have generated 1000 examples, so pass in a random selection of 100 in each iteration using something like random.sample(...).
Also, your input numbers are pretty large, which means your loss will be huge; generate inputs between 0 and 1 instead, and if you need a value between 0 and 10 you can just multiply by 10. A minimal sketch combining both suggestions follows.
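Reusing the names from the code above (the 0-1 rescaling of the generated data is left out here, and my_loss is assumed to take targets and predictions as in the corrected call):

import random

for t in range(1000):
    idx = random.sample(range(n_x), 100)       # random subset of 100 examples
    batch_X = X[idx].view(len(idx), N * 2).float()
    batch_Y = Y[idx].float()
    prediction = net(batch_X)
    loss = my_loss(batch_Y, prediction)        # loss against the true (x0, y0, r)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()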

TensorFlow: losses after training the model are different from the losses printed during the last epoch of stochastic gradient descent

I'm trying to do binary classification on two spirals. For testing, I am feeding my neural network the exact spiral data with no noise, and the model seems to work, as the losses approach 0 during SGD. However, when I use my model to infer the exact same data points after SGD has completed, I get completely different losses from what was printed during the last epoch of SGD.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(threshold=np.nan)

# get the spiral points
t_p = np.linspace(0, 4, 1000)
x1_p = t_p * np.cos(t_p*2*np.pi)
y1_p = t_p * np.sin(t_p*2*np.pi)
x2_p = t_p * np.cos(t_p*2*np.pi + np.pi)
y2_p = t_p * np.sin(t_p*2*np.pi + np.pi)
plt.plot(x1_p, y1_p, x2_p, y2_p)

# generate data points
x1_dat = x1_p
y1_dat = y1_p
x2_dat = x2_p
y2_dat = y2_p

def model_variable(shape, name, initializer):
    variable = tf.get_variable(name=name,
                               dtype=tf.float32,
                               shape=shape,
                               initializer=initializer)
    tf.add_to_collection('model_variables', variable)
    return variable

class Model():
    # layer specifications includes bias nodes
    def __init__(self, sess, data, nEpochs, learning_rate, layer_specifications):
        self.sess = sess
        self.data = data
        self.nEpochs = nEpochs
        self.learning_rate = learning_rate
        if layer_specifications[0] != 2 or layer_specifications[-1] != 1:
            raise ValueError('First layer only two nodes, last layer only 1 node')
        else:
            self.layer_specifications = layer_specifications
        self.build_model()

    def build_model(self):
        # x is the two nodes that will be layer one; the input is an x, y coordinate,
        # and we need to classify which spiral it is on: the non-phase-shifted or
        # the phase-shifted one.
        # y is the output of the model
        self.x = tf.placeholder(tf.float32, shape=[2, 1])
        self.y = tf.placeholder(tf.float32, shape=[])
        self.thetas = []
        self.biases = []
        for i in range(1, len(self.layer_specifications)):
            self.thetas.append(model_variable([self.layer_specifications[i], self.layer_specifications[i-1]], 'theta'+str(i), tf.random_normal_initializer(stddev=0.1)))
            self.biases.append(model_variable([self.layer_specifications[i], 1], 'bias'+str(i), tf.constant_initializer()))
        # forward propagation
        intermediate = self.x
        for i in range(0, len(self.layer_specifications)-1):
            if i != (len(self.layer_specifications) - 2):
                intermediate = tf.nn.elu(tf.add(tf.matmul(self.thetas[i], intermediate), self.biases[i]))
            else:
                intermediate = tf.add(tf.matmul(self.thetas[i], intermediate), self.biases[i])
        self.yhat = tf.squeeze(intermediate)
        self.loss = tf.nn.sigmoid_cross_entropy_with_logits(self.yhat, self.y)

    def train_init(self):
        model_variables = tf.get_collection('model_variables')
        self.optim = (
            tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)
            .minimize(self.loss, var_list=model_variables)
        )
        self.check = tf.add_check_numerics_ops()
        self.sess.run(tf.initialize_all_variables())

    # here is where x and y combine to get just x in tf with shape [2, 1] and where label becomes y in tf
    def train_iter(self, x, y):
        loss, _, _ = sess.run([self.loss, self.optim, self.check],
                              feed_dict={self.x: x, self.y: y})
        print('loss: {0} on:{1}'.format(loss, x))

    # here x and y are still x and y coordinates, label is separate
    def train(self):
        for _ in range(self.nEpochs):
            for x, y, label in self.data():
                print(label)
                self.train_iter([[x], [y]], label)
            print("NEW ONE:\n")

    # here x and y are still x and y coordinates, label is separate
    def infer(self, x, y, label):
        return self.sess.run((tf.sigmoid(self.yhat), self.loss), feed_dict={self.x: [[x], [y]], self.y: label})

def data():
    # so first spiral is label 0, second is label 1
    for _ in range(len(x1_dat)-1, -1, -1):
        for dat in range(2):
            if dat == 0:
                yield x1_dat[_], y1_dat[_], 0
            else:
                yield x2_dat[_], y2_dat[_], 1

layer_specifications = [2, 100, 100, 100, 1]
sess = tf.Session()
model = Model(sess, data, nEpochs=10, learning_rate=1.1e-2, layer_specifications=layer_specifications)
model.train_init()
model.train()

inferrences_1 = []
inferrences_2 = []
losses = 0
for i in range(len(t_p)-1, -1, -1):
    infer, loss = model.infer(x1_p[i], y1_p[i], 0)
    if infer >= 0.5:
        print('loss: {0} on point {1}, {2}'.format(loss, x1_p[i], y1_p[i]))
        losses = losses + 1
        inferrences_1.append('r')
    else:
        inferrences_1.append('g')
for i in range(len(t_p)-1, -1, -1):
    infer, loss = model.infer(x2_p[i], y2_p[i], 1)
    if infer >= 0.5:
        inferrences_2.append('r')
    else:
        print('loss: {0} on point {1}, {2}'.format(loss, x2_p[i], y2_p[i]))
        losses = losses + 1
        inferrences_2.append('g')
print('total losses: {}'.format(losses))

plt.scatter(x1_p, y1_p, c=inferrences_1)
plt.scatter(x2_p, y2_p, c=inferrences_2)
plt.show()

How to get trainable weights when manually running a session in Keras?

Because I'm manually running a session, I can't seem to collect the trainable weights of a specific layer.
x = Convolution2D(16, 3, 3, init='he_normal', border_mode='same')(img)
for i in range(0, self.blocks_per_group):
    nb_filters = 16 * self.widening_factor
    x = residual_block(x, nb_filters=nb_filters, subsample_factor=1)

for i in range(0, self.blocks_per_group):
    nb_filters = 32 * self.widening_factor
    if i == 0:
        subsample_factor = 2
    else:
        subsample_factor = 1
    x = residual_block(x, nb_filters=nb_filters, subsample_factor=subsample_factor)

for i in range(0, self.blocks_per_group):
    nb_filters = 64 * self.widening_factor
    if i == 0:
        subsample_factor = 2
    else:
        subsample_factor = 1
    x = residual_block(x, nb_filters=nb_filters, subsample_factor=subsample_factor)

x = BatchNormalization(axis=3)(x)
x = Activation('relu')(x)
x = AveragePooling2D(pool_size=(8, 8), strides=None, border_mode='valid')(x)
x = tf.reshape(x, [-1, np.prod(x.get_shape()[1:].as_list())])

# Readout layer
preds = Dense(self.nb_classes, activation='softmax')(x)

loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with sess.as_default():
    for i in range(10):
        batch = self.next_batch(self.batch_num)
        _, l = sess.run([optimizer, loss],
                        feed_dict={img: batch[0], labels: batch[1]})
        print(l)
print(type(weights))
print(type(weights))
I'm trying to get the weights of the last convolution layer.
I tried get_trainable_weights(layer) and layer.get_weights(), but I did not manage to get anywhere.
The error
AttributeError: 'Tensor' object has no attribute 'trainable_weights'
From looking at the source* it seems like you're looking for layer.trainable_weights (it's a list, not a member function). Please note this returns tensors.
If you want to get their actual values, you need to evaluate them in a session:
weights1, weights2 = sess.run([weight_tensor_1, weight_tensor_2])
*https://github.com/fchollet/keras/blob/master/keras/layers/convolutional.py#L401
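A minimal sketch of how this fits the code in the question: keep a handle to the layer object when you build the model, then evaluate its trainable_weights in the session (names otherwise follow the question's code):

# Keep a reference to the layer itself, not just its output tensor.
conv = Convolution2D(16, 3, 3, init='he_normal', border_mode='same')
x = conv(img)
# ... rest of the model as above ...

with sess.as_default():
    # trainable_weights is a list of variables (kernel and bias here);
    # sess.run() evaluates them to numpy arrays.
    kernel_val, bias_val = sess.run(conv.trainable_weights)
    print(kernel_val.shape, bias_val.shape)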

finding optimum lambda and features for polynomial regression

I am new to data mining/ML. I've been trying to solve a polynomial regression problem of predicting the price from given input parameters (already normalized to the range [0, 1]).
I'm quite close, as my output is in proportion to the correct one, but it seems a bit suppressed. My algorithm is correct; I just don't know how to reach an appropriate lambda (the regularization parameter), or how to decide how far to populate the features, as the problem says: "The prices per square foot are (approximately) a polynomial function of the features. This polynomial always has an order less than 4."
Is there a way to visualize the data to find optimal values for these parameters, the way we find an optimal alpha (step size) and number of iterations by visualizing the cost function in linear regression with gradient descent?
Here is my code : http://ideone.com/6ctDFh
from numpy import *

def mapFeature(X1, X2):
    degree = 2
    out = ones((shape(X1)[0], 1))
    for i in range(1, degree+1):
        for j in range(0, i+1):
            term1 = X1 ** (i-j)
            term2 = X2 ** (j)
            term = (term1 * term2).reshape(shape(term1)[0], 1)
            # note: out[i] stores the mapped features of X1[i], X2[i];
            # the features of one example are laid out horizontally in out[i]
            out = hstack((out, term))
    return out

def solve():
    n, m = input().split()
    m = int(m)
    n = int(n)
    data = zeros((m, n+1))
    for i in range(0, m):
        ausi = input().split()
        for k in range(0, n+1):
            data[i, k] = float(ausi[k])
    X = data[:, 0:n]
    y = data[:, n]
    theta = zeros((6, 1))
    X = mapFeature(X[:, 0], X[:, 1])
    ausi = computeCostVect(X, y, theta)
    # print(X)
    print("Results using BFGS:")
    lamda = 2
    theta, cost = findMinTheta(theta, X, y, lamda)
    test = [0.05, 0.54, 0.91, 0.91, 0.31, 0.76, 0.51, 0.31]
    print("prediction for 0.31, 0.76 (using BFGS):")
    for i in range(0, 7, 2):
        print(mapFeature(array([test[i]]), array([test[i+1]])).dot(theta))
    # pyplot.plot(X[:, 1], y, 'rx', markersize=5)
    # fig = pyplot.figure()
    # ax = fig.add_subplot(1, 1, 1)
    # ax.scatter(X[:, 1], X[:, 2], s=y)  # third variable, income, as bubble size
    # pyplot.show()
The current output is:
183.43478288
349.10716957
236.94627602
208.61071682
The correct output should be:
180.38
1312.07
440.13
343.72
