TensorFlow 2.0 : ValueError - No Gradients Provided (After Modifying DDPG Actor) - python-3.x

Background
I'm currently trying to implement a DDPG framework to control a simple car agent. At first, the car agent would only need to learn how to reach the end of a straight path as quickly as possible by adjusting its acceleration. This task was simple enough, so I decided to introduce an additional steering action as well. I updated my observation and action spaces accordingly.
The lines below are the for loop that runs each episode:
for i in range(episodes):
observation = env.reset()
done = False
score = 0
while not done:
action = agent.choose_action(observation, evaluate)
observation_, reward, done, info = env.step(action)
score += reward
agent.remember(observation, action, reward, observation_, done)
if not load_checkpoint:
agent.learn()
observation = observation_
The lines below are my choose_action and learn functions:
def choose_action(self, observation, evaluate=False):
state = tf.convert_to_tensor([observation], dtype=tf.float32)
actions = self.actor(state)
if not evaluate:
actions += tf.random.normal(shape=[self.n_actions],
mean=0.0, stddev=self.noise)
actions = tf.clip_by_value(actions, self.min_action, self.max_action)
return actions[0]
def learn(self):
if self.memory.mem_cntr < self.batch_size:
return
state, action, reward, new_state, done = \
self.memory.sample_buffer(self.batch_size)
states = tf.convert_to_tensor(state, dtype=tf.float32)
states_ = tf.convert_to_tensor(new_state, dtype=tf.float32)
rewards = tf.convert_to_tensor(reward, dtype=tf.float32)
actions = tf.convert_to_tensor(action, dtype=tf.float32)
with tf.GradientTape() as tape:
target_actions = self.target_actor(states_)
critic_value_ = tf.squeeze(self.target_critic(
states_, target_actions), 1)
critic_value = tf.squeeze(self.critic(states, actions), 1)
target = reward + self.gamma*critic_value_*(1-done)
critic_loss = keras.losses.MSE(target, critic_value)
critic_network_gradient = tape.gradient(critic_loss,
self.critic.trainable_variables)
self.critic.optimizer.apply_gradients(zip(
critic_network_gradient, self.critic.trainable_variables))
with tf.GradientTape() as tape:
new_policy_actions = self.actor(states)
actor_loss = -self.critic(states, new_policy_actions)
actor_loss = tf.math.reduce_mean(actor_loss)
actor_network_gradient = tape.gradient(actor_loss,
self.actor.trainable_variables)
self.actor.optimizer.apply_gradients(zip(
actor_network_gradient, self.actor.trainable_variables))
self.update_network_parameters()
And finally, my ActorNetwork is as follows:
class ActorNetwork(keras.Model):
def __init__(self, fc1_dims=512, fc2_dims=512, n_actions=2, name='actor',
chkpt_dir='tmp/ddpg'):
super(ActorNetwork, self).__init__()
self.fc1_dims = fc1_dims
self.fc2_dims = fc2_dims
self.n_actions = n_actions
self.model_name = name
self.checkpoint_dir = chkpt_dir
self.checkpoint_file = os.path.join(self.checkpoint_dir,
self.model_name+'_ddpg.h5')
self.fc1 = Dense(self.fc1_dims, activation='relu')
self.fc2 = Dense(self.fc2_dims, activation='relu')
self.mu = Dense(self.n_actions, activation='tanh')
def call(self, state):
prob = self.fc1(state)
prob = self.fc2(prob)
mu = self.mu(prob) * 3.5
return mu
Note: The code I'm working with is just building off of the code from this tutorial
The Problem
Up until now, I hadn't faced any issues with the code but I did want to adjust the maximum/minimum values of my actions. When I was only considering the acceleration action, I simply multiplied mu by 3.5. However, I wanted the steering actions to exist within a range of -30 to 30 degrees, but I couldn't just multiply mu as I had before. To try to adjust the desired steering range, I made the following (not so elegant) changes to my ActorNetwork
def call(self, state):
prob = self.fc1(state)
prob = self.fc2(prob)
mu = self.mu(prob)# * 3.5
mu_ = []
mu_l = mu.numpy().tolist()
for i, elem1 in enumerate(mu_l):
temp_ = []
for j, elem2 in enumerate(elem1):
if j-1 == 0:
temp_.append(float(elem2 * 3.5))
else:
temp_.append(float(elem2 * math.radians(30)))
mu_.append(temp_)
mu = tf.convert_to_tensor(mu_, dtype=tf.float32)
return mu
The new lines that I added were meant to:
Convert the mu tensor into a list
Iterate through the elements in the mu list (mu_l) and if a value had an index of 0 (acceleration) then multiply by 3.5; otherwise, multiply the value at index=1 (steering) by the radians conversion of 30 degrees.
Append each adjusted value into a new list (mu_)
Set mu to be equal to a tensor conversion of mu_
It was at this point that I ran into the following error:
ValueError: No gradients provided for any variable: ['actor_network/dense/kernel:0', 'actor_network/dense/bias:0', 'actor_network/dense_1/kernel:0', 'actor_network/dense_1/bias:0', 'actor_network/dense_2/kernel:0', 'actor_network/dense_2/bias:0'].
I have tried to find solutions provided within StackOverflow and from outside sources (e.g. including watch, checking to make sure that I am using model() instead of model.predict() while in GradientTape(), making sure I'm not performing calculations outside of the Tape context) but I haven't had any luck resolving the issue. I suspect that my issue is similar to the one presented in this previous post but I'm not sure how to diagnose whether my problem stems from also overwritting mu with a tensor. Does anyone have any insight regarding this problem?

The issue has been resolved thanks to some simple but helpful advice I received on Reddit. I was disrupting the tracking of my variables by making changes using my custom for-loop. I should have used a TensorFlow function instead. The following changes fixed the problem for me:
def call(self, state):
prob = self.fc1(state)
prob = self.fc2(prob)
mu = self.mu(prob)
mult = tf.convert_to_tensor([3.5, math.radians(30)], dtype=tf.float32)
mu = tf.math.multiply(mu, mult)
return mu

Related

Some weights of Actor Critic model not updating

I am working on an Actor-Critic model in Pytorch. The model first receives the input in an RNN and then the policy net comes into play. The code for Policy net is:
class Policy(nn.Module):
"""
implements both actor and critic in one model
"""
def __init__(self):
super(Policy, self).__init__()
self.fc1 = nn.Linear(state_size+1, 128)
self.fc2 = nn.Linear(128, 64)
# actor's layer
self.action_head = nn.Linear(64, action_size)
self.mu = nn.Sigmoid()
self.var = nn.Softplus()
# critic's layer
self.value_head = nn.Linear(64, 1)
def forward(self, x):
"""
forward of both actor and critic
"""
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
# actor: choses action to take from state s_t
# by returning probability of each action
action_prob = self.action_head(x)
mu = self.mu(action_prob)
var = self.var(action_prob)
# critic: evaluates being in the state s_t
state_values = self.value_head(x)
return mu, var, state_values
policy = Policy()
In model class, we are calling this policy after the rnn. And in agent class’s act method, we are calling the model to get the action like this:
def act(self, some_input, state):
mu, var, state_value = self.model(some_input, state)
mu = mu.data.cpu().numpy()
sigma = torch.sqrt(var).data.cpu().numpy()
action = np.random.normal(mu, sigma)
action = np.clip(action, 0, 1)
action = torch.from_numpy(action/1000)
return action, state_value
I must mention that in optimizer, we are calling the model.parameters. When we print all the trainable parameters in each epoch, we see that everything else is changing except for the policy.action_head. Any idea why this is happening? I must also mention how the losses are calculated now:
advantage = reward - Value
Lp = -math.log(pdf_prob_now)*advantage
policy_losses.append(Lp)
#similar for value_losses
#after all the runs in the epoch is done
loss = torch.stack(policy_losses).sum() + alpha*torch.stack(value_losses).sum()
loss.backward()
Here Value is the state_value (the 2nd output from agent.act) and the pdf_prob_now is the probability of the action from all possible actions which is calculated like this:
def find_pdf(policy, action, rnn_output):
mu, var, _ = policy(rnn_output)
mu = mu.data.cpu().numpy()
sigma = torch.sqrt(var).data.cpu().numpy()
pdf_probability = stats.norm.pdf(action.cpu(), loc=mu, scale=sigma)
return pdf_probability
Is there some logical error here?
the bug is in act function
def act(self, some_input, state):
# mu contains info required for gradient
mu, var, state_value = self.model(some_input, state)
# mu is detached and now has forgot all the operations performed
# in self.action_head
mu = mu.data.cpu().numpy()
sigma = torch.sqrt(var).data.cpu().numpy()
action = np.random.normal(mu, sigma)
action = np.clip(action, 0, 1)
action = torch.from_numpy(action/1000)
return action, state_value
for the further process, if loss is calculated using tensor operations performed on action, it can not be traced back to update self.action_head weights, as you detached the tensor mu which removes it from the computation graph and so you do not see any updates in self.action_head.

Cannot use model.train_on_batch when using #tf.function

I'm trying to understand how to use #tf.function properly in a A2C problem.
I constantly get the following error:
Cannot convert a symbolic Keras input/output to a numpy array. This error may indicate that you're trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.
The agent is built as follows:
class Agent():
learning_rate = 0.0001
CLIP_EDGE = 1e-8
entropy = 0.0001
critic_weight = 0.95
def __init__(self,state_shape,action_size,hidden_neurons,memory,learning_rate = learning_rate, CLIP_EDGE = CLIP_EDGE, entropy = entropy,
critic_weight = critic_weight, actor_name = "actor",critic_name = "critic", policy_name = "policy",main_folder = "main_folder"):
self.state_shape = state_shape
self.action_size = action_size
self.hidden_neurons = hidden_neurons
self.memory = memory
self.learning_rate = learning_rate
self.CLIP_EDGE = CLIP_EDGE
self.entropy = entropy
self.critic_weight = critic_weight
self.actor_name = actor_name
self.critic_name = critic_name
self.policy_name = policy_name
self.main_folder = main_folder
self.actor, self.critic, self.policy = self.build_networks()
def act(self, state):
"""Selects an action for the agent to take given a game state.
Args:
state (list of numbers): The state of the environment to act on.
traning (bool): True if the agent is training.
Returns:
(int) The index of the action to take.
"""
# If not acting randomly, take action with highest predicted value.
state_batch = np.expand_dims(state, axis=0)
probabilities = self.policy.predict(state_batch)[0]
action = np.random.choice(self.action_size, p=probabilities)
return action
def learn(self, print_variables=False):
"""Trains the Deep Q Network based on stored experiences."""
gamma = self.memory.gamma
experiences = self.memory.sample()
state_mb, action_mb, reward_mb, dones_mb, next_value = experiences
# One hot enocde actions
actions = np.zeros([len(action_mb), self.action_size])
actions[np.arange(len(action_mb)), action_mb] = 1
#Apply TD(0)
discount_mb = reward_mb + next_value * gamma * (1 - dones_mb)
state_values = self.critic.predict([state_mb])
advantages = discount_mb - np.squeeze(state_values)
if print_variables:
print("discount_mb", discount_mb)
print("next_value", next_value)
print("state_values", state_values)
print("advantages", advantages)
else:
self.actor.train_on_batch(
[state_mb, advantages], [actions, discount_mb])
def build_networks(self):
"""Creates Actor Critic Neural Networks.
Creates a hidden-layer Policy Gradient Neural Network. The loss
function is altered to be a log-likelihood function weighted
by an action's advantage.
"""
state_input = Input(shape=self.state_shape, name='frames')
advantages = Input((1,), name='advantages') # PG, A instead of G
# PG
actor_1 = Dense(units=self.hidden_neurons, activation="relu",name='actor1')(state_input)
actor_3 = Dense(units=int(self.hidden_neurons), activation="relu",name='actor3')(actor_1)
adrop_1 = Dropout(0.2,name='actor_drop_1')(actor_3)
actor_4 = Dense(units = self.hidden_neurons, activation="relu")(adrop_1)
probabilities = Dense(self.action_size, activation='softmax',name='actor_output')(actor_4)
# DQN
critic_1 = Dense(units = self.hidden_neurons,activation="relu",name='critic1')(state_input)
critic_3 = Dense(units = int(self.hidden_neurons), activation="relu",name='critic3')(critic_1)
cdrop_1 = Dropout(0.2,name='critic_drop_1')(critic_3)
critic_4 = Dense(units = self.hidden_neurons, activation="relu")(cdrop_1) #activation era relu por error... se cambio a elu, MONITOREAR
values = Dense(1, activation='linear',name='critic_output')(critic_4)
def actor_loss(y_true, y_pred): # PG
y_pred_clipped = K.clip(y_pred, self.CLIP_EDGE, 1-self.CLIP_EDGE)
log_lik = y_true*K.log(y_pred_clipped)
entropy_loss = y_pred * K.log(K.clip(y_pred, self.CLIP_EDGE, 1-self.CLIP_EDGE)) # New
return K.sum(-log_lik * advantages) - (self.entropy * K.sum(entropy_loss))
# Train both actor and critic at the same time.
actor = Model(
inputs=[state_input, advantages], outputs=[probabilities, values])
actor.compile(
loss=[actor_loss, 'mean_squared_error'], # [PG, DQN]
loss_weights=[1, self.critic_weight], # [PG, DQN]
optimizer=Adam(learning_rate=self.learning_rate))#,clipnorm=1.0))
critic = Model(inputs=[state_input], outputs=[values])
policy = Model(inputs=[state_input], outputs=[probabilities])
tf.keras.utils.plot_model(actor,f"{self.main_folder}/Agents/{self.actor_name}.png",show_shapes=True)
tf.keras.utils.plot_model(critic,f"{self.main_folder}/Agents/{self.critic_name}.png",show_shapes=True)
tf.keras.utils.plot_model(policy,f"{self.main_folder}/Agents/{self.policy_name}.png",show_shapes=True)
return actor, critic, policy
The loop where the agent interacts with the environment is this:
with tf.Graph().as_default():
agent = Agent()
environment = Environment()
state = environment.reset()
done = False
while not done:
acion = agent.act(state)
state,reward,done,info = environment.step(action)
next_value = agent.critic.predict([[state]])
agent.memory.add((state,action,reward,done,next_value))
if agent.memory.full():
agent.learn()
This works fine. My problem comes when I try to switch to use #tf.function because it seems (afaik) that increases training speed (also did a little of benchmark in a jupyter notebook and it is actually faster).
The "refactored" code is this:
The main loop:
agent = Agent()
environment = Environment()
state = environment.reset()
done = False
while not done:
acion = agent.act(state)
state,reward,done,info = environment.step(action)
next_value = agent.model_predict(agent.critic,[[state]]).numpy() #REMOVED .predict FROM MODEL
agent.memory.add((state,action,reward,done,next_value))
if agent.memory.full():
agent.learn()
The modified functions in the Agent class:
#tf.function #NEW FUNCTION ADDED USING #tf.function
def model_predict(self,model,x):
return model(x)
def act(self, state): #MODIFIED FUNCTION, NOW USES self.model_predict
"""Selects an action for the agent to take given a game state.
Args:
state (list of numbers): The state of the environment to act on.
traning (bool): True if the agent is training.
Returns:
(int) The index of the action to take.
"""
# If not acting randomly, take action with highest predicted value.
state_batch = np.expand_dims(state, axis=0)
probabilities = self.model_predict(self.policy,state_batch).numpy()[0]
action = np.random.choice(self.action_size, p=probabilities)
return action
def learn(self, print_variables=False): #MODIFIED FUNCTION, NOW USES self.model_predict
"""Trains the Deep Q Network based on stored experiences."""
gamma = self.memory.gamma
experiences = self.memory.sample()
state_mb, action_mb, reward_mb, dones_mb, next_value = experiences
# One hot enocde actions
actions = np.zeros([len(action_mb), self.action_size])
actions[np.arange(len(action_mb)), action_mb] = 1
#Apply TD(0)
discount_mb = reward_mb + next_value * gamma * (1 - dones_mb)
state_values = self.model_predict(self.critic,[state_mb]).numpy()
advantages = discount_mb - np.squeeze(state_values)
if print_variables:
print("discount_mb", discount_mb)
print("next_value", next_value)
print("state_values", state_values)
print("advantages", advantages)
else:
self.actor.train_on_batch(
[state_mb, advantages], [actions, discount_mb])
The error is triggered when self.actor.train_on_batch is executed, giving me the error mentioned above. Why this happens and what I'm doing wrong?

problem changing input type in pytorch reinforcement learning

I'm trying to tun a code I found on github, but keep crashing in one part and getting TypeError: only size-1 arrays can be converted to Python scalars error. I've been trying to solve it for two days now. If I understand the problem correctly I have incompatible dtypes of tensors yet I've no idea how to fix this. Everytime I try to change my tensor type to say DoubleTensor, or float64 I get other errors. I simply haven't got a clue now which part needs to be changed and how.
This is my model defined:
class Policy(nn.Module):
def __init__(self):
super(Policy, self).__init__()
self.input_layer = nn.Linear(11, 128)
self.hidden_1 = nn.Linear(128, 128)
self.hidden_2 = nn.Linear(32,31)
self.hidden_state = torch.tensor(torch.zeros(2,1,32)).cuda()
self.rnn = nn.GRU(128, 32, 2)
self.action_head = nn.Linear(31, 5)
self.value_head = nn.Linear(31, 1)
self.saved_actions = []
self.rewards = []
def reset_hidden(self):
self.hidden_state = torch.tensor(torch.zeros(2,1,32)).cuda()
def forward(self, x):
x = torch.tensor(x).cuda()
x = torch.sigmoid(self.input_layer(x))
x = torch.tanh(self.hidden_1(x))
x, self.hidden_state = self.rnn(x.view(1,-1,128), self.hidden_state.data)
x = F.relu(self.hidden_2(x.squeeze()))
action_scores = self.action_head(x)
state_values = self.value_head(x)
return F.softmax(action_scores, dim=-1), state_values
def forward(self, x):
conv_out = self.conv(x).view(x.size()[0], -1)
val = self.fc_val(conv_out)
adv = self.fc_adv(conv_out)
return val + (adv - adv.mean(dim=1, keepdim=True))
def act(self, state):
probs, state_value = self.forward(state)
m = Categorical(probs)
action = m.sample()
if action == 1 and env.state[0] < 1: action = torch.LongTensor([2]).squeeze().cuda.DoubleTensor()
if action == 4 and env.state[1] < 1: action = torch.LongTensor([2]).squeeze().cuda.DoubleTensor()
if action == 6 and env.state[2] < 1: action = torch.LongTensor([2]).squeeze().cuda.DoubleTensor()
self.saved_actions.append((m.log_prob(action), state_value))
return action.item()
This is where it crashes
env.reset()
# In case you're running this a second time with the same model, delete the gradients
del model.rewards[:]
del model.saved_actions[:]
gamma = 0.9
log_interval = 60
def finish_episode():
R = 0
saved_actions = model.saved_actions
policy_losses = []
value_losses = []
rewards = []
for r in model.rewards[::-1]:
R = r + (gamma * R)
rewards.insert(0, R)
rewards = torch.tensor(rewards)
epsilon = (torch.rand(1) / 1e4) - 5e-5
# With different architectures, I found the following standardization step sometimes
# helpful, sometimes unhelpful.
#
rewards = (rewards - rewards.mean()) / (rewards.std(unbiased=False) + epsilon)
# Alternatively, comment it out and use the following line instead:
rewards += epsilon
for (log_prob, value), r in zip(saved_actions, rewards):
reward = torch.tensor(r - value.item()).cuda()
policy_losses.append(-log_prob * reward)
value_losses.append(F.smooth_l1_loss(value, torch.tensor([r]).cuda()))
optimizer.zero_grad()
loss = torch.stack(policy_losses).sum() + torch.stack(value_losses).sum()
loss = torch.clamp(loss, -1e-5, 1e5)
loss.backward()
optimizer.step()
del model.rewards[:]
del model.saved_actions[:]
running_reward = 0
for episode in range(0, 4000):
state = env.reset()
state = state.type(torch.float32)
reward = 0
done = False
msg = None
while not done:
action = model.act(state)
state, reward, done, msg = env.step(action)
model.rewards.append(reward)
if done:
break
running_reward = running_reward * (1 - 1/log_interval) + reward * (1/log_interval)
finish_episode()
# Resetting the hidden state seems unnecessary - it's effectively random from the previous
# episode anyway, more random than a bunch of zeros.
# model.reset_hidden()
if msg["msg"] == "done" and env.portfolio_value() > env.starting_portfolio_value * 1.1 and running_reward > 500:
print("Early Stopping: " + str(int(reward)))
break
if episode % log_interval == 0:
print("""Episode {}: started at {:.1f}, finished at {:.1f} because {} # t={}, \
last reward {:.1f}, running reward {:.1f}""".format(episode, env.starting_portfolio_value, \
env.portfolio_value(), msg["msg"], env.cur_timestep, reward, running_reward))
This is the error I get
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:18: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-74-21b617d2e36f> in <module>()
46 msg = None
47 while not done:
---> 48 action = model.act(state)
49 state, reward, done, msg = env.step(action)
50 model.rewards.append(reward)
4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1751 if has_torch_function_variadic(input, weight):
1752 return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753 return torch._C._nn.linear(input, weight, bias)
1754
1755
RuntimeError: expected scalar type Double but found Float
This is the github I'm using:
https://github.com/tomgrek/RL-stocktrading/blob/master/Finance%20final.ipynb
I was suggested that teh problem is because when I use "state" as a parameter torch throws an error cause it expects a numeric type, but I cannot change state to any float, because I get another error that list can't be changed to float32.
I'll be super grateful if you could show me idiot-proof what I'm doing wrong.
First are you using the same environment "env" and/or dataset ?
Second, you added this line state = state.type(torch.float32) and it didn't throw an error so I suppose state was already a tensor (which is a bit weird). If you had to change the type to float32 then in the next while loop you may have forgotten to change the type.
while not done:
action = model.act(state)
state, reward, done, msg = env.step(action)
state = state.float() # To add as I think env.step(action) returns a long tensor for some reason
model.rewards.append(reward)
if done:
break
Good luck.

Evaluating Parzen window log-likelihood for GAN

I am trying to reimplement the original GAN paper by Ian Goodfellow et al. And I need to show that my implementation achieves same or similar results as the authors achieved. But I am not sure how to evaluate this metric. I took a look at their implementation but I got some funny results. In the paper they report 225 +- 2 on the MNIST for this metric, while the results I get are bellow -400000000. I thought that maybe the model is bad, but it generates really good images of MNIST digits.
Can someone tell me what am I doing wrong?
Bellow is the code which I used. I copied the part of the code from the official implementation.
Note: valid variable are images taken from the MNIST dataset.
def get_nll(x, parzen, batch_size=10):
"""
Credit: Yann N. Dauphin
"""
inds = range(x.shape[0])
n_batches = int(numpy.ceil(float(len(inds)) / batch_size))
print("N batches:", n_batches)
times = []
nlls = []
for i in range(n_batches):
begin = time.time()
nll = parzen(x[inds[i::n_batches]])
end = time.time()
times.append(end-begin)
nlls.extend(nll)
if i % 10 == 0:
print(i, numpy.mean(times), numpy.mean(nlls))
return numpy.array(nlls)
def log_mean_exp(a):
"""
Credit: Yann N. Dauphin
"""
max_ = a.max(1)
return max_ + T.log(T.exp(a - max_.dimshuffle(0, 'x')).mean(1))
def cross_validate_sigma(samples, data, sigmas, batch_size):
lls = []
for sigma in sigmas:
print("Sigma:", sigma)
parzen = theano_parzen(samples, sigma)
tmp = get_nll(data, parzen, batch_size = batch_size)
lls.append(numpy.asarray(tmp).mean())
del parzen
gc.collect()
ind = numpy.argmax(lls)
print(max(lls))
return sigmas[ind]
noise = torch.randn((10000, 100), device=device)
gen_model.eval()
gan_out = gen_model(noise)
sigma_range = numpy.logspace(-1., 0., num=10)
sigma = cross_validate_sigma(gan_out.reshape(10000,-1), valid[0:10000], sigma_range, 100)

SessionRunHook returning empty SessionRunValues after run

I'm trying to write a hook that will allow me to compute some global metrics (rather than batch-wise metrics). To prototype, I thought I'd get a simple hook up and running that would capture and remember true positives. It looks like this:
class TPHook(tf.train.SessionRunHook):
def after_create_session(self, session, coord):
print("Starting Hook")
tp_name = 'metrics/f1_macro/TP'
self.tp = []
self.args = session.graph.get_operation_by_name(tp_name)
print(f"Got Args: {self.args}")
def before_run(self, run_context):
print("Starting Before Run")
return tf.train.SessionRunArgs(self.args)
def after_run(self, run_context, run_values):
print("After Run")
print(f"Got Values: {run_values.results}")
However, the values returned in the "after_run" part of the hook are always None. I tested this in both the train and evaluation phase. Am I misunderstanding something about how the SessionRunHooks are supposed to work?
Maybe relevant information:
The model was build in keras and converted to an estimator with the keras.estimator.model_to_estimator() function. The model has been tested and works fine, and the op that I'm trying to retrieve in the hook is defined in this code block:
def _f1_macro_vector(y_true, y_pred):
"""Computes the F1-score with Macro averaging.
Arguments:
y_true {tf.Tensor} -- Ground-truth labels
y_pred {tf.Tensor} -- Predicted labels
Returns:
tf.Tensor -- The computed F1-Score
"""
y_true = K.cast(y_true, tf.float64)
y_pred = K.cast(y_pred, tf.float64)
TP = tf.reduce_sum(y_true * K.round(y_pred), axis=0, name='TP')
FN = tf.reduce_sum(y_true * (1 - K.round(y_pred)), axis=0, name='FN')
FP = tf.reduce_sum((1 - y_true) * K.round(y_pred), axis=0, name='FP')
prec = TP / (TP + FP)
rec = TP / (TP + FN)
# Convert NaNs to Zero
prec = tf.where(tf.is_nan(prec), tf.zeros_like(prec), prec)
rec = tf.where(tf.is_nan(rec), tf.zeros_like(rec), rec)
f1 = 2 * (prec * rec) / (prec + rec)
# Convert NaN to Zero
f1 = tf.where(tf.is_nan(f1), tf.zeros_like(f1), f1)
return f1
In case anyone runs into the same problem, I found out how to restructure the program so that it worked. Although the documentation makes it sound like I can pass raw ops into the SessionRunArgs, it seems like it requires actual tensors (maybe this is a misreading on my part).
This is pretty easy to accomplish - I just changed the after_create_session code to what's shown below.
def after_create_session(self, session, coord):
tp_name = 'metrics/f1_macro/TP'
self.tp = []
tp_tensor = session.graph.get_tensor_by_name(tp_name+':0')
self.args = [tp_tensor]
And this successfully runs.

Resources