What I have done
I'm using the DQN algorithm from Stable Baselines 3 for a two-player board-type game. In this game, 40 moves are available, but once a move has been made, it can't be made again.
I trained my first model against an opponent that chooses its moves randomly. If the model makes an invalid move, I give it a negative reward equal to the maximum score one can obtain and stop the game.
The issue
Once that was done, I trained a new model against the one obtained from the first run. Unfortunately, the training process eventually gets blocked because the opponent seems to loop on an invalid move. In other words, despite everything I tried in the first training, the first model still predicts invalid moves. Here's the code for the "dumb" opponent:
while(self.dumb_turn):
    # The opponent chooses a move
    chosen_line, _states = model2.predict(self.state, deterministic=True)
    # We check whether the move is valid or not
    while(line_exist(chosen_line, self.state)):
        chosen_line, _states = model2.predict(self.state, deterministic=True)
    # Once a valid move is found, we register it and add it to the state
    self.state[chosen_line] = 1
What I would like to do but don't know how
A solution would be to manually set the Q-values of the invalid moves to -inf, so that the opponent avoids those moves and the training algorithm does not get stuck. I've been told how to access these values:
import torch as th
from stable_baselines3 import DQN
model = DQN("MlpPolicy", "CartPole-v1")
env = model.get_env()
obs = env.reset()
with th.no_grad():
    obs_tensor, _ = model.q_net.obs_to_tensor(obs)
    q_values = model.q_net(obs_tensor)
But I don't know how to set them to -infinity.
If somebody could help me, I would be very grateful.
I recently had a similar problem in which I needed to directly alter the q-values produced by the RL model during training in order to influence its actions.
To do this I overrode some methods of the library:
# Imports
import torch as th
from stable_baselines3 import DQN
from stable_baselines3.dqn.policies import QNetwork, DQNPolicy

# Override some methods of the QNetwork class used by the DQN model in order to set the q-values
# of some actions to a negative value.
# Two possible methods to override:
# Override _predict ---> alter q-values only during predictions but not during training
# Override forward ---> alter q-values also during training (Attention: here we are working with batches of q-values)
class QNetwork_modified(QNetwork):
    def forward(self, obs: th.Tensor) -> th.Tensor:
        """
        Predict the q-values.
        :param obs: Observation
        :return: The estimated Q-Value for each action.
        """
        # Compute the q-values using the QNetwork
        q_values = self.q_net(self.extract_features(obs))
        # For each observation in the batch:
        for i in range(obs.shape[0]):
            # Here you can alter q_values[i]
            pass
        return q_values

# Override the make_q_net method of the DQN policy used by the DQN model to make it use the new Q-network
class DQNPolicy_modified(DQNPolicy):
    def make_q_net(self) -> QNetwork:
        # Make sure we always have separate networks for features extractors etc.
        net_args = self._update_features_extractor(self.net_args, features_extractor=None)
        return QNetwork_modified(**net_args).to(self.device)

model = DQN(DQNPolicy_modified, env, verbose=1)
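For the original question of actually setting those Q-values to -infinity, the placeholder comment above could be filled in roughly as follows. This is only a sketch: it assumes the observation itself is the 40-dimensional binary vector in which a 1 marks an already-played move, so adapt the mask to however your environment encodes that information.

class QNetwork_masked(QNetwork):
    def forward(self, obs: th.Tensor) -> th.Tensor:
        q_values = self.q_net(self.extract_features(obs))
        # Assumption: obs[i, a] == 1 means action a has already been played.
        invalid_mask = obs > 0.5
        # For pure prediction -inf is fine; during training a large finite
        # negative value (e.g. -1e8) avoids possible NaNs in the TD targets.
        q_values = q_values.masked_fill(invalid_mask, float("-inf"))
        return q_values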
Personally I don't like this approach too much, and I would suggest you first try some "more natural" alternatives, such as also giving your model as input some kind of history of which actions have already been selected, in order to help the model learn that previously selected actions should be avoided.
For example, you could enrich the input of the RL model with an additional binary mask where the moves already chosen have their corresponding bit set to 1. (In this case you should modify the gym environment.)
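If you go down that road, here is a minimal sketch of an observation wrapper (not tested against your environment): it assumes the wrapped env has a Box observation space and keeps the already-played moves in a hypothetical attribute env.taken_moves, so adapt both to your own code.

import numpy as np
import gym

class TakenMovesMask(gym.ObservationWrapper):
    """Appends a binary mask of already-played moves to the observation.
    Assumes the wrapped env tracks them in `env.taken_moves` (hypothetical)."""
    def __init__(self, env, n_moves=40):
        super().__init__(env)
        self.n_moves = n_moves
        low = np.concatenate([env.observation_space.low, np.zeros(n_moves)])
        high = np.concatenate([env.observation_space.high, np.ones(n_moves)])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        mask = np.zeros(self.n_moves, dtype=np.float32)
        mask[list(self.env.taken_moves)] = 1.0   # set the bit of each chosen move
        return np.concatenate([obs, mask]).astype(np.float32)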
Let's take the following code:
'''
LSTM class
'''
import torch
import torch.nn as nn
import pandas as pd
import numpy as np

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)

    def forward(self, x):
        # receive an input, create a new hidden state, return output?
        # reset the hidden state?
        hidden = (torch.zeros(self.num_layers, self.hidden_size),
                  torch.zeros(self.num_layers, self.hidden_size))
        x, hidden = self.lstm(x, hidden)
        # since our observation has several sequences, we only want the output after the last sequence of the observation
        x = x[:, -1]
        return x
I have several questions here, and if permitted I would rather ask them all at once than wait 90 minutes between individual posts.
I've seen and followed quite a few examples of LSTMs in PyTorch, and each example seems to treat different pieces a bit differently. Since I'm not an expert in either Python or neural networks, this has led me to a lot of confusion. I'll ask my questions sequentially, in the order they appear in the code above.
I've seen the hidden state defined, zeroed, left out, and ignored entirely in a few different implementations. I know what it's for, but in the implementation I've produced (which is itself an amalgamation of several tutorials) the hidden state doesn't appear to be connected to anything. In the forward function we take a single input, zero the hidden state, then call self.lstm with both. Is this the equivalent of letting the LSTM "handle" the hidden state itself?
Will this properly produce a hidden state?
Am I correct that the optimization only occurs during the training loop? I was using this particular tutorial as an example:
https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
In the optimizer tutorial, I assume y is the true label for the observation; is that correct?
My intention is to use cross-entropy loss, as that seems like the right one for what I'm doing with my data (the labels are not discrete; they are real positive floats within a range, and there are 3 of them), so the output size should be 3. Given the optimizer tutorial, all I need to do is hand the loss function the output from the training step and the correct label, and then backpropagate. Is that correct as well?
I know there's a lot here, so I'd appreciate answers to any of the questions from anyone inclined to help, even if you cannot answer all of them. Thank you for your time.
I've had a talk with a colleague and I've been able to answer some of my own questions, so I'll post those, since it may help others.
The hidden-state zero-out is NOT modifying the hidden layer; it is zeroing out the hidden state at the start because the cells start empty, as you'd expect in any object-oriented language. It turns out this is unnecessary, since for quite some time PyTorch's default has been to zero out these values if they're not initialized manually.
This simple implementation will produce a hidden state as written.
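As a quick sanity check (a minimal sketch, not from the original code): omitting the hidden state when calling an nn.LSTM gives the same result as passing explicit zeros.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=1)
x = torch.randn(5, 3, 4)             # (seq_len, batch, input_size)
h0 = torch.zeros(1, 3, 8)            # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 3, 8)
out_default, _ = lstm(x)             # hidden state omitted -> defaults to zeros
out_explicit, _ = lstm(x, (h0, c0))  # explicitly zeroed hidden state
print(torch.allclose(out_default, out_explicit))  # True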
The answer to this is yes. We don't "grade" the result of the network until we get to the optimization section, or, more specifically, the loss function.
y is the true label; it was not identified in the tutorial. Also, it is important to note that pred is not a "prediction" but a PyTorch object that points to the result of the network acting on the observation that was fed in. In other words, printing out "pred" would not show you a vector of values that represents a prediction.
This is also correct. PyTorch handles the "distance measure" between the true label and the predicted output on its own.
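To illustrate that last point (a small sketch assuming 3 classes encoded as integer labels), nn.CrossEntropyLoss takes the raw network outputs and the true labels directly and applies the softmax internally:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
pred = torch.randn(4, 3, requires_grad=True)  # raw network outputs (logits) for a batch of 4
y = torch.tensor([0, 2, 1, 2])                # true labels as class indices
loss = loss_fn(pred, y)                       # softmax + negative log-likelihood handled internally
loss.backward()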
I have some questions about fine-tuning a causal language model using transformers and PyTorch.
My main goal is to fine-tune XLNet. However, I found that most of the posts online target text classification, like this post. I was wondering: is there any way to fine-tune the model without using run_language_model.py from transformers' GitHub?
Here is a piece of my code trying to fine-tune XLNet:
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased", do_lower_case=True)
LOSS = torch.nn.CrossEntropyLoss()
batch_texts = ["this is sentence 1", "i have another sentence like this", "the final sentence"]
encodings = tokenizer.encode_plus(batch_texts, add_special_tokens=True,
                                  return_tensors="pt", return_attention_mask=True)
outputs = model(encodings["input_ids"], encodings["attention_mask"])
loss = LOSS(outputs[0], target_ids)
loss.backward()
# ignoring the rest of the code...
I got stuck at the last two lines. First, when using this LM model, it seems I don't have any labels, as supervised learning usually does; second, since the language model is supposed to minimize the loss (cross-entropy here), I need target_ids to compute the loss and perplexity against input_ids.
Here are my follow-up questions:
How should I deal with these labels during model fitting?
Should I set something like target_ids=encodings["input_ids"].copy() to compute cross-entropy loss and perplexity?
If not, how should I set target_ids?
From the perplexity page in transformers' documentation, how should I adapt its method for input text of non-fixed length?
I saw another post in the documentation saying that causal language modeling requires padding the text. However, in the link from 3), there is no sign of padding. Which one should I follow?
Any suggestions and advice will be appreciated!
When fine-tuning a model with a language-model head, the labels are the next tokens themselves (you predict the next words). Huggingface's library makes a lot of things very easy to do by hiding most of the complexity of the process within its methods, which is very nice when you want to do something standard. But if you want to do something special, or if you want to learn and understand the details, I suggest going down a level and implementing the training loop directly in PyTorch; coding the low-level stuff is the best way to learn.
For this case, here is a draft to get started; the training loop is far from complete, but it must be adapted to each specific case anyway, so I hope these few lines may help you start...
import torch
import transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained('distilgpt2')
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
# our input:
s = tokenizer.encode('In winter, the weather is', return_tensors='pt')
# we want to fine-tune to force a fake output as follows:
ss = tokenizer.encode('warm and hot', return_tensors='pt')
# forward pass:
outputs = model(s)
# check that the output logits are given for every input token:
print(outputs.logits.size())
# we're gonna train on the token that follows the last input one
# so we extract just the last logit:
lasty = outputs.logits[0, -1].view(1, -1)
# prepare backprop:
lossfct = torch.nn.CrossEntropyLoss()
optimizer = transformers.AdamW(model.parameters(), lr=5e-5)
# just take the first next token (you should repeat this for the next ones)
labels = ss[0][0].view(1)
loss = lossfct(lasty, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
# fine-tuning done: you may check that the answer is already different:
y = model.generate(s)
sy = tokenizer.decode(y[0])
print(sy)
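Regarding your target_ids question: a common shortcut (a sketch, not part of the draft above, shown with the same distilgpt2 model) is to pass the input ids as labels. The model then shifts them internally and returns the next-token cross-entropy loss itself, from which perplexity follows; check the XLNet documentation for the equivalent arguments of its LM head.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained('distilgpt2')
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')

enc = tokenizer('In winter, the weather is cold and wet', return_tensors='pt')
# Passing labels=input_ids makes the model shift them internally and
# return the cross-entropy loss over all next-token predictions.
outputs = model(input_ids=enc['input_ids'],
                attention_mask=enc['attention_mask'],
                labels=enc['input_ids'])
loss = outputs.loss
perplexity = torch.exp(loss)
loss.backward()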
I'm working on a project that involves signal classification. I'm trying different ANN models using Keras to see which one is better, for now focusing on simple networks, but I'm struggling with the LSTM one, following this example: https://machinelearningmastery.com/how-to-develop-rnn-models-for-human-activity-recognition-time-series-classification/.
My inputs are 1D signals that I get from an electronic sensor and that are to be divided into 3 different categories. See here one signal from each of two different categories, so you can see they are quite different over time.
https://www.dropbox.com/s/9ctdegtuyjamp48/example_signals.png?dl=0
To start, we are trying the following simple model. Since the signals are of different lengths, a masking process has been performed on them, padding each one to the length of the longest with the mask value -1000 (a value that cannot occur in our signal). The data is correctly reshaped from 2D to 3D (since the LSTM layer needs 3D input) using the following command, since I only have one feature:
Inputs = Inputs.reshape((Inputs.shape[0],Inputs.shape[1],1))
Then the data is split into training and validation sets and fed into the following model:
model = Sequential()
model.add(Masking(mask_value=-1000, input_shape=(num_steps,1)))
model.add(LSTM(20, return_sequences=False))
model.add(Dense(15, activation='sigmoid'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
However, for some reason, each time the network is trained it predicts that ALL signals are from the same category, possibly a different category each time it is trained, usually the one with the most cases in the input data. If I force the network to be trained with the same amount of data for each category, it keeps giving me the same result.
I don't think this behaviour is normal: bad accuracy could happen, but this must have to do with some elementary error in the model that I'm not spotting, since the training data is correctly inputted (no errors there, rechecked multiple times). Does anyone have any idea why this is happening?
Let me know if any more info could be useful to add to this post.
Just for any curious reader: in the end I resolved it by normalizing the data.
import numpy as np

def LSTM_input_normalize(inputs):
    new_inputs = []
    for in_ in inputs:
        if -1000 in in_:
            start_idx = np.where(in_ == -1000)[0][0]  # index of the first "-1000" in the sequence
        else:
            start_idx = in_.shape[0]
        # compute mean and std of the current sequence
        curr_mean = np.mean(in_[:start_idx])
        curr_std = np.std(in_[:start_idx])
        # normalize the single sequence
        in_[:start_idx] = (in_[:start_idx] - curr_mean) / curr_std
        new_inputs.append(in_)
    return np.array(new_inputs)
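For completeness, a sketch of how this might be wired in before the reshape shown earlier (using the Inputs variable name from the question, after the -1000 padding has been applied):

Inputs = LSTM_input_normalize(Inputs)            # per-sequence normalization, ignoring the padding
Inputs = Inputs.reshape((Inputs.shape[0], Inputs.shape[1], 1))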
I have a script that performs a Gatys-like neural style transfer. It uses a style loss and a total variation loss. I'm using GradientTape() to compute my gradients. The losses I have implemented seem to work fine, but a new loss that I added isn't being properly accounted for by the GradientTape(). I'm using TensorFlow with eager execution enabled.
I suspect it has something to do with how I compute the loss based on the input variable. The input is a 4D tensor (batch, h, w, channels). At the most basic level, the input is a floating-point image, and in order to compute this new loss I need to convert it to a binary image to compute the ratio of one pixel color to another. I don't want to actually change the image like that during every iteration, so I just make a copy of the tensor (in NumPy form) and operate on that to compute the loss. I do not understand the limitations of GradientTape, but I believe it is "losing the thread" of how the input variable is used to get to the loss when it's converted to a NumPy array.
Could I make a copy of the image tensor and perform binarizing operations and loss computation using that? Or am I asking TensorFlow to do something that it just cannot do?
My new loss function:
def compute_loss(self, **kwargs):
    loss = 0
    image = self.model.deprocess_image(kwargs['image'].numpy())
    binarized_image = self.image_decoder.binarize_image(image)
    volume_fraction = self.compute_volume_fraction(binarized_image)
    loss = np.abs(self.volume_fraction_target - volume_fraction)
    return loss
My implementation using the GradientTape:
def compute_grads_and_losses(self, style_transfer_state):
    """
    Computes gradients with respect to input image
    """
    with tf.GradientTape() as tape:
        loss = self.loss_evaluator.compute_total_loss(style_transfer_state)
        total_loss = loss['total_loss']
    return tape.gradient(total_loss, style_transfer_state['image']), loss
Here is an example that I believe illustrates my confusion. The strangest thing is that my code has no problem running; it just doesn't seem to minimize the new loss term whatsoever. But this example won't even run, due to an attribute error: AttributeError: 'numpy.float64' object has no attribute '_id'.
Example:
import tensorflow.contrib.eager as tfe
import tensorflow as tf

def compute_square_of_value(x):
    a = turn_to_numpy(x['x'])
    return a**2

def turn_to_numpy(arg):
    return arg.numpy()  # just return arg to eliminate the error

tf.enable_eager_execution()

x = tfe.Variable(3.0, dtype=tf.float32)
data_dict = {'x': x}
with tf.GradientTape() as tape:
    tape.watch(x)
    y = compute_square_of_value(data_dict)

dy_dx = tape.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
Edit:
From my current understanding, the issue is that my use of the .numpy() operation is what makes the GradientTape lose track of the variable it should compute the gradient from. My original reason for doing this is that my loss operation requires me to physically change the values of the tensor, and I don't want to actually change the values of the tensor that is being optimized; hence the use of a numpy() copy to work on in order to compute the loss properly. Is there any way around this? Or shall I consider my loss calculation impossible to implement because of this constraint of having to perform essentially non-reversible operations on the input tensor?
The first issue here is that GradientTape only traces operations on tf.Tensor objects. When you call tensor.numpy() the operations executed there fall outside the tape.
The second issue is that your first example never calls tape.watch on the image you want to differentiate with respect to.
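As a minimal sketch of the first point (not code from the question): if the whole loss is expressed with TensorFlow ops on the watched tensor, the tape can trace it. Note that a hard threshold has a zero gradient almost everywhere, so a soft binarization via a sigmoid is used here as an assumed stand-in for binarize_image.

import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x style, as in the question; not needed in TF 2.x

image = tf.Variable(tf.random_uniform((1, 8, 8, 3)))  # toy 4D image tensor
volume_fraction_target = 0.5

with tf.GradientTape() as tape:
    # soft "binarization": stays differentiable, unlike a hard threshold
    soft_binary = tf.sigmoid(50.0 * (image - 0.5))
    volume_fraction = tf.reduce_mean(soft_binary)
    loss = tf.abs(volume_fraction_target - volume_fraction)

grads = tape.gradient(loss, image)  # non-None: the tape traced every op
print(grads.shape)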
I have just started with TensorFlow. I was checking out the MusicGenerator available at https://github.com/Conchylicultor/MusicGenerator
I am getting this error:
ValueError: Trying to share variable
rnn_decoder/KeyboardCell/Decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel,
but specified shape (1024, 2048) and found shape (525, 2048).
I think this might be due to a variable shared between the encoder and decoder of the cell. The main code was written for TensorFlow 0.10.0, but I am trying to run it on TensorFlow 1.3.
def __call__(self, prev_keyboard, prev_state, scope=None):
    """ Run the cell at step t
    Args:
        prev_keyboard: keyboard configuration for the step t-1 (Ground truth or previous step)
        prev_state: a tuple (prev_state_enco, prev_state_deco)
        scope: TensorFlow scope
    Return:
        Tuple: the keyboard configuration and the enco and deco states
    """
    # First time only (we do the initialisation here to be on the global rnn loop scope)
    if not self.is_init:
        with tf.variable_scope('weights_keyboard_cell'):
            # TODO: With self.args, see which network we have chosen (create map 'network name':class)
            self.encoder.build()
            self.decoder.build()
            prev_state = self.encoder.init_state(), self.decoder.init_state()
            self.is_init = True
    # TODO: If encoder act as VAE, we should sample here, from the previous state
    # Encoder/decoder network
    with tf.variable_scope(scope or type(self).__name__):
        with tf.variable_scope('Encoder'):
            # TODO: Should be enco_output, enco_state
            next_state_enco = self.encoder.get_cell(prev_keyboard, prev_state)
        with tf.variable_scope('Decoder'):  # Reset gate and update gate.
            next_keyboard, next_state_deco = self.decoder.get_cell(prev_keyboard, (next_state_enco, prev_state[1]))
    return next_keyboard, (next_state_enco, next_state_deco)
I am completely new to RNNs and CNNs. I have been reading a bit about them and understand at a high level how they work, and I understand how some parts of the code actually work for training and modelling, but I don't think it's enough to debug this, especially because I am also a bit confused by the TensorFlow API.
It would be great to know why this might be happening and what I can do to fix it. It would also help if you could point me to some books on CNNs, RNNs, backpropagation, and how to use TensorFlow effectively to build things.