I want to iteratively quantize my model. That means that after the normal training loop, which is usually implemented in training_step, I would like to iteratively quantize some of the parameters and then retrain the model for a couple of steps.
I don't really see how this would be done in Lightning. I could add a new method to the class, but then I would have to write the training loop myself again?
Perhaps you could add something like this to training_step:
step = self.global_step
if step % quantization_period == 0 and step > 0:
    self.submodules[step // quantization_period].half()
You can also override LightningModule.on_epoch_end(self) to do something after every training epoch.
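Fleshed out, a minimal sketch could look like the following. Note that compute_loss, quantization_period, and submodules are placeholder names for your own loss function, schedule, and list of modules to quantize; none of them are Lightning API:

import pytorch_lightning as pl

class IterativeQuantModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # placeholder for the usual loss computation
        step = self.global_step
        # Every quantization_period steps, quantize the next submodule;
        # the steps in between fine-tune the remaining full-precision weights.
        if step % self.quantization_period == 0 and step > 0:
            idx = step // self.quantization_period
            if idx < len(self.submodules):
                self.submodules[idx].half()
        return loss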
What I have done
I'm using the DQN algorithm in Stable Baselines 3 for a two-player board game. In this game, 40 moves are available, but once a move has been made, it can't be made again.
I trained my first model against an opponent that chooses its moves randomly. If the model makes an invalid move, I give it a negative reward equal to the maximum score one can obtain and stop the game.
The issue
Once that was done, I trained a new model against the one I obtained from the first run. Unfortunately, the training process eventually gets blocked, as the opponent seems to loop on an invalid move. That means that, despite everything I tried in the first training, the first model still predicts invalid moves. Here's the code for the "dumb" opponent:
while self.dumb_turn:
    # The opponent chooses a move
    chosen_line, _states = model2.predict(self.state, deterministic=True)
    # We check whether the move is valid
    while line_exist(chosen_line, self.state):
        chosen_line, _states = model2.predict(self.state, deterministic=True)
    # Once a valid move is found, we register it and add it to the state space
    self.state[chosen_line] = 1
What I would like to do but don't know how
A solution would be to manually set the Q-values of the invalid moves to -inf, so that the opponent avoids those moves and the training algorithm does not get stuck. I've been told how to access these values:
import torch as th
from stable_baselines3 import DQN
model = DQN("MlpPolicy", "CartPole-v1")
env = model.get_env()
obs = env.reset()
with th.no_grad():
    obs_tensor, _ = model.q_net.obs_to_tensor(obs)
    q_values = model.q_net(obs_tensor)
But I don't know how to set them to -infinity.
If somebody could help me, I would be very grateful.
I recently had a similar problem, in which I needed to directly alter the Q-values produced by the RL model during training in order to influence its actions.
To do this I overrode some methods of the library:
# Imports
import torch as th
from stable_baselines3.dqn.policies import QNetwork, DQNPolicy
# Override some methods of the QNetwork class used by the DQN model,
# in order to set the q-values of some actions to a negative value.
# Two possible methods to override:
# Override _predict ---> alters q-values only during prediction, but not during training
# Override forward ---> alters q-values during training as well (attention: here we are working with batches of q-values)
class QNetwork_modified(QNetwork):
    def forward(self, obs: th.Tensor) -> th.Tensor:
        """
        Predict the q-values.
        :param obs: Observation
        :return: The estimated Q-Value for each action.
        """
        # Compute the q-values using the QNetwork
        q_values = self.q_net(self.extract_features(obs))
        # For each observation in the training batch:
        for i in range(obs.shape[0]):
            # Here you can alter q_values[i]
            pass
        return q_values
# Override the make_q_net method of the DQN policy used by the DQN model,
# so that it uses the new Q-network
class DQNPolicy_modified(DQNPolicy):
    def make_q_net(self) -> QNetwork:
        # Make sure we always have separate networks for features extractors etc.
        net_args = self._update_features_extractor(self.net_args, features_extractor=None)
        return QNetwork_modified(**net_args).to(self.device)

model = DQN(DQNPolicy_modified, env, verbose=1)
Personally I don't like this approach too much, and I would suggest you first try some "more natural" alternatives, like giving your model as input some kind of history of the actions that have already been selected, in order to help the model learn that previously selected actions should be avoided.
For example, you could enrich the input of the RL model with an additional binary mask in which the moves already chosen have their corresponding bit set to 1 (in this case you would need to modify the gym environment).
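As a rough illustration of how the forward override and the binary mask could fit together, here is a minimal sketch. The assumption that the last 40 entries of each observation are exactly the already-played mask is mine, not from the original code; adapt the slicing to your actual observation layout:

import torch as th
from stable_baselines3.dqn.policies import QNetwork

class MaskedQNetwork(QNetwork):
    def forward(self, obs: th.Tensor) -> th.Tensor:
        q_values = self.q_net(self.extract_features(obs))
        # Assumption: the last 40 entries of each observation form the
        # binary "already played" mask (1 = move taken, 0 = still free).
        played = obs[:, -40:].bool()
        # Invalid moves get -inf, so an argmax can never select them.
        q_values = q_values.masked_fill(played, -float("inf"))
        return q_values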
Is it possible to wrap a pytorch model inside another pytorch module? I could not do it the normal way as in transfer learning (simply concatenating some more layers), because in order to get the intended value for the next 'layer', I need to wait for the last layer of the first module to generate multiple outputs (say 100) and use all those outputs to get the value for the next 'layer' (say, by taking the max of those outputs). I tried to define the integrated model as something like the following:
class integrated(nn.Module):
    def __init__(self):
        super(integrated, self).__init__()

    def forward(self, x):
        device = torch.device('cpu')
        model = VAE(
            encoder_layer_sizes=args.encoder_layer_sizes,
            latent_size=args.latent_size,
            decoder_layer_sizes=args.decoder_layer_sizes,
            conditional=args.conditional,
            num_labels=10 if args.conditional else 0).to(device)
        # the first model is saved somewhere else beforehand
        model.load_state_dict(torch.load(r'...'))
        model.eval()
        temp = []
        for j in range(100):
            out = model(x)
            temp.append(out)
        y = max(temp)
        return y
The reason I would like to do this is that the library I need to use requires the input itself to be a pytorch module. Otherwise I could simply leave the last part outside of the module.
Yes you can definitely use a Pytorch module inside another Pytorch module. The way you are doing this in your example code is a bit unusual though, as external modules (VAE, in your case) are more often initialized in the __init__ function and then saved as attributes of the main module (integrated). Among other things, this avoids having to reload the sub-module every time you call forward.
One other thing that looks a bit funny is your for loop over repeated invocations of model(x). If there is no randomness involved in the model's evaluation, then you would only need a single call to model(x), since all 100 calls would give the same value. So assuming there is some randomness, you should consider whether you can get the desired effect by batching together 100 copies of x and using a single call to model with this batched input. This ultimately depends on why you are calling this function multiple times on the same input, but either way, a single batched evaluation will be a lot faster than many unbatched evaluations.
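Here is a minimal sketch of both suggestions combined. It assumes the sub-module takes a single input of shape (1, features) and returns a single tensor; both are assumptions, so adapt it to your model's actual interface:

import torch
import torch.nn as nn

class Integrated(nn.Module):
    def __init__(self, vae: nn.Module, n_samples: int = 100):
        super().__init__()
        # Initialize the sub-module once and keep it as an attribute,
        # instead of rebuilding and reloading it on every forward call.
        self.vae = vae
        self.n_samples = n_samples

    def forward(self, x):
        # Repeat the input along the batch dimension and evaluate the
        # (stochastic) sub-module in one batched call.
        x_rep = x.expand(self.n_samples, -1)  # assumes x has shape (1, features)
        outs = self.vae(x_rep)                # assumed shape: (n_samples, out_features)
        # Elementwise max over the n_samples outputs.
        return outs.max(dim=0).values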
I am reproducing the code of TensorFlow's Time series forecasting tutorial.
They use tf.data to shuffle, batch, and cache the dataset. More precisely they do the following:
BATCH_SIZE = 256
BUFFER_SIZE = 10000
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()
I can't understand why they use repeat() and, even more so, why they don't specify the count argument of repeat. What is the point of making the process repeat indefinitely? And how can the algorithm read all the elements in an infinitely big dataset?
As can be seen in the TensorFlow Federated tutorials on image classification, the repeat method is used to create repetitions of the dataset, which in turn determines the number of epochs for training.
So use .repeat(NUM_EPOCHS), where NUM_EPOCHS is the number of training epochs.
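Applied to the tutorial's pipeline, that would look something like this (a sketch; the NUM_EPOCHS value is a placeholder):

NUM_EPOCHS = 10
train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
# Repeat the dataset a fixed number of times instead of indefinitely:
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)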
Warning: I am a Deep learning noob
I am training my two-layer LSTM model on a dataset of jokes (231657 jokes) and want to know 4 things:
1. I currently train it on 50 chars per sentence. If I want it to generate new jokes, do I need to input 50 chars first, or can I randomly pick one char to start the sentence/joke?
2. Is it a problem to train it on only 50 chars per sample, with 1.8 million samples in total (the input tensor has shape [10800001, 50, 1]), or is that fine?
3. I use a class in which I init my model so I can call it. Unfortunately, if I want to create a long sentence or multiple sentences, I have to call my predict statement more than once; the problem is that my predict statement inits the model first and then predicts the value, so I have to use tf.reset_default_graph(), and after a while each call takes longer. What should I do to prevent this problem? Should I init the model in the main script or something like that?
4. How do I handle the growing text? I currently take the shape of the input and use it for my model initialization in my class, but is this a good idea?
1. You need to start by inputting a seed sequence of 50 characters.
2. I'd suggest you increase the sequence length.
3. I don't understand you very well, but I suggest you structure your model properly. Read this for more: https://danijar.com/structuring-your-tensorflow-models/
4. Again, I suggest you read the link above.
It's not always necessary to build your model as a class. You can just build the model once in a procedural way, train it, and then save it using tf.train.Saver().
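A minimal TF1-style sketch of that build-once/save/restore pattern (the graph-building details are placeholders):

import tensorflow as tf

# Build the graph once, procedurally (model, loss, and optimizer go here).
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop ...
    saver.save(sess, "./model.ckpt")

# Later, restore the trained variables instead of rebuilding the model
# for every prediction (no tf.reset_default_graph() needed per call):
with tf.Session() as sess:
    saver.restore(sess, "./model.ckpt")
    # ... run as many predictions as needed ...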
How can I test my pytorch model on validation data during training?
I know that there is the function myNet.eval(), which apparently switches off any dropout layers, but does it also prevent gradients from being accumulated?
Also how would I undo the myNet.eval() command in order to continue with the training?
If anyone has some code snippet / toy example I would be grateful!
How can I test my pytorch model on validation data during training?
There are plenty of examples with train and test steps for every epoch during training. An easy one is the official MNIST example. Since pytorch does not offer any high-level training, validation, or scoring framework, you have to write it yourself. Commonly this consists of:
a data loader (commonly based on torch.utils.data.DataLoader)
a main loop over the total number of epochs
a train() function that uses training data to optimize the model
a test() or valid() function to measure the effectiveness of the model given validation data and a metric
This is also what you will find in the linked example.
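As a minimal sketch of this structure (model, train_set, and valid_set are placeholders you would supply):

import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

def train():
    model.train()  # training mode: dropout etc. active
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()

def valid():
    model.eval()  # evaluation mode: dropout off, batch-norm uses running stats
    correct = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for data, target in valid_loader:
            pred = model(data).argmax(dim=1)
            correct += (pred == target).sum().item()
    return correct / len(valid_loader.dataset)

for epoch in range(10):
    train()
    print("epoch %d: validation accuracy %.3f" % (epoch, valid()))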
Alternatively you can use a framework that provides basic looping and validation facilities so you don't have to implement everything by yourself all the time.
tnt is torchnet for pytorch, supplying you with different metrics (such as accuracy) and abstraction of the train loop. See this MNIST example.
inferno and torchsample attempt to model things very similar to Keras and provide some tools for validation
skorch is a scikit-learn wrapper for pytorch that lets you use all the tools and metrics from sklearn
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train() or, alternatively, supply a boolean to switch between eval and training: myNet.train(True) for train mode.
I know that there is the function myNet.eval() which apparantly switches of any dropout layers, but is it also preventing the gradients from being accumulated?
It doesn't prevent gradients from accumulating.
But I think during testing you do want to ignore gradients. In that case, you should mark the variable input to the network as volatile=True, and it will save some time and space used in the forward calculation. (Note that volatile was removed in later PyTorch versions in favor of the torch.no_grad() context manager.)
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train()