Pytorch - RuntimeError: invalid multinomial distribution (encountering probability entry < 0) - pytorch

I am using Stable Baselines 3 to train an agent to play Connect 4 game. I am trying to take the case into account when an agent starts a game as a second player.
self.env = self.ks_env.train([opponent, None])
When I am trying to run the code, I am getting the following error:
invalid multinomial distribution (encountering probability entry < 0)
/opt/conda/lib/python3.7/site-packages/torch/distributions/categorical.py in sample(self, sample_shape)
samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
However, there is no problem when an agent is first player:
self.env = self.ks_env.train([None, opponent])
I think problem is related to the Pytorch library. My question is how can I fix this issue?

After checking your provided code, the problem doesn't seem to come from what agent starts the game but from not restarting the environment after a game is done.
I just changed your step function as shown:
def step(self, action):
# Check if agent's move is valid
is_valid = (self.obs['board'][int(action)] == 0)
if is_valid: # Play the move
self.obs, old_reward, done, _ = self.env.step(int(action))
reward = self.change_reward(old_reward, done)
else: # End the game and penalize agent
reward, done, _ = -10, True, {}
if done:
self.reset()
return board_flip(self.obs.mark,
np.array(self.obs['board']).reshape(1, self.rows, self.columns) / 2),
reward, done, _
With this, the model was able to train and you can check that it works as expected with the following snippet:
done = True
for step in range(500):
if done:
state = env.reset()
state, reward, done, info = env.step(env.action_space.sample())
print(reward)
Link to my version of your notebook

Related

RuntimeError: Trying to backward through the graph a second time. Saved intermediate values of the graph are freed when you call .backward()

I am trying to train SRGAN from scratch. I have read solutions for this type of problem, but it would be great if someone could help me debug my code. The exact error is: "RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad()" Here is the snippet I am trying to train:
gen_model = Generator().to(device, non_blocking=True)
disc_model = Discriminator().to(device, non_blocking=True)
opt_gen = optim.Adam(gen_model.parameters(), lr=0.01)
opt_disc = optim.Adam(disc_model.parameters(), lr=0.01)
from torch.nn.modules.loss import BCELoss
def train_model(gen, disc):
for epoch in range(20):
run_loss_disc = 0
run_loss_gen = 0
for data in train:
low_res, high_res = data[0].to(device, non_blocking=True, dtype=torch.float).permute(0, 3, 1, 2),data[1].to(device, non_blocking=True, dtype=torch.float).permute(0, 3, 1, 2)
#--------Discriminator-----------------
gen_image = gen(low_res)
gen_image = gen_image.detach()
disc_gen = disc(gen_image)
disc_real = disc(high_res)
p=nn.BCEWithLogitsLoss()
loss_gen = p(disc_real, torch.ones_like(disc_real))
loss_real = p(disc_gen, torch.zeros_like(disc_gen))
loss_disc = loss_gen + loss_real
opt_disc.zero_grad()
loss_disc.backward()
run_loss_disc+=loss_disc
#---------Generator--------------------
cont_loss = vgg_loss(high_res, gen_image)
adv_loss = 1e-3*p(disc_gen, torch.ones_like(disc_gen))
gen_loss = cont_loss+(10^-3)*adv_loss
opt_gen.zero_grad()
gen_loss.backward()
opt_disc.step()
opt_gen.step()
run_loss_gen+=gen_loss
print("Run Loss Discriminator: %d", run_loss_disc)
print("Run Loss Generator: %d", run_loss_gen)
train_model(gen_model, disc_model)
Apparently your disc_gen value was discarded by the first backward() call, as it says.
It should work if you change the discriminator part a bit:
gen_image = gen(low_res)
disc_gen = disc(gen_image.detach())
and add this at the start of the generator part:
disc_gen = disc(gen_image)

real tme keras rl DQN predictions

hello everyone I followed that tutorial
https://www.youtube.com/watch?v=hCeJeq8U0lo&list=PLgNJO2hghbmjlE6cuKMws2ejC54BTAaWV&index=2
to train a DQN Agent
everything works
env = gym.make('CartPole-v0')
states = env.observation_space.shape[0]
actions = env.action_space.n
episodes = 10
for episode in range(1, episodes+1):
state = env.reset()
done = False
score = 0
while not done:
env.render()
action = random.choice([0,1])
n_state, reward, done, info = env.step(action)
score+=reward
print('Episode:{} Score:{}'.format(episode, score))
now rather than make a random choice I want to use the DQN without having to do
dqn.test(env, steps=10)
something like dqn.predict but I did not find that in their documentation can you help please
dqn.forward(state)
it is the same function in the code of the testing in its github repo
https://github.com/taylormcnally/keras-rl2/blob/master/rl/agents/dqn.py

Token indices sequence length is longer than the specified maximum sequence length for this model (28627 > 512)

I am using BERT's Huggingface DistilBERT model as a backend for a question and answer application. The text I am using with which to train the model is one very large single text field. Even though the text field is a single string, the punctuation was left in place as a clue for BERT. When I execute the application I am getting the "Token indices sequence length error". I am using the transformer.encodeplus() method to pass the text into the model. I have tried various mechanisms to truncate the input ids to a length <= to 512.
I am currently using Windows 10 but I will also be porting the code to a Raspberry Pi 4 platform.
The code is failing at this line:
start_scores, end_scores = model(torch.tensor([input_ids]), attention_mask=torch.tensor([attention_mask]))
I am attempting to perform the truncation at this line:
encoding = tokenizer.encode_plus(question, tokenizer(context, truncation=True).input_ids)
The entire code is here:
from transformers import AutoTokenizer, DistilBertTokenizer, DistilBertForQuestionAnswering
import torch
# globals - set once used everywhere
tokenizer = None
model = None
context = ''
def establishSettings():
global tokenizer, model, context
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', return_token_type_ids=True, model_max_length=512)
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad', return_dict=False)
# context = "Some 1,500 volcanoes are still considered potentially active around the world today 161 of those over 10 percent sit within the boundaries of the United States."
# get the volcano corpus
with open('volcanic.corpus', encoding="utf8") as file:
context = file.read().replace('\n', '')
print(len(tokenizer(context, truncation=True).input_ids))
def askQuestion(question):
global tokenizer, model, context
print("\nQuestion ", question)
encoding = tokenizer.encode_plus(question, tokenizer(context, truncation=True).input_ids)
input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]
start_scores, end_scores = model(torch.tensor([input_ids]), attention_mask=torch.tensor([attention_mask]))
ans_tokens = input_ids[torch.argmax(start_scores): torch.argmax(end_scores) + 1]
answer_tokens = tokenizer.convert_ids_to_tokens(ans_tokens, skip_special_tokens=True)
#all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
return answer_tokens
def main():
# set the global itmes once
establishSettings()
# ask a question
question = "How many potentially active volcanoes are there in the world today?"
answer_tokens = askQuestion(question)
print("answer_tokens: ", answer_tokens)
if len(answer_tokens) == 0:
answer = "Sorry, I don't have an answer for that one. Ask me another question about New Mexico volcanoes."
print(answer)
else:
answer_tokens_to_string = tokenizer.convert_tokens_to_string(answer_tokens)
print("\nFinal Answer : ")
print(answer_tokens_to_string)
if __name__ == '__main__':
main()
What is the best way to truncate the input.ids to <= 512 in length.
Edit this line:
encoding = tokenizer.encode_plus(question, tokenizer(context, truncation=True).input_ids)
to
encoding = tokenizer.encode_plus(question, tokenizer(context, truncation=True, max_length=512).input_ids)

how to predict a character based on character based RNN model?

i want to create a prediction function which complete a part of "sentence"
the model used here is a character based RNN(LSTM). what are the steps we should fellow ?
i tried this but i can't give as input the sentence
def generate(self) -> Tuple[List[Token], torch.tensor]:
start_symbol_idx = self.vocab.get_token_index(START_SYMBOL, 'tokens')
# print(start_symbol_idx)
end_symbol_idx = self.vocab.get_token_index(END_SYMBOL, 'tokens')
padding_symbol_idx = self.vocab.get_token_index(DEFAULT_PADDING_TOKEN, 'tokens')
log_likelihood = 0.
words = []
state = (torch.zeros(1, 1, self.hidden_size), torch.zeros(1, 1, self.hidden_size))
word_idx = start_symbol_idx
for i in range(self.max_len):
tokens = torch.tensor([[word_idx]])
embeddings = self.embedder({'tokens': tokens})
output, state = self.rnn._module(embeddings, state)
output = self.hidden2out(output)
log_prob = torch.log_softmax(output[0, 0], dim=0)
dist = torch.exp(log_prob)
word_idx = start_symbol_idx
while word_idx in {start_symbol_idx, padding_symbol_idx}:
word_idx = torch.multinomial(
dist, num_samples=1, replacement=False).item()
log_likelihood += log_prob[word_idx]
if word_idx == end_symbol_idx:
break
token = Token(text=self.vocab.get_token_from_index(word_idx, 'tokens'))
words.append(token)
return words, log_likelihood,start_symbol_idx
Here are two tutorial on how to use machine learning libraries to generate text Tensorflow and PyTorch.
this code snippet is the part of allennlp "language model" tutorial, here the generate function is defined to compute the probability of tokens and find the best token and sequence of tokens according to the maximum likelihood of model output, the full code is in the colab notebook bellow you can refer to it: https://colab.research.google.com/github/mhagiwara/realworldnlp/blob/master/examples/generation/lm.ipynb#scrollTo=8AU8pwOWgKxE
after training the the language model for using this function you can say:
for _ in range(50):
tokens, _ = model.generate()
print(''.join(token.text for token in tokens))

Training procedure

I noticed that the sklearn.linear_model.SGDClassifier implements gradient descent for a linear model, therefore one could state that the class combines the fitting procedure (SGD) and the model (linear model) in one class.
SGD is, however, not inherent to linear models, and linear_models can be trained using many other optimizations with particular pro's (memory usage, convergence speed, local optima avoidance*, ...). One could state that such optimization techniques implement how to iterate over the training data, whether to do it online or offline, in what feature-dimension to apply an update and when to stop (possibly based on an error function of a validation set).
In particular, I implemented a model using theano and wrapped it in a fit/predict interface. Theano is cool because it allows one to define a callable which applies gradient descent on one sample, or on a set of samples, as well as a callable which returns the error on a validation set. But this coolness is not inherent to theano, a lot more models can simply define an update and error-evaluation function, which can then be used by different iteration- and stopping policies for fitting.
The theano examples often use minibatch, and the minibatch code is copy-pasted or reimplemented a lot with just minor adjustments which can easily be factored out. So I was hoping that sklearn implements something that you initialize with some parameters and an update/error callable to fit 'any model'. Or possible there is some good practice on how to do this yourself (especially w.r.t. the interface of the fitter).
Is there anything like this (in sklearn), that is Fitters which do not define the model?
*In the particular case of linear models and a l2 cost function, local optima do not exist of course, but still.
EDIT
Fair enough, this calls for a suggestion. I coded these two classes, which are not 100% clean, but they given an idea of what I mean:
import numpy
class StochasticUpdate():
def __init__(self, model, update, n_epochs, n_data_points, error=None, test_fraction=None):
self.update = update
self.n_epochs = n_epochs
self.n_data_points = n_data_points
self.error = error
self.model = model
if self.error is None and test_fraction is not None:
raise ValueError('error parameter must be specified if a test_faction (value: %s) should be used.' % test_fraction)
self.do_test = test_fraction is not None
self.n_train_samples = int(n_data_points - test_fraction) if self.do_test else n_data_points
if self.do_test:
self.test_range = numpy.arange(self.n_train_samples, n_data_points)
self.n_test_samples = int(n_data_points * test_fraction)
self.train_range = numpy.arange(0, self.n_train_samples)
def fit(self):
if self.do_test: self.test_errors = []
self.train_errors = []
self.mean_cost_values = []
for epoch in range(self.n_epochs):
order = numpy.random.permutation(self.n_train_samples)
mean_cost_value = 0
for i in range(self.n_train_samples):
mean_cost_value += self.update([order[i]])
self.mean_cost_values.append(mean_cost_value/ self.n_data_points)
if self.error is not None:
self.train_errors.append(self.error(self.train_range))
if self.do_test:
self.test_errors.append(self.error(self.test_range))
return self.model
from math import ceil
class MinibatchStochasticUpdate(StochasticUpdate):
def __init__(self, model, update, n_epochs, n_data_points, error, batch_size, patience=5000, patience_increase=2,
improvement_threshold = 0.995, validation_frequency=None, validate_faction=0.1, test_fraction=None):
super().__init__(self, model, update, n_data_points, error, test_fraction)
self.update = update
self.n_epochs = n_epochs
self.n_data_points = n_data_points
self.model = model
self.batch_size = batch_size
self.patience = patience
self.patience_increase = patience_increase
self.improvement_threshold = improvement_threshold
self.n_validation_samples = int(n_data_points * validate_faction)
self.validation_range = numpy.arange(self.n_train_samples, self.n_train_samples + self.n_validation_samples)
self.n_train_batches = int(ceil(n_data_points / batch_size))
self.n_train_batches = int(ceil(self.n_train_samples / self.batch_size))
self.train_batch_ranges = [
numpy.arange(minibatch_index * self.batch_size, min((minibatch_index+1) * self.batch_size, self.n_train_samples))
for minibatch_index in range(self.n_train_batches)
]
self.validation_frequency = min(self.n_train_batches, patience/2) if validation_frequency is None else validation_frequency
def fit(self):
self.best_validation_error = numpy.inf
best_params = None
iteration = 0
for epoch in range(self.n_epochs):
for minibatch_index in range(self.n_train_batches):
self.update(self.train_batch_ranges[minibatch_index])
if (iter + 1) % self.validation_frequency == 0:
current_validation_error = self.error(self.validation_error)
if current_validation_error < self.best_validation_error:
if current_validation_error < self.best_validation_error * self.improvement_threshold:
patience = max(self.patience, iter * self.patience_increase)
best_params = self.model.copy_parameters()
self.best_validation_error = current_validation_error
if iteration <= patience:
self.model.set_parameters(best_params)
return self.model
iteration += 1
self.model.set_parameters(best_params)
return self.model
Then for in the fit of the model one could support different training approaches and stopping criteria like this:
def fit(self, X, y):
X_shared = theano.shared(X, borrow=True)
y_shared = theano.shared(y, borrow=True)
learning_rate = self.training_method_options['learning_rate']
trainer = {
'stochastic_gradient_descent': lambda: StochasticUpdate(
self,
update=self.update_stochastic_gradient_descent_function(X_shared, y_shared, learning_rate),
n_epochs=self.training_method_options['n_epochs'],
n_data_points=X.shape[0],
error=self.evaluation_function(X_shared, y_shared),
),
'minibatch_gradient_descent': lambda: MinibatchStochasticUpdate(
self,
update=self.update_stochastic_gradient_descent_function(X_shared, y_shared, learning_rate),
n_epochs=self.training_method_options['n_epochs'],
n_data_points=X.shape[0],
error=self.evaluation_function(X_shared, y_shared),
batch_size=self.training_method_options['batch_size']
)
}[self.training_method]()
trainer.fit()
return self
Obviosuly the hash-map part is hacky, and could be done more elegantly using a standardized interface for the two classes above (since the hashmaps are still O(N*M) in size for N fitters and M models).

Resources