nvidia_deeplearningexamples_tacotron2: RuntimeError: CUDA error: invalid device function - pytorch

Set up the runtime: Python 3 and GPU.
Run the code step by step.
The code only ran successfully the first time. After that, running the part below raised "RuntimeError: CUDA error: invalid device function":
sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
sequence = torch.from_numpy(sequence).to(device='cuda', dtype=torch.int64)
with torch.no_grad():
    _, mel, _, _ = tacotron2.infer(sequence)
    audio = waveglow.infer(mel)
audio_numpy = audio[0].data.cpu().numpy()
rate = 22050
Do you know the root cause? And can the pre-trained model be run on a local CPU?

At the time of writing, you can solve this issue by adding
!pip install torch==1.1.0 torchvision==0.3.0
before the import torch line in https://colab.research.google.com/github/pytorch/pytorch.github.io/blob/master/assets/hub/nvidia_deeplearningexamples_tacotron2.ipynb
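As for the second question: the pretrained models can, in principle, also be run on a local CPU, just much more slowly. The following is only a sketch under a few assumptions (the torch.hub entry points 'nvidia_tacotron2' and 'nvidia_waveglow' used by the notebook, and that the downloaded checkpoints load on a CPU-only machine), not a tested recipe:
import numpy as np
import torch

# Load the hub models and move them to the CPU (entry-point names assumed from the notebook).
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2').to('cpu').eval()
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow').to('cpu').eval()

text = "Hello world, I missed you so much."
sequence = np.array(tacotron2.text_to_sequence(text, ['english_cleaners']))[None, :]
sequence = torch.from_numpy(sequence).to(device='cpu', dtype=torch.int64)

with torch.no_grad():
    _, mel, _, _ = tacotron2.infer(sequence)
    audio = waveglow.infer(mel)
audio_numpy = audio[0].data.cpu().numpy()
rate = 22050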

Related

Using HuggingFace pipeline on pytorch mps device M1 pro

I want to run the pipeline abstraction for the zero-shot-classification task on the MPS device. Here is my code:
pipe = pipeline('zero-shot-classification', device=mps_device)
seq = "i love watching the office show"
labels = ['negative', 'positive']
pipe(seq, labels)
The error generated is:
RuntimeError: Placeholder storage has not been allocated on MPS device!
My guess is that this happens because seq is on my CPU and not on MPS. How can I fix this?
Is there a way to send seq to the MPS device so that I can pass it to the pipe for inference?
Thanks
When I had a similar problem, it was fixed by doing model = model.to("mps"), though that shouldn't have been necessary in your case.
The following code works on my machine (setting PYTORCH_ENABLE_MPS_FALLBACK lets any op that is not yet implemented on MPS fall back to the CPU):
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
from transformers import pipeline
mps_device = "mps"
pipe = pipeline('zero-shot-classification', device=mps_device)
seq = "i love watching the office show"
labels = ['negative', 'positive']
pipe(seq, labels)

save model output in pytorch

dic = []
for step, batch in tqdm(enumerate(train_dataloader)):
    inpt = batch[0].to(device)
    msks = batch[1].to(device)
    # Run the sentences through the model
    outputs = model_obj(inpt, msks)
    dic.append({
        'hidden_states': outputs[2],
        'pooled_output': outputs[1]})
I want to save the model output in each iteration, but I get the error below even for a small dataset:
RuntimeError: CUDA out of memory.
Note that without the following line my model works correctly:
dic.append({'hidden_states': outputs[2], 'pooled_output': outputs[1]})
How can I save these outputs in each iteration?
First of all, you should always post the full error stack trace. Secondly, you should move the outputs off your GPU when you want to store them, to free up memory:
dic.append({
    'hidden_states': outputs[2].detach().cpu().tolist(),
    'pooled_output': outputs[1].detach().cpu().tolist()
})
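For context, here is a sketch of the whole loop from the question with the outputs detached and moved to the CPU. model_obj, train_dataloader and device are the placeholders from the question; wrapping the loop in torch.no_grad() is an added assumption that no backward pass is needed while collecting outputs, which also prevents the computation graph from being kept alive by every stored tensor:
import torch
from tqdm import tqdm

dic = []
with torch.no_grad():  # assumption: gradients are not needed while collecting outputs
    for step, batch in tqdm(enumerate(train_dataloader)):
        inpt = batch[0].to(device)
        msks = batch[1].to(device)
        outputs = model_obj(inpt, msks)
        dic.append({
            # detach from the graph and copy to host memory so GPU memory can be freed
            # (if outputs[2] is a tuple of per-layer tensors, apply this to each element)
            'hidden_states': outputs[2].detach().cpu(),
            'pooled_output': outputs[1].detach().cpu(),
        })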

Running multiple inferences in parallel with PyTorch

I'm trying to implement Double DQN (not to be confused with DQN with a slightly delayed Q-target network) in PyTorch to train an agent to play an Atari OpenAI Gym game. Here I discuss the implementation of the following formula:
[Image: update of the Q-network, formula taken from Sutton & Barto.]
My first implementation is:
Q_pred = self.Q_1.forward(s_now)[T.arange(batch_size), actions.long()]
Q_next_all = self.Q_1.forward(s_next)
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = self.Q_2.forward(s_next)[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
(Q_1 and Q_2 are nn.Module classes, and all of the variables involved here are already torch tensors lying in the GPU.)
I noticed that my program ran much slower than a previous implementation which used plain DQN.
I realized that I can combine the batches entering Q_1, so that a single combined batch is forwarded through the network instead of two batches in sequence. The code becomes:
s_combined = T.cat((s_now, s_next))
Q_combined = self.Q_1.forward(s_combined)
Q_pred = Q_combined[T.arange(batch_size), actions.long()]
Q_next_all = Q_combined[batch_size:]
Q_pred2_all = self.Q_2.forward(s_next)
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = Q_pred2_all[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
(This proves that I understand how to do batch training in PyTorch, so don't mark this as a duplicate of this question.)
Furthermore, I realized that Q_1 and Q_2 can process their batches in parallel, so I looked up how to do multiprocessing in PyTorch. Unfortunately, I couldn't find a good example. I tried to adapt code that looked similar to my scenario, and my code became:
def spawned():
    s_combined = T.cat((s_now, s_next))
    Q_combined = self.Q_1.forward(s_combined)
    Q_pred = Q_combined[T.arange(batch_size), actions.long()]
    Q_next_all = Q_combined[batch_size:]
mp.set_start_method('spawn', force=True)
p = mp.Process(target=spawned)
p.start()
Q_pred2_all = self.Q_2.forward(s_next)
p.join()
maxA_id = T.argmax(Q_next_all, dim=1)
Q_pred2 = Q_pred2_all[T.arange(batch_size), maxA_id]
Q_target = (rewards + (~dones) * self.GAMMA * Q_pred2).detach()
self.Q_1.optimizer.zero_grad()
self.Q_1.loss(Q_target, Q_pred).backward()
self.Q_1.optimizer.step()
This crashes with the error message:
AttributeError: Can't pickle local object 'Agent.learn.<locals>.spawned'
So how do I make this work?
(Achieving this in CUDA programming is trivial: one simply launches two device kernels from sequential host code, and the two kernels can run in parallel on the GPU.)
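Two notes, offered only as a sketch rather than a definitive answer. First, the error occurs because the spawn start method has to pickle the process target, and a function defined inside Agent.learn (a local closure) cannot be pickled; the target would have to be a module-level function with everything it needs passed in as arguments. Second, if the goal is just to mirror the two-kernel CUDA pattern on one GPU, CUDA streams avoid multiprocessing entirely. Assuming this sits inside the same method as the code above and that the two forward passes are independent:
s1 = T.cuda.Stream()
s2 = T.cuda.Stream()

T.cuda.synchronize()  # finish any prior work before splitting into two streams
with T.cuda.stream(s1):
    Q_combined = self.Q_1.forward(s_combined)
with T.cuda.stream(s2):
    Q_pred2_all = self.Q_2.forward(s_next)
T.cuda.synchronize()  # wait for both forward passes before using the results
Whether the two forwards actually overlap depends on how much of the GPU a single forward pass already occupies.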

rnnFusedPointwise no longer available in >=PyTorch 1.0

I have a codebase that runs fine in PyTorch 0.4, but whenever I try to run it with PyTorch 1.0+, it complains about the following import:
from torch.nn._functions.thnn import rnnFusedPointwise as fusedBackend
I use fusedBackend in the following code snippet:
if input.is_cuda:
    igates = F.linear(input, w_ih)
    hgates = F.linear(hx, w_hh)
    state = fusedBackend.LSTMFused.apply
    return state(igates, hgates, cx, b_ih, b_hh)
Is there any way I can fix the import error for PyTorch 1.0+?
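The private torch.nn._functions.thnn module was removed, so there is no drop-in import in PyTorch 1.0+. One possible workaround, shown here only as a sketch, is to compute the cell step with standard ops instead of the fused CUDA kernel; it follows PyTorch's gate ordering (input, forget, cell, output), folds the biases into the linear projections, and will be slower than the old fused path:
import torch
import torch.nn.functional as F

def lstm_cell_unfused(input, hx, cx, w_ih, w_hh, b_ih=None, b_hh=None):
    # Plain-op replacement for the fused LSTM cell: one step, returns (h_next, c_next).
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    i, f, g, o = gates.chunk(4, dim=1)  # PyTorch gate order: input, forget, cell, output
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    cy = f * cx + i * g
    hy = o * torch.tanh(cy)
    return hy, cy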

ImportError: No module named 'forget_mult_cuda' error while using QRNN based pretrained Language model

I am trying to use a QRNN-based encoder for text classification by fine-tuning a QRNN pretrained LM.
Here is the qrnn configuration:
emb_sz:int = 400
nh: int = 1550
nl: int = 3
qrnn_config = copy.deepcopy(awd_lstm_lm_config)
dps = dict(output_p=0.25, hidden_p=0.1, input_p=0.2, embed_p=0.02, weight_p=0.15)
qrnn_config.update({'emb_sz':emb_sz, 'n_hid':nh, 'n_layers':nl, 'pad_token':1, 'qrnn':True})
qrnn_config
I am passing the configuration to lm_learner:
lm_learner = language_model_learner(data_lm, AWD_LSTM, config=qrnn_config, pretrained=False,drop_mult=.1,pretrained_fnames=(pretrained_lm_fname,pretrained_itos_fname))
What I am getting is:
ImportError: No module named 'forget_mult_cuda'
The fastai version is '1.0.51.dev0'.
Try clearing the CUDA cache using:
gc.collect()
torch.cuda.empty_cache()
Use this to set qrnn to True (note the key is 'qrnn', not 'qrrn'):
Language model:
config = awd_lstm_lm_config.copy()
config['qrnn'] = True
Classification model:
config = awd_lstm_clas_config.copy()
config['qrnn'] = True
config
You do not need to copy anything from the source code.
It seems that you're missing the ninja package, which is needed so the forget_mult_cuda extension can be JIT-compiled. Use:
pip install ninja
and restart your notebook, if you're using one.
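Putting the pieces together, a minimal sketch of the corrected setup, reusing the learner call from the question (the only changes are the installed ninja package and the correctly spelled 'qrnn' key; data_lm, pretrained_lm_fname and pretrained_itos_fname come from the question):
# pip install ninja  (needed so the forget_mult_cuda extension can be JIT-compiled)
from fastai.text import *  # same fastai 1.0.x imports assumed as in the question

config = awd_lstm_lm_config.copy()
config.update({'emb_sz': 400, 'n_hid': 1550, 'n_layers': 3, 'pad_token': 1, 'qrnn': True})

lm_learner = language_model_learner(
    data_lm, AWD_LSTM, config=config, pretrained=False, drop_mult=0.1,
    pretrained_fnames=(pretrained_lm_fname, pretrained_itos_fname))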
