PyTorch error: index out of range in self

I am training a BPR model following this repo: https://github.com/guoyang9/BPR-pytorch
I have two experiments, and they are very similar. I preprocessed both using the same method, but one of them raises the following error. I think it could be an item-size problem, but I still haven't been able to figure out the exact reason. Could someone guide me here? Thank you.
Traceback (most recent call last):
File , line 215, in <module>
HR, NDCG = metrics(model, test_loader, top_k)
File "t.py", line 174, in metrics
prediction_i, prediction_j = model(user, item_i, item_j)
File "anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/t.py", line 137, in forward
item_i = self.embed_item(item_i)
File "/anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/functional.py", line 2183, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
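For questions like this, a quick check is whether the largest item id in the test split fits inside the item embedding table; the IndexError fires as soon as an index >= num_embeddings reaches nn.Embedding. A minimal diagnostic sketch (not the repo's code; it assumes the test_loader yields (user, item_i, item_j) batches, as the traceback suggests):

num_items = model.embed_item.num_embeddings   # size of the item embedding table
max_seen = 0
for user, item_i, item_j in test_loader:
    max_seen = max(max_seen, item_i.max().item(), item_j.max().item())
print("table size:", num_items, "largest test item id:", max_seen)
# If max_seen >= num_items, the item count was computed from the training
# split only; build the embedding from the union of train and test item ids.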

Related

PyG: RuntimeError: Tensors must have same number of dimensions: got 2 and 3

I am using TransformerConv and encountered this error:
Traceback (most recent call last):
File "pipeline_model_gat.py", line 1018, in <module>
output = model(
File"/mount/arbeitsdaten61/studenten3/advanced-ml/2022/gogirlspower/nicole/conda/envs/new_gvqa/lib/python3.8/sitepackages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "pipeline_model_gat.py", line 881, in forwardquestions_encoded = self.question_encoder(question_graphs)
File "/mount/arbeitsdaten61/studenten3/advanced-ml/2022/gogirlspower/nicole/conda/envs/new_gvqa/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "pipeline_model_gat.py", line 628, in forward= self.conv1(x, question_graphs.edge_index, edge_attr)
File "/mount/arbeitsdaten61/studenten3/advanced-ml/2022/gogirlspower/nicole/conda/envs/new_gvqa/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mount/arbeitsdaten61/studenten3/advanced-ml/2022/gogirlspower/nicole/conda/envs/new_gvqa/lib/python3.8/site-packages/torch_geometric/nn/conv/transformer_conv.py", line 190, in forward
beta = self.lin_beta(torch.cat([out, x_r, out - x_r], dim=-1))
RuntimeError: Tensors must have same number of dimensions: got 2 and 3
Can someone please tell me what could have gone wrong?
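For context, that torch.cat is TransformerConv's beta gate combining the attention output with the skip connection, so both operands must have the same rank. PyG layers expect 2-D node features of shape [num_nodes, in_channels]; a leading batch dimension makes one operand 3-D. A hedged check, assuming x is the node feature tensor passed to conv1:

print(x.dim(), tuple(x.shape))  # TransformerConv expects [num_nodes, in_channels]
if x.dim() == 3:
    # Batched graphs should be merged into one disconnected graph (a Batch
    # from torch_geometric.loader.DataLoader), not stacked along dim 0.
    x = x.reshape(-1, x.size(-1))  # crude flatten; edge_index must match this node numbering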

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0

I have a wrapper for a huggingface model. In this wrapper I have some encoders, which are mainly a series of embeddings. In the forward of the wrapped model, I want to call the forward of each encoder in a loop, but I get this error:
Traceback (most recent call last):
File "/home/pouramini/mt5-comet/comet/train/train.py", line 1275, in <module>
run()
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/home/pouramini/mt5-comet/comet/train/train.py", line 1069, in train
result = wrapped_model(**batch)
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pouramini/mt5-comet/comet/transformers_ptuning/ptuning_wrapper.py", line 135, in forward
prompt_embeds = encoder(prompt_input_ids,\
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pouramini/mt5-comet/comet/transformers_ptuning/ptuning_wrapper.py", line 238, in forward
return self.embedding(prompt_token_ids)
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/home/pouramini/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2043, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)
This is the code that produces the error:
for encoder in self.prompt_encoders:
    # encoder = self.prompt_encoders[0]
    wlog.info("********** offset: %s, length: %s", encoder.id_offset, encoder.length)
    prompt_token_fn = encoder.get_prompt_token_fn()
    encoder_masks = prompt_token_fn(input_ids)
    wlog.info("Encoder masks: %s", encoder_masks)
    if encoder_masks.any():
        # find input ids for prompt tokens
        prompt_input_ids = input_ids[encoder_masks]
        wlog.info("Prompt Input ids: %s", prompt_input_ids)
        # call forward on the prompt encoder, whose outputs are prompt embeddings
        prompt_embeds = encoder(prompt_input_ids,
                                prompt_ids).to(device=inputs_embeds.device)
The code, however, runs if I just use the CPU as the device. Also, if I have one encoder, the code runs with CUDA; but when there are multiple encoders, it seems to expect that all of them have been transferred to the device, and I don't know how to do that.
Based on comments, I added the following code before training.
wrapped_model.to(device=device)
for encoder in wrapped_model.prompt_encoders:
    encoder.to(device=device)
Interestingly, when there was a single encoder, or a list containing just one encoder, I didn't need to put it on the device explicitly, but for the list of encoders it seems I must.
The reason could be that I put the single encoder on the device in the forward function.
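For completeness, the usual reason a list of submodules is skipped by .to(device) is that a plain Python list is invisible to nn.Module; holding the encoders in an nn.ModuleList registers them, so a single wrapped_model.to(device) moves them all. A minimal sketch, with a hypothetical constructor:

import torch.nn as nn

class WrappedModel(nn.Module):  # hypothetical stand-in for the wrapper
    def __init__(self, prompt_encoders):
        super().__init__()
        # nn.ModuleList registers each encoder as a submodule, so .to(device),
        # .cuda(), parameters(), and state_dict() all see them; a plain list doesn't.
        self.prompt_encoders = nn.ModuleList(prompt_encoders)

With that change, the explicit per-encoder .to(device) loop should no longer be needed.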

Pytorch to ONNX export function fails and causes legacy function error

I am trying to convert the PyTorch model in this link to an ONNX model using the code below:
device = t.device('cuda:0' if t.cuda.is_available() else 'cpu')
print(device)
faster_rcnn = FasterRCNNVGG16()
trainer = FasterRCNNTrainer(faster_rcnn).cuda()
# trainer = FasterRCNNTrainer(faster_rcnn).to(device)
trainer.load('./checkpoints/model.pth')
dummy_input = t.randn(1, 3, 300, 300, device='cuda')
# dummy_input = dummy_input.to(device)
t.onnx.export(faster_rcnn, dummy_input, "model.onnx", verbose=True)
But I get the following error (sorry for the block quote below; Stack Overflow wouldn't let the whole trace be in code format, and wouldn't let the question be posted otherwise):
Traceback (most recent call last):
File "small_object_detection_master_samirsen\onnxtest.py", line 44, in <module>
t.onnx.export(faster_rcnn, dummy_input, "fasterrcnn_10120119_06025842847785781.onnx", verbose = True)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\onnx\__init__.py", line 132, in export
strip_doc_string, dynamic_axes)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\onnx\utils.py", line 64, in export
example_outputs=example_outputs, strip_doc_string=strip_doc_string, dynamic_axes=dynamic_axes)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\onnx\utils.py", line 329, in _export
_retain_param_name, do_constant_folding)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\onnx\utils.py", line 213, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\onnx\utils.py", line 171, in _trace_and_get_graph_from_model
trace, torch_out = torch.jit.get_trace_graph(model, args, _force_outplace=True)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\jit\__init__.py", line 256, in get_trace_graph
return LegacyTracedModule(f, _force_outplace, return_inputs)(*args, **kwargs)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\jit\__init__.py", line 323, in forward
out = self.inner(*trace_inputs)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 545, in __call__
result = self._slow_forward(*input, **kwargs)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 531, in _slow_forward
result = self.forward(*input, **kwargs)
File "D:\smallobject2\export test s\small_object_detection_master_samirsen\model\faster_rcnn.py", line 133, in forward
h, rois, roi_indices)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 545, in __call__
result = self._slow_forward(*input, **kwargs)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 531, in _slow_forward
result = self.forward(*input, **kwargs)
File "D:\smallobject2\export test s\small_object_detection_master_samirsen\model\faster_rcnn_vgg16.py", line 142, in forward
pool = self.roi(x, indices_and_rois)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 545, in __call__
result = self._slow_forward(*input, **kwargs)
File "C:\Users\HP\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 531, in _slow_forward
result = self.forward(*input, **kwargs)
File "D:\smallobject2\export test s\small_object_detection_master_samirsen\model\roi_module.py", line 85, in forward
return self.RoI(x, rois)
RuntimeError: Attempted to trace RoI, but tracing of legacy functions is not supported
This is because ONNX does not support torch.autograd.Function, and the RoI class here is implemented as one (refer to this).
To overcome the issue, you have to implement the forward and backward passes as separate function definitions rather than as members of the RoI class.
The call to RoI in FasterRCNNVGG16 then has to be altered to call those forward and backward functions explicitly.
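One way to follow that advice without rewriting the CUDA autograd function is to swap the legacy op for torchvision's built-in, trace-friendly RoI pooling. This is a sketch under the assumption that self.roi does standard RoI max pooling; the output size and spatial scale below are assumptions, and the rois column order may need swapping to [batch_idx, x1, y1, x2, y2]:

import torchvision

# In faster_rcnn_vgg16.py, replace the legacy call
#     pool = self.roi(x, indices_and_rois)
# with the traceable built-in op:
pool = torchvision.ops.roi_pool(
    x,                       # feature map from the backbone
    indices_and_rois,        # [K, 5] rois: batch index followed by box coords
    output_size=(7, 7),      # assumed pooled size for the VGG16 head
    spatial_scale=1.0 / 16,  # assumed stride of the VGG16 backbone
)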

PyTorch runtime error: expected argument to have type long, but got CPUType instead

I'm new to PyTorch and going through this tutorial on the transformer model. I'm using PyCharm on Win10.
For now, I've basically just copy-pasted the example code, but I'm getting the following error:
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CPUType instead (while checking arguments for embedding)
It seems to be coming from this line
def encode(self, src, src_mask):
    return self.encoder(self.src_embed(src), src_mask)
Tbh, I'm not even sure what this means, let alone how I should go about fixing it.
What's a CPUType? When did I create a variable of that type? From looking at the code, I'm only using tensors (or numpy arrays).
here's the full error message:
C:...\Python\Python37\lib\site-packages\torch\nn\_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
C:/.../PycharmProjects/Transformer/all_the_code.py:263: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
nn.init.xavier_uniform(p)
Traceback (most recent call last):
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 421, in
SimpleLossCompute(model.generator, criterion, model_opt))
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 297, in run_epoch
batch.src_mask, batch.trg_mask)
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 30, in forward
return self.decode(self.encode(src, src_mask), src_mask,
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 34, in encode
return self.encoder(self.src_embed(src), src_mask)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
input = module(input)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 218, in forward
return self.lut(x) * math.sqrt(self.d_model)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\sparse.py", line 117, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "C:...\Python\Python37\lib\site-packages\torch\nn\functional.py", line 1506, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CPUType instead (while checking arguments for embedding)
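The embedding lookup requires Long (int64) indices; torch.embedding rejects float inputs with exactly this message, so the usual fix is to cast whatever feeds src_embed to long before the lookup. A minimal sketch, assuming (hypothetically) that src is built from a numpy array somewhere in the copied tutorial code:

src = torch.from_numpy(src_array)  # hypothetical: however the batch is built
src = src.long()                   # nn.Embedding requires LongTensor indices
memory = model.encode(src, src_mask)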

Pytorch model weight type conversion

I'm trying to run inference with a FlowNet2-C model loaded from file.
However, I ran into a data type problem. How can I resolve it?
Source code
FlowNet2-C pre-trained model
$ python main.py
Initializing Datasets
[0.000s] Loading checkpoint '/notebooks/data/model/FlowNet2-C_checkpoint.pth.tar'
[1.293s] Loaded checkpoint '/notebooks/data/model/FlowNet2-C_checkpoint.pth.tar' (at epoch 0)
(1L, 6L, 384L, 512L)
<class 'torch.autograd.variable.Variable'>
[1.642s] Operation failed
Traceback (most recent call last):
File "main.py", line 102, in <module>
main()
File "main.py", line 98, in main
summary(input_size, model)
File "main.py", line 61, in summary
model(x)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/notebooks/data/vinet/FlowNetC.py", line 75, in forward
out_conv1a = self.conv1(x1)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: Input type (CUDAFloatTensor) and weight type (CPUFloatTensor) should be the same
Maybe that is because your model and the input x to the model are on different devices. It seems that your input x has been moved to the GPU, but the model parameters are still on the CPU.
You can try model.cuda() after line 94, which will put the model on the GPU. Then the error should disappear.
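A minimal sketch of that suggestion; the general rule is that parameters and inputs must live on the same device:

model.cuda()  # put the parameters on the GPU so they match the CUDA input
x = x.cuda()  # already the case here, judging by the error message
model(x)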
