My loss function is NLL loss, which takes inputs of shape [108416, 3] and targets of shape [108416], and I get a resulting loss value of 2.2623. But after the loss computation, when I call optimizer.step(), I get:
AND THIS IS LOSS Variable containing:
2.2623
[torch.cuda.FloatTensor of size 1 (GPU 0)]
Traceback (most recent call last):
File "/mnt/sdc1/project/training/fpr4x_liver_1x_2channel.py", line 336, in <module>
train_fpr4x_liver_1x_2channel_model()
File "/mnt/sdc1/project/training/fpr4x_liver_1x_2channel.py", line 245, in train_fpr4x_liver_1x_2channel_model
est.run_experiment(opts.num_epochs, 5000,50)
File "/media/redible/sdc/project/training/expt_utils.py", line 236, in run_experiment
self.trainer.train()
File "/media/redible/sdc/project/training/expt_utils.py", line 75, in train
loss, outputs = self.net_mgr._forward_backward(network_inputs, loss_inputs)
File "/media/redible/sdc/project/training/network_manager.py", line 19, in _forward_backward
self.optimizer.step()
File "/usr/local/lib/python3.5/dist-packages/torch/optim/adam.py", line 69, in step
exp_avg.mul_(beta1).add_(1 - beta1, grad)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271
Not sure what's causing the error; any help would be appreciated. Thanks in advance.
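For what it's worth, a failure inside Adam's update (exp_avg.mul_(beta1).add_(1 - beta1, grad)) means a stored moment buffer no longer has the same shape as the current gradient, which typically happens when a parameter is resized or replaced after the optimizer was constructed. A minimal sketch of the shapes described above, with a hypothetical model standing in for the real network:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the real network, using the shapes from the question.
model = nn.Linear(16, 3).cuda()
optimizer = torch.optim.Adam(model.parameters())

x = torch.randn(108416, 16, device='cuda')
targets = torch.randint(0, 3, (108416,), device='cuda')  # int64 class indices

log_probs = F.log_softmax(model(x), dim=1)  # inputs of shape [108416, 3]
loss = F.nll_loss(log_probs, targets)       # scalar; the question reports 2.2623

optimizer.zero_grad()
loss.backward()
optimizer.step()  # only fails as above if a parameter's shape changed after
                  # the optimizer (and its exp_avg buffers) was created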
I'm trying to reproduce the meshed memory transformer (https://github.com/aimagelab/meshed-memory-transformer), but I get this error: RuntimeError: gather_out_cuda(): Expected dtype int64 for index.
The detailed error information is as follows:
Traceback (most recent call last):
File "/home/ai/data/meshed-memory-transformer/train.py", line 252, in <module>
scores = evaluate_metrics(model, dict_dataloader_val, text_field)
File "/home/ai/data/meshed-memory-transformer/train.py", line 55, in evaluate_metrics
out, _ = model.beam_search(images, 20, text_field.vocab.stoi['<eos>'], 5, out_size=1)
File "/home/ai/data/meshed-memory-transformer/models/captioning_model.py", line 70, in beam_search
return bs.apply(visual, out_size, return_probs, **kwargs)
File "/home/ai/data/meshed-memory-transformer/models/beam_search/beam_search.py", line 82, in apply
visual, outputs = self.iter(t, visual, outputs, return_probs, **kwargs)
File "/home/ai/data/meshed-memory-transformer/models/beam_search/beam_search.py", line 132, in iter
self.model.apply_to_states(self._expand_state(selected_beam, cur_beam_size))
File "/home/ai/data/meshed-memory-transformer/models/containers.py", line 30, in apply_to_states
self._buffers[name] = fn(self._buffers[name])
File "/home/ai/data/meshed-memory-transformer/models/beam_search/beam_search.py", line 38, in fn
beam.expand(*([self.b_s, self.beam_size] + shape[1:])))
RuntimeError: gather_out_cuda(): Expected dtype int64 for index
GPU: RTX 3090
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0
PyTorch version: 1.9.1, build: py3.7_cuda11.1_cudnn8.0.5_0
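The error means the index tensor reaching torch.gather is not int64 (torch.long). PyTorch 1.9 enforces this strictly, and a common cause in older beam-search code is index arithmetic with '/', which on recent PyTorch returns float tensors. Casting the index with .long() right before the gather is the usual fix; a minimal sketch (tensor names are illustrative, not taken from the repo):
import torch

src = torch.randn(2, 5)
idx = torch.tensor([[0, 1], [2, 3]], dtype=torch.int32)  # index with the wrong dtype

# torch.gather(src, 1, idx)             # RuntimeError: Expected dtype int64 for index
out = torch.gather(src, 1, idx.long())  # cast the index to int64 before gathering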
I'm running Nicholas Renotte's TFODCourse.
When I execute the "Evaluate the model" step:
python Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config --checkpoint_dir=Tensorflow\workspace\models\my_ssd_mobnet
this error occurs:
Traceback (most recent call last):
File "Tensorflow\models\research\object_detection\model_main_tf2.py", line 115, in <module>
tf.compat.v1.app.run()
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "Tensorflow\models\research\object_detection\model_main_tf2.py", line 82, in main
model_lib_v2.eval_continuously(
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\model_lib_v2.py", line 1151, in eval_continuously
eager_eval_loop(
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\model_lib_v2.py", line 928, in eager_eval_loop
for i, (features, labels) in enumerate(eval_dataset):
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 761, in __next__
return self._next_internal()
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 744, in _next_internal
ret = gen_dataset_ops.iterator_get_next(
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2727, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\framework\ops.py", line 6897, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: input must be 4-dimensional[1,1,371,300,3]
[[{{node ResizeImage/resize/ResizeBilinear}}]] [Op:IteratorGetNext]
I can't understand what "input must be 4-dimensional [1,1,371,300,3]" means.
I tried labeling again and downgraded TF to 2.4.0, but it still happens.
The ssd_mobilenet model expects as input a three-channel image of variable size. The model does NOT support batching: the input tensor is a tf.uint8 tensor with shape [1, height, width, 3] and values in [0, 255].
In this case you are feeding it a 5-dimensional input [1,1,371,300,3].
Reshape your input data to [1,371,300,3].
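A minimal sketch of that fix, assuming the offending tensor has the exact shape from the error message:
import tensorflow as tf

image = tf.zeros([1, 1, 371, 300, 3], dtype=tf.uint8)  # shape from the error
image = tf.squeeze(image, axis=0)                      # drop the extra leading dim
print(image.shape)                                     # (1, 371, 300, 3)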
I am trying to run the PFE model. It works well when I run eval_lfw with TensorFlow 2.1 and TensorFlow 1.x, but when I try to run it with TensorFlow 2.2 or later I get this error:
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].
It happens while the model is loading, when it calls saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope) to import the meta file of my model.
To reproduce the error: download https://drive.google.com/drive/folders/10RnChjxtSAUc1lv7jbm3xkkmhFYyZrHP?usp=sharing
and run eval_lfw with the parameters --model_dir pretrained/PFE_sphere64_msarcface_am --dataset_path data/Dataset --protocol_path ./proto/pairs_dataset.txt
Thank you for your help.
Traceback (most recent call last):
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 78, in
main(args)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 51, in main
network.load_model(args.model_dir)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/network.py", line 169, in load_model
saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
**kwargs)[0]
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].
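For reference, a minimal sketch of the TF1-style meta-graph loading that the trace goes through (file names here are placeholders for the real checkpoint). The ValueError is raised inside import_graph_def while it checks the recorded _output_shapes attribute against the graph it rebuilds, a check the question reports failing only on TF 2.2 and later:
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # the PFE code path is TF1-style
graph = tf.Graph()
with graph.as_default():
    saver = tf.compat.v1.train.import_meta_graph('model.meta', clear_devices=True)
with tf.compat.v1.Session(graph=graph) as sess:
    saver.restore(sess, 'model')  # placeholder checkpoint prefix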
I am stuck on a simple-looking problem in TensorFlow.
Traceback (most recent call last):
File "op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "ops.py", line 1040, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "ops.py", line 883, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: 'Tensor("sequence_sparse_softmax_cross_entropy/zeros_like:0", shape=(?, ?, 10004), dtype=float32)'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "red.py", line 281, in <module>
main()
File "red.py", line 99, in main
sequence_length=lengths)
File "loss.py", line 225, in sequence_sparse_softmax_cross_entropy
losses = xloss(labels=labels, logits=logits)
File "loss.py", line 48, in loss
post = array_ops.where(target_tensor > zeros, target_tensor - sigmoid_p, zeros)
File "gen_math_ops.py", line 2924, in greater
"Greater", x=x, y=y, name=name)
File "op_def_library.py", line 546, in _apply_op_helper
inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Greater' Op has type float32 that does not match type int64 of argument 'x'
Using astype also does not work.
I defined another loss function and tried to use it. What should I do to make it work? I just want to define a function that takes tensors as input, just like the TF cross-entropy function. Please suggest how to do that.
In particular, how can I resolve the error?
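The TypeError says the two operands of Greater disagree: target_tensor carries int64 labels while zeros is float32. A minimal sketch of the clash and the usual tf.cast fix (variable names mirror the trace, values are illustrative):
import tensorflow as tf

target_tensor = tf.constant([[3, 0, 1]], dtype=tf.int64)      # integer labels
sigmoid_p = tf.constant([[0.2, 0.5, 0.9]], dtype=tf.float32)
zeros = tf.zeros_like(sigmoid_p)                              # float32

# tf.greater(target_tensor, zeros)        # TypeError: int64 vs float32, as above
target_f = tf.cast(target_tensor, tf.float32)                 # align the dtypes
post = tf.where(target_f > zeros, target_f - sigmoid_p, zeros)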
I was trying to build a 3D convolutional layer using Keras. It works fine, but when I added a subsample parameter it crashed. The code:
l_1 = Convolution3D(2, 10, 10, 10,
                    border_mode='same',
                    name='l_1',
                    activation='relu',
                    subsample=(5, 5, 5))(inputs)
the error is:
Traceback (most recent call last):
File "image_proc_09.py", line 244, in <module>
)(inputs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 1234, in call
filter_shape=self.W_shape)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 1627, in conv3d
dim_ordering, volume_shape, filter_shape)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 1686, in _old_theano_conv3d
assert(strides == (1, 1, 1))
AssertionError
I am using Theano 0.8.2.
Thanks
You cannot use the subsample parameter together with border_mode='same'; use 'valid' or 'full' instead.
Check the line of code where the assertion error happens: _old_theano_conv3d asserts strides == (1, 1, 1) on that path.
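A sketch of the adjusted call under the same old Keras 1 / Theano API, assuming inputs is defined as in the question:
l_1 = Convolution3D(2, 10, 10, 10,
                    border_mode='valid',  # 'same' is what trips the assert
                    name='l_1',
                    activation='relu',
                    subsample=(5, 5, 5))(inputs)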