ValueError: Dimension 0 in both shapes must be equal, but are 0 and 512 when using TensorFlow 2.2

I am trying to run the PFE model. It works well when I run eval_lfw with TensorFlow 2.1 and TensorFlow 1.x, but with TensorFlow 2.2 and later I get this error:
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].
It happens while the model is loading, at the call saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope), which imports the meta file of my model.
To reproduce the error, download https://drive.google.com/drive/folders/10RnChjxtSAUc1lv7jbm3xkkmhFYyZrHP?usp=sharing
and run eval_lfw with the parameters --model_dir pretrained/PFE_sphere64_msarcface_am --dataset_path data/Dataset --protocol_path ./proto/pairs_dataset.txt
Thank you for your help.
Traceback (most recent call last):
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 78, in <module>
main(args)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/evaluation/eval_lfw.py", line 51, in main
network.load_model(args.model_dir)
File "/home/jordan/Bureau/Probabilistic-Face-Embeddings_new/network.py", line 169, in load_model
saver = tf.compat.v1.train.import_meta_graph(meta_file, clear_devices=True, import_scope=scope)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1462, in import_meta_graph
**kwargs)[0]
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 799, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/jordan/.local/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Node 'gradients/UncertaintyModule/fc_log_sigma_sq/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].
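One workaround that has been reported for this kind of error is to delete the stored _output_shapes attributes from the MetaGraphDef before importing it, so that TensorFlow infers the shapes again instead of comparing them with the saved ones. The sketch below has not been tested against this model. The helper names strip_output_shapes and import_meta_graph_without_shapes are hypothetical, and it assumes the stored shape attributes are the only thing in conflict.

```python
def strip_output_shapes(graph_def):
    """Delete the '_output_shapes' attr from every node; return how many were removed."""
    removed = 0
    for node in graph_def.node:
        if "_output_shapes" in node.attr:
            del node.attr["_output_shapes"]
            removed += 1
    return removed


def import_meta_graph_without_shapes(meta_file, scope=None):
    """Hypothetical stand-in for the import_meta_graph call in load_model."""
    # Imported lazily so strip_output_shapes can be used without TensorFlow.
    import tensorflow as tf
    from tensorflow.core.protobuf import meta_graph_pb2

    # Parse the .meta file into a MetaGraphDef proto instead of passing the path.
    meta_graph_def = meta_graph_pb2.MetaGraphDef()
    with open(meta_file, "rb") as f:
        meta_graph_def.ParseFromString(f.read())

    strip_output_shapes(meta_graph_def.graph_def)

    # import_meta_graph accepts a MetaGraphDef proto as well as a filename.
    return tf.compat.v1.train.import_meta_graph(
        meta_graph_def, clear_devices=True, import_scope=scope)
```

This only touches top-level graph nodes; nodes inside the graph's function library are left alone. If it works, saver.restore can be used as before, since the variable names are unchanged.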

Related

python Tensorflow 2.4.0 'input must be 4-dimensional[1,1,371,300,3]' ERROR

I'm running Nicholas Renotte's TFODCourse.
When I execute the "Evaluate the model" command:
python Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config --checkpoint_dir=Tensorflow\workspace\models\my_ssd_mobnet
the following error occurs:
Traceback (most recent call last):
File "Tensorflow\models\research\object_detection\model_main_tf2.py", line 115, in <module>
tf.compat.v1.app.run()
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "Tensorflow\models\research\object_detection\model_main_tf2.py", line 82, in main
model_lib_v2.eval_continuously(
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\model_lib_v2.py", line 1151, in eval_continuously
eager_eval_loop(
File "C:\Users\All_Nighter\miniconda3\envs\TF\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\model_lib_v2.py", line 928, in eager_eval_loop
for i, (features, labels) in enumerate(eval_dataset):
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 761, in __next__
return self._next_internal()
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 744, in _next_internal
ret = gen_dataset_ops.iterator_get_next(
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2727, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "C:\Users\All_Nighter\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\framework\ops.py", line 6897, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: input must be 4-dimensional[1,1,371,300,3]
[[{{node ResizeImage/resize/ResizeBilinear}}]] [Op:IteratorGetNext]
I can't understand what "input must be 4-dimensional[1,1,371,300,3]" means.
I tried labeling again and downgrading TF to 2.4.0, but it still happens.
The ssd_mobilenet model expects the following input:
A three-channel image of variable size - the model does NOT support
batching. The input tensor is a tf.uint8 tensor with shape [1, height,
width, 3] with values in [0, 255]
In this case you are passing a 5-dimensional input, [1,1,371,300,3], where a 4-dimensional one is expected.
Reshape your input data to [1,371,300,3].
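The shape in the error message, [1,1,371,300,3], has five dimensions, so one leading dimension of size 1 has to go before the resize op sees the image. A minimal sketch with NumPy, assuming the image is available as an array at the point where it can be reshaped; the helper name drop_extra_batch_dim is hypothetical, and the traceback does not show where in the eval pipeline the extra dimension is added.

```python
import numpy as np


def drop_extra_batch_dim(image):
    """Turn a [1, 1, H, W, 3] array into [1, H, W, 3]; leave 4-D input unchanged."""
    image = np.asarray(image)
    if image.ndim == 5 and image.shape[0] == 1:
        # Remove only the first axis so the batch axis of size 1 is kept.
        image = np.squeeze(image, axis=0)
    if image.ndim != 4:
        raise ValueError("expected a 4-D image batch, got shape %s" % (image.shape,))
    return image


batch = np.zeros((1, 1, 371, 300, 3), dtype=np.uint8)
print(drop_extra_batch_dim(batch).shape)  # (1, 371, 300, 3)
```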

TensorFlow 2.1 using TPUEstimator: RuntimeError: All tensors outfed from TPU should preserve batch size dimension, but got scalar Tensor

I just converted an existing project from TF 1.14 to TF 2.1 which uses the TPUEstimator API. After making the conversion, testing locally (i.e. use_tpu=False) runs successfully. However, I am getting errors when running on Google Cloud TPU (i.e. use_tpu=True).
Note: This is in the context of the AdaNet AutoML framework (v0.8.0), although I suspect this may be a general TPUEstimator-related error, as the errors appear to originate in the tpu_estimator.py and error_handling.py scripts seen in the Traceback below:
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3032, in train
rendezvous.record_error('training_loop', sys.exc_info())
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 81, in record_error
if value and value.op and value.op.type == _CHECK_NUMERIC_OP_NAME:
AttributeError: 'RuntimeError' object has no attribute 'op'
During handling of the above exception, another exception occurred:
File "workspace/trainer/train.py", line 331, in <module>
main(args=parsed_args)
File "workspace/trainer/train.py", line 177, in main
run_config=run_config)
File "workspace/trainer/train.py", line 68, in run_experiment
estimator.train(input_fn=train_input_fn, max_steps=total_train_steps)
File "/usr/local/lib/python3.6/site-packages/adanet/core/estimator.py", line 853, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 143, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 374, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1164, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1194, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3186, in _model_fn
host_ops = host_call.create_tpu_hostcall()
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2226, in create_tpu_hostcall
'dimension, but got scalar {}'.format(dequeue_ops[i][0]))
RuntimeError: All tensors outfed from TPU should preserve batch size dimension, but got scalar Tensor("OutfeedDequeueTuple:1", shape=(), dtype=int64, device=/job:tpu_worker/task:0/device:CPU:0)'
The previous version of the project using TF 1.14 runs both locally and on TPU using TPUEstimator without issues. Is there something obvious I am potentially missing for the conversion over to TF 2.1 when using TPUEstimator API?
Have you applied the following?
dataset = ...
# tf.contrib was removed in TF 2.x; batch(..., drop_remainder=True) replaces
# tf.contrib.data.batch_and_drop_remainder(batch_size).
dataset = dataset.batch(batch_size, drop_remainder=True)
This may drop the last few samples from a file so that every batch has a static shape of batch_size, which is required when training on TPUs.
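To make the effect of dropping the remainder concrete, here is a plain-Python sketch of the behaviour described above. It is not TensorFlow code and only imitates the semantics; the function name batch_drop_remainder is hypothetical.

```python
def batch_drop_remainder(samples, batch_size):
    """Group samples into lists of exactly batch_size, discarding a short final batch."""
    # Number of samples that fit into complete batches.
    full = len(samples) - len(samples) % batch_size
    return [samples[i:i + batch_size] for i in range(0, full, batch_size)]


batches = batch_drop_remainder(list(range(10)), 4)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With 10 samples and a batch size of 4, samples 8 and 9 are discarded, so every batch has the same length and the batch dimension can be known statically.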

Keras model.fit_generator error. How do I solve this issue?

I have checked the documentation for the Keras fit_generator function but still cannot find the problem.
The libraries work fine on my laptop.
My code:
# train the network
print("training network...")
sys.stdout.flush()
#class_mode ='categorical', # 2D one-hot encoded labels
H = model.fit_generator(aug.flow(Xtrain, trainY, batch_size=BS),
                        validation_data=(Xval, valY),
                        steps_per_epoch=len(trainX) // BS,
                        epochs=EPOCHS, verbose=1)
# save the model to disk
print("Saving model to disk")
sys.stdout.flush()
model.save("/tmp/mymodel")
I am getting the following error for my code:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
File "<ipython-input-80-935b20410c11>", line 8, in <module>
epochs=EPOCHS, verbose=1)
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\site-packages\keras\engine\training_generator.py", line 162, in fit_generator
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\site-packages\keras\utils\data_utils.py", line 647, in __init__
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\site-packages\keras\utils\data_utils.py", line 433, in __init__
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\context.py", line 133, in Value
File "C:\Users\user\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\sharedctypes.py", line 182
exec template % ((name,)*7) in d
^
SyntaxError: invalid syntax
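The last frame is inside the standard library's multiprocessing\sharedctypes.py, and the statement it fails on, exec ... in d, is the Python 2 form of exec, which Python 3 rejects. That suggests (without proving it) that the my_root environment contains a Python 2 copy of that file while a Python 3 interpreter is running it. A small diagnostic sketch using only the standard library; the helper name compiles_on_this_interpreter is hypothetical.

```python
import importlib.util
import sys

# The statement from the last frame of the traceback (Python 2 exec syntax).
PY2_EXEC_LINE = "exec template % ((name,)*7) in d"


def compiles_on_this_interpreter(source):
    """Return True if the running interpreter can parse the given source."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# find_spec locates the file without executing sharedctypes.py itself.
spec = importlib.util.find_spec("multiprocessing.sharedctypes")
print("interpreter:", sys.version.split()[0])
print("sharedctypes.py found at:", spec.origin)
print("traceback line parses here:", compiles_on_this_interpreter(PY2_EXEC_LINE))
```

If the reported path points into the environment and the file there still contains the exec ... in d line, recreating the conda environment with a single Python version is the usual next step.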

Tensorflow Greater Operator Giving an Error

I am stuck on a simple-looking problem in TensorFlow.
Traceback (most recent call last):
File op_def_library.py, line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File ops.py, line 1040, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File ops.py, line 883, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: 'Tensor("sequence_sparse_softmax_cross_entropy/zeros_like:0", shape=(?, ?, 10004), dtype=float32)'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "red.py", line 281, in <module>
main()
File "red.py", line 99, in main
sequence_length=lengths)
File loss.py, line 225, in sequence_sparse_softmax_cross_entropy
losses = xloss(labels=labels, logits=logits)
File loss.py, line 48, in loss
post = array_ops.where(target_tensor > zeros, target_tensor - sigmoid_p, zeros)
gen_math_ops.py, line 2924, in greater
"Greater", x=x, y=y, name=name)
op_def_library.py, line 546, in _apply_op_helper
inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Greater' Op has type float32 that does not match type int64 of argument 'x'
Casting with astype also does not work.
I defined my own function and tried to use it. What should I do to make it work? I just want to define a function that takes tensors as input, just like the tf cross entropy function. Please suggest how to do that.
In particular, how can I resolve the error?
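The TypeError says that x (target_tensor, int64) and y (zeros, float32) have different dtypes, and TensorFlow's comparison ops do not convert between them implicitly. Tensors also have no astype method; the TensorFlow equivalent is tf.cast. A minimal sketch assuming TensorFlow 2.x with eager execution; the function name positive_term is hypothetical, and sigmoid_p stands for the float32 probabilities from the traceback line.

```python
import tensorflow as tf


def positive_term(target_tensor, sigmoid_p):
    """Compute where(target > 0, target - p, 0) with both operands in the same dtype."""
    # Cast the integer labels to the dtype of the predictions before comparing.
    target = tf.cast(target_tensor, sigmoid_p.dtype)
    zeros = tf.zeros_like(sigmoid_p)
    return tf.where(target > zeros, target - sigmoid_p, zeros)


labels = tf.constant([[0, 1]], dtype=tf.int64)
probs = tf.constant([[0.25, 0.75]], dtype=tf.float32)
print(positive_term(labels, probs))
```

If the labels are class indices of shape [batch, time] rather than one-hot vectors, they would also need tf.one_hot(labels, depth) to match the shape of the logits; that is an assumption about the data, not something the error message states.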

Facing RuntimeError: invalid argument 3: sizes do not match

My loss function is NLL loss, which takes inputs of shape [108416, 3] and targets of shape [108416], and I get a resulting loss value of 2.2623. But after the loss computation, when I call optimizer.step(), I get:
AND THIS IS LOSS Variable containing:
2.2623
[torch.cuda.FloatTensor of size 1 (GPU 0)]
Traceback (most recent call last):
File "/mnt/sdc1/project/training/fpr4x_liver_1x_2channel.py", line 336, in <module>
train_fpr4x_liver_1x_2channel_model()
File "/mnt/sdc1/project/training/fpr4x_liver_1x_2channel.py", line 245, in train_fpr4x_liver_1x_2channel_model
est.run_experiment(opts.num_epochs, 5000,50)
File "/media/redible/sdc/project/training/expt_utils.py", line 236, in run_experiment
self.trainer.train()
File "/media/redible/sdc/project/training/expt_utils.py", line 75, in train
loss, outputs = self.net_mgr._forward_backward(network_inputs, loss_inputs)
File "/media/redible/sdc/project/training/network_manager.py", line 19, in _forward_backward
self.optimizer.step()
File "/usr/local/lib/python3.5/dist-packages/torch/optim/adam.py", line 69, in step
exp_avg.mul_(beta1).add_(1 - beta1, grad)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271
I'm not sure what the cause of the error is. Any help would be appreciated. Thanks in advance.
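The failing line, exp_avg.mul_(beta1).add_(1 - beta1, grad), combines Adam's running average with the current gradient, so the message suggests that those two tensors have different sizes for at least one parameter. One way that can happen is when the optimizer state was created or restored for parameters of a different shape; that is a guess, not something the traceback confirms. A minimal diagnostic sketch to run just before optimizer.step(); the helper name find_adam_state_mismatches is hypothetical, and it relies only on the param_groups and state attributes and the "exp_avg" key seen in the traceback.

```python
def find_adam_state_mismatches(optimizer):
    """Return (group_idx, param_idx, grad_size, exp_avg_size) for every size mismatch."""
    mismatches = []
    for gi, group in enumerate(optimizer.param_groups):
        for pi, param in enumerate(group["params"]):
            grad = param.grad
            # .get avoids creating empty state entries for parameters not yet stepped.
            exp_avg = optimizer.state.get(param, {}).get("exp_avg")
            if grad is None or exp_avg is None:
                continue
            grad_size = tuple(grad.size())
            state_size = tuple(exp_avg.size())
            if grad_size != state_size:
                mismatches.append((gi, pi, grad_size, state_size))
    return mismatches
```

Calling print(find_adam_state_mismatches(self.optimizer)) in _forward_backward, right before self.optimizer.step(), would list the parameters whose gradient and Adam state disagree. An empty list on the first step is expected, because Adam creates its state lazily.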
