XLA in TF2 IteratorGetNext: unsupported op error

I am trying to simply run a .pb tensorflow 2 model with XLA.
However, I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Function invoked by the following node is not compilable: {{node __inference_predict_function_3130}} = __inference_predict_function_3130[_XlaMustCompile=true, config_proto="\n\007\n\003CPU\020\001\n\007\n\003GPU\020\0002\002J\0008\001\202\001\000", executor_type=""](dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, ...).
Uncompilable nodes:
IteratorGetNext: unsupported op: No registered 'IteratorGetNext' OpKernel for XLA_CPU_JIT devices compatible with node {{node IteratorGetNext}}
Stacktrace:
Node: __inference_predict_function_3130, function:
Node: IteratorGetNext, function: __inference_predict_function_3130
[Op:__inference_predict_function_3130]
The error occurs independently of the model, and also when I apply a model directly after training it. I think I am doing something fundamentally wrong, or XLA is not properly supported in TF2. The same code runs without XLA.
Does anyone have any idea how to fix this issue?
I am working in an Ubuntu 18.04 with python 3.8 in anaconda and TF 2.4.1
My code:
import tensorflow as tf
import numpy as np
import h5py
import sys
model_path_compile= 'model_Input/pbFolder'
data_inference_mat ='model_Input/data_inference/XXXX.MAT'
with h5py.File(data_inference_mat, 'r') as dataset:
    try:
        image_set = dataset['polar'][()].astype(np.uint16).T
        image = np.cast[np.float32](image_set)
        image /= 16384
    except KeyError:
        print('-----------------------ERROR--------------')
x = np.expand_dims(image, axis=0)
model_compile = tf.keras.models.load_model(model_path_compile)
with tf.device("device:XLA_CPU:0"):
    y_pred = model_compile.predict(x)
The full error:
2021-07-19 16:09:02.521211: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-07-19 16:09:02.521416: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-19 16:09:02.522638: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-07-19 16:09:03.357078: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-07-19 16:09:03.378059: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2400000000 Hz
Traceback (most recent call last):
File "/media/ric/DATA/Software_Workspaces/MasterThesisWS/AI_HW_deploy/XLA/Tf2ToXLA_v2/TF2_RunModel.py", line 24, in <module>
y_pred = model_compile.predict(x)
File "/home/ric/anaconda3/envs/TfToXLA/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1629, in predict
tmp_batch_outputs = self.predict_function(iterator)
File "/home/ric/anaconda3/envs/TfToXLA/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/home/ric/anaconda3/envs/TfToXLA/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 894, in _call
return self._concrete_stateful_fn._call_flat(
File "/home/ric/anaconda3/envs/TfToXLA/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/ric/anaconda3/envs/TfToXLA/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
outputs = execute.execute(
File "/home/ric/anaconda3/envs/TfToXLA/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Function invoked by the following node is not compilable: {{node __inference_predict_function_3130}} = __inference_predict_function_3130[_XlaMustCompile=true, config_proto="\n\007\n\003CPU\020\001\n\007\n\003GPU\020\0002\002J\0008\001\202\001\000", executor_type=""](dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, dummy_input, ...).
Uncompilable nodes:
IteratorGetNext: unsupported op: No registered 'IteratorGetNext' OpKernel for XLA_CPU_JIT devices compatible with node {{node IteratorGetNext}}
Stacktrace:
Node: __inference_predict_function_3130, function:
Node: IteratorGetNext, function: __inference_predict_function_3130
[Op:__inference_predict_function_3130]

After some days of work and all kinds of approaches, I finally found a workaround for my purposes.
As I only want the LLVM IR of one execution of the model, I can use an alternative TensorFlow function, model.predict_step. Unlike model.predict, which hands an iterator to its predict function (see the traceback), it runs the model once on a single batch, so it never goes through IteratorGetNext and avoids the initial error.
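A minimal sketch of this workaround, assuming TensorFlow 2.5 or newer, where tf.function accepts jit_compile=True (on TF 2.4.1 the same flag is spelled experimental_compile=True). The small Sequential model and the names run_once and batch are stand-ins for illustration, not the model from the question.

```python
import numpy as np
import tensorflow as tf

# Stand-in model; in the question this would come from tf.keras.models.load_model(...).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
])

# Assumption: TF >= 2.5. On TF 2.4.x use experimental_compile=True instead.
@tf.function(jit_compile=True)
def run_once(batch):
    # predict_step runs the forward pass on one batch; no tf.data iterator is involved.
    return model.predict_step(batch)

batch = tf.constant(np.random.rand(1, 8).astype(np.float32))
y_pred = run_once(batch)
print(y_pred.shape)
```

Because the compiled function receives a tensor directly, the graph handed to XLA contains only the model's own ops.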

Related

Unsuccessful TensorSliceReader constructor error on Tensorflow while loading saved model

I am following this tutorial. All my tensorflow and CUDA setups are complete and tested to be fully working.
My test of the setup
import torch
import tensorflow as tf
import tensorflow.keras as ks
print(tf)
print(ks)
print(torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
X_train = torch.FloatTensor([0., 1., 2.])
X_train = X_train.to(device)
print(X_train.is_cuda)
print(torch.cuda.current_device())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
Gives
<module 'tensorflow' from 'C:\\Venv\\time_series_forecast\\lib\\site-packages\\tensorflow\\__init__.py'>
<module 'tensorflow.keras' from 'C:\\Venv\\time_series_forecast\\lib\\site-packages\\keras\\api\\_v2\\keras\\__init__.py'>
True
True
0
1
NVIDIA GeForce GTX 1070 Ti
In my main.py file, I have these imports
import torch
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
from tensorflow.keras.models import load_model
When I try to load the saved classifier, the following lines with load_model give an error:
classifier_path = '../saved_classifiers/bury_pnas_21/len500/best_model_1_1_len500.pkl'
classifier = load_model(classifier_path)
I am getting this error
Traceback (most recent call last):
File "G:\My Drive\Working_Dir\\project_and_initiative\forecast\code_v8\_test_p9.py", line 46, in <module>
classifier = load_model(final_classifier_path)
File "C:\Venv\time_series_forecast\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Venv\time_series_forecast\lib\site-packages\tensorflow\python\saved_model\load.py", line 948, in load_partial
raise FileNotFoundError(
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for G:\My Drive\Working_Dir\project_and_initiative\forecast\code_v8\saved_dl_classifiersv\bury_pnas_21\len500\best_model_1_1_len500.pkl\variables\variables
You may be trying to load on a different device from the computational device. Consider setting the `experimental_io_device` option in `tf.saved_model.LoadOptions` to the io_device such as '/job:localhost'.
Process finished with exit code 1
I downloaded the .pkl from here, including the contents of its variables folder, and placed it in the same folder structure as suggested in the tutorial. What is the cause and what should I try?
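The error message shows load_model treating the .pkl path as a SavedModel directory and failing to find variables\variables underneath it. Note also that the path in the error (saved_dl_classifiersv) differs from the one in the snippet (saved_classifiers). A minimal sketch for checking what is actually on disk, assuming the standard SavedModel layout (saved_model.pb plus variables/variables.index and variables/variables.data-* shards); the function name missing_savedmodel_files is made up for illustration.

```python
from pathlib import Path


def missing_savedmodel_files(model_dir):
    """Return the expected SavedModel entries that are absent from model_dir."""
    root = Path(model_dir)
    variables = root / "variables"
    missing = []
    if not (root / "saved_model.pb").is_file():
        missing.append("saved_model.pb")
    if not (variables / "variables.index").is_file():
        missing.append("variables/variables.index")
    # Weight shards are named variables.data-00000-of-00001 and so on.
    if not (variables.is_dir() and any(variables.glob("variables.data-*"))):
        missing.append("variables/variables.data-*")
    return missing


# Usage with the path from the question (resolve() shows where it really points):
# print(Path(classifier_path).resolve())
# print(missing_savedmodel_files(classifier_path))
```

An empty list means the layout is complete; anything else names the files that load_model will not find.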

TypeError: 'str' object is not callable | FastAi

Goal: instantiate unet_learner() using weights.
weights is a str that I bring in from a user-defined .yaml file; hence eval().
file_path and training are classes that hold parameters.
Code:
import numpy as np
from fastai.vision.all import *
def train(dls, file_path, training):
    labels = np.loadtxt(file_path.labels, dtype=str)
    weights = torch.tensor(eval(training.weights))
    print('#################')
    print(weights)
    print(type(weights))
    print('#################')
    learner = unet_learner(dls, training.architecture, loss_func=CrossEntropyLossFlat(
        axis=1,
        weight=weights)
    )
    return learner.load(file_path.weights)
Placing torch.tensor() around weights again in the parameter line doesn't help. Same error.
Traceback:
(venv) me#ubuntu-pcs:~/PycharmProjects/project$ python pdl1_lung_train/main.py
/home/me/miniconda3/envs/venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /opt/conda/conda-bld/pytorch_1607370156314/work/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
#################
tensor([0.4000, 0.9000])
<class 'torch.Tensor'>
#################
Traceback (most recent call last):
File "pdl1_lung_train/main.py", line 27, in <module>
main(ROOT)
File "pdl1_lung_train/main.py", line 19, in main
learner = train(dls, file_path, training)
File "/home/me/PycharmProjects/project/pdl1_lung_train/train.py", line 16, in train
weight=weights))
File "/home/me/miniconda3/envs/venv/lib/python3.7/site-packages/fastai/vision/learner.py", line 267, in unet_learner
model = create_unet_model(arch, n_out, img_size, pretrained=pretrained, **kwargs)
File "/home/me/miniconda3/envs/venv/lib/python3.7/site-packages/fastai/vision/learner.py", line 243, in create_unet_model
model = arch(pretrained)
TypeError: 'str' object is not callable
Please let me know if I need to add other info. to post.
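I might be wrong, but I think your training.architecture is a string. The traceback ends in arch(pretrained), so unet_learner calls the architecture; according to its documentation it has to be a callable (a model constructor), not the name of one.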
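One way to turn the configured name into a callable is to look it up in a module instead of passing the string through. The sketch below assumes the YAML file stores a constructor name such as "resnet34"; resolve_architecture and parse_weights are hypothetical helpers, and ast.literal_eval is swapped in for eval() as a safer way to parse the weights string.

```python
import ast


def resolve_architecture(name, namespace):
    """Look up a model constructor by name and make sure it is callable."""
    arch = getattr(namespace, name, None)
    if not callable(arch):
        raise ValueError(f"{name!r} does not name a callable in the given namespace")
    return arch


def parse_weights(text):
    """Parse a string such as '[0.4, 0.9]' into a list of floats without eval()."""
    return [float(w) for w in ast.literal_eval(text)]


# Usage inside train(), assuming torchvision is installed:
# import torchvision.models
# arch = resolve_architecture(training.architecture, torchvision.models)
# weights = torch.tensor(parse_weights(training.weights))
# learner = unet_learner(dls, arch, loss_func=CrossEntropyLossFlat(axis=1, weight=weights))
```

Failing early with a ValueError makes a misspelled architecture name easier to spot than the TypeError raised deep inside create_unet_model.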

tensorRT with grpc multi threading error, how to fix it?

Description
Environment
TensorRT Version: 8.2.3.0
NVIDIA GPU: gtx 1080ti
NVIDIA Driver Version: 470.103.01
CUDA Version: 11.4
CUDNN Version: 8.2
Operating System: Linux 18.06
Python Version (if applicable): 3.8.0
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):
grpc server code
server = grpc.server(
    futures.ThreadPoolExecutor(),
    options=[
        ("grpc.max_send_message_length", -1),
        ("grpc.max_receive_message_length", -1),
        ("grpc.so_reuseport", 1),
        ("grpc.use_local_subchannel_pool", 1),
    ],
)
grpc stub init
grpcObject(encoder=trt_model, decoder=decoder)
trt_model init code
def __init__(self):
    cuda_ctx = cuda.Device(0).make_context()
    self.cuda_ctx = cuda_ctx
    if self.cuda_ctx:
        self.cuda_ctx.push()
    ...
Hello.
I'm using TensorRT via grpc.
However, after setting max_workers on the ThreadPoolExecutor that is passed to grpc.server, the following error occurs when requests come in from multiple clients.
With max_workers=1, no error occurs. Can you help?
infer method
def infer(self, wav_path):
    input_signal = preprocess_stt(wav_path)
    if self.cuda_ctx:
        self.cuda_ctx.push()
    self.context.set_binding_shape(0, input_signal.shape)
    assert self.context.all_binding_shapes_specified
    h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
    h_input_signal = cuda.register_host_memory(np.ascontiguousarray(to_numpy(input_signal)))
    cuda.memcpy_htod_async(self.d_input, h_input_signal, self.stream)
    self.context.execute_async(bindings=[int(self.d_input), int(self.d_output)], stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(h_output, self.d_output, self.stream)
    self.stream.synchronize()
    if self.cuda_ctx:
        self.cuda_ctx.pop()
    return h_output
error
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
E0228 17:02:30.063214 140249774667520 _server.py:471] Exception iterating responses: cuMemHostAlloc failed: an illegal memory access was encountered
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/grpc/_server.py", line 461, in _take_response_from_response_iterator
return next(response_iterator), True
File "/data/grpc/stt_grpc/grpc_class/dummy_grpc_core.py", line 116, in getStream
stt_result = trt_inference(self.trt_model, 'aaa.wav', self.decoder)
File "/data/grpc/stt_grpc/stt_package/stt_func.py", line 525, in trt_inference
model_output = actor.infer('aaa.wav')
File "/data/grpc/stt_grpc/grpc_class/tensorrt_stt.py", line 153, in infer
h_output = cuda.pagelocked_empty(tuple(self.context.get_binding_shape(1)), dtype=np.float32)
pycuda._driver.LogicError: cuMemHostAlloc failed: an illegal memory access was encountered
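The post does not include a confirmed fix. One plausible cause, stated here as an assumption, is that every worker thread runs infer on the same execution context, stream and device buffers (self.d_input, self.d_output) at once; TensorRT execution contexts are not meant to be used concurrently from several threads. A minimal sketch of the simplest mitigation, serialising calls with a lock. SerializedModel and FakeModel are hypothetical names, and FakeModel only stands in for the TensorRT wrapper so the example runs without a GPU.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SerializedModel:
    """Wrap a model whose infer() must not run on two threads at the same time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def infer(self, *args, **kwargs):
        # Only one thread at a time may use the shared context, stream and buffers.
        with self._lock:
            return self._model.infer(*args, **kwargs)


class FakeModel:
    """Hypothetical stand-in that counts overlapping infer() calls."""

    def __init__(self):
        self.active = 0
        self.overlaps = 0

    def infer(self, x):
        self.active += 1
        if self.active > 1:
            self.overlaps += 1
        time.sleep(0.001)  # pretend to do GPU work
        self.active -= 1
        return x * 2


fake = FakeModel()
safe = SerializedModel(fake)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(safe.infer, range(20)))
print(fake.overlaps)
```

A lock keeps the server multi-threaded for request handling while inference itself runs one call at a time. Giving each worker thread its own execution context, stream and buffers would allow real parallelism, at the cost of more GPU memory.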

cuDNN_STATUS_ALLOC_FAILED when trying to run a tutorial CNN with tensorflow

I am trying to run a simple python script with a convolutional neural network (CNN). Every time I run the script I come across the following error message
2021-03-10 19:47:03.832061: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Traceback (most recent call last):
File "CNN_trial.py", line 17, in <module>
outputs = tf.nn.conv2d(images,filters,strides = 1,padding = "SAME")
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2158, in conv2d_v2
return conv2d(input, # pylint: disable=redefined-builtin
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2264, in conv2d
return gen_nn_ops.conv2d(
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 942, in conv2d
return conv2d_eager_fallback(
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1031, in conv2d_eager_fallback
_result = _execute.execute(b"Conv2D", 1, inputs=_inputs_flat, attrs=_attrs,
File "D:\miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
My system is as follows
Windows 10
AMD Ryzen 7 3700x
16GB RAM
Nvidia RTX 2060
Python 3.8.5
Tensorflow 2.4.1
my full code:
from sklearn.datasets import load_sample_image
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
china = load_sample_image("china.jpg")/255
flower = load_sample_image("flower.jpg")/255
images = np.array([china,flower])
batch_size, height,width,channels = images.shape
filters = np.zeros(shape=(7,7,channels,2),dtype=np.float32)
filters[:,3,:,0] = 1
filters[3,:,:,1] = 1
outputs = tf.nn.conv2d(images,filters,strides = 1,padding = "SAME")
plt.imshow(outputs[0,:,:,1],cmap = "gray")
plt.show()
It seems that I need to enable memory growth. After adding the following two lines to the beginning of the script, I got it to at least run.
devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(devices[0],True)

TensorFlow AlphaDropout: rank undefined

I am trying to set up a neural network using TensorFlow's tf.contrib.nn.alpha_dropout (as implemented in TensorFlow version 1.12.0). Please consider the following example:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
from tensorflow.contrib.nn import alpha_dropout
import numpy as np
N_data = 100
x_in = tf.placeholder(tf.float32, shape=[None, N_data], name="x_in")
keep_prob = tf.placeholder(tf.float32)
fc = fully_connected(inputs=x_in, num_outputs=N_data)
drop = alpha_dropout(fc, keep_prob=keep_prob)
x_out = fully_connected(inputs=drop, num_outputs=N_data)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    fd = {
        x_in: np.random.rand(2, N_data),
        keep_prob: 0.5,
    }
    output = x_out.eval(feed_dict=fd)
When evaluating the output of the dropout layer, everything seems normal, but when the output from the dropout layer is linked to a second dense layer, I get the following error message:
Traceback (most recent call last):
File "/***/problem_alpha_dropout.py", line 14, in <module>
x_out = fully_connected(inputs=drop, num_outputs=N_data)
File "/***/anaconda3/envs/TensorFlow/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/***/anaconda3/envs/TensorFlow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1854, in fully_connected
outputs = layer.apply(inputs)
File "/***/anaconda3/envs/TensorFlow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 817, in apply
return self.__call__(inputs, *args, **kwargs)
File "/***/anaconda3/envs/TensorFlow/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/***/anaconda3/envs/TensorFlow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 730, in __call__
self._assert_input_compatibility(inputs)
File "/***/anaconda3/envs/TensorFlow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1465, in _assert_input_compatibility
self.name + ' is incompatible with the layer: '
ValueError: Input 0 of layer fully_connected_1 is incompatible with the layer: its rank is undefined, but the layer requires a defined rank.
This behaviour does not emerge when tf.contrib.nn.alpha_dropout is replaced by tf.nn.dropout (same usage).
Additional information:
TensorFlow version: 1.12.0 (GPU)
Python version: 3.6 (through Anaconda)
OS: Linux Mint
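Just specify the shape of the keep_prob placeholder. Without a shape its rank is unknown, and that unknown rank presumably carries over to the output of alpha_dropout, which the next layer then rejects: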
keep_prob = tf.placeholder(tf.float32, shape=())

Resources