I am trying to load a TensorFlow meta graph from a saved checkpoint using TensorFlow 1.15, in order to convert it to a SavedModel for TensorFlow Serving. It is a speech recognition model with local attention and a unidirectional LSTM, implemented using the Returnn toolkit with the TensorFlow backend. I am using the following code:
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
import sys
if len(sys.argv) != 2:
    print("Usage: " + sys.argv[0] + " save_dir")
    exit(1)
export_dir = sys.argv[1]
builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
sigs = {}
with tf.Session(graph=tf.Graph()) as sess:
    # Restore the graph definition and the trained weights from the checkpoint.
    new_saver = tf.train.import_meta_graph("./serv_test/model.238.meta")
    new_saver.restore(sess, tf.train.latest_checkpoint("./serv_test"))
    graph = tf.get_default_graph()
    # Input/output tensors used for the serving signature.
    input_audio = graph.get_tensor_by_name('inference/default/wav:0')
    output_hyps = graph.get_tensor_by_name('inference/default/Reshape_7:0')
    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
        tf.saved_model.signature_def_utils.predict_signature_def(
            {"in": input_audio}, {"out": output_hyps})
    builder.add_meta_graph_and_variables(
        sess, [tag_constants.SERVING], signature_def_map=sigs)
builder.save()
But I am getting the following error in the import_meta_graph line:
Traceback (most recent call last):
File "xport.py", line 16, in <module>
new_saver=tf.train.import_meta_graph("./serv_test/model.238.meta")
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered
'NativeLstm2' in binary running on ip-10-1-21-241. Make sure the Op and Kernel
are registered in the binary running in this process. Note that if you are loading a
saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler`
should be done before importing the graph, as contrib ops are lazily registered when
the module is first accessed.
Is there any way to get around this error? Is it because of the custom-built layers used in Returnn? Is there any way to make a Returnn model TensorFlow-servable?
Thanks.
You should remove the graph=tf.Graph(); otherwise your import_meta_graph will import the meta graph into the wrong graph.
Just see some official TF examples of how to use import_meta_graph.
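A minimal sketch of that change, assuming the rest of the script in the question stays the same:

import tensorflow as tf

# Let the session run on the default graph: import_meta_graph imports into
# the current default graph, so the session and the import now refer to the
# same graph.
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph("./serv_test/model.238.meta")
    new_saver.restore(sess, tf.train.latest_checkpoint("./serv_test"))
    ...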
Related
I'm using Jupyter to learn and practice machine learning. I created a Pipeline object with many classes from scikit-learn and custom classes that I wrote. After that, I saved this Pipeline object to the file 'classif_pipeline.pkl.z' using
joblib.dump(pipeline, 'classif_pipeline.pkl.z').
The problem is that when I try to load this file on a different computer, I get the error message below.
Code first:
import joblib
full_pipeline = joblib.load('classif_pipeline.pkl.z')
Error message (I have the same versions of scikit-learn and joblib on this PC too):
Traceback (most recent call last):
File "/media/backup/programming/python/jupyter/classification/main.py", line 3, in <module>
full_pipeline = joblib.load('classif_pipeline.pkl.z')
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
obj = unpickler.load()
File "/usr/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
self.append(self.find_class(module, name))
File "/usr/lib/python3.10/pickle.py", line 1582, in find_class
return _getattribute(sys.modules[module], name)[0]
File "/usr/lib/python3.10/pickle.py", line 331, in _getattribute
raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute 'DependentsImputer' on <module '__main__' from '/media/backup/programming/python/jupyter/classification/main.py'>
DependentsImputer is one of the many other classes I implemented in the Jupyter notebook.
How can I load this file?
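For background, pickle stores only the module path and class name; unpickling then looks the class up under that path, which here is the __main__ of the notebook, so the loading script must make the same name resolvable. A minimal sketch, assuming the class is moved into a hypothetical custom_transformers.py that both scripts can import:

# custom_transformers.py -- hypothetical module holding the same class
# definition that was used in the notebook when the pipeline was pickled.
from sklearn.base import BaseEstimator, TransformerMixin

class DependentsImputer(BaseEstimator, TransformerMixin):
    # The body must match the class you defined in the notebook.
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


# main.py
import joblib
# Importing the class into __main__ makes '__main__.DependentsImputer'
# resolvable, because the pickle recorded the class under that module path.
from custom_transformers import DependentsImputer

full_pipeline = joblib.load('classif_pipeline.pkl.z')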
I am following this tutorial. All my TensorFlow and CUDA setup is complete and tested to be fully working.
My test of the setup:
import torch
import tensorflow as tf
import tensorflow.keras as ks
print(tf)
print(ks)
print(torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
X_train = torch.FloatTensor([0., 1., 2.])
X_train = X_train.to(device)
print(X_train.is_cuda)
print(torch.cuda.current_device())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
Gives
<module 'tensorflow' from 'C:\\Venv\\time_series_forecast\\lib\\site-packages\\tensorflow\\__init__.py'>
<module 'tensorflow.keras' from 'C:\\Venv\\time_series_forecast\\lib\\site-packages\\keras\\api\\_v2\\keras\\__init__.py'>
True
True
0
1
NVIDIA GeForce GTX 1070 Ti
In my main.py file, I have these imports
import torch
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
from tensorflow.keras.models import load_model
But when I try to load the saved classifier, the following lines with load_model give an error:
classifier_path = '../saved_classifiers/bury_pnas_21/len500/best_model_1_1_len500.pkl'
classifier = load_model(classifier_path)
This is the error:
Traceback (most recent call last):
File "G:\My Drive\Working_Dir\\project_and_initiative\forecast\code_v8\_test_p9.py", line 46, in <module>
classifier = load_model(final_classifier_path)
File "C:\Venv\time_series_forecast\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Venv\time_series_forecast\lib\site-packages\tensorflow\python\saved_model\load.py", line 948, in load_partial
raise FileNotFoundError(
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for G:\My Drive\Working_Dir\project_and_initiative\forecast\code_v8\saved_dl_classifiersv\bury_pnas_21\len500\best_model_1_1_len500.pkl\variables\variables
You may be trying to load on a different device from the computational device. Consider setting the `experimental_io_device` option in `tf.saved_model.LoadOptions` to the io_device such as '/job:localhost'.
Process finished with exit code 1
I downloaded the .pkl from here, including its contents from the variables folder, and placed it in the same folder structure as suggested in the tutorial. What is the cause, and what should I try?
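One observation from the traceback: load_model appended \variables\variables to the given path, which means it is treating best_model_1_1_len500.pkl as a TensorFlow SavedModel directory rather than a pickle file. A small diagnostic sketch (path taken from the question) to check whether that expected layout is actually on disk:

import os

classifier_path = '../saved_classifiers/bury_pnas_21/len500/best_model_1_1_len500.pkl'
# A SavedModel "directory" must contain saved_model.pb plus a variables/
# subfolder with the checkpoint shards (variables.index, variables.data-*).
print(os.path.isdir(classifier_path))
if os.path.isdir(classifier_path):
    print(os.listdir(classifier_path))
    print(os.listdir(os.path.join(classifier_path, 'variables')))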
I'm trying to use the pretrained TF-Hub ELMo model by integrating it into a Keras layer.
Keras Layer:
# Imports the snippet appears to assume:
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.compat.v1 import trainable_variables

class ElmoEmbeddingLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)
        self.dimensions = 1024
        self.trainable = True
        self.elmo = None

    def build(self, input_shape):
        url = 'https://tfhub.dev/google/elmo/2'
        self.elmo = hub.Module(url)
        # Register the module's variables as trainable weights of this layer.
        self._trainable_weights += trainable_variables(
            scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(
            x,
            signature="default",
            as_dict=True)["elmo"]
        return result

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.dimensions
When I run the code I get the following error:
Traceback (most recent call last):
File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 170, in <module>
validation_steps=validation_dataset.size())
File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 79, in train_gpu
model = build_model(self.config, self.embeddings, self.sequence_len, self.out_classes, summary=True)
File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 8, in build_model
return my_model(embeddings, config, sequence_length, out_classes, summary)
File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 66, in my_model
inputs, embedding = resolve_inputs(embeddings, sequence_length, model_config, input_type)
File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 19, in resolve_inputs
return elmo_input(model_conf)
File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\models.py", line 58, in elmo_input
embedding = ElmoEmbeddingLayer()(input_text)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 616, in __call__
self._maybe_build(inputs)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1966, in _maybe_build
self.build(input_shapes)
File "D:\Google Drive\Licenta\Gemini\Emotion Analysis\nn\architectures\custom_layers.py", line 21, in build
self.elmo = hub.Module(url)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow_hub\module.py", line 156, in __init__
abs_state_scope = _try_get_state_scope(name, mark_name_scope_used=False)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow_hub\module.py", line 389, in _try_get_state_scope
"name_scope was already taken." % abs_state_scope)
RuntimeError: variable_scope module/ was unused but the corresponding name_scope was already taken.
It seems to be due to the eager execution behaviour. If I disable eager execution, I have to wrap the model.fit call in a TensorFlow session and initialize the variables with sess.run(global_variables_initializer()) to avoid the following error:
Traceback (most recent call last):
File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 168, in <module>
validation_steps=validation_dataset.size().eval(session=Session()))
File "D:/Google Drive/Licenta/Gemini/Emotion Analysis/nn/trainer/model.py", line 90, in train_gpu
class_weight=weighted)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\training.py", line 643, in fit
use_multiprocessing=use_multiprocessing)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 664, in fit
steps_name='steps_per_epoch')
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py", line 294, in model_iteration
batch_outs = f(actual_inputs)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\keras\backend.py", line 3353, in __call__
run_metadata=self.run_metadata)
File "D:\Apps\Anaconda\envs\tf2.0\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
(0) Failed precondition: Error while reading resource variable module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/class tensorflow::Var does not exist.
[[{{node elmo_embedding_layer/module_apply_default/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/Read/ReadVariableOp}}]]
(1) Failed precondition: Error while reading resource variable module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/module/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/class tensorflow::Var does not exist.
[[{{node elmo_embedding_layer/module_apply_default/bilm/RNN_0/RNN/MultiRNNCell/Cell1/rnn/lstm_cell/bias/Read/ReadVariableOp}}]]
[[metrics/f1_micro/Identity/_223]]
0 successful operations.
0 derived errors ignored.
My solution:
with Session() as sess:
    sess.run(global_variables_initializer())
    history = model.fit(self.train_data.repeat(),
                        epochs=self.config['epochs'],
                        validation_data=self.validation_data.repeat(),
                        steps_per_epoch=steps_per_epoch,
                        validation_steps=validation_steps,
                        callbacks=self.__callbacks(monitor_metric),
                        class_weight=weighted)
The main question is whether there is another way to use the ELMo TF-Hub module in a custom Keras layer and train my model. Another question is whether my current solution affects training performance or causes the GPU OOM error (I get the OOM error after a few epochs with a larger batch size, which I've found to be related to sessions not being closed, or memory leaks).
If you wrap your model in a Session() block, you will also have to wrap all other code that uses your model in a Session() block, which takes a lot of time and effort. I have another way to deal with it.
First, create an ELMo module and register a session with Keras:
elmo_model = hub.Module("https://tfhub.dev/google/elmo/3", trainable=True,
                        name='elmo_module')
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())
K.set_session(sess)
Then, instead of creating the ELMo module directly in your ElmoEmbeddingLayer:
self.elmo = hub.Module(url)
self._trainable_weights += trainable_variables(
    scope="^{}_module/.*".format(self.name))
do the following, which I think works normally:
self.elmo = elmo_model
self._trainable_weights += trainable_variables(
    scope="^elmo_module/.*")
Here is a simple solution that I used in my case:
This happened to me while I was using a separate Python script to create the module.
To solve it, I passed the tf.Session() from the main script to tf.keras.backend in the other script, by creating an entry point that sets it before calling the layer's __init__.
Example:
Main file:
import tensorflow.compat.v1 as tf
from ModuleFile import ModuleLayer
def __main__():
    init_args = [...]
    input = ...
    # Hand the current Keras session to the module file before building the layer.
    sess = tf.keras.backend.get_session()
    ModuleLayer.__init_session__(sess)
    module_layer = ModuleLayer(init_args)(input)
Module file:
import tensorflow.compat.v1 as tf
class ModuleLayer(tf.keras.layers.Layer):
    @staticmethod
    def __init_session__(session):
        tf.keras.backend.set_session(session)

    def __init__(self, *args):
        ...
Hope that helps :)
I installed Anaconda and all the packages I needed. I tried to run this MNIST example from Keras, and it works fine. Now I'm trying to visualize the graph following the Keras manual, but it does not work.
When I run the MNIST example I receive this error message:
Traceback (most recent call last):
File "dummy.py", line 77, in <module>
plot_model(model, to_file='model.png')
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\utils\vis_utils.py", line 132, in plot_model
dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\utils\vis_utils.py", line 55, in model_to_dot
_check_pydot()
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\utils\vis_utils.py", line 26, in _check_pydot
pydot.Dot.create(pydot.Dot())
File "C:\Users\***\AppData\Local\Continuum\anaconda3\lib\site-packages\pydot.py", line 1885, in create
assert p.returncode == 0, p.returncode
AssertionError: 1
I also tried convnet_drawer; it failed too.
Any ideas?
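For context, the failing assertion in pydot checks the return code of the Graphviz dot executable, so this usually points at a missing or broken Graphviz installation rather than at Keras itself. A quick check (installing Graphviz, e.g. with conda install graphviz pydot, is the usual fix):

import subprocess

# pydot.Dot.create asserts that `dot` returned 0, so "AssertionError: 1"
# means `dot` ran but failed, or Graphviz is missing or broken.
try:
    out = subprocess.run(['dot', '-V'], capture_output=True)
    print(out.stderr.decode())  # Graphviz prints its version string to stderr
except FileNotFoundError:
    print('Graphviz `dot` executable not found on PATH')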
I'm trying to convert my Keras model (MobileNet + dense layers). The problem is that when I use coremltools for the conversion, I face the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-
packages/IPython/core/interactiveshell.py", line 3265, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-7905693382e5>", line 1, in <module>
coreml_model = coremltools.converters.keras.convert(loaded_model)
File "/usr/local/lib/python3.6/dist-packages/coremltools/converters/keras/_keras_converter.py", line 752, in convert
custom_conversion_functions=custom_conversion_functions)
File "/usr/local/lib/python3.6/dist-packages/coremltools/converters/keras/_keras_converter.py", line 550, in convertToSpec
custom_objects=custom_objects)
File "/usr/local/lib/python3.6/dist-packages/coremltools/converters/keras/_keras2_converter.py", line 206, in _convert
graph.build()
File "/usr/local/lib/python3.6/dist-packages/coremltools/converters/keras/_topology2.py", line 687, in build
self._remove_old_edges(layer)
File "/usr/local/lib/python3.6/dist-packages/coremltools/converters/keras/_topology2.py", line 429, in _remove_old_edges
self._remove_edge(layer, succ)
File "/usr/local/lib/python3.6/dist-packages/coremltools/converters/keras/_topology2.py", line 365, in _remove_edge
self.edge_map[src].remove(snk)
ValueError: list.remove(x): x not in list
I'm doing this conversion with the following code:
import coremltools
from keras.applications import mobilenet
from keras.utils.generic_utils import CustomObjectScope
from keras.models import model_from_json

# Load the model architecture from its JSON description.
js_file = open(args.ddir + args.mdl + '.json', 'r')
loaded_json_model = js_file.read()
js_file.close()

# relu6 is not a standard Keras object, so it has to be supplied explicitly.
with CustomObjectScope({'relu6': mobilenet.mobilenet.relu6}):
    loaded_model = model_from_json(loaded_json_model)
    loaded_model.load_weights(args.ddir + args.mdl + '.h5')
    coreml_model = coremltools.converters.keras.convert(loaded_model,
                                                        input_names="image",
                                                        image_input_names="image")
I solved the problem by using a version of Keras that includes both MobileNet (the feature extractor) and "relu6" at the same time. The only version that has worked for me so far is "2.1.6"; with this version the conversion succeeded.
coremltools does not support some layers at the moment (including relu6). This can be handled with "CustomObjectScope", as shown in the code above.
Note that the network has to be trained again with this version (2.1.6).
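As a small usage sketch of that constraint, one can pin and then verify the Keras version before running the conversion (pip package name assumed):

import keras

# The conversion above is only reported to work with Keras 2.1.6
# (e.g. pip install keras==2.1.6), where relu6 still ships with MobileNet.
assert keras.__version__ == '2.1.6', keras.__version__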