Torch onnx.export error for instance norm - pytorch

I am trying to convert a PyTorch model to the ONNX format using torch.onnx.export().
However, I get the following error:
...
File "C:\Users\Markus\miniconda3\envs\ic-move\lib\site-packages\torch\onnx\symbolic_opset9.py", line 1395, in instance_norm
raise RuntimeError("Unsupported: ONNX export of instance_norm for unknown "
RuntimeError: Unsupported: ONNX export of instance_norm for unknown channel size.
This is how I call the function:
torch.onnx.export(net,                       # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  ONNX_PATH,                 # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=12,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},    # variable length axes
                                'output': {0: 'batch_size'}})
And here is an example of how I use instance norm in my model:
class Residual(nn.Module):
    def __init__(self, inp_dim, out_dim):
        super(Residual, self).__init__()
        self.relu = nn.ReLU()
        self.bn1 = nn.InstanceNorm2d(inp_dim)
        self.conv1 = Conv(inp_dim, int(out_dim/2), 1, relu=False)
        self.bn2 = nn.InstanceNorm2d(int(out_dim/2))
        self.conv2 = Conv(int(out_dim/2), int(out_dim/2), 3, relu=False)
        self.bn3 = nn.InstanceNorm2d(int(out_dim/2))
        self.conv3 = Conv(int(out_dim/2), out_dim, 1, relu=False)
        self.skip_layer = Conv(inp_dim, out_dim, 1, relu=False)
        if inp_dim == out_dim:
            self.need_skip = False
        else:
            self.need_skip = True
Here inp_dim = 64. Aren't these the channels I set?
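A minimal sketch of one possible direction (not a confirmed fix): nn.InstanceNorm2d defaults to affine=False, and the opset-9 symbolic only needs the channel count when the layer has no affine weight, because it then has to fabricate a constant scale/bias of length C from the traced input shape. Giving the layers affine=True lets the exporter read the channel count from the weight tensor instead, assuming learnable scale/bias is acceptable for the model:

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # affine=True gives the layer a weight of length C, so the ONNX
        # exporter no longer has to infer C from the (possibly dynamic) input.
        self.norm = nn.InstanceNorm2d(64, affine=True)

    def forward(self, x):
        return self.norm(x)

torch.onnx.export(TinyNet(), torch.randn(1, 64, 32, 32), "tiny_instance_norm.onnx",
                  opset_version=12,
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch_size"}})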

Related

Onnx RuntimeError NOT_IMPLEMENTED Trilu

This model works in PyTorch; however, after exporting it to the ONNX format, the ONNX runtime crashes with a 'Trilu NOT_IMPLEMENTED' error when loading it. (I do not have this issue with my other models that use torch.tril().)
How do I make this model run in the Onnxruntime?
This is a visualisation of the Onnx graph of the Model.
The Model in PyTorch
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

    def forward(self, item_seq):
        attention_mask = item_seq < 100
        tril_mask = torch.tril(attention_mask)
        query_layer = torch.rand((1, 2, 2, 32))
        key_layer = torch.rand((1, 2, 32, 2))
        attention_scores = torch.matmul(query_layer, key_layer)
        return attention_scores + tril_mask
model = MyModel()
model.eval()

x_train = torch.ones([1, 2], dtype=torch.long)

# demonstrate that eager works
print(model.forward(x_train))

bigmodel_onnx_filename = 'mymodel.onnx'

torch.onnx.export(
    model,
    x_train,
    bigmodel_onnx_filename,
    input_names=['x'],
    output_names=['output'],
)

onnx.load(bigmodel_onnx_filename)

# Onnxruntime crashes when loading in the model
ort_sess = ort.InferenceSession(bigmodel_onnx_filename, providers=['CPUExecutionProvider'])
key = {'x': x_train.numpy()}
print(ort_sess.run(None, key))
This results in the following error for ort.InferenceSession():
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for Trilu(14) node with name '/net/Trilu'
How can I make this model run in the Onnxruntime?
GitHub: code to reproduce the error and the model.onnx file: https://github.com/bkersbergen/pytorch_onnx_runtime_error/blob/main/main.py
I'm using Python 3.9; these are the project requirements:
torch==1.13.1
jupyter==1.0.0
onnxruntime==1.13.1
onnx==1.13.0
Torch nightly version 2.0.0.dev20230205 gave the same error
I then decided to implement my own tril function.
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

    def forward(self, item_seq):
        attention_mask = item_seq < 100
        tril_mask = self.my_tril(attention_mask)
        query_layer = torch.rand((1, 2, 2, 32))
        key_layer = torch.rand((1, 2, 32, 2))
        attention_scores = torch.matmul(query_layer, key_layer)
        return attention_scores + tril_mask

    def my_tril(self, x):
        l = x.size(-1)
        arange = torch.arange(l)
        mask = arange.expand(l, l)
        arange = arange.unsqueeze(-1)
        mask = torch.le(mask, arange)
        return x.masked_fill(mask == 0, 0)
But then I get a Where(9) node with name '/Where_1' NOT_IMPLEMENTED error. (?!)
The boolean output of torch.lt() as input to torch.tril() works in PyTorch's eager and JIT modes. However, it breaks the ONNX runtime with the 'Trilu NOT_IMPLEMENTED' error.
I was able to work around it by casting the torch.tril() input to float():
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

    def forward(self, item_seq):
        attention_mask = torch.lt(item_seq, 100).float()
        tril_mask = torch.tril(attention_mask)
        query_layer = torch.rand((1, 2, 2, 32))
        key_layer = torch.rand((1, 2, 32, 2))
        attention_scores = torch.matmul(query_layer, key_layer)
        return attention_scores + tril_mask
Based on this experience, my hypothesis is that the Trilu NOT_IMPLEMENTED error only applies when the input is a boolean tensor. ONNX Runtime then throws this generic Trilu NOT_IMPLEMENTED error, which made me believe ONNX has no Trilu support at all, which is clearly not the case.
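One way to sanity-check that hypothesis (my own sketch, not from the post) is to inspect which element type actually feeds the Trilu node in the exported graph; 'mymodel.onnx' and the node name come from the code above:

import onnx

model = onnx.load("mymodel.onnx")
model = onnx.shape_inference.infer_shapes(model)  # populate value_info for intermediate tensors
graph = model.graph

# Map every known tensor name to its element type.
elem_types = {vi.name: vi.type.tensor_type.elem_type
              for vi in list(graph.input) + list(graph.value_info) + list(graph.output)}

for node in graph.node:
    if node.op_type == "Trilu":
        for name in node.input:
            is_bool = elem_types.get(name) == onnx.TensorProto.BOOL
            print(node.name, name, "bool input" if is_bool else elem_types.get(name, "unknown"))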

Keras: saving model defined as a class raises NotImplementedError

I am writing this post after reading similar questions and answers that didn't work in my case. You may notice that I defined the input shape in the first layer.
I created a very small CNN in Keras, as follows:
import tensorflow as tf

class MyNet(tf.keras.Model):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(32, 5, strides=(2, 2), data_format='channels_first', input_shape=(3, 224, 224))
        self.bn1 = tf.keras.layers.BatchNormalization(axis=1)
        self.fc1 = tf.keras.layers.Dense(10)
        self.globalavg = tf.keras.layers.GlobalAveragePooling2D(data_format='channels_first')

    def call(self, inputs):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = tf.keras.activations.relu(x)
        x = self.globalavg(x)
        return self.fc1(x)
Then I fed something into it and printed the result successfully (the weights are probably random at the moment, but that's ok):
image = tf.ones(shape = (1, 3, 224, 224)) # Defined "channels first" when created the layers
mynet = MyNet()
outputs = mynet(image)
print(tf.keras.backend.eval(outputs))
The result I saw at this step was the 10 outputs of the fc1 layer:
[[-1.1747773 -0.21640654 -0.16266493 -0.44879064 -0.642066 0.78132695 -0.03920581 -0.30874395 -0.04169023 -0.10409291]]
Then I tried to save the model with its weights, by calling mynet.save('mynet.hdf5'), and got the following error:
NotImplementedError: Currently `save` requires model to be a graph network. Consider using `save_weights`, in order to save the weights of the model.
Note that I am new to Keras and that most of my experience is with PyTorch.
What am I doing wrong?
Update:
Following ikibir's answer, I redefined the network as a sequential network:
myNetAsSeq = tf.keras.models.Sequential()
myNetAsSeq.add(tf.keras.layers.Conv2D(32, 5, strides = (2,2), data_format = 'channels_first', input_shape = (3,224,224)))
myNetAsSeq.add(tf.keras.layers.BatchNormalization(axis = 1))
myNetAsSeq.add(tf.keras.layers.Activation('relu'))
myNetAsSeq.add(tf.keras.layers.GlobalAveragePooling2D(data_format = 'channels_first'))
myNetAsSeq.add(tf.keras.layers.Dense(10))
This time calling myNetAsSeq.save('mynet.hdf5') succeeded.
I am not sure about my answer, but I believe you don't actually create a model; you just create each layer individually, and when you run the 'call' function you simply pass the variables to those layers.
In Keras you should use
model = models.Sequential()
to create the model, then use
model.add()
to add layers. After that you can save the model.
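As an alternative sketch (my addition, not part of the answer): the error message itself points to save_weights, and subclassed models can also be exported in the TensorFlow SavedModel format in recent TF 2.x versions, so redefining the network as Sequential is not strictly required. Assuming the MyNet class from the question and a current TF 2.x install:

mynet = MyNet()
_ = mynet(tf.ones(shape=(1, 3, 224, 224)))  # call once so the variables are built

mynet.save_weights('mynet_weights')         # checkpoint-format weights, works for subclassed models
mynet.save('mynet_saved_model')             # SavedModel directory (not .hdf5), TF 2.x

# Restoring later:
restored = MyNet()
_ = restored(tf.ones(shape=(1, 3, 224, 224)))
restored.load_weights('mynet_weights')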

Tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable

I am trying to replicate the code from here and apply the BERT model to another dataset, but after I create my own test and train sets I stumble upon this problem.
Here's my full file:
import tensorflow as tf
import pandas as pd
import tensorflow_hub as hub
import os
import json
import re
import numpy as np
from bert.tokenization import FullTokenizer
from tqdm import tqdm
#from keras.backend.tensorflow_backend import set_session
import keras.backend as K

# To make tf 2.0 compatible with tf1.0 code, we disable the tf2.0 functionalities
tf.compat.v1.disable_eager_execution()

# Initialize session
sess = tf.compat.v1.Session()

# Params for bert model and tokenization
bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
max_seq_length = 1024

# Load all files from a directory in a DataFrame.
def load_dataset(directory):
    data = {}
    data["text"] = []
    data["label"] = []
    with open(directory) as json_file:
        temp = json.load(json_file)
        for p in temp['Outputs']:
            data["text"].append(p["text"])
            data["label"].append(p["class"])
    return pd.DataFrame.from_dict(data)
class PaddingInputExample(object):
    """Fake example so the num input examples is a multiple of the batch size.

    When running eval/predict on the TPU, we need to pad the number of examples
    to be a multiple of the batch size, because the TPU requires a fixed batch
    size. The alternative is to drop the last batch, which is bad because it means
    the entire output data won't be generated.
    We use this class instead of `None` because treating `None` as padding
    batches could cause silent errors.
    """

class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        """Constructs an InputExample.

        Args:
          guid: Unique id for the example.
          text_a: string. The untokenized text of the first sequence. For single
            sequence tasks, only this sequence must be specified.
          text_b: (Optional) string. The untokenized text of the second sequence.
            Only must be specified for sequence pair tasks.
          label: (Optional) string. The label of the example. This should be
            specified for train and dev examples, but not for test examples.
        """
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label

def create_tokenizer_from_hub_module(bert_path):
    """Get the vocab file and casing info from the Hub module."""
    bert_module = hub.Module(bert_path)
    tokenization_info = bert_module(signature="tokenization_info", as_dict=True)
    vocab_file, do_lower_case = sess.run(
        [
            tokenization_info["vocab_file"],
            tokenization_info["do_lower_case"],
        ]
    )
    return FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)
def convert_single_example(tokenizer, example, max_seq_length=256):
    """Converts a single `InputExample` into a single `InputFeatures`."""
    if isinstance(example, PaddingInputExample):
        input_ids = [0] * max_seq_length
        input_mask = [0] * max_seq_length
        segment_ids = [0] * max_seq_length
        label = 0
        return input_ids, input_mask, segment_ids, label

    tokens_a = tokenizer.tokenize(example.text_a)
    if len(tokens_a) > max_seq_length - 2:
        tokens_a = tokens_a[0 : (max_seq_length - 2)]

    tokens = []
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    return input_ids, input_mask, segment_ids, example.label

def convert_examples_to_features(tokenizer, examples, max_seq_length=256):
    """Convert a set of `InputExample`s to a list of `InputFeatures`."""
    input_ids, input_masks, segment_ids, labels = [], [], [], []
    for example in tqdm(examples, desc="Converting examples to features"):
        input_id, input_mask, segment_id, label = convert_single_example(
            tokenizer, example, max_seq_length
        )
        input_ids.append(input_id)
        input_masks.append(input_mask)
        segment_ids.append(segment_id)
        labels.append(label)
    return (
        np.array(input_ids),
        np.array(input_masks),
        np.array(segment_ids),
        np.array(labels).reshape(-1, 1),
    )

def convert_text_to_examples(texts, labels):
    """Create InputExamples"""
    InputExamples = []
    for text, label in zip(texts, labels):
        InputExamples.append(
            InputExample(guid=None, text_a=" ".join(text), text_b=None, label=label)
        )
    return InputExamples
class BertLayer(tf.keras.layers.Layer):
    def __init__(
        self,
        n_fine_tune_layers=10,
        pooling="mean",
        bert_path="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
        **kwargs,
    ):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = True
        self.output_size = 768
        self.pooling = pooling
        self.bert_path = bert_path
        if self.pooling not in ["first", "mean"]:
            raise NameError(
                f"Undefined pooling type (must be either first or mean, but is {self.pooling}"
            )
        super(BertLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(
            self.bert_path, trainable=self.trainable, name=f"{self.name}_module"
        )

        # Remove unused layers
        trainable_vars = self.bert.variables
        if self.pooling == "first":
            trainable_vars = [var for var in trainable_vars if not "/cls/" in var.name]
            trainable_layers = ["pooler/dense"]
        elif self.pooling == "mean":
            trainable_vars = [
                var
                for var in trainable_vars
                if not "/cls/" in var.name and not "/pooler/" in var.name
            ]
            trainable_layers = []
        else:
            raise NameError(
                f"Undefined pooling type (must be either first or mean, but is {self.pooling}"
            )

        # Select how many layers to fine tune
        for i in range(self.n_fine_tune_layers):
            trainable_layers.append(f"encoder/layer_{str(11 - i)}")

        # Update trainable vars to contain only the specified layers
        trainable_vars = [
            var
            for var in trainable_vars
            if any([l in var.name for l in trainable_layers])
        ]

        # Add to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)

        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)

        super(BertLayer, self).build(input_shape)

    def call(self, inputs):
        inputs = [K.cast(x, dtype="int32") for x in inputs]
        input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(
            input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids
        )
        if self.pooling == "first":
            pooled = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
                "pooled_output"
            ]
        elif self.pooling == "mean":
            result = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
                "sequence_output"
            ]
            mul_mask = lambda x, m: x * tf.expand_dims(m, axis=-1)
            masked_reduce_mean = lambda x, m: tf.reduce_sum(mul_mask(x, m), axis=1) / (
                tf.reduce_sum(m, axis=1, keepdims=True) + 1e-10)
            input_mask = tf.cast(input_mask, tf.float32)
            pooled = masked_reduce_mean(result, input_mask)
        else:
            raise NameError(f"Undefined pooling type (must be either first or mean, but is {self.pooling}")
        return pooled

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_size)
# Build model
def build_model(max_seq_length):
    in_id = tf.keras.layers.Input(shape=(max_seq_length,), name="input_ids")
    in_mask = tf.keras.layers.Input(shape=(max_seq_length,), name="input_masks")
    in_segment = tf.keras.layers.Input(shape=(max_seq_length,), name="segment_ids")
    bert_inputs = [in_id, in_mask, in_segment]

    bert_output = BertLayer(n_fine_tune_layers=3)
    bert_output = (bert_output)(bert_inputs)

    dense = tf.keras.layers.Dense(256, activation="relu")(bert_output)
    pred = tf.keras.layers.Dense(1, activation="sigmoid")(dense)

    model = tf.keras.models.Model(inputs=bert_inputs, outputs=pred)
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    model.summary()
    return model

def initialize_vars(sess):
    sess.run(tf.compat.v1.local_variables_initializer())
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(tf.compat.v1.tables_initializer())
    K.tensorflow_backend.set_session(sess)

def main():
    # Params for bert model and tokenization
    bert_path = "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
    max_seq_length = 1024

    train_df = load_dataset('ShuffledDatasetTrain.jsonl')
    test_df = load_dataset('ShuffledDatasetTest.jsonl')

    # Create datasets (Only take up to max_seq_length words for memory)
    train_text = train_df["text"].tolist()
    train_text = [" ".join(t.split()[0:max_seq_length]) for t in train_text]
    train_text = np.array(train_text, dtype=object)[:, np.newaxis]
    train_label = train_df["label"].tolist()

    test_text = test_df["text"].tolist()
    test_text = [" ".join(t.split()[0:max_seq_length]) for t in test_text]
    test_text = np.array(test_text, dtype=object)[:, np.newaxis]
    test_label = test_df["label"].tolist()

    # Instantiate tokenizer
    tokenizer = create_tokenizer_from_hub_module(bert_path)

    # Convert data to InputExample format
    train_examples = convert_text_to_examples(train_text, train_label)
    test_examples = convert_text_to_examples(test_text, test_label)

    # Convert to features
    (
        train_input_ids,
        train_input_masks,
        train_segment_ids,
        train_labels,
    ) = convert_examples_to_features(
        tokenizer, train_examples, max_seq_length=max_seq_length
    )
    (
        test_input_ids,
        test_input_masks,
        test_segment_ids,
        test_labels,
    ) = convert_examples_to_features(
        tokenizer, test_examples, max_seq_length=max_seq_length
    )

    model = build_model(max_seq_length)

    # Instantiate variables
    initialize_vars(sess)

    model.fit(
        [train_input_ids, train_input_masks, train_segment_ids],
        train_labels,
        validation_data=(
            [test_input_ids, test_input_masks, test_segment_ids],
            test_labels,
        ),
        epochs=1,
        batch_size=8,
    )

if __name__ == "__main__":
    main()
And here's the error:
Using TensorFlow backend.
Converting examples to features: 100%|██████████| 13000/13000 [03:32<00:00, 61.29it/s]
Converting examples to features: 100%|██████████| 2000/2000 [00:32<00:00, 61.83it/s]
WARNING:tensorflow:From C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_ids (InputLayer)          [(None, 1024)]       0
__________________________________________________________________________________________________
input_masks (InputLayer)        [(None, 1024)]       0
__________________________________________________________________________________________________
segment_ids (InputLayer)        [(None, 1024)]       0
__________________________________________________________________________________________________
bert_layer (BertLayer)          (None, 768)          110104890   input_ids[0][0]
                                                                 input_masks[0][0]
                                                                 segment_ids[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 256)          196864      bert_layer[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1)            257         dense[0][0]
==================================================================================================
Total params: 110,302,011
Trainable params: 21,460,737
Non-trainable params: 88,841,274
__________________________________________________________________________________________________
Train on 13000 samples, validate on 2000 samples
2019-12-30 00:45:54.780164: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at resource_variable_ops.cc:660 : Not found: Resource localhost/bert_layer_module/bert/embeddings/word_embeddings/class tensorflow::Var does not exist.
Traceback (most recent call last):
File "C:/Users/Nitish_2/PycharmProjects/GPT-detection/Model.py", line 323, in <module>
main()
File "C:/Users/Nitish_2/PycharmProjects/GPT-detection/Model.py", line 319, in main
batch_size=8,
File "C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 674, in fit
steps_name='steps_per_epoch')
File "C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 393, in model_iteration
batch_outs = f(ins_batch)
File "C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3580, in __call__
run_metadata=self.run_metadata)
File "C:\Users\Nitish_2\Miniconda3\lib\site-packages\tensorflow_core\python\client\session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable bert_layer_module/bert/encoder/layer_10/attention/self/query/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/bert_layer_module/bert/encoder/layer_10/attention/self/query/kernel/class tensorflow::Var does not exist.
[[{{node bert_layer/bert_layer_module_apply_tokens/bert/encoder/layer_10/attention/self/query/MatMul/ReadVariableOp}}]]
Any insight as to why this is happening would be appreciated. Please tell me if I didn't provide the appropriate info; this is my first time asking a question here.
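Not an answer from the thread, but one detail worth checking (a hedged guess): initialize_vars registers sess with the standalone Keras backend (import keras.backend as K), while the model is built with tf.keras, which runs in its own session; if the two differ, the variables are initialized in one session and read, still uninitialized, in the other, which matches the error. A minimal sketch of initializing in the session tf.keras actually uses:

def initialize_vars():
    # Grab the session tf.keras runs the graph in, instead of a separately
    # created tf.compat.v1.Session().
    sess = tf.compat.v1.keras.backend.get_session()
    sess.run(tf.compat.v1.local_variables_initializer())
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(tf.compat.v1.tables_initializer())
    return sess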

Is loading in eager TensorFlow broken right now?

Weights in classes inheriting from tf.keras.Model seem unable to load at the moment. I am unable to load the weights from Example() outside of the class using checkpointing, so I tried to do it within the class, which by all accounts should work. It is able to save the weights, just as when saving Example() directly, however it still can't load them. This is my model code:
class Example(tf.keras.Model):
    def __init__(self, cfg):
        super(Example, self).__init__()

        self.model = tf.keras.Sequential([
            ........layers.......
        ])

        # Create saver
        self.save_path = cfg.save_dir + cfg.extension
        self.ckpt_prefix = self.save_path + '/ckpt'
        self.saver = tf.train.Checkpoint(model=self.model)

    def call(self, x_in):
        x_out = self.model(x_in)
        return x_out

    def save(self):
        self.saver.save(file_prefix=self.ckpt_prefix)

    def load(self):
        self.saver.restore(tf.train.latest_checkpoint(self.save_path))
And this is what I use to check if it loads:
example = Example(cfg)
if Path(example.save_path).is_dir():
    print(example.weights)
    print(example.model.weights)
    example.load()
    print(example.weights)
    print(example.model.weights)
Output:
[]
[]
[]
[]
This was tested on both tensorflow 1.3 and 2.0, and I can confirm that the weights are not empty after the first batch, as well as that it is checkpointing/saving.
As it turns out, there are three different ways TensorFlow does checkpointing, depending on what is being checkpointed:
1. The checkpointed object is just a variable. This is restored immediately upon calling checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path)).
2. The checkpointed object is a model with its input shape defined. This is also restored immediately.
3. The checkpointed object is a model without its input shape defined. This is where the behaviour changes, as TensorFlow does a "delayed" restore and will NOT restore the model weights until input is passed to the model.
Here is an example:
import os
import tensorflow as tf
import numpy as np

# Disable logging
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.logging.set_verbosity(tf.logging.ERROR)
tf.enable_eager_execution()

# Create model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(256, 3, padding="same"),
    tf.keras.layers.Conv2D(3, 3, padding="same")
])
print("Are weights empty before training?", model.weights == [])

# Create optim, checkpoint
optimizer = tf.train.AdamOptimizer(0.001)
checkpoint = tf.train.Checkpoint(model=model)

# Make fake data
img = np.random.uniform(0, 255, (1, 32, 32, 3)).astype(np.float32)
truth = np.random.uniform(0, 255, (1, 32, 32, 3)).astype(np.float32)

# Train
with tf.GradientTape() as tape:
    logits = model(img)
    loss = tf.losses.mean_squared_error(truth, logits)

# Compute/apply gradients
grads = tape.gradient(loss, model.trainable_weights)
grads_and_vars = zip(grads, model.trainable_weights)
optimizer.apply_gradients(grads_and_vars)

# Save model
checkpoint_path = './ckpt/'
checkpoint.save('./ckpt/')

# Check if weights update
print("Are weights empty after training?", model.weights == [])

# Reset model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(256, 3, padding="same"),
    tf.keras.layers.Conv2D(3, 3, padding="same")
])
print("Are weights empty when resetting model?", model.weights == [])

# Update checkpoint pointer
checkpoint = tf.train.Checkpoint(model=model)
# Restore values from the checkpoint
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))

# This next line is REQUIRED to restore
#model(img)

print("Are weights empty after restoring from checkpoint?", model.weights == [])
print(status)
status.assert_existing_objects_matched()
status.assert_consumed()
With output:
Are weights empty before training? True
Are weights empty after training? False
Are weights empty when resetting model? True
Are weights empty after restoring from checkpoint? True
<tensorflow.python.training.checkpointable.util.CheckpointLoadStatus object at 0x7f6256b4ddd8>
Traceback (most recent call last):
File "test.py", line 58, in <module>
status.assert_consumed()
File "/home/jpatts/.local/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/util.py", line 1013, in assert_consumed
raise AssertionError("Unresolved object in checkpoint: %s" % (node,))
AssertionError: Unresolved object in checkpoint: attributes {
name: "VARIABLE_VALUE"
full_name: "sequential/conv2d/kernel"
checkpoint_key: "model/layer-0/kernel/.ATTRIBUTES/VARIABLE_VALUE"
}
However, uncommenting the line model(img) will produce the following output:
Are weights empty before training? True
Are weights empty after training? False
Are weights empty when resetting model? True
Are weights empty after restoring from checkpoint? False
<tensorflow.python.training.checkpointable.util.CheckpointLoadStatus object at 0x7ff62320fe48>
So input data needs to be passed to properly restore a shape invariant model.
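As a side note (my own sketch, not part of the original answer): since the restore is deferred until the variables exist, explicitly building the model should also trigger it in TF versions where Sequential.build creates the layer weights, avoiding the dummy forward pass:

# Create the variables explicitly so the deferred restore can fire.
model.build(input_shape=(None, 32, 32, 3))
status = checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))
status.assert_existing_objects_matched()
print("Are weights empty after building + restoring?", model.weights == [])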
References:
https://www.tensorflow.org/alpha/guide/checkpoints#delayed_restorations
https://github.com/tensorflow/tensorflow/issues/27937

Keras pre-trained model switching to varying input size

Very similar to this question, except that I am wondering how I could take my pre-trained model, which had an input size of (128, 128, 3) images, keep its weights, and use it to predict on images of varying input size.
As it is, I get this when I try to input an image of arbitrary size:
Traceback (most recent call last):
File "arg_test.py", line 127, in <module>
predict(args)
File "arg_test.py", line 71, in predict
predictions.append(model.predict(input_img)[0]) # returns a list of lists, one for each image in the batch
File "C:\Users\payne\Anaconda3\envs\ml-gpu\lib\site-packages\keras\engine\training.py", line 1147, in predict
x, _, _ = self._standardize_user_data(x)
File "C:\Users\payne\Anaconda3\envs\ml-gpu\lib\site-packages\keras\engine\training.py", line 749, in _standardize_user_data
exception_prefix='input')
File "C:\Users\payne\Anaconda3\envs\ml-gpu\lib\site-packages\keras\engine\training_utils.py", line 137, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected input_1 to have shape (128, 128, 3) but got array with shape (2736, 3648, 3)
Here is my model:
def setUpModel(x_train, y_train):
    filters = 256
    kernel_size = 3
    strides = 1

    # Head module
    input = Input(shape=(img_height//scale_fact, img_width//scale_fact, img_depth))
    conv0 = Conv2D(filters, kernel_size, strides=strides, padding='same')(input)

    # Body module
    res = Conv2D(filters, kernel_size, strides=strides, padding='same')(conv0)
    act = ReLU()(res)
    res = Conv2D(filters, kernel_size, strides=strides, padding='same')(act)
    res_rec = Add()([conv0, res])

    for i in range(res_blocks):
        res1 = Conv2D(filters, kernel_size, strides=strides, padding='same')(res_rec)
        act = ReLU()(res1)
        res2 = Conv2D(filters, kernel_size, strides=strides, padding='same')(act)
        res_rec = Add()([res_rec, res2])

    conv = Conv2D(filters, kernel_size, strides=strides, padding='same')(res_rec)
    add = Add()([conv0, conv])

    # Tail module
    conv = Conv2D(filters, kernel_size, strides=strides, padding='same')(add)
    act = ReLU()(conv)
    up = UpSampling2D(size=scale_fact if scale_fact != 4 else 2)(act)  # TODO: try "Conv2DTranspose"
    # mul = Multiply([np.zeros((img_width,img_height,img_depth)).fill(0.1), up])(up)

    # When it's a 4X factor, we want the upscale split in two procedures
    if(scale_fact == 4):
        conv = Conv2D(filters, kernel_size, strides=strides, padding='same')(up)
        act = ReLU()(conv)
        up = UpSampling2D(size=2)(act)  # TODO: try "Conv2DTranspose"

    output = Conv2D(filters=3,
                    kernel_size=1,
                    strides=1,
                    padding='same')(up)

    model = Model(inputs=input, outputs=output)
This was only the architecture of the model used during training; the training itself is already done, and I have my model.h5 file obtained through model.save().
Here is how I get predictions:
import argparse
import numpy as np
import matplotlib.pyplot as plt
import skimage.io
from keras.models import load_model
from keras.optimizers import Adam
from keras.optimizers import Adadelta

from constants import save_dir
from constants import model_name
from constants import crops_p_img
from constants import tests_path
from constants import img_height
from constants import img_width
from constants import scale_fact
from utils import float_im
from utils import crop_center

parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('-a', '--amount', type=int, default=crops_p_img,
                    help='how many (cropped to 128x128) samples to predict from within the image')
parser.add_argument('image', type=str,
                    help='image name (example: "bird.png") that must be inside the "./input/" folder')
parser.add_argument('-m', '--model', type=str, default=model_name,
                    help='model name (in the "./save/" folder), followed by ".h5"')
parser.add_argument('-r', '--random', action="store_true",  # if var is in args, set to TRUE, else, set to FALSE
                    help='flag that will select a random 128x128 area in the input image instead of the center')
parser.add_argument('-f', '--full', action="store_true",  # if var is in args, set to TRUE, else, set to FALSE
                    help='(WIP) flag that will get the whole image to be processed by the network')
args = parser.parse_args()

def predict(args):
    model = load_model(save_dir + '/' + args.model)

    # Setting up the proper optimizer  TODO: needed?
    if args.model == "my_full_model.h5":
        optimizer = Adadelta(lr=1.0,
                             rho=0.95,
                             epsilon=None,
                             decay=0.0)
    else:
        optimizer = Adam(lr=0.001,
                         beta_1=0.9,
                         beta_2=0.999,
                         epsilon=None,
                         decay=0.0,
                         amsgrad=False)
    model.compile(optimizer=optimizer,
                  loss='mean_squared_error')

    image = skimage.io.imread(tests_path + args.image)
    if image.shape[0] == 128:
        args.amount = 1

    predictions = []
    images = []

    # TODO: integrate FULL IMAGE
    # if args.full:
    #     images.append(image)
    #     # Hack because GPU can only handle one image at a time
    #     input_img = (np.expand_dims(images[0], 0))       # Add the image to a batch where it's the only member
    #     predictions.append(model.predict(input_img)[0])  # returns a list of lists, one for each image in the batch
    # else:
    if True:
        for i in range(args.amount):
            # Cropping to fit input size
            if (args.random or args.amount > 1) and image.shape[0] > 128:
                images.append(random_crop(image))
            else:
                images.append(crop_center(image, img_width//scale_fact, img_height//scale_fact))

            input_img = (np.expand_dims(images[i], 0))
            predictions.append(model.predict(input_img)[0])

    for i in range(len(predictions)):
        show_pred_output(images[i], predictions[i])

# adapted from: https://stackoverflow.com/a/52463034/9768291
def random_crop(img):
    crop_h, crop_w = img_width//scale_fact, img_height//scale_fact
    print("Shape of input image to crop:", img.shape[0], img.shape[1])
    if (img.shape[0] >= crop_h) and (img.shape[1] >= crop_w):
        # Cropping a random part of the image
        rand_h = np.random.randint(0, img.shape[0]-crop_h)
        rand_w = np.random.randint(0, img.shape[1]-crop_w)
        print("Random position for the crop:", rand_h, rand_w)
        tmp_img = img[rand_h:rand_h+crop_h, rand_w:rand_w+crop_w]
        new_img = float_im(tmp_img)  # From [0,255] to [0.,1.]
    else:
        return img
    return new_img

def show_pred_output(input, pred):
    plt.figure(figsize=(20, 20))
    plt.suptitle("Results")
    plt.subplot(1, 2, 1)
    plt.title("Input: 128x128")
    plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
    plt.subplot(1, 2, 2)
    plt.title("Output: 512x512")
    plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
    plt.show()

if __name__ == '__main__':
    print(" - ", args)
    predict(args)
You should replace the input definition with this line:
input = Input(shape=(None, None, img_depth))
None in a shape means variable size. Since the model is only made of convolutions, it should work with images of any size.
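Applied to the question's setup, a hedged sketch (build_flexible_model is a hypothetical helper, and the paths are illustrative): rebuild the same fully convolutional architecture with the (None, None, img_depth) input, then copy the trained weights over, since convolution kernels do not depend on the spatial size:

import numpy as np
import skimage.io
from keras.models import load_model

# Hypothetical helper: identical to setUpModel() above, except the first layer
# is Input(shape=(None, None, img_depth)) and it returns the built model.
flexible_model = build_flexible_model()

trained = load_model('save/my_model.h5')            # illustrative path/name
flexible_model.set_weights(trained.get_weights())   # layer order and kernel shapes match

image = skimage.io.imread('input/bird.png')         # any size, no cropping needed
pred = flexible_model.predict(np.expand_dims(image, 0))[0]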
After training your model with a specific input shape, you can save the trained model weights with model.save_weights() and then assign those weights to a model that has an unknown input shape with model2.load_weights().
For example, I have trained the model with input shape (28, 28, 1):
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])
After training, you can save the model weights with:
model.save_weights('model-weights')
Then define a model with an unknown input shape:
model2 = keras.Sequential([
    keras.Input(shape=(None, None, 1)),
    keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])
Then assign the saved weights with:
model2.load_weights('/content/model-weights')
Now you can predict with model2 without training it. For more details, please refer to this gist. Thank you!
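A short usage sketch (my addition, assuming model2 from above): each image is fed as a batch of one, and GlobalAveragePooling2D removes the spatial dimensions, so different input sizes all map to 10 outputs:

import numpy as np

img_a = np.random.rand(1, 28, 28, 1).astype("float32")  # the original training size
img_b = np.random.rand(1, 64, 80, 1).astype("float32")  # an arbitrary larger size

print(model2.predict(img_a).shape)  # (1, 10)
print(model2.predict(img_b).shape)  # (1, 10)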
