Cannot save a Keras model, is this a bug? - keras

I'm trying to do a simple save of a resnet50 model and I'm getting an error. My code to reproduce the error:
from tensorflow import keras
import keras_resnet
inputs = keras.layers.Input(shape=(None, None, 3))
resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True)
resnet.save("my-model")
I get the error: "KeyError: 'inputs'". Is this a bug, or is there something I'm missing with the Keras save command? I tried the command on macOS and in my Ubuntu container, with the same result.
EDIT: it works with the official Keras implementation of ResNet. With that implementation, though, I have to change the code in the resnet.py file of the fizyr keras-retinanet implementation. Specifically, having defined the ResNet with:
from keras.applications.resnet import ResNet50, ResNet101, ResNet152
resnet = ResNet50(input_tensor=inputs, include_top=False)
I have to change the code for the backbone layers from:
backbone_layers = {
    'C2': resnet.outputs[0],
    'C3': resnet.outputs[1],
    'C4': resnet.outputs[2],
    'C5': resnet.outputs[3]
}
to:
backbone_layers = {
    'C2': resnet.layers[-137].output,
    'C3': resnet.layers[-95].output,
    'C4': resnet.layers[-33].output,
    'C5': resnet.outputs[0]
}
I haven't tested it yet, but I think it should work.
The only caveat I see is that I no longer have the freeze_bn parameter. See https://github.com/fizyr/keras-retinanet/issues/974 for the rationale behind this parameter. I hope its absence will not adversely affect the training of my network.
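If freezing the batch-norm layers turns out to matter, one possible stand-in (a sketch under the assumption that making the layers non-trainable is enough; this is not fizyr's implementation) is to freeze them after building the backbone:
from keras.layers import BatchNormalization
# Hedged sketch: approximate keras_resnet's freeze_bn by making every
# BatchNormalization layer in the backbone non-trainable
for layer in resnet.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = False
# Note: in tf.keras 2.x, trainable=False also makes BN run in inference mode,
# which is what freeze_bn was meant to achieve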

You need to save the model with an explicit format, e.g. h5.
I reproduced your error and fixed it with:
resnet.save("mymodel.h5")
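For completeness, a quick round trip to verify the save worked (a sketch; it assumes keras_resnet exposes a custom_objects dict for its custom layers, which must be passed back in when loading):
from tensorflow import keras
import keras_resnet
# Hedged sketch: load the model back, supplying keras_resnet's custom layers
restored = keras.models.load_model("mymodel.h5", custom_objects=keras_resnet.custom_objects)
restored.summary()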

Related

resize_token_embeddings on a pretrained model with a different embedding size

I would like to ask about how to change the embedding size of a trained model.
I have a trained model: models/BERT-pretrain-1-step-5000.pkl.
Now I am adding a new token [TRA] to the tokenizer and trying to use resize_token_embeddings on the pretrained model.
import torch  # needed below for torch.load
from pytorch_pretrained_bert_inset import BertModel  # BertTokenizer
from transformers import AutoTokenizer
from torch.nn.utils.rnn import pad_sequence
import tqdm
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model_bert = BertModel.from_pretrained('bert-base-uncased', state_dict=torch.load('models/BERT-pretrain-1-step-5000.pkl', map_location=torch.device('cpu')))
#print(tokenizer.all_special_tokens) #--> ['[UNK]', '[SEP]', '[PAD]', '[CLS]', '[MASK]']
#print(tokenizer.all_special_ids) #--> [100, 102, 0, 101, 103]
num_added_toks = tokenizer.add_tokens(['[TRA]'], special_tokens=True)
model_bert.resize_token_embeddings(len(tokenizer)) # --> Embedding(30523, 768)
print('[TRA] token id: ', tokenizer.convert_tokens_to_ids('[TRA]')) # --> 30522
But I encountered the error:
AttributeError: 'BertModel' object has no attribute 'resize_token_embeddings'
I assume this is because the model I have (BERT-pretrain-1-step-5000.pkl) has a different embedding size.
I would like to know if there is any way to match the embedding size of my modified tokenizer and the model I would like to use as the initial weights.
Thanks a lot!!
resize_token_embeddings is a Hugging Face transformers method. You are using the BertModel class from pytorch_pretrained_bert_inset, which does not provide such a method. Looking at the code, it seems they copied the BERT code from Hugging Face some time ago.
You can either wait for an update from INSET (maybe create a GitHub issue) or write your own code to extend the word_embeddings layer:
from torch import nn

embedding_layer = model.embeddings.word_embeddings
old_num_tokens, old_embedding_dim = embedding_layer.weight.shape
num_new_tokens = 1

# Create a new embedding layer with more entries
new_embeddings = nn.Embedding(old_num_tokens + num_new_tokens, old_embedding_dim)

# Set device and dtype accordingly
new_embeddings.to(embedding_layer.weight.device, dtype=embedding_layer.weight.dtype)

# Copy the old entries
new_embeddings.weight.data[:old_num_tokens, :] = embedding_layer.weight.data[:old_num_tokens, :]

model.embeddings.word_embeddings = new_embeddings
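One detail the snippet leaves open: the appended row keeps nn.Embedding's default random initialization. A common heuristic (an assumption on my part, not something the library prescribes) is to initialize new tokens with the mean of the existing embeddings:
# Initialize the new row(s) with the mean of the old embeddings
# (a common heuristic; the default random init also works)
new_embeddings.weight.data[old_num_tokens:, :] = embedding_layer.weight.data.mean(dim=0)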

Cannot export PyTorch model to ONNX

I am trying to convert a pre-trained torch model to ONNX, but receive the following error:
RuntimeError: step!=1 is currently not supported
I'm trying this on a pre-trained colorization model: https://github.com/richzhang/colorization
Here is the code I ran in Google Colab:
!git clone https://github.com/richzhang/colorization.git
cd colorization/
import torch  # needed below for the dummy input and the export call
import colorizers
model = colorizer_siggraph17 = colorizers.siggraph17(pretrained=True).eval()
input_names = [ "input" ]
output_names = [ "output" ]
dummy_input = torch.randn(1, 1, 256, 256, device='cpu')
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,
                  input_names=input_names, output_names=output_names)
I appreciate any help :)
UPDATE 1: Proko's suggestion solved the ONNX export issue. Now I have a new, possibly related problem when I try to convert the ONNX model to TensorRT. I get the following error:
[TensorRT] ERROR: Network must have at least one output
Here is the code I used:
import torch
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import onnx
TRT_LOGGER = trt.Logger()
def build_engine(onnx_file_path):
    # initialize TensorRT engine and parse ONNX model
    builder = trt.Builder(TRT_LOGGER)
    builder.max_workspace_size = 1 << 25
    builder.max_batch_size = 1
    if builder.platform_has_fast_fp16:
        builder.fp16_mode = True
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)
    # parse ONNX
    with open(onnx_file_path, 'rb') as model:
        print('Beginning ONNX file parsing')
        parser.parse(model.read())
    print('Completed parsing of ONNX file')
    # generate TensorRT engine optimized for the target platform
    print('Building an engine...')
    engine = builder.build_cuda_engine(network)
    context = engine.create_execution_context()
    print("Completed creating Engine")
    return engine, context

ONNX_FILE_PATH = 'siggraph17.onnx'  # exported using the code above
engine, _ = build_engine(ONNX_FILE_PATH)
I tried to force the build_engine function to use the output of the network by:
network.mark_output(network.get_layer(network.num_layers-1).get_output(0))
but it did not work.
I appreciate any help!
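One thing worth checking in build_engine above: parser.parse returns False on failure, but its return value is ignored, and a silent parse failure typically leaves the network with no outputs, which is exactly the TensorRT error reported. A minimal sketch of surfacing the parser errors (same parser and onnx_file_path as in the question):
# Hedged sketch: print why parsing failed instead of discarding parse()'s result
with open(onnx_file_path, 'rb') as model:
    if not parser.parse(model.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))  # reports the ONNX node that failed
        raise RuntimeError('ONNX parsing failed; network has no outputs')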
As I have mentioned in a comment, this is because slicing in torch.onnx supports only step = 1, but there is 2-step slicing in the model:
self.model2(conv1_2[:,:,::2,::2])
Your only option for now is to rewrite the slicing as other ops. You can do it using range and reshape to obtain the proper indices. Consider the following "step-less arange" function (I hope it is generic enough for anyone with a similar problem):
def sla(x, step):
    diff = x % step
    x += (diff > 0) * (step - diff)  # add length to be able to reshape properly
    return torch.arange(x).reshape((-1, step))[:, 0]
usage:
>>> sla(11, 3)
tensor([0, 3, 6, 9])
Now you can replace every slice like this:
conv2_2 = self.model2(conv1_2[:,:,self.sla(conv1_2.shape[2], 2),:][:,:,:, self.sla(conv1_2.shape[3], 2)])
NOTE: you should optimize this. The indices are calculated on every call, so it might be wise to pre-compute them.
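For instance, a small sketch of memoizing the indices per (size, step) pair, assuming the sizes repeat across calls (this is not from the linked fork):
from functools import lru_cache
import torch

@lru_cache(maxsize=None)
def sla_cached(x, step):
    # same logic as sla, but each (x, step) pair is computed only once
    diff = x % step
    x += (diff > 0) * (step - diff)
    return torch.arange(x).reshape((-1, step))[:, 0]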
I have tested it with my fork of the repo and I was able to save the model:
https://github.com/prokotg/colorization
What worked for me was to add opset_version=11 to torch.onnx.export.
I first tried opset_version=10, but the API suggested 11, and with that it works.
So your call should be:
torch.onnx.export(model, dummy_input, "test_converted_model.onnx", verbose=True,
                  opset_version=11, input_names=input_names, output_names=output_names)

Model parallelism in Keras

I am trying to implement model parallelism in Keras.
I am using Keras 2.2.4 and TensorFlow 1.13.1.
The rough structure of my code is:
import tensorflow as tf
import keras
from keras.layers import Input, concatenate
from keras.models import Model

def model_definition():
    input0 = Input(shape=(None, None))
    input1 = Input(shape=(None, None))
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=False, log_device_placement=True)):
        model = get_some_CNN_model()
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=0)):
            op0 = model(input0)
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=1)):
            op1 = model(input1)
        with tf.device(tf.DeviceSpec(device_type="CPU", device_index=0)):
            concatenated_ops = concatenate([op0, op1], axis=-1, name='check_conc1')
    mixmodel = Model(inputs=[input0, input1], outputs=concatenated_ops)
    return mixmodel

mymodel = model_definition()
mymodel.fit_generator()
Expected result: While training, computations for op0 and op1 should be done on gpu0 and gpu1, respectively.
Problem 1: When I have 2 GPUs available, the training works fine, and nvidia-smi shows that both GPUs are being used. However, I am not sure whether both GPUs are doing their intended work. How can I confirm that?
Even though I set log_device_placement to True, I don't see any task allocated to GPU 1.
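One way to double-check the placement beyond log_device_placement is to walk the graph and print each op's device; a minimal TF 1.x sketch (assuming the model lives in the default graph):
import tensorflow as tf

# List every op pinned to a GPU, together with its device string
for op in tf.get_default_graph().get_operations():
    if 'GPU' in op.device.upper():
        print(op.name, '->', op.device)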
Problem 2: When I run this code on a machine with only 1 GPU available, it still runs fine. I expected it to show an error, because GPU 1 is not available.
The example shown here works as expected: it does not have problem 2, i.e. on a single GPU it raises an error.
So I think Keras is doing some manipulation internally.
I have also tried import tensorflow.python.keras instead of import keras, in case that was causing a conflict.
However, both problems persist.
I would appreciate any clue about this issue. Thank you.

Cannot clone object <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object

This is with regard to TF 2.0.
Below is my code, which performs grid search with cross-validation using sklearn.model_selection.GridSearchCV on the MNIST dataset and works perfectly fine.
# Build function to create the model, required by KerasClassifier
def create_model(optimizer_val='RMSprop', hidden_layer_size=16, activation_fn='relu',
                 dropout_rate=0.1, regularization_fn=tf.keras.regularizers.l1(0.001),
                 kernel_initializer_fn=tf.keras.initializers.glorot_uniform,
                 bias_initializer_fn=tf.keras.initializers.zeros):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(units=hidden_layer_size, activation=activation_fn,
                              kernel_regularizer=regularization_fn,
                              kernel_initializer=kernel_initializer_fn,
                              bias_initializer=bias_initializer_fn),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(units=hidden_layer_size, activation='softmax',
                              kernel_regularizer=regularization_fn,
                              kernel_initializer=kernel_initializer_fn,
                              bias_initializer=bias_initializer_fn)
    ])
    optimizer_val_final = optimizer_val
    model.compile(optimizer=optimizer_val, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Create the model with the wrapper
model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=2)
# Initialize the parameter grid
nn_param_grid = {
    'epochs': [10],
    'batch_size': [128],
    'optimizer_val': ['Adam', 'SGD'],
    'hidden_layer_size': [128],
    'activation_fn': ['relu'],
    'dropout_rate': [0.2],
    'regularization_fn': ['l1', 'l2', 'L1L2'],
    'kernel_initializer_fn': ['glorot_normal', 'glorot_uniform'],
    'bias_initializer_fn': [tf.keras.initializers.zeros]
}
# Perform GridSearchCV (precision_custom is a custom scorer defined elsewhere)
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(estimator=model, param_grid=nn_param_grid, verbose=2, cv=3,
                    scoring=precision_custom, return_train_score=False, n_jobs=-1)
grid_result = grid.fit(x_train, y_train)
My idea is to pass different optimizers with different learning rates, say Adam with learning rates 0.1, 0.01 and 0.001. I also want to try out SGD with different learning rates and momentum values.
In that case, when I pass 'optimizer_val': [tf.keras.optimizers.Adam(0.1)], I get the error given below:
Cannot clone object <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x7fe08b210e10>, as the constructor either does not set or modifies parameter optimizer_val
Please advise how I can rectify this error.
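One commonly used workaround is to keep optimizer_val a plain string and expose the learning rate and momentum as their own hyperparameters, building the optimizer inside build_fn so the grid contains only clonable primitives. A minimal sketch (the tiny architecture here is a stand-in, not the question's model):
import tensorflow as tf

def create_model_lr(optimizer_val='Adam', learning_rate=0.001, momentum=0.0):
    # Stand-in architecture; the point is building the optimizer from primitives
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    if optimizer_val == 'Adam':
        optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    else:
        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=momentum)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# The grid then stays serializable and clonable:
nn_param_grid = {
    'optimizer_val': ['Adam', 'SGD'],
    'learning_rate': [0.1, 0.01, 0.001],
    'momentum': [0.0, 0.9],
}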
This is an sklearn bug. You should downgrade your sklearn version:
conda install scikit-learn==0.21.2
After that it works.
You can fix the issue by changing the lists into tuples.
If a parameter takes only a single value, you can keep it as a list.
# Initialize the parameter grid
nn_param_grid = {
    'epochs': [10],
    'batch_size': [128],
    'optimizer_val': ('Adam', 'SGD'),
    'hidden_layer_size': [128],
    'activation_fn': ['relu'],
    'dropout_rate': [0.2],
    'regularization_fn': ('l1', 'l2', 'L1L2'),
    'kernel_initializer_fn': ('glorot_normal', 'glorot_uniform'),
    'bias_initializer_fn': [tf.keras.initializers.zeros]
}
Found this comment online and it helped!
For those who are getting the following error from a similar statement:
Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x7f93ddc5d1d0>, as the constructor either does not set or modifies parameter layers
Change layers from an array of lists to an array of tuples:
layers => [(20,), (45, 30, 15), (40, 20)]
Don't forget to add the comma after (20,), otherwise another error/warning will appear - FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: TypeError: 'int' object is not iterable - because a single value in parentheses without a comma is treated as an int, not a tuple.
Only installing TensorFlow 2.8 helped with this issue. Note that it is currently available only via pip (Anaconda provides TensorFlow 2.7, while PyPI provides TensorFlow 2.8).
To check your version of TensorFlow, type: conda list tensorflow
(base) C:\Users\User> conda list tensorflow-gpu
# Name            Version    Build         Channel
tensorflow-gpu    2.4.1      pyhd8ed1ab_3  conda-forge
To uninstall, type: conda uninstall tensorflow, and to install version 2.8, type:
pip install tensorflow-gpu==2.8

ValueError: Circular reference detected in LightGBM

I get the following error when I train a LightGBM model with the code below:
# Train the model
import lightgbm as lgb
lgb_train = lgb.Dataset(x_train, y_train)
lgb_val = lgb.Dataset(x_test, y_test)
parameters = {
    'application': 'binary',
    'objective': 'binary',
    'metric': 'auc',
    'is_unbalance': 'true',
    'boosting': 'gbdt',
    'num_leaves': 31,
    'feature_fraction': 0.5,
    'bagging_fraction': 0.5,
    'bagging_freq': 20,
    'learning_rate': 0.05,
    'verbose': 0
}
model = lgb.train(parameters,
                  lgb_train,
                  valid_sets=lgb_val,
                  num_boost_round=5000,
                  early_stopping_rounds=100)
y_pred = model.predict(x_test)
If you used the cut or qcut functions for binning and did not encode afterwards (one-hot encoding, label encoding, ...), this may be the cause of the error. Try using an encoding.
I hope it works.
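For instance, a minimal pandas sketch of one such encoding (hypothetical data; the answer above suggests unencoded bins can trigger this error):
import pandas as pd

# pd.cut yields Interval categories; .cat.codes label-encodes them as plain ints
binned = pd.cut(pd.Series([1, 5, 9, 12]), bins=3)
encoded = binned.cat.codes  # 0, 1, 2, ... safe to feed to LightGBM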
I had what might be the same problem.
Post the whole traceback to make sure.
For me it was a problem serializing to JSON, which LightGBM does under the hood to save the booster for later use.
Check your dataset for any date/datetime columns, or anything that remotely looks like a date, and either drop it or convert to something JSON can handle.
Mine had all been converted to categorical dtype by some pandas code I had written poorly, and I usually do the initial GBM run fairly fast-n-dirty to see which variables show up as important. LightGBM let me build the data binaries for training (it would have thrown an error before letting me run anything if they had still been datetime or timedelta dtypes). It would run the training just fine, report an AUC, then fail after the last training step when it was dumping the categoricals to JSON. It was maddening, with a cryptic traceback.
Hope this helps.
If you have any timedelta variables in the dataset, convert them to int using the dt.days attribute. I faced the same issue; it is the issue reported in LightGBM's GitHub issues.
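A minimal pandas sketch of that conversion (hypothetical column name):
import pandas as pd

# Replace a timedelta column with its integer length in days
df = pd.DataFrame({'delta': pd.to_timedelta(['1 days', '3 days', '10 days'])})
df['delta'] = df['delta'].dt.days  # now int64, safe for LightGBM's JSON dump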
