Tensorflow model prediction fails when run right after model training - python-3.x

I'm having trouble with my model prediction. The training works fine, but afterwards my program fails while predicting with the trained model. When I rerun my code, the training is skipped because it's already done, and the prediction then works as it's supposed to. On Google I only find this error in connection with model training, so I guess those solutions don't apply to me. I think the reason for my error is that my video RAM is not entirely freed after model training. That's why I tried the following, without success:
tf.keras.backend.clear_session()
tf.compat.v1.reset_default_graph()
K.clear_session()
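One more idea I have not tried yet (so treat this as a sketch, not something verified on my setup) is letting TensorFlow grow GPU memory on demand instead of pre-allocating the whole card, since the error below looks like the GPU running out of memory:

import tensorflow as tf

# must run before any model or tensor is created
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)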
Error code:
prediction = model.predict(x)[:, 0]#.flatten() # flatten was needed now
File "/home/max/PycharmProjects/Masterthesis/venv3-8-12/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/max/PycharmProjects/Masterthesis/venv3-8-12/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
Do you have any ideas on how to solve the problem?
My Setup:
Python: 3.8.12
Tensorflow-gpu: 2.7.0
System: Manjaro Linux
Cuda: 11.5
GPU: NVIDIA GeForce GTX 980 Ti
My Code:
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
import tensorflow as tf
import h5py
import keras.backend as K

def loss_function(y_true, y_pred):
    alpha = K.std(y_pred) / K.std(y_true)
    beta = K.sum(y_pred) / K.sum(y_true)
    error = K.sqrt(K.square(1 - alpha) + K.square(1 - beta))
    return error
i = Input(shape=(171, 11))
x = LSTM(100, return_sequences=True)(i)
x = LSTM(50)(x)
x = Dropout(0.1)(x)
out = Dense(1)(x)
model = Model(i, out)
model.compile(
    loss=loss_function,
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001))
with h5py.File("db.hdf5", 'r') as db_:
r = model.fit(
db_["X_train"][...],
db_["Y_train"][...],
epochs=1,
batch_size=64,
verbose=1,
shuffle=True)
model.save("model.h5")
model = load_model("model.h5", compile=False)
with h5py.File("db.hdf5", 'r') as db:
x = db["X_val"][...]
y = db["Y_val"][...].flatten()
prediction = model.predict(x)[:, 0].flatten()

I found the solution to my problem. Since I'm using a custom loss function, I needed to specify it when loading the model again.
I accomplished this by modifying this line
model = load_model("model.h5", compile=False)
to this one
model = load_model("model.h5", custom_objects={"loss_function": loss_function})

Related

Keras: Using load_model on a pretrained model with custom loss to resume training (Without using the compile=False workaround)

TensorFlow version = 2.11.0
Laptop OS = Ubuntu Linux x86_64
Python version = 3.10.6
Coding on Visual Studio Code on a venv
I am trying to resume the training of my Keras model by loading it the following way:
# Loading pretrained model from .h5 file
h5_path = 'pretrained.h5'
model = tf.keras.models.load_model(h5_path, compile=False)
# Recover desired LR
LR = model.optimizer.learning_rate
pretrain_optimizer = tf.keras.optimizers.Nadam(learning_rate=LR)
# Compile model with desired qualities
model.compile(optimizer=pretrain_optimizer, loss = GxGyLoss_keras, metrics=[tf.keras.metrics.MeanAbsoluteError(),tf.keras.metrics.MeanAbsolutePercentageError()])
This is how I compiled and saved the 'pretrained.h5' model before:
EPOCHS = 50
BATCH_SIZE = 30
OUT_CHANNELS = 1
IN_CHANNELS = 1
LR = 1e-3
IMG_SIZE = (256,256)
# Build model from 0
model = get_model(img_size=IMG_SIZE, in_channels=IN_CHANNELS, out_channels=OUT_CHANNELS)
# Compile model from 0
model.compile(optimizer=optimizers.Nadam(learning_rate=LR), loss=GxGyLoss_keras, metrics=[tf.keras.metrics.MeanAbsoluteError(),tf.keras.metrics.MeanAbsolutePercentageError()])
# Save best model weights
callbacks = [
    # Save model every epoch regarding the best val_loss
    keras.callbacks.ModelCheckpoint(
        filepath = 'pretrained.h5',
        save_best_only = True,
        save_weights_only = False,
        mode = 'min',
        monitor = 'val_loss',),
    # Save history data on the same csv file so that resuming training doesn't affect the plots
    keras.callbacks.CSVLogger(
        filename = f'training_hist_log',
        separator = ',',
        append = True)
]
# Train the model, doing validation at the end of each epoch.
history = model.fit(x = X_train,
                    y = y_train,
                    epochs = EPOCHS,
                    batch_size = BATCH_SIZE,
                    validation_data = (X_val, y_val),
                    callbacks = callbacks)
It is important to declare that this is how I defined the custom loss function:
from keras import backend as K

def GxGyLoss_keras(y_pred, y_true):
    intImgSize = 256
    summatory = K.sum(K.square(y_pred - y_true))
    eval = K.sqrt(summatory) / intImgSize**2
    return eval
I was trying the workaround of setting compile=False, since that seems to work for other users, but my problem is that I need to load the optimizer state to resume training, and with that approach I get this apparently obvious error:
Traceback (most recent call last):
File "/home/victus-linux/Escritorio/MasterThesis_CODE/to_share/main_copy.py", line 114, in
LR = model.optimizer.learning_rate
AttributeError: 'NoneType' object has no attribute 'learning_rate'
I understand that it pops up because I am not compiling the model, so the optimizer state isn't restored either. I would like a solution for resuming the training of my model (with its custom loss) while keeping the optimizer state (gradients, loss state, etc.).
If I use
tensorflow.keras.load_model(h5_path, compile=True)
the problem lies in the compilation of the custom loss function. I also tried:
tensorflow.keras.load_model(h5_path, compile=True,
                            custom_objects=GxGyLoss_keras)
but the exact same error was popping up:
File "/home/victus-linux/Escritorio/MasterThesis_CODE/to_share/venv_master/lib/python3.10/site-packages/keras/saving/legacy/serialization.py", line 557, in deserialize_keras_object
    raise ValueError(
ValueError: Unknown loss function: 'GxGyLoss_keras'. Please ensure you are using a keras.utils.custom_object_scope and that this object is included in the scope. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.
[DISCLAIMER: This is my first time asking around here, I am sorry in advance for the possible mistakes over the format of my question]
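One thing worth checking (not verified against this exact setup): custom_objects is expected to be a dict that maps the name stored in the file to the Python object, not the function itself. A minimal sketch of that load call:

import tensorflow as tf

# custom_objects maps the saved name to the actual function object
model = tf.keras.models.load_model(
    h5_path,
    compile=True,
    custom_objects={'GxGyLoss_keras': GxGyLoss_keras})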

NameError: name 'Model' is not defined - how to resolve this?

I am trying to classify 2 categories with transfer learning. After preprocessing my data I want to apply 'InceptionResNetV2'. I want to remove the last layer of this Keras application and add a new layer.
I wrote the following script to do this:
irv2 = tf.keras.applications.inception_resnet_v2.InceptionResNetV2()
irv2.summary()
x = irv2.layers[-1].output
x = Dropout(0.25)(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=mobile.input, outputs=predictions)
Then an error occurred:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-40-911de74d9eaf> in <module>()
5 predictions = Dense(2, activation='softmax')(x)
6
----> 7 model = Model(inputs=mobile.input, outputs=predictions)
NameError: name 'Model' is not defined
If there is another way to remove the last layer and add a new layer (predictions = Dense(2, activation='softmax')), please let me know.
This is my full code.
You can use this code snippet to define your transfer learning model.
Here, we are using weights trained on the imagenet dataset, ignoring the final layer (the 1000-neuron layer that was used to train the 1000 imagenet classes) and adding our custom layers. In this example we add a GAP layer followed by a dense layer for binary classification.
from tensorflow import keras
input_layer = keras.layers.Input(shape=(224, 224, 3))
irv2 = keras.applications.Xception(weights='imagenet',include_top=False,input_tensor = input_layer)
global_avg = keras.layers.GlobalAveragePooling2D()(irv2.output)
dense_1 = keras.layers.Dense(1,activation = 'sigmoid')(global_avg)
model = keras.Model(inputs=irv2.inputs,outputs=dense_1)
model.summary()
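A common companion step, not shown in the snippet above (so treat this as a sketch under the assumption that you only want to train the new head), is freezing the pretrained backbone before compiling:

# freeze the Xception backbone so only the new classification head trains
irv2.trainable = False
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])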
The error you faced could possibly be due to the import changes between tf 1.x and tf 2.x
Try out any one of the below import methods depending on your tensorflow version. It should fix the error.
from tensorflow.keras.models import Model
or
from tensorflow.keras import Model
Also make sure you import everything either from tensorflow.keras or from keras, but not both. Mixing functions imported from the two libraries in the same script can cause incompatibility errors.
-1 will give you the last Dense layer, but what you really want is the layer above that, which is -2.
The input should be the inception model's input layer.
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
irv2 = tf.keras.applications.inception_resnet_v2.InceptionResNetV2()
predictions = Dense(2, activation='softmax')(irv2.layers[-2].output)
model = Model(inputs=irv2.input, outputs=predictions)
model.summary()

Pytorch Resnet model error if FC layer is changed in Colab

If I simply import the Resnet model from Pytorch in Colab and use it to train my dataset, there are no issues. However, when I try to change the last FC layer to change the output features from 1000 to 9, which is the number of classes for my dataset, I get the following error.
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
Working version:
import torch
import torchvision.models as models
from torch.optim import Adam
from torch.nn import CrossEntropyLoss

#model = Net()
model = models.resnet18(pretrained=True)
# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
Version with error:
import torchvision.models as models
#model = Net()
model = models.resnet18(pretrained=True)
# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()

model.fc = torch.nn.Linear(512, 9)
The error occurs at the training stage, i.e.
outputs = model(images)
How should I go about fixing this issue?
Simple error: the fc layer should be replaced before moving the model to CUDA, i.e.
model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 9)

if torch.cuda.is_available():
    model = model.cuda()
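An equivalent approach (a minimal sketch, not part of the original answer, reusing the imports from the question) is to keep the original order and move the new layer to the GPU explicitly; the optimizer should then be rebuilt afterwards so it tracks the new fc parameters:

model = models.resnet18(pretrained=True)
if torch.cuda.is_available():
    model = model.cuda()
# the freshly created Linear layer starts on the CPU, so move it over too
model.fc = torch.nn.Linear(512, 9)
if torch.cuda.is_available():
    model.fc = model.fc.cuda()
# rebuild the optimizer so it includes the new fc parameters
optimizer = Adam(model.parameters(), lr=0.07)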

tf.keras plot_model: add_node() received a non node class object

I'm getting back into python and have been trying out some stuff with tensorflow and keras. I wanted to use the plot_model function and after sorting out some graphviz issues I am now getting this error -
TypeError: add_node() received a non node class object:
I've tried to find an answer myself but have come up short, as the only answer I found with this error didn't seem to be related to tf. Any suggestions or alternative ideas would be greatly appreciated.
Here's the code and error message - my first question on here so sorry if I missed anything, just let me know.
I'm using miniconda3 with python 3.8
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import EarlyStopping
from numpy import argmax
from matplotlib import pyplot
from random import randint
tf.keras.backend.set_floatx("float64")
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
class mnist_model(Model):
    def __init__(self):
        super(mnist_model, self).__init__()
        self.conv = Conv2D(32, 3, activation = tf.nn.leaky_relu, kernel_initializer = 'he_uniform', input_shape = (28, 28, 3))
        self.pool = MaxPool2D((2,2))
        self.flat = Flatten()
        self.den1 = Dense(128, activation = tf.nn.relu, kernel_initializer = 'he_normal')
        self.drop = Dropout(0.25)
        self.den2 = Dense(10, activation = tf.nn.softmax)

    def call(self, inputs):
        n = self.conv(inputs)
        n = self.pool(n)
        n = self.flat(n)
        n = self.den1(n)
        n = self.drop(n)
        return self.den2(n)
model = mnist_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
limit = EarlyStopping(monitor = 'val_loss', patience = 5)
history = model.fit(x_train, y_train, batch_size=64, epochs = 1, verbose = 2, validation_split = 0.15, steps_per_epoch = 100, callbacks = [limit])
print("\nTraining finished\n\nTesting 10000 samples")
model.evaluate(x_test, y_test, verbose = 1)
print("Testing finished\n")
plot_model(model, show_shapes = True, rankdir = 'LR')
##################################################################################################################################################################
## Error message: ##
Train on 51000 samples, validate on 9000 samples
Training finished
Testing 10000 samples
10000/10000 [==============================] - 7s 682us/sample - loss: 0.2447 - accuracy: 0.9242
Testing finished
Traceback (most recent call last):
File "C:\Users\Thomas\Desktop\Various Python\Tensorflow\Tensorflow_experimentation\tc_mnist.py", line 60, in <module>
plot_model(model, show_shapes = True, rankdir = 'LR')
File "C:\Users\Thomas\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\keras\utils\vis_utils.py", line 283, in plot_model
dpi=dpi)
File "C:\Users\Thomas\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\keras\utils\vis_utils.py", line 131, in model_to_dot
dot.add_node(node)
File "C:\Users\Thomas\miniconda3\envs\tensorflow\lib\site-packages\pydotplus\graphviz.py", line 1281, in add_node
'class object: {}'.format(str(graph_node))
TypeError: add_node() received a non node class object: <pydotplus.graphviz.Node object at 0x00000221C7E3E888>
I think the root cause of the issue is shape inference for the subclassed model, where model.summary() shows "multiple" as the output shape. I added a model() method within the subclassed model as shown below.
def model(self):
    x = tf.keras.layers.Input(shape=(28, 28, 1))
    return Model(inputs=[x], outputs=self.call(x))
With this modification, shape inference works automatically through the Functional API. Because Functional and Sequential models are static graphs of layers, their shapes can be inferred easily. However, a subclassed model is just a piece of Python code (a call method) with no graph of layers to inspect. We cannot know how the layers are connected to each other (because that's defined in the body of call, not as an explicit data structure), so we cannot infer input/output shapes.
Please check full code here for your reference.
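For illustration, a minimal sketch of how that helper can then be fed to plot_model (assuming the mnist_model class from the question):

# plot the wrapped Functional model instead of the subclassed instance
m = mnist_model()
plot_model(m.model(), show_shapes=True, rankdir='LR')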

Error when making predictions on un-pickled classifier

I am making a text classification program which takes over a thousand emails as input, so for convenience I have decided to save the classifier to a pickle file after training is complete, so that on further executions of the program I won't have to retrain it.
import os
import pickle
from sklearn.naive_bayes import GaussianNB

path = 'classifier.pkl'
clf = GaussianNB()

if not os.path.exists(path):
    # making a classifier
    clf.fit(x_train, y_train)
    with open(path, 'wb') as f:
        pickle.dump(clf, f)
else:
    print('<classifier found!>')
    input_file = open(path, 'rb')
    clf = pickle.load(input_file)
    input_file.close()

pred = clf.predict(x_test)  # the error occurs on this line
The prediction works on the first run (when the classifier is trained rather than loaded from the file), but it gives me this error on subsequent executions:
ValueError: operands could not be broadcast together with shapes
(3516,379) (376,)
shapes of x_train and x_test are as follows: (14062, 379), (3516, 379)
Any help would be appreciated
Edit: I have tried desertnaut's suggestion of pickling pred = clf.predict(x_test) and using it in further runs of the program, but the accuracy score I get from those runs seems to be about half of the score from when I initially trained the classifier.
I could not figure out why pickling does not work. However, sklearn's joblib function seems to work just fine.
from sklearn.externals import joblib  # in newer scikit-learn versions, use `import joblib` directly

if not os.path.exists(path):
    clf = clf.fit(x_train, y_train)
    joblib.dump(clf, path)
else:
    clf = joblib.load(path)
