How to save CNN model configuration to file - keras

I want to save my CNN model configuration (kernel sizes, activations, filters, etc.) to a txt file. By using "summary" I only get the layers' input, output and params, but I need more information.
I tried the following functions:
# large dictionary with all the information, but with a lot of noise
config = model.get_config()
# same dictionary, but converted to string
summaryJson = str(model.to_json())
None of these solutions gives me the most important parameters. So I found this solution, which seems to give all that I need, but it doesn't work:
from keras_diagram import ascii
summary = ascii(model)
But it gives me the following error:
AttributeError: 'Activation' object has no attribute 'inbound_nodes'
These are my last layers:
...
hid = Conv2D(128, kernel_size=5, strides=1, padding='same')(hid)
hid = BatchNormalization(momentum=0.9)(hid)
hid = LeakyReLU(alpha=0.1)(hid)
hid = Conv2D(3, kernel_size=5, strides=1, padding="same")(hid)
out = Activation("tanh")(hid)
model = Model(input_layer, out)
summary = ascii(model)
Do you know what to do?
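For reference, the JSON config from model.to_json() already contains the kernel sizes, activations and filter counts per layer; below is a minimal sketch (assuming a standard Keras Model named model) of pretty-printing it to a text file:
import json
# Dump the full layer configuration (kernel sizes, activations, filters, ...)
# to a human-readable text file; this only reformats the output of model.to_json().
with open('model_config.txt', 'w') as f:
    f.write(json.dumps(json.loads(model.to_json()), indent=2))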

Related

How to save and load only particular layers of a neural network with PyTorch?

Bare Problem Statement:
I have trained a model A that consists of a feature extractor FE and a classification head ACH.
I want to train a model B that uses A's feature extractor FE and trains its own classification head BCH.
So far it's easy. Now I don't want to save the entire model B, since the FE part of it is already saved in model A. I only want to dump BCH, and during inference:
Load model A - do its prediction
Load B's classification head BCH.
Swap the classification head ACH with BCH
Run prediction using this swapped state.
Reading PyTorch's documentation, it only talks about saving entire models. How can I achieve this?
End of problem statement
More details on the motivation of the problem:
I have a dataset of images that I want to classify; these images can have several classes assigned to them. For example, the same image can have the class "Land Vehicle" (supercategory) and a class of "Car" (category) or "Truck". Another image might have the class "Aerial Vehicle" and it can be a "Helicopter" or a "Plane".
Since the images, and therefore most of the features, should be the same, I wish to train one classifier for the supercategories, then freeze its feature extractor, and sort of transfer-learn the same model for the categories using the pretrained feature extractor.
Since the weights of the feature-extracting backbone are the same, I only want to save the weights of the classification head of the categories model, and thus save some precious computational resources.
In general, it's quite common to only want access to the backbone of a model in order to reuse it for other purposes. There are several ways to do this. But mostly, keeping in mind that saving a model checkpoint and loading it later means saving the weights and biases and being able to load them correctly into the corresponding layers, you first need to know which part of your model you want to save.
When you get the state of a model, you obtain a dictionary. The keys are the layer names and the values are the weights and biases. Let's see, with an EfficientNet classifier, how to save only the backbone of a model. Basically, an EfficientNet, as in your example, is a backbone plus a fully connected layer as a head; if you only want the backbone, you want every single layer except the head, which you'll fine-tune later.
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet
model = EfficientNet.from_name("efficientnet-b0")
print(model)
It will print the model layers and some features, basic stuff.
EfficientNet(
  (_conv_stem): Conv2dStaticSamePadding(
    3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False
    (static_padding): ZeroPad2d(padding=(0, 1, 0, 1), value=0.0)
  )
  (_bn0): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
  (_blocks): ModuleList(
    (0): MBConvBlock(
      (_depthwise_conv): Conv2dStaticSamePadding(
        32, 32, kernel_size=(3, 3), stride=[1, 1], groups=32, bias=False
        (static_padding): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
      )
      (_bn1): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
      (_se_reduce): Conv2dStaticSamePadding(
        32, 8, kernel_size=(1, 1), stride=(1, 1)
        (static_padding): Identity()
      )
      (_se_expand): Conv2dStaticSamePadding(
        8, 32, kernel_size=(1, 1), stride=(1, 1)
        (static_padding): Identity()
      )
...
Now what is interesting are the final layers of this model:
  ...
  (_bn1): BatchNorm2d(1280, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
  (_avg_pooling): AdaptiveAvgPool2d(output_size=1)
  (_dropout): Dropout(p=0.2, inplace=False)
  (_fc): Linear(in_features=1280, out_features=1000, bias=True)
  (_swish): MemoryEfficientSwish()
)
Let's say we want to reuse this model's backbone, except _fc, since we would like to use the weights on another model having the same backbone but a different, not pre-trained, head. In this example I'll take the same backbone and add 3 heads:
class ThreeHeadEfficientNet(torch.nn.Module):
    def __init__(self, nbClasses1, nbClasses2, nbClasses3, model="efficientnet-b0", dropout_p=0.2):
        super(ThreeHeadEfficientNet, self).__init__()
        self.NBC1 = nbClasses1
        self.NBC2 = nbClasses2
        self.NBC3 = nbClasses3
        self.dropout_p = dropout_p
        self._dropout_layer = torch.nn.Dropout(p=self.dropout_p)
        self._head1 = torch.nn.Linear(1280, self.NBC1)
        self._head2 = torch.nn.Linear(1280, self.NBC2)
        self._head3 = torch.nn.Linear(1280, self.NBC3)
        self.model = EfficientNet.from_name(model, include_top=False)  # you can notice here, I'm not loading the head, only the backbone

    def forward(self, x):
        features = self.model(x)
        res = features.flatten(start_dim=1)
        res = self._dropout_layer(res)
        res1 = self._head1(res)
        res2 = self._head2(res)
        res3 = self._head3(res)
        return res1, res2, res3
You'll notice, if you print this ThreeHeadEfficientNet's layers, that the layer names have slightly changed from _conv_stem.weight to model._conv_stem.weight, since the backbone is now stored in an attribute variable model. We'll thus have to account for that, otherwise the keys will mismatch: create a new state dictionary that matches the expected keys of this new model and contains the pretrained weights and biases:
pretrained_dict = model.state_dict()  # pretrained model keys
model_dict = new_model.state_dict()  # new model keys

processed_dict = {}
for k in model_dict.keys():
    decomposed_key = k.split(".")
    if "model" in decomposed_key:
        pretrained_key = ".".join(decomposed_key[1:])
        processed_dict[k] = pretrained_dict[pretrained_key]  # Here we are creating the new state dict to make our new model able to load the pretrained parameters without the head.

new_model.load_state_dict(processed_dict, strict=False)  # strict here is important since the head layers are missing from the state; we don't want this line to raise an error but load the present keys anyway.
And finally, in new_model you should have your new model with a pretrained backbone and heads to fine tune.
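If, as in the original problem statement, you also only want to persist the newly trained heads rather than the whole model, a minimal sketch along the same lines (assuming the "_head" attribute names of the ThreeHeadEfficientNet above) could filter the state dict by key:
# Save only the head parameters (keys starting with "_head" in this example)
head_state = {k: v for k, v in new_model.state_dict().items() if k.startswith("_head")}
torch.save(head_state, "heads_only.pth")

# Later: rebuild the model, reload the shared backbone as shown above, then
# load the saved heads on top (strict=False because the backbone keys are
# absent from this partial state dict).
new_model.load_state_dict(torch.load("heads_only.pth"), strict=False)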
Now you should be able to fix your issues :)
For more pytorch information, please also check the forum.

Create unsupervised embedding model in keras?

I want to create an autoencoder with the following architecture:
path_source_token_input = Input(shape=(MAX_CONTEXTS,), dtype=tf.int32, name='source_token_input')
path_input = Input(shape=(MAX_CONTEXTS,), dtype=tf.int32, name='path_input')
path_target_token_input = Input(shape=(MAX_CONTEXTS,), dtype=tf.int32, name='target_token_input')
paths_embedded = Embedding(PATH_SIZE, DEFAULT_EMBEDDINGS_SIZE, name='path_embedding')(path_input)
token_embedding_shared_layer = Embedding(TOKEN_SIZE, DEFAULT_EMBEDDINGS_SIZE, name='token_embedding')
path_source_token_embedded = token_embedding_shared_layer(path_source_token_input)
path_target_token_embedded = token_embedding_shared_layer(path_target_token_input)
context_embedded = Concatenate()([path_source_token_embedded, paths_embedded, path_target_token_embedded]) # --> this up to now, is the output of the STANDALONE embedding model
# -------- SPLIT HERE? ------
context_after_dense = TimeDistributed(Dense(CODE_VECTOR_SIZE, use_bias=False, activation='tanh'))(context_embedded) # in short, this layer probably has to stay
encoded = LSTM(100, activation='relu', input_shape=context_after_dense.shape)(context_after_dense)
decoded = RepeatVector(MAX_CONTEXTS)(encoded)
decoded = LSTM(100, activation='relu', return_sequences=True)(decoded)
result = TimeDistributed(Dense(1), name='PROBLEM_is_here')(decoded) # this seems to be some trick according to https://github.com/keras-team/keras/issues/10753, so probably don't remove
inputs = (path_source_token_input, path_input, path_target_token_input)
model = tf.keras.Model(inputs=inputs, outputs=result)
So far I have learned that it is impossible to implement an inverse embedding layer in the decoder, so naturally my conclusion is to split my network in two: one to generate the concatenated embeddings of the input, and the second part would be the autoencoder itself, with the concatenated embeddings (the output of the first part) as input. Now my question is: is it possible to create an unsupervised embedding model in Keras, or anywhere else for that matter? My data is unlabeled and the point of my final neural network would be to create clusters of said unlabeled data.
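For what it's worth, here is a minimal sketch of the split described above, reusing the names from the snippet; the reconstruction width of 3 * DEFAULT_EMBEDDINGS_SIZE is an assumption based on concatenating three equally sized embeddings:
# First model: inputs -> concatenated embeddings (the "standalone" embedding model)
embedding_model = tf.keras.Model(
    inputs=[path_source_token_input, path_input, path_target_token_input],
    outputs=context_embedded,
    name='standalone_embedding_model')

# Second model: an autoencoder over the (precomputed) concatenated embeddings
ae_input = Input(shape=(MAX_CONTEXTS, 3 * DEFAULT_EMBEDDINGS_SIZE), name='embedded_context_input')
x = TimeDistributed(Dense(CODE_VECTOR_SIZE, use_bias=False, activation='tanh'))(ae_input)
encoded = LSTM(100, activation='relu')(x)
decoded = RepeatVector(MAX_CONTEXTS)(encoded)
decoded = LSTM(100, activation='relu', return_sequences=True)(decoded)
reconstructed = TimeDistributed(Dense(3 * DEFAULT_EMBEDDINGS_SIZE))(decoded)
autoencoder = tf.keras.Model(ae_input, reconstructed)

# Unsupervised training: the embeddings themselves are the reconstruction target, e.g.
# embedded = embedding_model.predict([source_tokens, paths, target_tokens])
# autoencoder.fit(embedded, embedded, ...)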

Keras: saving model defined as a class raises NotImplementedError

I am writing this post after reading similar questions and answers that didn't work in my case. You may notice that I defined the input shape in the first layer.
I created a very small CNN in Keras, as follows:
import tensorflow as tf
class MyNet(tf.keras.Model):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(32, 5, strides=(2, 2), data_format='channels_first', input_shape=(3, 224, 224))
        self.bn1 = tf.keras.layers.BatchNormalization(axis=1)
        self.fc1 = tf.keras.layers.Dense(10)
        self.globalavg = tf.keras.layers.GlobalAveragePooling2D(data_format='channels_first')

    def call(self, inputs):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = tf.keras.activations.relu(x)
        x = self.globalavg(x)
        return self.fc1(x)
Then I fed something into it and printed the result successfully (the weights are probably random at the moment, but that's ok):
image = tf.ones(shape = (1, 3, 224, 224)) # Defined "channels first" when created the layers
mynet = MyNet()
outputs = mynet(image)
print(tf.keras.backend.eval(outputs))
The result I saw at this step was the 10 outputs of the fc1 layer:
[[-1.1747773 -0.21640654 -0.16266493 -0.44879064 -0.642066 0.78132695 -0.03920581 -0.30874395 -0.04169023 -0.10409291]]
Then I tried to save the model with its weights, by calling mynet.save('mynet.hdf5'), and got the following error:
NotImplementedError: Currently `save` requires model to be a graph network. Consider using `save_weights`, in order to save the weights of the model.
Note that I am new to Keras and that most of my experience is with PyTorch.
What am I doing wrong?
Update:
Following #ikibir's answer, I redefined the network as a sequential network:
myNetAsSeq = tf.keras.models.Sequential()
myNetAsSeq.add(tf.keras.layers.Conv2D(32, 5, strides = (2,2), data_format = 'channels_first', input_shape = (3,224,224)))
myNetAsSeq.add(tf.keras.layers.BatchNormalization(axis = 1))
myNetAsSeq.add(tf.keras.layers.Activation('relu'))
myNetAsSeq.add(tf.keras.layers.GlobalAveragePooling2D(data_format = 'channels_first'))
myNetAsSeq.add(tf.keras.layers.Dense(10))
This time calling myNetAsSeq.save('mynet.hdf5') succeeded.
I am not sure about my answer, but I believe you don't actually create a model; you are just creating each layer individually, and when you run the call function you just pass the variables to these layers.
In Keras you should use
model = models.Sequential()
to create the model, and
model.add()
to add layers;
then you can save this model.
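Alternatively, if you want to keep the subclassed definition, the NotImplementedError message itself points to save_weights: the architecture stays in the Python class and only the weights are persisted. A minimal sketch reusing MyNet from the question:
mynet = MyNet()
mynet(tf.ones(shape=(1, 3, 224, 224)))   # run a forward pass so the variables are built
mynet.save_weights('mynet_ckpt')          # TensorFlow checkpoint format works for subclassed models

# Restoring later: recreate the model from the class, build it, then load the weights
restored = MyNet()
restored(tf.ones(shape=(1, 3, 224, 224)))
restored.load_weights('mynet_ckpt')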

Is it possible to train using same model with two inputs?

Hello, I have a question about Keras.
Currently I want to implement a network
that uses the same CNN model with two images as input,
and feeds the two results of the CNN model to a Dense model.
For example:
def cnn_model():
    input = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(input)
    x = GlobalAvgPool2D()(x)
    model = Model(input, x)
    return model

def fc_model(cnn1, cnn2):
    input_1 = cnn1.output
    input_2 = cnn2.output
    input = concatenate([input_1, input_2])
    x = Dense(1, input_shape=(None, 16))(input)
    x = Activation('sigmoid')(x)
    model = Model([cnn1.input, cnn2.input], x)
    return model

def main():
    cnn1 = cnn_model()
    cnn2 = cnn_model()
    model = fc_model(cnn1, cnn2)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, epochs=1)
I want to implement a model something like this and train it,
but I got an error message like the one below:
'All layer names should be unique'
Actually, I want to use only one CNN model as a feature extractor and finally use the two features to predict one float value in the range 0.0 ~ 1.0.
So the whole system:
use two images, extract features from the same CNN model, and provide the features to a Dense model to get one floating-point value.
Please help me implement this system and explain how to train it.
Thank you
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# Two tweet inputs, shaped as in the guide's shared-layers example
tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))

# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
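Applied to the question's setup, the 'All layer names should be unique' error goes away if you build the CNN once and call that single instance on both image inputs. A minimal sketch reusing cnn_model from the question (the input shapes are assumptions):
cnn = cnn_model()                       # one shared CNN instance

image_a = Input(shape=(None, None, 3))
image_b = Input(shape=(None, None, 3))

feat_a = cnn(image_a)                   # calling the same Model twice reuses its weights
feat_b = cnn(image_b)

merged = concatenate([feat_a, feat_b])
out = Dense(1, activation='sigmoid')(merged)

siamese = Model([image_a, image_b], out)
siamese.compile(optimizer='adam', loss='mean_squared_error')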

How to change input shape in Sequential model in Keras

I have a sequential model that I built in Keras.
I am trying to figure out how to change the shape of the input. In the following example
model = Sequential()
model.add(Dense(32, input_shape=(500,)))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
let's say that I want to build a new model with a different input shape; conceptually this should look like this:
model1 = model
model1.layers[0] = Dense(32, input_shape=(250,))
is there a way to modify the model input shape?
Somewhat related, so hopefully someone will find this useful: if you have an existing model where the input is a placeholder that looks like (None, None, None, 3), for example, you can load the model and replace the first layer with a concretely shaped input. This kind of transformation is very useful when, for example, you want to use your model in iOS Core ML (in my case the input of the model was an MLMultiArray instead of a CVPixelBuffer, and the model compilation failed).
from keras.models import load_model
from keras import backend as K
from keras.engine import InputLayer
import coremltools
model = load_model('your_model.h5')
# Create a new input layer to replace the (None,None,None,3) input layer :
input_layer = InputLayer(input_shape=(272, 480, 3), name="input_1")
# Save and convert :
model.layers[0] = input_layer
model.save("reshaped_model.h5")
coreml_model = coremltools.converters.keras.convert('reshaped_model.h5')
coreml_model.save('MyPredictor.mlmodel')
Think about what changing the input shape in that situation would mean.
Your first model
model.add(Dense(32, input_shape=(500,)))
has a dense layer that really is a 500x32 matrix.
If you changed your input to 250 elements, your layer's matrix and the input dimension would mismatch.
If, however, what you were trying to achieve was to reuse your last layer's trained parameters from your first 500-element-input model, you could get those weights with get_weights. Then you could rebuild a new model and set the values on the new model with set_weights.
model1 = Sequential()
model1.add(Dense(32, input_shape=(250,)))
model1.add(Dense(10, activation='softmax'))
model1.layers[1].set_weights(model.layers[1].get_weights())
Keep in mind that model1's first layer (aka model1.layers[0]) would still be untrained.
Here is another solution without defining each layer of the model from scratch. The key for me was to use "_layers" instead of "layers". The latter only seems to return a copy.
import keras
import numpy as np

def get_model():
    old_input_shape = (20, 20, 3)
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(9, (3, 3), padding="same", input_shape=old_input_shape))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss='binary_crossentropy', optimizer=keras.optimizers.Adam(lr=0.0001), metrics=['acc'])
    model.summary()
    return model

def change_model(model, new_input_shape=(None, 40, 40, 3)):
    # replace input shape of first layer
    model._layers[1].batch_input_shape = new_input_shape

    # feel free to modify additional parameters of other layers, for example...
    model._layers[2].pool_size = (8, 8)
    model._layers[2].strides = (8, 8)

    # rebuild model architecture by exporting and importing via json
    new_model = keras.models.model_from_json(model.to_json())
    new_model.summary()

    # copy weights from old model to new one
    for layer in new_model.layers:
        try:
            layer.set_weights(model.get_layer(name=layer.name).get_weights())
        except:
            print("Could not transfer weights for layer {}".format(layer.name))

    # test new model on a random input image
    X = np.random.rand(10, 40, 40, 3)
    y_pred = new_model.predict(X)
    print(y_pred)

    return new_model

if __name__ == '__main__':
    model = get_model()
    new_model = change_model(model)
