Recently I built a GRU model with PyTorch. When the model has a single nn.GRU layer it runs fine, but when there is more than one GRU layer the model reports an error like 'CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)' at the start of training. The detailed model architecture is attached at the end. My CUDA version is 11.0 and my PyTorch version is 1.7. Does this mean I can't use a GRU model with more than one nn.GRU layer?
import torch.nn as nn

class GRU(nn.Module):
    def __init__(self):
        super(GRU, self).__init__()
        self.gru1 = nn.GRU(3, 50, 1, batch_first=True)
        self.gru2 = nn.GRU(50, 10, 1, batch_first=True)
        self.fc1 = nn.Linear(10, 2)

    def forward(self, x):
        x, self.hidsta1 = self.gru1(x)
        x, self.hidsta2 = self.gru2(x)
        s, b, h = x.shape
        x = x.reshape(s * b, h)
        x = self.fc1(x)
        x = x.reshape(s, b, -1)
        return x
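For what it's worth, stacking recurrent layers is supported in PyTorch, either as two nn.GRU modules (as above) or as a single nn.GRU with num_layers=2 when the layers share a hidden size. Below is a minimal self-contained sketch (random data, hypothetical batch and sequence sizes) that runs the same stack on the CPU:

import torch
import torch.nn as nn

# Hypothetical sizes: a batch of 4 sequences of length 7 with 3 features each.
x = torch.randn(4, 7, 3)

gru1 = nn.GRU(3, 50, 1, batch_first=True)
gru2 = nn.GRU(50, 10, 1, batch_first=True)
fc1 = nn.Linear(10, 2)

out, h1 = gru1(x)     # out: (4, 7, 50) because batch_first=True
out, h2 = gru2(out)   # out: (4, 7, 10)
out = fc1(out)        # nn.Linear acts on the last dimension: (4, 7, 2)
print(out.shape)      # torch.Size([4, 7, 2])

# The same stack can also be expressed as one module when the hidden sizes match,
# e.g. nn.GRU(3, 50, num_layers=2, batch_first=True).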
I'm having trouble understanding how batches come into play in the PyTorch framework.
In this model:
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # 28x28x1 => 26x26x32
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.d1 = nn.Linear(26 * 26 * 32, 128)
        self.d2 = nn.Linear(128, 10)

    def forward(self, x):
        # 32x1x28x28 => 32x32x26x26
        x = self.conv1(x)
        x = F.relu(x)
        # flatten => 32 x (32*26*26)
        x = x.flatten(start_dim=1)
        # x = x.view(32, -1)
        # 32 x (32*26*26) => 32x128
        x = self.d1(x)
        x = F.relu(x)
        # logits => 32x10
        logits = self.d2(x)
        out = F.softmax(logits, dim=1)
        return out
In the forward definition, we pass in some x, i.e. the images of a batch aggregated by a DataLoader. Here, the 32x1x28x28 dimension indicates that there are 32 images in a batch. Do we just ignore this fact, with PyTorch handling the application of Conv2d to each sample? The forward propagation seems to be written as if for a single image.
Indeed, the network is agnostic to batches: The model is designed to classify a single image.
So what do we need batches for?
Each model has weights (also known as parameters), and one needs to optimize the weights using the training images so that the model classifies images as correctly as possible.
This optimization is usually carried out using Stochastic Gradient Descent (SGD): we use the current values of the weights to classify a batch of images, and from the predictions the current model made, together with the expected predictions (the "labels"), we compute a gradient of the weights and improve the model.
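To make the role of the batch concrete, here is a minimal sketch of that loop using the MyModel class above; train_loader is a hypothetical DataLoader yielding (images, labels) batches, and note that nn.CrossEntropyLoss expects raw logits, so in practice one would return the logits rather than the softmax output:

import torch
import torch.nn as nn

model = MyModel()
criterion = nn.CrossEntropyLoss()       # works on raw logits
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for images, labels in train_loader:     # hypothetical DataLoader, e.g. 32 images per batch
    optimizer.zero_grad()
    outputs = model(images)             # shape (32, 10): one prediction row per image in the batch
    loss = criterion(outputs, labels)   # averaged over the whole batch
    loss.backward()                     # gradients are computed from the batch as a whole
    optimizer.step()                    # one weight update per batch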
I'm trying to create image embeddings for the purpose of deep ranking using a triplet loss function. The idea is that we can take a pretrained CNN (e.g. resnet50 or vgg16), remove the FC layers and add an L2 normalization function to retrieve unit vectors, which can then be compared via a distance metric (e.g. cosine similarity). As far as I understand, the vectors that come out of a pretrained CNN are not optimal, but they are a good start. By adding the triplet loss function we can re-train the network to keep similar pictures 'close' to each other and different pictures 'far' apart in the feature space. Inspired by this notebook, I tried to set up the following code, but I get an error: ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
# Anchor, Positive and Negative are numpy arrays of size (200, 256, 256, 3), same for the test images
pic_size = 256

def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size),
                          input_tensor=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x

anchor_input = Input((3, pic_size, pic_size), name='anchor_input')
positive_input = Input((3, pic_size, pic_size), name='positive_input')
negative_input = Input((3, pic_size, pic_size), name='negative_input')

encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)

merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input, positive_input, negative_input], outputs=merged_vector)
# ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor, Positive, Negative],
          y=Y_dummy,
          validation_data=([Anchor_test, Positive_test, Negative_test], Y_dummy2), batch_size=512, epochs=500)
I am new to Keras and I am not quite sure how to solve this. The author in the link above builds his own CNN from scratch, but I would like to build mine on top of ResNet (or VGG16). How can I configure ResNet50 to use a triplet loss function? (The link above also contains the source code for the triplet loss function.)
In your ResNet50 definition, you've written
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size), input_tensor=inp)
Remove the input_tensor argument and pass only input_shape (input_shape=inp).
Since you're using the TF backend, as you mentioned, the input shape should be (256, 256, 3), i.e. (pic_size, pic_size, 3) rather than channels-first.
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x

img_shape = (256, 256, 3)

anchor_input = Input(img_shape, name='anchor_input')
positive_input = Input(img_shape, name='positive_input')
negative_input = Input(img_shape, name='negative_input')

encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)

merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input, positive_input, negative_input], outputs=merged_vector)
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor, Positive, Negative],
          y=Y_dummy,
          validation_data=([Anchor_test, Positive_test, Negative_test], Y_dummy2), batch_size=512, epochs=500)
The model plot is as follows: (model plot image)
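For completeness, the triplet_loss referenced in the compile call is not shown here (the linked notebook contains the original implementation). A rough sketch of such a loss on the concatenated, L2-normalized embeddings, with a hypothetical margin of 0.2, could look like this:

from keras import backend as K

def triplet_loss(y_true, y_pred, margin=0.2):
    # y_pred is [anchor | positive | negative], concatenated along the last axis
    emb_size = K.int_shape(y_pred)[-1] // 3
    anchor = y_pred[:, :emb_size]
    positive = y_pred[:, emb_size:2 * emb_size]
    negative = y_pred[:, 2 * emb_size:]
    # squared Euclidean distances between the unit-length embeddings
    pos_dist = K.sum(K.square(anchor - positive), axis=-1)
    neg_dist = K.sum(K.square(anchor - negative), axis=-1)
    # hinge: positives should be closer to the anchor than negatives by at least `margin`
    return K.mean(K.maximum(pos_dist - neg_dist + margin, 0.0))

Because a loss of this form ignores y_true, the y passed to fit (Y_dummy above) only needs to have the right number of rows.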
I am writing this post after reading similar questions and answers that didn't work in my case. You may notice that I defined the input shape in the first layer.
I created a very small CNN in Keras, as follows:
import tensorflow as tf

class MyNet(tf.keras.Model):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(32, 5, strides=(2, 2), data_format='channels_first', input_shape=(3, 224, 224))
        self.bn1 = tf.keras.layers.BatchNormalization(axis=1)
        self.fc1 = tf.keras.layers.Dense(10)
        self.globalavg = tf.keras.layers.GlobalAveragePooling2D(data_format='channels_first')

    def call(self, inputs):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = tf.keras.activations.relu(x)
        x = self.globalavg(x)
        return self.fc1(x)
Then I fed something into it and printed the result successfully (the weights are probably random at the moment, but that's ok):
image = tf.ones(shape = (1, 3, 224, 224)) # Defined "channels first" when created the layers
mynet = MyNet()
outputs = mynet(image)
print(tf.keras.backend.eval(outputs))
The result I saw at this step was the 10 outputs of the fc1 layer:
[[-1.1747773 -0.21640654 -0.16266493 -0.44879064 -0.642066 0.78132695 -0.03920581 -0.30874395 -0.04169023 -0.10409291]]
Then I tried to save the model with its weights, by calling mynet.save('mynet.hdf5'), and got the following error:
NotImplementedError: Currently `save` requires model to be a graph network. Consider using `save_weights`, in order to save the weights of the model.
Note that I am new to Keras and that most of my experience is with PyTorch.
What am I doing wrong?
Update:
Following #ikibir's answer, I redefined the network as a sequential network:
myNetAsSeq = tf.keras.models.Sequential()
myNetAsSeq.add(tf.keras.layers.Conv2D(32, 5, strides = (2,2), data_format = 'channels_first', input_shape = (3,224,224)))
myNetAsSeq.add(tf.keras.layers.BatchNormalization(axis = 1))
myNetAsSeq.add(tf.keras.layers.Activation('relu'))
myNetAsSeq.add(tf.keras.layers.GlobalAveragePooling2D(data_format = 'channels_first'))
myNetAsSeq.add(tf.keras.layers.Dense(10))
This time calling myNetAsSeq.save('mynet.hdf5') succeeded.
I am not sure about my answer, but I believe you don't actually create a model here; you just create each layer individually, and when the call function runs you simply pass the inputs through those layers.
In Keras you should use
model = models.Sequential()
to create the model and
model.add(...)
to add layers; then you can save this model.
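As an aside, the suggestion in the error message also works for the original subclassed model: save_weights stores the parameters, which can then be loaded back into a freshly constructed MyNet. A minimal sketch (the file name is arbitrary):

import tensorflow as tf

mynet = MyNet()
mynet(tf.ones(shape=(1, 3, 224, 224)))      # run a forward pass so the variables are built
mynet.save_weights('mynet_weights.h5')      # saves weights only, not the architecture

restored = MyNet()
restored(tf.ones(shape=(1, 3, 224, 224)))   # build the same variables before loading
restored.load_weights('mynet_weights.h5')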
I have tried various ways to convert my Keras model to Core ML using coremltools, but it gives me this error:
Keras layer '' not supported.
I am trying to convert a .h5 model to Core ML so that I can use it in my app, but it gives me errors that I am not able to solve. I have also tried converting the .h5 model to a .pb (frozen graph), but I got errors there as well.
This is how my model looks:
img_input = layers.Input(shape=(224, 224, 3))
seed = 230
numpy.random.seed(seed)
x = layers.Conv2D(16, 3, activation='relu')(img_input)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.4)(x)
output = layers.Dense(3, activation='softmax')(x)
model = Model(img_input, output)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
This is the code I found on the web to convert a Keras model with coremltools:
import keras
import coremltools
fcn_mlmodel = coremltools.converters.keras.convert( model, input_names = 'image', image_input_names = 'image', output_names = 'class_label' )
fcn_mlmodel.input_description['image']="Image size (224,224,3)"
fcn_mlmodel.output_description['class_label']=" Class label"
fcn_mlmodel.save("Test_my.mlmodel")
I can't recreate your problem; I copied everything. Maybe it's a problem with your versions:
pip install -U coremltools==3.0b6 tensorflow==1.13.1 keras==2.2.4
These versions work well together.
Hello, I have a question about Keras.
Currently I want to implement a network that uses the same CNN model for two images: both images are fed into the shared CNN model, and the two resulting feature vectors are then provided to a Dense model.
For example:
def cnn_model():
    input = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(input)
    x = GlobalAvgPool2D()(x)
    model = Model(input, x)
    return model

def fc_model(cnn1, cnn2):
    input_1 = cnn1.output
    input_2 = cnn2.output
    input = concatenate([input_1, input_2])
    x = Dense(1, input_shape=(None, 16))(input)
    x = Activation('sigmoid')(x)
    model = Model([cnn1.input, cnn2.input], x)
    return model

def main():
    cnn1 = cnn_model()
    cnn2 = cnn_model()
    model = fc_model(cnn1, cnn2)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, epochs=1)
I want to implement and train a model something like this, but I get an error message like the one below:
'All layer names should be unique'
What I actually want is to use only one CNN model as a feature extractor and then use the two extracted features to predict a single float value in the range 0.0 ~ 1.0.
So the whole system is: extract features from two images with the same CNN model, and provide those features to a Dense model to get one floating-point value.
Please help me implement this system and explain how to train it.
Thank you.
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
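Applied to the question above, the same idea is to build cnn_model() once and call the resulting model on both image inputs, so the convolutional layers (and their names) exist only once. A rough sketch along those lines, reusing the shapes from the original code (the name shared_cnn is just illustrative):

from keras.layers import Input, Conv2D, GlobalAvgPool2D, Dense, Activation, concatenate
from keras.models import Model

def cnn_model():
    inp = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(inp)
    x = GlobalAvgPool2D()(x)
    return Model(inp, x)

shared_cnn = cnn_model()          # built once, so its layers exist only once

image_a = Input(shape=(None, None, 3))
image_b = Input(shape=(None, None, 3))

# Calling the same model instance on both inputs reuses the same weights
feat_a = shared_cnn(image_a)
feat_b = shared_cnn(image_b)

merged = concatenate([feat_a, feat_b])
out = Activation('sigmoid')(Dense(1)(merged))

model = Model([image_a, image_b], out)
model.compile(optimizer='adam', loss='mean_squared_error')

The key difference from the original code is that cnn_model() is called only once; calling it twice (cnn1 and cnn2) creates two independent copies of every layer.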