Keras fit_generator reading chunks from HDFStore

I am trying to build a generator for a Keras model that will be trained on a large HDF store.
To speed up training, I already pre-calculated all features, including one-hot encoding, in the HDF store, so reading from it should be straightforward.
To feed chunks of my data into the network, I am trying to use fit_generator, but I am struggling to get it up and running.
The generator:
def myGenerator(myStore, generateFrom, generateTo):
    # Create empty arrays to contain batch of features and labels
    while True:
        X = pd.read_hdf(myStore, 'X', start=generateFrom, stop=generateTo)
        y = pd.read_hdf(myStore, 'y', start=generateFrom, stop=generateTo)
        yield X, y
Network and fitting:
def get_model(shape):
    '''Create a keras model.'''
    inputlayer = Input(shape=shape)
    model = BatchNormalization()(inputlayer)
    model = Dense(1024, activation='relu')(model)
    model = Dropout(0.25)(model)
    model = BatchNormalization()(model)
    model = Dense(512, activation='relu')(model)
    model = Dropout(0.25)(model)
    model = BatchNormalization()(model)
    model = Dense(256, activation='relu')(model)
    model = Dropout(0.25)(model)
    model = BatchNormalization()(model)
    model = Dense(128, activation='relu')(model)
    model = Dropout(0.25)(model)
    # 11 because background noise has been taken out
    model = Dense(2, activation='tanh')(model)
    model = Model(inputs=inputlayer, outputs=model)
    return model
shape = (6603, 10000)
model = get_model(shape)
model.compile(loss='mean_squared_error', optimizer=Adam(), metrics=['accuracy'])
#X = generator(myStore)
#Xt = generator(myStore)
labelbinarizer = LabelBinarizer()
y = labelbinarizer.fit_transform(y)
#yt = labelbinarizer.fit_transform(yt)
generateFrom = 0
for i in range(10):
    generateTo = generateFrom + 10000
    model.fit_generator(
        generator=myGenerator(myStore, generateFrom, generateTo),
        epochs=1,
        steps_per_epoch=X[0].shape[0] // 1000)
    generateFrom = generateTo
I have tried both approaches: calling fit_generator within a loop and plugging in the range (as shown above), and handling the range inside the generator. Neither works. Currently I am running into
TypeError: 'generator' object is not subscriptable
Likely I have some misunderstanding of how fit_generator() is supposed to be used in this context. Most examples out there revolve around generating tensors from pictures.
Any hint is appreciated.
Thanks

The function read_hdf returns a pandas object; you need to convert it to a NumPy array.
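A minimal sketch of that conversion, keeping the generator from the question (.values returns the underlying NumPy array of a DataFrame or Series):
def myGenerator(myStore, generateFrom, generateTo):
    while True:
        X = pd.read_hdf(myStore, 'X', start=generateFrom, stop=generateTo)
        y = pd.read_hdf(myStore, 'y', start=generateFrom, stop=generateTo)
        # convert the pandas objects to plain NumPy arrays before yielding
        yield X.values, y.values
Note also that the TypeError in the question most likely comes from steps_per_epoch=X[0].shape[0]: X there is the generator object itself (see the commented-out X = generator(myStore) above), and generators are not subscriptable. steps_per_epoch needs a plain integer.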

Related

How to apply a triplet loss function in ResNet50 for the purpose of deep ranking

I am trying to create image embeddings for the purpose of deep ranking using a triplet loss function. The idea is that we can take a pretrained CNN (e.g. ResNet50 or VGG16), remove the FC layers and add an L2 normalization function to retrieve unit vectors, which can then be compared via a distance metric (e.g. cosine similarity). As far as I understand, the predicted vectors that come out of a pretrained CNN are not optimal, but they are a good start. By adding the triplet loss function we can re-train the network to keep similar pictures 'close' to each other and different pictures 'far' apart in the feature space. Inspired by this notebook, I tried to set up the following code, but I get the error ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
# Anchor, Positive and Negative are numpy arrays of size (200, 256, 256, 3), same for the test images
pic_size = 256
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size),
                          input_tensor=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x
anchor_input = Input((3, pic_size, pic_size), name='anchor_input')
positive_input = Input((3, pic_size, pic_size), name='positive_input')
negative_input = Input((3, pic_size, pic_size), name='negative_input')
encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)
merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input, positive_input, negative_input], outputs=merged_vector)
#ValueError: The name "conv1_pad" is used 3 times in the model. All layer names should be unique.
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor, Positive, Negative],
          y=Y_dummy,
          validation_data=([Anchor_test, Positive_test, Negative_test], Y_dummy2), batch_size=512, epochs=500)
I am new to Keras and I am not quite sure how to solve this. The author in the link above creates his own CNN from scratch, but I would like to build it on top of ResNet (or VGG16). How can I configure ResNet50 to use a triplet loss function? (In the link above you will also find the source code for the triplet loss function.)
In your ResNet50 definition, you've written
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(3, pic_size, pic_size), input_tensor=inp)
Remove the input_tensor argument and pass the shape through input_shape instead. Since you're using the TF backend, as you mentioned, the input should be channels-last: (256, 256, 3), i.e. (pic_size, pic_size, 3) rather than (3, pic_size, pic_size).
def shared_dnn(inp):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=inp)
    x = base_model.output
    x = Flatten()(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1))(x)
    for layer in base_model.layers[15:]:
        layer.trainable = False
    return x
img_shape = (256, 256, 3)
anchor_input = Input(img_shape, name='anchor_input')
positive_input = Input(img_shape, name='positive_input')
negative_input = Input(img_shape, name='negative_input')
encoded_anchor = shared_dnn(anchor_input)
encoded_positive = shared_dnn(positive_input)
encoded_negative = shared_dnn(negative_input)
merged_vector = concatenate([encoded_anchor, encoded_positive, encoded_negative], axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input, positive_input, negative_input], outputs=merged_vector)
model.compile(loss=triplet_loss, optimizer=adam_optim)
model.fit([Anchor, Positive, Negative],
          y=Y_dummy,
          validation_data=([Anchor_test, Positive_test, Negative_test], Y_dummy2), batch_size=512, epochs=500)
The model plot is as follows: (model plot image omitted)
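A related pattern worth knowing (a sketch, not the answer author's code): the duplicate-name error can also be avoided by instantiating the base network once and calling that single instance on each input, so the three branches genuinely share weights and layer names stay unique.
from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Flatten, Lambda, concatenate
from keras.models import Model
import keras.backend as K

pic_size = 256
# a single shared instance instead of one ResNet50 per branch
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(pic_size, pic_size, 3))
def embed(inp):
    x = base_model(inp)  # calling the same model reuses its layers and weights
    x = Flatten()(x)
    return Lambda(lambda t: K.l2_normalize(t, axis=1))(x)
anchor_input = Input((pic_size, pic_size, 3), name='anchor_input')
positive_input = Input((pic_size, pic_size, 3), name='positive_input')
negative_input = Input((pic_size, pic_size, 3), name='negative_input')
merged_vector = concatenate([embed(anchor_input), embed(positive_input), embed(negative_input)],
                            axis=-1, name='merged_layer')
model = Model(inputs=[anchor_input, positive_input, negative_input], outputs=merged_vector)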

Keras, Tensorflow: Merge two different model outputs into one

I am working on a deep learning model where I am trying to combine the outputs of two different models.
The overall structure is like this:
So the first model takes one matrix, for example [10 x 30]:
#input 1
input_text = layers.Input(shape=(1,), dtype="string")
embedding = ElmoEmbeddingLayer()(input_text)
model_a = Model(inputs = [input_text] , outputs=embedding)
# shape : [10,50]
Now the second model takes two input matrices:
X_in = layers.Input(tensor=K.variable(np.random.uniform(0, 9, [10, 32])))
M_in = layers.Input(tensor=K.variable(np.random.uniform(1, -1, [10, 10])))
md_1 = New_model()([X_in, M_in])  # New_model defined somewhere
model_s = Model(inputs=[X_in, M_in], outputs=md_1)
# shape : [10,50]
I want to make these two matrices trainable. In TensorFlow I was able to do this with:
matrix_a = tf.get_variable(name='matrix_a',
                           shape=[10, 10],
                           dtype=tf.float32,
                           initializer=tf.constant_initializer(np.array(matrix_a)),
                           trainable=True)
I have no clue how to make matrix_a and matrix_b trainable in Keras, or how to merge the outputs of both networks and then feed in the input.
I went through this question, but couldn't find an answer because their problem statement is different from mine.
What I have tried so far is :
#input 1
input_text = layers.Input(shape=(1,), dtype="string")
embedding = ElmoEmbeddingLayer()(input_text)
model_a = Model(inputs=[input_text], outputs=embedding)
# shape : [10,50]
X_in = layers.Input(tensor=K.variable(np.random.uniform(0, 9, [10, 10])))
M_in = layers.Input(tensor=K.variable(np.random.uniform(1, -1, [10, 100])))
md_1 = New_model()([X_in, M_in])  # New_model defined somewhere
model_s = Model(inputs=[X_in, M_in], outputs=md_1)
# [10,50]
# transpose second model output
transpose = Lambda(lambda x: K.transpose(x))
agglayer = transpose(md_1)
# dot the first and second model outputs
dott = Lambda(lambda x: K.dot(x[0], x[1]))
kmean_layer = dott([embedding, agglayer])
# input
final_model = Model(inputs=[input_text, X_in, M_in], outputs=kmean_layer, name='Final_output')
final_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
final_model.summary()
Overview of the model:
Update:
Model b
X = np.random.uniform(0,9,[10,32])
M = np.random.uniform(1,-1,[10,10])
X_in = layers.Input(tensor=K.variable(X))
M_in = layers.Input(tensor=K.variable(M))
layer_one = Model_b()([M_in, X_in])
dropout2 = Dropout(dropout_rate)(layer_one)
layer_two = Model_b()([layer_one, X_in])
model_b_ = Model([X_in, M_in], layer_two, name='model_b')
Model a
length = 150
dic_size = 100
embed_size = 12
input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)
embedding = LSTM(5)(embedding)
embedding = Dense(10)(embedding)
model_a = Model(input_text, embedding, name = 'model_a')
I am merging like this:
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, model_b_.output])
final_model = Model(inputs=[model_b_.input[0],model_b_.input[1],model_a.input], outputs=mult)
Is this the right way to matmul two Keras models?
I don't know whether I am merging the outputs correctly and whether the model is correct.
I would greatly appreciate it if anyone could kindly give me some advice on how I should make those matrices trainable and how to merge the models' outputs correctly and then feed in the input.
Thanks in advance!
Trainable weights
OK. Since you are going to have custom trainable weights, the way to do this in Keras is to create a custom layer.
Now, since your custom layer has no inputs, we will need a hack that will be explained later.
So, this is the layer definition for the custom weights:
from keras.layers import *
from keras.models import Model
from keras.initializers import get as get_init, serialize as serial_init
import keras.backend as K
import tensorflow as tf
class TrainableWeights(Layer):
    #you can pass keras initializers when creating this layer
    #kwargs will take base layer arguments, such as name and others if you want
    def __init__(self, shape, initializer='uniform', **kwargs):
        super(TrainableWeights, self).__init__(**kwargs)
        self.shape = shape
        self.initializer = get_init(initializer)
    #build is where you define the weights of the layer
    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel',
                                      shape=self.shape,
                                      initializer=self.initializer,
                                      trainable=True)
        self.built = True
    #call is the layer operation - due to keras limitation, we need an input
    #warning, I'm supposing the input is a tensor with value 1 and no shape or shape (1,)
    def call(self, x):
        return x * self.kernel
    #for keras to build the summary properly
    def compute_output_shape(self, input_shape):
        return self.shape
    #only needed for saving/loading this layer in model.save()
    def get_config(self):
        config = {'shape': self.shape, 'initializer': serial_init(self.initializer)}
        base_config = super(TrainableWeights, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
Now, this layer should be used like this:
dummyInputs = Input(tensor=K.constant([1]))
trainableWeights = TrainableWeights(shape)(dummyInputs)
Model A
Having the layer defined, we can start modeling.
First, let's see the model_a side:
#general vars
length = 150
dic_size = 100
embed_size = 12
#for the model_a segment
input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)
#the following two lines are just a resource to reach the desired shape
embedding = LSTM(5)(embedding)
embedding = Dense(50)(embedding)
#creating model_a here is optional, only if you want to use model_a independently later
model_a = Model(input_text, embedding, name = 'model_a')
Model B
For this, we are going to use our TrainableWeights layer.
But first, let's simulate a New_model() as mentioned.
#simulates New_model() #notice the explicit batch_shape for the matrices
newIn1 = Input(batch_shape = (10,10))
newIn2 = Input(batch_shape = (10,30))
newOut1 = Dense(50)(newIn1)
newOut2 = Dense(50)(newIn2)
newOut = Add()([newOut1, newOut2])
new_model = Model([newIn1, newIn2], newOut, name='new_model')
Now the entire branch:
#the matrices
dummyInput = Input(tensor = K.constant([1]))
X_in = TrainableWeights((10,10), initializer='uniform')(dummyInput)
M_in = TrainableWeights((10,30), initializer='uniform')(dummyInput)
#the output of the branch
md_1 = new_model([X_in, M_in])
#optional, only if you want to use model_s independently later
model_s = Model(dummyInput, md_1, name='model_s')
The whole model
Finally, we can join the branches in a whole model.
Notice how I didn't have to use model_a or model_s here. You can if you want, but those submodels are not needed, unless you later want to use them individually for other purposes. (Even if you created them, you don't need to change the code below to use them; they're already part of the same graph.)
#I prefer tf.matmul because it's clear and understandable while K.dot has weird behaviors
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, md_1])
#final model
model = Model([input_text, dummyInput], mult, name='full_model')
Now train it:
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(np.random.randint(0, dic_size, size=(128, length)),
          np.ones((128, 10)))
Since the output is 2D now, there is no problem with 'categorical_crossentropy'; my earlier comment was because of doubts about the output shape.

Avoid feed_dict mechanism in static graph in tensorflow

I am trying to implement a model for generating/reconstructing samples (Variational autoencoder). During test time, I would like to be able to make the model generate new samples by feeding it a latent variable, but that requires changing the inputs to a part of the computational graph.
I could use a feed_dict to "dynamically" do that, since I cannot directly change a static graph, but I want to avoid the overhead of exchanging data between the GPU and the system RAM.
As it stands, I feed the data using iterators.
def make_mnist_dataset(batch_size, shuffle=True, include_labels=True):
    """Loads the MNIST data set and returns the relevant
    iterator along with its initialization operations.
    """
    # load the data
    train, test = tf.keras.datasets.mnist.load_data()
    # binarize and reshape the data sets
    temp_train = train[0]
    temp_train = (temp_train > 0.5).astype(np.float32).reshape(temp_train.shape[0], 784)
    train = (temp_train, train[1])
    temp_test = test[0]
    temp_test = (temp_test > 0.5).astype(np.float32).reshape(temp_test.shape[0], 784)
    test = (temp_test, test[1])
    # prepare Dataset objects
    if include_labels:
        train_set = tf.data.Dataset.from_tensor_slices(train).repeat().batch(batch_size)
        test_set = tf.data.Dataset.from_tensor_slices(test).repeat(1).batch(batch_size)
    else:
        train_set = tf.data.Dataset.from_tensor_slices(train[0]).repeat().batch(batch_size)
        test_set = tf.data.Dataset.from_tensor_slices(test[0]).repeat(1).batch(batch_size)
    if shuffle:
        train_set = train_set.shuffle(buffer_size=int(0.5 * train[0].shape[0]),
                                      seed=123)
    # make the iterator
    iter = tf.data.Iterator.from_structure(train_set.output_types,
                                           train_set.output_shapes)
    data = iter.get_next()
    # create initialization ops
    train_init = iter.make_initializer(train_set)
    test_init = iter.make_initializer(test_set)
    return train_init, test_init, data
And here's the code snippet where the data being iterated over is being fed to the graph:
train_init, test_init, next_batch = make_mnist_dataset(batch_size, include_labels=True)
ops = build_graph(next_batch[0], next_batch[1], learning_rate, is_training,
                  latent_dim, tau, batch_size, inf_layers, gen_layers)
Is there any way to "switch" from an Iterator object to a different input source during test time, without resorting to feed_dict?
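One possible direction, sketched here under assumptions (this is not from the question; encoder_z stands for the encoder's latent tensor in your existing graph): keep the prior sampling inside the graph and switch branches with tf.cond, so only a scalar boolean crosses the feed_dict boundary instead of the latent data itself.
# hypothetical: encoder_z is the latent tensor already produced by the encoder
use_prior = tf.placeholder_with_default(False, shape=[])
# sample the prior on-device, matching the shape of the encoder output
prior_z = tf.random_normal(tf.shape(encoder_z))
# feed {use_prior: True} at test time to decode from the prior instead
z = tf.cond(use_prior, lambda: prior_z, lambda: encoder_z)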

AttributeError: 'Sequential' object has no attribute '_built'

I am doing transfer learning using the VGG16 model, fine-tuning it by feature extraction. I have trained my final model and pickled its weights.
Here is my code
def prediction(array):
    size = 224
    array = image_array(filename, size)
    print(array.shape)
    array = np.array(array, dtype=np.float64)
    array = np.reshape(array, (1, 224, 224, 3))
    print(array.shape)
    final_array = preprocess_input(array)
    vgg16 = VGG16(weights='imagenet', include_top=False)
    features = vgg16.predict(final_array)
    image = features.reshape(features.shape[0], -1)
    #return image
    loaded_model = pickle.load(open('vgg16.sav', 'rb'))
    #print(image.shape)
    array = np.asarray(array)
    y_predict = loaded_model.predict(array)
When I call this function, I get an error at the line
y_predict = loaded_model.predict(array)
The error is:
AttributeError: 'Sequential' object has no attribute '_built'
You shouldn't use pickle.dump to save the weights and then load the result as a model. Instead, use the provided functions model.save(filename) or model.save_weights(filename) to save the whole model or just the weights, respectively. In your case you can do:
vgg16.save('vgg16.h5')
# ...
loaded_model = keras.models.load_model('vgg16.h5')
You'll need the h5py package to use these functions.
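For completeness, a sketch of the weights-only variant mentioned above (it assumes you rebuild the same architecture in code before loading):
vgg16.save_weights('vgg16_weights.h5')
# ... later: recreate the architecture, then load the weights into it
vgg16 = VGG16(weights=None, include_top=False)
vgg16.load_weights('vgg16_weights.h5')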

Keras gets None gradient error when connecting models

I’m trying to implement a Visual Storytelling model using Keras with a hierarchical RNN model, basically Neural Image Captioner style but over a sequence of photos with a bidirectional RNN on top of the decoder RNNs.
I implemented and tested the three parts of this model (CNN, BRNN and decoder RNN) separately, but got this error when trying to connect them:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My code is as follows:
#vgg16 model with the fc2 layer as output
cnn_base_model = self.cnn_model.base_model
brnn_model = self.brnn_model.model
rnn_model = self.rnn_model.model
cnn_part = TimeDistributed(cnn_base_model)
img_input = Input((self.story_length,) + self.cnn_model.input_shape, name='brnn_img_input')
extracted_feature = cnn_part(img_input)
#[None, 5, 512], a 512 length vector for each picture in the story
brnn_feature = brnn_model(extracted_feature)
#[None, 5, 25], input groundtruth word indices fed as input when training
decoder_input = Input((self.story_length, self.max_length), name='brnn_decoder_input')
decoder_outputs = []
for i in range(self.story_length):
    #separate timesteps for decoding
    decoder_input_i = Lambda(lambda x: x[:, i, :])(decoder_input)
    brnn_feature_i = Lambda(lambda x: x[:, i, :])(brnn_feature)
    #the problem persists when using Dense instead of the Lambda layers above
    #decoder_input_i = Dense(25)(Reshape((125,))(decoder_input))
    #brnn_feature_i = Dense(512)(Reshape((5 * 512,))(brnn_feature))
    decoder_output_i = rnn_model([decoder_input_i, brnn_feature_i])
    decoder_outputs.append(decoder_output_i)
decoder_output = Concatenate(axis=-2, name='brnn_decoder_output')(decoder_outputs)
self.model = Model([img_input, decoder_input], decoder_output)
And the code for the BRNN:
image_feature = Input(shape=(self.story_length, self.img_feature_dim,))
image_emb = TimeDistributed(Dense(self.lstm_size))(image_feature)
brnn = Bidirectional(LSTM(self.lstm_size, return_sequences=True), merge_mode='concat')(image_emb)
brnn_emb = TimeDistributed(Dense(self.lstm_size))(brnn)
self.model = Model(inputs=image_feature, outputs=brnn_emb)
And the RNN:
#[None, 512], the vector to be decoded
initial_input = Input(shape=(self.input_dim,), name='rnn_initial_input')
#[None, 25], the groundtruth word indices fed as input when training
decoder_inputs = Input(shape=(None,), name='rnn_decoder_inputs')
decoder_input_masking = Masking(mask_value=0.0)(decoder_inputs)
decoder_input_embeddings = Embedding(self.vocabulary_size, self.emb_size,
                                     embeddings_regularizer=l2(regularizer))(decoder_input_masking)
decoder_input_dropout = Dropout(.5)(decoder_input_embeddings)
initial_emb = Dense(self.emb_size,
                    kernel_regularizer=l2(regularizer))(initial_input)
initial_reshape = Reshape((1, self.emb_size))(initial_emb)
initial_masking = Masking(mask_value=0.0)(initial_reshape)
initial_dropout = Dropout(.5)(initial_masking)
decoder_lstm = LSTM(self.hidden_dim, return_sequences=True, return_state=True,
                    recurrent_regularizer=l2(regularizer),
                    kernel_regularizer=l2(regularizer),
                    bias_regularizer=l2(regularizer))
_, initial_hidden_h, initial_hidden_c = decoder_lstm(initial_dropout)
decoder_outputs, decoder_state_h, decoder_state_c = decoder_lstm(decoder_input_dropout,
                                                                 initial_state=[initial_hidden_h, initial_hidden_c])
decoder_output_dense_layer = TimeDistributed(Dense(self.vocabulary_size, activation='softmax',
                                                   kernel_regularizer=l2(regularizer)))
decoder_output_dense = decoder_output_dense_layer(decoder_outputs)
self.model = Model([decoder_inputs, initial_input], decoder_output_dense)
I’m using adam as optimizer and sparse_categorical_crossentropy as loss.
At first I thought the problem was with the Lambda layers used for splitting the timesteps, but the problem persists when I replace them with Dense layers (which are guaranteed to be differentiable).
I had a similar error, and it turned out I was supposed to build the layers (in my custom layer or model) in __init__(), like so:
self.lstm_custom_1 = keras.layers.LSTM(128, batch_input_shape=batch_input_shape, return_sequences=False, stateful=True)
self.lstm_custom_1.build(batch_input_shape)
self.dense_custom_1 = keras.layers.Dense(32, activation='relu')
self.dense_custom_1.build(input_shape=(batch_size, 128))
The issue is actually with the Embedding layer, I think: gradients can't pass back through an Embedding layer to its integer inputs, so unless it's the first layer in the model it won't work.
