I have trained a model and I want to add more units to its hidden layer and train it for some more epochs. I am implementing a constructive learning algorithm. How can I add a neuron to an existing hidden layer of a model? Also, is there a way to train only the parameters of the added units while the other parameters stay frozen? (In Keras.)
def create_first_sub_NN(X):
    sub_input = tf.keras.Input(shape=(X.shape[1],))
    h = Dense(1, activation="sigmoid", name="hidden")(sub_input)
    h = tf.keras.Model(inputs=sub_input, outputs=h)
    m_combined = tf.keras.layers.concatenate([h.input, h.output])
    out = Dense(1, activation="relu")(m_combined)
    out = tf.keras.Model(inputs=sub_input, outputs=out)
    return out

def train_current_model(model, input_groups, Y, error_thr):
    opt = keras.optimizers.Adam(learning_rate=0.01)
    callbacks = stopAtLossValue()
    # overfitCallback = EarlyStopping(monitor='loss', min_delta=5, patience=10)
    # (if the error did not decrease by more than 5 for 10 epochs, stop training the current network)
    model.compile(optimizer=opt, loss='mean_absolute_error')
    model.fit(input_groups, Y, epochs=100, batch_size=32, callbacks=[callbacks])
model = create_first_sub_NN(X1_train)
keras.utils.plot_model(model, "first.png",show_shapes=True)
print(model.summary())
list_of_inputs = [sub_X_list[0]]
train_current_model(model, list_of_inputs, train_label, 0.1)
# how to add number of units in my hidden layer for the
I want to add neurons to my hidden layer repeatedly until my network error falls below the threshold.
I solved the problem. Instead of adding a neuron to the current layer, we can add another Dense layer that is connected to the same previous and next layers, and then concatenate the new layer with the old one.
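To see why concatenating a new layer amounts to adding a unit, here is a minimal NumPy sketch (not the asker's actual Keras code; the array names and sizes are made up for illustration). It assumes both layers read the same input and use the same activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # 5 samples, 3 input features (illustrative)

# Existing (already trained) hidden layer with 1 unit.
W_old = rng.normal(size=(3, 1))
b_old = np.zeros(1)

# Newly added hidden layer with 1 unit, fed by the same input.
W_new = rng.normal(size=(3, 1))
b_new = np.zeros(1)

# Concatenating the outputs of the two layers ...
h_concat = np.concatenate(
    [sigmoid(X @ W_old + b_old), sigmoid(X @ W_new + b_new)], axis=1
)

# ... equals one wider layer whose weight matrix has an extra column.
W_wide = np.concatenate([W_old, W_new], axis=1)
b_wide = np.concatenate([b_old, b_new])
h_wide = sigmoid(X @ W_wide + b_wide)

same = np.allclose(h_concat, h_wide)
print(same)
```

In Keras, the old layer can be kept fixed by setting its `trainable` attribute to `False` before compiling the new model, so that only the added layer (and whatever consumes the concatenation) is updated.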
In the model I am constructing, I have the following layer:
y = layers.Dense(10, activation="softmax")(x)
And I want the next layer of this model to be an Embedding layer that "represent" the choice made by the Dense layer.
That is, I want to sample a choice from y (according to the probabilities given by the softmax values) and then turn this choice into an embedding, using an Embedding layer with vocabulary size 10.
Any idea how to do this ?
Regards
Initial answer
Add a layer that takes the argmax of the output of the dense layer before feeding it into the embedding layer to propagate the most likely category label:
import tensorflow as tf

# generate some data
BATCH_SIZE, INPUT_DIM = (4, 2)
x = tf.random.uniform([BATCH_SIZE, INPUT_DIM])

# model
NUM_CLASSES = 10
EMBEDDING_DIM = 10
dense = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
# take the index of the most likely class for each row
argmax = tf.keras.layers.Lambda(lambda t: tf.argmax(t, axis=-1))(dense)
emb = tf.keras.layers.Embedding(NUM_CLASSES, EMBEDDING_DIM)(argmax)
Updated answer
If you want to propagate a randomly sampled category label instead of the most likely category label, you can do so by using tf.random.categorical. Note that tf.random.categorical takes logits as inputs, so you don't need the softmax activation at the end of the dense layer.
NUM_CLASSES = 10
EMBEDDING_DIM = 10
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)
# squeeze only the sample axis, so that a batch of size 1 keeps its batch dimension
sample = tf.keras.layers.Lambda(lambda t: tf.squeeze(tf.random.categorical(t, 1), axis=-1))(logits)
emb = tf.keras.layers.Embedding(NUM_CLASSES,EMBEDDING_DIM)(sample)
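The sample-then-embed step can be sketched in plain NumPy to show what each stage computes. This is only an illustration under assumed sizes; `embedding_table` is a stand-in for the weights of an Embedding layer, not something taken from the code above.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, EMBEDDING_DIM, BATCH_SIZE = 10, 4, 6   # illustrative sizes

logits = rng.normal(size=(BATCH_SIZE, NUM_CLASSES))

# Softmax turns each row of logits into a probability distribution.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# Draw one class index per row according to that row's probabilities.
sample = np.array([rng.choice(NUM_CLASSES, p=p) for p in probs])

# An embedding layer is a lookup table: row i is the vector for class i.
embedding_table = rng.normal(size=(NUM_CLASSES, EMBEDDING_DIM))
emb = embedding_table[sample]

print(sample.shape, emb.shape)
```

Note that both argmax and sampling produce integer indices, so no gradient flows from the embedding back into the dense layer through this step.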
I would like to know how to stack many RNN layers where every layer is the same RNN, i.e. all layers share the same weights. I have read about stacked LSTMs and RNNs, but in the examples I found the layers were not the same.
1 layer code:
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
first_layer_output = first_layer(Emb_output)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(first_layer_output )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
[Figure: RNN, 1 layer]
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=True,stateful =True)
first_layer_output = first_layer(Emb_output)
first_layer_state = first_layer.states
second_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
second_layer_set_state = second_layer(first_layer_output, initial_state=first_layer_state)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(second_layer_set_state )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
[Figure: stacked RNN, 2 layers]
For example, I want to build a two-layer RNN in which the first and second layers have the same weights, so that when the weights of the first layer are updated, the second layer is updated too and keeps the same values. As far as I know, a TF RNN layer has a states attribute, which returns the state from the previous layer. However, when I use it, each layer still seems to be treated independently. The 2-layer RNN that I want should have the same number of trainable parameters as the 1-layer RNN, since the layers share the same weights, but this did not work.
You can view the layer object as a container for the weights that knows how to apply the weights. You can use the layer object as many times as you want. Assuming the embedding and the RNN dimension are the same, you can do:
states = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden, use_bias=True, return_sequences=True)
for _ in range(10):
    states = first_layer(states)
There is no reason to set stateful to True. That option is for when you split long sequences into multiple batches and want the RNN to remember its state between batches, so that you do not have to set the initial states manually. You can get the final state of the RNN (the one you want to use for classification) by simply indexing the last position of states, e.g. states[:, -1].
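The idea of reusing one set of weights at every level of the stack can be sketched without Keras. The `simple_rnn` helper below is a hypothetical tanh RNN written for illustration, and it assumes the input feature size equals the hidden size so that the output of one pass can be fed into the next.

```python
import numpy as np

def simple_rnn(x, W_x, W_h, b):
    """Run a tanh RNN over x of shape (batch, time, features); return all states."""
    batch, time, _ = x.shape
    h = np.zeros((batch, W_h.shape[0]))
    outputs = []
    for t in range(time):
        h = np.tanh(x[:, t] @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs, axis=1)      # (batch, time, hidden)

rng = np.random.default_rng(0)
n_hidden = 8
x = rng.normal(size=(2, 5, n_hidden))     # feature size equals n_hidden

# One set of parameters ...
W_x = rng.normal(size=(n_hidden, n_hidden)) * 0.1
W_h = rng.normal(size=(n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)

# ... reused for every level of the stack.
states = x
for _ in range(3):
    states = simple_rnn(states, W_x, W_h, b)

final_state = states[:, -1]               # last time step, (batch, n_hidden)
n_params = W_x.size + W_h.size + b.size   # independent of the stack depth
print(states.shape, final_state.shape, n_params)
```

The parameter count stays the same no matter how many times the loop runs, which is the behaviour the question asks for.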
The model architecture is Conv2D with 32 filters -> Flatten -> Dense -> Compile -> Fit
I deleted the last filter from the first layer and the corresponding Fully connected layer in this model using
w,b = model.layers[0].get_weights()
w = np.delete(w, [31], -1)  # filters are 0-indexed, so the last of 32 filters has index 31
b = np.delete(b, [31], 0)
w_2,b_2 = model.layers[2].get_weights()
w_2 = w_2[:20956,:]
I use 20956 because the output of the first layer is 26 x 26 x 31, i.e. the 2D image dimensions multiplied by the number of channels.
I create a new model called model_1 using:
# Input stays the same
model_1 = Sequential()
# New modified conv layer
model_1.add(Conv2D(31, kernel_size=(3, 3),
                   activation='relu',
                   input_shape=input_shape,
                   kernel_initializer='he_normal'))
model_1.add(Flatten())
model_1.add(Dense(10, activation='softmax'))
model_1.layers[0].set_weights([w, b])
model_1.layers[2].set_weights([w_2, b_2])
model_1.compile(loss="categorical_crossentropy",
                optimizer="Adam",
                metrics=['accuracy'])
I can confirm that the weights are the same by doing model_1.layers[0].get_weights()[0] == model.layers[0].get_weights()[0][:,:,:,:31] and model_1.layers[2].get_weights()[0] == model.layers[2].get_weights()[0][:20956,:], which returns True.
When I do
score = model_1.evaluate(x_test_reshape, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
score = model.evaluate(x_test_reshape, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
The accuracy drops from 98% to 10%, any ideas why?
What you are essentially doing is removing a channel from the last convolutional layer. Intuitively, this may not sound like a big deal, and you might expect the remaining 31 channels to still let the network perform well. In reality, all convolution channels interact with each other in the dense layer that follows, and since this interaction is now missing one of the channels it was optimized with, its accuracy will drop.
Another way to think of this is to view your network as a function of sequential steps that takes as input an image and as output a label with 98% accuracy. Removing a fraction (1/32) of calculations in this function will change the outcomes, and likely give worse results since the function is optimized with these calculations still present. You are removing a part of the function that is apparently crucial to reach the high accuracy.
You can test this by training your new model with 31 channels for a short time. Since the new model only needs to re-learn the function of the deleted channel, it should quickly reach the high performance again.
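One further thing worth checking, assuming the default channels_last data format: Flatten is a row-major reshape of the (26, 26, 32) feature map, so the dense-layer rows that belong to one channel are interleaved (every 32nd row) rather than stored as one contiguous block at the end. The NumPy sketch below uses a made-up stand-in for the Dense kernel to show that slicing the first 20956 rows is not the same as dropping the rows of the removed channel.

```python
import numpy as np

H, W, C = 26, 26, 32          # conv output shape from the question
removed = C - 1               # 0-based index of the last filter

# Label every position of the feature map with its channel index, then
# flatten it the way a row-major reshape does for an (H, W, C) tensor.
channel_of = np.broadcast_to(np.arange(C), (H, W, C)).reshape(-1)

# Rows of the Dense kernel that read from the removed channel.
rows_removed = np.flatnonzero(channel_of == removed)
print(rows_removed[:3])       # [31 63 95]

# Stand-in for the Dense kernel, shape (H*W*C, 10); values are just row markers.
w_2 = np.arange(H * W * C * 10, dtype=np.float64).reshape(H * W * C, 10)

w_2_sliced = w_2[:H * W * (C - 1), :]               # what the question does
w_2_pruned = np.delete(w_2, rows_removed, axis=0)   # drops the matching rows

same = np.array_equal(w_2_sliced, w_2_pruned)
print(w_2_sliced.shape, w_2_pruned.shape, same)
```

Both results have 20956 rows, but they contain different rows, so with the sliced version most dense weights end up paired with the wrong feature-map positions. That alone could plausibly explain a drop to chance-level accuracy.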
I am trying to create a ResNet50 model for a regression problem, with an output value ranging from -1 to 1.
I omitted the classes argument, and in my preprocessing step I resize my images to 224,224,3.
I try to create the model with
def create_resnet(load_pretrained=False):
    if load_pretrained:
        weights = 'imagenet'
    else:
        weights = None
    # Get base model
    base_model = ResNet50(weights=weights)
    optimizer = Adam(lr=1e-3)
    base_model.compile(loss='mse', optimizer=optimizer)
    return base_model
and then create the model, print the summary and use the fit_generator to train
history = model.fit_generator(batch_generator(X_train, y_train, 100, 1),
                              steps_per_epoch=300,
                              epochs=10,
                              validation_data=batch_generator(X_valid, y_valid, 100, 0),
                              validation_steps=200,
                              verbose=1,
                              shuffle=1)
I get an error though that says
ValueError: Error when checking target: expected fc1000 to have shape (1000,) but got array with shape (1,)
Looking at the model summary, this makes sense, since the final Dense layer has an output shape of (None, 1000)
fc1000 (Dense) (None, 1000) 2049000 avg_pool[0][0]
But I can't figure out how to modify the model. I've read through the Keras documentation and looked at several examples, but pretty much everything I see is for a classification model.
How can I modify the model so it is formatted properly for regression?
Your code is throwing the error because you're using the original fully-connected top layer that was trained to classify images into one of 1000 classes. To make the network work, you need to replace this top layer with your own, whose shape is compatible with your dataset and task.
Here is a small snippet I was using to create an ImageNet pre-trained model for the regression task (face landmarks prediction) with Keras:
NUM_OF_LANDMARKS = 136

def create_model(input_shape, top='flatten'):
    if top not in ('flatten', 'avg', 'max'):
        raise ValueError('unexpected top layer type: %s' % top)
    # connects base model with new "head"
    BottleneckLayer = {
        'flatten': Flatten(),
        'avg': GlobalAveragePooling2D(),
        'max': GlobalMaxPooling2D()
    }[top]
    base = InceptionResNetV2(input_shape=input_shape,
                             include_top=False,
                             weights='imagenet')
    x = BottleneckLayer(base.output)
    x = Dense(NUM_OF_LANDMARKS, activation='linear')(x)
    model = Model(inputs=base.inputs, outputs=x)
    return model
In your case, I guess you only need to replace InceptionResNetV2 with ResNet50. Essentially, you are creating a pre-trained model without top layers:
base = ResNet50(input_shape=input_shape, include_top=False)
And then attaching your custom layer on top of it:
x = Flatten()(base.output)
x = Dense(NUM_OF_LANDMARKS, activation='sigmoid')(x)
model = Model(inputs=base.inputs, outputs=x)
That's it.
You can also check this link from the Keras repository that shows how ResNet50 is constructed internally. I believe it will give you some insight into the functional API and layer replacement.
Also, I would say that both regression and classification tasks are not that different if we're talking about fine-tuning pre-trained ImageNet models. The type of task mostly depends on your loss function and the top layer's activation function. Otherwise, you still have a fully-connected layer with N outputs but they are interpreted in a different way.
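Since the targets in the question range from -1 to 1, the output activation deserves a quick check. The NumPy sketch below (an illustration with made-up values, not part of the original answer) compares the ranges of sigmoid, tanh and linear outputs and shows that a sigmoid output cannot fit a negative target.

```python
import numpy as np

z = np.linspace(-6.0, 6.0, 13)            # pre-activations of the final Dense unit

sigmoid_out = 1.0 / (1.0 + np.exp(-z))    # always in (0, 1)
tanh_out = np.tanh(z)                     # always in (-1, 1)
linear_out = z                            # unbounded

target = -0.5                             # an example target in [-1, 1]

# tanh can hit the target exactly for a suitable pre-activation ...
z_star = np.arctanh(target)
tanh_error = (np.tanh(z_star) - target) ** 2

# ... while the squared error of a sigmoid output stays above 0.25,
# because the output can never drop below 0.
sigmoid_error_floor = (sigmoid_out.min() - target) ** 2

print(sigmoid_out.min(), tanh_out.min(), tanh_error, sigmoid_error_floor)
```

For a single value in [-1, 1], a final `Dense(1)` with a `'tanh'` or `'linear'` activation and an MSE loss is therefore a more natural fit than `'sigmoid'`.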
Hello, I have a question about Keras.
I currently want to implement a network that applies the same CNN model to two input images and then passes the two CNN outputs to a Dense model.
For example:
def cnn_model():
    input = Input(shape=(None, None, 3))
    x = Conv2D(8, (3, 3), strides=(1, 1))(input)
    x = GlobalAvgPool2D()(x)
    model = Model(input, x)
    return model

def fc_model(cnn1, cnn2):
    input_1 = cnn1.output
    input_2 = cnn2.output
    input = concatenate([input_1, input_2])
    x = Dense(1, input_shape=(None, 16))(input)
    x = Activation('sigmoid')(x)
    model = Model([cnn1.input, cnn2.input], x)
    return model

def main():
    cnn1 = cnn_model()
    cnn2 = cnn_model()
    model = fc_model(cnn1, cnn2)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x=[image1, image2], y=[1.0, 1.0], batch_size=1, epochs=1)
I want to implement a model like this and train it, but I get the following error message:
'All layer names should be unique'
Actually, I want to use only one CNN model as a feature extractor and then use the two feature vectors to predict a single float value between 0.0 and 1.0.
So the whole system takes two images, extracts features from each with the same CNN model, and passes the features to a Dense model to get one floating-point value.
Please help me implement this system and explain how to train it.
Thank you
See the section of the Keras documentation on shared layers:
https://keras.io/getting-started/functional-api-guide/
A code snippet from the documentation above demonstrating this:
# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)
# We can then concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
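Applied to the two-image setup from the question, the same principle can be sketched in plain NumPy: one set of convolution weights extracts features from both images, and the concatenated features go through a single dense unit. The `extract_features` helper, the image sizes and the random weights are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# One set of "CNN" parameters: 8 filters of size 3x3 over 3 input channels.
kernel = rng.normal(size=(3, 3, 3, 8)) * 0.1

def extract_features(image, kernel):
    """Valid 3x3 convolution followed by global average pooling -> (8,)."""
    H, W, _ = image.shape
    kh, kw, _, n_filters = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1, n_filters))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out.mean(axis=(0, 1))

image1 = rng.normal(size=(16, 16, 3))
image2 = rng.normal(size=(20, 12, 3))     # sizes may differ thanks to global pooling

# The same kernel (i.e. the same "layer") is applied to both inputs.
features = np.concatenate([extract_features(image1, kernel),
                           extract_features(image2, kernel)])   # shape (16,)

# Dense(1) + sigmoid head producing one value in (0, 1).
w_dense = rng.normal(size=(16,)) * 0.1
prediction = 1.0 / (1.0 + np.exp(-(features @ w_dense)))
print(features.shape, prediction)
```

In Keras terms, this corresponds to building the CNN once (for example a single `cnn = cnn_model()`) and calling that one instance on two separate `Input` tensors, instead of constructing two copies whose automatically generated layer names can collide.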