I'm building a binary image classifier. I start from a pretrained model and replace the last fully connected layer so that it predicts 2 classes; I'm told the new layer should take the number of input features and output the number of classes.
model = models.resnet18(pretrained=True, progress=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)
I'm using a batch size of 6, so the model's raw outputs (logits) for one batch look like this:
tensor([[-0.4717, -0.2232],
[-0.6481, -0.2630],
[-0.2007, -0.1596],
[ 0.0277, -0.0759],
[-0.3314, -0.1211],
[-0.1722, -0.5304]],
and my ground truth labels are torch.tensor([0, 0, 0, 0, 0, 0])
For BCELoss I get an error that says
Using a target size (torch.Size([6])) that is different to the input size (torch.Size([6, 2])) is deprecated. Please ensure they have the same size.
However, for CrossEntropyLoss this works just fine.
The same happens with the example below
outputs = torch.randn(3, 2, 1)
target = torch.empty(3, 1, dtype=torch.long).random_(2)
criterion = nn.CrossEntropyLoss(reduction='mean')
print(outputs)
print(target)
loss = criterion(outputs, target)
print(loss)
outputs = torch.randn(3, 2, 1)
target = torch.empty(3, 1, dtype=torch.long).random_(2)
criterion = torch.nn.BCELoss()
print(outputs)
print(target)
loss = criterion(outputs, target)
print(loss)
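That's because BCELoss expects the input and the target to have the same shape, so with BCE the last layer should have only a single output neuron (followed by a sigmoid, with float targets), i.e.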
model = models.resnet18(pretrained=True, progress=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 1)
And when using CE, keep two outputs:
model = models.resnet18(pretrained=True, progress=True)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)
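To see why the two setups differ only in shape, here is a framework-free sketch (plain Python, not PyTorch's implementation) showing that cross entropy over two logits gives the same value as binary cross entropy on a single logit taken as the difference z1 - z0. The helper names are made up for illustration, and the logits are the first row of the batch in the question.

```python
import math

def cross_entropy_two_logits(z0, z1, target):
    """Softmax cross entropy for one sample with two class logits."""
    m = max(z0, z1)  # subtract the max for numerical stability
    log_norm = m + math.log(math.exp(z0 - m) + math.exp(z1 - m))
    return log_norm - (z1 if target == 1 else z0)

def bce_single_logit(z, target):
    """Binary cross entropy on sigmoid(z) for one sample."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

z0, z1 = -0.4717, -0.2232   # first row of the batch above
target = 0                  # its ground truth label
ce = cross_entropy_two_logits(z0, z1, target)
bce = bce_single_logit(z1 - z0, target)
print(ce, bce)  # the two values agree up to rounding
```

In PyTorch terms, nn.CrossEntropyLoss takes raw logits of shape (N, 2) with integer class targets of shape (N,), while nn.BCELoss takes probabilities and float targets of the same shape as the input; nn.BCEWithLogitsLoss applies the sigmoid internally.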
I am working on an autoencoder-type model with an attention mechanism. Around 10000 batches of data are fed into the model, and each batch contains 30 images (30 is the "step_size" in ConvLSTM), each with a shape of (5, 5, 3 [R,G,B]).
Therefore, the array has shape (10000, 30, 5, 5, 3) (batch_size, step_size, image_height, image_width, scale).
I intentionally gave the output array the shape (1, 5, 5, 3), because each image has to be handled independently for the attention method to be applied to it.
I link all operations with tf.keras.Model so that its input has shape (10000, 30, 5, 5, 3) and its output has shape (1, 5, 5, 3), and then call:
history = model.fit(train_data, train_data, batch_size = 1, epochs = 3)
I have tried changing the arguments of the Model, but it does not seem to work because the output shape is not the same as the input shape.
Is there any way to feed the data one sample at a time?
In the end I am running code like this:
model = keras.Model(input, output)
model.compile(optimizer='adam',loss= tf.keras.losses.MSE)
history = model.fit(train_data, train_data, batch_size = 1, epochs = 3)
It can be done with GradientTape, feeding the samples one by one.
def train(loss, model, opt, x_inp):
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        loss_value = loss(model, x_inp)
    gradients = tape.gradient(loss_value, model.trainable_variables)
    gradient_variables = zip(gradients, model.trainable_variables)
    opt.apply_gradients(gradient_variables)
opt = tf.optimizers.Adam(learning_rate=learning_rate)
import datetime
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
epochs = 3
with train_summary_writer.as_default():
    with tf.summary.record_if(True):
        for epoch in range(epochs):
            for train_id in range(0, len(batch_data)):
                x_inp = np.reshape(np.asarray(batch_data), [-1, step_max, sensor_n, sensor_n, scale_n])
                train(loss, model, opt, x_inp)
                loss_values = loss(model, x_inp)
                reconstructed = np.reshape(model(x_inp), [1, sensor_n, sensor_n, scale_n])
                print("loss : {}".format(loss_values.numpy()))
I have a CRNN model for text recognition. It was published on GitHub and trained on English.
Now I'm doing the same thing with this algorithm, but for Arabic.
My CTC function is:
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
My model is:
def get_Model(training):
    img_w = 128
    img_h = 64
    # Network parameters
    conv_filters = 16
    kernel_size = (3, 3)
    pool_size = 2
    time_dense_size = 32
    rnn_size = 128
    if K.image_data_format() == 'channels_first':
        input_shape = (1, img_w, img_h)
    else:
        input_shape = (img_w, img_h, 1)
    # Initialising the CNN
    act = 'relu'
    input_data = Input(name='the_input', shape=input_shape, dtype='float32')
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv1')(input_data)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv2')(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)
    conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)
    inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)
    # cuts down input size going into RNN:
    inner = Dense(time_dense_size, activation=act, name='dense1')(inner)
    # Two layers of bidirectional GRUs
    # GRU seems to work as well, if not better than LSTM:
    gru_1 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru1')(inner)
    gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru1_b')(inner)
    gru1_merged = add([gru_1, gru_1b])
    gru_2 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru2')(gru1_merged)
    gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru2_b')(gru1_merged)
    # transforms RNN output to character activations:
    inner = Dense(num_classes+1, kernel_initializer='he_normal',
                  name='dense2')(concatenate([gru_2, gru_2b]))
    y_pred = Activation('softmax', name='softmax')(inner)
    Model(inputs=input_data, outputs=y_pred).summary()
    labels = Input(name='the_labels', shape=[30], dtype='float32')
    input_length = Input(name='input_length', shape=[1], dtype='int64')
    label_length = Input(name='label_length', shape=[1], dtype='int64')
    # Keras doesn't currently support loss funcs with extra parameters
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
    # clipnorm seems to speed up convergence
    # the loss calc occurs elsewhere, so use a dummy lambda func for the loss
    if training:
        return Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)
    return Model(inputs=[input_data], outputs=y_pred)
Then I compile it with the SGD optimizer (I tried both SGD and Adam):
sgd = SGD(lr=0.0000002, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
Then I fit the model with my training set (images of words up to 30 characters) into (sequence of labels of 30
model.fit_generator(generator=tiger_train.next_batch(),
steps_per_epoch=int(tiger_train.n / batch_size),
epochs=30,
callbacks=[checkpoint],
validation_data=tiger_val.next_batch(),
validation_steps=int(tiger_val.n / val_batch_size))
Once training starts, it gives me loss = inf. After a lot of searching, I didn't find any similar problem.
So my question is: how can I solve this, and what can make ctc_loss compute an infinite cost?
Thanks in advance
I found the problem: it was a dimension problem.
For CRNN OCR using a CTC layer, if you are detecting a sequence of length n, the image should have a width of at least (2*n-1). The wider the better, until you reach the image/timesteps ratio that lets the CTC layer recognize the letters correctly. If the image width is less than (2*n-1), it will give a nan loss.
This error happens when the text in the image has two identical adjacent characters, e.g. happen --> pp, because CTC needs a blank timestep between them. As a workaround, you can remove the samples that have this characteristic.
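The length constraint behind both answers can be checked before training. The sketch below is plain Python with a hypothetical helper name (not part of Keras); it counts the fewest timesteps CTC needs for a label and compares that with the number of timesteps the model in the question produces. It assumes input_length equals the RNN output length after ctc_lambda_func drops the first two steps, which the question does not show.

```python
def min_ctc_timesteps(label):
    """Fewest timesteps CTC needs to emit `label`.

    One step per character, plus one blank between each pair of
    identical neighbours (otherwise they collapse into one character).
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

# Values from the question: two 2x2 max-pools shrink the width by 4,
# and ctc_lambda_func drops the first 2 RNN outputs.
img_w, pool_size, dropped = 128, 2, 2
timesteps = img_w // (pool_size ** 2) - dropped

for word in ["happen", "abc", "aaa"]:
    need = min_ctc_timesteps(word)
    print(word, need, need <= timesteps)
```

Under these assumptions the model has 30 usable timesteps, so a 30-character label with even one doubled letter cannot be aligned. The (2*n-1) bound is the worst case, where every character equals its neighbour. When no valid alignment exists, the label's probability is zero and the CTC cost becomes infinite.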
Well, my neural network is as follows:
# Leaks data input is a 2-D vector of window*size*341 features
# Reshape to match picture format [Height x Width x Channel]
# Tensor input become 4-D: [Batch Size, Height, Width, Channel]
x = tf.reshape(x, shape= [-1, 16, 341, 2])
# Convolution Layer with 6 filters and a kernel size of 2
conv1 = tf.layers.conv2d(x, 6, 2, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 8 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 8, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in tf contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
# Output layer: one logit per class
out = tf.layers.dense(fc1, n_classes)
It predicts a multi-label classification vector of length 339. First I wanted to make sure that I can fully overfit a small sample of data, to check that everything works and is well defined.
I trained my neural network on a dataset of length 1700. To measure my model's performance I added accuracy as follows:
logits_train = conv_net(features, num_classes, dropout, reuse=False,
is_training=True)
logits_test = conv_net(features, num_classes, dropout, reuse=True,
is_training=False)
# Predictions
pred_classes = tf.cast(tf.greater(logits_test,0.5), tf.float32)
pred_probas = tf.nn.sigmoid(logits_test)
# If prediction mode, early return
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)
# Define loss and optimizer
#tf.one_hot(tf.cast(labels,dtype=tf.int32),depth=2)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.cast(labels,dtype=tf.float32),logits=logits_train))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op,global_step=tf.train.get_global_step())
# Evaluate the accuracy of the model
accuracy = tf.metrics.accuracy(labels=labels , predictions = pred_classes )
#correct_prediction = tf.equal(tf.round(tf.nn.sigmoid(logits_test)), tf.round(labels))
#accuracy1 = tf.metrics.mean(tf.cast(correct_prediction, tf.float32))
#acc_op = tf.metrics.mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred_classes,labels=labels))
# TF Estimators requires to return a EstimatorSpec, that specify
# the different ops for training, evaluating, ...
estim_specs = tf.estimator.EstimatorSpec(
mode=mode,
predictions=pred_probas,
loss=loss_op,
train_op=train_op,
eval_metric_ops={'accuracy': accuracy})
return estim_specs
The problem is that after only a few epochs the performance seems to be very good:
for i in range(1,50):
    print('Epoch',(i+1))
    input_fn = tf.estimator.inputs.numpy_input_fn(x= curr_data_batch,y=curr_target_batch[:,:339] ,batch_size=96, shuffle=False)
    model.train(input_fn=input_fn)
    if (i+1) % 10 :
        # eval the model
        eval_model = model.evaluate(input_fn=input_fn)
        print('Loss ,',eval_model['loss'] )
        print('accuracy ,',eval_model['accuracy'] )
Loss , 0.029562088
accuracy , 0.9958855
Epoch 3:
Loss , 0.028194984
accuracy , 0.99588597
Epoch 4:
Loss , 0.027557796
accuracy , 0.9958862
but when I try to predict on the same training data I get completely opposite metrics:
loss = 0.65
accuracy = 0.33
I don't know where this issue comes from. Did I define something incorrectly?
Thanks
I'm working on an RNN architecture that does speech enhancement. The input has dimensions [XX, X, 1024], where XX is the batch size and X is the variable sequence length.
The input to the network is positive-valued data and the output is a binary mask (IBM), which is later used to construct the enhanced signal.
For instance, if the input to the network is [10, 65, 1024], the output will be a [10, 65, 1024] tensor with binary values. I'm using TensorFlow with mean squared error as the loss function, but I'm not sure which activation function to use here (one that keeps the outputs at either zero or one). The following is the code I've come up with so far:
tf.reset_default_graph()
num_units = 10 #
num_layers = 3 #
dropout = tf.placeholder(tf.float32)
cells = []
for _ in range(num_layers):
    cell = tf.contrib.rnn.LSTMCell(num_units)
    cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob = dropout)
    cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
X = tf.placeholder(tf.float32, [None, None, 1024])
Y = tf.placeholder(tf.float32, [None, None, 1024])
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
out_size = Y.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size)
prediction = (logit)
flat_Y = tf.reshape(Y, [-1] + Y.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])
loss_op = tf.losses.mean_squared_error(labels=flat_Y, predictions=flat_logit)
#Adam optimizer as the optimization function
optimizer = tf.train.AdamOptimizer(learning_rate=0.001) #
train_op = optimizer.minimize(loss_op)
#extract the correct predictions and compute the accuracy
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Also, my reconstruction isn't good. Can someone suggest how to improve the model?
If you want your outputs to be either 0 or 1, to me it seems a good idea to turn this into a classification problem. To this end, I would use a sigmoidal activation and cross entropy:
...
prediction = tf.nn.sigmoid(logit)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))
...
In addition, from my point of view the hidden dimensionality (10) of your stacked RNNs seems quite small for such a big input dimensionality (1024). However this is just a guess, and it is something that needs to be tuned.
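To make the suggested loss concrete, here is a framework-free sketch of sigmoid cross entropy on logits, written with the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), followed by thresholding the sigmoid at 0.5 to get a binary mask. The logits and labels are made-up values for illustration, not data from the question.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(logit, label):
    """Stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

logits = [2.0, -1.0, 0.3]   # made-up network outputs (before the sigmoid)
labels = [1.0, 0.0, 0.0]    # made-up binary mask targets

# Mean loss over all elements, as tf.reduce_mean would do
loss = sum(sigmoid_cross_entropy(x, z) for x, z in zip(logits, labels)) / len(logits)

# At inference time, threshold the probabilities to get a 0/1 mask
mask = [1 if sigmoid(x) > 0.5 else 0 for x in logits]
print(loss, mask)
```

Note that the loss is computed from the raw logits, while the sigmoid is only applied to produce predictions; thresholding the probability at 0.5 is the same as thresholding the logit at 0.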
I’m trying to implement a Visual Storytelling model using Keras with a hierarchical RNN model, basically Neural Image Captioner style but over a sequence of photos with a bidirectional RNN on top of the decoder RNNs.
I implemented and tested the three parts of this model, CNN, BRNN and decoder RNN separately but got this error when trying to connect them:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My code is as follows:
#vgg16 model with the fc2 layer as output
cnn_base_model = self.cnn_model.base_model
brnn_model = self.brnn_model.model
rnn_model = self.rnn_model.model
cnn_part = TimeDistributed(cnn_base_model)
img_input = Input((self.story_length,) + self.cnn_model.input_shape, name='brnn_img_input')
extracted_feature = cnn_part(img_input)
#[None, 5, 512], a 512 length vector for each picture in the story
brnn_feature = brnn_model(extracted_feature)
#[None, 5, 25], input groundtruth word indices fed as input when training
decoder_input = Input((self.story_length, self.max_length), name='brnn_decoder_input')
decoder_outputs = []
for i in range(self.story_length):
    #separate timesteps for decoding
    decoder_input_i = Lambda(lambda x: x[:, i, :])(decoder_input)
    brnn_feature_i = Lambda(lambda x: x[:, i, :])(brnn_feature)
    #the problem persists when using Dense instead of the Lambda layers above
    #decoder_input_i = Dense(25)(Reshape((125,))(decoder_input))
    #brnn_feature_i = Dense(512)(Reshape((5 * 512,))(brnn_feature))
    decoder_output_i = rnn_model([decoder_input_i, brnn_feature_i])
    decoder_outputs.append(decoder_output_i)
decoder_output = Concatenate(axis=-2, name='brnn_decoder_output')(decoder_outputs)
self.model = Model([img_input, decoder_input], decoder_output)
And the code for the BRNN:
image_feature = Input(shape=(self.story_length, self.img_feature_dim,))
image_emb = TimeDistributed(Dense(self.lstm_size))(image_feature)
brnn = Bidirectional(LSTM(self.lstm_size, return_sequences=True), merge_mode='concat')(image_emb)
brnn_emb = TimeDistributed(Dense(self.lstm_size))(brnn)
self.model = Model(inputs=image_feature, outputs=brnn_emb)
And the RNN:
#[None, 512], the vector to be decoded
initial_input = Input(shape=(self.input_dim,), name='rnn_initial_input')
#[None, 25], the groundtruth word indices fed as input when training
decoder_inputs = Input(shape=(None,), name='rnn_decoder_inputs')
decoder_input_masking = Masking(mask_value=0.0)(decoder_inputs)
decoder_input_embeddings = Embedding(self.vocabulary_size, self.emb_size,
embeddings_regularizer=l2(regularizer))(decoder_input_masking)
decoder_input_dropout = Dropout(.5)(decoder_input_embeddings)
initial_emb = Dense(self.emb_size,
kernel_regularizer=l2(regularizer))(initial_input)
initial_reshape = Reshape((1, self.emb_size))(initial_emb)
initial_masking = Masking(mask_value=0.0)(initial_reshape)
initial_dropout = Dropout(.5)(initial_masking)
decoder_lstm = LSTM(self.hidden_dim, return_sequences=True, return_state=True,
recurrent_regularizer=l2(regularizer),
kernel_regularizer=l2(regularizer),
bias_regularizer=l2(regularizer))
_, initial_hidden_h, initial_hidden_c = decoder_lstm(initial_dropout)
decoder_outputs, decoder_state_h, decoder_state_c = decoder_lstm(decoder_input_dropout,
initial_state=[initial_hidden_h, initial_hidden_c])
decoder_output_dense_layer = TimeDistributed(Dense(self.vocabulary_size, activation='softmax',
kernel_regularizer=l2(regularizer)))
decoder_output_dense = decoder_output_dense_layer(decoder_outputs)
self.model = Model([decoder_inputs, initial_input], decoder_output_dense)
I’m using adam as optimizer and sparse_categorical_crossentropy as loss.
At first I thought the problem was with the Lambda layers used for splitting the timesteps, but the problem persists when I replace them with Dense layers (which are guarantee
I had a similar error, and it turned out I was supposed to build the layers (in my custom layer or model) in __init__(), like so:
self.lstm_custom_1 = keras.layers.LSTM(128,batch_input_shape=batch_input_shape, return_sequences=False,stateful=True)
self.lstm_custom_1.build(batch_input_shape)
self.dense_custom_1 = keras.layers.Dense(32, activation = 'relu')
self.dense_custom_1.build(input_shape=(batch_size, 128))
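The issue is actually with the Embedding layer, I think. Gradients can't flow back through an Embedding layer to its integer index inputs, so any trainable layer placed before it gets no gradient. Unless the Embedding is the first layer in the model, it won't work.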