How can I improve mask prediction by Mask RCNN?

How can I improve the mask prediction of my Mask RCNN model? Bounding box and class predictions seem to be okay in my case. Masks are acceptable for small objects but not for big ones, and the story is similar for other images as well. Here are my configurations:
RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)
TRAIN_ROIS_PER_IMAGE = 64
MAX_GT_INSTANCES = 50
POST_NMS_ROIS_INFERENCE = 500
POST_NMS_ROIS_TRAINING = 1000
USE_MINI_MASK = True
MASK_SHAPE = [28, 28]
MINI_MASK_SHAPE = [56, 56]
LEARNING_RATE = 0.001
LEARNING_MOMENTUM = 0.9
WEIGHT_DECAY = 0.0001
EPOCHS = 500
Any suggestions would be great!

Wanted to give an update to my post. I improved my accuracy by changing the default MASK_SHAPE to [56, 56]. In order to be able to change this config value, an extra Conv2DTranspose layer has to be added to the mask head in model.py.
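For anyone attempting the same change, here is a minimal sketch of the idea, assuming a Matterport-style Keras mask head (the helper name mask_head_56 and the exact layer arguments are illustrative, not the repository's code). Each stride-2 Conv2DTranspose doubles the mask resolution, so going from 28x28 to 56x56 masks requires one extra deconvolution:

import tensorflow as tf
from tensorflow.keras import layers as KL

def mask_head_56(roi_features, num_classes):
    # 14x14 -> 28x28: the single deconv the default 28x28 head already has
    x = KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu")(roi_features)
    # 28x28 -> 56x56: the extra deconv needed once MASK_SHAPE = [56, 56]
    x = KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu")(x)
    # One sigmoid mask per class
    return KL.Conv2D(num_classes, (1, 1), activation="sigmoid")(x)

# Toy usage: a (14, 14, 256) ROI feature map yields a (56, 56, 2) mask stack
rois = tf.keras.Input(shape=(14, 14, 256))
print(tf.keras.Model(rois, mask_head_56(rois, num_classes=2)).output_shape)
# (None, 56, 56, 2)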

Related

How to specify the batch dimension in a conv2D layer with pyTorch

I have a dataset of 600x600 grayscale images, grouped into batches of 50 images by a dataloader.
My network has a convolution layer with 16 filters, followed by max pooling with a 6x6 kernel, and then a dense layer. After the conv and pooling, the number of features should be out_channels*width*height/(maxpool_kernel_W*maxpool_kernel_H) = 16*600*600/(6*6) = 160000 per sample, times the batch size of 50.
However, when I try to do a forward pass I get the following error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (80000x100 and 160000x1000). I verified that the data is formatted correctly as [batch, n_channels, width, height] (so [50, 1, 600, 600] in my case).
Logically the output should be a 50x160000 matrix, but apparently it is formatted as an 80000x100 matrix. It seems like torch is multiplying the matrices along the wrong dimensions. If anyone understands why, please help me understand too.
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import FakeData
from torchvision.transforms import ToTensor

# get data (using a fake dataset generator)
dataset = FakeData(size=500, image_size=(1, 600, 600), transform=ToTensor())
training_data, test_data = random_split(dataset, [400, 100])
train_dataloader = DataLoader(training_data, batch_size=50, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=50, shuffle=True)

net = nn.Sequential(
    nn.Conv2d(
        in_channels=1,
        out_channels=16,
        kernel_size=5,
        padding=2,
    ),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)

optimizer = optim.Adam(net.parameters(), lr=1e-3)
epochs = 10

for i in range(epochs):
    for (x, _) in train_dataloader:
        optimizer.zero_grad()
        # make sure the data is in the right shape
        print(x.shape)  # returns torch.Size([50, 1, 600, 600])
        # error happens here, at the first forward pass
        output = net(x)
        criterion = nn.MSELoss()
        loss = criterion(output, x)
        loss.backward()
        optimizer.step()
If you inspect your model's inference layer by layer, you will notice that nn.MaxPool2d returns a 4D tensor shaped (50, 16, 100, 100). There are different ways to reduce spatial dimensionality (flattening, average pooling, max pooling). For instance, if you flatten the spatial dimensions, you get a tensor of shape (50, 16*100*100), i.e. (50, 160000), as you expected. To do that, insert an nn.Flatten layer before the linear layer:
net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Flatten(),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)
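In case it helps, one way to verify this is to push a dummy batch through the Sequential model one layer at a time and print the intermediate shapes (a small sketch; the random tensor just mimics one of your batches):

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=6),
    nn.Flatten(),
    nn.Linear(160000, 1000),
    nn.ReLU(),
)

x = torch.randn(50, 1, 600, 600)  # same shape as one dataloader batch
for layer in net:
    x = layer(x)
    print(f"{layer.__class__.__name__:<10} -> {tuple(x.shape)}")
# Conv2d     -> (50, 16, 600, 600)
# ReLU       -> (50, 16, 600, 600)
# MaxPool2d  -> (50, 16, 100, 100)
# Flatten    -> (50, 160000)
# Linear     -> (50, 1000)
# ReLU       -> (50, 1000)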

Irregular output shape in Caffe model

When I tried to make my own model in Caffe like this:
n.data = L.Input(input_param={'shape': {'dim': [1, 1, 64, 64]}})
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=16, pad=1,
                        weight_filler=dict(type='xavier'))
n.elu1 = L.ELU(n.conv1, in_place=True)
n.scale1 = L.Scale(n.elu1, bias_term=False, in_place=True)
I got an output shape of 62x62x16, but the right thing would be to get 64x64x16. Is there a mistake in my code?
The output size is given by output_size = ((input_size + 2*padding - kernel_size) / stride) + 1.
Here input_size is 64, kernel_size is 5, and padding is 1,
so ((64 + 2 - 5) / 1) + 1 = 62.
You need to change the padding or the kernel_size.
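For example, keeping kernel_size=5 and setting pad=2 preserves the spatial size, since ((64 + 4 - 5) / 1) + 1 = 64. A sketch of the corrected layer, using the same pycaffe NetSpec API as the question:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(input_param={'shape': {'dim': [1, 1, 64, 64]}})
# pad=2 with kernel_size=5 keeps the spatial size:
# ((64 + 2*2 - 5) / 1) + 1 = 64, so conv1 outputs 64x64x16
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=16, pad=2,
                        weight_filler=dict(type='xavier'))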

Error in model performance metrics

Well, my neural network is as follows:
# Leaks data input is a 2-D vector of window_size*341 features
# Reshape to match picture format [Height x Width x Channel]
# Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
x = tf.reshape(x, shape=[-1, 16, 341, 2])
# Convolution layer with 6 filters and a kernel size of 2
conv1 = tf.layers.conv2d(x, 6, 2, activation=tf.nn.relu)
# Max pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution layer with 8 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 8, 3, activation=tf.nn.relu)
# Max pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in tf contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
# Fully connected output layer: one logit per class
out = tf.layers.dense(fc1, n_classes)
It predicts a multi-label classification vector of length 339. First, I wanted to make sure that I can fully overfit a small sample of data, to verify that everything works and is well defined.
I trained my neural network on 1700 samples. To measure my model's performance I added accuracy as follows:
logits_train = conv_net(features, num_classes, dropout, reuse=False,
                        is_training=True)
logits_test = conv_net(features, num_classes, dropout, reuse=True,
                       is_training=False)
# Predictions
pred_classes = tf.cast(tf.greater(logits_test, 0.5), tf.float32)
pred_probas = tf.nn.sigmoid(logits_test)
# If prediction mode, early return
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)
# Define loss and optimizer
# tf.one_hot(tf.cast(labels, dtype=tf.int32), depth=2)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=tf.cast(labels, dtype=tf.float32), logits=logits_train))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
# Evaluate the accuracy of the model
accuracy = tf.metrics.accuracy(labels=labels, predictions=pred_classes)
# correct_prediction = tf.equal(tf.round(tf.nn.sigmoid(logits_test)), tf.round(labels))
# accuracy1 = tf.metrics.mean(tf.cast(correct_prediction, tf.float32))
# acc_op = tf.metrics.mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred_classes, labels=labels))
# TF Estimators require returning an EstimatorSpec that specifies
# the different ops for training, evaluating, ...
estim_specs = tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=pred_probas,
    loss=loss_op,
    train_op=train_op,
    eval_metric_ops={'accuracy': accuracy})
return estim_specs
The problem is that after only a few epochs the performance seems to be very good:
for i in range(1, 50):
    print('Epoch', (i + 1))
    input_fn = tf.estimator.inputs.numpy_input_fn(
        x=curr_data_batch, y=curr_target_batch[:, :339],
        batch_size=96, shuffle=False)
    model.train(input_fn=input_fn)
    if (i + 1) % 10:  # note: true whenever (i + 1) is NOT a multiple of 10
        # eval the model
        eval_model = model.evaluate(input_fn=input_fn)
        print('Loss ,', eval_model['loss'])
        print('accuracy ,', eval_model['accuracy'])
Loss , 0.029562088
accuracy , 0.9958855
Epoch 3:
Loss , 0.028194984
accuracy , 0.99588597
Epoch 4:
Loss , 0.027557796
accuracy , 0.9958862
but when I try to predict on the same training data I get completely opposite metrics:
loss = 0.65
accuracy = 0.33
I don't know where this issue comes from. Did I misdefine something?
Thanks.
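One thing that stands out in the posted code (an observation, not a verified fix): pred_classes thresholds the raw logits at 0.5, while pred_probas applies a sigmoid. A logit of 0.5 corresponds to a sigmoid probability of about 0.62, so the classes fed to tf.metrics.accuracy are not consistent with the probabilities returned at prediction time. A sketch of deriving the classes from the probabilities instead:

# Sketch: threshold the sigmoid probabilities, not the raw logits,
# so that probability > 0.5 (i.e. logit > 0) counts as a positive label
pred_probas = tf.nn.sigmoid(logits_test)
pred_classes = tf.cast(tf.greater(pred_probas, 0.5), tf.float32)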

Multivariate time series predictions with RNN - LSTM using Keras

I want to build a model in order to perform anomaly detection on multivariate time series. I have 21 features, i.e. 21 time series for each time window. The method relies on an RNN (LSTM) with Keras; training is done on 100 time windows considered as normal data, and the objective is to apply the model to a new time window to detect whether some time instances are anomalous.
The model predicts the next instance of each feature, so there are 21 outputs of the model.
My "normal" data are shaped like this:
100 time windows with 1650 observations and 21 features.
My plan is to build a model that predicts the t+1 instance of a vector of 21 features, so I try to shape X and Y as follows:
train_X.shape = (80, 1649, 21)
train_Y.shape = (80, 1649, 21)
train_Y is the t+1 vector of the train_X vector.
I also have a validation set in the training process (to tackle overfitting)
test_X.shape = (20, 1649, 21)
test_Y.shape = (20,1649, 21)
I found this code on machinelearningmastery.com and tried to adapt it:
config = {'sequence_length': 100, 'epochs': 120, 'batch_size': 30,
          'validation_split': 0.2}
layers = {'input': 21, 'hidden1': 60, 'hidden2': 60, 'output': 21}

model = Sequential()
model.add(LSTM(units=layers['hidden1'], input_shape=(1649, 21),
               return_sequences=True))
model.add(Dropout(config['validation_split']))
model.add(LSTM(units=layers['hidden2'], return_sequences=False))
model.add(Dropout(config['validation_split']))
model.add(Dense(units=layers['output']))
model.add(Activation("linear"))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')
# Fit network
history = model.fit(train_X, train_y, epochs=config['epochs'],
                    batch_size=config['batch_size'],
                    validation_data=(test_X, test_y),
                    verbose=2, shuffle=False)
print("Predicting...")
predicted = model.predict(test_X)
print("Reshaping predicted")
predicted = np.reshape(predicted, (predicted.size,))
Do you think I have the right approach? Could someone give me some tips on modifying the code or the shaping of the data?
Thanks.
This code won't run, as the data shapes and the model output shape are inconsistent. Here are a few things to fix and try:
Your train_Y shape does not seem to be correct. If you indeed want train_Y to be the t+1 observation of train_X, then train_Y.shape should be (80, 1, 21).
Why do you use config['validation_split'] as the dropout rate? It is a poor choice of name for a dropout variable, as it has nothing to do with validation splitting.
model.add(Activation("linear")) is redundant, as Dense already applies a linear transformation, and stacking linear transformations is itself redundant; consider replacing it with an appropriate nonlinearity. The trailing Dense(1) also collapses your 21 outputs into one, which conflicts with predicting 21 features.
As your ultimate goal is anomaly detection, you have to come up with a detection threshold: if abs(y_true - y_pred) > threshold, you would flag that sample as anomalous.
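Putting these points together, here is a minimal sketch of what the model could look like, assuming you want to predict the single next 21-feature vector per window (layer sizes are taken from the question's config; the 0.2 dropout rate and the placeholder data are illustrative):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Each window of 1649 timesteps x 21 features predicts the single next
# 21-feature observation, so targets are shaped (80, 21)
model = Sequential([
    LSTM(60, input_shape=(1649, 21), return_sequences=True),
    Dropout(0.2),   # an actual dropout rate, not validation_split
    LSTM(60, return_sequences=False),
    Dropout(0.2),
    Dense(21),      # one linear output per feature; no trailing Dense(1)
])
model.compile(loss='mse', optimizer='rmsprop')

# Placeholder data, just to show the shapes lining up
train_X = np.random.rand(80, 1649, 21).astype('float32')
train_Y = np.random.rand(80, 21).astype('float32')
model.fit(train_X, train_Y, epochs=1, batch_size=30, verbose=0)

# Anomaly scoring: flag windows whose worst per-feature error exceeds a
# threshold, e.g. a high quantile of the errors seen on normal data
errors = np.abs(model.predict(train_X) - train_Y).max(axis=1)
threshold = np.quantile(errors, 0.99)
anomalous = errors > threshold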

Tensorflow: why softmax outputs [1, 0, 0..., 0]

I have a neural net model; its last layer is a fully connected layer with 9 output neurons.
To train my network correctly, I'm using softmax_cross_entropy_with_logits.
It trains okay, but when I evaluate my model I also want probabilities.
So I take an evaluation sample and feed it to the network.
After that I apply softmax to the output and get
[[ 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
Here are the unnormalized probabilities (the logits) as well:
[[ -2710.10620117 -2914.37866211 -5045.04443359 -4361.91601562
-459.57000732 8843.65820312 -1871.62756348 5447.12451172
-10947.22949219]]
I always get a probability of 1 for one class and zeros for the rest.
Could anyone please help me handle this issue?
EDIT:
Input images are of shape 64 * 160.
All activation functions are relu.
Max poolings are 2x2.
In conv_plus_max_pool_layer(x_image, 5, 1, 96), 5 is the kernel size.
Here is network layout:
hidden_block_1 = conv_plus_max_pool_layer(x_image, 5, 1, 96)
hidden_block_2 = conv_plus_max_pool_layer(hidden_block_1, 5, 96, 256)
hidden_block_3 = conv_plus_max_pool_layer(hidden_block_2, 3, 256, 384)
hidden_block_4 = conv_plus_max_pool_layer(hidden_block_3, 3, 384, 512)
fc1 = dropout_plus_fc(4 * 10 * 512, 512, hidden_block_4, keep_prob_drop1)
output = dropout_plus_fc(512, model_net10_train.class_num, fc1, keep_prob_drop2)
Looks like your network is pretty sure about the output ;)
In this case, there isn't a lot we can do for you without your network layout... Some gut feelings from my side: the layer leading up to your output layer has too many nodes (thus giving you these huge numbers), and I suspect that you don't use nonlinearities such as ReLU or tanh. Other things you might want to check are the initial values of the weights (they might be too big) and the learning rate you are using (it might be too high).
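To see why this happens, consider the posted logits: the winner (about 8843.7) beats the runner-up (about 5447.1) by roughly 3396, and exp(-3396) underflows to 0.0 in float64, so the softmax is numerically an exact one-hot vector. A small worked example:

import numpy as np

# The logits from the question (rounded)
logits = np.array([-2710.11, -2914.38, -5045.04, -4361.92, -459.57,
                   8843.66, -1871.63, 5447.12, -10947.23])

def softmax(z):
    z = z - z.max()  # numerically stable softmax: shift the max logit to 0
    e = np.exp(z)
    return e / e.sum()

print(softmax(logits))         # [0. 0. 0. 0. 0. 1. 0. 0. 0.] -- fully saturated
print(softmax(logits / 1000))  # logits of moderate size give soft probabilities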
