PyTorch - Semantic segmentation task, error on target shape

Using a weighted nn.CrossEntropyLoss() for a semantic segmentation task (with 3 "classes"):
model = UneXt50().cuda()
loss = nn.CrossEntropyLoss(class_weights)
learn = Learner(data, model,
                loss_func=loss,
                opt_func=ranger,
                splitter=split_layers).to_fp16()
learn.fit_flat_cos(5, 6e-4)
I get this error:
RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [32, 1, 256, 256]
Images and masks have the following shapes [C, H, W]:
torch.Size([3, 256, 256])
torch.Size([1, 256, 256])
Why do I get this error?
What should the correct target shape be?
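For reference, here is a minimal sketch of the usual fix (an assumption, not taken from the thread): nn.CrossEntropyLoss expects the target to be a [N, H, W] tensor of integer class indices, so the singleton channel dimension of the mask has to be squeezed out before the loss is computed.
import torch
import torch.nn as nn

# hypothetical tensors with the shapes from the question
logits = torch.randn(32, 3, 256, 256)           # model output: [N, num_classes, H, W]
mask = torch.randint(0, 3, (32, 1, 256, 256))   # mask as loaded: [N, 1, H, W]

loss_fn = nn.CrossEntropyLoss()

# CrossEntropyLoss wants class-index targets of shape [N, H, W] with dtype long,
# so drop the channel dimension before computing the loss.
target = mask.squeeze(1).long()                 # -> [32, 256, 256]
print(loss_fn(logits, target).item())
In a fastai pipeline this reshaping would typically live in the mask transform or in a small wrapper around the loss function.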

Related

CNN model is not learning well

I started to learn CNN implementation in PyTorch and tried to build CNNs to process grayscale images with 4 classes, labeled 0 to 3. At the beginning I got an accuracy of around 0.55. The maximum accuracy I got is ~ 0.683%.
I tried the SGD and Adam optimizers with different values for lr and batch_size, but the accuracy is still low.
I used data augmentation to create more samples, around 4k.
I cannot improve the accuracy further and wondered if I could get some advice about what I need to change in the CNN structure to increase accuracy.
The loss starts around 1.497, decreases to near 0.001, and then fluctuates up and down around that value.
I spent time reading about similar problems, but without luck.
I am using nn.CrossEntropyLoss() for my loss_fn. I don't apply softmax to the final dense layer.
This is the Summary of the CNN model:
-------------------------------------------------------------
     Layer (type)          Output Shape          Param #
=============================================================
         Conv2d-1     [-1, 32, 128, 128]             320
           ReLU-2     [-1, 32, 128, 128]               0
    BatchNorm2d-3     [-1, 32, 128, 128]              64
      MaxPool2d-4       [-1, 32, 64, 64]               0
         Conv2d-5       [-1, 64, 64, 64]          18,496
           ReLU-6       [-1, 64, 64, 64]               0
    BatchNorm2d-7       [-1, 64, 64, 64]             128
      MaxPool2d-8       [-1, 64, 32, 32]               0
         Conv2d-9      [-1, 128, 32, 32]          73,856
          ReLU-10      [-1, 128, 32, 32]               0
   BatchNorm2d-11      [-1, 128, 32, 32]             256
     MaxPool2d-12      [-1, 128, 16, 16]               0
       Flatten-13           [-1, 32768]               0
        Linear-14             [-1, 512]      16,777,728
          ReLU-15             [-1, 512]               0
       Dropout-16             [-1, 512]               0
        Linear-17               [-1, 4]           2,052
=============================================================
I would appreciate the help.
How many images are in the train set? The test set? What is the size of the images? How difficult would you consider the classification of these images to be? Do you think it should be simple or difficult?
According to the numbers you have, you're overfitting: your loss is near 0 (meaning not much will backpropagate to the weights, i.e. your model won't change much anymore), and your 68.3% (it's a typo, right?) is from the test set (I suppose). So you don't have any problem training the network, which is a good point.
You can then search online for ways of countering overfitting; here are some "classical" possibilities (the first two are sketched in code below):
- raise the dropout parameter
- add a regularizer (L1 or L2) to constrain the learning
- early stopping using a validation set
- use a classical and/or lighter convolutional network (ResNet, Inception), with or without pretrained weights; this choice also depends on your image type (natural, biomedical, ...)
- ... many more, of varying difficulty to implement
Also, technically you are already using a softmax, as it is included in PyTorch's CrossEntropyLoss.
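As a minimal, hypothetical sketch of the first two suggestions (the layer sizes only mirror the summary above and are assumptions, not the asker's exact code): raise the dropout probability on the classifier head and add L2 regularization through the optimizer's weight_decay.
import torch.nn as nn
import torch.optim as optim

# hypothetical classifier head with a stronger dropout rate
head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # raised from a lower value such as 0.25
    nn.Linear(512, 4),   # 4 classes
)

# L2 regularization ("weight decay") is applied via the optimizer argument
optimizer = optim.Adam(head.parameters(), lr=1e-3, weight_decay=1e-4)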

Problem with predicting my image in my own trained Keras model

I trained my model using transfer learning. Now, when I predict my image in Colab, it shows me an error:
WARNING:tensorflow:Model was constructed with shape (None, 128, 128, 3) for input Tensor("xception_input:0", shape=(None, 128, 128, 3), dtype=float32), but it was called on an input with incompatible shape (None, 275, 3).
WARNING:tensorflow:Model was constructed with shape (None, 128, 128, 3) for input Tensor("input_1:0", shape=(None, 128, 128, 3), dtype=float32), but it was called on an input with incompatible shape (None, 275, 3).
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-142a5ca8cbef> in <module>()
1 import numpy as np
----> 2 classes = np.argmax(model.predict(img), axis=-1)
3 print(classes)
.
.
.
ValueError: Input 0 of layer block1_conv1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [None, 275, 3]
Basically, during training you were feeding a batch of images as input to the network, and the same is required at test/evaluation time. So the easy solution is to add a leading batch dimension to the img tensor, i.e. go from img.shape to (1, *img.shape).
img_test = tf.expand_dims(img, axis=0)
The message is saying that you trained your model using a shape of
shape=(None, 128, 128, 3)
but when you try to predict from the model you provided an input of
[None, 275, 3]
Obviously, this cannot be used by your model. First of all, you provided a 3-dimensional input, but you should have provided a 4-dimensional one. Typically images are (height, width, 3), and if you provide them in batches this becomes (batch_size, height, width, 3); if you have just one image it becomes:
(1, height, width, 3)
So you should check the input you provide your model with. With numpy you typically use something like
np.expand_dims(original_image, axis=0)
to go from a 3-dim to a 4-dim input.
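Putting both answers together, a minimal sketch of the prediction path (the file name is an assumption; the 128x128 target size comes from the warnings above, and model is the trained model from the question):
import numpy as np
import tensorflow as tf

# load and resize the test image to the shape the model was built for
img = tf.keras.preprocessing.image.load_img("test.jpg", target_size=(128, 128))
img = tf.keras.preprocessing.image.img_to_array(img)   # (128, 128, 3)
img = np.expand_dims(img, axis=0)                      # (1, 128, 128, 3)

classes = np.argmax(model.predict(img), axis=-1)
print(classes)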

"saved_model_cli show" on a keras+transformers model display different inputs and shapes that the one used for training

I am using transformers' TFBertForSequenceClassification.from_pretrained (with 'bert-base-multilingual-uncased') and Keras to build my model.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# metric
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
# optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=epsilon)
# create and compile the Keras model in the context of strategy.scope
model = TFBertForSequenceClassification.from_pretrained(pretrained_weights,
                                                        num_labels=num_labels,
                                                        cache_dir=pretrained_model_dir)
model._name = 'tf_bert_classification'
# compile Keras model
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=[metric])
I am using SST-2 data, which is tokenized and then fed to the model for training. The data has the following shape:
shape: (32,)
dict structure
dim: 3
[input_ids / attention_mask / token_type_ids ]
[(32, 128) / (32, 128) / (32, 128) ]
[ndarray / ndarray / ndarray ]
and here an example:
({'input_ids': <tf.Tensor: shape=(32, 128), dtype=int32, numpy=
array([[ 101, 21270, 94696, ..., 0, 0, 0],
[ 101, 143, 45100, ..., 0, 0, 0],
[ 101, 24220, 102, ..., 0, 0, 0],
...,
[ 101, 11008, 10346, ..., 0, 0, 0],
[ 101, 43062, 15648, ..., 0, 0, 0],
[ 101, 13178, 18418, ..., 0, 0, 0]], dtype=int32)>, 'attention_mask': ....
As we can see, we have input_ids with shape (32, 128), where 32 is the batch size and 128 is the maximum length of the string (the max for BERT is 512). We also have attention_mask and token_type_ids with the same structure.
I am able to train a model and to do prediction using model.evaluate(test_dataset). All good.
The issue I am having is that when I serve the model on GCP, it requires data in a different input shape and structure! I see the same thing if I run the CLI on the saved model:
saved_model_cli show --dir $MODEL_LOCAL --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 5)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 2)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
As we can see, we only need to give input_ids, not attention_mask and token_type_ids, and the shape is different. While the batch size is not defined (-1), which is expected, the maximum length is 5 instead of 128! It was working 2 months ago, and I probably introduced something that created this issue.
I tried a few versions of TensorFlow (2.2.0 and 2.3.0) and transformers (2.8.0, 2.9.0 and 3.0.2). I cannot see the Keras model's input and output shapes (they are None):
model.inputs
model.outputs
model.summary()
Model: "tf_bert_classification"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
bert (TFBertMainLayer)       multiple                  167356416
_________________________________________________________________
dropout_37 (Dropout)         multiple                  0
_________________________________________________________________
classifier (Dense)           multiple                  1538
=================================================================
Total params: 167,357,954
Trainable params: 167,357,954
Non-trainable params: 0
Any idea what could explain why the saved model requires a different input than the one used for training? I could use the Keras functional API and define the input shape, but I am pretty sure this code was working before.
I have seen such behavior when the model was instantiated from a pretrained one, then weights were loaded, and only then was it saved in full-fledged Keras format.
When I loaded the latter afterwards, it was not able to issue correct predictions because its signatures had become garbage: attention_mask disappeared, seq_length changed, and dummy None inputs appeared out of nowhere. So try to save your model in Keras format right after fitting (without the intermediate loading from weights), if that's your case.
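A possible workaround, sketched under the assumption that training used max_len=128 and that model is the compiled TFBertForSequenceClassification from the question: pin the serving signature explicitly when exporting, so the SavedModel keeps all three inputs with the (None, 128) shape instead of whatever was traced from a dummy call.
import tensorflow as tf

max_len = 128  # assumed training sequence length

@tf.function(input_signature=[{
    "input_ids": tf.TensorSpec((None, max_len), tf.int32, name="input_ids"),
    "attention_mask": tf.TensorSpec((None, max_len), tf.int32, name="attention_mask"),
    "token_type_ids": tf.TensorSpec((None, max_len), tf.int32, name="token_type_ids"),
}])
def serving_fn(features):
    # forward the dict of tokenized inputs straight to the model
    return model(features)

tf.saved_model.save(model, "export/1",
                    signatures={"serving_default": serving_fn})
Running saved_model_cli show on that export should then list all three inputs with shape (-1, 128).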

Concatenation of Keras parallel layers changes wanted target shape

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the first model (without the LSTMs) I get the following error:
"ValueError: Error when checking target: expected dense_3 to have shape (None, 120, 40) but got array with shape (8, 40, 1)"
The description of the model is this:
- Input (length T is the appliance-specific window size)
- Parallel 1D convolutions with filter sizes 3, 5, and 7 respectively, stride=1, number of filters=32, activation type=linear, border mode=same
- Merge layer which concatenates the output of the parallel 1D convolutions
- Dense layer, output_dim=128, activation type=ReLU
- Dense layer, output_dim=128, activation type=ReLU
- Dense layer, output_dim=T, activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
# the window sizes (seq_length?) are 40, 1075, 465, 72 and 1246 for the kettle, dish washer,
# fridge, microwave, oven and washing machine, respectively.
def ae_net(T):
    input_layer = Input(shape=(T,))
    branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
    branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
    branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
    merge_layer = layers.concatenate([branch_a, branch_b, branch_c], axis=1)
    dense_1 = layers.Dense(128, activation='relu')(merge_layer)
    dense_2 = layers.Dense(128, activation='relu')(dense_1)
    output_dense = layers.Dense(T, activation='linear')(dense_2)
    model = Model(input_layer, output_dense)
    return model

model = ae_net(40)
model.compile(loss='mean_absolute_error', optimizer='rmsprop')
model.fit(X, y, batch_size=8)
where X and y are numpy arrays of 8 sequences of 40 values each, so X.shape and y.shape are (8, 40, 1). It's actually one batch of data. The thing is, I cannot understand how the output could be of shape (None, 120, 40) and what these sizes mean.
As you noted, your shapes contain batch_size, length and channels: (8,40,1)
Your three convolutions are, each one, creating a tensor like (8,40,32).
Your concatenation in the axis=1 creates a tensor like (8,120,32), where 120 = 3*40.
Now, the dense layers only work on the last dimension (the channels in this case), leaving the length (now 120) untouched.
Solution
Now, it seems you want to keep the length dimension in the output, so you won't need any flatten or reshape layers. But you will need to keep the length at 40.
You're probably doing the concatenation in the wrong axis. Instead of the length axis (1), you should concatenate in the channels axis (2 or -1).
So, this should be your concatenate layer:
merge_layer = layers.Concatenate()([branch_a, branch_b, branch_c])
#or layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
This will output (8, 40, 96), and the dense layers will transform the 96 in something else.
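A small sketch to double-check the shape reasoning (dummy branches built the same way as in the question, with an assumed (40, 1) input):
from keras import layers, Input
from keras.models import Model

inp = Input(shape=(40, 1))
a = layers.Conv1D(32, 3, padding='same')(inp)
b = layers.Conv1D(32, 5, padding='same')(inp)
c = layers.Conv1D(32, 7, padding='same')(inp)

# concatenating on the length axis stacks the sequences: 3 * 40 = 120
print(Model(inp, layers.concatenate([a, b, c], axis=1)).output_shape)   # (None, 120, 32)
# concatenating on the channels axis keeps the length and stacks filters: 3 * 32 = 96
print(Model(inp, layers.concatenate([a, b, c], axis=-1)).output_shape)  # (None, 40, 96)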

Deep Net with keras for image segmentation

I am pretty new to deep learning; I want to train a network on image patches of size (256, 256, 3) to predict three labels for pixel-wise segmentation. As a start, I want to provide one convolutional layer:
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
The model output so far is an image with 32 channels. Now, I want to add a dense layer which merges all of these 32 channels into three channels, each one predicting the probability of a class for one pixel.
How can I do that?
The simplest method to merge your 32 channels back to 3 would be to add another convolution, this time with three filters (I arbitrarily set the filter sizes to be 1x1):
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
model.add(Convolution2D(3, 1, 1))
And then finally add an activation function for segmentation
model.add(Activation("tanh"))
Or you could add it all at once, if you want, with the activation parameter (arbitrarily chosen to be tanh):
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
model.add(Convolution2D(3, 1, 1, activation="tanh"))
https://keras.io/layers/convolutional/
You have to use flatten between the convolution layers and the dense layer:
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
# Do not forget to add an activation layer after your convolution layer, so here.
model.add(Flatten())
model.add(Dense(3))
model.add(Activation("sigmoid"))  # whatever activation you want.
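For comparison, a minimal sketch of the same idea in the current tf.keras API (channels-last, per-pixel softmax over the 3 classes); the layer sizes are illustrative assumptions, not a recommendation from the answers above:
import tensorflow as tf
from tensorflow.keras import layers, models

# fully-convolutional head: every pixel gets a 3-class probability distribution
model = models.Sequential([
    layers.Conv2D(32, (16, 16), padding="same", activation="relu",
                  input_shape=(256, 256, 3)),
    layers.Conv2D(3, (1, 1), padding="same"),  # merge 32 channels into 3
    layers.Activation("softmax"),              # softmax over the channel axis, i.e. per pixel
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()  # final output shape: (None, 256, 256, 3)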
