I started learning CNN implementation in PyTorch and tried to build CNNs to process grayscale images with 4 classes (0 to 3). At the beginning I got accuracy around 0.55. The maximum accuracy I reached is about 0.683 (68.3%).
I tried the SGD and Adam optimizers with different values of lr and batch_size, but the accuracy is still low.
I used data augmentation to create more samples, around 4k in total.
I cannot improve accuracy further and wondered if I could get some advice about what I need to change in the CNN structure to increase accuracy.
The loss starts around 1.497, decreases to roughly 0.001, and then fluctuates up and down around that value.
I spent time reading about similar problems but without luck.
I am using nn.CrossEntropyLoss() as my loss_fn, and I don't apply a softmax to the final dense layer.
This is the Summary of the CNN model:
-------------------------------------------------------------
Layer (type) Output Shape Param #
=============================================================
Conv2d-1 [-1, 32, 128, 128] 320
ReLU-2 [-1, 32, 128, 128] 0
BatchNorm2d-3 [-1, 32, 128, 128] 64
MaxPool2d-4 [-1, 32, 64, 64] 0
Conv2d-5 [-1, 64, 64, 64] 18,496
ReLU-6 [-1, 64, 64, 64] 0
BatchNorm2d-7 [-1, 64, 64, 64] 128
MaxPool2d-8 [-1, 64, 32, 32] 0
Conv2d-9 [-1, 128, 32, 32] 73,856
ReLU-10 [-1, 128, 32, 32] 0
BatchNorm2d-11 [-1, 128, 32, 32] 256
MaxPool2d-12 [-1, 128, 16, 16] 0
Flatten-13 [-1, 32768] 0
Linear-14 [-1, 512] 16,777,728
ReLU-15 [-1, 512] 0
Dropout-16 [-1, 512] 0
Linear-17 [-1, 4] 2,052
============================================================
I would appreciate the help.
How many images are in the train set? The test set? What is the size of the images? How difficult would you consider the classification of these images to be? Do you think it should be simple or hard?
According to the numbers you give, you're overfitting: your loss is near 0 (meaning almost nothing will backpropagate to the weights, i.e. your model won't change much anymore), while your 68.3% accuracy is, I assume, measured on the test set. So you don't have any problem training the network, which is a good point.
Then you can search for ways of countering overfitting online; here are some classical possibilities:
- raise the dropout parameter
- add a regularizer (L1 or L2) to constrain the learning, for example via the optimizer's weight_decay (a short sketch follows below)
- early stopping using a validation set
- use a classical and/or lighter convolutional network (ResNet, Inception) with or without pretrained weights; this latter option also depends on your image type (natural, biomedical, ...)
- ... and many more, more or less difficult to implement
Also, technically you are already using a softmax layer, since it is included in PyTorch's CrossEntropyLoss.
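For the first two points, a minimal sketch (the model below is only a stand-in, not your architecture, and the values of p, lr and weight_decay are purely illustrative):

import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(            # stand-in for your CNN's classifier head
    nn.Flatten(),
    nn.Linear(128 * 128, 512),
    nn.ReLU(),
    nn.Dropout(p=0.6),            # raised dropout probability
    nn.Linear(512, 4),
)
# weight_decay adds an L2 penalty on the weights
optimizer = optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()   # applies softmax internally, so feed it raw logits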
Related
Using a weighted nn.CrossEntropyLoss() for a semantic segmentation task (with 3 "classes"):
model = UneXt50().cuda()
loss = nn.CrossEntropyLoss(class_weights)
learn = Learner(data, model,
                loss_func=loss,
                opt_func=ranger,
                splitter=split_layers).to_fp16()
learn.fit_flat_cos(5, 6e-4)
I get this error:
RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [32, 1, 256, 256]
Images and masks have the following shapes [C, W, H]:
torch.Size([3, 256, 256])
torch.Size([1, 256, 256])
Why do I get this error?
What should the correct target shape be?
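For reference, nn.CrossEntropyLoss expects spatial targets as class-index maps of shape (N, H, W) with dtype long, i.e. without the singleton channel dimension. A minimal sketch with dummy tensors (not the original data):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(32, 3, 256, 256)            # model output: (N, C=3, H, W)
target = torch.randint(0, 3, (32, 1, 256, 256))  # mask as loaded: (N, 1, H, W)

# squeeze out the channel dimension so the target becomes (N, H, W)
loss = loss_fn(logits, target.squeeze(1).long())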
I am using transformers' TFBertForSequenceClassification.from_pretrained (with 'bert-base-multilingual-uncased') and Keras to build my model.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# metric
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
# optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=epsilon)
# create and compile the Keras model in the context of strategy.scope
model = TFBertForSequenceClassification.from_pretrained(pretrained_weights,
                                                        num_labels=num_labels,
                                                        cache_dir=pretrained_model_dir)
model._name = 'tf_bert_classification'
# compile Keras model
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=[metric])
I am using SST-2 data, which are tokenized and then fed to the model for training. The data have the following shape:
shape: (32,)
dict structure
dim: 3
[input_ids / attention_mask / token_type_ids ]
[(32, 128) / (32, 128) / (32, 128) ]
[ndarray / ndarray / ndarray ]
and here an example:
({'input_ids': <tf.Tensor: shape=(32, 128), dtype=int32, numpy=
array([[ 101, 21270, 94696, ..., 0, 0, 0],
[ 101, 143, 45100, ..., 0, 0, 0],
[ 101, 24220, 102, ..., 0, 0, 0],
...,
[ 101, 11008, 10346, ..., 0, 0, 0],
[ 101, 43062, 15648, ..., 0, 0, 0],
[ 101, 13178, 18418, ..., 0, 0, 0]], dtype=int32)>, 'attention_mask': ....
As we can see, we have input_ids with shape (32, 128), where 32 is the batch size and 128 is the maximum length of the sequence (the max for BERT is 512). We also have attention_mask and token_type_ids with the same structure.
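For context, a hypothetical sketch (not the original preprocessing code) of how inputs with these shapes are typically produced with the transformers tokenizer; the example sentences and max_length=128 are assumptions chosen to match the shapes above:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-uncased')
enc = tokenizer.batch_encode_plus(
    ['a first sentence', 'a second sentence'],
    max_length=128,
    pad_to_max_length=True,   # pad every example to 128 tokens
    return_tensors='tf',
)
print(enc['input_ids'].shape)       # (2, 128)
print(enc['attention_mask'].shape)  # (2, 128)
print(enc['token_type_ids'].shape)  # (2, 128)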
I am able to train the model and to evaluate it using model.evaluate(test_dataset). All good.
The issue I am having is that when I serve the model on GCP, it requires data in a different input shape and structure! I see the same thing if I run the CLI on the saved model:
saved_model_cli show --dir $MODEL_LOCAL --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 5)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 2)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
As we can see, we only need to give input_ids and not attention_mask and token_type_ids, and the shape is different. While the batch size is not defined (-1), which is expected, the maximum length is 5 instead of 128! It was working 2 months ago, so I probably introduced something that created this issue.
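You can also inspect the exported signature from Python rather than via saved_model_cli; a small sketch (the 'saved_model_dir' path is a placeholder):

import tensorflow as tf

loaded = tf.saved_model.load('saved_model_dir')
infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)  # expected inputs, here only input_ids with shape (None, 5)
print(infer.structured_outputs)          # outputs, here output_1 with shape (None, 2)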
I tried a few versions of TensorFlow (2.2.0 and 2.3.0) and transformers (2.8.0, 2.9.0 and 3.0.2). I cannot see the Keras model's input and output shapes (they are None):
model.inputs
model.outputs
model.summary()
Model: "tf_bert_classification"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bert (TFBertMainLayer) multiple 167356416
_________________________________________________________________
dropout_37 (Dropout) multiple 0
_________________________________________________________________
classifier (Dense) multiple 1538
=================================================================
Total params: 167,357,954
Trainable params: 167,357,954
Non-trainable params: 0
Any idea what could explain why the saved model requires a different input than the one used for training? I could use the Keras functional API and define the input shape, but I am pretty sure this code was working before.
I have seen such behavior when the model was instantiated from a pretrained one, then weights were loaded, and only then it was saved in the fully-fledged Keras format.
When I loaded the latter afterwards, it was not able to issue correct predictions because its signatures had become garbage: attention_mask disappeared, seq_length changed, and dummy None inputs appeared out of nowhere. So try to save your model in Keras format right after fitting (without the intermediate loading from weights), if that is your case.
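As a complementary option (not part of the original answer), you can also pin the serving signature explicitly when exporting, so the SavedModel keeps all three inputs with the training sequence length. A hedged sketch, assuming model is the fitted TFBertForSequenceClassification, 128 was the maximum length used in training, and 'exported_model' is a placeholder directory:

import tensorflow as tf

MAX_LEN = 128  # assumed to match the training max length

@tf.function(input_signature=[{
    'input_ids': tf.TensorSpec((None, MAX_LEN), tf.int32, name='input_ids'),
    'attention_mask': tf.TensorSpec((None, MAX_LEN), tf.int32, name='attention_mask'),
    'token_type_ids': tf.TensorSpec((None, MAX_LEN), tf.int32, name='token_type_ids'),
}])
def serving_fn(inputs):
    # the model returns a tuple/ModelOutput; element 0 holds the logits
    return {'logits': model(inputs)[0]}

tf.saved_model.save(model, 'exported_model', signatures={'serving_default': serving_fn})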
I was trying to port a CRNN model to Keras.
But I got stuck while connecting the output of the Conv2D layer to the LSTM layer.
The output from the CNN layers has a shape of (batch_size, 512, 1, width_dash), where the first dimension depends on the batch size and the last one on the input width (this model can accept variable-width input).
For example, an input with shape [2, 1, 32, 829] results in an output with shape (2, 512, 1, 208).
Now, as per the PyTorch model, we have to do squeeze(2) followed by permute(2, 0, 1);
this results in a tensor with shape [208, 2, 512].
I was trying to implement this in Keras, but I was not able to because, in Keras, we cannot alter the batch_size dimension in a keras.models.Sequential model.
Can someone please guide me on how to port the above part of this model to Keras?
Current state of the ported CNN layer
You don't need to permute the batch axis in Keras. In a PyTorch model you need to do it because a PyTorch LSTM expects an input of shape (seq_len, batch, input_size). In Keras, however, the LSTM layer expects (batch, seq_len, input_size).
So after defining the CNN and squeezing out axis 2, you just need to permute the last two axes. As a simple example (in the 'channels_first' Keras image format):
from keras.models import Sequential
from keras.layers import Conv2D, Reshape, Permute, LSTM

# assumes the Keras image_data_format is set to 'channels_first'
model = Sequential()
model.add(Conv2D(512, 3, strides=(32, 4), padding='same', input_shape=(1, 32, None)))
model.add(Reshape((512, -1)))
model.add(Permute((2, 1)))
model.add(LSTM(32))
You can verify the shapes with model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 512, 1, None) 5120
_________________________________________________________________
reshape_3 (Reshape) (None, 512, None) 0
_________________________________________________________________
permute_4 (Permute) (None, None, 512) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 32) 69760
=================================================================
Total params: 74,880
Trainable params: 74,880
Non-trainable params: 0
_________________________________________________________________
I am not able to figure out how the tensor dimension got reduced by 1 in the TimeDistributed Dense step in the following:
model = Sequential()
model.add(Embedding(vocab_size +1, 128, input_length=unravel_len)) # embedding shape: (99, 15, 128)
model.add(Bidirectional(LSTM(64, return_sequences=True))) # (99, 15, 128)
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(categories, activation='softmax'))) # (99, 15, 127)
I labeled the tensor shape along each step. You can see the last dimension dropped from 128 to 127. Can someone explain why that is? Thanks.
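For reference, the last axis after TimeDistributed(Dense(units)) is always units, regardless of the LSTM width, so a 127 there simply reflects the value of categories. A minimal sketch with assumed values (vocab size 1000, sequence length 15, categories = 127, chosen to match the reported shapes):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

categories = 127
model = Sequential([
    Embedding(input_dim=1000, output_dim=128, input_length=15),  # (batch, 15, 128)
    Bidirectional(LSTM(64, return_sequences=True)),              # (batch, 15, 128) = 2 * 64
    TimeDistributed(Dense(categories, activation='softmax')),    # (batch, 15, 127)
])
model.summary()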
I am pretty new to deep learning; I want to train a network on image patches of size (256, 256, 3) to predict three labels for pixel-wise segmentation. As a start I want to use a single convolutional layer:
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
The model output so far is an image with 32 channels. Now, I want to add a dense layer that merges all of these 32 channels into three channels, each one predicting the probability of a class for each pixel.
How can I do that?
The simplest method to merge your 32 channels back to 3 would be to add another convolution, this time with three filters (I arbitrarily set the filter size to 1x1):
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
model.add(Convolution2D(3, 1, 1))
And then finally add an activation function for segmentation:
model.add(Activation("tanh"))
Or you could add it all at once, if you want, with the activation parameter (arbitrarily chosen to be tanh):
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
model.add(Convolution2D(3, 1, 1, activation="tanh"))
https://keras.io/layers/convolutional/
You have to use a Flatten layer between the convolution layers and the dense layer:
model = Sequential()
model.add(Convolution2D(32, 16, 16, input_shape=(3, 256, 256)))
# Do not forget to add an activation layer after your convolution layer, so here.
model.add(Flatten())
model.add(Dense(3))
model.add(Activation("sigmoid"))  # whatever activation you want