Ideal input size of CNN given training data

Should the input size of the CNN follow that of the training data? For example, if my training data is of size 192 x 98, what should the input size of my CNN be? 192 x 192? 98 x 98? Would it be a bad idea to use a 32 x 32 input CNN?
I have so many questions about the specifics of CNNs, but I can't find answers.

The input size of the CNN does not need to match the size of the training images.
What input size to choose depends on what application you are using the CNN for. For example, for classification, a 32 x 32 image might give good accuracy, but for something like segmentation it will most probably not give a good output. That said, using a higher-resolution input generally yields slightly higher accuracy; refer to this paper. So, if you can afford the extra processing time, go for the higher resolution.
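If you do settle on a fixed input size, one straightforward option is to resize the images when loading them. A minimal sketch, assuming PyTorch/torchvision; the 96 x 96 target is just an illustration, not a recommendation:

```python
from PIL import Image
import torchvision.transforms as transforms

# Hypothetical target size; 32 x 32, 96 x 96, etc. are all possible choices.
TARGET_SIZE = (96, 96)

preprocess = transforms.Compose([
    transforms.Resize(TARGET_SIZE),  # stretches to 96 x 96 (aspect ratio not preserved)
    transforms.ToTensor(),           # HWC uint8 -> CHW float in [0, 1]
])

img = Image.new("RGB", (192, 98))   # stand-in for one of your 192 x 98 training images
x = preprocess(img).unsqueeze(0)    # add a batch dimension -> (1, 3, 96, 96)
print(x.shape)
```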

Related

Diffusion with melspectrogram data

I am trying to put a dataset of mel-spectrogram tensors into a diffusion model; the shape of the tensors is (128, 646) (a 15-second audio file).
I want to run it through a diffusion model like the one in this notebook: (https://colab.research.google.com/drive/1sjy9odlSSy0RBVgMTgP7s99NXsqglsUL?usp=sharing)
This code is for images of size 64 x 64
My questions are as follows:
How do I adjust the model to accept these tensors instead of images?
Would it be a viable solution to pad the tensors so they look 'square'?
Do you have any other advice for diffusion on tensors?
Thank you.
I haven't tried anything yet... I am still researching how to do this.
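On the 'square' question, here is a minimal sketch of two options, assuming PyTorch: zero-padding the (128, 646) spectrogram into a square, or resampling it down to the 64 x 64 size the notebook's model expects. Whether either distortion is acceptable for your audio data is something you would have to test:

```python
import torch
import torch.nn.functional as F

# Hypothetical mel-spectrogram: shape (128, 646) -> (mel bins, time frames)
spec = torch.randn(128, 646)

# Treat the spectrogram as a 1-channel "image": (batch, channel, H, W)
x = spec.unsqueeze(0).unsqueeze(0)            # (1, 1, 128, 646)

# Option 1: zero-pad the frequency axis so the tensor becomes square (646 x 646).
pad_rows = spec.shape[1] - spec.shape[0]      # 646 - 128 = 518
square = F.pad(x, (0, 0, 0, pad_rows))        # pad = (left, right, top, bottom)
print(square.shape)                           # torch.Size([1, 1, 646, 646])

# Option 2: resample to the 64 x 64 resolution the notebook's model expects.
# This loses detail; whether that is acceptable depends on your data.
small = F.interpolate(x, size=(64, 64), mode="bilinear", align_corners=False)
print(small.shape)                            # torch.Size([1, 1, 64, 64])
```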

Multilabel text classification with BERT and highly imbalanced training data

I'm trying to train a multilabel text classification model using BERT. Each piece of text can belong to 0 or more of a total of 485 classes. My model consists of a dropout layer and a linear layer added on top of the pooled output from the bert-base-uncased model from Hugging Face. The loss function I'm using is the BCEWithLogitsLoss in PyTorch.
I have millions of labeled observations to train on. But the training data are highly unbalanced, with some labels appearing in less than 10 observations and others appearing in more than 100K observations! I'd like to get a "good" recall.
My first attempt at training without adjusting for data imbalance produced a micro recall rate of 70% (good enough) but a macro recall rate of 45% (not good enough). These numbers indicate that the model isn't performing well on underrepresented classes.
How can I effectively adjust for the data imbalance during training to improve the macro recall rate? I see we can provide label weights to BCEWithLogitsLoss loss function. But given the very high imbalance in my data leading to weights in the range of 1 to 1M, can I actually get the model to converge? My initial experiments show that a weighted loss function is going up and down during training.
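For reference, per-label weights are passed through the pos_weight argument of BCEWithLogitsLoss; one common way to keep extreme ratios from destabilizing training is to clamp (or log-scale) them. A minimal sketch; the cap of 50 and the random weights are placeholders, not values from your setup:

```python
import torch
import torch.nn as nn

NUM_LABELS = 485

# Hypothetical per-label positive weights, e.g. (num_negatives / num_positives)
# per label. With counts ranging from <10 to >100K these can span ~1 to ~1M.
raw_pos_weight = torch.rand(NUM_LABELS) * 1e6 + 1.0   # placeholder values

# Clamping (or log-scaling) the weights is one way to keep the loss from
# swinging wildly; the cap of 50 here is an arbitrary illustration.
pos_weight = raw_pos_weight.clamp(max=50.0)

loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, NUM_LABELS)                    # output of the linear head
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()
loss = loss_fn(logits, targets)
print(loss.item())
```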
Alternatively, is there a better approach than using BERT + dropout + linear layer for this type of task?
In your case it might be helpful to balance the labels in the training data. You have a lot of data, so you can afford to lose a part of it by balancing. But before you do this, I recommend reading this answer about balancing classes in training data.
If you really only care about recall, you could try to tune your model to maximize recall.
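If tuning for recall comes down to choosing decision thresholds, one option is to pick a per-label threshold on a validation set instead of the default 0.5. A minimal sketch with scikit-learn; the precision floor of 0.2 is an arbitrary assumption:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_true, y_score, min_precision=0.2):
    """Pick the threshold that maximizes recall subject to a precision floor.

    y_true:  binary ground truth for one label, shape (n_samples,)
    y_score: predicted probabilities for that label, shape (n_samples,)
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the last point.
    ok = precision[:-1] >= min_precision
    if not ok.any():
        return 0.5                      # fall back to the default threshold
    best = np.argmax(recall[:-1] * ok)  # highest recall among acceptable points
    return thresholds[best]

# Hypothetical usage for one of the 485 labels:
y_true = np.random.randint(0, 2, size=1000)
y_score = np.random.rand(1000)
print(pick_threshold(y_true, y_score))
```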

What should the input image size be for training a YOLOv3 model architecture CNN?

I've implemented a YOLOv3 from scratch and I plan to fine-tune using MS-COCO weights for some different data.
The dataset I've chosen has images of size 720 x 1280.
When I go through the YOLOv3 paper, the first Conv2D layer has filter_size = 3 and stride = 1, and the output size is 256 x 256...
Can someone walk me through how the YOLO training works here?
From the Yolov3 paper:
If best possible accuracy/mAP is what you want then use 608 x 608 as input layer size in the config.
If you want good inference speed at the cost of accuracy, then use 320 x 320.
If a balanced model is what you want, then use 416 x 416.
Note that the first layer automatically resizes your images to the size of the first layer in the Yolov3 CNN, so you do not need to convert your 1280 x 720 images to the input layer size (see the sketch after the reading list below).
I suggest you read the following:
To understand how Yolov3 works, read this blog post.
To understand some basic stuff read from original site
Learn how to train your custom object detector here
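To illustrate the resizing note above: YOLO-style pipelines typically letterbox the image, i.e. scale it to fit the network size while preserving the aspect ratio and pad the remainder. A minimal sketch with OpenCV, assuming a 416 x 416 input size; whether your framework letterboxes or simply stretches depends on its configuration:

```python
import cv2
import numpy as np

def letterbox(image, target=416, pad_value=128):
    """Resize an image to target x target, preserving aspect ratio via padding."""
    h, w = image.shape[:2]                       # e.g. 720 x 1280
    scale = target / max(h, w)                   # 416 / 1280 = 0.325
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(image, (new_w, new_h))
    canvas = np.full((target, target, 3), pad_value, dtype=image.dtype)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

img = np.zeros((720, 1280, 3), dtype=np.uint8)   # placeholder for a dataset image
print(letterbox(img).shape)                      # (416, 416, 3)
```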

Overfitting problem in a convolutional neural network, and deciding the parameters of the convolution and dense layers

I applied the batch normalization technique to increase the accuracy of my CNN model. The accuracy of the model without batch normalization was only 46%, but after applying batch normalization it crossed 83%. However, a big overfitting problem arose: the model was giving a validation accuracy of only 15%. Also, please tell me how to decide the number of filters and strides in a convolution layer and the number of units in a dense layer.
Batch normalization has been shown to help in many cases but is not always optimal. I found that it depends on where it resides in your model architecture and what you are trying to achieve. I have done a lot with different GAN CNNs and found that often BN is not needed and can even degrade performance. Its purpose is to help the model generalize faster, but sometimes it increases training times. If I am trying to replicate images, I skip BN entirely.
I don't understand what you mean with regards to the accuracy. Do you mean it achieved 83% accuracy on the training data but dropped to 15% accuracy on the validation data? What was the validation accuracy without the BN? In general, the validation accuracy is the more important metric. If you have a high training accuracy and a low validation accuracy, you are indeed overfitting.
If you have several convolution layers, you may want to apply BN after each. If you still overfit, try increasing your strides and kernel size. If that doesn't work, you might need to look at the data again and make sure you have enough and that it is somewhat diverse. Assuming you are working with image data, are you creating samples where you rotate your images, crop them, etc.? Consider synthetic data to augment your real data to help combat overfitting.
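On the augmentation point, a minimal sketch of an image-augmentation pipeline with torchvision; the specific transforms and ranges are only illustrative and should match the variation that is realistic for your images:

```python
from torchvision import transforms

# Illustrative augmentation pipeline; the exact transforms and ranges should be
# chosen to match what variations are realistic for your data.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=64, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# Typically passed as the `transform` argument of a Dataset, e.g.
# torchvision.datasets.ImageFolder("path/to/train", transform=augment)
```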

Convolutional Neural Network - Visualizing weights

Main Problem
I cannot understand the plot of the weights of a specific layer.
I used a method from nolearn: plot_conv_weights(layer, figsize=(6, 6))
I'm using Lasagne as my neural-network library.
The plot comes out fine, but I don't know how I should interpret it.
Neural Network Structure
The structure I'm using:
InputLayer 1x31x31
Conv2DLayer 20x3x3
Conv2DLayer 20x3x3
Conv2DLayer 20x3x3
MaxPool2DLayer 2x2
Conv2DLayer 40x3x3
Conv2DLayer 40x3x3
Conv2DLayer 40x3x3
MaxPool2DLayer 40x2x2
DropoutLayer
DenseLayer 96
DropoutLayer 96
DenseLayer 32
DropoutLayer 32
DenseLayer 1 as sigmoid
Here are the weights of the first 3 layers:
**About the images**
To me, they look random and I cannot interpret them!
However, CS231n says the following:
Conv/FC Filters. The second common strategy is to visualize the weights. These are usually most interpretable on the first CONV layer which is looking directly at the raw pixel data, but it is possible to also show the filter weights deeper in the network. The weights are useful to visualize because well-trained networks usually display nice and smooth filters without any noisy patterns. Noisy patterns can be an indicator of a network that hasn't been trained for long enough, or possibly a very low regularization strength that may have led to overfitting.
http://cs231n.github.io/understanding-cnn/
Then why are mine random?
The structure is trained and performs well for its task.
References
http://cs231n.github.io/understanding-cnn/
https://github.com/dnouri/nolearn/blob/master/nolearn/lasagne/visualize.py
Normally when you visualize the weights you want to check 2 things:
That they are smooth and cover a wide range of values, i.e. not a bunch of 1's and 0's, which would mean the non-linearity is being saturated.
That they have some kind of structure. Normally you tend to see oriented edges although this is more difficult to see when you have small filters like 3x3.
That being said, your weights do not appear to be saturated, but they indeed seem to be too random.
During training, did the network converge correctly?
I am also surprised at how big your filters are (30x30). Not sure what you are trying to accomplish with that.
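For completeness, here is a generic way to plot first-layer filters from just the weight array, independent of nolearn. A minimal sketch with matplotlib; the (20, 1, 3, 3) shape mirrors the first Conv2DLayer above, and the random weights stand in for the trained ones:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for trained first-layer weights: (num_filters, channels, h, w).
weights = np.random.randn(20, 1, 3, 3)

fig, axes = plt.subplots(4, 5, figsize=(6, 6))
for ax, w in zip(axes.flat, weights):
    ax.imshow(w[0], cmap="gray", interpolation="nearest")  # first input channel
    ax.axis("off")
plt.tight_layout()
plt.show()
```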