Let's say we are just testing a simple single convolution layer with some kind of pooling. Once we perform that operation, does it skew the image itself?
I work on a multitask learning convolutional neural network that performs semantic segmentation and pixel-wise depth estimation. After the decoder, the feature-map resolution is (240, 240), which I want to upsample to the input resolution of (480, 480). Can anyone help me understand which of the following placements of the upsampling layer could result in better performance? Does the placement of the upsampling layer have any significant impact on the result? If so, could you please elaborate?
1. Apply upsampling before the final output layer and use a strided or padded convolution layer to maintain the input resolution of (480, 480).
2. Apply upsampling after the final output layer.
I have trained a network in which I used an upsampling layer before the final prediction and then padding in the convolution layer to maintain the desired resolution. I read that upsampling before the final prediction lets the network learn more spatial information and make finer predictions, since it deals with higher-resolution feature maps, but it increases the computational burden.
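For concreteness, here is a minimal PyTorch sketch of the two placements. The 64-channel decoder output and the single-channel depth head are placeholder assumptions, not your actual architecture:

```python
import torch
import torch.nn as nn

# Hypothetical decoder output: a batch of 64-channel feature maps at 240x240.
decoder_out = torch.randn(1, 64, 240, 240)

upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

# Option 1: upsample BEFORE the final prediction layer, then predict at full
# resolution with a padded 3x3 convolution (padding=1 keeps 480x480).
head_before = nn.Conv2d(64, 1, kernel_size=3, padding=1)  # e.g. 1-channel depth map
out_1 = head_before(upsample(decoder_out))                # -> (1, 1, 480, 480)

# Option 2: predict at 240x240 first, then upsample the low-resolution output.
head_after = nn.Conv2d(64, 1, kernel_size=3, padding=1)
out_2 = upsample(head_after(decoder_out))                 # -> (1, 1, 480, 480)

print(out_1.shape, out_2.shape)
```

Option 1 runs the final convolution on 480x480 feature maps, which is where the extra spatial detail (and the extra compute) comes from; Option 2 does all learned computation at 240x240 and only interpolates the prediction afterwards.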
I want to ask what the difference is between Patch Merging in the Swin Transformer and a pooling layer (e.g. max pooling) in CNNs. Why do they use Patch Merging instead of a pooling layer?
I understand that Patch Merging reduces the spatial dimension by half and increases the channel dimension, so there is no information loss when using Patch Merging, while a pooling layer causes a loss of information from the input feature.
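For illustration, here is a simplified sketch of the two operations (the official Swin implementation also applies a LayerNorm before the linear reduction, which is omitted here; the channel count of 96 is just an example):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 8, 96)  # (B, H, W, C) patch features, C=96 as an example

# Patch merging (Swin-style): group each 2x2 neighborhood, concatenate along the
# channel axis (C -> 4C), then project down to 2C with a learned linear layer.
# All four values survive the concatenation; the reduction is learned, not hard-coded.
x0 = x[:, 0::2, 0::2, :]
x1 = x[:, 1::2, 0::2, :]
x2 = x[:, 0::2, 1::2, :]
x3 = x[:, 1::2, 1::2, :]
merged = torch.cat([x0, x1, x2, x3], dim=-1)       # (1, 4, 4, 384)
reduction = nn.Linear(4 * 96, 2 * 96, bias=False)
merged = reduction(merged)                         # (1, 4, 4, 192)

# Max pooling: keeps only the maximum of each 2x2 window and discards the rest;
# the channel count stays at C.
pool = nn.MaxPool2d(kernel_size=2)
pooled = pool(x.permute(0, 3, 1, 2))               # (1, 96, 4, 4)

print(merged.shape, pooled.shape)
```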
I applied batch normalization to increase the accuracy of my CNN model. The accuracy of the model without batch normalization was only 46%, but after applying batch normalization it crossed 83%. However, a big overfitting problem arose: the model's validation accuracy was only 15%. Also, please tell me how to decide the number of filters and strides in the convolution layers and the number of units in the dense layers.
Batch normalization has been shown to help in many cases but is not always optimal. I found that it depends where it resides in your model architecture and what you are trying to achieve. I have done a lot with different GAN CNNs and found that often BN is not needed and can even degrade performance. Its purpose is to help the model generalize faster, but sometimes it increases training times. If I am trying to replicate images, I skip BN entirely.

I don't understand what you mean with regards to the accuracy. Do you mean it achieved 83% accuracy on the training data but dropped to 15% accuracy on the validation data? What was the validation accuracy without the BN? In general, the validation accuracy is the more important metric. If you have a high training accuracy and a low validation accuracy, you are indeed overfitting.

If you have several convolution layers, you may want to apply BN after each. If you still overfit, try increasing your strides and kernel size. If that doesn't work, you might need to look at the data again and make sure you have enough and that it is somewhat diverse. Assuming you are working with image data, are you creating samples where you rotate your images, crop them, etc.? Consider synthetic data to augment your real data to help combat overfitting.
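As a rough sketch of both suggestions, here is a PyTorch-style Conv -> BatchNorm -> ReLU stack plus a basic augmentation pipeline; the channel counts, strides, and augmentation parameters are placeholders to tune for your own data:

```python
import torch.nn as nn
from torchvision import transforms

# Conv -> BatchNorm -> ReLU blocks, with BN applied after each convolution.
# Channel counts, strides, and kernel sizes are placeholders to experiment with.
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # larger stride downsamples
    nn.BatchNorm2d(64),
    nn.ReLU(),
)

# Simple augmentation pipeline (rotation, random crop, flip) to diversify training data.
train_transform = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```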
I am new to ML and PyTorch, and I have the following problem:
I am looking for a fully convolutional network architecture in PyTorch, so that the input would be an RGB image (HxWxC, or 480x640x3) and the output would be a single-channel image (HxW, or 480x640). In other words, I am looking for a network that preserves the resolution of the input (HxW) and loses the channel dimension. All of the networks I've come across (ResNet, DenseNet, ...) end with a fully connected layer (without any upsampling or deconvolution). This is problematic for two reasons:
1. I am restricted in the choice of the input size (HxWxC).
2. It has nothing to do with the output that I expect to get (a single-channel image, HxW).
What am I missing? Why is there even an FC layer? Why are there no upsampling or deconvolution layers after feature extraction? Is there any built-in torchvision.model that might suit my requirements? Where can I find such a PyTorch architecture? As I said, I am new to this field, so I don't really like the idea of building such a network from scratch.
Thanks.
You probably came across networks that are used for classification, so they end with a pooling layer and a fully connected layer to produce a fixed number of categorical outputs.
Have a look at U-Net:
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
Note: the original U-Net implementation uses a lot of tricks.
You can simply downsample and then upsample symmetrically to do the work.
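As a rough illustration of that idea (not the actual U-Net, no skip connections, placeholder channel sizes), a symmetric strided-conv / transposed-conv network already satisfies the stated requirements: it accepts any H and W divisible by 4 and outputs a single-channel map at the input resolution:

```python
import torch
import torch.nn as nn

# A minimal fully convolutional sketch: downsample with strided convolutions,
# then upsample back with transposed convolutions, ending in a single channel.
class TinyFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # H/2 x W/2
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # H/4 x W/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),    # H/2 x W/2
            nn.ConvTranspose2d(32, 1, 2, stride=2),                # H x W, 1 channel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).squeeze(1)  # (B, H, W)

out = TinyFCN()(torch.randn(1, 3, 480, 640))
print(out.shape)  # torch.Size([1, 480, 640])
```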
Your kind of task belongs to dense prediction tasks, e.g. segmentation. For those tasks we use fully convolutional nets (see here for the original paper). In FCNs you don't have any fully connected layers, because applying fully connected layers discards the spatial information that you need for dense prediction. Also have a look at the U-Net paper. All state-of-the-art architectures use some kind of encoder-decoder architecture, extended for example with a pyramid pooling module.
There are some implementations in the PyTorch model zoo here. Also search GitHub for PyTorch implementations of other networks.
I am building a chatbot model in Keras, and I am planning on using it on a Raspberry Pi. I have a huge dataset of shape (1000000, 15, 100), which means there are 1 million samples with a maximum of 15 words, and the embedding dimension is 100 using GloVe. I built a simple model consisting of 1 embedding layer, 1 bidirectional LSTM layer, 1 dropout layer, and 2 dense layers with an output shape of (25,).
I know that because of the huge dataset the training process is going to take long, but is the size of the dataset going to affect the speed of model.predict, or is the speed only influenced by the structure of the model and the shape of the input?
No, the size of the dataset does not affect the prediction speed of the model per se; as you say, the prediction computation time is only affected by the architecture of the model and the dimensionality of the inputs.
In general, the problem with making small models that are fast on embedded hardware is that a small model (with fewer parameters) might not perform as well as a more complex model (in terms of accuracy or error), so you have to trade off model complexity against computational performance.
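To see this empirically, here is a rough Keras sketch of the kind of model described (the vocabulary size, LSTM width, and dropout rate are assumptions) that times a single model.predict call; the measured latency is independent of how many samples were used for training:

```python
import time
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # assumption; use your own vocabulary size

# Roughly the architecture described in the question (layer sizes are placeholders).
model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 100),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dense(25, activation="softmax"),
])

# Latency of one prediction depends only on the model and the input shape,
# not on the size of the training set.
sample = np.random.randint(0, VOCAB_SIZE, size=(1, 15))
model.predict(sample)  # warm-up call builds the model and compiles the graph
start = time.perf_counter()
model.predict(sample)
print(f"one prediction took {time.perf_counter() - start:.4f} s")
```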