Size mismatch when fitting ResNet34 with 3x390x320 pixel images [duplicate] - python-3.x

I'm using Python 3, and when I apply a random-crop transform of size 224 I get a size mismatch error.
Here is my code.
What did I do wrong?

Your code makes variations on ResNet: you changed the number of channels, the number of bottlenecks at each "level", and you removed a "level" entirely. As a result, the feature map at the end of layer3 is not 64-dimensional: its spatial dimensions are larger than what nn.AvgPool2d(8) anticipates. The error message actually tells you this: the output of layer3 has shape 64x56x56, and after average pooling with kernel and stride 8 you get a 64x7x7 = 3136-dimensional feature vector, instead of the 64 you are expecting.
What can you do?
As opposed to the "standard" ResNet, you removed the stride from conv1 and you do not have a max pool after conv1. Moreover, you removed layer4, which also has a stride. Therefore, you can add pooling to your net to reduce the spatial dimensions of layer3.
Alternatively, you can replace nn.AvgPool2d(8) with nn.AdaptiveAvgPool2d([1, 1]), an average pool that outputs a single value per channel regardless of the spatial dimensions of the input feature map.
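A minimal sketch of the second option, assuming a head like the one implied by the error above (the layer sizes and names here are illustrative, not taken from your code):

import torch
import torch.nn as nn

# Hypothetical head: layer3 produces a 64-channel feature map whose spatial
# size depends on the input resolution (e.g. 64x56x56 for a 224 crop).
head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),  # always yields 64x1x1, whatever the spatial size
    nn.Flatten(),                  # -> (batch, 64)
    nn.Linear(64, 10),             # the 64-dim assumption now holds
)

features = torch.randn(2, 64, 56, 56)  # simulated output of layer3
print(head(features).shape)            # torch.Size([2, 10])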

Related

Calculating the kernel, stride and padding size in a Conv3D

I have a 5D input tensor and I also know what 5D tensor I need as the output. How can I find out whether I need padding? Also, how can I calculate the kernel size and stride? There are formulas for this, but the stride and kernel sizes each contain 3 elements, like (a, b, c), and with multiple elements per parameter, solving those equations becomes more complicated. How can I compute those elements?
For example, if I have a tensor of size [16, 1024, 16, 14, 14], how can I calculate the stride and kernel sizes for a Conv3d that produces an output of size [16, 32, 16, 3, 5]? And how do I know whether padding is needed?
Is there any website that automatically can calculate these parameters?
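No answer is attached to this question here, but the standard PyTorch Conv3d output-size relation for each spatial dimension (with dilation 1) is out = floor((in + 2*padding - kernel) / stride) + 1. Below is a sketch of one parameter combination, out of many possible ones, that maps the example shapes; verify with torch before relying on it:

import torch
import torch.nn as nn

# Per-dimension output size (dilation = 1):
#   out = (in + 2*pad - kernel) // stride + 1
# Depth:  (16 + 2*1 - 3) // 1 + 1 = 16
# Height: (14 + 2*0 - 5) // 4 + 1 = 3
# Width:  (14 + 2*0 - 2) // 3 + 1 = 5
conv = nn.Conv3d(in_channels=1024, out_channels=32,
                 kernel_size=(3, 5, 2), stride=(1, 4, 3), padding=(1, 0, 0))
x = torch.randn(16, 1024, 16, 14, 14)
print(conv(x).shape)  # torch.Size([16, 32, 16, 3, 5])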

Is it possible to add tensors of different sizes together in pytorch?

I have an image gradient of size (3, 224, 224) and a patch of size (1, 768). Is it possible to add this gradient to the patch and still get a patch of size (1, 768)?
Forgive my inquisitiveness. I know PyTorch also uses broadcasting, and I am not sure whether I will be able to do so with two tensors of different shapes in a way similar to the line below:
torch.add(a, b)
For example:
The end product would be the same patch on the left with the gradient of an entire image on the right added to it. My understanding is that it’s not possible, but knowledge isn’t bounded.
No. Whether two tensors are broadcastable is defined by the following rules:
Each tensor has at least one dimension.
When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
Because the second bullet doesn't hold in your example (i.e., 768 != 224, 1 not in {224, 768}), you can't broadcast the add. If you have some meaningful way to reshape your gradients, you might be able to.
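A quick sketch illustrating the rule, using the shapes from the question:

import torch

grad = torch.randn(3, 224, 224)   # image gradient
patch = torch.randn(1, 768)       # flattened patch

# Trailing dimensions are 224 vs 768: not equal and neither is 1, so broadcasting fails.
try:
    torch.add(grad, patch)
except RuntimeError as e:
    print("broadcast failed:", e)

# By contrast, a (1, 224) tensor does broadcast against (3, 224, 224):
# 224 == 224, then 224 vs 1, then 3 vs a missing dim -- all allowed.
print(torch.add(grad, torch.randn(1, 224)).shape)  # torch.Size([3, 224, 224])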
I figured out how to do it myself. I divided the image gradient (right) into 16 x 16 patches and created a loop that adds each patch to the original image patch (left). This way, I was able to add a 224 x 224 image gradient to a 16 x 16 patch. I just wanted to see what would happen if I did this.
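A sketch of that loop, assuming the (1, 768) patch is a flattened 3x16x16 patch (768 = 3*16*16) and that the 224x224 gradient is split into a 14x14 grid of such patches; these details are not confirmed by the question:

import torch

grad = torch.randn(3, 224, 224)   # image gradient
patch = torch.randn(1, 768)       # flattened 3x16x16 patch (assumption)

# Split the gradient into non-overlapping 16x16 tiles: (3, 14, 14, 16, 16)
tiles = grad.unfold(1, 16, 16).unfold(2, 16, 16)
# Rearrange to (14*14, 3*16*16) so each row is one flattened tile
tiles = tiles.permute(1, 2, 0, 3, 4).reshape(-1, 3 * 16 * 16)

result = patch.clone()
for tile in tiles:        # add every gradient tile to the single patch
    result = result + tile
print(result.shape)       # torch.Size([1, 768])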

Dimension of fully connected layers

I didn't understand the patch size and the input size of the fully connected layers. Why does the first fully connected layer have an input with 3 dimensions?
Thanks
Patch size for fully connected layers
It's simply the size of the weight matrix of each fully connected layer. For example, the first fully connected layer takes a 5x5x2048 = 51200-sized input and produces a 1024-sized output. Therefore it has a 51200 x 1024 weight matrix.
Input size
It's an image of size 224x224 with 3 channels. Three channels are simply the RGB channels.
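A minimal sketch of the first fully connected layer described above; the 5x5x2048 feature map and the 1024 outputs come from this answer, everything else is illustrative:

import torch
import torch.nn as nn

# The 3-dimensional "input" is the 5x5x2048 feature map; it is flattened
# to a 51200-dimensional vector before the matrix multiply.
fc1 = nn.Linear(5 * 5 * 2048, 1024)     # PyTorch stores the weight as (1024, 51200)

features = torch.randn(1, 2048, 5, 5)   # e.g. output of the last conv block
x = features.flatten(start_dim=1)       # -> (1, 51200)
print(fc1(x).shape)                     # torch.Size([1, 1024])
print(fc1.weight.shape)                 # torch.Size([1024, 51200])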

Using CNN with Dataset that has different depths between volumes

I am working with medical images, where I have 130 patient volumes, each consisting of N DICOM images/slices.
The problem is that the number of slices N varies between volumes.
The majority (about 50%) of the volumes have 20 slices; the rest vary by 3 or 4 slices, and some by more than 10 slices (so much so that interpolating to make the number of slices equal across volumes is not feasible).
I am able to use Conv3d for volumes where the depth N (the number of slices) is the same across volumes, but I have to make use of the entire dataset for the classification task. So how do I incorporate the entire dataset and feed it to my network model?
If I understand your question, you have 130 3-dimensional images that you need to feed into a 3D ConvNet. I'll assume that, if N were the same for all of your data, your batches would be tensors of shape (batch_size, channels, N, H, W), and that your problem is that N varies between data samples.
So there are two problems. First, there's the problem of your model needing to handle data with different values of N. Second, there's the more implementation-related problem of batching data of different lengths.
Both problems come up in video classification models. For the first, I don't think there's a way of getting around having to interpolate SOMEWHERE in your model (unless you're willing to pad/cut/sample) -- if you're doing any kind of classification task, you pretty much need a constant-sized layer at your classification head. However, the interpolation doesn't have to happen right at the beginning. For example, if for an input tensor of size (batch, 3, 20, 256, 256) your network conv-pools down to (batch, 1024, 4, 1, 1), then you can perform an adaptive pool (e.g. https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveAvgPool3d) right before the output to downsample anything larger to that size before prediction.
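A minimal sketch of that idea with made-up layer sizes; the only point is placing nn.AdaptiveAvgPool3d before the classifier so the depth N can vary:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool3d(2),
    nn.Conv3d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d((1, 1, 1)),  # -> (batch, 64, 1, 1, 1) for any N, H, W
    nn.Flatten(),
    nn.Linear(64, 2),                 # e.g. a binary classification head
)

for n in (17, 20, 24):                # different slice counts
    x = torch.randn(1, 1, n, 128, 128)
    print(n, model(x).shape)          # always torch.Size([1, 2])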
The other option is padding and/or truncating and/or resampling the images so that all of your data is the same length. For videos, sometimes people pad by looping the frames, or you could pad with zeros. What's valid depends on whether your length axis represents time, or something else.
For the second problem, batching: If you're familiar with pytorch's dataloader/dataset pipeline, you'll need to write a custom collate_fn which takes a list of outputs of your dataset object and stacks them together into a batch tensor. In this function, you can decide whether to pad or truncate or whatever, so that you end up with a tensor of the correct shape. Different batches can then have different values of N. A simple example of implementing this pipeline is here: https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/03-advanced/image_captioning/data_loader.py
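A hedged sketch of such a collate_fn, assuming each dataset item returns a (channels, N_i, H, W) tensor plus a label, and that zero-padding along the depth axis is acceptable for your data:

import torch
import torch.nn.functional as F

def pad_collate(batch):
    # batch is a list of (volume, label) pairs with volume shape (C, N_i, H, W)
    volumes, labels = zip(*batch)
    max_n = max(v.shape[1] for v in volumes)
    padded = []
    for v in volumes:
        pad_n = max_n - v.shape[1]
        # F.pad pads the last dims first: (W_left, W_right, H_top, H_bottom, N_front, N_back)
        padded.append(F.pad(v, (0, 0, 0, 0, 0, pad_n)))
    return torch.stack(padded), torch.tensor(labels)

# Usage, assuming `dataset` already exists:
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, collate_fn=pad_collate)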
Something else that might help with batching is putting your data into buckets depending on their N dimension. That way, you might be able to avoid lots of unnecessary padding.
You'll need to flatten the dataset. You can treat every individual slice as an input to the CNN. You can encode each variable as a boolean Yes/No flag if it is categorical, or, if it is numerical, set missing inputs to the equivalent of none (usually 0).

Plotting Hidden Weights

I've had an interest in neural networks for a while now and have just started following the deep learning tutorials. I have what I hope is a relatively straightforward question that I am hoping someone may answer.
In the multilayer perceptron tutorial, I am interested in seeing the state of the network at different layers (something similar to what is seen in this paper: http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247 ). For instance, I am able to write out the weights of the hidden layer using:
W_open = open('mlp_w_pickle.pkl', 'wb')  # binary mode, since protocol -1 is a binary pickle protocol
cPickle.dump(classifier.hiddenLayer.W.get_value(borrow=True), W_open, -1)
When I plot this using the utils.py tile plotting, I get the following pretty plot [edit: pretty plot removed as I don't have enough rep].
If I wanted to plot the weights at the logRegressionLayer, such that
cPickle.dump(classifier.logRegressionLayer.W.get_value(borrow=True), W_open, -1)
what would I actually have to do? The above doesn't seem to work - it returns a 2D array of shape (500, 10). I understand that the 500 relates to the number of hidden units. The paragraph on the Miscellaneous page:
Plotting the weights is a bit more tricky. We have n_hidden hidden units, each of them corresponding to a column of the weight matrix. A column has the same shape as the visible, where the weight corresponding to the connection with visible unit j is at position j. Therefore, if we reshape every such column, using numpy.reshape, we get a filter image that tells us how this hidden unit is influenced by the input image.
confuses me a little. I am unsure exactly how I would string it together.
Thanks to all - sorry if the question is confusing!
You could plot them just like the weights in the first layer, but they will not necessarily make much sense.
Consider the weights in the first layer of a neural network. If the inputs have size 784 (e.g. MNIST images) and there are 2000 hidden units in the first layer then the first layer weights are a matrix of size 784x2000 (or maybe the transpose depending on how it's implemented). Those weights can be plotted as either 784 patches of size 2000 or, more usually, 2000 patches of size 784. In this latter case each patch can be plotted as a 28x28 image which directly ties back to the original inputs and thus is interpretable.
For your higher-level regression layer, you could plot 10 tiles, each of size 500 (e.g. patches of size 22x23, with some padding to make them rectangular), or 500 patches of size 10. Either might reveal some of the patterns being found, but it may be difficult to tie those patterns back to the original inputs.
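A rough sketch of the first option, assuming W has shape (500, 10) as reported in the question and using plain numpy/matplotlib instead of the tutorial's tile_raster_images helper:

import numpy as np
import matplotlib.pyplot as plt

W = np.random.randn(500, 10)   # stand-in for classifier.logRegressionLayer.W

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for digit, ax in enumerate(axes.flat):
    col = W[:, digit]                        # weights from all 500 hidden units to one class
    col = np.pad(col, (0, 22 * 23 - 500))    # pad 500 -> 506 so it tiles as 22x23
    ax.imshow(col.reshape(22, 23), cmap='gray')
    ax.set_title(str(digit))
    ax.axis('off')
plt.show()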
