Conditional Torchvision transforms - conv-neural-network

I have a dataset of 1000 images and corresponding segmentation masks from dermatologists. The images come in different sizes (as small as 400x600 and as large as 4Kx4K). 95% of image pixels are not targets; 5% are labeled as targets falling into one of four categories (conditions A, B, C, D). The HarDNet CNN uses a 352x352 input layer.
Data augmentation is done with torchvision.transforms
import torchvision.transforms as transforms
# ...
transforms.Compose([
    transforms.RandomRotation(90, resample=False, expand=False, center=None),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomPerspective(),
    transforms.RandomAffine(10, translate=None, scale=None, shear=5, resample=False),
    transforms.RandomApply([transforms.CenterCrop((self.trainsize, self.trainsize))], p=0.5),  # this
    # transforms.RandomApply([transforms.RandomCrop((self.trainsize, self.trainsize))], p=0.5),  # or this
    transforms.Resize((self.trainsize, self.trainsize)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
Images were either supplied as-is or cropped into 704x704 sections. The first approach loses most of the information when CenterCrop is applied: only 352x352 of a 4Kx4K image is used, and roughly 99% of the image outside the central region is removed by cropping.
When CenterCrop is not applied, the texture information is lost due to the loss of resolution. When I crop images into 704x704 sections I have the problem shown below:
The class depends on the color of the central circle: A if red and B if blue. An image is cropped into four images (1, 2, 3, 4). There is no way for a model (or a human) to tell whether the upper left crop contains class A or class B, because the critical information (the color of the central circle) is not visible. The same is true for the lower left crop.
Imbalanced data is another problem.
I've tried several ways to prepare data:
Remove all images without target pixels and split each image into 704x704 segments. Problem: images tend to contain only class A or only class B. A model trained on class A never sees class B. Because class B looks more like class A than like healthy skin, the model labels class B as class A.
Split all images and feed them to the model for training. Problem: the dataset is highly imbalanced; only about one percent of image pixels belong to a given class.
For each image, make one hundred random 704x704 patches. Calculate the percentage of labeled pixels in each patch and include the patch in the set with probability proportional to that percentage. Besides, either the dataset becomes huge, or most of the unlabeled areas are excluded from the dataset. This approach oversamples large labeled areas and undersamples small target areas, see below.
What is the right way to consume images and masks from a folder and randomly crop them, so that a segment with a large labeled area is fed to the model with high probability and a segment without labeled pixels is fed with low probability?
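One way to get this behaviour is a mask-aware random-crop Dataset that accepts or rejects each crop based on how many labeled pixels it contains. The sketch below is only an illustration under assumptions, not the original pipeline: it treats the mask as binary (target vs. background), the names WeightedCropDataset, base_p and tries are hypothetical, and the 0.05 scaling reflects the dataset's roughly 5% target fraction.

import random
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

class WeightedCropDataset(Dataset):
    def __init__(self, image_paths, mask_paths, size=352, base_p=0.05, tries=10):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.size = size
        self.base_p = base_p   # acceptance floor for crops with no target pixels
        self.tries = tries     # fall back to the last crop after this many rejections

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        mask = Image.open(self.mask_paths[idx])
        for _ in range(self.tries):
            top = random.randint(0, image.height - self.size)
            left = random.randint(0, image.width - self.size)
            mask_crop = TF.to_tensor(TF.crop(mask, top, left, self.size, self.size))
            frac = (mask_crop > 0).float().mean().item()   # fraction of labeled pixels
            # Accept with probability proportional to the labeled fraction,
            # but never below base_p, so background-only crops still appear.
            if random.random() < max(self.base_p, frac / 0.05):
                break
        img_crop = TF.to_tensor(TF.crop(image, top, left, self.size, self.size))
        img_crop = TF.normalize(img_crop, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        return img_crop, mask_crop

The flips, rotations and affine transforms from the pipeline above can then be applied to the image crop and the mask crop together through the functional API, so that both stay aligned.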

Related

Pytorch dataset when the size of the dataset is unknown

I need a custom PyTorch Dataset that generates training images as follows. Get an image from the training set. Choose a random location and crop a 352x352 segment of the image. Compute a usefulness score: the score increases if the segment contains many class_A pixels and decreases the more times the segment's pixels have already been selected. If the score is above a certain threshold, return the segment and update the counter of selected pixels. If the score is below the threshold, try another segment. If four consecutive segments score below the threshold (the image has no class_A pixels, or its class_A pixels already appeared in several segments), move on to the next image and repeat the procedure. The epoch ends when all images have been processed.
Due to the randomness of the selection process I cannot provide a __len__ or __getitem__ method.
How should I override these methods so that I can use my custom Dataset class?
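One option that fits this kind of stream-style sampling is torch.utils.data.IterableDataset, which only requires __iter__ and needs neither __len__ nor __getitem__. The sketch below assumes a hypothetical sample_segment(image) helper that implements the scoring and rejection procedure described above and returns None after four consecutive low-scoring attempts.

from torch.utils.data import IterableDataset, DataLoader

class SegmentStream(IterableDataset):
    def __init__(self, images, sample_segment):
        self.images = images
        self.sample_segment = sample_segment   # hypothetical scoring/rejection helper

    def __iter__(self):
        for image in self.images:
            while True:
                segment = self.sample_segment(image)
                if segment is None:   # four low-scoring segments: move to the next image
                    break
                yield segment         # the epoch ends once all images are exhausted

# A DataLoader accepts an IterableDataset without __len__ or __getitem__:
# loader = DataLoader(SegmentStream(images, sample_segment), batch_size=8)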

Direct Heatmap Regression with Fully Convolutional Nets

I'm trying to develop a fully-convolutional neural net to estimate the 2D locations of keypoints in images that contain renders of known 3D models. I've read plenty of literature on this subject (human pose estimation, model based estimation, graph networks for occluded objects with known structure) but no method I've seen thus far allows for estimating an arbitrary number of keypoints of different classes in an image. Every method I've seen is trained to output k heatmaps for k keypoint classes, with one keypoint per heatmap. In my case, I'd like to regress k heatmaps for k keypoint classes, with an arbitrary number of (non-overlapping) points per heatmap.
In this toy example, the network would output heatmaps around each visible location of an upper vertex for each shape. The cubes have 4 vertices on top, the extruded pentagons have 2, and the pyramids just have 1. Sometimes points are offscreen or occluded, and I don't wish to output heatmaps for occluded points.
The architecture is a 6-6 layer U-Net (as in this paper: https://arxiv.org/pdf/1804.09534.pdf). The ground-truth heatmaps are normal distributions centered around each keypoint. When training the network with a batch size of 5 and an L2 loss, the network learns to never make an estimate at all, just outputting blank images. Datatypes are converted properly and normalized from 0 to 1 for the input and 0 to 255 for the output. I'm not sure how to solve this; are there any red flags with my general approach? I'll post code if there's no clear problem in general...
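For reference, ground-truth heatmaps of the kind described here are typically built by placing a Gaussian at each visible keypoint of a class. The function below is a generic sketch (the names and the sigma value are assumptions, not the question's actual code):

import numpy as np

def make_heatmap(keypoints, height, width, sigma=3.0):
    # keypoints: list of (x, y) pixel coordinates of visible points of one class;
    # occluded or off-screen points are simply omitted.
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x, y in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)   # non-overlapping points each keep their own peak
    return heatmap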

Cropping a minibatch of images in Pytorch -- each image differently

I have a tensor named input with dimensions 64x21x21. It is a minibatch of 64 images, each 21x21 pixels. I'd like to crop each image down to 11x11 pixels. So the output tensor I want would have dimensions 64x11x11.
I'd like to crop each image around a different "center pixel." The center pixels are given by a 2-dimensional long tensor named center with dimensions 64x2. For image i, center[i][0] gives the row index and center[i][1] gives the column index for the pixel that should be at the center in the output. We can assume that the center pixel is always at least 5 pixels away from the border.
Is there an efficient way to do this in PyTorch (on the GPU)?
UPDATE: Let me clarify that the center tensor is produced by a deep neural network. It acts as a "hard attention mechanism," to use the reinforcement learning term for it. After I "crop" an image, that subimage becomes the input to another neural network. That's why I want to do the cropping in PyTorch: the operations before and after the cropping are in PyTorch, and I'd like to avoid transferring anything from the GPU back to the CPU.
I raised the question over on the PyTorch forums and got an answer there from smth. The grid_sample function should totally solve the problem.
https://discuss.pytorch.org/t/cropping-a-minibatch-of-images-each-image-a-bit-differently/12247
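grid_sample is the suggestion from that thread; for integer centers like these, plain advanced indexing also stays on the GPU. The following is a sketch using the shapes from the question (the random data is just a stand-in):

import torch

B, H, W, half = 64, 21, 21, 5                    # 11x11 crops from 21x21 images
images = torch.randn(B, H, W)                    # stand-in for the real minibatch
centers = torch.randint(half, H - half, (B, 2))  # (row, col) per image, 5 px from the border

offsets = torch.arange(-half, half + 1)          # -5 .. 5
rows = centers[:, 0, None] + offsets             # (B, 11) row indices per image
cols = centers[:, 1, None] + offsets             # (B, 11) column indices per image

batch_idx = torch.arange(B)[:, None, None]       # broadcast against the row/col grids
crops = images[batch_idx, rows[:, :, None], cols[:, None, :]]   # result: (B, 11, 11)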
torchvision contains transforms including RandomCrop, but it doesn't seem to fit your use case if you want the images cropped in a specific way. I would reckon that PyTorch, a deep learning framework, is not the appropriate tool for cropping images.
Instead, have a look at this tutorial that uses Pillow. You should be able to implement your use case with it. Also have a look at pillow-simd, which performs some operations faster.
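For reference, the Pillow route is essentially one call per image; the file name and center below are hypothetical:

from PIL import Image

im = Image.open("frame.png")                     # hypothetical file
row, col, half = 10, 10, 5
box = (col - half, row - half, col + half + 1, row + half + 1)   # (left, upper, right, lower)
crop = im.crop(box)                              # 11x11 window around (row, col)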

Scikit-learn, image classification

This example allows the classification of images with scikit-learn:
http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html
However, it is important that all the images have the same size (width and height, as written in the comments).
How can I modify this code to allow classification of images with different sizes?
You will need to define your own feature extraction.
In the example above, every pixel represents a feature. If your images are of different sizes, the most trivial (but certainly not the best) thing you can do is pad all images to the size of the largest one with, for example, white pixels.
Here is an example of how to add borders to an image.
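A minimal sketch of that padding idea, assuming grayscale images and hypothetical file names; each padded image is then flattened into one feature row for scikit-learn:

import numpy as np
from PIL import Image, ImageOps

paths = ["img1.png", "img2.png"]                 # hypothetical input files
images = [Image.open(p).convert("L") for p in paths]
max_w = max(im.width for im in images)
max_h = max(im.height for im in images)

features = []
for im in images:
    # Pad the right and bottom with white (255) up to the common size.
    pad = (0, 0, max_w - im.width, max_h - im.height)   # (left, top, right, bottom)
    padded = ImageOps.expand(im, border=pad, fill=255)
    features.append(np.asarray(padded, dtype=float).ravel())

X = np.stack(features)                           # one row per image, ready for a classifier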

Plotting Hidden Weights

I've had an interest in neural networks for a while now and have just started following the deep learning tutorials. I have what I hope is a relatively straightforward question that I am hoping someone may answer.
In the multilayer perceptron tutorial, I am interested in seeing the state of the network at different layers (something similar to what is seen in this paper: http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247 ). For instance, I am able to write out the weights of the hidden layer using:
W_open = open('mlp_w_pickle.pkl', 'wb')  # binary mode is needed for pickle protocol -1
cPickle.dump(classifier.hiddenLayer.W.get_value(borrow=True), W_open, -1)
When I plot this using the utils.py tile plotting, I get the following pretty plot [edit: pretty plot removed as I don't have enough rep].
If I wanted to plot the weights at the logRegressionLayer, such that
cPickle.dump(classifier.logRegressionLayer.W.get_value(borrow=True), W_open, -1)
what would I actually have to do? The above doesn't seem to work: it returns a 2D array of shape (500, 10). I understand that the 500 relates to the number of hidden units. The paragraph on the Miscellaneous page:
Plotting the weights is a bit more tricky. We have n_hidden hidden units, each of them corresponding to a column of the weight matrix. A column has the same shape as the visible, where the weight corresponding to the connection with visible unit j is at position j. Therefore, if we reshape every such column, using numpy.reshape, we get a filter image that tells us how this hidden unit is influenced by the input image.
confuses me a little. I am unsure exactly how I would string it together.
Thanks to all - sorry if the question is confusing!
You could plot them just like the weights in the first layer, but they will not necessarily make much sense.
Consider the weights in the first layer of a neural network. If the inputs have size 784 (e.g. MNIST images) and there are 2000 hidden units in the first layer, then the first-layer weights form a matrix of size 784x2000 (or maybe the transpose, depending on how it's implemented). Those weights can be plotted either as 784 patches of size 2000 or, more usually, as 2000 patches of size 784. In the latter case each patch can be plotted as a 28x28 image, which ties directly back to the original inputs and is therefore interpretable.
For your higher-level regression layer, you could plot 10 tiles, each of size 500 (e.g. patches of size 22x23 with some padding to make them rectangular), or 500 patches of size 10. Either might reveal some patterns, but it may be difficult to tie those patterns back to the original inputs.
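A small sketch of the first option, assuming W is the (500, 10) logRegressionLayer weight matrix already pulled out with get_value (the random stand-in, padding value and tile size are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

# Stand-in for classifier.logRegressionLayer.W.get_value(borrow=True),
# which has shape (n_hidden, n_out) = (500, 10) in the question.
W = np.random.randn(500, 10)

n_hidden, n_out = W.shape
side = int(np.ceil(np.sqrt(n_hidden)))          # 500 -> 23, so pad each column up to 23x23
fig, axes = plt.subplots(1, n_out, figsize=(2 * n_out, 2))
for j, ax in enumerate(axes):
    tile = np.full(side * side, W[:, j].min())  # pad the tail so the column reshapes cleanly
    tile[:n_hidden] = W[:, j]
    ax.imshow(tile.reshape(side, side), cmap='gray')
    ax.set_title('class %d' % j)
    ax.axis('off')
plt.show()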

Resources