I'm building a neural network using Keras to perform landmark localization on grayscale images.
I saw that for classification tasks there is a Keras function to perform data augmentation, but for the localization task I have not found one, since the labeled data contains points that also need to be transformed.
Any ideas on how to perform data augmentation for a landmark localization task?
Thanks!
I don't know if there is anything you can use out of the box in Keras.
But what you can always do is write your own data augmentation. The only real difference from data augmentation for, e.g., classification is that you also have to keep track of your landmarks.
So if you flip an image you also have to flip the coordinates of your landmarks. The same goes for rotation: if you rotate by some angle you also have to rotate the landmarks. If you crop, you have to transform the landmark coordinates to the new cropped image, which is just a simple translation plus removing the points that are no longer inside the image.
On how to do this you can have a look here
https://medium.com/the-artificial-impostor/custom-image-augmentation-with-keras-70595b01aeac
or just in general search for something like "custom image augmentation in keras".
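For example, a horizontal flip could look like the sketch below (assuming the landmarks come as an (N, 2) NumPy array of (x, y) pixel coordinates; the function name is just illustrative):

import numpy as np

def flip_horizontal(image, landmarks):
    # Flip a grayscale image left-right and mirror the landmark x-coordinates.
    h, w = image.shape[:2]
    flipped = image[:, ::-1]
    flipped_landmarks = landmarks.copy()
    flipped_landmarks[:, 0] = w - 1 - flipped_landmarks[:, 0]
    return flipped, flipped_landmarks

The same idea carries over to rotations and crops: apply the geometric transform to the image, then apply the corresponding transform to the landmark coordinates.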
I am currently working on a fingers-count deep learning problem. When you look at the dataset, the images in the training and validation sets are very basic and almost identical, so the network achieves high training and validation accuracies. But when it comes to predictions on real-life images, it performs very badly (presumably because the model has been trained on very basic images).
To overcome this, I converted the training and validation images to HSV (hue-saturation-value) and trained the model on the new HSV images. An example of one such image from the new training set is:
I then convert my real-life image to HSV and pass it to the model for prediction. But still, the model is not able to predict correctly. I assumed that since the training images and the prediction image are almost the same after applying HSV, the model should predict well. Is there something I am thinking about incorrectly here? Can HSV images actually be used for training a CNN?
It seems you have an overfitting issue: your model only memorizes the simple samples of the training set and, in contrast, cannot generalize to more complex and diverse data.
In the context of deep learning there are various methods to avoid overfitting, and I don't think you necessarily need to transform your input to HSV. First of all, you can apply data augmentation methods like random crops or rotations to create varied versions of your data. If that does not work, you can use a smaller model or apply techniques such as dropout or regularization.
Here is a good tutorial from TensorFlow.
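As a rough sketch of the first two suggestions (the input shape, class count and augmentation ranges are only placeholders for your data):

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation: creates randomly rotated/shifted/zoomed variants of the training images.
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

# A small model with dropout to reduce overfitting.
model = models.Sequential([
    layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(6, activation='softmax'),  # e.g. six finger-count classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(datagen.flow(x_train, y_train, batch_size=32), validation_data=..., epochs=...)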
I am looking at using Landsat imagery to train a CNN for unsupervised pixel-wise semantic segmentation. However, I have been unable to find a method that allows me to crop patches from the larger Landsat image for training and then predict on the original image. Essentially, here is what I am trying to do:
Original Landsat image (5,000 x 5,000 - this is an arbitrary size; I'm not sure of the actual dimensions off-hand) -> crop the image into (100 x 100) chunks -> train the model on these cropped images -> output a prediction for each pixel in the original (uncropped) image.
That said, I am not sure if I should predict on the cropped images and stitch them together after they are predicted or if I can predict on the original image.
Any clarification/code examples would be greatly appreciated. For reference, I use both pytorch and tensorflow.
Thank you!
Lance D
Borrowing from Ronneberger et al., what we have been doing is to split the input Landsat scene and the corresponding ground-truth mask into overlapping tiles: take the original image, pad it by the overlap margin (we use reflection for the padding), then split it into tiles. Here is a code snippet using scikit-image:
from skimage.util import view_as_windows

# Each window is one tile plus the overlap margin on every side; the step is
# one tile, so consecutive windows overlap by 2 * image_margin pixels.
patches = view_as_windows(
    image,
    (self.tile_height + 2 * self.image_margin,
     self.tile_width + 2 * self.image_margin,
     raster_value['channels']),
    (self.tile_height, self.tile_width, raster_value['channels'])
)
I don't know what you are using for a loss function for unsupervised segmentation. In our case with supervised learning, we crop the final segmentation prediction to match the ground truth output shape. In the Ronneberger paper they relied on shrinkage due to the use of valid padding.
For predictions you would do the same (split into overlapping tiles) and stitch the result.
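A minimal sketch of that pad/predict/stitch loop (assuming the scene dimensions are exact multiples of the tile size; scene and predict_tile are placeholder names for your raster and model call):

import numpy as np

tile, margin = 100, 16                                      # assumed tile size and overlap margin
padded = np.pad(scene, ((margin, margin), (margin, margin), (0, 0)), mode='reflect')

stitched = np.zeros(scene.shape[:2], dtype=np.int64)
for y in range(0, scene.shape[0], tile):
    for x in range(0, scene.shape[1], tile):
        # Window = tile plus margin on every side; crop the prediction back to the tile.
        window = padded[y:y + tile + 2 * margin, x:x + tile + 2 * margin]
        pred = predict_tile(window)                         # your per-pixel model output
        stitched[y:y + tile, x:x + tile] = pred[margin:margin + tile, margin:margin + tile]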
I'm currently performing a pixel-based classification of an image using simple supervised classifiers implemented in Scikit-learn. The image is first reshaped into a vector of single pixel intensities, then the training and the classification are carried out as in the following:
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier(verbose=True)
classifier.fit(training_data, training_target)
predictions = classifier.predict(test_data)
The problem with pixel-based classification is the noisy nature of the resulting classified image. To prevent it, I wanted to use graph cuts (e.g. the Boykov-Kolmogorov implementation) to take into account the spatial context between pixels. But the implementations I found in Python (NetworkX, Graph-tool) and in C++ (OpenGM and the original implementation: [1] and [2]) don't show how to go from an image to a graph, except for [2], which is in MATLAB, and I'm not really familiar enough with either graph cuts or MATLAB.
So my question is basically how can graph cuts be integrated into the previous classification (e.g. before the training or as a post-processing)?
I had a look at the graph algorithms in scikit-image (here), but these work only on RGB images with discrete values, whereas my pixel values are continuous.
I found this image restoration tutorial which does more or less what I was looking for. It uses a Python wrapper library (PyMaxflow) to call the max-flow algorithm that partitions the graph.
It starts from the noisy image on the left, and takes into account the spatial constraint between pixels, to obtain the binary image on the right.
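For reference, the core of that tutorial (binary image restoration with PyMaxflow) looks roughly like this, where img is the noisy binary image and the smoothness weight of 50 is just an example value:

import numpy as np
import maxflow

g = maxflow.Graph[int]()
nodeids = g.add_grid_nodes(img.shape)           # one graph node per pixel
g.add_grid_edges(nodeids, 50)                   # pairwise (smoothness) terms between neighbours
g.add_grid_tedges(nodeids, img, 255 - img)      # unary (data) terms towards source/sink
g.maxflow()                                     # run the max-flow / min-cut
segments = g.get_grid_segments(nodeids)         # boolean segmentation of the pixel grid
restored = np.int_(np.logical_not(segments))    # final binary image

The same post-processing idea should carry over to the classifier output: use the per-pixel class scores as the unary terms and a smoothness weight between neighbouring pixels as the pairwise term.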
I want to do regression with images. There are images of roads and the associated steering angles. As I want to apply data augmentation in Keras, I would like to flip the input images horizontally, but that would imply that the steering angle has to change its sign when the image is flipped. As far as I can see, the documentation does not cover this problem. Is there a tutorial explaining how this can be achieved?
You have to write your own data-generator.
Check out the ImageLoader class (custom image generator) in my code here:
https://github.com/Golbstein/EDSR-Keras/blob/master/utilities.py
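If it helps, here is a minimal sketch of such a generator (not the ImageLoader class itself; the image and angle arrays are placeholders) that flips images horizontally and negates the steering angle accordingly:

import numpy as np
from tensorflow.keras.utils import Sequence

class FlipAugmentedGenerator(Sequence):
    def __init__(self, images, angles, batch_size=32):
        self.images, self.angles = images, angles
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x = self.images[sl].copy()
        y = self.angles[sl].copy()
        flip = np.random.rand(len(x)) < 0.5   # flip roughly half the batch
        x[flip] = x[flip, :, ::-1]            # horizontal flip (reverse the width axis)
        y[flip] = -y[flip]                    # flip the sign of the steering angle
        return x, y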
I am trying to train a classifier to separate images taken by a particle physics detector into two classes. For each image, I also have a coordinate (x, y, z) describing where the particle interaction took place. That coordinate is very useful in understanding these images by eye, but doesn't have an obvious translation to weighting image pixels.
I've been trying some basic machine learning techniques in scikit-learn, feeding in data points with 103 features: the three axes of the coordinates, and the 10x10 pixels of the image. Those basic techniques aren't cutting it, unfortunately, so I thought I'd try to take advantage of the properties of convolutional neural networks. Since I've never tried that before, Keras seemed like an easy way to get started.
Looking at Keras, I see that I ought to provide an input shape. I could presumably use an input shape of (103,), but if I understand CNNs correctly, I'd lose all the advantages of a CNN for images. Intuitively, what I want the input shape to be is (3) + (10, 10). Is that a sensible concept in the world of CNNs? Can it be done in Keras?
You might want to look into the Merge layer. In essence this allows you to use two independent inputs, maybe give them a few different processing layers, and then combine them for the rest of the model.
With this you could, for example, do several convolutional layers to process the image and then simply merge it with the coordinate inputs.
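In current Keras versions the standalone Merge layer has been replaced by merge layers such as Concatenate in the functional API. A minimal sketch under your assumed shapes (10x10 single-channel images plus a 3-vector of coordinates, two output classes) could look like this:

from tensorflow.keras import layers, Model

img_in = layers.Input(shape=(10, 10, 1))
x = layers.Conv2D(16, 3, activation='relu')(img_in)   # convolutions act only on the image branch
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Flatten()(x)

coord_in = layers.Input(shape=(3,))                   # (x, y, z) interaction coordinate
merged = layers.concatenate([x, coord_in])            # merge the two branches

out = layers.Dense(64, activation='relu')(merged)
out = layers.Dense(2, activation='softmax')(out)

model = Model(inputs=[img_in, coord_in], outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])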