I want to do regression with images. There are images of roads and the associated steering angle. As I want to apply data augmentation in Keras I would like to flip the input images horizontally but that would imply that the steering angle has to change its sign if the image is flipped. As far as I can see the documentation does not cover this problem. Is there a tutorial explaining how this can be achieved?
You have to write your own data-generator.
Check out the ImageLoader class (custom image generator) in my code here:
https://github.com/Golbstein/EDSR-Keras/blob/master/utilities.py
Related
I'm building a neural network using Keras to perform landmarks localization on a grayscale images.
I saw that for classification task there is a Keras function to perform Data Augmentation. But for localization task I have not found a function to perform Data Augmentation since in the labeled data there are points to be changed.
Any idea to perform Data Augmentation for landmarks localization task?
Thanks!
I don't know if there is anything you can use out of the box in keras.
But what you can always do is write your own data augmentation. The only real difference to data augmentation for e.g. classification is that you also have to keep track of your landmarks.
So if you do a flipping you have to also flip the coordinates of your landmarks. Same with rotation, if you rotate around some angle you also have to rotate the landmarks. If you do cropping you have to transform the landmark coordinates to the new cropped image, which is just a simple translation and removing the points which are not in the new image.
On how to do this you can have a look here
https://medium.com/the-artificial-impostor/custom-image-augmentation-with-keras-70595b01aeac
or just in general search for something like "custom image augmentation in keras".
I want to train a facial recognition CNN from scratch. I can write a Keras Sequential() model following popular architectures and copying their networks.
I wish to use the LFW dataset, however I am confused regarding the technical methodology. Do I have to crop each face to a tight-fitting box? That seems impractical, as the dataset has 13000+ faces.
Lastly, I know it's stupid, but all I have to do is preprocess the images (of course), then fit the model to these images? What's the exact procedure?
Your question is very open ended. Before preprocessing and fitting the model, you need to understand Object Detection. Once you understand what object detection you will get answer to your 1st question whether you are required to manually crop every 13000 image. The answer is no. However, you will have to draw bounding boxes around faces and assign label to images if they are not available in the training data.
Your second question is very vague . What do you mean by exact procedure? Is it the steps you need to do or how to do preprocessing and fitting of the model in python/or any other language? There are lots of references available on the internet about how to do preprocessing and model training for every specific problem. There are no universal steps which can be applied to any problem
I know that it might be a dumb question, but I searched everywhere for an answer but I could not get.
Okay first properly explaining my question,
When I was learning CNN I was told that kernels or filters or activation map represent a feature of image.
To be specific, assume a cat image identification, a feature map would represent a "whiskers"
and in images which the activation of this feature map would be high it is inferred as whisker is present in image and so the image is a cat. (Correct me if I am wrong)
Well now when I made a Keras ConvNet I save the model
and then loaded the model and
saved all the filters to png images.
What I saw was 3x3 px images where each each pixel was of different colour (green, blue or their various variants and so on)
So how these 3x3px random colour pattern images of kernels represent in any way the "whisker" or any other feature of cat?
Or how could I know which png images is which feature ie which is whisker detector filter etc?
I am asking this because I might be asked in oral examination by teacher.
Sorry for the length of answer (but I had to make it so to explain properly)
You need to have a further look into how convolutional neural networks operate: the main topic being the convolution itself. The convolution occurs with the input image and filters/kernels to produce feature maps. A feature map is what may highlight important features.
The filters/kernels do not know anything of the input data so when you save these you are only going to see psuedo-random images.
Put simply, where * is the convolution operator,
input_image * filter = feature map
What you want to save, if you want to vizualise what is occuring during convolution, are the feature maps. This website gives a very detailed account on how to do so, and it is the method I have used in the past.
I'm currently performing a pixel-based classification of an image using simple supervised classifiers implemented in Scikit-learn. The image is first reshaped into a vector of single pixel intensities, then the training and the classification are carried out as in the following:
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier(verbose=True)
classifier.fit(training_data, training_target)
predictions = classifier.predict(test_data)
The problem with pixel-based classification is the noisy nature of the resulting classified image. To prevent it, I wanted to use Graph Cut (e.g. Boykov-Kolmogorov implementation) to take into account the spatial context between pixels. But, the implmentations I found in Python (NetworkX, Graph-tool) and in C++ (OpenGM and the original implementation: [1] and [2]) don't show how to go from an image to a Graph, except for [2] which is in matlab, and I'm not really enough familiar with either of Graph Cut and matlab.
So my question is basically how can graph cuts be integrated into the previous classification (e.g. before the training or as a post-processing)?
I had a look at the graph algorithms in Scikit-image (here), but these work only on RGB images with discreet values, whereas my pixel values are continuous.
I found this image restoration tutorial which does more or less what I was looking for. Besides, you use a Python library wrapper (PyMaxflow) to call the maxflow algorithm to partition the graph.
It starts from the noisy image on the left, and takes into account the spatial constraint between pixels, to obtain the binary image on the right.
I am trying to train a classifier to separate images taken by a particle physics detector into two classes. For each image, I also have a coordinate (x,y,z) describing where the particle interaction took place. That coordinate is very useful is understanding these images by eye, but doesn't have an obvious translation to weighting image pixels.
I've been trying some basic machine learning techniques in scikit-learn, feeding in data points with 103 features: the three axes of the coordinates, and the 10x10 pixels of the image. Those basic techniques aren't cutting it, unfortunately, so I thought I'd try to take advantage of the properties of convolutional neural networks. Since I've never tried that before, Keras seemed like an easy way to get started.
Looking at Keras, I see that I ought to provide an input shape. I could presumably use a input shape of (103), but if I understand CNN correctly, I'd lose all the advantages of CNN for images. Intuitively, what I want the input shape to be is (3)+(10,10). Is that a sensible concept in the world of CNN? Can it be done in Keras?
You might want to look into the Merge layer. In essence this allows you to use two independent inputs, maybe give them a few different processing layers and them combine them for the rest of the model.
With this you could, for example, do several convolutional layers to process the image and then simply merge it with the coordinate inputs.