I am working on a project in which I need to train a CNN on Computed Tomography (CT) images, which have a circular Field Of View (FOV). All voxels outside the FOV are meaningless.
To my knowledge, people usually only feed images with rectangular dimensions into CNN models. For similar issues with a circular image shape, they usually just crop part of the image into a rectangle during pre-processing. With this method, some part of the image is unavoidably cropped out of the training data. Is there any way I can feed images with a circular FOV into the model without cropping out any feature of the image? I can't really find similar topics about this concern on the internet, so I am asking here. Thank you.
Related
I am testing some well-known models for computer vision: UNet, FC-DenseNet103 (this implementation).
I train them with 224x224 randomly cropped patches and do the same on the validation set.
Now when I run inference on some videos, I pass the frames in directly (1280x640) and it works: it runs the same operations on different image sizes and never throws an error. It actually gives a nice output, but the quality of the output depends on the image size...
Now, it's been a long time since I've worked with neural nets, but when I was using TensorFlow I remember I had to crop the input images to the training crop size.
Why don't I need to do this anymore? What's happening under the hood?
It seems that the models you are using have no linear (fully connected) layers. Because of this, the output of the convolutional layers goes straight into the softmax function. The softmax function doesn't require a specific input shape, so it can take any shape as input. Because of this, your model will work with any image shape, but the accuracy of your model will probably be far worse on image shapes different from the one you trained on.
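To make this concrete, here is a minimal sketch (plain PyTorch, not your exact models, and the layer sizes are made up) of why a purely convolutional network accepts arbitrary input sizes: the kernels just slide over whatever grid they are given, so only the output resolution changes.

```python
import torch
import torch.nn as nn

# A toy fully convolutional segmentation head: no Linear layers,
# so nothing in the graph fixes the spatial dimensions.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 2, kernel_size=1),   # per-pixel logits for 2 classes
    nn.Softmax(dim=1),                 # softmax over the channel dimension
)

for h, w in [(224, 224), (640, 1280)]:
    x = torch.randn(1, 3, h, w)
    y = model(x)
    print(tuple(x.shape), "->", tuple(y.shape))  # spatial size is preserved
```

A fully connected layer, by contrast, has a weight matrix whose size is fixed at construction time, which is why models that end in Linear layers only accept the input size they were built for.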
There is usually a specific input size given in the documentation of the model, and you should use that size; it reflects the current limitations of the model.
For UNets this may even be a ratio; I think it depends on the implementation.
Just a note on resize:
transform.Resize((h,w))
transform.Resize(d)
In the case of (h, w), the output size will be matched to this exactly.
In the second case, with a single number d, the smaller edge of the image will be matched to d.
For example, if height > width, then the image will be rescaled to (d * height / width, d).
The idea is to not ruin the aspect ratio of the image.
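To illustrate the two behaviours, here is a small sketch with torchvision (the sizes are just made-up examples):

```python
from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (400, 800))          # PIL size is (width, height)

out1 = transforms.Resize((224, 224))(img)   # forced to exactly 224x224, aspect ratio ignored
out2 = transforms.Resize(224)(img)          # smaller edge -> 224, aspect ratio kept

print(out1.size, out2.size)                 # (224, 224) (224, 448)
```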
I'm building a neural network using Keras to perform landmark localization on grayscale images.
I saw that for classification tasks there is a Keras function to perform data augmentation. But for a localization task I have not found such a function, since the labeled data contains landmark points that also have to be transformed.
Any ideas on how to perform data augmentation for a landmark localization task?
Thanks!
I don't know if there is anything you can use out of the box in Keras.
But you can always write your own data augmentation. The only real difference from data augmentation for, e.g., classification is that you also have to keep track of your landmarks.
So if you flip, you also have to flip the coordinates of your landmarks. The same goes for rotation: if you rotate by some angle, you also have to rotate the landmarks. If you crop, you have to transform the landmark coordinates into the new cropped image, which is just a simple translation plus removing the points that no longer lie inside the new image.
On how to do this you can have a look here
https://medium.com/the-artificial-impostor/custom-image-augmentation-with-keras-70595b01aeac
or just in general search for something like "custom image augmentation in keras".
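As a rough sketch of the flipping case (plain NumPy; the function name and shapes are just placeholders for illustration), a landmark x-coordinate in an image of width w maps to w - 1 - x:

```python
import numpy as np

def flip_horizontal(image, landmarks):
    """Flip the image left-right and mirror the landmark x-coordinates accordingly.

    image:     (H, W) grayscale array
    landmarks: (N, 2) array of (x, y) pixel coordinates
    """
    h, w = image.shape[:2]
    flipped_img = image[:, ::-1]
    flipped_pts = landmarks.copy()
    flipped_pts[:, 0] = (w - 1) - flipped_pts[:, 0]
    return flipped_img, flipped_pts

# A 100x200 image with one landmark at (x=30, y=50) ends up at (x=169, y=50)
img = np.zeros((100, 200))
pts = np.array([[30.0, 50.0]])
_, new_pts = flip_horizontal(img, pts)
print(new_pts)
```

Rotation and cropping work the same way: apply the identical geometric transform to the landmark coordinates that you applied to the pixels.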
I have been thinking about building a YOLO model for detecting parking lot occupancy, and I already have the small segmented-out images for every parking space. Can I train YOLO on these small images, already divided into separate 'empty' and 'occupied' classes, and then test it on an image such as an aerial view of a parking lot with, say, 28 parking spots, so that the model detects the occupied and empty spaces?
If yes, can someone guide me on how to approach the problem? I will be using YOLO implemented in Keras.
YOLO is an object detection model. During training, it takes the coordinates of bounding boxes in an image as input and learns to identify the objects inside those bounding boxes. For your problem statement, if you have an aerial view of the parking lot, draw the bounding boxes, generate the XML files (as per your training requirements) and start training. This should ideally give you the desired model for prediction.
Free tool to label images - https://github.com/tzutalin/labelImg
GitHub project to get an idea of how to train YOLO in Keras on a custom dataset - https://github.com/experiencor/keras-yolo2
By no means is this a perfect, tailor-made solution for your problem, given that you haven't provided any code or images, but it is a good place to start.
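For reference, labelImg saves annotations in Pascal VOC style XML, one file per image, which you can read back with the standard library; the file and class names below are made up for illustration:

```python
import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    """Parse a labelImg (Pascal VOC style) annotation file into (class, box) pairs."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text              # e.g. "occupied" or "empty"
        b = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(b.find(tag).text))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, (xmin, ymin, xmax, ymax)))
    return boxes

# boxes = read_voc_annotation("parking_lot_001.xml")
# -> [("occupied", (105, 220, 160, 275)), ("empty", (170, 220, 225, 275)), ...]
```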
I want to do regression with images. There are images of roads and the associated steering angle. As I want to apply data augmentation in Keras, I would like to flip the input images horizontally, but that implies the steering angle has to change its sign whenever the image is flipped. As far as I can see, the documentation does not cover this case. Is there a tutorial explaining how this can be achieved?
You have to write your own data generator.
Check out the ImageLoader class (custom image generator) in my code here:
https://github.com/Golbstein/EDSR-Keras/blob/master/utilities.py
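If you just want the gist without reading through that file, a minimal sketch using keras.utils.Sequence (the names and shapes are placeholders, not the linked ImageLoader) flips the image and negates the angle together:

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class SteeringGenerator(Sequence):
    """Yields (image, angle) batches; randomly flips images and negates the angle."""

    def __init__(self, images, angles, batch_size=32):
        self.images = images          # (N, H, W, 3) array
        self.angles = angles          # (N,) steering angles
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x = self.images[sl].copy()
        y = self.angles[sl].copy()
        flip = np.random.rand(len(x)) < 0.5
        x[flip] = x[flip, :, ::-1, :]     # horizontal flip along the width axis
        y[flip] = -y[flip]                # flipped road means the opposite steering sign
        return x, y
```

You can then pass an instance of this generator to model.fit instead of raw arrays.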
I am trying to train a classifier to separate images taken by a particle physics detector into two classes. For each image, I also have a coordinate (x, y, z) describing where the particle interaction took place. That coordinate is very useful in understanding these images by eye, but it doesn't have an obvious translation to weighting image pixels.
I've been trying some basic machine learning techniques in scikit-learn, feeding in data points with 103 features: the three axes of the coordinates, and the 10x10 pixels of the image. Those basic techniques aren't cutting it, unfortunately, so I thought I'd try to take advantage of the properties of convolutional neural networks. Since I've never tried that before, Keras seemed like an easy way to get started.
Looking at Keras, I see that I ought to provide an input shape. I could presumably use an input shape of (103,), but if I understand CNNs correctly, I'd lose all the advantages a CNN offers for images. Intuitively, what I want the input shape to be is (3) + (10, 10). Is that a sensible concept in the world of CNNs? Can it be done in Keras?
You might want to look into the Merge layer. In essence, this allows you to use two independent inputs, maybe give them a few separate processing layers, and then combine them for the rest of the model.
With this you could, for example, run several convolutional layers over the image and then simply merge the result with the coordinate inputs.
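In current Keras versions the old Merge layer has been replaced by the functional-API merge layers such as Concatenate; a minimal sketch of the two-branch idea (the layer sizes here are arbitrary) might look like this:

```python
from tensorflow.keras import layers, Model

# Image branch: keep the 2D structure so convolutions can exploit it
img_in = layers.Input(shape=(10, 10, 1))
x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(img_in)
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = layers.Flatten()(x)

# Coordinate branch: the (x, y, z) interaction point
coord_in = layers.Input(shape=(3,))
c = layers.Dense(16, activation="relu")(coord_in)

# Merge the two branches and classify
merged = layers.Concatenate()([x, c])
out = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(2, activation="softmax")(out)

model = Model(inputs=[img_in, coord_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

The model is then fed a list of two arrays, one of shape (batch, 10, 10, 1) for the images and one of shape (batch, 3) for the coordinates.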