Training
Are the training images already labeled with the classification and the bounding box position, width, and height?
If the bounding box is labeled, do we use a loss function (e.g. IoU) to update our CNN model parameters?
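For reference, the IoU overlap mentioned above can be computed like this (a minimal sketch; the (x1, y1, x2, y2) box format is my own assumption):
import numpy as np

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); intersection area divided by union area.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143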
Testing
When we predict, do we still generate 2000 region proposals and feed them into the trained model?
How do we use these 2000 regions to find the prediction's bounding box?
What about multiple images?
Related
I am working on a project in which I need to train a CNN on Computed Tomography (CT) images, which have a circular field of view (FOV). All voxels outside the FOV are meaningless.
To my knowledge, people usually only feed rectangular images into a CNN. For similar issues with a circular image shape, they usually just crop part of the image to a rectangular region during preprocessing. With this method, some part of the image is unavoidably left out of training. Is there any way I can feed an image with a circular FOV into the model without cropping any features out of the image? I can't really find similar topics about this concern on the internet, so I'm asking here. Thank you.
Is ROI pooling (e.g. in PyTorch) suitable for tasks other than object detection?
For example, take a VGG16 pre-trained on ImageNet. I use this example image as the input with a size of 640×640, and I get a feature map of size 40×40 after the last convolutional layer. Assume that I already have the bounding box of the cat, shown as the red rectangle. Can I use the known bounding box to extract the features of the cat with ROI pooling directly?
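If it helps to make the question concrete, here is a minimal sketch of that lookup using torchvision's ROI ops; the 40/640 scale matches the example above, while the box coordinates, output size, and the use of a random feature map are only placeholders:
import torch
from torchvision.ops import roi_align  # roi_pool has the same call pattern

# Stand-in for the 40x40 feature map from the last VGG16 conv layer.
fmap = torch.randn(1, 512, 40, 40)

# Known cat box in image coordinates (x1, y1, x2, y2) on the 640x640 input;
# these numbers are made up for illustration.
boxes = [torch.tensor([[100.0, 150.0, 420.0, 500.0]])]

# spatial_scale converts image coordinates to feature-map coordinates: 40/640 = 1/16.
cat_features = roi_align(fmap, boxes, output_size=(7, 7), spatial_scale=40 / 640)
print(cat_features.shape)  # torch.Size([1, 512, 7, 7])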
I'm building a neural network using Keras to perform landmark localization on grayscale images.
I saw that for classification tasks there is a Keras function to perform data augmentation, but for localization tasks I have not found one, since in the labeled data there are landmark points that also need to be transformed.
Any idea how to perform data augmentation for a landmark localization task?
Thanks!
I don't know if there is anything you can use out of the box in Keras.
But you can always write your own data augmentation. The only real difference from data augmentation for, e.g., classification is that you also have to keep track of your landmarks.
So if you flip the image, you also have to flip the coordinates of your landmarks. The same goes for rotation: if you rotate by some angle, you also have to rotate the landmarks. If you crop, you have to transform the landmark coordinates into the new cropped image, which is just a simple translation plus removing the points that are no longer inside the new image.
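As an illustration, a horizontal flip with landmark adjustment could look roughly like this (plain NumPy; landmarks as an (N, 2) array of (x, y) pixel coordinates is my own convention, not a Keras one):
import numpy as np

def flip_horizontal(image, landmarks):
    # Flip a grayscale image left-right and mirror the landmark x-coordinates.
    h, w = image.shape[:2]
    flipped = image[:, ::-1]
    new_landmarks = landmarks.astype(float).copy()
    new_landmarks[:, 0] = (w - 1) - new_landmarks[:, 0]
    return flipped, new_landmarks

image = np.zeros((128, 128), dtype=np.uint8)
landmarks = np.array([[30, 40], [90, 70]])
flipped, flipped_landmarks = flip_horizontal(image, landmarks)
print(flipped_landmarks)  # [[97. 40.] [37. 70.]]
Rotation and cropping work the same way: apply the identical geometric transform to the coordinates that you apply to the pixels.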
For how to do this, you can have a look here:
https://medium.com/the-artificial-impostor/custom-image-augmentation-with-keras-70595b01aeac
or just search in general for something like "custom image augmentation in Keras".
I am looking at using Landsat imagery to train a CNN for unsupervised pixel-wise semantic segmentation. However, I have been unable to find a method that allows me to crop images from the larger Landsat image for training and then predict on the original image. Essentially, here is what I am trying to do:
Original Landsat image (5,000 x 5,000; this is an arbitrary size, I'm not sure of the actual dimensions off-hand) -> crop the image into (100 x 100) chunks -> train the model on these cropped images -> output a prediction for each pixel in the original (uncropped) image.
That said, I am not sure whether I should predict on the cropped images and stitch them together afterwards, or whether I can predict on the original image.
Any clarification/code examples would be greatly appreciated. For reference, I use both PyTorch and TensorFlow.
Thank you!
Lance D
Borrowing from Ronneberger et al., what we have been doing is splitting the input Landsat scene and the corresponding ground-truth mask into overlapping tiles. Take the original image, pad it by the overlap margin (we use reflection for the padding), then split it into tiles. Here is a code snippet using scikit-image:
import skimage as sk
# Each window covers the tile plus the overlap margin on every side;
# the step equals the tile size, so the central tiles do not overlap.
patches = sk.util.view_as_windows(
    image,
    (self.tile_height + 2 * self.image_margin,
     self.tile_width + 2 * self.image_margin,
     raster_value['channels']),
    (self.tile_height, self.tile_width, raster_value['channels']))
I don't know what you are using as a loss function for unsupervised segmentation. In our case, with supervised learning, we crop the final segmentation prediction to match the ground-truth output shape. In the Ronneberger paper they relied on shrinkage due to the use of valid padding.
For prediction you would do the same (split into overlapping tiles) and stitch the results.
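A rough sketch of that predict-and-stitch step could look like this (plain NumPy; it assumes the scene height and width are multiples of the tile size and that `model` returns a per-pixel label map the same size as its padded input tile, which may not match your setup):
import numpy as np

def predict_full_scene(image, model, tile=100, margin=14):
    # image: (H, W, C) scene; H and W assumed to be multiples of `tile`.
    h, w, _ = image.shape
    # Reflection-pad by the overlap margin, as in the tiling snippet above.
    padded = np.pad(image, ((margin, margin), (margin, margin), (0, 0)),
                    mode='reflect')
    output = np.zeros((h, w), dtype=np.int64)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            window = padded[y:y + tile + 2 * margin,
                            x:x + tile + 2 * margin]
            pred = model(window)  # (tile + 2*margin, tile + 2*margin) labels
            # Keep only the central tile so the overlapping borders are discarded.
            output[y:y + tile, x:x + tile] = pred[margin:margin + tile,
                                                  margin:margin + tile]
    return output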
I'm using Keras to train a model to detect objects in images and put a bounding box around them.
I want to use ImageDataGenerator to augment the images with shifts/rotations/scaling/etc.
ImageDataGenerator builds a transformation matrix and uses it to transform the images.
My question is: after I get back the augmented image, how can I adjust the bounding box to match the augmentation?
It would be great if ImageDataGenerator returned the transformation matrix together with the augmented image, but it doesn't.
So how can I do this correctly?
Is it worth opening an issue for Keras to add this functionality?
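A minimal sketch of one possible workaround, assuming ImageDataGenerator's get_random_transform/apply_transform methods: push the box through the generator as a binary mask with the same transform parameters and read the box back from the transformed mask. This is an approximation (rotations turn the box into the bounding box of the rotated region), not a Keras feature, and the (x1, y1, x2, y2) format and helper name are my own:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.1,
                             height_shift_range=0.1, horizontal_flip=True)

def augment_with_box(image, box):
    # image: (H, W, C) float array; box: (x1, y1, x2, y2) in integer pixel coords.
    x1, y1, x2, y2 = box
    mask = np.zeros(image.shape[:2] + (1,), dtype=np.float32)
    mask[y1:y2, x1:x2] = 1.0
    # Draw one set of random transform parameters and apply it to both arrays.
    params = datagen.get_random_transform(image.shape)
    aug_image = datagen.apply_transform(image, params)
    aug_mask = datagen.apply_transform(mask, params)
    ys, xs = np.where(aug_mask[..., 0] > 0.5)
    if xs.size == 0:  # box was transformed out of the frame
        return aug_image, None
    return aug_image, (xs.min(), ys.min(), xs.max(), ys.max())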