Hi all.
I want to use the MNIST and SVHN datasets.
MNIST is 28x28 and SVHN is 32x32.
So I want to make the datasets the same size.
But I'm not sure which option is better:
1. Resize MNIST from 28x28 to 32x32
2. Resize SVHN from 32x32 to 28x28
Choose the first one (resize MNIST to 32x32, or add zero padding to MNIST).
There is one main reason: if we crop or resize SVHN down to MNIST's size, we will lose some data anyway.
But SVHN and MNIST differ not only in image size but also in the number of channels (SVHN is in color, probably RGB). The question here is: "Should we convert RGB to grayscale or vice versa?"
And my answer to that is "it depends". It depends on the goals of the final application of the NN. What color mode will the pictures from the general population have? Will they be more MNIST pictures, colored handwritten digits, or maybe arbitrary digits from everyday life?
That question you have to answer yourself.
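For what it's worth, here is a minimal preprocessing sketch (assuming torchvision) covering both routes, i.e. padding MNIST up to 32x32 and collapsing SVHN to a single grayscale channel:

from torchvision import datasets, transforms

# Pad MNIST from 28x28 to 32x32 with 2 pixels of zeros on every side.
# If you decide to stay in RGB instead, add
# transforms.Grayscale(num_output_channels=3) here to get three identical channels.
mnist_tf = transforms.Compose([
    transforms.Pad(2),
    transforms.ToTensor(),
])

# Keep SVHN at 32x32 but drop its color information.
svhn_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
])

mnist = datasets.MNIST('.', train=True, download=True, transform=mnist_tf)
svhn = datasets.SVHN('.', split='train', download=True, transform=svhn_tf)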
I am trying to implement an image classification task for grayscale images that were converted from sensor readings. That is, I initially had time-series data, e.g. acceleration or displacement, and then transformed them into images. Before the transformation I applied normalization across the data. Each image has dimensions 1000x9, where 1000 is the total number of time steps and 9 is the number of data points. The split ratio is 70%, 15%, and 15% for the training, validation, and test sets. There are 10 different labels and each label has 100 images, so it's a multi-class classification task.
An example of my array before image conversion is:
As you can see above, the values are very precise. When I convert them into images, I can see the dark and light parts of the image.
Imagine that I have directories from D1 to D9 (damaged cases) and UN (healthy case), and there are many images like this in each.
Then I have a CNN where my goal is classification. But there is a significant overfitting issue, and whatever I do doesn't work out. Here is one of the architectures I've been working on:
Model summary:
I also augment the data. After 250 epochs, this is what I get:
So what I wonder is this: I tried to apply some regularization and augmentation, but they do not give me solid results. I experimented by changing the number of hidden units, layers, etc. Do you think I need to fully change my architecture? I basically use two blocks of CNN with FC layers at the end. This is not the first time I've worked with images like this, but I cannot mitigate this overfitting issue. I would appreciate any solid suggestions so I can get smooth results. I was also thinking of using some pre-trained models for transfer learning, but the image dimensions cause problems; do you know if I can use any of those pre-trained models with a 1000x9 image dimension? I know there are already some overfitting topics on the forum, but since these images come from numerical arrays and I could not make it work, I wanted to create a new topic. Thank you!
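For comparison, here is a hypothetical two-block CNN sketch in Keras for a 1000x9x1 input that leans heavily on dropout and L2 weight decay; all layer sizes are assumptions, not the architecture described above:

from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes=10):
    reg = keras.regularizers.l2(1e-4)           # L2 weight decay on every learned layer
    model = keras.Sequential([
        keras.Input(shape=(1000, 9, 1)),
        layers.Conv2D(16, (7, 3), padding='same', activation='relu', kernel_regularizer=reg),
        layers.MaxPooling2D((4, 1)),
        layers.Dropout(0.3),
        layers.Conv2D(32, (7, 3), padding='same', activation='relu', kernel_regularizer=reg),
        layers.MaxPooling2D((4, 1)),
        layers.Dropout(0.3),
        layers.GlobalAveragePooling2D(),        # far fewer parameters than Flatten + Dense
        layers.Dense(64, activation='relu', kernel_regularizer=reg),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model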
I am testing some well-known models for computer vision: UNet, FC-DenseNet103, this implementation.
I train them with 224x224 randomly cropped patches and do the same on the validation set.
Now when I run inference on some videos, I pass it the frames directly (1280x640) and it works. It runs the same operations on different image sizes and never gives an error. It actually gives a nice output, but the quality of the output depends on the image size...
Now it's been a long time since I've worked with neural nets but when I was using tensorflow I remember I had to crop the input images to the train crop size.
Why don't I need to do this anymore? What's happening under the hood?
It seems that the models you are using have no linear layers. Because of this, the output of the convolutional layers goes straight into the softmax function. The softmax function doesn't require a specific input shape, so it can take input of any shape. That is why your model works with any image shape, but its accuracy will probably be far worse on image shapes different from the one you trained on.
There is always a specific input size in the documentation of the model. You should use this size. These are the current model limitations.
For UNets this may even be a ratio. I think it depends on implementation.
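A toy illustration in PyTorch (the layers are made up for the example) of why a purely convolutional model accepts any input size:

import torch
import torch.nn as nn

# No Linear layers, so any spatial size works; the output simply
# grows or shrinks with the input.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),            # per-pixel class scores
)

print(fcn(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 2, 224, 224])
print(fcn(torch.randn(1, 3, 640, 1280)).shape)  # torch.Size([1, 2, 640, 1280])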
Just a note on resize:
transform.Resize((h,w))
transform.Resize(d)
In the first case, (h, w), the output size will be matched to this exactly.
In the second case, with a single integer d, the smaller edge of the image will be matched to d.
For example, if height > width, then the image will be rescaled to (d * height / width, d).
The idea is to not ruin the aspect ratio of the image.
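For example, a quick sketch assuming torchvision and PIL:

from PIL import Image
from torchvision import transforms

img = Image.new('RGB', (100, 200))        # width=100, height=200

exact = transforms.Resize((32, 32))(img)  # aspect ratio ignored
print(exact.size)                         # (32, 32)

short = transforms.Resize(32)(img)        # smaller edge (width) -> 32
print(short.size)                         # (32, 64), aspect ratio preserved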
I am looking at using Landsat imagery to train a CNN for unsupervised pixel-wise semantic segmentation. However, I have been unable to find a method that allows me to crop images from the larger Landsat image for training and then predict on the original image. Essentially, here is what I am trying to do:
Original Landsat image (5,000 x 5,000 - this is an arbitrary size, not exactly sure of the actual dimensions off-hand) -> crop the image into (100 x 100) chunks -> train the model on these cropped images -> output a prediction for each pixel in the original (uncropped) image.
That said, I am not sure whether I should predict on the cropped images and stitch them together afterwards, or whether I can predict on the original image directly.
Any clarification/code examples would be greatly appreciated. For reference, I use both pytorch and tensorflow.
Thank you!
Lance D
Borrowing from Ronneberger et al., what we have been doing is to split the input Landsat scene and the corresponding ground truth mask into overlapping tiles: take the original image, pad it by the overlap margin (we use reflection for the padding), then split it into tiles. Here is a code snippet using scikit-image:
from skimage.util import view_as_windows

# Window = tile plus the overlap margin on each side; step = tile size,
# so consecutive windows overlap by 2 * image_margin.
patches = view_as_windows(
    image,
    (self.tile_height + 2*self.image_margin,
     self.tile_width + 2*self.image_margin,
     raster_value['channels']),
    step=(self.tile_height, self.tile_width, raster_value['channels']))
I don't know what you are using for a loss function for unsupervised segmentation. In our case with supervised learning, we crop the final segmentation prediction to match the ground truth output shape. In the Ronneberger paper they relied on shrinkage due to the use of valid padding.
For predictions you would do the same (split into overlapping tiles) and stitch the result.
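A rough sketch of that stitching step in NumPy; predict_tile stands in for a hypothetical per-tile model call that returns per-pixel class scores with the same spatial size as its input:

import numpy as np

def predict_full_image(padded_image, predict_tile, tile_h, tile_w, margin, n_classes):
    # padded_image: the original scene reflect-padded by `margin` on every side.
    H = padded_image.shape[0] - 2 * margin
    W = padded_image.shape[1] - 2 * margin
    out = np.zeros((H, W, n_classes), dtype=np.float32)
    for y in range(0, H, tile_h):
        for x in range(0, W, tile_w):
            h = min(tile_h, H - y)                  # ragged tiles at the border
            w = min(tile_w, W - x)
            tile = padded_image[y:y + h + 2 * margin, x:x + w + 2 * margin]
            pred = predict_tile(tile)               # hypothetical model call
            out[y:y + h, x:x + w] = pred[margin:margin + h, margin:margin + w]
    return out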
I'm using Keras to train a model to detect objects in images and put a bounding box around them.
I want to use ImageDataGenerator to augment the images with shift/rotate/scale/etc.
The ImageDataGenerator is building a transformation matrix and using it to transform the images.
My question is, after I get back the augmented image, how can I adjust the bounding box according to the augmentation?
It would be great if the ImageDataGenerator returned the transformation matrix together with the augmented image, but it doesn't.
So how to do it correctly?
Is it worth opening an issue for Keras to add this functionality?
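One workaround that avoids the missing matrix: draw the random transform parameters yourself with get_random_transform, apply them with apply_transform to both the image and a binary mask of the box, then read the new box off the transformed mask. A sketch, assuming the box is given as (x1, y1, x2, y2) in pixel coordinates:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=15, width_shift_range=0.1,
                             height_shift_range=0.1, zoom_range=0.1,
                             fill_mode='constant', cval=0.0)

def augment_with_box(image, box):
    # image: HxWx3 float array, box: (x1, y1, x2, y2) in pixels (both assumed).
    x1, y1, x2, y2 = box
    mask = np.zeros(image.shape[:2] + (1,), dtype=np.float32)
    mask[y1:y2, x1:x2] = 1.0
    params = datagen.get_random_transform(image.shape)   # one set of random parameters
    aug_image = datagen.apply_transform(image, params)   # same parameters for both
    aug_mask = datagen.apply_transform(mask, params)
    ys, xs = np.nonzero(aug_mask[..., 0] > 0.5)
    if len(xs) == 0:                                     # box was pushed out of the frame
        return aug_image, None
    return aug_image, (xs.min(), ys.min(), xs.max(), ys.max())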
I want to do regression with images. There are images of roads and the associated steering angles. As I want to apply data augmentation in Keras, I would like to flip the input images horizontally, but that implies that the steering angle has to change its sign when the image is flipped. As far as I can see, the documentation does not cover this problem. Is there a tutorial explaining how this can be achieved?
You have to write your own data-generator.
Check out the ImageLoader class (custom image generator) in my code here:
https://github.com/Golbstein/EDSR-Keras/blob/master/utilities.py
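For the steering-angle case, a minimal sketch of such a generator (assuming images and angles are already loaded as NumPy arrays):

import numpy as np
from tensorflow.keras.utils import Sequence

class SteeringGenerator(Sequence):
    # images: (N, h, w, 3) array, angles: (N,) array of steering angles (assumed in memory).
    def __init__(self, images, angles, batch_size=32):
        self.images, self.angles, self.batch_size = images, angles, batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x = self.images[sl].astype(np.float32)
        y = self.angles[sl].astype(np.float32)
        flip = np.random.rand(len(x)) < 0.5     # flip each sample with probability 0.5
        x[flip] = x[flip, :, ::-1]              # horizontal (left-right) flip
        y[flip] = -y[flip]                      # mirror the steering angle accordingly
        return x, y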