Can anyone tell me in which situations the above functions are used and how they affect the image size?
I want to resize the Cats vs Dogs images and I am a bit confused about how to use them.
There are actually lots of details in the TorchVision documentation.
The typical use case is for object detection or image segmentation tasks, but other uses could exist.
Here is a non-exhaustive list of uses:
Resize is used in convolutional neural networks to adapt the input image to the network's input shape; in this case it is not data augmentation but just pre-processing. It can also be used in fully convolutional networks to emulate different scales for an input image, which is data augmentation.
CenterCrop, RandomCrop and RandomResizedCrop are used in segmentation tasks to train a network on fine details without putting too much of a burden on training. For example, with a database of 2048x2048 images you can train on 512x512 sub-images and then at test time infer on full-resolution images. They are also used in object detection networks as data augmentation. The resized variant (RandomResizedCrop) lets you combine cropping with the resize operation described above.
All of them potentially change the image resolution.
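For the Cats vs Dogs case, a typical pair of train/test pipelines might look like the sketch below; the 224/256 sizes and the directory paths are placeholders picked for illustration, not anything from the question.

```python
# A minimal sketch, assuming torchvision is installed and the Cats vs Dogs images
# sit in ImageFolder-style directories (the paths below are hypothetical).
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

train_transform = T.Compose([
    T.RandomResizedCrop(224),   # random crop, then resize to 224x224 (data augmentation)
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

test_transform = T.Compose([
    T.Resize(256),              # resize the shorter side to 256, keeping aspect ratio
    T.CenterCrop(224),          # deterministic 224x224 crop for evaluation
    T.ToTensor(),
])

train_set = ImageFolder("data/cats_vs_dogs/train", transform=train_transform)
test_set = ImageFolder("data/cats_vs_dogs/test", transform=test_transform)
```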
Related
I was training a neural network on images of an eye shaped 36x60, so can I only predict the result using a 36x60 image? In my application I have a video stream; the stream is divided into frames, and for each frame 68 landmark points are predicted. From the eye landmarks I can select the eye region, and using the boundingRect function from OpenCV it is very easy to get a cropped image. But this image does not have the shape 36x60. What is the correct way to get 36x60 data that can be used for prediction? Or how can I use a neural network on data of another shape?
Neural networks (insofar as I've encountered) have a fixed input shape, with freedom permitted only in the batch size. This (probably) goes for every amazing neural network you've ever seen. Don't be too afraid of reshaping your image with off-the-shelf sampling to the network's expected input size. Robust computer-vision networks are generally trained on augmented data: randomly scaled, skewed, and otherwise transformed in order to, among other things, broaden the network's ability to handle this unavoidable scaling situation.
There are caveats, of course. An input for prediction should be as similar to the dataset it was trained on as possible, which is to say that a model should be applied to the data for which it was designed. For example, consider an object detection network made for satellite applications. If that same network is then applied to drone imagery, the relative size of objects may be substantially larger than the objects for which the network (specifically its anchor-box sizes) was designed.
Tl;dr: Assuming you're using the right network for the job, don't be afraid to scale your images/frames to fit the network's inputs.
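In practice that reshaping can be a one-liner with OpenCV. Here is a minimal sketch, assuming the frame and the bounding rectangle come from the video/landmark pipeline described in the question (the function name and the dummy frame are mine, just to show the shapes):

```python
# A minimal sketch, assuming OpenCV is available. `frame` would be one BGR frame
# from the video stream and `rect` the (x, y, w, h) tuple from cv2.boundingRect
# on the eye landmarks; both are placeholders from the question's description.
import cv2
import numpy as np


def crop_and_resize_eye(frame, rect, target_hw=(36, 60)):
    """Crop the eye region and resample it to the network's 36x60 input size."""
    x, y, w, h = rect
    eye = frame[y:y + h, x:x + w]
    # cv2.resize expects (width, height), so reverse the (height, width) target.
    return cv2.resize(eye, (target_hw[1], target_hw[0]), interpolation=cv2.INTER_AREA)


# Example with a dummy frame, just to show the resulting shape.
dummy_frame = np.zeros((480, 640, 3), dtype=np.uint8)
eye_input = crop_and_resize_eye(dummy_frame, (100, 200, 50, 30))
print(eye_input.shape)  # (36, 60, 3)
```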
The primary objective (my assigned work) is to do image segmentation of underwater images using a convolutional neural network. The camera shots taken of the underwater structure will have poor image quality due to severe noise and bad light exposure. In order to achieve higher classification accuracy, I want to do automatic image enhancement on the images (see the attached file). So, I want to know which CNN architecture will be best for both tasks. Please kindly suggest any possible solutions to achieve the objective.
What do you need to segment? It would be nice to see some labels of the segmentation.
You may not need to enhance the images: if your whole dataset has the same amount of noise, the network will generalize properly.
Regarding CNN architectures, it depends on the constraints you have on processing power and accuracy. If those are not a constraint, go with something like Mask R-CNN; check that repo as a good starting point.
Be mindful that it's a bit of a complex architecture, so inference times might be a bit high (but real-time is doable depending on your GPU).
Other, simpler architectures are FCNs (Fully Convolutional Networks), which are basically your CNN, but with the fully connected layers replaced by fully convolutional layers.
The advantage of these FCNs is that they are really easy to implement and modify, since you can go from simple architectures (FCN-AlexNet) to more complex and more accurate ones (FCN-VGG, FCN-ResNet).
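To make the FC-to-convolution swap concrete, here is a minimal PyTorch sketch (a toy backbone and head I made up, not FCN-AlexNet or FCN-VGG themselves): the classifier head uses 1x1 convolutions instead of fully connected layers, so the network accepts any input size and produces a per-pixel class map that can be upsampled back to the input resolution.

```python
# A minimal sketch of the fully-convolutional idea, with made-up layer sizes.
import torch
import torch.nn as nn

num_classes = 2  # e.g. background vs. structure; placeholder value

backbone = nn.Sequential(           # toy feature extractor, downsamples by 4
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)

head = nn.Sequential(               # 1x1 convs play the role of the FC layers
    nn.Conv2d(64, 64, 1), nn.ReLU(),
    nn.Conv2d(64, num_classes, 1),
)

x = torch.randn(1, 3, 256, 256)     # any spatial size works
logits = head(backbone(x))          # shape (1, num_classes, 64, 64)
upsampled = nn.functional.interpolate(
    logits, size=x.shape[2:], mode="bilinear", align_corners=False
)
print(upsampled.shape)              # torch.Size([1, 2, 256, 256])
```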
Also, I think you didn't mention a framework; there are many to choose from, and it depends on your familiarity with languages. Most of them can be used from Python:
TensorFlow
Pytorch
MXNet
But if you are a beginner, try starting with a GUI-based one: Nvidia DIGITS is a great starting point and really easy to configure. It's based on Caffe, so it's fairly fast when deploying and can easily be integrated with accelerators like TensorRT.
I want to make a mushroom classifier with TensorFlow using a CNN.
But I wonder about image data pre-processing.
Should I remove the background of the picture (making it black) or just use the raw picture?
Also, if there are any pre-processing steps I should do before the CNN, please let me know.
The question is a little bit too broad, but I'll give you a hint.
Should I remove the background of the picture (making it black) or just use the raw picture?
If you can do this, you can achieve higher accuracy with data augmentation, because you can generate training images with various backgrounds, which helps generalization.
Note, however, that by just removing the background the neural network will likely "get used" to the black background, so you would need to process your test images the same way, which in turn needs image segmentation.
Since image segmentation is even harder than classification, the background is usually left unchanged.
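If you do have foreground masks, the "various backgrounds" augmentation can be as simple as the sketch below; the function name and the dummy arrays are mine, and obtaining the mask in the first place is exactly the hard segmentation step mentioned above.

```python
# A minimal sketch of compositing a masked foreground onto a new background,
# assuming a binary foreground mask is already available for each image.
import numpy as np


def composite_on_background(foreground, mask, background):
    """Paste the masked foreground onto a background of the same size."""
    mask = mask.astype(bool)[..., np.newaxis]   # (H, W, 1) boolean
    return np.where(mask, foreground, background)


# Dummy data, just to show the shapes: a 64x64 RGB image, mask, and background.
fg = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
m = np.zeros((64, 64), dtype=np.uint8)
m[16:48, 16:48] = 1
bg = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
augmented = composite_on_background(fg, m, bg)
```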
Also, if there are any pre-processing steps I should do before the CNN, please let me know.
The one pre-processing step that works consistently for all image related tasks is zero-centering: compute the mean value across the training set and use that value to zero-center the images. Be careful not to use test images in computing the mean.
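As a small sketch of that zero-centering step (assuming the images are already loaded into NumPy arrays; the shapes below are placeholders):

```python
# A minimal sketch of zero-centering: the mean is computed on the training set
# only and reused for the test set.
import numpy as np

train_images = np.random.rand(100, 32, 32, 3).astype(np.float32)  # placeholder data
test_images = np.random.rand(20, 32, 32, 3).astype(np.float32)

mean = train_images.mean(axis=0)     # per-pixel mean; use .mean() for a scalar mean
train_centered = train_images - mean
test_centered = test_images - mean   # never recompute the mean on test data
```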
I have made a convolutional neural network for the MNIST data. Now I want to change the input to my own images. How can I do it? Do I need to save the pictures in a specific format? In addition, how do I save all the pictures and train on them one after the other? I am using TensorFlow with Python.
Tensorflow has support for bmp, gif, jpeg and png out of the box.
So load the data (read the file into memory as a 0D tensor of type string) then pass it to tf.image.decode_image or one of the specialized functions if it doesn't work for some reason.
You should get back the image as a tensor of shape [height, width, channels] (channels might be missing if you only have a single-channel image, like grayscale).
To make this work nicely, you should have all the images in the same format. If you can load all the images into RAM and pass them in bulk, go for it, since it's probably the easiest thing to do. The next easiest thing would be to copy the images into tf.train.Example protos and use tf.TFRecordReader to do the shuffling and batching. If all else fails, I think you can set up the input functions to read the images on demand and pipe them through the batching mechanism, but I'm not sure how I would do that.
Here's a link to the tensorflow documentation related to images.
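A minimal sketch of that load-and-decode step, assuming a recent TensorFlow (2.x, with eager execution) and a placeholder file name:

```python
# Load the file bytes and decode them into an image tensor.
import tensorflow as tf

raw = tf.io.read_file("my_image.png")            # 0-D string tensor with the file bytes
image = tf.image.decode_image(raw, channels=3)   # -> [height, width, 3] uint8 tensor
image = tf.image.convert_image_dtype(image, tf.float32)  # scale to [0, 1] for training
```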
I have a data set containing over 10,000 images of dogs. I plan to use tensorflow to classify dog breeds, however the images are random sizes. For example, some are 200x280 pixels, 100x140 pixels, etc. I would like to standardize the images so that they are smaller and all the same dimensions. Is this possible? I realize it will skew the images, but that is not a huge concern. I just need to be able to build a model.
See the Tensorflow image resizing docs.
It's also worth noting that you don't have to resize the images with Tensorflow. You could do it as a preprocessing step with OpenCV. Whether you do it with Tensorflow comes down to the usual tradeoffs between doing preprocessing inside or outside the graph (if you do it in the graph, there is one less step when you are serving predictions, but it also makes the graph more computationally expensive to train, among other tradeoffs).
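Here is a small sketch of both options, with a made-up 128x128 target size and placeholder file names; adjust to whatever resolution you pick for your model. It assumes a recent TensorFlow (2.x) and OpenCV.

```python
import cv2
import tensorflow as tf


# In-graph (TensorFlow): resize as part of the input pipeline.
def load_and_resize(path, size=(128, 128)):
    raw = tf.io.read_file(path)
    img = tf.image.decode_jpeg(raw, channels=3)
    return tf.image.resize(img, size)   # distorts aspect ratio, as noted above


# Outside the graph (OpenCV): resize once as a preprocessing step and save the result.
img = cv2.imread("dog.jpg")
img_small = cv2.resize(img, (128, 128), interpolation=cv2.INTER_AREA)
cv2.imwrite("dog_resized.jpg", img_small)
```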