Image pre-processing for convolutional neural network - python-3.x

I want to build a mushroom classifier with TensorFlow using a CNN.
But I am unsure about the image data pre-processing.
Should I remove the background of the picture (fill it with black) or just use the raw picture?
Also, if there are any pre-processing steps I should do before the CNN, please let me know.

The question is a little bit too broad, but I'll give you a hint.
Should I remove the background of the picture (fill it with black) or just use the raw picture?
If you can remove the background, you can achieve higher accuracy through data augmentation, because you can generate training images with various backgrounds, which helps generalization.
Note, however, that if you just remove the background, the neural network will likely "get used" to the black background, so you would need to process your test images the same way, which in turn requires image segmentation.
Since image segmentation is even harder than classification, the background is usually left unchanged.
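For illustration, here is a minimal numpy sketch of that compositing step, assuming you already have a binary mask per training image (which, as noted, is the hard part); the function name and array conventions are my own:

    import numpy as np

    def composite(foreground, mask, background):
        """Paste a segmented mushroom onto a new background.

        foreground: (H, W, 3) uint8 image containing the mushroom
        mask:       (H, W) boolean array, True on mushroom pixels
        background: (H, W, 3) uint8 image of the same size
        """
        out = background.copy()
        out[mask] = foreground[mask]  # keep mushroom pixels, swap the rest
        return out

Generating several such composites per training image, each with a different background, is the augmentation described above.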
Also, if there are any pre-processing steps I should do before the CNN, please let me know.
The one pre-processing step that works consistently for all image-related tasks is zero-centering: compute the mean value across the training set and use that value to zero-center the images. Be careful not to include the test images when computing the mean.
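A minimal numpy sketch of that step, with random arrays standing in for your actual datasets:

    import numpy as np

    # Stand-ins for your data: (N, H, W, C) float arrays scaled to [0, 1].
    X_train = np.random.rand(100, 64, 64, 3).astype("float32")
    X_test = np.random.rand(20, 64, 64, 3).astype("float32")

    mean = X_train.mean(axis=0)  # per-pixel mean over the training set only
    X_train -= mean
    X_test -= mean               # reuse the *training* mean on test images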

Related

Can't overcome Overfitting - GrayScale Images from Numerical Arrays and CNN with PyTorch

I am trying to implement an image classification task for grayscale images that were converted from sensor readings. I initially had time-series data, e.g. acceleration or displacement, which I then transformed into images. Before the transformation, I applied normalization across the data. Each image is 1000x9, where 1000 is the total number of time steps and 9 is the number of data points. The split ratio is 70%, 15%, and 15% for the training, validation, and test sets. There are 10 different labels with 100 images each; it's a multi-class classification task.
An example of my array before image conversion is:
As you can see above, the values are extremely precise and vary only slightly. When I convert them into images, I can still make out the dark and light parts of the image;
Imagine that I have directories D1 to D9 (damaged cases) and UN (healthy case), and there are many images like this in each.
Then I have a CNN network whose goal is classification. But there is a significant overfitting issue, and whatever I do does not work. Here is one of the architectures I've been working on;
Model summary;
I also augment the data. After 250 epochs, this is what I get;
So, what I wonder is this: I tried to apply some regularization and augmentation, but they do not give me solid results. I experimented by changing the number of hidden units, layers, etc. Do you think I need to change my architecture entirely? I basically use two blocks of CNN layers with FC layers at the end. This is not the first time I've worked on images like this, but I cannot mitigate this overfitting issue. I would appreciate it if any of you could give me some solid suggestions so I can get smooth results.

I was also thinking of using some pre-trained models for transfer learning, but the image dimensions cause problems. Do you know if I can use any of those pre-trained models with a 1000x9 image dimension?

I know there are some overfitting topics in the forum, but since these images come from numerical arrays and I could not make it work, I wanted to create a new thread. Thank you!
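For reference, here is a minimal PyTorch sketch of two of the regularization knobs mentioned in the question, dropout and L2 weight decay, wired into a small two-block CNN of the kind described; every layer size here is made up for illustration:

    import torch
    import torch.nn as nn

    # Hypothetical two-block CNN for 1x1000x9 grayscale inputs, 10 classes.
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Dropout(p=0.5),              # dropout on the FC head against overfitting
        nn.LazyLinear(64), nn.ReLU(),   # LazyLinear infers the flattened size
        nn.Linear(64, 10),
    )

    # weight_decay applies L2 regularization on top of the dropout.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

    x = torch.randn(8, 1, 1000, 9)  # a dummy batch of 1000x9 images
    print(model(x).shape)           # torch.Size([8, 10])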

Resize Vs CenterCrop Vs RandomResizedCrop Vs RandomCrop

Can anyone tell me in which situations the above functions are used and how they affect the image size?
I want to resize the Cat V Dogs images and I am a bit confused about how to use them.
There are actually lots of details in the TorchVision documentation.
The typical use cases are object detection and image segmentation tasks, but other uses exist.
Here is a non-exhaustive list of uses:
Resize is used in convolutional neural networks to adapt the input image to the network's input shape; in that case it is not data augmentation, just pre-processing. It can also be used in fully convolutional networks to emulate different scales for an input image, which is data augmentation.
CenterCrop, RandomCrop and RandomResizedCrop are used in segmentation tasks to train a network on fine details without imposing too heavy a burden during training. For example, with a database of 2048x2048 images you can train on 512x512 sub-images and then at test time infer on full-resolution images. Cropping is also used in object detection networks as data augmentation. The resized variant, RandomResizedCrop, combines the crop with the resize operation described above (see the sketch after this list).
All of them potentially change the image resolution.
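To make the distinction concrete, here is a sketch using the torchvision transforms in question; the sizes (224, 256, 512) are arbitrary:

    from torchvision import transforms

    # Pre-processing only: deterministically fit images to the network input.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    # Training-time augmentation: a random crop, resized back to 224x224.
    train_aug = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
        transforms.ToTensor(),
    ])

    # Segmentation-style training on sub-images of large photos.
    seg_train = transforms.Compose([
        transforms.RandomCrop(512),   # e.g. 512x512 patches of 2048x2048 images
        transforms.ToTensor(),
    ])

    # Deterministic evaluation: resize the short side, then crop the centre.
    eval_tf = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])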

How feature map in Keras ConvNet represent features?

I know that it might be a dumb question, but I searched everywhere for an answer and could not find one.
Okay, first, let me properly explain my question.
When I was learning CNNs, I was told that kernels, filters, or activation maps represent a feature of the image.
To be specific, assume cat-image identification: a feature map would represent "whiskers",
and in images where the activation of this feature map is high, it is inferred that a whisker is present in the image, and so the image is a cat. (Correct me if I am wrong.)
Well, now I made a Keras ConvNet, saved the model,
then loaded the model and
saved all the filters as PNG images.
What I saw were 3x3 px images where each pixel was a different colour (green, blue and their various variants, and so on).
So how do these 3x3 px random colour-pattern kernel images represent the "whisker" or any other feature of a cat in any way?
Or how could I know which PNG image is which feature, i.e. which is the whisker-detector filter, etc.?
I am asking this because I might be asked about it in an oral examination by my teacher.
Sorry for the length of the answer (but I had to make it this long to explain properly).
You need to have a further look into how convolutional neural networks operate, the main topic being the convolution itself. Convolving the input image with the filters/kernels produces the feature maps. A feature map is what may highlight important features.
The filters/kernels know nothing of the input data, so when you save them you are only going to see pseudo-random images.
Put simply, where * is the convolution operator,
input_image * filter = feature map
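As a tiny numeric illustration (note that CNN "convolution" is implemented as cross-correlation, hence correlate2d below): a hand-made vertical-edge filter produces a feature map that lights up exactly where the image has a left-to-right intensity jump:

    import numpy as np
    from scipy.signal import correlate2d

    image = np.array([[0, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1],
                      [0, 0, 0, 1, 1]], dtype=float)

    kernel = np.array([[-1, 0, 1],
                       [-1, 0, 1],
                       [-1, 0, 1]], dtype=float)   # vertical-edge detector

    print(correlate2d(image, kernel, mode="valid"))
    # [[0. 3. 3.]
    #  [0. 3. 3.]]   <- high response at the edge, zero in flat regions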
What you want to save, if you want to visualise what is occurring during convolution, are the feature maps. This website gives a very detailed account of how to do so, and it is the method I have used in the past.
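In case that link goes stale, the gist of the method is to build a second model that returns an intermediate layer's output instead of the final prediction. A minimal Keras sketch, with a toy model and a random image standing in for your trained network and an actual cat photo:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    # Toy stand-in for your trained ConvNet.
    inputs = tf.keras.Input(shape=(64, 64, 3))
    x = layers.Conv2D(8, 3, activation="relu", name="conv1")(inputs)
    x = layers.Flatten()(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    # Second model: same input, but it emits the conv layer's activations.
    extractor = tf.keras.Model(inputs, model.get_layer("conv1").output)

    img = np.random.rand(1, 64, 64, 3).astype("float32")  # stand-in image
    feature_maps = extractor.predict(img)
    print(feature_maps.shape)  # (1, 62, 62, 8): one 62x62 map per filter

Saving each feature_maps[0, :, :, i] as an image, rather than the raw 3x3 kernels, is what shows you which parts of the input each filter responds to.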

Image segmentation training set labeling

I am new to PyTorch and deep learning. I am trying to do image segmentation.
But I am stuck at how to label the training set images.
Can anyone please help me?
This is one of my training images.
I have two kinds of plants here - one is a weed and the other is a good crop. I need to label them.
Can anyone tell me how I can do this?
I am going to use deep neural network models (like ResNet) on the labelled data.
There are discussions here about segmentation tools for image labeling. You may find them useful.
Try https://oclavi.com, which is a web-based object annotation tool.
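As background on what the label itself looks like: for segmentation, a label is typically a per-pixel mask with the same height and width as the photo, where each pixel stores a class index; tools like the ones linked above export masks in this form. A small synthetic sketch (the class convention here is made up for this problem):

    import numpy as np

    # Assumed class indices: 0 = soil/background, 1 = crop, 2 = weed.
    H, W = 4, 6
    mask = np.zeros((H, W), dtype=np.uint8)  # everything starts as background
    mask[1:3, 0:2] = 1                       # a patch of crop pixels
    mask[2:4, 4:6] = 2                       # a patch of weed pixels
    print(mask)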

Can I use Keras or a similar CNN tool on a paired image and coordinate?

I am trying to train a classifier to separate images taken by a particle physics detector into two classes. For each image, I also have a coordinate (x,y,z) describing where the particle interaction took place. That coordinate is very useful in understanding these images by eye, but it doesn't have an obvious translation to weighting image pixels.
I've been trying some basic machine learning techniques in scikit-learn, feeding in data points with 103 features: the three coordinate axes and the 10x10 pixels of the image. Those basic techniques aren't cutting it, unfortunately, so I thought I'd try to take advantage of the properties of convolutional neural networks. Since I've never tried that before, Keras seemed like an easy way to get started.
Looking at Keras, I see that I ought to provide an input shape. I could presumably use an input shape of (103,), but if I understand CNNs correctly, I'd lose all the advantages of CNNs for images. Intuitively, what I want the input shape to be is (3) + (10, 10). Is that a sensible concept in the world of CNNs? Can it be done in Keras?
You might want to look into the Merge layer. In essence this allows you to use two independent inputs, maybe give them a few different processing layers, and then combine them for the rest of the model.
With this you could, for example, run several convolutional layers over the image and then simply merge the result with the coordinate inputs.
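A minimal sketch of that idea in the current Keras functional API, where merging is done with a Concatenate layer (the successor of the old Merge layer); all layer sizes here are illustrative:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Image branch: the 10x10 single-channel detector image.
    image_in = tf.keras.Input(shape=(10, 10, 1), name="image")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.Flatten()(x)

    # Coordinate branch: the (x, y, z) interaction point.
    coord_in = tf.keras.Input(shape=(3,), name="coords")

    # Merge the two branches, then classify into the two classes.
    merged = layers.Concatenate()([x, coord_in])
    h = layers.Dense(32, activation="relu")(merged)
    out = layers.Dense(1, activation="sigmoid")(h)

    model = tf.keras.Model(inputs=[image_in, coord_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()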
