I work on a project of classification of images (in 5 types).
The images have various sizes, so I reshape them first and then use a CNN.
I have already pretty good results. However, the type of my images is very correlated with their shape (for example image of type A are very little and image of type B is very big, in 99% of case).
So, I think that my score would significatively increase if I would give to my model the information on the original size of the image. But I don't know how do this... How can I give this information to my model? (I work with Keras)
Thanks in advance,
Larapa
Related
I am trying to implement an image classification task for the grayscale images, which were converted from some sensor readings. It means that I had initially time series data e.g. acceleration or displacement, then I transformed them into images. Before I do the transformation, I did apply normalization across the data. I have a 1000x9 image dimension where 1000 represents the total time step and 9 is the number of data points. The split ratio is 70%, 15%, and 15% for training, validation, and test data sets. There are 10 different labels, each label has 100 images, it's a multi-class classification task.
An example of my array before image conversion is:
As you see above, the precisions are so sensitive. When I convert them into images, I am able to see the darkness and white part of the image;
Imagine that I have a directory from D1 to D9 (damaged cases) and UN (health case) and there are so many images like this.
Then, I have a CNN-network where my goal is to make a classification. But, there is a significant overfitting issue and whatever I do it's not working out. One of the architecture I've been working on;
Model summary;
I also augment the data. After 250 epochs, this is what I get;
So, what I wonder is that I tried to apply some regularization or augmentation but they do not give me kind of solid results. I experimented it by changing the number of hidden units, layers, etc. Would you think that I need to fully change my architecture? I basically consider two blocks of CNN and FC layers at the end. This is not the first time I've been working on images like this, but I cannot mitigate this overfitting issue. I appreciate it if any of you give me some solid suggestions so I can get smooth results. i was thinking to use some pre-trained models for transfer learning but the image dimension causes some problems, do you know if I can use any of those pre-trained models with 1000x9 image dimension? I know there are some overfiting topics in the forum, but since those images are coming from numerical arrays and I could not make it work, I wanted to create a new title. Thank you!
I am testing some well known models for computer vision: UNet, FC-DenseNet103, this implementation
I train them with 224x224 randomly cropped patches and do the same on the validation set.
Now when I run inference on some videos, I pass it the frames directly (1280x640) and it works. It runs the same operations on different image sizes and never gives an error. It actually gives a nice output, but the quality of the output depends on the image size...
Now it's been a long time since I've worked with neural nets but when I was using tensorflow I remember I had to crop the input images to the train crop size.
Why don't I need to do this anymore? What's happening under the hood?
It seems that the models that you are using have no linear layers. Because of this the output of the convolutional layers go straight into the softmax function. The softmax function doesn't take a specific shape for its input so it can take any shape as input. Because of this your model will work with any shape of image but the accuracy of your model will probably be far worse given different image shapes than the one you trained on.
There is always a specific input size in the documentation of the model. You should use this size. These are the current model limitations.
For UNets this may even be a ratio. I think it depends on implementation.
Just a note on resize:
transform.Resize((h,w))
transform.Resize(d)
In case of the (h, w), output size will be matched to this.
In the second case of d size, the smaller edge of the image will be matched to d.
For example, if height > width, then image will be re-scaled to (d * height / width, d)
The idea is to not ruin the aspect ratio of the image.
I have trained a model for image segmentation task on 320x240x3 resolution images using tensorflow 2.x. I am wondering if there is a way to use the same model or tweak the model to make it work on different resolutions?
I have to use a model trained on a 320x240 resolution for Full HD (1920x1080) and SD(1280x720) images but as the GPU Memory is not sufficient to train the model at the specified resolutions with my architecture, I have trained it on 320x240 images.
I am looking for a scalable solution that works at all the resolutions. Any Suggestions?
The answer to your question is no: you cannot use a model trained at a particular resolution to be used at different resolution; in essence, this is why we train the models at different resolutions, to check the performance and possibly improve it.
The suggestion below omits one crucial aspect: that, depending on the task at hand, increasing the resolution can considerably improve the results in object detection and image segmentation, particularly if you have small objects.
The only solution for your problem, considering the GPU memory constraint, is to try to split the initial image into smaller parts (or maybe tiles) and train per part(say 320x240) and then reconstruct the initial image; otherwise, there is no other solution than to increase the GPU memory in order to train at higher resolutions.
PS: I understood your question after reading it a couple of times; I suggest that you modify a little bit the details w.r.t the resolution.
YEAH, you can do it in high resolution image. But the small resolution is easy to train and it is easy for the model to find the features of the image. Training in small resolution models saves your time and makes your model faster since it has the less number of parameters. HD images contains large amount of pixels, so if you train your model in higher resolution images, it makes your training and model slower as it contains large number of parameters due to the presence of higher number of pixels and it makes difficult for your model to find features in the high resolution image. So, mostly your are advisable to use lower resolution instead of higher resolution.
I have been working on a dataset in which the goal is to determine that which type of orientation is it. It is a classification problem in which for each record(for most of them) I am having 4 images - front facing, left facing, right facing and back facing product images.
I want to classify these images in the above 4 categories.
The dataset looks like this :
I have downloaded the images and put them in different folders according to their classes.
Methods I have applied:
Till now I have applied two methods to classify these images.
1) I have tried vgg16 directly to classify the images but it did not give me even 50% accuracy.
2) I converted those images into edge images with black background as:
This is done using canny edge detection. It was done because in the result I was getting images with similar color dresses, similar design dresses, etc.
On top of these I again applied vgg16, resnet50, inception models but nothing seemed to work.
Can you suggest some ideas that can work in my case and classify the images in a better way.
first of all yor data set has to be equally splited. For instance 80% train and 20% test. After that you have to balance these sets (train set 60% of class A images, 40% of class B images) the exact same for test set.
I know that it might be a dumb question, but I searched everywhere for an answer but I could not get.
Okay first properly explaining my question,
When I was learning CNN I was told that kernels or filters or activation map represent a feature of image.
To be specific, assume a cat image identification, a feature map would represent a "whiskers"
and in images which the activation of this feature map would be high it is inferred as whisker is present in image and so the image is a cat. (Correct me if I am wrong)
Well now when I made a Keras ConvNet I save the model
and then loaded the model and
saved all the filters to png images.
What I saw was 3x3 px images where each each pixel was of different colour (green, blue or their various variants and so on)
So how these 3x3px random colour pattern images of kernels represent in any way the "whisker" or any other feature of cat?
Or how could I know which png images is which feature ie which is whisker detector filter etc?
I am asking this because I might be asked in oral examination by teacher.
Sorry for the length of answer (but I had to make it so to explain properly)
You need to have a further look into how convolutional neural networks operate: the main topic being the convolution itself. The convolution occurs with the input image and filters/kernels to produce feature maps. A feature map is what may highlight important features.
The filters/kernels do not know anything of the input data so when you save these you are only going to see psuedo-random images.
Put simply, where * is the convolution operator,
input_image * filter = feature map
What you want to save, if you want to vizualise what is occuring during convolution, are the feature maps. This website gives a very detailed account on how to do so, and it is the method I have used in the past.