I have a data set containing over 10,000 images of dogs. I plan to use TensorFlow to classify dog breeds, but the images come in arbitrary sizes: some are 200x280 pixels, others 100x140 pixels, and so on. I would like to standardize the images so that they are smaller and all the same dimensions. Is this possible? I realize it will skew the images, but that is not a huge concern. I just need to be able to build a model.
See the TensorFlow image resizing docs.
It's also worth noting that you don't have to resize the images with TensorFlow. You could do it as a preprocessing step with OpenCV. Whether you do it with TensorFlow comes down to the usual tradeoffs between preprocessing inside or outside the graph: doing it in the graph means one less step when serving predictions, but it also makes the graph more computationally expensive to train, among other tradeoffs.
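To make the idea of "resize everything to one fixed shape" concrete, here is a minimal NumPy sketch of nearest-neighbour resizing done as an offline preprocessing step. It is only an illustration: the helper name resize_nearest, the 128x128 target size, and the random stand-in images are assumptions, and in practice you would more likely call the resize function of TensorFlow or OpenCV, which also offer smoother interpolation.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an H x W (x C) array to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    # For each output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# Stand-in images (height x width x RGB) with the sizes mentioned in the question.
images = [np.random.rand(280, 200, 3), np.random.rand(140, 100, 3)]

# After resizing, all images share one shape and can be stacked into a batch.
# The aspect ratio is not preserved, so the images are skewed, as the question accepts.
batch = np.stack([resize_nearest(img, 128, 128) for img in images])
print(batch.shape)
```

Because every image ends up with the same shape, the whole data set can be stored as a single array and fed to a model without any resizing inside the graph.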
Related
I am trying to implement an image classification task for grayscale images that were converted from sensor readings. That is, I initially had time-series data (e.g. acceleration or displacement), which I then transformed into images. Before the transformation, I normalized the data. Each image is 1000x9, where 1000 is the number of time steps and 9 is the number of data points. The split ratio is 70%, 15%, and 15% for the training, validation, and test sets. There are 10 different labels with 100 images each, so it's a multi-class classification task.
An example of my array before image conversion is:
As you can see above, the values differ only at very fine precision. When I convert them into images, I can see the dark and white parts of the image:
Imagine that I have directories D1 to D9 (damaged cases) and UN (healthy case), each containing many images like this.
I then have a CNN whose goal is to classify them. But there is a significant overfitting issue, and whatever I do, it's not working out. One of the architectures I've been working on:
Model summary:
I also augment the data. After 250 epochs, this is what I get:
So, what I wonder is this: I tried applying regularization and augmentation, but they do not give me solid results. I experimented with changing the number of hidden units, layers, etc. Do you think I need to change my architecture completely? I basically use two CNN blocks with FC layers at the end. This is not the first time I've worked on images like this, but I cannot mitigate the overfitting. I would appreciate any solid suggestions so that I can get smooth results. I was thinking of using pre-trained models for transfer learning, but the image dimensions cause some problems. Do you know if I can use any of those pre-trained models with 1000x9 images? I know there are some overfitting topics in the forum, but since these images come from numerical arrays and I could not make it work, I wanted to create a new topic. Thank you!
I have trained a model for an image segmentation task on 320x240x3 images using TensorFlow 2.x. I am wondering whether there is a way to use the same model, or tweak it, so that it works at different resolutions.
I need to apply the model to Full HD (1920x1080) and SD (1280x720) images, but because GPU memory is not sufficient to train at those resolutions with my architecture, I trained it on 320x240 images.
I am looking for a scalable solution that works at all resolutions. Any suggestions?
The answer to your question is no: you cannot take a model trained at one resolution and use it at a different resolution. In essence, this is why we train models at different resolutions: to check the performance and possibly improve it.
The suggestion below omits one crucial aspect: that, depending on the task at hand, increasing the resolution can considerably improve the results in object detection and image segmentation, particularly if you have small objects.
The only solution to your problem, given the GPU memory constraint, is to split the initial image into smaller parts (tiles), train per part (say 320x240), and then reconstruct the initial image; otherwise, there is no option other than increasing the GPU memory in order to train at higher resolutions.
PS: I only understood your question after reading it a couple of times; I suggest that you revise the details regarding the resolution a little.
Yes, you can do it with high-resolution images. But a small resolution is easier to train on, and it is easier for the model to find the features of the image. Training at a small resolution saves time and makes your model faster, since it has fewer parameters. HD images contain a large number of pixels, so training on higher-resolution images makes both training and the model slower, because the larger number of pixels leads to a larger number of parameters and makes it harder for the model to find features. So, in most cases, you are advised to use a lower resolution instead of a higher one.
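The tiling idea can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a complete pipeline: the helper names split_into_tiles and merge_tiles are made up for this example, the tiles do not overlap, and zero-padding is used because 1080 is not a multiple of 240. A real pipeline would run the segmentation model on the tiles between the two steps.

```python
import numpy as np

TILE_H, TILE_W = 240, 320  # tile size matching the 320x240 training resolution

def split_into_tiles(image, tile_h=TILE_H, tile_w=TILE_W):
    """Zero-pad an H x W x C image to a multiple of the tile size and cut it into tiles."""
    h, w = image.shape[:2]
    pad_h = (-h) % tile_h
    pad_w = (-w) % tile_w
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    n_rows, n_cols = padded.shape[0] // tile_h, padded.shape[1] // tile_w
    tiles = (padded
             .reshape(n_rows, tile_h, n_cols, tile_w, -1)
             .swapaxes(1, 2)  # -> (n_rows, n_cols, tile_h, tile_w, C)
             .reshape(n_rows * n_cols, tile_h, tile_w, -1))
    return tiles, (n_rows, n_cols), (h, w)

def merge_tiles(tiles, grid, original_size):
    """Reassemble tiles produced by split_into_tiles and crop away the padding."""
    n_rows, n_cols = grid
    tile_h, tile_w = tiles.shape[1:3]
    full = (tiles
            .reshape(n_rows, n_cols, tile_h, tile_w, -1)
            .swapaxes(1, 2)  # -> (n_rows, tile_h, n_cols, tile_w, C)
            .reshape(n_rows * tile_h, n_cols * tile_w, -1))
    h, w = original_size
    return full[:h, :w]

frame = np.random.rand(1080, 1920, 3)  # stand-in for a Full HD image
tiles, grid, size = split_into_tiles(frame)
# A real pipeline would run the model on `tiles` here.
restored = merge_tiles(tiles, grid, size)
print(tiles.shape, grid, restored.shape)
```

With non-overlapping tiles, predictions can show seams at tile borders; overlapping tiles and blending the overlaps is a common refinement that this sketch leaves out.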
Can anyone tell me in which situations the above functions are used and how they affect the image size?
I want to resize the Cats vs Dogs images, and I am a bit confused about how to use them.
There are actually lots of details in the TorchVision documentation.
The typical use case is for object detection or image segmentation tasks, but other uses could exist.
Here is a non-exhaustive list of uses:
Resize is used in Convolutional Neural Networks to adapt the input image to the network input shape; in this case it is not data augmentation but just pre-processing. It can also be used in Fully Convolutional Networks to emulate different scales for an input image; this is data augmentation.
CenterCrop, RandomCrop, and RandomResizedCrop are used in segmentation tasks to train a network on fine details without imposing too much burden during training. For example, with a database of 2048x2048 images, you can train on 512x512 sub-images and then, at test time, infer on full-resolution images. They are also used in object detection networks as data augmentation. The resized variant lets you combine the crop with the previous resize operation.
All of them potentially change the image resolution.
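To see how the crop operations affect the image size, here is a small NumPy sketch of a center crop and a random crop. It only mimics the idea and is not TorchVision's implementation: the function names center_crop and random_crop and the square crop size are assumptions made for illustration.

```python
import numpy as np

def center_crop(image, size):
    """Cut a size x size window from the middle of an H x W (x C) array."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def random_crop(image, size, rng):
    """Cut a size x size window at a random position (used as data augmentation)."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

rng = np.random.default_rng(0)
image = np.zeros((2048, 2048, 3), dtype=np.uint8)  # stand-in for a 2048x2048 image

# Both crops return 512x512 sub-images; only the position of the window differs.
print(center_crop(image, 512).shape)
print(random_crop(image, 512, rng).shape)
```

The center crop always returns the same window, so it suits evaluation, while the random crop returns a different window on each call, which is what makes it useful for augmentation.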
I'm currently performing a pixel-based classification of an image using simple supervised classifiers implemented in Scikit-learn. The image is first reshaped into a vector of single pixel intensities, then the training and the classification are carried out as in the following:
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier(verbose=True)
classifier.fit(training_data, training_target)
predictions = classifier.predict(test_data)
The problem with pixel-based classification is the noisy nature of the resulting classified image. To prevent it, I wanted to use Graph Cut (e.g. the Boykov-Kolmogorov implementation) to take into account the spatial context between pixels. But the implementations I found in Python (NetworkX, Graph-tool) and in C++ (OpenGM and the original implementation: [1] and [2]) don't show how to go from an image to a graph, except for [2], which is in MATLAB, and I'm not really familiar enough with either Graph Cut or MATLAB.
So my question is basically how can graph cuts be integrated into the previous classification (e.g. before the training or as a post-processing)?
I had a look at the graph algorithms in Scikit-image (here), but these work only on RGB images with discrete values, whereas my pixel values are continuous.
I found this image restoration tutorial, which does more or less what I was looking for. It uses a Python wrapper library (PyMaxflow) to call the maxflow algorithm to partition the graph.
It starts from the noisy image on the left, and takes into account the spatial constraint between pixels, to obtain the binary image on the right.
I am trying to train a classifier to separate images taken by a particle physics detector into two classes. For each image, I also have a coordinate (x,y,z) describing where the particle interaction took place. That coordinate is very useful in understanding these images by eye, but doesn't have an obvious translation to weighting image pixels.
I've been trying some basic machine learning techniques in scikit-learn, feeding in data points with 103 features: the three axes of the coordinates, and the 10x10 pixels of the image. Those basic techniques aren't cutting it, unfortunately, so I thought I'd try to take advantage of the properties of convolutional neural networks. Since I've never tried that before, Keras seemed like an easy way to get started.
Looking at Keras, I see that I ought to provide an input shape. I could presumably use an input shape of (103), but if I understand CNNs correctly, I'd lose all the advantages of a CNN for images. Intuitively, what I want the input shape to be is (3)+(10,10). Is that a sensible concept in the world of CNNs? Can it be done in Keras?
You might want to look into the Merge layer (in current Keras versions, the Concatenate layer used with the functional API). In essence, this allows you to use two independent inputs, maybe give each of them a few different processing layers, and then combine them for the rest of the model.
With this you could, for example, do several convolutional layers to process the image and then simply merge it with the coordinate inputs.
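A minimal sketch of such a two-input model, assuming TensorFlow 2.x with the Keras functional API, where the Concatenate layer plays the role of the older Merge layer. The layer sizes, the input names, and the single sigmoid output for the two classes are illustrative assumptions, not a tuned architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two independent inputs: the 10x10 single-channel image and the (x, y, z) coordinate.
image_in = keras.Input(shape=(10, 10, 1), name="image")
coord_in = keras.Input(shape=(3,), name="coord")

# Convolutional branch that processes only the image.
x = layers.Conv2D(16, 3, activation="relu", padding="same")(image_in)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)

# Merge the image features with the raw coordinates, then classify.
merged = layers.Concatenate()([x, coord_in])
h = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(h)  # two classes -> one probability

model = keras.Model(inputs=[image_in, coord_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch of 4 samples, just to check that the shapes line up.
dummy_images = np.zeros((4, 10, 10, 1), dtype="float32")
dummy_coords = np.zeros((4, 3), dtype="float32")
preds = model.predict([dummy_images, dummy_coords], verbose=0)
print(preds.shape)
```

Training works the same way as for a single-input model, except that the two inputs are passed together as a list, in the same order as they were given to the model.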