Multiple Images as one input in Keras CNN

I am trying to build an image classification model with Keras that should be able to classify cups as "good condition" or "defect". This was easy to do with a single image as input. However, I now want to try feeding 6 images, one from every angle (top, bottom, side, etc.), as input. What would be the best approach for this? My initial idea was a NumPy array of shape (6, width, height, 3), but this proved unsuitable.
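One possible approach (a minimal sketch, not from the original question) is a multi-input functional model with one input per angle and a single shared CNN branch, so all six views are classified together. The layer sizes and the 128x128 resolution below are illustrative assumptions.

from tensorflow.keras import layers, models

height, width = 128, 128  # assumed input resolution (illustrative)

# one CNN branch whose weights are shared across all six views
shared_cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(height, width, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

view_inputs = [layers.Input((height, width, 3)) for _ in range(6)]  # top, bottom, sides...
features = [shared_cnn(v) for v in view_inputs]
merged = layers.concatenate(features)
output = layers.Dense(1, activation="sigmoid")(merged)  # "good condition" vs. "defect"

model = models.Model(view_inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# train with a list of six arrays, each of shape (num_samples, height, width, 3)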

Related

Tensorflow Keras: Problems handling variable-length input using a generator?

We want to train our model on varying input dimensions. Every input in a given batch and across batches has different dimensions.
We cannot resize our input (since we'll lose our microscopic features), so converting the inputs into batches of NumPy arrays is impossible. To handle this, I have made a list of the inputs, where each element has shape (height, width, 1). Height is variable and width is constant.
Sometimes my input is excessively large, so I plan to use model.fit_generator(). In it, we find the max height and width of the inputs in a batch and pad every other input with zeros, so that every input in the batch has equal dimensions. Now we can easily convert the batch to a NumPy array or a tensor and pass it to fit_generator(). The model automatically learns to ignore the zeros and learns features from the intended portion of the padded input. This way we have a batch with equal input dimensions, but every batch has a different shape (due to differences in the max height and width across batches).
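Here is a minimal sketch (illustrative code, not the asker's) of such a batch-padding generator, written as a tf.keras.utils.Sequence; all names and the constant width are assumptions:

import numpy as np
import tensorflow as tf

class PaddedBatchSequence(tf.keras.utils.Sequence):
    """Yields batches zero-padded to the max height within each batch."""
    def __init__(self, samples, labels, batch_size, width):
        self.samples = samples          # list of arrays of shape (h_i, width, 1)
        self.labels = labels
        self.batch_size = batch_size
        self.width = width

    def __len__(self):
        return int(np.ceil(len(self.samples) / self.batch_size))

    def __getitem__(self, idx):
        xs = self.samples[idx * self.batch_size:(idx + 1) * self.batch_size]
        ys = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        max_h = max(x.shape[0] for x in xs)  # tallest sample in this batch
        batch = np.zeros((len(xs), max_h, self.width, 1), dtype=np.float32)
        for i, x in enumerate(xs):
            batch[i, :x.shape[0]] = x        # zero-pad the remainder
        return batch, np.asarray(ys)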
Up to this point, I have described what I have learned and what I plan to do with variable-length input data. But I am stuck on the following questions:
1. I plan to use a CNN first and then an LSTM, in tensorflow.keras, which provides padding and masking facilities. As far as I know, an LSTM with masking can ignore 0-padded values. However, I am concerned about the CNN (does a CNN ignore 0-padded values?), because my padded input will be fed to the CNN first. I have seen some discussion in the following links:
How to apply masking layer to sequential CNN model in Keras?
https://github.com/keras-team/keras/issues/411
In those links, it is mentioned that masking is unfortunately not yet supported by the Keras Conv layers. However, there has since been a lot of development and advancement, specifically in tensorflow.keras. So I am wondering: does tensorflow.keras now support masking for such input?
2. To use a generator, we can write a custom Keras generator. For that I went through a very good tutorial and had made up my mind to use it. But I am wondering: is there any more advanced built-in facility in tensorflow.keras for generators that would save me from writing a custom one?

Torch model forward with a different image size

I am testing some well-known computer vision models: UNet, FC-DenseNet103, this implementation.
I train them with 224x224 randomly cropped patches and do the same on the validation set.
Now when I run inference on some videos, I pass the model the frames directly (1280x640) and it works: it runs the same operations on different image sizes and never gives an error. It actually gives a nice output, but the quality of the output depends on the image size...
Now, it's been a long time since I've worked with neural nets, but when I was using TensorFlow I remember I had to crop the input images to the training crop size.
Why don't I need to do this anymore? What's happening under the hood?
It seems that the models you are using have no linear (fully connected) layers. Because of this, the output of the convolutional layers goes straight into the softmax function, and softmax doesn't require a specific input shape. This is why your model works with any image shape, but its accuracy will probably be far worse on image shapes different from the one you trained on (a quick sketch illustrating this follows below).
There is always a specific input size in the documentation of the model, and you should use it; these are the current limitations of the model.
For U-Nets this may even be a ratio; I think it depends on the implementation.
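To illustrate (a quick sketch with assumed toy layer sizes, not one of the models above): a purely convolutional stack accepts any spatial size, because no layer hard-codes the height and width.

import torch
import torch.nn as nn

fully_conv = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=3, padding=1),  # two output classes, no Linear layer
    nn.Softmax(dim=1),                          # softmax over the class channel
)
print(fully_conv(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 2, 224, 224])
print(fully_conv(torch.randn(1, 3, 640, 1280)).shape)  # torch.Size([1, 2, 640, 1280])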
Just a note on resize:

from torchvision import transforms

transforms.Resize((h, w))  # output size is matched to (h, w) exactly
transforms.Resize(d)       # the smaller edge of the image is matched to d

In the (h, w) case, the output size is matched to it exactly. In the single-size d case, the smaller edge of the image is matched to d while preserving the aspect ratio: for example, if height > width, the image is rescaled to (d * height / width, d). The idea is to not ruin the aspect ratio of the image.
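For example, a small illustrative check using torchvision and PIL:

from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (400, 800))   # PIL size is (width, height)
out = transforms.Resize(200)(img)    # smaller edge (width=400) matched to 200
print(out.size)                      # (200, 400): aspect ratio preserved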

How to use ImageDataGenerator with multi-label masks for multi-class image segmentation?

In order to do multiclass segmentation, the masks need to be one-hot encoded. For example, if I have 100 images of shape 224x224x3 with 5 different classes, I would have a set of masks with shape (100, 224, 224, 5), i.e. the last dimension (the channel) refers to the class of the pixel. Given a grayscale mask that contains 6 classes, where each pixel has a label 1-6, I can easily convert it to the categorical mask I need using tf.keras.utils.to_categorical.
If I use the ImageDataGenerator provided with Keras, I know I can create a generator for both images and masks and then zip them together (as the code below shows), but where I'm confused is how to convert the masks into this categorical one-hot encoded structure while using the ImageDataGenerator. The ImageDataGenerator only finds files in directories that are saved as images, so I can't convert the masks and then save them down as NumPy arrays (the one-hot encoded masks) for the generator to pick up, as images can't have more than 4 channels, right? Is there some way of telling the generator to do this conversion? Or does this limit the number of classes I can have in my problem?
One solution is to write my own custom generator with the Sequence class, which I have done, but I'm keen on understanding whether this is possible with Keras' inbuilt ImageDataGenerator. Could writing a Lambda layer on the network be the solution? A sketch of one possible workaround follows after the code.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# converts a 224x224 grayscale mask to its one-hot encoded version
mask_categorical = tf.keras.utils.to_categorical(mask)

imageDataGen = ImageDataGenerator(rescale=1/255.)
maskDataGen = ImageDataGenerator()
imageGenerator = imageDataGen.flow_from_directory("dataset/image/",
                                                  class_mode=None, seed=40)
maskGenerator = maskDataGen.flow_from_directory("dataset/mask/",
                                                class_mode=None, seed=40)
trainGenerator = zip(imageGenerator, maskGenerator)
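One way to get the categorical masks without saving multi-channel images (a minimal sketch, not a built-in Keras facility; num_classes is an assumed parameter) is to wrap the zipped generators and one-hot encode each mask batch on the fly:

def categorical_train_generator(image_gen, mask_gen, num_classes):
    for images, masks in zip(image_gen, mask_gen):
        labels = masks[..., 0].astype("uint8")  # grayscale masks load as RGB; take one channel
        yield images, tf.keras.utils.to_categorical(labels, num_classes)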

What is the correct way to reshape image segmentation model output from 2D (num of classes, num of pixels) to 3D channels-last images?

I am using Keras and Python for satellite image segmentation. It is my understanding that to get pixel-level predictions for image segmentation, the model reshapes a layer of dimension (-1, num_classes, height, width) to shape (-1, num_classes, height*width). This is then followed by applying an activation function like softmax or sigmoid. My question is how to recover images after this step, back in either channels-first or channels-last format.
Example code:
o = Reshape((num_classes, outputHeight * outputWidth))(o)
o = Permute((2, 1))(o)  # -> (outputHeight*outputWidth, num_classes)
o = Activation('softmax')(o)
I have tried adding the following layer at the end of the model:
o = (Reshape((outputHeight, outputWidth, num_classes)))(o)
Is this correct? Will this reorient the image pixels in the same order as the original or not?
Another alternative may be to use the following code on individual images:
array.reshape(height, width, num_classes)
Which method should I use to get pixel-level segmentation results?
No: if you are interested in image segmentation, you should not flatten and then reshape your tensors. Instead, use a fully convolutional model, like the U-Net. You can find a lot of example implementations of it on GitHub, e.g. here.
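For instance, a fully convolutional head keeps the output spatial and applies softmax per pixel, with no flattening (a minimal sketch with assumed layer sizes):

from tensorflow.keras import layers, models

num_classes = 5  # assumed number of classes
inp = layers.Input((None, None, 3))  # fully convolutional: accepts any height and width
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
out = layers.Conv2D(num_classes, 1, activation="softmax")(x)  # per-pixel softmax over channels
seg_model = models.Model(inp, out)  # output: (batch, H, W, num_classes), channels-last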

CNN-LSTM Image Classification

Is it possible to reshape a 512x512 RGB image to (timestep, dim)? In other words, I am trying to adapt this reshape layer, Reshape((23, 3887)), from 299x299 inputs (note that 23 * 3887 = 299 * 299) to 512x512. Also, is there any documentation explaining how to determine input_dim and timestep for Keras?
It seems like your problem is similar to one that I had earlier today. Look at it here: Keras functional API: Combine CNN model with a RNN to look at sequences of images
Now, to add to the answer from the question I linked to: let the number of images be n. In your case the original data format would be (n, 512, 512, 3). All you then need to do is decide how many images you want per sequence. Say you want sequences of 5 images and have 5000 images in total; then reshaping to (1000, 5, 512, 512, 3) should do. This way the model sees 1000 sequences of 5 images.
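A rough sketch of that approach (all layer sizes and the 10-class output are illustrative assumptions): reshape to sequences, run a CNN per frame with TimeDistributed, then feed the per-frame features to an LSTM.

import numpy as np
from tensorflow.keras import layers, models

data = np.zeros((50, 512, 512, 3), dtype="float32")  # toy stand-in for the image set
sequences = data.reshape(10, 5, 512, 512, 3)         # 10 sequences of 5 frames each

frame_cnn = models.Sequential([
    layers.Conv2D(8, 3, activation="relu", input_shape=(512, 512, 3)),
    layers.GlobalAveragePooling2D(),  # one feature vector per frame
])
model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(5, 512, 512, 3)),  # CNN applied per frame
    layers.LSTM(32),                                                  # across the 5 timesteps
    layers.Dense(10, activation="softmax"),                           # 10 classes, assumed
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")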
