Why does Grad-CAM need a resize to 224 x 224? - conv-neural-network

I implemented Grad-CAM as per https://github.com/1Konny/gradcam_plus_plus-pytorch.
However, when I run Grad-CAM with different image sizes, I get dramatically different CAMs. The model was trained on 50x50 images.
Grad-CAM with a 224x224 image
Grad-CAM with a 50x50 image

You need to scale the input image to match the size the model was trained on, which in your case is 50 x 50. However, if your model had been trained on 224 x 224 images (say, a pre-trained network), I would stick to inputting 224 x 224 images.
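A minimal sketch of that preprocessing (my own illustration, assuming a torchvision pipeline; the Grad-CAM call itself follows the linked repository, whose exact signature you should check):

import torch
from torchvision import transforms
from PIL import Image

# Resize to the resolution the model was trained on (50x50 here), then convert to a tensor.
preprocess = transforms.Compose([
    transforms.Resize((50, 50)),
    transforms.ToTensor(),
])

img = Image.open("example.jpg").convert("RGB")   # hypothetical input image
input_tensor = preprocess(img).unsqueeze(0)      # shape: (1, 3, 50, 50)

# `gradcam` stands in for the GradCAM object built from the linked repo;
# its exact call signature depends on that implementation.
# mask, _ = gradcam(input_tensor)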

Related

Can someone explain how to scale up small images for CNN training in PyTorch?

I'm training a CNN on ImageNet. This dataset contains images as small as 96 x 96 pixels and others as big as 1600 x 1200 pixels. I understand scaling down big images, but I cannot understand scaling up small images: how is it done and how does it affect training?
If you want to stick to PyTorch:
Follow the discussion and the link to the PyTorch tutorial.
https://discuss.pytorch.org/t/how-do-i-resize-imagenet-image-to-224-x-224/66979
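For instance, a minimal torchvision sketch (an illustration with a hypothetical filename and a 224 x 224 target size) upscales with bilinear interpolation:

import torchvision.transforms as T
from PIL import Image

img = Image.open("small_image.jpg")    # e.g. a 96x96 image (hypothetical filename)
resize = T.Resize((224, 224))          # bilinear interpolation by default
img_resized = resize(img)              # upsampled to 224x224; interpolation fills in the new pixels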
Another option would be to use the imutils Python package.
import imutils
frame_resized = imutils.resize(frame, width=new_width)  # preserves the aspect ratio
The most popular option is to use OpenCV.
import cv2
frame_resized = cv2.resize(frame, (new_width, new_height))

What should be the size of the input image for training a YOLOv3 model architecture CNN?

I've implemented a YOLOv3 from scratch and I plan to fine-tune using MS-COCO weights for some different data.
The dataset I've chosen has images of size 720 x 1280.
When I go through the YOLOv3 paper, the first Conv2d layer has filter_size = 3 and stride = 1, and the output size is 256 x 256....
Can someone give me a walkthrough of how the YOLO training part works here?
From Yolov3 paper:
If the best possible accuracy/mAP is what you want, use 608 x 608 as the input layer size in the config.
If you want good inference speed at the cost of accuracy, use 320 x 320.
If a balanced model is what you want, use 416 x 416.
Note that the first layer automatically resizes your images to the input layer size of the YOLOv3 CNN, so you do not need to convert your 1280 x 720 images to the input layer size yourself.
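As an illustration of that resizing step (my own sketch, not part of the original answer; the exact behaviour, plain resize versus letterbox padding, varies between implementations), preprocessing a 1280 x 720 frame for a 416 x 416 input layer looks roughly like this:

import cv2
import numpy as np

frame = cv2.imread("frame.jpg")                        # hypothetical 1280x720 input frame
net_size = 416                                         # input layer size taken from the config

# Plain resize to the network input size; many implementations instead
# letterbox-pad the image to preserve its aspect ratio.
resized = cv2.resize(frame, (net_size, net_size))
blob = resized[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
blob = np.transpose(blob, (2, 0, 1))[None]             # shape: (1, 3, 416, 416)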
I suggest you read the following:
To understand how YOLOv3 works, read this blog post.
To understand some of the basics, read the original site.
Learn how to train your own custom object detector here.

How to use ImageDataGenerator with multi-label masks for multi-class image segmentation?

In order to do multiclass segmentation the masks need to be one-hot encoded. For example, if I have 100 images of shape 224x224x3 with 5 different classes, I would have a set of masks with shape (100, 224, 224, 5), i.e. the last dimension (the channel) refers to the class of the pixel. Given a grayscale mask that contains 6 classes, where each pixel has a label 1-6, I can easily convert it to the categorical mask I need using tf.keras.utils.to_categorical.
If I use the ImageDataGenerator provided with Keras, I know I can create a generator for both images and masks and then zip them together (as the code below shows), but what confuses me is how to convert the masks into this categorical one-hot-encoded structure while using the ImageDataGenerator. The ImageDataGenerator only finds files in directories that are saved as images, so I can't convert the masks and then save them as numpy arrays (the one-hot-encoded masks) for the generator to pick up, since images can't have more than 4 channels, right? Is there some way of telling the generator to do this conversion? Or does this limit the number of classes I can have in my problem?
One solution is to write my own custom generator with the Sequence class, which I have done, but I'm keen to understand whether this is possible with the built-in Keras ImageDataGenerator. Could writing a Lambda layer on the network be the solution?
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

mask_categorical = tf.keras.utils.to_categorical(mask)  # converts a 224x224 grayscale mask to its one-hot-encoded version

imgDataGen = ImageDataGenerator(rescale=1/255.)
maskDataGen = ImageDataGenerator()

# The matching seeds keep image and mask batches aligned.
imageGenerator = imgDataGen.flow_from_directory("dataset/image/",
                                                class_mode=None, seed=40)
maskGenerator = maskDataGen.flow_from_directory("dataset/mask/",
                                                class_mode=None, seed=40)
trainGenerator = zip(imageGenerator, maskGenerator)
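A minimal sketch of the custom-generator route mentioned above (my own illustration, not the poster's code): wrap the zipped generators in a plain Python generator that applies to_categorical on the fly, assuming the masks store integer labels 0..num_classes-1 and the mask generator loads them as single-channel arrays.

def train_generator(image_gen, mask_gen, num_classes):
    # Yields (image_batch, one_hot_mask_batch) pairs without ever writing
    # multi-channel masks to disk.
    for img_batch, mask_batch in zip(image_gen, mask_gen):
        labels = mask_batch[..., 0].astype("int32")   # integer class label per pixel
        yield img_batch, tf.keras.utils.to_categorical(labels, num_classes=num_classes)

trainGenerator = train_generator(imageGenerator, maskGenerator, num_classes=6)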

How are multiple images processed in a CNN?

In a normal ANN each training sample is represented by a row of a matrix, and in that way batches of training data can be processed. But how are multiple images processed in a CNN?
It is the same as with an ANN: you can stack the images into an n-dimensional tensor to be processed.
For CNNs trained on images, say your dataset consists of RGB (3-channel) images that are 256x256 pixels. A single image can be represented by a 3 x 256 x 256 tensor. If you set your batch size to 10, you are stacking 10 images together into a 10 x 3 x 256 x 256 tensor.
Tuning the batch size is one of the aspects of getting training right: if your batch size is too small, there will be a lot of variance within a batch and your training loss curve will bounce around a lot. But if it's too large, your GPU will run out of memory to hold it, or training will progress too slowly to see whether the optimization is diverging early on.
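As a small illustration (not from the original answer), stacking a batch of images into a single tensor in PyTorch looks like this; a DataLoader performs this stacking automatically:

import torch

# Ten hypothetical RGB images, each of shape (3, 256, 256).
images = [torch.rand(3, 256, 256) for _ in range(10)]

# Stack them along a new leading batch dimension.
batch = torch.stack(images)
print(batch.shape)  # torch.Size([10, 3, 256, 256])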

Tensorflow: Convolve each image with a different kernel

In TensorFlow, how do I convolve each image in a minibatch with a different 2D kernel? Each minibatch of images has size [10000, 32, 32] and the corresponding filter has size [10000, 2, 2]---10000 kernels, each 2 pixels x 2 pixels. I'd like to get output with size [10000, 31, 31]. (I plan to set the stride lengths all to 1 and to use the "VALID" option to turn off padding, so the output images would have size 31x31 while the input images have size 32x32.)
In a related question, the solution was to add a "depth" dimension to the minibatch of images and then to use conv3d rather than conv2d. But in that problem, the OP seemed content to get just one image back as output, rather than one output image for each sample in the minibatch.
Ah, the tf.nn.depthwise_conv2d function does exactly what I wanted. I don't think there was any way to use conv2d or conv3d for the task.
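A sketch of how this can be wired up (my own illustration, assuming TF2-style eager code, not the poster's implementation): fold the batch dimension into the channel dimension so that depthwise_conv2d applies one kernel per image.

import tensorflow as tf

batch = tf.random.normal([10000, 32, 32])     # one 32x32 image per example
kernels = tf.random.normal([10000, 2, 2])     # one 2x2 kernel per example

# Fold the batch into the channel dimension: shape [1, 32, 32, 10000].
x = tf.transpose(batch, [1, 2, 0])[tf.newaxis, ...]
# Depthwise filters have shape [height, width, in_channels, channel_multiplier].
f = tf.transpose(kernels, [1, 2, 0])[..., tf.newaxis]      # [2, 2, 10000, 1]

out = tf.nn.depthwise_conv2d(x, f, strides=[1, 1, 1, 1], padding="VALID")
# out has shape [1, 31, 31, 10000]; move the channels back to the batch axis.
result = tf.transpose(out[0], [2, 0, 1])                   # [10000, 31, 31]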
