This example allows the classification of images with scikit-learn:
http://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html
However, it is important that all the images have the same size (width and height, as written in the comments).
How can I modify this code to allow classification of images with different sizes?
You will need to define your own feature extraction.
In the example above, every pixel represents a feature. If your images have different sizes, the most trivial (but certainly not the best) thing you can do is pad all images to the size of the largest image with, for example, white pixels.
Here is an example of how to add borders to an image.
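A minimal sketch of that padding idea, assuming the images are already loaded as 2-D NumPy arrays (the sizes and fill value below are illustrative):

import numpy as np

def pad_to_size(img, target_h, target_w, fill=255):
    # Pad a grayscale image with white pixels so it becomes target_h x target_w.
    h, w = img.shape
    padded = np.full((target_h, target_w), fill, dtype=img.dtype)
    padded[:h, :w] = img  # place the original image in the top-left corner
    return padded

images = [np.zeros((8, 8), dtype=np.uint8), np.zeros((10, 12), dtype=np.uint8)]
max_h = max(im.shape[0] for im in images)
max_w = max(im.shape[1] for im in images)
# pad every image to the largest size, then flatten each into a fixed-length feature vector
features = np.stack([pad_to_size(im, max_h, max_w).ravel() for im in images])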
This is not a generic question about anchor boxes, or Faster-RCNN, or anything related to theory. This is a question about how anchor boxes are implemented in pytorch, as I am new to it. I have read this code, along with a lot of other stuff in the torch repo:
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/anchor_utils.py
Is the "sizes" argument to AnchorGenerator with respect to the original image size, or with respect to the feature map being output from the backbone?
To simplify and make this more concrete, let's say I'm only ever interested in detecting objects that are 32x32 pixels in my input images. So my anchor box aspect ratio will definitely be 1.0, as height=width. But is the size that I put into AnchorGenerator 32? Or do I need to do some math using the backbone (e.g. I have two 2x2 max pooling layers with stride 2, so the size that I give AnchorGenerator should be 32/(2^2) = 8)?
Is the "sizes" argument to AnchorGenerator with respect to the
original image size, or with respect to the feature map being output
from the backbone?
The sizes argument is the size of each anchor box as applied on the input image (not the feature map). If you are interested in detecting objects that are 32x32 pixels, you should use:
anchor_generator = AnchorGenerator(sizes=((32,),),
                                   aspect_ratios=((1.0,),))
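For context, here is a hedged sketch of where that generator plugs in: torchvision's FasterRCNN accepts it via the rpn_anchor_generator argument (the MobileNetV2 backbone and num_classes below are illustrative placeholders, not something from the question):

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Any feature extractor works as a backbone as long as it exposes out_channels.
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280

# One tuple of sizes / aspect ratios per feature map (a single feature map here).
anchor_generator = AnchorGenerator(sizes=((32,),),
                                   aspect_ratios=((1.0,),))

# Pool ROIs from the single feature map produced by the backbone.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                output_size=7,
                                                sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=2,  # e.g. background + one object class
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)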
I am here because I am in need of some advice...
I am working on face detection. I have already tried some methods like the dlib detector, HoG, among others...
For now, I have started using the OpenCV DNN detector based on the ResNet .caffemodel, but after a lot of attempts I realized that this model is not very good for images larger than 300x300 (HxW).
Note that my images are 1520x2592 (HxW). When I resize them, almost all of the face information is lost: the faces in the original image are about 150x150 pixels, but after resizing for DNN detection they are only about 30x20 pixels (approx.).
Some approaches I already tried:
- Split figure in sub-figures
- Background subtraction
What I need to reach:
- Fast detection
- Reduce the number of lost faces (not detected)
Challenge:
- Big image with small faces in it
- A lot of area in the image not being used (but I can't change the location of the camera)
SSD-based networks are fully convolutional, which means you can vary the input size. Try passing inputs of different sizes and choose the one that gives satisfactory performance and accuracy. There is an example here: http://answers.opencv.org/question/202995/dnn-module-face-detection-poor-results-open-cv-343/
input = blobFromImage(img, 1.0, Size(1296, 760)); // x0.5
or
input = blobFromImage(img, 1.0, Size(648, 380)); // x0.25
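The snippet above is C++; a rough Python equivalent would look like this (the prototxt/caffemodel paths are placeholders, and the mean values are the ones commonly used with this ResNet-SSD face model):

import cv2

net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")
img = cv2.imread("frame.jpg")  # 1520x2592 input frame
blob = cv2.dnn.blobFromImage(img, 1.0, (1296, 760),   # x0.5 of the original size
                             (104.0, 177.0, 123.0))   # commonly used mean values
net.setInput(blob)
detections = net.forward()  # shape (1, 1, N, 7): [image_id, class, confidence, x1, y1, x2, y2]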
I have a tensor named input with dimensions 64x21x21. It is a minibatch of 64 images, each 21x21 pixels. I'd like to crop each image down to 11x11 pixels. So the output tensor I want would have dimensions 64x11x11.
I'd like to crop each image around a different "center pixel." The center pixels are given by a 2-dimensional long tensor named center with dimensions 64x2. For image i, center[i][0] gives the row index and center[i][1] gives the column index for the pixel that should be at the center in the output. We can assume that the center pixel is always at least 5 pixels away from the border.
Is there an efficient way to do this in pytorch (on the gpu)?
UPDATE: Let me clarify that the center tensor is formed by a deep neural network. It acts as a "hard attention mechanism," to use the reinforcement learning term for it. After I "crop" an image, that subimage becomes the input to another neural network. That's why I want to do the cropping in Pytorch: because the operations before and after the cropping are in Pytorch. I'd like to avoid having to transfer anything from the GPU back to the CPU.
I raised the question over on the pytorch forums, and got an answer there from smth. The grid_sample function should totally solve the problem.
https://discuss.pytorch.org/t/cropping-a-minibatch-of-images-each-image-a-bit-differently/12247
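A minimal sketch of that grid_sample approach, matching the shapes in the question (everything stays on whatever device the tensors live on, so it runs on the GPU as well):

import torch
import torch.nn.functional as F

def crop_around_centers(images, centers, crop=11):
    # images: (N, H, W) float tensor; centers: (N, 2) long tensor of (row, col)
    N, H, W = images.shape
    half = crop // 2
    offsets = torch.arange(-half, half + 1, device=images.device, dtype=images.dtype)
    rows = centers[:, 0].to(images.dtype).view(N, 1, 1) + offsets.view(1, -1, 1)  # (N, crop, 1)
    cols = centers[:, 1].to(images.dtype).view(N, 1, 1) + offsets.view(1, 1, -1)  # (N, 1, crop)
    # grid_sample expects normalized (x, y) coordinates in [-1, 1]
    x = (2 * cols / (W - 1) - 1).expand(N, crop, crop)
    y = (2 * rows / (H - 1) - 1).expand(N, crop, crop)
    grid = torch.stack((x, y), dim=-1)              # (N, crop, crop, 2)
    out = F.grid_sample(images.unsqueeze(1), grid, align_corners=True)
    return out.squeeze(1)                           # (N, crop, crop)

images = torch.randn(64, 21, 21)
centers = torch.randint(5, 16, (64, 2))             # centers at least 5 px from the border
crops = crop_around_centers(images, centers)        # (64, 11, 11)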
torchvision contains transforms including RandomCrop, but it doesn't seem to fit your use case if you want the images cropped in a specific way. I would reckon that PyTorch, a deep learning framework, is not the appropriate tool for cropping images.
Instead, have a look at this tutorial that uses Pillow. You should be able to implement your use case with it. Also have a look at pillow-simd, which does some operations faster.
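For reference, a minimal sketch of cropping around a given center with Pillow (the file name, center, and crop size are illustrative):

from PIL import Image

img = Image.open("frame.png")       # placeholder file name
row, col, half = 10, 10, 5          # center pixel and half of the 11x11 crop
# Image.crop takes a (left, upper, right, lower) box in pixel coordinates
patch = img.crop((col - half, row - half, col + half + 1, row + half + 1))  # 11x11 patch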
I am a microbiology student new to computer vision, so any help will be extremely appreciated.
This question involves microscope images that I am trying to analyze. The goal I am trying to accomplish is to count bacteria in an image but I need to pre-process the image first to enhance any bacteria that are not fluorescing very brightly. I have thought about using several different techniques like enhancing the contrast or sharpening the image but it isn't exactly what I need.
I want to reduce the noise (black spaces) to 0s on the RGB scale and enhance the green areas. I originally started writing a for loop in OpenCV with threshold limits to change each pixel, but I know there is a better way.
Here is an example that I did in Photoshop of the original image vs what I want.
Original Image and enhanced Image.
I need to learn to do this in a Python environment so that I can automate the process. As I said, I am new, but I am familiar with Python's OpenCV, mahotas, numpy, etc., so I am not attached to a particular package. I am also very new to these techniques, so I would appreciate it even if you just point me in the right direction.
Thanks!
You can have a look at histogram equalization. This would emphasize the green and reduce the black range. There is an OpenCV tutorial here. Afterwards you can experiment with different thresholding mechanisms to see which best isolates the bacteria.
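A minimal sketch of that idea, equalizing the green channel and thresholding away the dark background (the file name and threshold value are illustrative):

import cv2

img = cv2.imread("bacteria.png")    # placeholder file name, loaded as BGR
b, g, r = cv2.split(img)
g_eq = cv2.equalizeHist(g)          # spread out the green-channel intensities
# zero out everything darker than an (illustrative) threshold, keep the rest
_, mask = cv2.threshold(g_eq, 40, 255, cv2.THRESH_BINARY)
enhanced = cv2.merge((b, g_eq, r))  # recombine with the equalized green channel
enhanced = cv2.bitwise_and(enhanced, enhanced, mask=mask)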
Use TensorFlow:
- Create your own dataset with images of bacteria and their positions stored in accompanying text files (the bigger the dataset the better).
- Create a positive and a negative set of images.
- Update the default TensorFlow example with your images.
- Make sure you have a bunch of convolution layers.
- Train and test.
TensorFlow is perfect for such tasks and you don't need to worry about different intensity levels.
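As a hedged sketch of the "bunch of convolution layers" step (the patch size, class count, and training data names below are illustrative assumptions):

import tensorflow as tf

# Minimal CNN: classify 64x64 RGB patches as "bacterium" vs "background".
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # positive vs negative patch
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_patches, train_labels, validation_data=(val_patches, val_labels), epochs=10)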
I initially tried histogram equalization but did not get the desired results. So I used adaptive threshold using the mean filter:
th = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 3, 2)
Then I applied the median filter:
median = cv2.medianBlur(th, 5)
Finally I applied morphological closing with the ellipse kernel:
k1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
closing = cv2.morphologyEx(median, cv2.MORPH_CLOSE, k1, iterations=3)  # iterations must be passed by keyword
THIS PAGE will help you modify this result however you want.
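Since the original goal was to count the bacteria, one possible follow-up (my addition, continuing from the closing image above) is to count connected components in the cleaned-up binary mask:

# each connected blob in the binary mask is roughly one bacterium/colony
num_labels, labels = cv2.connectedComponents(closing)
print("approximate count:", num_labels - 1)  # label 0 is the background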
Can the SVM work on data with different dimensions? (using libsvm)
If the images have different size, I can resize to a standard value.
But if they have different aspect ratios, it seems not to make sense to resize without keeping the original aspect ratio.
Or shall I pad the images with zeros to make them have the same aspect ratio?
Can the SVM work on data with different dimensions?
No, it can't, but you already gave an answer on how to overcome that (resizing the images).
But if they have different aspect ratios, it seems not to make sense to resize without keeping the original aspect ratio. Or shall I pad the images with zeros to make them have the same aspect ratio?
Agreed, not maintaining the aspect ratio would just make the problem even harder. Usually people pre-process all the images to one aspect ratio and use letterboxing or pillarboxing when necessary.
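A minimal sketch of that letterboxing/pillarboxing idea with Pillow (the target size and fill colour are illustrative):

from PIL import Image

def letterbox(img, target=(224, 224), fill=(0, 0, 0)):
    # Resize while keeping the aspect ratio, then pad the borders to the target size.
    img = img.convert("RGB")
    tw, th = target
    scale = min(tw / img.width, th / img.height)
    resized = img.resize((round(img.width * scale), round(img.height * scale)))
    canvas = Image.new("RGB", target, fill)
    # center the resized image; the padded borders form the letterbox/pillarbox bands
    canvas.paste(resized, ((tw - resized.width) // 2, (th - resized.height) // 2))
    return canvas

boxed = letterbox(Image.open("sample.jpg"))  # placeholder file name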