I have an image of size [500, 500].
I also have a tensor of size [N x 2] that was generated by a method. I need to access, in a differentiable way, the outputs of indexing the image. Each element of the [N x 2] tensor contains a pixel position as a float, and I want to access that index in the image and get the value there, assuming the pixel lies within the image.
Is this actually possible?
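A minimal sketch of one possible approach, bilinear sampling with torch.nn.functional.grid_sample, which is differentiable with respect to both the image and the sampling coordinates (the sizes, tensor names, and N=7 below are just illustrative):
import torch
import torch.nn.functional as F

H = W = 500
image = torch.rand(1, 1, H, W)                               # image as (batch, channels, H, W)
coords = (torch.rand(7, 2) * (W - 1)).requires_grad_(True)   # [N, 2] float (x, y) pixel positions

# grid_sample expects coordinates normalized to [-1, 1], in (x, y) order
grid = 2.0 * coords / torch.tensor([W - 1.0, H - 1.0]) - 1.0
grid = grid.view(1, 1, -1, 2)                                # (batch, 1, N, 2)

values = F.grid_sample(image, grid, mode="bilinear", align_corners=True)
values = values.view(-1)                                     # N interpolated pixel values

values.sum().backward()                                      # gradients flow back to the coordinates
print(coords.grad.shape)                                     # torch.Size([7, 2])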
This is not a generic question about anchor boxes, Faster R-CNN, or anything related to theory. It is a question about how anchor boxes are implemented in PyTorch, since I am new to it. I have read this code, along with a lot of other things in the torch repo:
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/anchor_utils.py
Is the "sizes" argument to AnchorGenerator with respect to the original image size, or with respect to the feature map being output from the backbone?
To make this clearer and simpler, let's say I'm only ever interested in detecting objects that are 32x32 pixels in my input images. So my anchor box aspect ratio will definitely be 1.0, since height = width. But is the size that I put into AnchorGenerator 32? Or do I need to do some math using the backbone (e.g. I have two 2x2 max pooling layers with stride 2, so the size that I give AnchorGenerator should be 32/(2^2) = 8)?
Is the "sizes" argument to AnchorGenerator with respect to the
original image size, or with respect to the feature map being output
from the backbone?
The sizes argument is the size of each anchor box applied to the input image. If you are interested in detecting objects that are 32x32 pixels, you should use
anchor_generator = AnchorGenerator(sizes=((32,),),
                                   aspect_ratios=((1.0,),))
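As a follow-up sketch (not from the original answer; the MobileNetV2 backbone and num_classes are just assumptions), this is roughly how such a generator is plugged into torchvision's FasterRCNN:
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Any backbone that returns a single feature map works; it must expose out_channels
backbone = torchvision.models.mobilenet_v2().features
backbone.out_channels = 1280

anchor_generator = AnchorGenerator(sizes=((32,),), aspect_ratios=((1.0,),))
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator)

model.eval()
outputs = model([torch.rand(3, 500, 500)])  # 32x32 anchors are laid out on the input image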
I want to read image (1), whose pixel values range from 0-50, using a Keras generator, but when I set color_mode='grayscale', or even 'rgb', it converts all values to between 0 and 1, as shown in figure (2). Which arguments of ImageDataGenerator or flow_from_directory should be set so that I can get the pixel value range of image (1) from the Keras generator?
Here is the code I am using
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

Data_datagen=ImageDataGenerator(rescale=1./255)  # included in our dependencies
seed=2020
train_left=Data_datagen.flow_from_directory('train_left/train/', target_size=(384,512),
                                            color_mode='grayscale', batch_size=2,
                                            shuffle=True, seed=seed)
img=train_left[100][0]   # batch number 100: (images, labels)
img=img[0]               # first image of the batch, shape (384, 512, 1)
plt.figure()
plt.imshow(img[:,:,0], cmap='gray')
If the pixel values range from 0-50, the output will already look strange, since RGB and grayscale images are normally expected to have values in the range 0-255 for each channel. The likely reason your generator is doing this is that scaling to [0, 1] is the standard way to prepare images for deep learning.
The value in your generator that causes the rescaling is this argument:
rescale=1./255
If you want the original 0-50 values, leave that argument out.
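A minimal sketch of that change (same directory layout as in your code; whether the values come back as exactly 0-50 depends on how the images were saved, and the import may need to be from tensorflow.keras.preprocessing.image depending on your setup):
from keras.preprocessing.image import ImageDataGenerator

data_gen = ImageDataGenerator()  # no rescale argument, so pixel values are left as loaded
train_left = data_gen.flow_from_directory('train_left/train/',
                                          target_size=(384, 512),
                                          color_mode='grayscale',
                                          batch_size=2,
                                          shuffle=True,
                                          seed=2020)
batch_images, batch_labels = next(train_left)
print(batch_images.min(), batch_images.max())  # should now span the original value range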
So I am writing a custom dataset for medical images in .nii (NIfTI-1) format, but there is a point of confusion.
My dataloader returns the shape torch.Size([1, 1, 256, 256, 51]). But NIfTI volumes use anatomical axes, a different coordinate system, so it doesn't seem to make sense to permute the axes the way I normally would for a volume built from 51 separate 2D slice images (i.e. the depth) stored on a local drive, since Conv3d follows the convention (N, C, D, H, W).
So torch.Size([1, 1, 256, 256, 51]) (ordinarily the 51 would be the depth) doesn't follow the convention (N, C, D, H, W), but should I avoid permuting the axes since the data uses an entirely different coordinate system?
In PyTorch's 3D convolution layer, the naming of the three dimensions you convolve over is not really important (for example, the layer doesn't give depth any special treatment compared to height). All the difference comes from the kernel_size argument (and also padding, if you use it). If you permute the dimensions and correspondingly permute the kernel_size values, nothing will really change. So you can either permute your input's dimensions, e.g. with x.permute(0, 1, 4, 2, 3), or continue using your initial tensor with depth as the last dimension.
Just to clarify: if you wanted to use kernel_size=(2, 10, 10) on your DxHxW volume, you can now instead use kernel_size=(10, 10, 2) on your HxWxD volume. If you want all of your code to explicitly assume that the dimension order is always D, H, W, then you can create a tensor with permuted dimensions using x.permute(0, 1, 4, 2, 3).
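A small sketch of the two equivalent options (shapes taken from the question, out_channels chosen arbitrarily):
import torch
import torch.nn as nn

x = torch.rand(1, 1, 256, 256, 51)   # (N, C, H, W, D) as returned by the dataloader

# Option 1: keep depth last and order the kernel sizes accordingly
conv_hwd = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(10, 10, 2))
out1 = conv_hwd(x)                   # torch.Size([1, 8, 247, 247, 50])

# Option 2: permute to the usual (N, C, D, H, W) layout first
x_dhw = x.permute(0, 1, 4, 2, 3)     # (1, 1, 51, 256, 256)
conv_dhw = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(2, 10, 10))
out2 = conv_dhw(x_dhw)               # torch.Size([1, 8, 50, 247, 247])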
Let me know if I have somehow misunderstood your problem.
I apologise if this is a duplicate, but I've been having some issues with .nbytes and sys.getsizeof().
In particular, I have a list which contains numpy arrays, each array is a 3D representation of an image (row, column, RGB) and each of these images have different dimensions.
There are over 4000 images, and this may increase in the future, as I plan to use them for machine learning.
When I use .nbytes with one image, I get the correct size, but when I try to evaluate the whole lot, I get an incorrect size:
# size of image 1 in bytes
print("size of first image: %d bytes" % images[0].nbytes)
# size of all images in bytes
print("total size of all images: %d bytes" % images.nbytes)
Result:
size of first image: 60066 bytes
total size of all images: 36600 bytes
Are the only ways around this to either loop through all the images or change to a monstrous 4D array instead of a list of 3D arrays? Is there another function which better evaluates size for this kind of nested setup?
I'm running Python 3.6.7.
Try running images.dtype. What does it return? If it's dtype('O'), that explains your problem: images is not a list, but is instead a Numpy array of type object, which is generally a Bad Idea™️. Technically, it'll be a 1D array holding a bunch of 3D arrays, and images.nbytes then only counts the object references themselves (8 bytes each on a 64-bit build, which is how you end up with 36600 bytes), not the pixel data they point to.
Numpy arrays are best suited to numerical data. They're flexible enough to hold arbitrary Python objects, but doing so greatly impairs both their functionality and their efficiency. Unless you have a clear reason in mind, you should generally just use a plain Python list [] in these situations.
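A quick sketch of what that looks like (the shapes here are made up, purely for illustration):
import numpy as np

# Two differently-sized "images" forced into a 1D object array
images = np.array([np.zeros((100, 200, 3), dtype=np.uint8),
                   np.zeros((50, 60, 3), dtype=np.uint8)],
                  dtype=object)

print(images.dtype)                   # object
print(images.nbytes)                  # 16 on a 64-bit build: just 2 references x 8 bytes
print(sum(a.nbytes for a in images))  # 69000: the actual pixel data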
You may actually be best off converting images to a 4D array, as this is the only way that images.nbytes will work correctly. You can't do this if your images are all different sizes, but if they all happen to have the same shape (x, y, z), it's actually pretty straightforward:
images = np.array([a for a in images])
Now images.shape will be (n, x, y, z), where n is the total number of images. You can access the 3D array that represents the ith image by just indexing images:
image_i = images[i]
Alternatively, you can convert images to a normal Python list:
images = images.tolist()
If you don't want to bother with any of those conversions, you can always get the size of all the subarrays via iteration:
totalsize = sum(arr.nbytes for arr in images)
For example, I have 100 pictures with the same resolution, and I want to merge them into one picture. In the final picture, the RGB value of each pixel should be the average of the 100 pictures' values at that position. I know the getdata function can work in this situation, but is there a simpler and faster way to do this in PIL (Python Imaging Library)?
Let's assume that your images are all .png files and that they are all stored in the current working directory. The Python code below will do what you want. As Ignacio suggests, using numpy along with PIL is the key here. You just need to be a little careful about switching between integer and float arrays when building up your average pixel intensities.
import os, numpy, PIL
from PIL import Image
# Access all PNG files in directory
allfiles=os.listdir(os.getcwd())
imlist=[filename for filename in allfiles if filename[-4:] in [".png",".PNG"]]
# Assuming all images are the same size, get dimensions of first image
w,h=Image.open(imlist[0]).size
N=len(imlist)
# Create a numpy array of floats to store the average (assume RGB images)
arr=numpy.zeros((h,w,3),numpy.float64)
# Build up average pixel intensities, casting each image as an array of floats
for im in imlist:
    imarr=numpy.array(Image.open(im),dtype=numpy.float64)
    arr=arr+imarr/N
# Round values in array and cast as 8-bit integer
arr=numpy.array(numpy.round(arr),dtype=numpy.uint8)
# Generate, save and preview final image
out=Image.fromarray(arr,mode="RGB")
out.save("Average.png")
out.show()
The image below was generated from a sequence of HD video frames using the code above.
I find it difficult to imagine a situation where memory is an issue here, but in the (unlikely) event that you absolutely cannot afford to create the array of floats required for my original answer, you could use PIL's blend function, as suggested by @mHurley, as follows:
# Alternative method using PIL blend function
avg=Image.open(imlist[0])
for i in range(1,N):
    img=Image.open(imlist[i])
    avg=Image.blend(avg,img,1.0/float(i+1))
avg.save("Blend.png")
avg.show()
You could derive the correct sequence of alpha values, starting with the definition from PIL's blend function:
out = image1 * (1.0 - alpha) + image2 * alpha
Think about applying that function recursively to a vector of numbers (rather than images) to get the mean of the vector. For a vector of length N, you would need N-1 blending operations, with N-1 different values of alpha.
However, it's probably easier to think about the operations intuitively. At each step you want the avg image to contain equal proportions of the source images from earlier steps. When blending the first and second source images, alpha should be 1/2 to ensure equal proportions. When blending the third with the average of the first two, you would like the new image to be made up of 1/3 of the third image, with the remainder made up of the average of the previous images (the current value of avg), and so on.
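A quick numeric check of that alpha sequence on plain numbers (rather than images) shows it does keep a running mean:
# alpha_i = 1/(i+1) applied to a list of numbers reproduces the ordinary mean
values = [3.0, 7.0, 5.0, 9.0]
avg = values[0]
for i in range(1, len(values)):
    alpha = 1.0 / (i + 1)
    avg = avg * (1.0 - alpha) + values[i] * alpha
print(avg, sum(values) / len(values))   # both are 6.0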
In principle this new answer, based on blending, should be fine. However I don't know exactly how the blend function works. This makes me worry about how the pixel values are rounded after each iteration.
The image below was generated from 288 source images using the code from my original answer:
On the other hand, this image was generated by repeatedly applying PIL's blend function to the same 288 images:
I hope you can see that the outputs from the two algorithms are noticeably different. I expect this is because of the accumulation of small rounding errors during repeated application of Image.blend.
I strongly recommend my original answer over this alternative.
One can also use numpy's mean function for averaging. The code looks better and works faster.
Here is a comparison of timing and results for 700 noisy grayscale images of faces:
def average_img_1(imlist):
    # Assuming all images are the same size, get dimensions of first image
    w,h=Image.open(imlist[0]).size
    N=len(imlist)
    # Create a numpy array of floats to store the average (grayscale images)
    arr=np.zeros((h,w),np.float64)
    # Build up average pixel intensities, casting each image as an array of floats
    for im in imlist:
        imarr=np.array(Image.open(im),dtype=np.float64)
        arr=arr+imarr/N
    out = Image.fromarray(arr)
    return out
def average_img_2(imlist):
    # Alternative method using PIL blend function
    N = len(imlist)
    avg=Image.open(imlist[0])
    for i in range(1,N):
        img=Image.open(imlist[i])
        avg=Image.blend(avg,img,1.0/float(i+1))
    return avg
def average_img_3(imlist):
    # Alternative method using numpy mean function
    images = np.array([np.array(Image.open(fname)) for fname in imlist])
    arr = np.array(np.mean(images, axis=0), dtype=np.uint8)
    out = Image.fromarray(arr)
    return out
average_img_1()
100 loops, best of 3: 362 ms per loop
average_img_2()
100 loops, best of 3: 340 ms per loop
average_img_3()
100 loops, best of 3: 311 ms per loop
BTW, the results of averaging are quite different. I think the first method loses information during averaging, and the second one has some artifacts.
Result images, in order: average_img_1, average_img_2, average_img_3.
In case anybody is interested in a minimal numpy solution (I was actually looking for one), here's the code:
mean_frame = np.mean(frames, axis=0)
I would consider creating an x by y array of RGB values, all starting at (0, 0, 0), then for each pixel in each file adding in the RGB values, dividing all the values by the number of images, and finally creating the image from that. You will probably find that numpy can help.
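A small sketch of that idea (the filenames are placeholders): accumulate into an integer array and divide at the end.
import numpy as np
from PIL import Image

files = ["img1.png", "img2.png"]                 # placeholder filenames
first = np.asarray(Image.open(files[0]))
total = np.zeros(first.shape, dtype=np.int64)    # per-pixel running sums of RGB values
for f in files:
    total += np.asarray(Image.open(f), dtype=np.int64)
avg = (total // len(files)).astype(np.uint8)     # integer average, back to 8-bit
Image.fromarray(avg).save("average.png")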
I ran into MemoryErrors when trying the method in the accepted answer. I found a way to optimize it that seems to produce the same result. Basically, you blend one image at a time instead of adding them all up and dividing.
N = len(images_to_blend)
avg = Image.open(images_to_blend[0])
for im in images_to_blend:  # assuming your list is filenames, not images
    img = Image.open(im)
    avg = Image.blend(avg, img, 1/N)
avg.save(blah)
This does two things: you don't have to hold two very dense copies of the image in memory while you're turning the image into an array, and you don't have to use 64-bit floats at all. You get similarly high precision with smaller numbers. The results APPEAR to be the same, though I'd appreciate it if someone checked my math.