Load .gif and retrieve physical dimensions - python-3.x

I am trying to load a .gif file and find the physical dimensions of the entities in it,
i.e., I want to find the volume occupied by each cell in the 3D volume.
gif source
The following gets the frames of the GIF (code ref.):
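import numpy as np
from PIL import Image, ImageSequence
def load_frames(filename):
    """Return every frame of a GIF as a (height, width, 3) uint8 array."""
    img = Image.open(filename)
    frames = []
    for frame in ImageSequence.Iterator(img):
        # Convert palette frames to RGB; the array shape is (height, width, 3)
        a = np.array(frame.convert('RGB'), dtype=np.uint8)
        # The referenced code wraps each array in its own Picture class
        frames.append(a)
    return frames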
I am not sure what has to be done next.
Could someone please offer some suggestions?

To put it simply, you can NOT find the exact volume occupied by each cell in the 3D volume. It is impossible without the raw data of the 3D object. (One example: there are multiple cells inside the object that you can't see clearly with the human eye, so you can't get their data from a picture.)
You may be able to write a complicated algorithm that gives a rough estimate of the volume of the whole object, but it will be very difficult and the accuracy will be low, because several factors are unknown (for example, you can't predict whether the object is hollow or has holes inside it).

As Jabbar mentions, you won't be able to get an exact value, but with some computer vision processing, you should be able to get the voxel dimensions, and if you know the scale of the image, you should be able to scale that value to a physical volume.
First you need to run edge detection and some kind of blob detection to get the individual cells.
Generate a segmentation label for each slice. This is a 2D uint32 array which has a unique number for each entity (cell) you want the volume of.
With your per-layer segmentation labels, you need to correlate the label IDs of the same cell across multiple slices. This will probably be the hardest part, but it's probably ok if it isn't perfect.
Once you have a segmentation mask for each cell in each frame, you can generate a 3D segmentation mask - a 3D array for each cell, which is a boolean mask (True where the cell volume is, False elsewhere)
Sum that array to get the volume of the cell in voxels.
Scale the voxel count to a physical volume: multiply it by the physical size of one voxel, i.e. pixel width × pixel height × slice spacing.
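The last two steps above can be sketched in plain Python. This is only an illustration under assumptions: label_stack is a hypothetical list of 2D label slices in which the same cell already carries the same ID in every slice (0 is background), and pixel_size and slice_spacing are physical scales you would have to know from the imaging setup.

```python
from collections import Counter

def cell_volumes(label_stack, pixel_size, slice_spacing):
    """Return {cell_id: physical volume} for a stack of 2D label slices.

    label_stack   -- hypothetical input: list of slices, each a list of rows
                     of integer cell IDs (0 = background)
    pixel_size    -- (width, height) of one pixel in physical units
    slice_spacing -- distance between consecutive slices, same units
    """
    voxel_counts = Counter()
    for label_slice in label_stack:
        for row in label_slice:
            # Count every non-background voxel under its cell ID
            voxel_counts.update(int(v) for v in row if v != 0)
    voxel_volume = pixel_size[0] * pixel_size[1] * slice_spacing
    return {cell: n * voxel_volume for cell, n in voxel_counts.items()}

# Two 2x3 slices containing two cells (IDs 1 and 2)
stack = [
    [[1, 1, 0],
     [0, 2, 2]],
    [[1, 0, 0],
     [0, 2, 2]],
]
volumes = cell_volumes(stack, pixel_size=(0.5, 0.5), slice_spacing=2.0)
```

The result is only as good as the segmentation and the cross-slice matching that produced the labels.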

Related

how to compare contents of two images?

I have been struggling to find a proper image comparison technique. I have a directory on my system with a couple of pictures, and I am trying to recreate those pictures with the same objects, the same lighting and the same camera position. I want to know whether the current camera frame is the same as the given reference image.
For example, assume we have a camera mounted in a fixed position. We took a picture using that camera and stored it as 'reference.jpg'. Now, when I run the image comparison algorithm without changing the camera orientation or any of the surroundings, it should return the correlation between the reference image and the current frame. In this scenario it must return something like 1, as nothing has changed.
Until now I have been using the SSIM technique, but its precision is very poor. For example, if I take a picture and then run SSIM in a loop, the correlation factor deviates a lot, reporting values such as 0.72, which is not precise enough.

automatically repairing stuck pixels in scan data

I'm working on developing software for an electron microscope. It works by focusing a beam of electrons at a specific part of the sample and then recording the image using a sensor.
The scan data is saved as a 4D array where the first 2 dimensions are the location (x and y) at which the beam was focused and the other 2 dimensions are the raw sensor output which is a 2D image.
While analyzing the data, I realized that there are some stuck pixels which I would ideally be able to repair automatically via software.
Here is an example:
As you can see, the data shape is 256,256,256,256 which means we scanned 256x256 points and the sensor data is a 256x256 image.
In the right data browser window (called nav), you can see the scan location, which is 0,0 (also marked in the left window by scanY and scanX). I drew circles around a few of the stuck pixels. Here is another scan location for reference:
I can automatically detect these pixels by unraveling the sensor data and checking for locations where the scan values are always the same, but I'm not sure how to repair these.
My first guess was reading the data from all the pixels that are next to this pixel and averaging them, then storing the average value instead of the stuck value, but I'm not sure if this is a good approach.
How does professional software such as Photoshop "repair" or "hide" a defect in a picture? Are there any known algorithms for this issue? I did a bit of searching but didn't manage to find much.
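The neighbour-averaging idea from the question can be sketched as follows for a single 2D sensor frame. This is a minimal illustration, not a claim about what professional software does: repair_stuck_pixels and its arguments are hypothetical names, the frame is a plain list of rows, and the list of stuck coordinates is assumed to come from the detection step described above.

```python
from statistics import mean

def repair_stuck_pixels(frame, stuck):
    """Return a copy of `frame` in which each stuck pixel is replaced by the
    mean of its in-bounds 8-connected neighbours that are not stuck themselves."""
    stuck = set(stuck)
    rows, cols = len(frame), len(frame[0])
    repaired = [list(row) for row in frame]
    for (r, c) in stuck:
        neighbours = [
            frame[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))
            if (rr, cc) != (r, c) and (rr, cc) not in stuck
        ]
        if neighbours:  # leave the pixel untouched if every neighbour is stuck
            repaired[r][c] = mean(neighbours)
    return repaired

frame = [
    [10, 12, 11],
    [13, 255, 12],
    [11, 10, 14],
]
fixed = repair_stuck_pixels(frame, stuck=[(1, 1)])
```

Excluding other stuck pixels from the average keeps neighbouring defects from contaminating each other; swapping mean for median would make the estimate less sensitive to a single outlying neighbour.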

What is a deep frame buffer?

In a real-time graphics application, I believe a frame buffer is the memory that holds the final rasterised image that will be displayed for a single frame.
References to deep frame buffers seem to imply there's some caching going on (vertex and material info), but it's not clear what this data is used for, or how.
What specifically is a deep frame buffer in relation to a standard frame buffer, and what are its uses?
Thank you.
Google is your friend.
It can mean two things:
You're storing more than just RGBA per pixel. For example, you might be storing normals or other lighting information so you can do re-lighting later.
Interactive Cinematic Relighting with Global Illumination
Deep Image Compositing
You're storing more than one color and depth value per pixel. This is useful, for example, to support order-independent transparency.
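As a rough illustration of the first meaning, the sketch below stores a surface colour and a normal per pixel and recomputes the shading for a new light direction without re-rendering. The dictionary layout, the relight function and the simple Lambertian (diffuse only) shading model are all assumptions made for this example, not the method of the linked papers.

```python
import math

def relight(gbuffer, light_dir):
    """Recompute diffuse shading for each pixel from stored albedo and normal."""
    length = math.sqrt(sum(c * c for c in light_dir))
    light = [c / length for c in light_dir]  # normalise the light direction
    image = []
    for row in gbuffer:
        out_row = []
        for pixel in row:
            # Lambert's cosine term, clamped so back-facing surfaces are black
            n_dot_l = max(0.0, sum(n * c for n, c in zip(pixel["normal"], light)))
            out_row.append(tuple(a * n_dot_l for a in pixel["albedo"]))
        image.append(out_row)
    return image

# One row of two pixels: one facing the light, one perpendicular to it
gbuffer = [[
    {"albedo": (1.0, 0.5, 0.25), "normal": (0.0, 0.0, 1.0)},
    {"albedo": (1.0, 1.0, 1.0), "normal": (1.0, 0.0, 0.0)},
]]
lit = relight(gbuffer, light_dir=(0.0, 0.0, 2.0))
```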
A z-buffer is similar to a color buffer, which is usually used to store the "image" of a 3D scene, but instead of storing color information (in the form of a 2D array of RGB pixels), it stores the distance from the camera to the object visible through each pixel of the framebuffer.
Traditionally, a z-buffer only stores the distance from the camera to the nearest object in the 3D scene for any given pixel in the frame. The good thing about this technique is that if two images have been rendered with their z-buffers, they can be re-composed in a 2D program: pixels from image A that are in front of the corresponding pixels from image B end up on top in the re-composed image. To decide which pixels are in front, we use the information stored in the images' respective z-buffers. For example, imagine we want to compose pixels from images A and B at pixel coordinates (100, 100). If the distance (z value) stored in the z-buffer at coordinates (100, 100) is 9.13 for image A and 5.64 for image B, then in the recomposed image C, at pixel coordinates (100, 100), we put the pixel from image B (because it corresponds to a surface in the 3D scene that is in front of the object visible through that pixel in image A).
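The per-pixel decision described above can be sketched like this. It is a minimal illustration that assumes each image is a plain list of rows and that a smaller z value means closer to the camera; z_composite is a hypothetical helper name.

```python
def z_composite(color_a, depth_a, color_b, depth_b):
    """Per pixel, keep the colour whose z value is smaller (nearer the camera)."""
    out_color, out_depth = [], []
    for row_ca, row_da, row_cb, row_db in zip(color_a, depth_a, color_b, depth_b):
        c_row, d_row = [], []
        for ca, da, cb, db in zip(row_ca, row_da, row_cb, row_db):
            if da <= db:
                c_row.append(ca)
                d_row.append(da)
            else:
                c_row.append(cb)
                d_row.append(db)
        out_color.append(c_row)
        out_depth.append(d_row)
    return out_color, out_depth

# A single pixel with the z values from the example: A is at 9.13, B at 5.64
color_c, depth_c = z_composite(
    [["A"]], [[9.13]],
    [["B"]], [[5.64]],
)
```

Here color_c holds the pixel from image B, because 5.64 is the smaller distance.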
Now, this works great when objects are opaque, but not when they are transparent. When objects are transparent (such as when we render volumes, clouds, or layers of transparent surfaces), we need to store more than one z value. Note also that opacity changes as the density of the volumetric object or the number of transparent layers increases. In short, a deep image or deep buffer is technically just like a z-buffer, but rather than storing a single depth value per pixel, it stores several depth values together with the opacity of the object at each of those depths.
Once we have stored this information, it is possible in post-production to accurately recompose two or more images together with their transparencies. For instance, if you render two clouds separately and these clouds overlap in depth, their visibility will be recomposed properly, as if they had been rendered together in the same scene.
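A much simplified sketch of that recomposition for a single pixel: the samples of all images are merged, sorted by depth, and accumulated front to back with the standard "over" operator. The sample layout (depth, colour, alpha) and the function name are assumptions for this example; real deep image formats also handle samples with a depth extent, which is ignored here.

```python
def composite_deep_pixel(*sample_lists):
    """Merge the deep samples of several images for one pixel.

    Each sample is (depth, (r, g, b), alpha) with an unpremultiplied colour.
    Returns the accumulated premultiplied colour and the total alpha.
    """
    samples = sorted(
        (s for samples_of_image in sample_lists for s in samples_of_image),
        key=lambda s: s[0],  # nearest sample first
    )
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for _depth, sample_color, sample_alpha in samples:
        # Only the light not yet blocked by nearer samples can contribute
        weight = (1.0 - alpha) * sample_alpha
        for i in range(3):
            color[i] += weight * sample_color[i]
        alpha += weight
    return tuple(color), alpha

# Two half-transparent samples rendered separately; B lies in front of A
pixel_a = [(2.0, (1.0, 0.0, 0.0), 0.5)]
pixel_b = [(1.0, (0.0, 0.0, 1.0), 0.5)]
color, alpha = composite_deep_pixel(pixel_a, pixel_b)
```

Because the samples are sorted before accumulation, the result does not depend on the order in which the images are passed in.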
Why would we use such a technique at all? Mostly because rendering scenes containing volumetric elements is generally slow. It is therefore good to render them separately from the other objects in the scene, so that if you need to tweak the solid objects you do not need to re-render the volumetric elements.
This technique was mostly made popular by Pixar, in the renderer they develop and sell (PRMan). Avatar (Weta Digital in NZ) was one of the first films to make heavy use of deep compositing.
See: http://renderman.pixar.com/resources/current/rps/deepCompositing.html
The cons of this technique: deep images are very heavy. They need to store many depth values per pixel (and these values are stored as floats). It's not uncommon for such images to range from a few hundred megabytes to a couple of gigabytes, depending on the image resolution and the depth complexity of the scene. Also, you can recompose volumetric objects properly, but they won't cast shadows on each other, which they would if you rendered them together in the same scene. This makes scene management slightly more complex than usual, but it is generally dealt with properly.
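To get a feel for those sizes, here is a back-of-the-envelope estimate of the uncompressed storage. All numbers are illustrative assumptions: a 1920x1080 frame, 20 samples per pixel on average, and 5 floats per sample (depth plus RGBA) at 4 bytes each.

```python
def deep_image_bytes(width, height, samples_per_pixel,
                     floats_per_sample=5, bytes_per_float=4):
    """Uncompressed size of a deep image, assuming a fixed sample count per pixel."""
    return width * height * samples_per_pixel * floats_per_sample * bytes_per_float

size = deep_image_bytes(1920, 1080, samples_per_pixel=20)
size_gb = size / 1e9  # roughly 0.83 GB for a single frame
```

Real files vary widely, since the number of samples differs per pixel and the formats apply compression.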
A lot of this information can be found on scratchapixel.com (for future reference).

Counting foreground objects in a binary image

I have an image sequence (video) and would like to count the number of objects in it. The main objective is to count each object only once, not in each and every frame, since an object may exist for several frames. My idea is to count the objects as they exit the screen, because there are fewer occlusions there. I am thinking of doing this by scanning the bottom part of the image for non-zero pixels.
I have a CV_FILLED binary image (from the rectangle function) that I want to scan, creating an instance of an object whenever an object is found. This scan will not visit each and every pixel along the horizontal line, just certain sections.
For example, we could scan over ranges, say certain columns, then skip by a margin.
A sample binary image is attached. This is an image obtained from the feed. I do not want to count only the objects in this image, but also those that are still coming.
A full picture of the detected objects is attached here. Your guidance or constructive criticism is welcome.
* I do not want to use CVBlob
If you don't want to use cvBlobLib, you could use the contour detection that is part of OpenCV.
There is a tutorial on the website.
The doc for the method is here. Your image seems pretty simple, but if you get blobs with occlusions, you may want to look at the CV_RETR_EXTERNAL constant to get only the outer contours.
That is what I usually use, even though it takes a bit more work to use the results of the method.
Hope this helps.
If the squares do not overlap at the bottom, I suggest the following:
Scan the very bottom row of the image and identify the runs of connected white pixels. Each white line segment will correspond to one square. Save the center of each white line segment and its length. In the next frame, do the same and associate each line segment with its counterpart from the previous frame (same length and center very close). When you cannot find a corresponding line segment anymore, the square has moved out of the image, which means you can increase your squares counter by one. Note that line segments at the right and left ends of the row will have decreasing length with every frame.
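A minimal sketch of this bottom-row approach, under some simplifying assumptions: each frame is reduced to its bottom row as a plain list of pixel values, segments are matched between frames by center distance only (the max_shift tolerance is a made-up parameter), and the length check suggested above is left out.

```python
def find_segments(row):
    """Return (center, length) for each run of non-zero pixels in `row`."""
    segments, start = [], None
    for x, value in enumerate(list(row) + [0]):  # sentinel closes a trailing run
        if value and start is None:
            start = x
        elif not value and start is not None:
            length = x - start
            segments.append((start + (length - 1) / 2, length))
            start = None
    return segments

def count_exited(bottom_rows, max_shift=2):
    """Count segments that have no counterpart in the following frame."""
    exited, previous = 0, []
    for row in bottom_rows:
        current = find_segments(row)
        for center, _length in previous:
            # A segment with no nearby successor means the square has left
            if not any(abs(center - c) <= max_shift for c, _ in current):
                exited += 1
        previous = current
    return exited

bottom_rows = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
total = count_exited(bottom_rows)
```

In this toy sequence two squares pass through the bottom row and leave, so total ends up as 2.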
Thx guys. I managed to solve this already. I used small ROIs along the paths of the squares, and found countNonZero() within the ROI.
I kept on checking with boolean variables to see if the ROI still had the white pixels. If not, incremented counter. Worked well, and I was able to count.
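The ROI approach described above might look roughly like the sketch below. It is a reconstruction from the description, not the poster's actual code: count_non_zero is a pure-Python stand-in for OpenCV's countNonZero() on a sub-rectangle, and the ROI is given as a hypothetical (x, y, width, height) tuple.

```python
def count_non_zero(image, roi):
    """Count non-zero pixels inside the rectangle roi = (x, y, width, height)."""
    x, y, w, h = roi
    return sum(1 for row in image[y:y + h] for v in row[x:x + w] if v)

def count_objects(frames, roi):
    """Increment the counter each time the ROI goes from occupied to empty."""
    count, occupied = 0, False
    for frame in frames:
        has_white = count_non_zero(frame, roi) > 0
        if occupied and not has_white:
            count += 1  # the object that was in the ROI has moved on
        occupied = has_white
    return count

empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
passing = [[0, 0, 0], [0, 0, 0], [0, 255, 0]]
frames = [empty, passing, passing, empty, passing, empty]
n_objects = count_objects(frames, roi=(0, 2, 3, 1))  # ROI = bottom row
```

This assumes that at most one object occupies a given ROI at a time and that there is at least one empty frame between consecutive objects.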
Thx for your input...

Brightness and contrast in color image

Does anyone know how I can change the brightness and contrast of a color image? I know about vtkImageMapToWindowLevel, but after setting the level or window of the image in this class, the color image becomes grayscale.
Thanks for any answers.
By definition, a color image is already color mapped, and you cannot change the brightness/contrast of the image without decomposition and recomposition.
First, define a pair of numbers called brightness and contrast in whatever way you want. Normally, I'd take brightness as the maximum value, and contrast as the ratio between minimum and maximum. Similarly, if you want to use Window/Level semantics, "level" is the minimum scalar value, and window is the difference between maximum and minimum.
Next, you find the scalar range - the minimum and maximum values in your desired output image, using the brightness and contrast. If you're applying brightness/contrast, the scalar range is:
Maximum = brightness
Minimum = Maximum / contrast
Assume a color lookup table (LUT) with a series of colors at different proportional values, say, in the range of 0 to 1. Since we know the brightness and contrast, we can set up the LUT with the lower value (range 0) mapping to "minimum" and the upper value (range 1) mapping to "maximum". When this is done, a suitable class, like vtkImageMapToColors, can take the single-component input and map it to a 3 or 4 component image.
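The mapping described above can be illustrated without VTK. The sketch below computes the scalar range from brightness and contrast as defined in this answer and then looks a scalar up in an evenly spaced colour table with linear interpolation. The function names and the three-entry table are made up for the example; this is not the VTK API.

```python
def scalar_range(brightness, contrast):
    """Scalar range as defined above: max = brightness, min = max / contrast."""
    maximum = brightness
    minimum = maximum / contrast
    return minimum, maximum

def map_through_lut(value, minimum, maximum, lut):
    """Map a scalar to a colour; lut[0] sits at `minimum`, lut[-1] at `maximum`."""
    t = (value - minimum) / (maximum - minimum)
    t = min(1.0, max(0.0, t))            # clamp values outside the range
    pos = t * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)      # index of the lower table entry
    frac = pos - i
    return tuple(a + frac * (b - a) for a, b in zip(lut[i], lut[i + 1]))

lut = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]  # black -> red -> white
lo, hi = scalar_range(brightness=200, contrast=4)
mid_color = map_through_lut(125, lo, hi, lut)
```

With these numbers the range is 50 to 200, so the scalar 125 falls in the middle of the table and maps to red.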
Now, all this can happen only for a single-component image, as the color LUT classes (vtkScalarsToColors and related classes) are defined only on single-component images.
If you have access to the original one-component image, and you're using vtkImageMapToColors or some similar class, I'd suggest handling it at that stage.
If you don't, there is one way I can think of:
Extract the three channels as three different images using vtkImageExtractComponents (you'll need three instances, each with the original image as input).
Independently scale the three channels using vtkImageShiftScale (shift by brightness, scale by contrast).
Combine the channels back using vtkImageAppendComponents.
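What these three steps compute can be shown in plain Python, without the VTK pipeline. The sketch assumes 8-bit RGB pixels stored as tuples and applies (value + shift) * scale per channel, which is the order in which vtkImageShiftScale is documented to work; the clamping to 0..255 stands in for its overflow clamping option. The helper names are hypothetical.

```python
def shift_scale_channel(channel, shift, scale):
    """Apply (value + shift) * scale to every pixel, clamped to 0..255."""
    return [
        [min(255, max(0, round((v + shift) * scale))) for v in row]
        for row in channel
    ]

def adjust_rgb(image, shift, scale):
    """Split an RGB image into channels, adjust each one, and recombine."""
    # Step 1: extract the three components as separate single-channel images
    channels = [[[px[i] for px in row] for row in image] for i in range(3)]
    # Step 2: shift and scale each channel independently
    adjusted = [shift_scale_channel(ch, shift, scale) for ch in channels]
    # Step 3: append the components back into RGB tuples
    return [
        [tuple(ch[r][c] for ch in adjusted) for c in range(len(image[0]))]
        for r in range(len(image))
    ]

image = [[(100, 50, 0), (200, 120, 10)]]
result = adjust_rgb(image, shift=10, scale=1.5)
```

Using the same shift and scale for all three channels changes brightness and contrast; different values per channel would additionally shift the colour balance.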
Another possibility is to use vtkImageMagnitude, which will convert the image back to grey-scale (by taking the magnitude of the three channels together), and re-applying the color table using vtkImageMapToColors and any of the vtkScalarsToColors classes as your lookup table.
The first method is better if your image is a real photograph or something similar, where the colors are from some 3-component source, and the second would work better if your input image is already using false colors (say an image from an IR camera, or some procedurally generated fractal that's been image mapped).
