I need to find a way to acquire data from a picture for a new project I'm working on, which involves tracking eye movements.
Have you checked out OpenCV? If that's not an option, consider any of the image libraries available in Python -- just about all of them support several encoding representations, such as RGBA, HSV, Bayer, and YUV. Note that the four encodings I've mentioned have different channels and are uncompressed, which would give you the full data from each frame you're analyzing.
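For example, a minimal sketch with OpenCV (assuming an image file called "frame.png"; the conversions are the standard cv2.cvtColor ones):

import cv2

frame = cv2.imread("frame.png")                  # loaded as BGR by default
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)     # hue / saturation / value
yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)     # luma + two chroma channels
rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2RGBA)   # adds an alpha channel
print(frame.shape, hsv.shape, rgba.shape)        # plain, uncompressed numpy arrays

Each of these is a plain numpy array, so you get raw per-pixel data to feed into whatever eye-tracking logic you build on top.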
I am experimenting with video classification using Keras in Cloud ML Engine. My dataset consists of video sequences saved as separate images (e.g. seq1_frame1.png, seq1_frame2.png, ...) which I have uploaded to a GCS bucket.
I use a csv file referencing the start and end frames of different subclips, and a generator which feeds batches of clips to the model. The generator is responsible for loading frames from the bucket, reading them as images, and concatenating them into numpy arrays.
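Roughly, the generator looks something like this (a simplified sketch; the CSV layout, frame naming and label column are assumptions):

import csv
import io
import numpy as np
import tensorflow as tf
from PIL import Image

def clip_generator(csv_path, batch_size=4):
    # Each CSV row is assumed to hold: prefix, first_frame, last_frame, label
    with tf.io.gfile.GFile(csv_path, "r") as f:
        rows = list(csv.reader(f))
    while True:
        batch_clips, batch_labels = [], []
        for prefix, first, last, label in rows:
            frames = []
            for i in range(int(first), int(last) + 1):
                # One GCS read per frame -- this is the suspected bottleneck.
                with tf.io.gfile.GFile(f"{prefix}_frame{i}.png", "rb") as img_f:
                    frames.append(np.asarray(Image.open(io.BytesIO(img_f.read()))))
            batch_clips.append(np.stack(frames))          # (T, H, W, C)
            batch_labels.append(int(label))
            if len(batch_clips) == batch_size:
                yield np.stack(batch_clips), np.asarray(batch_labels)
                batch_clips, batch_labels = [], []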
My training is fairly long, and I suspect the generator is my bottleneck due to the numerous reading operations.
In the examples I found online, people usually save pre-formatted clips as TFRecord files directly to GCS. I feel like this solution isn't ideal for very large datasets, as it implies duplicating the data, even more so if we decide to extract overlapping subclips.
Is there something wrong with my approach? And more importantly, is there a "gold standard" for using large video datasets for machine learning?
PS: I explained my setup for reference, but my question is not bound to Keras, generators or Cloud ML.
In this, you are almost always going to be trading time for space. You just have to work out which is more important.
In theory, for every frame, you have height*width*3 bytes, assuming 3 colour channels. One possible way you could save space is to use only one channel (probably green, or, better still, convert your complete dataset to greyscale). That would reduce your full-size video data to one third of its size. Colour data in video tends to be stored at a lower resolution than luminance data, so it might not affect your training, but that depends on your source files.
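For a concrete sense of the numbers and of the single-channel reduction (a sketch assuming OpenCV and a 224x224 frame; the file name follows your naming scheme):

import cv2

h, w = 224, 224
print("RGB frame:", h * w * 3, "bytes")            # 150528 bytes uncompressed
print("grey frame:", h * w, "bytes")               # one third of that

frame = cv2.imread("seq1_frame1.png")              # BGR, shape (H, W, 3)
grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # shape (H, W)
green = frame[:, :, 1]                             # or just keep the green channel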
As you probably know, .png is a lossless image compression format. Every time you load one, the generator has to decompress it first and then concatenate it into the clip. You could save even more space by using a different compression codec, but then every clip would need full decompression, which would probably add to your time. You're right that the repeated decompression takes time, and saving the video uncompressed would take up quite a lot of space. There are places you could save space, though (see the sketch after this list):
reduce to greyscale (or green scale as above)
temporally subsample frames (do you need EVERY consecutive frame, or could you sample every second one?)
do you use whole frames or just patches? Can you crop or rescale the video sequences?
are you using optical flow? It's pretty processor-intensive, so consider computing it as a pre-processing step too, so that you only have to do it once per clip (again, this is trading space for time)
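As a sketch of how several of these ideas combine into a single pre-processing pass (file names, the subsampling step and the crop box are all placeholders):

import cv2
import numpy as np

def preprocess_clip(frame_paths, step=2, crop=(0, 0, 224, 224)):
    y, x, h, w = crop
    frames = []
    for path in frame_paths[::step]:                  # temporal subsampling
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # one channel only
        frames.append(img[y:y + h, x:x + w])          # spatial crop / patch
    return np.stack(frames)                           # (T, H, W) uint8 array

# clip = preprocess_clip([f"seq1_frame{i}.png" for i in range(1, 31)])
# np.savez_compressed("seq1_clip1.npz", clip=clip)    # one read per clip at training time

Pre-computing clips like this does duplicate some data, but it turns many small GCS reads into one read per clip.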
I'm trying to build a live gif, just for kicks, and I want to turn a 2D array of pixel data into a gif (or, more specifically, one frame of an animated gif). I found gifencoder and it works, but it's slow as molasses (~800 ms to encode a 500x500 px gif). Every other solution I can find (e.g. things built on graphicsmagick or imagemagick) doesn't seem to accept input streams, only already-encoded images. I suppose I could just dump the data to a .bmp, but that's a very roundabout way to accomplish this. The other thing I'm considering is just LZW-encoding the data myself, but before I go digging into the technical aspects of that I'm fishing here for other ideas.
I am doing some studies on eye vascularization - my project involves a machine which can detect the different blood vessels in the retinal membrane at the back of the eye. What I am looking for is a way to segment the picture and analyze each segmentation on its own. The segmentation consists of six squares which I want to analyze separately for the density of white pixels.
I would be very thankful for any kind of input; I am pretty new to the programming world and I actually just have a bare concept of how it should work.
Thanks and Cheerio
Sam
[Concept drawing of the OCTA picture]
You could probably accomplish this by using numpy to load the image and split it into sections. You could then analyze the sections using scikit-image or opencv (though this could be difficult to get working). To view the image, you can either save it to a file using numpy, or use matplotlib to open it in a new window.
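A sketch of that approach (assuming the image is saved as "octa.png" and the six squares form a 2x3 grid; both of those are placeholders):

import numpy as np
from skimage import io

image = io.imread("octa.png", as_gray=True)          # float values in [0, 1]
rows, cols = 2, 3                                    # six squares
h, w = image.shape[0] // rows, image.shape[1] // cols

for r in range(rows):
    for c in range(cols):
        tile = image[r * h:(r + 1) * h, c * w:(c + 1) * w]
        white = np.count_nonzero(tile > 0.5)         # pixels brighter than mid-grey
        print(f"tile ({r},{c}): {white / tile.size:.2%} white")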
First of all, please note that in image processing "segmentation" describes the process of grouping neighbouring pixels by context.
https://en.wikipedia.org/wiki/Image_segmentation
What you want to do can be done in various ways.
The most common way is by using ROIs or AOIs (region/area of interest). That's basically some geometric shape like a rectangle, circle, polygon or similar defined in image coordinates.
The image processing is then restricted to only process pixels within that region. So you don't slice your image into pieces but you restrict your evaluation to specific areas.
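In numpy/OpenCV a rectangular ROI is just array slicing; a minimal sketch (the coordinates and file name are placeholders):

import cv2
import numpy as np

image = cv2.imread("retina.png", cv2.IMREAD_GRAYSCALE)
x, y, w, h = 100, 50, 200, 200          # ROI in image coordinates
roi = image[y:y + h, x:x + w]           # a view into the image, no copy needed
density = np.count_nonzero(roi > 127) / roi.size
print("white pixel density in ROI:", density)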
Another way, like you suggested, is to cut the image into pieces and process them one by one. Those sub-images are usually created using ROIs.
A third option which is rather limited but sufficient for simple tasks like yours is accessing pixels directly using coordinate offsets and several nested loops.
Just google "python image processing" in combination with "library" "roi" "cropping" "sliding window" "subimage" "tiles" "slicing" and you'll get tons of information...
I'm using MS Deep Zoom Composer to generate tiled image sets for megapixel sized images.
Right now I'm preparing a densely detailed black and white line drawing.
The lack of gamma correction during resizing is very apparent;
while zooming the tiles appear to become brighter on higher zoom levels.
This makes the boundaries between tiles quite apparent during the loading stage.
While it does not in any way hurt usability it is a bit unsightly.
I am wondering if there are any alternatives to Deep Zoom Composer that do gamma correct resizing?
The vips deepzoom creator can do this.
You make a deepzoom pyramid like this:
vips dzsave somefile.tif pyr_name
and it'll read somefile.tif and write pyr_name.dzi and pyr_name_files, a folder containing the tiles. You can use a .zip extension to the pyramid name and it'll directly write an uncompressed zip file containing the whole pyramid --- this is a lot faster on Windows. There's a blog post with some more examples and explanation.
To make it shrink gamma corrected, you need to move your image to a linear colourspace for saving. The simplest is probably scRGB, that is, sRGB with linear light. You can do this with:
vips colourspace somefile.tif x.tif scrgb
and it'll write x.tif, an scRGB float tiff.
You can run the two operations in a single command by using .dz as the output file suffix. This will send the output of the colourspace transform to the deepzoom writer for saving. The deepzoom writer will use .jpg to save each tile, and the jpeg writer knows that jpeg files can only be RGB, so it'll automatically turn the scRGB tiles back into plain sRGB for saving.
Put that all together and you need:
vips colourspace somefile.tif mypyr.dz scrgb
And that should build a pyramid with a linear-light shrink.
You can pass options to the deepzoom saver in square brackets after the filename, for example:
vips colourspace somefile.tif mypyr.dz[container=zip] scrgb
The blog post has the details.
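If you'd rather drive this from Python, the pyvips binding exposes the same operations; a sketch equivalent to the command above (assuming pyvips is installed):

import pyvips

image = pyvips.Image.new_from_file("somefile.tif")
image = image.colourspace("scrgb")        # move to linear light
image.dzsave("mypyr", container="zip")    # gamma-correct shrink into a zip pyramid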
update: the Windows binary is here, to save you hunting. Unzip somewhere, and vips.exe is in the /bin folder.
pamscale1 of the netpbm suite is quite well known not to screw up scaled images the way you describe. It uses gamma correction instead of ill-conceived "high-quality filters" and other magic used to paper over incorrect scaling algorithms.
Of course you will need some scripting - it's not a direct replacement.
We maintain a list of DZI creation tools here:
http://openseadragon.github.io/examples/creating-zooming-images/
I don't know if any of them do gamma correction, but some of them might not have that issue to begin with. Also, many of them come with source, so you can add the gamma correction in yourself if need be.
I am trying to read an image with ITK and display it with VTK.
But there is a problem that has been haunting me for quite some time.
I read the images using the classes itkGDCMImageIO and itkImageSeriesReader.
After reading, I can do two different things:
1.
I can convert the ITK image to vtkImageData using itkImageToVTKImageFilter and then use vtkImageReslice to get all three axes. Then I use the classes vtkImageMapper, vtkActor2D, vtkRenderer and QVTKWidget to display the image.
In this case, when I display the images, there are several problems with colors. Some of them are shown very bright, others are so dark you can barely see them.
2.
The second scenario is the registration pipeline. Here, I read the image as before, then use the classes shown in the ITK Software Guide chapter about registration. Then I resample the image and use itkImageSeriesWriter.
And that's when the problem appears. After writing the image to a file, I compare this new image with the image I used as input in the XMedcon software. If the image I wrote was shown too bright in my software, there are no changes when I compare the two in XMedcon. Otherwise, if the image was too dark in my software, it appears all messed up in XMedcon.
I noticed, when comparing both images (the original and the new one) that, in both cases, there are changes in modality, pixel dimensions and glmax.
I suppose the problem is with the glmax, as the major changes occur with the darker images.
I really don't know what to do. Does this have something to do with color level/window? The strangest thing is that all the images are very similar, with identical tags, and only some of them display errors when shown/written.
I'm not familiar with the particulars of VTK/ITK specifically, but it sounds to me like the problem is more general than that. Medical images have a high dynamic range and often the images will appear very dark or very bright if the window isn't set to some appropriate range. The DICOM tags Window Center (0028, 1050) and Window Width (0028, 1051) will include some default window settings that were selected by the modality. Usually these values are reasonable, but not always. See part 3 of the DICOM standard (11_03pu.pdf is the filename) section C.11.2.1.2 for details on how raw image pixels are scaled for display. The general idea is that you'll need to apply a linear scaling to the images to get appropriate pixel values for display.
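A simplified sketch of that linear scaling (the function name is made up; center and width would normally come from tags (0028,1050) and (0028,1051)):

import numpy as np

def apply_window(pixels, center, width, out_max=255):
    lower = center - width / 2.0
    upper = center + width / 2.0
    scaled = (pixels.astype(np.float32) - lower) / (upper - lower)
    return (np.clip(scaled, 0.0, 1.0) * out_max).astype(np.uint8)

# e.g. apply_window(raw_slice, center=50, width=350) for an abdominal CT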
What pixel types do you use? In most cases, it's simpler to use a floating point type while working with ITK, but raw medical images are often stored as short, so that could be your problem.
You should also write the image to the disk after each step (in MHD format, for example), and inspect it with a viewer that's known to work properly, such as vv (http://www.creatis.insa-lyon.fr/rio/vv). You could also post them here as well as your code for further review.
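If it helps, here is one way to do that "write after each step" check from Python with SimpleITK rather than the C++ classes you list; the directory and file names are placeholders:

import SimpleITK as sitk

# Read a DICOM series, cast to float, and dump it as MHD for inspection in vv.
reader = sitk.ImageSeriesReader()
reader.SetFileNames(sitk.ImageSeriesReader.GetGDCMSeriesFileNames("dicom_dir"))
image = reader.Execute()
image = sitk.Cast(image, sitk.sitkFloat32)
sitk.WriteImage(image, "debug_step.mhd")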
Good luck!
For what you describe as your first issue:
I can convert the ITK image to vtkImageData using itkImageToVTKImageFilter and then use vtkImageReslice to get all three axes. Then I use the classes vtkImageMapper, vtkActor2D, vtkRenderer and QVTKWidget to display the image.
In this case, when I display the images, there are several problems with colors. Some of them are shown very bright, others are so dark you can barely see them.
I suggest the following: check your window/level in VTK; they probably aren't appropriate for your images. If they are abdominal tomographies, window = 350 and level = 50 should be a nice color level.
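On the VTK side that's the ColorWindow/ColorLevel on your mapper; a hypothetical snippet using the classes you named (values taken from the abdominal suggestion above):

import vtk

mapper = vtk.vtkImageMapper()
mapper.SetColorWindow(350)   # window width
mapper.SetColorLevel(50)     # window center

actor = vtk.vtkActor2D()
actor.SetMapper(mapper)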