Is it normal for your GPU device not to support three-channel formats?

I've noticed that the GPU I'm running Vulkan on doesn't support many of the R8G8B8_UINT formats, but does support the R8G8B8A8 formats. The same goes for many others, like R32G32B32_SFLOAT. I've noticed that's also the case with other GPUs I've looked at in the OpenGPU database.
Is this normal? Why is this so? Is it the same with other graphics APIs? Is it to align values/texels to a "round/nicer/aligned" number of bytes? How are you supposed to get around this? I'm having trouble seeing how your code is supposed to interact with images whose colour format you don't know in advance, which complicates things both in host code and in shaders.
Also, if I have a 3-channel colour image on the host and I want to use it with Vulkan as, say, R8G8B8 or R32G32B32, do I need to loop through the image manually and rearrange the texels?

24bpp formats are hard to optimize in graphics hardware, which is why there is no "R8G8B8" format even defined for the DirectX Graphics Infrastructure (DXGI) pixel formats used by Direct3D 10, 11.x, and 12. Almost all the optimization work goes into 32bpp formats, which is why RGBA32, BGRA32, etc. are much more commonly supported. 64bpp and 128bpp formats are multiples of 32 bits, which leaves the 24bpp format as the "odd man out" in many cases. You can often find a B8G8R8X8 (32-bit) format, which is about as close as you get to 24bpp.
Some hardware will find ways to support 24bpp formats (which were supported by older Direct3D releases, for example), but rendering to them is generally less efficient.
Similar issues arise with 96bpp formats (R32G32B32 floating-point). For Direct3D Hardware Feature Levels, this format is always optional and when implemented it's typically done as three 32-bit floating-point planes, one for each color channel.
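For the last question: yes, the usual workaround is to expand three-channel data to four channels on the host (or first query whether the three-channel format is supported and expand only when it isn't). Below is a minimal C++ sketch of both steps; the Vulkan calls are real, but the helper names and the opaque-alpha padding choice are mine:

#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

// Returns true if `format` can be sampled from an optimally tiled image.
bool SampledImageSupported(VkPhysicalDevice gpu, VkFormat format) {
    VkFormatProperties props{};
    vkGetPhysicalDeviceFormatProperties(gpu, format, &props);
    return (props.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT) != 0;
}

// Expands tightly packed RGB8 texels to RGBA8 so the data can be uploaded
// to a widely supported VK_FORMAT_R8G8B8A8_* image. Alpha is set to 255.
std::vector<uint8_t> ExpandRgbToRgba(const uint8_t* rgb, size_t texelCount) {
    std::vector<uint8_t> rgba(texelCount * 4);
    for (size_t i = 0; i < texelCount; ++i) {
        rgba[i * 4 + 0] = rgb[i * 3 + 0];
        rgba[i * 4 + 1] = rgb[i * 3 + 1];
        rgba[i * 4 + 2] = rgb[i * 3 + 2];
        rgba[i * 4 + 3] = 255;  // opaque padding
    }
    return rgba;
}

If SampledImageSupported(gpu, VK_FORMAT_R8G8B8_UNORM) returns false (as it will on most desktop GPUs), create the image as VK_FORMAT_R8G8B8A8_UNORM and upload the expanded buffer instead.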

Related

How to know which Metal texture format to use? short or half?

Apple's Metal examples alternate between using texture2D<float> and texture2D<half>, and I believe that the default pixel format of an MTKView is bgra8unorm. What determines whether I should use float or half? Do I need to specify something on the CPU, or will textures using 128-bit float4s be converted into halfs automatically? How about the other way around, if I pass in a texture in a bgra8 format? I am asking because I am trying to load textures using MTKTextureLoader as well as from plain byte data, and I'm not sure what format to use for the plain byte data so that things stay consistent.
It really depends on your use case.
Loading: You can probably safely load your data into a texture with the same format as that data. When your render destination has a different format, Metal will perform the conversion for you.
Intermediates: The format of intermediate textures should really depend on the "resolution" (as in "number of bits") you need, which depends on the input data and the color space. If you only handle sRGB data, 8-bit textures are probably enough (unless you do some complicated processing that requires a higher precision). If you want to support a wide gamut (e.g. in Display P3 color space), you need more precision (half should be fine) and also want to be able to store values outside of the [0...255] range. On iOS I'd recommend using half for memory efficiency (and since most devices don't support full float anyway), on macOS float is the default, I think.
View: The pixel format of the view should really depend on the display. Most screens support the Display P3 color space now; for that, you should use the bgra10_xr format, since it's optimized for that case. Otherwise bgra8unorm is fine.
In general, you should be using a texture format with the smallest memory footprint that fits your use case.

What is the equivalent of the DirectDraw Surface (DDS) format for OpenGL on Linux?

The DDS format was made for DirectX, right? So I guess it's optimized for DirectX and not for OpenGL.
So, is there another format (or formats)? If yes, what format is a good choice, and for what reasons?
Also, since I'm working on Linux, I'm concerned with creating textures on Linux, so I need a format that can be imported/exported by GIMP.
The DDS format is useful for storing compressed textures. If the file stores the texture in the same compressed format that will be used in GPU memory, you don't need to decode and re-encode it for GPU storage; you can just move it directly to memory.
The DDS format is basically used to store S3 Texture Compression data. The internal DDS formats DXT3 and DXT5, for example, are S3TC formats that are also supported by OpenGL:
http://www.opengl.org/wiki/S3_Texture_Compression
DDS also can store pre-calculated mipmaps. Again this is a trade-off (larger file size) for reducing loading times, as the mipmaps could also be calculated at loading time.
As you can see, if you have the right code to parse the DDS file, so that the payload is uploaded in its compressed form rather than decoded on the host machine, then it is perfectly fine to use DDS.
For an alternative, @CoffeeandCode pointed out the KTX format in his answer. These files use a different compression algorithm (see here). The advantage is that this compression is mandatory in newer OpenGL versions, while S3TC compression was only ever available as an extension (and has patent problems). I don't know how they compare in quality, or whether you can expect OpenGL 4.3 on your target platforms.
My Take: If you are targeting recent hardware and OpenGL support (OpenGL ES 3 or OpenGL 4.3), you should use the KTX format and respective texture formats (libktx will generate the texture objects for you). If you need to be able to run on older hardware or happen to already have a lot of DDS data, you should probably stick with DDS for the time being.
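To make the "parse the header, upload the payload as-is" point concrete, here is a rough C++ sketch that loads the base mip level of a DXT1/DXT5 DDS file into the currently bound OpenGL texture. It assumes a little-endian host and a GL header/loader that exposes glCompressedTexImage2D; mipmaps, cubemaps, and most validation are omitted:

#include <GL/gl.h>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

#ifndef GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
#define GL_COMPRESSED_RGBA_S3TC_DXT1_EXT 0x83F1
#define GL_COMPRESSED_RGBA_S3TC_DXT5_EXT 0x83F3
#endif

bool LoadDdsBaseLevel(const char* path) {
    std::ifstream file(path, std::ios::binary);
    std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());
    // 4-byte magic "DDS " followed by a 124-byte header.
    if (bytes.size() < 128 || std::memcmp(bytes.data(), "DDS ", 4) != 0)
        return false;

    auto u32 = [&](size_t off) {  // DDS fields are little-endian
        uint32_t v;
        std::memcpy(&v, bytes.data() + off, 4);
        return v;
    };
    uint32_t height = u32(12), width = u32(16), fourCC = u32(84);

    GLenum format;
    uint32_t blockSize;
    if (fourCC == 0x31545844) {          // 'DXT1'
        format = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT; blockSize = 8;
    } else if (fourCC == 0x35545844) {   // 'DXT5'
        format = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT; blockSize = 16;
    } else {
        return false;                    // other formats left out of this sketch
    }

    // S3TC stores 4x4 texel blocks; this is the byte size of mip level 0.
    uint32_t size = ((width + 3) / 4) * ((height + 3) / 4) * blockSize;
    glCompressedTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0,
                           size, bytes.data() + 128);  // payload starts at 128
    return true;
}

The important part is the last call: the compressed payload goes straight from the file to the driver, with no host-side decode.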
There is nothing particularly D3D-"optimized" about DDS.
Once you read the header correctly, the (optionally) pre-compressed data is binary compatible with S3TC. Both DDS and OpenGL's equivalent (KTX) are capable of storing cubemap arrays and mipmaps in a single image file; that is their primary appeal.
With other formats, you may wind up relying on the driver to compress and/or generate mipmaps, and the driver usually does a poor job quality-wise. Drivers are designed to do this quickly because they have a lot of textures to compress/mipmap. Offline, you can use a higher-quality mipmap downsample filter and compression implementation, since the amount of time required to complete is relatively unimportant.
The key benefits of DDS / KTX are:
1. Offline computation of mipmaps
2. Offline compression
3. Storing all layers of a texture array/cubemap
Doing (1) and (2) offline can both improve image quality and reduce the overhead of loading textures at run-time. (3) is mostly for convenience, but a welcome one.
I think the closest equivalent to DDS for OpenGL is KTX, but even DDS works fine under OpenGL once parsed.

Recommendations for real-time pixel-level analysis of television (TV) video

[Note: This is a rewrite of an earlier question that was considered inappropriate and closed.]
I need to do some pixel-level analysis of television (TV) video. The exact nature of this analysis is not pertinent, but it basically involves looking at every pixel of every frame of TV video, starting from an MPEG-2 transport stream. The host platform will be server-class, multiprocessor 64-bit Linux machines.
I need a library that can handle the decoding of the transport stream and present me with the image data in real time. OpenCV and ffmpeg are two libraries that I am considering for this work. OpenCV is appealing because I have heard it has easy-to-use APIs and rich image analysis support, but I have no experience using it. I have used ffmpeg in the past for extracting video frame data from files for analysis, but it lacks image analysis support (though Intel's IPP can supplement it).
In addition to general recommendations for approaches to this problem (excluding the actual image analysis), I have some more specific questions that would help me get started:
1. Are ffmpeg or OpenCV commonly used in industry as a foundation for real-time video analysis, or is there something else I should be looking at?
2. Can OpenCV decode video frames in real time, and still leave enough CPU left over to do nontrivial image analysis, also in real time?
3. Is it sufficient to use ffmpeg for MPEG-2 transport stream decoding, or is it preferable to use an MPEG-2 decoding library directly (and if so, which one)?
4. Are there particular pixel formats for the output frames that ffmpeg or OpenCV is particularly efficient at producing (like RGB, YUV, or YUV422, etc.)?
1.
I would definitely recommend OpenCV for "real-time" image analysis. I assume by real-time you are referring to the ability to keep up with TV frame rates (e.g., NTSC (29.97 fps) or PAL (25 fps)). Of course, as mentioned in the comments, it certainly depends on the hardware you have available, as well as the image size: SD (480p) vs. HD (720p or 1080p). FFmpeg certainly has its quirks, but you would be hard-pressed to find a better free alternative. Its power and flexibility are quite impressive; I'm sure that is one of the reasons the OpenCV developers decided to use it as the back-end for video decoding/encoding in OpenCV.
2.
I have not seen issues with high latency while using OpenCV for decoding. How much latency can your system tolerate? If you need to increase performance, consider using separate threads for capture/decoding and image analysis. Since you mentioned having multi-processor systems, this would take greater advantage of your processing capabilities. I would definitely recommend using the latest Intel Core i7 (or possibly the Xeon equivalent) architecture, as this will give you the best performance available today.
I have used OpenCV on several embedded systems, so I'm quite familiar with your desire for peak performance. I have found many times that it was unnecessary to process a full-frame image (especially when trying to determine masks). I would highly recommend down-sampling the images if you are having difficulty processing your acquired video streams; this can sometimes instantly give you a 4-8x speedup (depending on your down-sample factor). See the sketch after this answer. Also on the performance front, I would definitely recommend using Intel's IPP. Since OpenCV was originally an Intel project, IPP and OpenCV blend very well together.
Finally, because image processing is one of those "embarrassingly parallel" problem fields, don't forget about the possibility of using GPUs as hardware accelerators for your problems if needed. OpenCV has been doing a lot of work in this area of late, so you should have those tools available to you if needed.
3.
I think FFmpeg would be a good starting point; most of the alternatives I can think of (HandBrake, MEncoder, etc.) use ffmpeg as a backend, but it looks like you could probably roll your own with IPP's Video Coding library if you wanted to.
4.
OpenCV's internal representation of colors is BGR unless you use something like cvtColor to convert it. If you would like to see a list of the pixel formats that are supported by FFmpeg, you can run
ffmpeg -pix_fmts
to see what it can input and output.
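For a concrete starting point, here is a minimal C++ sketch of the decode/down-sample loop recommended in part 2 above, using OpenCV's FFmpeg-backed VideoCapture; the file name and down-sample factor are placeholders:

#include <opencv2/opencv.hpp>

int main() {
    // VideoCapture uses FFmpeg as its backend, so it can open an
    // MPEG-2 transport stream directly.
    cv::VideoCapture cap("capture.ts");
    if (!cap.isOpened()) return 1;

    cv::Mat frame, small, gray;
    while (cap.read(frame)) {  // frames arrive as 8-bit BGR
        // Down-sample 2x in each dimension: ~4x fewer pixels to analyze.
        cv::resize(frame, small, cv::Size(), 0.5, 0.5, cv::INTER_AREA);
        cv::cvtColor(small, gray, cv::COLOR_BGR2GRAY);
        // ... pixel-level analysis on `gray` goes here ...
    }
    return 0;
}

For the threading suggestion, the usual split is one thread calling cap.read() into a queue and one or more worker threads draining it.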
For the 4th question only:
Video streams are encoded in a YUV-family format: YUV 4:2:2, YCbCr, etc. Converting them to BGR and back (for re-encoding) eats up lots of time. So if you can write your algorithms to run on YUV, you'll get an instant performance boost.
Note 1. While OpenCV natively supports BGR images, you can make it process YUV, with some care and knowledge about its internals.
For example, if you want to detect some people in the video, just take the upper part of the decoded video buffer, the Y plane (it contains the grayscale representation of the image), and process it.
Note 2. If you want to access the YUV image in OpenCV, you must use the FFmpeg API directly in your app. OpenCV forces the conversion from YUV to BGR in its VideoCapture API.
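As a sketch of that approach: if you decode with the FFmpeg API yourself and end up with a planar YUV frame (e.g. AV_PIX_FMT_YUV420P), you can wrap the luma plane in an OpenCV Mat header without copying or converting anything; the helper name is illustrative:

#include <opencv2/core.hpp>
extern "C" {
#include <libavutil/frame.h>
}

// Wraps the Y (luma) plane of a decoded planar-YUV AVFrame as a grayscale
// cv::Mat. No pixels are copied, so `frame` must outlive the returned Mat.
cv::Mat LumaView(const AVFrame* frame) {
    return cv::Mat(frame->height, frame->width, CV_8UC1,
                   frame->data[0], static_cast<size_t>(frame->linesize[0]));
}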

How to make colours on one screen look the same as another

Given two separate computers, how could one ensure that colours are being displayed roughly the same on each screen?
I.e., one screen might have 50% more brightness than another, so colours appear duller on one screen. One artist on one computer might be seeing the picture differently from another; it's important that they are seeing the same levels.
Is there some sort of calibration technique you can do via software? Any techniques? Or is a hardware solution the only way?
If you are talking about lab-critical calibration (that is, the colours on one monitor need to exactly match the colours on another, and both need to match an external reference as closely as possible) then a hardware colorimeter (with its own appropriate software and test targets) is the only solution. Software solutions can only get you so far.
The technique you described is a common software-only solution, but it's only for setting the gamma curves on a single device. There is no control over the absolute brightness and contrast; you are merely ensuring that solid colours match their dithered equivalents. That's usually done after setting the brightness and contrast so that black is as black as it can be and white is as white as it can be, but you can still distinguish not-quite-black from black and not-quite-white from white. Each monitor, then, will be optimized for its own maximum colour gamut, but it will not necessarily match any other monitor in the shop (even monitors of the same make and model will show some variation due to manufacturing tolerances and age/use). A hardware colorimeter will (usually) generate a custom colour profile for the device under test as it is at the time of testing, and there is generally an end-to-end solution built into the product (so your scanner, printer, and monitor are all as closely matched as they can be).
You will never get to an absolute end-to-end match in a complete system, but hardware will get you as close as you can get. Software alone can only get you to a local maximum for the device it's calibrating, independent of any other device.
What you need to investigate are color profiles.
Wikipedia has some good articles on this:
https://en.wikipedia.org/wiki/Color_management
https://en.wikipedia.org/wiki/ICC_profile
The basic thing you need is the color profile of the display on which the color was seen. Then, with the color profile of display #2, you can take the original color and convert it into a color that will look as close as possible (depending on what colors the second display can actually represent).
Color profiles are platform independent and many modern frameworks support them directly.
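As an illustration, here is a C++ sketch of that conversion using LittleCMS (lcms2), one common cross-platform library for applying ICC profiles. The profile file names are placeholders and error handling is omitted:

#include <lcms2.h>
#include <cstdint>

// Converts an 8-bit RGB buffer rendered for display #1 into the closest
// match for display #2, using the two displays' ICC profiles.
void ConvertBetweenDisplays(const uint8_t* in, uint8_t* out, size_t pixels) {
    cmsHPROFILE src = cmsOpenProfileFromFile("display1.icc", "r");
    cmsHPROFILE dst = cmsOpenProfileFromFile("display2.icc", "r");
    cmsHTRANSFORM xform = cmsCreateTransform(src, TYPE_RGB_8,
                                             dst, TYPE_RGB_8,
                                             INTENT_PERCEPTUAL, 0);
    cmsDoTransform(xform, in, out, static_cast<cmsUInt32Number>(pixels));
    cmsDeleteTransform(xform);
    cmsCloseProfile(src);
    cmsCloseProfile(dst);
}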
You may be interested in reading about how Apple has dealt with this issue:
Color Programming Topics
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/DrawColor/DrawColor.html
You'd have to allow or ask the individual users to calibrate their monitors. But there's enough variation across monitors - particularly between models and brands - that trying to implement a "silver bullet" solution is basically impossible.
As @Matt Ball observes, calibrating your monitors is what you are trying to do. Here's one way to do it without specialised hardware or software: for "roughly the same", visual calibration against a reference image is likely to be adequate.
Getting multiple monitors of varying quality/brand/capabilities to render a given image the same way is simply not possible.
If you have complete control over the monitor, video card, calibration hardware/software, and lighting used, then you have a shot. But that's only if you are in complete control of the desktop and the environment.
Assuming you are just accounting for LCDs, they are built with different types of panels with a host of different capabilities. Brightness is just one factor (albeit a big one); another is simply the number of colors they are capable of rendering.
Beyond that, there is the environment the monitor is in. Even assuming the same brand of monitor and the same calibration points, a person will perceive a colour differently if an overhead fluorescent is used versus an incandescent placed next to the monitor itself. At one place I worked, we had to shut off all the overhead lights and provide exact lamp placement for the graphic artists. Picky picky. ;)
I assume that you have no control over the hardware used; each user has a different brand and model of monitor.
You also have no control over operating-system color profiles.
An extravagant solution would be to display a test picture or pattern and ask your users to take a picture of it using their mobile phone or webcam.
Download the picture to the computer, and check whether its levels are valid or too far out of range.
This would also ensure that the ambient light at the office is appropriate.

Framebuffer Documentation

Is there any documentation on how to write software that uses the framebuffer device in Linux? I've seen a couple of simple examples that basically say: "open it, mmap it, write pixels to the mapped area." But there's no comprehensive documentation on how to use the different ioctls for it, or anything else. I've seen references to "panning" and other capabilities, but googling it gives way too many hits of useless information.
Edit:
Is the code the only documentation from a programming standpoint, as opposed to "user's how-to configure your system to use the fb" documentation?
You could have a look at the source code of fbi, an image viewer which uses the Linux framebuffer. You can get it here: http://linux.bytesex.org/fbida/
-- It appears there might not be too many options for programming the fb from user space on a desktop beyond what you mentioned. This might be one reason why some of the docs are so old. Look at this howto for device driver writers, which is referenced from some official Linux docs: www.linux-fbdev.org/HOWTO/index.html. It does not cover many interfaces, although looking at the Linux source tree does offer larger code examples.
-- opentom.org/Hardware_Framebuffer is not for a desktop environment. It reinforces the main methodology, but it does seem to avoid explaining all the ingredients necessary for the "fast" double-buffer switching it mentions. Another one, for a different device and which leaves some key buffering details out, is wiki.gp2x.org/wiki/Writing_to_the_framebuffer_device. It does at least suggest that you might be able to use fb1 and fb0 to engage double buffering (on that device; on a desktop, fb1 may not be available or may access different hardware), that using the volatile keyword might be appropriate, and that you should pay attention to vsync.
-- asm.sourceforge.net/articles/fb.html has assembly language routines that appear (?) to just do the basics: querying, opening, setting a few parameters, mmap, drawing pixel values to storage, and copying over to the fb memory (making sure to use a short stosb loop, I suppose, rather than some longer approach).
-- Beware of 16bpp comments when googling the Linux framebuffer: I used fbgrab and fb2png during an X session to no avail. These each rendered an image that suggested a snapshot of my desktop screen as if the picture of the desktop had been taken using a very bad camera, underwater, and then overexposed in a dark room. The image was completely broken in color and size, and missing much detail (dotted all over with pixel colors that didn't belong). It seems that /proc and /sys on the computer I used (a new kernel with at most minor modifications, from a PCLOS derivative) claim that fb0 uses 16 bpp, and most things I googled stated something along those lines, but experiments led me to a very different conclusion. Besides the failures of these two standard framebuffer grab utilities (for the versions held by this distro) that may have assumed 16 bits, I had a successful test result treating the framebuffer pixel data as 32 bits. I created a file from data pulled in via cat /dev/fb0. The file's size ended up being 1920000. I then wrote a small C program to try to manipulate that data (under the assumption it was pixel data in some encoding or other). I nailed it eventually, and the pixel format matched exactly what I had gotten from X when queried (TrueColor RGB 8 bits, no alpha but padded to 32 bits). Notice another clue: my screen resolution of 800x600 times 4 bytes gives 1920000 exactly. The 16-bit approaches I tried initially all produced a similarly broken image to fbgrab's, so it's not as if I was looking at the wrong data. [Let me know if you want the code I used to test the data. Basically I just read in the entire fb0 dump and then spat it back out to a file, after adding a header "P6\n800 600\n255\n" that creates a suitable ppm file, and while looping over all the pixels, manipulating their order or expanding them. The end successful result for me was to drop every 4th byte and switch the first and third in every 4-byte unit. In short, I turned the apparent BGRA fb0 dump into a ppm RGB file. ppm can be viewed with many picture viewers on Linux.]
-- You may want to reconsider your reasons for wanting to program using fb0. You may not achieve any worthwhile performance gains over X (this was my, if limited, experience) while giving up the benefits of using X. This might also account for why few code examples exist.
-- Note that DirectFB is not fb. DirectFB has of late gotten more love than the older fb, as it is more focused on the sexier 3d hw accel. If you want to render to a desktop screen as fast as possible without leveraging 3d hardware accel (or even 2d hw accel), then fb might be fine but won't give you anything much that X doesn't give you. X apparently uses fb, and the overhead is likely negligible compared to other costs your program will likely have (don't call X in any tight loop, but instead at the end once you have set up all the pixels for the frame). On the other hand, it can be neat to play around with fb as covered in this comment: Paint Pixels to Screen via Linux FrameBuffer
Check the MPlayer sources.
Under the libvo/ directory there are a lot of video output plugins used by MPlayer to display multimedia. There you can find the fbdev plugin (vo_fbdev* sources), which uses the Linux framebuffer.
There are a lot of ioctl calls, with the following codes:
FBIOGET_VSCREENINFO
FBIOPUT_VSCREENINFO
FBIOGET_FSCREENINFO
FBIOGETCMAP
FBIOPUTCMAP
FBIOPAN_DISPLAY
It's not documentation as such, but it is surely a good reference implementation.
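Putting the pieces together, here is a minimal C++ sketch of the "open it, ioctl it, mmap it, write pixels" flow using the first three ioctls from the list above. It assumes a 32bpp mode; real code should check vinfo.bits_per_pixel and the red/green/blue field offsets instead of hard-coding a pixel layout:

#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

int main() {
    int fd = open("/dev/fb0", O_RDWR);
    if (fd < 0) { perror("open /dev/fb0"); return 1; }

    fb_var_screeninfo vinfo{};  // resolution, bpp, pixel layout, panning
    fb_fix_screeninfo finfo{};  // line length, size of mapped memory
    ioctl(fd, FBIOGET_VSCREENINFO, &vinfo);
    ioctl(fd, FBIOGET_FSCREENINFO, &finfo);

    auto* fb = static_cast<uint8_t*>(mmap(nullptr, finfo.smem_len,
                                          PROT_READ | PROT_WRITE,
                                          MAP_SHARED, fd, 0));
    if (fb == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    // Fill the visible screen with a solid color, assuming 32bpp XRGB.
    for (uint32_t y = 0; y < vinfo.yres; ++y) {
        auto* row = reinterpret_cast<uint32_t*>(fb + y * finfo.line_length);
        for (uint32_t x = 0; x < vinfo.xres; ++x)
            row[x] = 0x00336699;  // arbitrary test color
    }

    munmap(fb, finfo.smem_len);
    close(fd);
    return 0;
}

The double buffering mentioned earlier works roughly like this: request a virtual y-resolution of twice the visible height with FBIOPUT_VSCREENINFO, draw into the off-screen half, then flip by updating vinfo.yoffset and calling FBIOPAN_DISPLAY.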
Look at the source code of any of: fbcat, fbida, fbterm, fbtv, the DirectFB library, libxineliboutput-fbe, ppmtofb, or xserver-fbdev; all are Debian packages. Just apt-get source them from the Debian repositories. There are many others...
Hint: search for "framebuffer" in package descriptions using your favorite package manager.
OK, even if reading the code is sometimes called "guru documentation", it can be a bit much to actually go and do it.
The source for any splash screen (i.e., the one shown during booting) should give you a good start.
