Fast pixel drawing library - Linux

My application produces an "animation" in a per-pixel manner, so I need to draw the pixels efficiently. I've tried different strategies/libraries with unsatisfactory results, especially at higher resolutions.
Here's what I've tried:
SDL: OK, but slow;
OpenGL: inefficient for raw pixel operations;
Xlib: better, but still too slow;
svgalib, DirectFB (and other framebuffer implementations): they seem perfect, but are definitely too tricky for the end user to set up.
(Note: I may be wrong about these assertions; if so, please correct me.)
What I need is the following:
fast pixel drawing with performance comparable to OpenGL rendering;
it should work on Linux (cross-platform as a bonus feature);
it should support double buffering and vertical synchronization;
it should be portable across hardware;
it should be open source.
Can you please give me some enlightenment/ideas/suggestions?

Are your pixels sparse or dense (e.g. a bitmap)? If you are creating dense bitmaps out of pixels, then another option is to convert the bitmap into an OpenGL texture and use OpenGL APIs to render at some framerate.
The basic problem is that graphics hardware varies greatly between platforms. Either you pick an abstraction layer, which slows things down, or you code closer to the particular graphics hardware, which isn't portable.
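If the dense-bitmap case applies, a minimal sketch of that idea might look like the following (assuming a current OpenGL context; the function names `create_streaming_texture` and `upload_frame` are mine): allocate the texture once, then push each finished frame up in one bulk transfer and draw it on a full-screen quad.

```cpp
// Sketch: stream a CPU-side RGBA frame into an OpenGL texture each frame.
// Assumes a valid GL context is current and `width`/`height` are the frame size.
#include <GL/gl.h>
#include <cstdint>
#include <vector>

GLuint create_streaming_texture(int width, int height) {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    // Allocate storage once; later frames only update the contents.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    return tex;
}

void upload_frame(GLuint tex, int width, int height,
                  const std::vector<std::uint8_t>& rgba_pixels) {
    glBindTexture(GL_TEXTURE_2D, tex);
    // One bulk transfer per frame instead of per-pixel operations.
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, rgba_pixels.data());
    // Then draw a full-screen textured quad (or triangle) with this texture bound.
}
```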

I'm not totally sure what you're doing wrong, but it could be that you are writing pixels one at a time to the display surface.
Don't do that.
Instead, create a rendering surface in main memory, in the same format as the display surface, render into that, and then copy the whole rendered image to the display in a single operation. Modern GPUs are very slow per transaction, but can move lots of data very quickly in a single operation.
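As a rough illustration of that approach (not the answerer's code), here is a sketch using SDL2's streaming textures: the frame is composed per pixel in a main-memory buffer, then pushed to the GPU and presented in one operation per frame.

```cpp
// Sketch: render into a main-memory buffer, then push the whole frame at once (SDL2).
#include <SDL2/SDL.h>
#include <cstdint>
#include <vector>

int main() {
    const int W = 1280, H = 720;
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window* win = SDL_CreateWindow("pixels", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, W, H, 0);
    // PRESENTVSYNC gives double buffering plus vertical sync.
    SDL_Renderer* ren = SDL_CreateRenderer(win, -1,
        SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
    SDL_Texture* tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                         SDL_TEXTUREACCESS_STREAMING, W, H);

    std::vector<std::uint32_t> frame(W * H);   // CPU-side render surface
    bool running = true;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e)) if (e.type == SDL_QUIT) running = false;

        // ... write every pixel of `frame` here, per pixel, in main memory ...

        // One bulk upload + one blit per frame, not one operation per pixel.
        SDL_UpdateTexture(tex, nullptr, frame.data(), W * sizeof(std::uint32_t));
        SDL_RenderCopy(ren, tex, nullptr, nullptr);
        SDL_RenderPresent(ren);
    }
    SDL_DestroyTexture(tex);
    SDL_DestroyRenderer(ren);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
```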

Looks like you are confusing the windowing layer (SDL and Xlib) with the rendering library (OpenGL).
Just pick a windowing library (SDL, GLUT, or Xlib if you like a challenge), activate double-buffer mode, and make sure that you get direct rendering.
What kind of graphics card do you have? Most likely it can process the pixels on the GPU. Look up how to create pixel shaders in OpenGL; pixel shaders run your code per pixel.
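For what it's worth, a minimal sketch of that setup, with SDL2 as the windowing layer and OpenGL doing the rendering, might look like this (double buffering and vsync requested explicitly; the software-renderer check is only a heuristic):

```cpp
// Sketch: SDL2 as the windowing layer, OpenGL as the renderer,
// with double buffering and vertical sync requested explicitly.
#include <SDL2/SDL.h>
#include <SDL2/SDL_opengl.h>
#include <cstdio>

int main() {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);           // double buffering
    SDL_Window* win = SDL_CreateWindow("gl", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, 1280, 720,
                                       SDL_WINDOW_OPENGL);
    SDL_GLContext ctx = SDL_GL_CreateContext(win);
    SDL_GL_SetSwapInterval(1);                              // vertical sync

    // A renderer string containing "llvmpipe" or "softpipe" usually means
    // you did NOT get direct/hardware rendering.
    std::printf("GL_RENDERER: %s\n",
                (const char*)glGetString(GL_RENDERER));

    // ... compile a fragment (pixel) shader here and draw with it ...

    SDL_GL_SwapWindow(win);   // present the back buffer
    SDL_GL_DeleteContext(ctx);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
```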

Related

What is the difference between filling the screen with pixels and using graphics libraries like OpenGL?

I've just started looking into graphics tools and how rendering could be made faster, and this question came to mind: what is the difference between filling the screen's pixels with colors using some optimized hand-written code versus using actual 2D/3D graphics tools such as OpenGL, Unity, etc.?
What I mean is this: I watched a video about the FPS game .kkrieger, which fits in only 96 kB. Of course, it demands a lot from the CPU and GPU to perform well. But what if, instead of aiming for compact files, there were a way to do nice graphics with great performance without needing a very expensive CPU+GPU combo? Is that possible at all?

Copy D3D11 Texture2D to D2D1 Bitmap

I have a D3D11 device created on Windows 10 (latest edition) and an ID3D11Texture2D* created in GPU memory. I want to get the contents of this Texture2D stretched and drawn onto a region of an HWND. I don't want to use vertex buffers, I want to use "something else". I don't want to copy the bits down to the CPU and then bring them back up to the GPU again. StretchDIBits or StretchBlt would be way too slow.
Let's pretend I want to use D2D1... I need a way to get my D3D11 texture2D copied or shared over to D2D1. Then, I want to use a D2D1 render target to stretch blit it to the HWND.
I've read the MS documents, and they don't make a lot of sense.
Ideas?
If you already have an ID3D11Texture, why aren't you just using Direct3D to render it? That's what the hardware is designed to do very fast with high quality.
The DirectX Tool Kit SpriteBatch class is a good place to start for general sprite rendering, and it does indeed make use of VBs, shaders, etc. internally.
Direct2D is really best suited to scenarios where you are drawing classic vector/presentation graphics, like circles, ellipses, arcs, etc. It's also useful as a way to use DirectWrite for high-quality, highly scalable fonts. For blitting rectangles, just use Direct3D which is what Direct2D has to use under the covers anyhow.
Note that if you require Direct3D Hardware Feature Level 10.0 or better, you can use a common trick which relies on the vertex ID (SV_VertexID) in the vertex shader, so you can self-generate the geometry without any need for a VB or IB. See this code.
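For illustration, here is a sketch of that trick (assuming an already-created ID3D11DeviceContext and shaders compiled and bound elsewhere; the vertex shader builds one oversized triangle from SV_VertexID, so no vertex data is ever needed):

```cpp
// Sketch of the "no vertex buffer" trick: the vertex shader generates a
// full-screen triangle from SV_VertexID, so no VB/IB or input layout is needed.
// Assumes an ID3D11DeviceContext* `ctx` and compiled/bound shaders already exist.
#include <d3d11.h>

// HLSL vertex shader (compile with fxc/D3DCompile as usual and bind it):
static const char kFullscreenVS[] = R"(
struct VSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; };
VSOut main(uint id : SV_VertexID)
{
    VSOut o;
    float2 uv = float2((id << 1) & 2, id & 2);   // (0,0), (2,0), (0,2)
    o.uv  = uv;
    o.pos = float4(uv.x * 2.0f - 1.0f, 1.0f - uv.y * 2.0f, 0.0f, 1.0f);
    return o;
}
)";

void draw_fullscreen(ID3D11DeviceContext* ctx) {
    ctx->IASetInputLayout(nullptr);   // no vertex data at all
    ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    // The pixel shader samples the source texture via the interpolated uv.
    ctx->Draw(3, 0);                  // one triangle that covers the whole target
}
```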

How does GPU programming differ from usage of the graphics card in games?

One way of doing GPU programming is OpenCL, which is designed for parallelized, number-crunching operations.
Now think of your favorite 3D PC game. When the screen renders, what's going on? Did the developers hand-craft an OpenCL kernel (or something like it), or are they using pre-programmed functions in the graphics card?
Sorry to make this sound like a homework problem, I couldn't think of a better way to ask it.
H'okay, so, I'ma answer this in terms of history. Hopefully that gives a nice overview of the situation and lets you decide how to proceed.
Graphics Pipeline
3D graphics have an almost set-in-stone flow of calculations. You start with your transformation matrices, you multiply out your vertex positions (maybe generate some more on the fly), figure out what your pixels ought to be colored, then spit out the result. This is the (oversimplified) gist of 3D graphics. To change anything in it, you just twiddle one aspect of the pipeline a bit with 'shaders', i.e. little programmable elements with defined inputs and outputs that can be slotted into the pipeline.
Early GPGPU
Back when GPGPU was still in its infancy, the only way people had access to the massively parallel prowess of the GPU was through graphics shaders. For example, there were fragment shaders, which basically calculated what colors should be on each pixel of the screen (I'm kind of oversimplifying here, but that's what they did).
So, for example, you might use a vertex shader to chuck data about the screen before reducing a bunch of values in the fragment shader by taking advantage of color blending (effectively making the tricky transformation of mathematical problem space to... well, color space).
The gist of this is that old GPGPU stuff worked within the confines of 3D graphics, using the same 'pre-programmed functions in the graphics card' that the rest of the 3D graphics pipeline used.
It was painful to read, write, and think about (or at least, I found it so painful that I was dissuaded).
CUDA and OpenCL and [all of the other less popular GPGPU solutions]
Then some folks came along and said, "Wow, this is kind of dumb - we're stuck in the graphics pipeline when we want to be doing more general calculations!"
Thus GPGPU escaped from the confines of the graphics pipeline, and now we have OpenCL and CUDA and Brook and HSA and... Well, you get the picture.
tl;dr
The difference between GPGPU kernels and 3D graphics kernels is that the latter are stuck in a pipeline with (convenient) constraints attached to them, while the former have far more relaxed requirements, the pipeline is defined by the user, and the results don't have to be attached to a display (although they can be if you're masochistic like that).
When you run a game there may be two distinct systems operating on your GPU:
OpenGL renders images to your screen (graphics)
OpenCL does general-purpose computing tasks (compute)
OpenGL is programmed with shaders. OpenCL is programmed with kernels.
If you would like to learn in more detail how games work on the GPU, I recommend reading about OpenCL, OpenGL, and game engine architecture.
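To make that contrast concrete, here are two illustrative source snippets (the names are mine): a GLSL fragment shader of the kind OpenGL runs per covered pixel, and an OpenCL kernel of the kind OpenCL runs per work-item. Both are plain strings handed to their respective runtimes from host code.

```cpp
// Illustration only: "programmed with shaders" vs "programmed with kernels".

// GLSL fragment shader: runs once per covered pixel, output goes to the framebuffer.
static const char* kFragmentShader = R"(
#version 330 core
out vec4 color;
void main() { color = vec4(1.0, 0.5, 0.0, 1.0); }
)";

// OpenCL kernel: runs once per work-item, output goes wherever you point it.
static const char* kScaleKernel = R"(
__kernel void scale(__global float* data, float factor) {
    size_t i = get_global_id(0);
    data[i] *= factor;
}
)";
```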

Why do we still use fixed-function blending operations in D3D11 etc.?

I was looking around trying to understand why we are still using fixed-function blending modes in newer 3D APIs (like D3D11). In D3D10, fixed-function alpha clipping was removed in favor of doing it in shaders. Why? Because it's a much more powerful approach for almost any situation.
So why, then, can we not compute our own blending operations (i.e., sample the render target we are currently rendering into)? Is there some hardware design issue in the video card pipelines that makes this difficult to accomplish?
This would be useful because you could, for example, make refraction shaders run much faster, since you wouldn't have to swap back and forth between two render targets for each refractive object overlay, such as a refractive windowing system for an OS or game UI.
Since this is not a discussion forum, where might be the best place to suggest an idea like this? I would love to see it in D3D12. Or is this already possible in D3D11?
So why, then, can we not compute our own blending operations
Who says you can't? With shader_image_load_store (and the D3D11 equivalent), you can do pretty much anything you want with images, provided that you follow the rules. That last part is generally what trips people up. Doing a full read/modify/write in a shader, such that later fragment shader invocations don't read the wrong value, is almost impossible in the general case. You have to restrict it by saying that each rendered object will not overlap with itself, and you have to insert a memory barrier between rendered objects (which can overlap with other rendered objects). Or you use the linked-list approach.
But the point is this: with these mechanisms, not only have people implemented blending in shaders, but they've implemented order-independent transparency (via linked lists). Nothing is stopping you from doing what you want right now.
Well, nothing except performance of course. The fixed-function blender will always be faster because it can run in parallel with the fragment shader operations. The blending units are separate hardware from the fragment shaders, so you can be doing blending operations while simultaneously doing fragment shader ops (obviously from later fragments, not the ones being blended).
The read/modify/write mechanism in the blend hardware is designed specifically for blending, while the image_load_store is a more generic mechanism. And while generic may beat specific in the long-term of hardware evolution, for the immediate and near-future, you can expect fixed-function blending to beat image_load_store blending performance-wise every time.
You should use it only when you must. And even then, decide if you really, really need it.
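As a sketch of what "blending in the shader" via image load/store looks like in OpenGL terms (the GLSL, the blend formula, and the barrier placement are illustrative, not a drop-in implementation):

```cpp
// Sketch: custom blending with image load/store instead of fixed-function blending.
// The read/modify/write happens in the fragment shader; a memory barrier between
// draws keeps later reads coherent. Assumes a GL 4.2+ context loaded via GLEW.
#include <GL/glew.h>

static const char* kBlendInShaderFS = R"(
#version 420 core
layout(binding = 0, rgba8) coherent uniform image2D target;
in vec4 srcColor;
void main() {
    ivec2 p   = ivec2(gl_FragCoord.xy);
    vec4 dst  = imageLoad(target, p);                              // read current pixel
    vec4 outc = srcColor * srcColor.a + dst * (1.0 - srcColor.a);  // custom blend
    imageStore(target, p, outc);                                   // write it back
}
)";

void draw_with_custom_blend(/* per-object draw calls elided */) {
    // drawObject(A);
    glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);  // make A's writes visible
    // drawObject(B);   // B may overlap A, but must not overlap itself
}
```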
Is there some hardware design issue in the video card pipelines that make this difficult to accomplish?
Yes, this is actually the case. If one could do blending in the fragment shader, this would introduce possible feedback loops, and this really complicates things. Blending is done in a separate hardwired stage for performance and parallelization reasons.

Advanced Text Rendering with Direct3D

Let me describe the "battlefield" of my task:
Multi-room audio/video chat with more than 1M users;
Custom Direct3D renderer;
What I need to implement is a TextOverVideo feature. The text itself goes over the network and is to be rendered on the recipient side with the Direct3D renderer. AFAIK, it is common in game development to create your own texture with letters/numbers and draw those items. Because our application must support many languages, we ought to use a standard approach. That's why I've been working with the ID3DXFont interface, but I've found some unsatisfying limitations.
What I've faced is a lack of scalability. E.g., if the user resizes the video window, I have to re-create the D3DXFont with a new D3DXFONT_DESC while they're doing that. I think that is unacceptable.
That is why the ONLY solution I see (given my skills) is to somehow render the text to a texture and then draw a sprite with scaling, translation, etc.
So I'm not sure whether I'm going in the right direction. Please help with advice, experience, literature, sources...
Your question is a bit unclear. As I understand it, you want an easily scalable font.
I think it is unacceptable
As far as I know, this is standard behavior for fonts - even for system fonts. They aren't supposed to be easily scalable.
Possible solutions:
Use ID3DXRenderTarget to render the text onto a texture. The font will be filtered when you scale it up too much; some people will think it looks ugly.
Write a custom library that supports vector fonts, i.e. one that can extract the outline from a font and build text geometry from it. It will be MUCH slower than ID3DXFont (which is already slower than traditional "texture" fonts). Text will be easily scalable. With this approach you are very likely to get visible artifacts ("noise") for small text, so I wouldn't use it unless you want huge letters (40+ pixels). The FreeType library may have functions for processing font outlines.
Or you could try using D3DXCreateText. This will create 3D text for ONE string. Won't be fast at all.
I'd forget about it. As long as the user is happy with the overall performance, improving the font-rendering routines (so their behavior looks nice to you) is not worth the effort.
--EDIT--
About ID3DXRenderTarget.
Even if you use ID3DXRenderTarget, you'll need ID3DXFont. I.e., you use ID3DXFont to render text onto a texture, and then use the texture to blit the text onto the screen.
Because you said that performance is critical, you can delay creating the new ID3DXFont until the user stops resizing the video. I.e., while the user is resizing, you use the old font but upscale it via the texture. There will be filtering, of course. Once the user stops resizing, you create the new font when you have time; you can probably do that in a separate thread, but I'm not sure about it. OR you could simply always render the text at the same resolution as the video. This way you won't have to worry about resizing it (it will still be filtered, along with the video). Some video players work this way.
A few more things about ID3DXFont. There is one problem with it: it is slow in situations where you need a lot of text (but you still need it, because it supports Unicode, and writing a texture font with Unicode support is a pain). Last time I worked with it, I optimized things by caching commonly used strings in textures. I.e., any string that was drawn for more than 3 frames in a row was rendered onto a D3DFMT_A8R8G8B8 texture/render target, and then I copied that string from the texture instead of using ID3DXFont. Strings that weren't rendered for a while were removed from the texture. That gave a serious boost. This solution, however, is tricky: monitoring empty space in the texture, removing unused strings, and defragmenting the texture isn't exactly trivial (nothing exceptionally complicated, but it is easy to make a mistake). You won't need such a complicated system unless your screen is literally covered by text.
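A rough sketch of the render-text-to-a-texture idea in D3D9/D3DX terms (heavily simplified, no error handling; `device` and `font` are assumed to be created elsewhere, and the helper name `render_text_to_texture` is mine):

```cpp
// Sketch: render a string once into a render-target texture with ID3DXFont,
// so it can later be drawn as a scaled sprite instead of re-creating the font.
#include <windows.h>
#include <d3d9.h>
#include <d3dx9.h>

IDirect3DTexture9* render_text_to_texture(IDirect3DDevice9* device,
                                           ID3DXFont* font,
                                           const wchar_t* text,
                                           UINT w, UINT h) {
    IDirect3DTexture9* tex = nullptr;
    device->CreateTexture(w, h, 1, D3DUSAGE_RENDERTARGET, D3DFMT_A8R8G8B8,
                          D3DPOOL_DEFAULT, &tex, nullptr);

    IDirect3DSurface9* surf = nullptr;
    tex->GetSurfaceLevel(0, &surf);
    IDirect3DSurface9* backbuffer = nullptr;
    device->GetRenderTarget(0, &backbuffer);

    device->SetRenderTarget(0, surf);
    device->Clear(0, nullptr, D3DCLEAR_TARGET, D3DCOLOR_ARGB(0, 0, 0, 0), 1.0f, 0);
    device->BeginScene();
    RECT rc = { 0, 0, (LONG)w, (LONG)h };
    font->DrawTextW(nullptr, text, -1, &rc, DT_LEFT | DT_TOP,
                    D3DCOLOR_ARGB(255, 255, 255, 255));   // glyphs rendered once
    device->EndScene();

    device->SetRenderTarget(0, backbuffer);               // restore the backbuffer
    backbuffer->Release();
    surf->Release();
    return tex;
}
```

The resulting texture could then be drawn with ID3DXSprite under a scaling transform (D3DXMatrixScaling plus ID3DXSprite::SetTransform), so resizing the video only rescales the sprite instead of re-creating the font.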
ID3DXFont fonts are flat, always parallel to the screen. D3DXCreateText produces meshes that can be scaled and rotated.
Texture fonts are fuzzy and don't look very clear. Not good for an app that uses lots of small text.
I am writing an app that can create 500 text meshes, each mesh averaging 3,000-5,000 vertices. The text meshes are created once, then are static. I get 700 fps on a GeForce 8800.
