How do GPU-based video cards accelerate program calculations? - multithreading

I read in this article that a company has created software capable of using multiple GPU-based video cards in parallel to process hundreds of billions of fixed-point calculations per second.
The program apparently runs on Windows. Is it possible from Windows to assign a thread to a GPU? Do they create their own driver and then interact with it? Any idea how they do it?

I imagine that they are using a language like CUDA to program the critical sections of code on the GPUs to accelerate their computation.
The main function of the program (and its threads) would still run on the host CPU, but data are shipped off to the GPUs for processing by the advanced algorithms. CUDA is an extension to C syntax, so it is easier for the programmer than having to learn the older shader languages, like Cg, for programming general-purpose calculations on a GPU.

A good place to start - GPGPU
Also, for the record, I don't think there is such a thing as a non-GPU-based graphics card. GPU stands for graphics processing unit, which is by definition the heart of a graphics card.

Related

OpenCL GPU Audio

There's not much on this subject, perhaps because it isn't a good idea in the first place.
I want to create a realtime audio synthesis/processing engine that runs on the GPU. The reason for this is that I will also be using a physics library that runs on the GPU, and the audio output will be determined by the physics state. Is it true that the GPU only carries audio output and can't generate it? Would this mean a large increase in latency if I were to read the data back on the CPU and output it to the sound card? I'm looking for a latency between 10 and 20 ms between synthesis and playback.
Would the GPU accelerate synthesis by any worthwhile amount? I'm going to have a large number of synthesizers running at once, each of which I imagine could take up their own parallel process. AMD is coming out with GPU audio, so there must be something to this.
For what it's worth, I'm not sure that this idea lacks merit. If DarkZero's observation about transfer times is correct, it doesn't sound like there would be much overhead in getting audio onto the GPU for processing, even from many different input channels, and while there are probably audio operations that are not very amenable to parallelization, many are very VERY parallelizable.
It's obvious, for example, that computing sine values for 128 samples of output from a sine source could be done completely in parallel. Working in blocks of that size would permit a latency of only about 3 ms (128 samples at 44.1 kHz is roughly 2.9 ms), which is acceptable in most digital audio applications. Similarly, many other fundamental oscillators could be effectively parallelized. Amplitude modulation of such oscillators would be trivial. Efficient frequency modulation would be more challenging, but I would guess it is still possible.
In addition to oscillators, FIR filters are simple to parallelize, and a Google search turned up some promising-looking research papers (which I didn't take the trouble to read) suggesting that there are reasonable parallel approaches to IIR filter implementation. These two types of filters are fundamental to audio processing, and many useful audio operations can be understood as such filters.
Wave-shaping is another task in digital audio that is embarrassingly parallel.
Even if you couldn't take an arbitrary software synth and map it effectively to the GPU, it is easy to imagine a software synthesizer constructed specifically to take advantage of the GPU's strengths, and avoid its weaknesses. A synthesizer relying exclusively on the components I have mentioned could still produce a fantastic range of sounds.
While marko is correct to point out that existing SIMD instructions can do some parallelization on the CPU, the number of inputs they can operate on at the same time pales in comparison to a good GPU.
In short, I hope you work on this and let us know what kind of results you see!
DSP operations on modern CPUs with vector processing units (SSE on x86/x64 or NEON on ARM) are already pretty cheap if exploited properly. This is particularly the case with filters, convolution, FFT and so on - which are fundamentally stream-based operations. These are the type of operations where a GPU might also excel.
As it turns out, soft synthesisers have quite a few operations in them that are not stream-like, and furthermore, the tendency is to process increasingly small chunks of audio at once to target low latency. These are a really bad fit for the capabilities of GPU.
The effort involved in using a GPU - particularly getting data in and out - is likely to far exceed any benefit you get. Furthermore, the capabilities of inexpensive personal computers - and also tablets and mobile devices - are more than enough for many digital audio applications. AMD seems to have a solution looking for a problem. For sure, the existing music and digital audio software industry is not about to start producing software that only targets a limited subset of hardware.
Typical transfer times for a few MB to/from the GPU are around 50 µs.
Delay is not your problem; however, parallelizing an audio synthesizer on the GPU may be quite difficult. If you don't do it properly, the processing may take more time than the data copy.
If you are going to run multiple synthesizers at once, I would recommend running each synthesizer in a work-group and parallelizing the synthesis process across the available work-items. It will not be worthwhile to have each synthesizer in one work-item, since it is unlikely you will have thousands.
http://arxiv.org/ftp/arxiv/papers/1211/1211.2038.pdf
You might be better off using OpenMP, for its lower initialization times.
You could check out the NESS project, which is all about physical modelling synthesis. They use GPUs for audio rendering because the process involves simulating an acoustic 3D space for a given sound, and calculating what happens to that sound within the virtual 3D space (and apparently GPUs are good at working with this sort of data). Note that this is not realtime synthesis, because it is so demanding of processing.

How does GPU programming differ from usage of the graphics card in games?

One way of doing GPU programming is OpenCL, which will work with parallelized, number-crunching operations.
Now think of your favorite 3D PC game. When the screen renders, what's going on? Did the developers hand-craft an OpenCL kernel (or something like it), or are they using pre-programmed functions in the graphics card?
Sorry to make this sound like a homework problem, I couldn't think of a better way to ask it.
H'okay, so, I'ma answer this in terms of history. Hopefully that gives a nice overview of the situation and lets you decide how to proceed.
Graphics Pipeline
3D graphics have an almost set-in-stone flow of calculations. You start with your transformation matrices, you multiply out your vertex positions (maybe generate some more on the fly), figure out what your pixels ought to be colored, then spit out the result. This is the (oversimplified) gist of 3D graphics. To change anything in it, you just twiddle one aspect of the pipeline a bit with a 'shader': a little programmable element with defined inputs and outputs that can be slotted into the pipeline.
Early GPGPU
Back when GPGPU was still in its infancy, the only way people had access to the massively parallel prowess of the GPU was through graphics shaders. For example, there were fragment shaders, which basically calculated what colors should be on each pixel of the screen (I'm kind of oversimplifying here, but that's what they did).
So, for example, you might use a vertex shader to chuck data about the screen before reducing a bunch of values in the fragment shader by taking advantage of color blending (effectively making the tricky transformation of mathematical problem space to... well, color space).
The gist of this is that old GPGPU stuff worked within the confines of 3D graphics, using the same 'pre-programmed functions in the graphics card' that the rest of the 3D graphics pipeline used.
It was painful to read, write, and think about (or at least, I found it so painful that I was dissuaded).
CUDA and OpenCL and [all of the other less popular GPGPU solutions]
Then some folks came along and said, "Wow, this is kind of dumb - we're stuck in the graphics pipeline when we want to be doing more general calculations!"
Thus GPGPU escaped from the confines of the graphics pipeline, and now we have OpenCL and CUDA and Brook and HSA and... Well, you get the picture.
tl;dr
The difference between GPGPU kernels and 3D graphics kernels is that the latter are stuck in a pipeline with (convenient) constraints attached, while the former have far more relaxed requirements: the pipeline is defined by the user, and the results don't have to be attached to a display (although they can be, if you're masochistic like that).
When you run a game there may be two distinct systems operating on your GPU:
OpenGL renders images to your screen (graphics)
OpenCL does general-purpose computing tasks (compute)
OpenGL is programmed with shaders. OpenCL is programmed with kernels.
If you would like to learn in more detail how games work on the GPU, I recommend reading about OpenCL, OpenGL, and game engine architecture.

How to utilize 2d/3d Graphics Acceleration on a Single Board Computer

This may be a somewhat silly question, but if you are working with a single board computer that boasts that it has 2d/3d graphics acceleration, what does this actually mean?
If it supports DirectX or OpenGL, obviously I could just use that framework, but I am not familiar with working at this end of things. I do not know whether it means those libraries can be included in the OS, or just that the hardware does certain kinds of math more quickly (either by default or through some other process).
Any clarification on what this means or locations of resources I could use on such would be greatly appreciated.
On embedded systems, 2D/3D graphics acceleration could mean a lot of things: for instance, that framebuffer operations are accelerated through DirectFB, or that OpenGL ES is supported.
The fact is that the manufacturer of the board usually provides these libraries since the acceleration of the graphics itself is deeply connected to the hardware.
It's best to get in touch with your manufacturer and ask which graphics libraries they support that are hardware accelerated.
There are two very important features of 2D/3D graphics cards:
Take a load away from the CPU
Process that load much faster than the CPU can, because it has a special instruction set designed explicitly for calculations that are common in graphics (e.g. transformations)
Sometimes other jobs are passed on to the GPU because they require calculations that fit the GPU's instructions very well. E.g. a physics library requires lots of matrix calculation, so a GPU could be used for that. NVIDIA made PhysX to do exactly that. See this FAQ also.
The minimum a graphics display requires is to allow setting the state (colour) of individual pixels. This allows you to render any image within the resolution and colour depth of the display, but for complex drawing tasks and very high resolution displays it would be very slow.
Graphics acceleration refers to any graphics processing function off-loaded to hardware. At its simplest this may mean the drawing and filling of graphics primitives such as lines and polygons, and 'blitting' - the moving of blocks of pixels from one location to another. Technically, graphics accelerators have been largely replaced by graphics processors (GPUs), though the effect is the same - faster graphics. GPUs are more flexible, since a hardware accelerator can accelerate only the set of operations it is hard-wired to perform, which may benefit some applications more than others.
Modern GPU hardware performs far higher-level graphics processing. It is also possible to use the GPU for more general-purpose matrix computation using interfaces such as Nvidia's CUDA, which can then accelerate computational tasks other than graphics that require the same kind of mathematical operations.
The Wikipedia "Graphics processing unit" article has a history of Graphics Accelerators and GPUs

Programming graphics in assembler?

I've developed a running Super Mario sprite using Visual C++ 6.0 and DirectX. But this isn't very satisfying to me (abusing a 3D multimedia framework just to display a 2D sprite), so I would like to be able to program an animated sprite using C and assembler only.
So, looking at old games (Wolfenstein, for example), it seems that most of the game is written in C, and whenever it comes to graphics output, assembler is used.
Unfortunately, when trying to run this old assembler code, there is always the error message "NTVDM.exe has found an invalid instruction", so these things don't seem to work nowadays.
Is there any tutorial on graphics programming in assembler that is still useful?
(I don't want to use any bloated frameworks or libraries, I just want to develop everything on my own. WinAPI would be OK for creating a full screen window and for catching user input, but not for graphics because I read GDI is too slow for fast graphics.)
I'm using WindowsXP and MASM or A86.
I totally agree with samcl.
The main reason for not using assembler anymore is that you cannot access the video memory directly anymore. Back in the early days (you mentioned Wolfenstein) there was a special video mode, 13h (0x13), where the graphics were just a block of memory: each pixel was a palette colour ranging from 0-255, i.e. one byte. You were able to access this memory through that specific video mode; today, however, things are much more complicated.
Today you have very fast video memory, and using your CPU to access it will just tear down all performance, as your CPU is connected through PCI-Express/AGP/PCI/VESA Local Bus/ISA (remember those?).
Graphics programming is often a lot of read and write accesses (read pixel, check if it is transparent, multiply with alpha, write pixel, etc.).
Modern memory interfaces are much slower than direct access inside the graphics card. That's why you really should use shaders, as Robert Gould suggests. That way you can write faster and easier-to-understand code, and it will not stall your GFX memory.
If you are more interested in GFX programming, you can whet your appetite with Shadertoy, a community dedicated to shader-based effects, complete with WebGL-based shader code execution.
Also, your beginner assembler code will be pretty lame, in quality as well as performance. Trust me - it takes a lot of time to optimize such primitive code, so your compiled C/C++ code will easily outperform your handwritten asm.
If you are interested in assembler, try to code something like disk access. That is where you can gain a lot of performance.
It sounds like you only use Assembler because you seem to think that this is necessary. This isn't the case. If you don't have any other reason for it (i.e. wanting to learn it), don't use Assembler here, unless you know exactly what you're doing.
For your average graphics engine, Assembler programming is completely unnecessary. Especially when it comes to a Super Mario style 2D sprite engine. Even “slow” scripting languages like Python are fast enough for such things nowadays.
Adding to that, if you don't know very precisely what you're doing, Assembler will not be faster than C (in fact, chances are it will be slower because you'll re-implement existing C functions less efficiently).
I'm guessing if you are already using C with DirectX, speed is not the issue, and that this is more of a learning exercise. For 2D under Windows, C and DirectX is going to be very fast indeed, and as Konrad Rudolph points out, hand cranked assembler is unlikely to be faster than a highly optimized SDK.
From a purely educational standpoint, it is an interesting activity, but quite complex. Back in the early days of home computers and the first PCs, the graphics screen appeared pretty much as a block of memory, where bytes corresponded to one or more coloured pixels. By changing the value of the screen memory you could plot points, and hence lines, and so on up to sprites. On modern PCs this tends not to be an option: you program a graphics card, usually via an SDK, to do the same job. The card then does the hard work, and you are provided with a much higher level of abstraction. If you really want a feel for what it was like back in the day, I would recommend an emulator. For a modern game, stick with your SDKs.
It is possible to program your own 2D engine in a recent version of DirectX, if you wish to investigate this avenue. You can create a "screen space"-aligned polygon, with no perspective correction, which is texture-mapped. You can then plot your sprites on a pixel-by-pixel basis onto this texture map.
As for mode 13h (Peter Parker), it brings back some memories!
__asm
{
    mov ax, 0x13    // AH=0 (set video mode), AL=0x13 (320x200, 256 colours)
    int 0x10        // BIOS video interrupt - 16-bit real-mode code only, not Windows
}
But of course this will fault in a 32-bit or 64-bit Windows program; 16-bit BIOS calls are not supported by the Windows kernel, which installs its own interrupt table as part of booting and switching the CPU out of real mode.
I would avoid assembler with a barge pole; it can be particularly difficult to debug and maintain. However, if you wish to explore this subject in more detail, I can recommend Michael Abrash's Graphics Programming Black Book. It's a bit old, but a good read, and will give you some insight into graphics programming techniques before 3D hardware.
Assembler was used for graphics because, back then, most people lacked graphics cards with 3D support, so it had to be done on the CPU - not anymore. Nowadays it's about shader programming; shader languages allow you to cuddle up with the bare metal. So if anything, you should try to code your 2D graphics to be shader-based; that way the experience will have value as a career skill.
Try CUDA for a starter.
My recommendation is to experiment. Take your sprite code and write it in a number of forms, starting with C/GDI and C++/DirectDraw. Don't worry about assembler yet.
DirectX is your best bet for fast action graphics. Learn it, then figure out how to micro-optimize with assembler. In general, assembler isn't going to make your API calls faster. It is going to open up flexibility for faster computation for things like 3D rotation, texture mapping, shading, etc.
Start with DirectDraw. Here's a FAQ. Technically, DirectDraw is deprecated after DirectX 7, but you can still use it and learn from it. It'll allow you direct framebuffer modification, which is what you're probably looking for.
There are some helpful tutorials and forums at TripleBuffer Software.
Also consider upgrading your compiler to Visual C++ 2008 Express. VC++ 6 has a buggy compiler that can be problematic with trying to compile certain C++ libraries.

Learning about low-level graphics programming

I'm interested in learning about the different layers of abstraction available for making graphical applications.
I see a lot of terms thrown around: At the highest level of abstraction, I hear about things like C#, .NET, pyglet and pygame. Further down, I hear about DirectX and OpenGL. Then there's DirectDraw, SDL, the Win32 API, and still other multi-platform libraries like WxWidgets.
How can I get a good sense of where one of these layers ends and where the next one begins? What is the "lowest possible level" way of creating a window in Windows, in C? What about C++? (A code sample would be divine.) What about in X11? Are the Windows implementations of OpenGL and DirectX built on top of the Win32 API? Where can I begin to learn about these things?
There's another question on SO where Programming Windows is suggested. What about for Linux? Is there an equivalent such book?
I'm aware that this is very low-level, and that there are many friendlier tools available, but I would like to at least learn the basics of what's going on beneath the surface. As much as I'd like to begin slinging windows and vectors right off the bat, starting with something like pygame is too high-level for me; I really need to make the full conceptual circuit of how you draw stuff on a computer.
I will certainly appreciate suggestions for books and resources, but I think it would be stupendously cool if the answers to this question filled up with lots of different ways to get to "Hello world" with different approaches to graphics programming. C? C++? Using OpenGL? Using DirectX? On Windows XP? On Ubuntu? Maybe I ask for too much.
The lowest level would be the graphics card's video RAM. When the computer first starts, the graphics card is typically set to the 80x25 character legacy mode.
You can write text via a BIOS-provided interrupt at this point. You can also change the foreground and background colour from a palette of 16 distinct colours. You can access ports/registers to change the display mode. At this point you could, say, load a different font into the display memory and still use the 80x25 mode (OS installers usually do this), or you can go ahead and enable VGA/SVGA. It's quite complicated; that's what drivers are for.
Once the card is in the 'higher' mode, you change what's on screen by accessing the memory mapped to the video card. It's stored horizontally, pixel by pixel, usually with some padding pixels at the end of each line that aren't mapped to the screen (the pitch can exceed the visible width), which you have to compensate for. But yeah, you can copy the pixels of an image in memory directly to the screen.
With things like DirectX and OpenGL, rather than writing directly to the screen, commands are sent to the graphics card and it updates the screen itself. Commands like "Hey you, draw this image I've loaded into the VRAM here, here and here" or "Draw these triangles with this transformation matrix..." take a fraction of the time compared to plotting pixel by pixel. The CPU will thank you.
DirectX/OpenGL is a programmer-friendly library for sending those commands to the card, with all the supporting functions to help you get it done smoothly. A more direct approach would only be unproductive.
SDL is an abstraction layer so without bothering to read up on it I'd guess it would have different ways of working on each system. On one it might use semi-direct screen writing, another Direct3D, etc. Whatever's fastest as long as the code stays cross-platform..able.
Then there are GDI/GDI+ and the X Window System, designed specifically to draw windows. Originally they drew using the pixel-by-pixel method (which was good enough, because they only had to redraw when a button was pressed or a window moved, etc.), but now they use Direct3D/OpenGL for accelerated drawing (and special effects). Optimizations depend on the versions and implementations of these libraries.
So if you want the most power and speed, DirectX/openGL is the way to go. SDL is certainly useful for getting the most from a cross-platform environment and integrates with OpenGL anyway. The windowing system comes last but don't underestimate it. Especially with the stuff Microsoft's coming up with lately.
Michael Abrash's Graphics Programming 'Black Book' is a great place to start. Plus you can download it for free!
If you really want to start at the bottom then drawing a line is the most basic operation. Computer graphics is simply about filling in pixels on a grid (screen), so you need to work out which pixels to fill in to get a line that goes from (x0,y0) to (x1,y1).
Check out Bresenham's algorithm to get a feel for what is involved.
Being a good graphics and image processing programmer doesn't require this low-level knowledge, but I do hate to be clueless about the insides of what I'm using. I see two ways to chase this: top-down, or bottom-up.
Top-down is a matter of following how the action traces from a high-level graphics operation, such as drawing a circle, down to the hardware. Get to know OpenGL well. Then the source to Mesa (free!) provides a peek at how OpenGL can be implemented in software. The source to Xorg would be next, first to see how the action goes from API calls through the client side to the X server. Finally you dive into a device driver that interfaces with hardware.
Bottom up: build your own graphics hardware. Think of ways it could connect to a computer - how to handle massive numbers of pixels through a few byte-size registers, how DMA would work. Write a device driver, and try designing a graphics library that might be useful for app programmers.
The bottom-up way is how I learned, years ago, when it was a possibility with the slow 8-bit microprocessors. The direct experience with circuitry and hardware-software interfacing gave me a good appreciation of the difficult design decisions - e.g. whether to paint rectangles using clever hardware, in the device driver, or at a higher level. None of this is of practical everyday value, but it provided a foundation of knowledge for understanding newer technology.
See the Open GPU Documentation section:
http://developer.amd.com/documentation/guides/Pages/default.aspx
HTH
On MS Windows it is easy: you use what the API provides, whether the standard Windows programming API or the DirectX family of APIs, and they are well documented.
In an X Window environment you use whatever X11 libraries are provided. If you want to understand the principles behind windowing on X, I suggest that you do this - never mind that many others tell you not to; it will really help you to understand graphics and windowing under X. You can read the documentation on X programming (google for it). (After this exercise you will appreciate the higher-level libraries!)
Apart from the above, the absolute lowest level you can go (excluding chip level) is to call the interrupts that switch to the various graphics modes available - there are several - and then write to the screen buffers; but for this you would have to use assembler, as anything else would be too slow. Going this way will not be portable at all.
Another post mentions Abrash's Black Book - an excellent resource.
Edit: As for books on programming Linux: it is a community thing, there are many howto's around; also find a forum, join it, and as long as you act civilized you will get all the help you can ever need.
Right off the bat, I'd say "you're asking too much." From what little experience I've had, I would recommend reading some tutorials or getting a book on either DirectX or OpenGL to start out. To go any lower than that would be pretty complex. Most of the books I've seen on OGL or DX have pretty good introductions that explain what the functions/classes do.
Once you get the hang of one of these, you could always dig in to the libraries to see what exactly they're doing to go lower.
Or, if you really, absolutely MUST learn the LOWEST level... read the book in the above post.
libX11 is the lowest-level library for X11. I believe OpenGL/DirectX talk to the driver/hardware directly (or emulate unsupported ops), so they would be the lowest-level libraries.
If you want to start with very low level programming, look for x86 assembly code for VGA and fire up a copy of dosbox or similar.
The Vulkan API gives you very low-level access to most if not all features of the GPU, computational and graphical; it works on AMD and Nvidia GPUs (though not all models).
You can also use CUDA, but it only works on Nvidia GPUs and gives access to compute features only, with no video output.

Resources