Looking for cool LED graphics routines that don't require arrays - graphics

I have made a 24 x 15 LED matrix using an Arduino, shift registers and TLC5940s.
The Arduino Uno has a measly 32 KB of flash and only 2 KB of SRAM, so the graphics are not stored in arrays beforehand. Rather, I write algorithms that generate artistic animations from math equations.
Example code for a rainbow sine wave is:
for (int iterations = 0; iterations < times; iterations++)
{
    val += PI / 500;                     // advance the wave's phase each frame
    for (int col = 0; col < NUM_COLS; col++)
    {
        // Select the current column via the shift registers (24-bit column mask).
        digitalWrite(layerLatchPin, LOW);
        shiftOut(layerDataPin, layerClockPin, MSBFIRST, colMasks[col] >> 16);
        shiftOut(layerDataPin, layerClockPin, MSBFIRST, colMasks[col] >> 8);
        shiftOut(layerDataPin, layerClockPin, MSBFIRST, colMasks[col]);
        digitalWrite(layerLatchPin, HIGH);

        Tlc.clear();
        // Map the column to a point on a sine wave: rainbow1 ranges roughly 0..14.
        int rainbow1 = 7 + 7 * sin(2 * PI * col / NUM_COLS_TOTAL + val);
        setRainbowSinkValue(rainbow1, k);   // k: colour/brightness parameter defined elsewhere in the sketch
        Tlc.update();
    }
}
Here setRainbowSinkValue sets one of the 15 LEDs in the current column to a certain colour, and val shifts the wave to the right on every iteration.
So I'm looking for simple graphics routines like this, in order to get cool animations without having to store everything in arrays, as 15 x 24 x RGB frames quickly eat through the Uno's limited RAM.
I will try to get an Arduino Mega, but let's assume this isn't an option for now.
How can I do it?

There are many effects you can get if you start to overlay simple functions like sin or cos; the classic "plasma" effect is built exactly this way, and I think it's always a cool thing to watch :)
Another way is to use noise functions to calculate the color of your pixels. You get a lot of examples if you google for "Arduino Perlin noise" (depending on your Arduino model you might not be able to get high framerates because Perlin noise requires some CPU power).
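For example, here is a minimal sketch of a plasma-style routine built by overlaying sines, in the spirit of the question's sine wave. setPixelHue is a hypothetical stand-in for a per-pixel routine like setRainbowSinkValue, and the wave constants are arbitrary:

const int NUM_COLS = 24;
const int NUM_ROWS = 15;

void drawPlasmaFrame(float t) {
    for (int col = 0; col < NUM_COLS; col++) {
        for (int row = 0; row < NUM_ROWS; row++) {
            // Overlay three phase-shifted waves: horizontal, vertical and diagonal.
            float v = sin(col * 0.4 + t)
                    + sin((row * 0.5 + t) * 0.7)
                    + sin((col * 0.3 + row * 0.4 + t) * 0.5);
            // v is in [-3, 3]; rescale it to a hue index 0..14 for a 15-step rainbow.
            int hue = (int)((v + 3.0) / 6.0 * 14.0);
            setPixelHue(col, row, hue);   // hypothetical: set one LED to the chosen hue
        }
    }
}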

I've been working on similar graphics-style projects with the Arduino and have considered a variety of strategies to deal with the limited memory. Personally I find algorithmic animations rather banal and generic unless they are combined with other things or directed in some way.
At any rate, these are the two approaches I have been working on:
defining a custom format that packs the data as bits, then using bit-shifting to unpack it (a minimal sketch follows at the end of this answer);
storing simple SVG graphics in PROGMEM and then using sprite techniques to move them around the screen (with screen wrap-around etc.). By using Boolean operations to merge multiple graphics together it's possible to get animated layer effects and build up complexity and variety.
I only use single-color LEDs, so things are simpler conceptually and data-wise.
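For the first approach, a minimal sketch of unpacking bit-packed monochrome frames from flash on an AVR board (the frame contents and dimensions here are placeholders):

#include <avr/pgmspace.h>

// A 24 x 15 one-bit-per-LED frame packs into 45 bytes instead of 360 (one byte per pixel).
const uint8_t frame0[45] PROGMEM = { 0x00 /* ... 44 more bytes of packed pixel data ... */ };

bool getPixel(const uint8_t *frame, uint8_t col, uint8_t row) {
    uint16_t bitIndex = row * 24 + col;                         // row-major, MSB-first packing
    uint8_t packed = pgm_read_byte(frame + (bitIndex >> 3));    // fetch the byte from flash
    return (packed >> (7 - (bitIndex & 7))) & 1;                // extract the single bit
}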

A good question, but you're probably not going to find much ready-made due to the nature of the platform.
You already have the right general idea of using algorithms to generate effects, so go ahead and write more crazy functions.
You could also package your functions and make them available to everyone.
Also, if your setup allows it, use the serial port to communicate with a host that has more resources and can supply an endless stream of patterns.
A transmitter/receiver pair will also work for connecting to another computer.

I will answer related questions but not exactly the question you asked because I am not a graphics expert....
First of all, don't forget PROGMEM, which allows you to store data in flash memory. There is a lot more flash than SRAM, and the usual thing to do, in fact, is to store extra data in flash.
Secondly, there are compression techniques available that will reduce your memory consumption. These "compression" techniques are natural to the kinds of tasks you are doing anyway, so the word "compression" is a bit misleading.
To start with, observe that because human perception of light intensity is roughly exponential (shameless link to my own answer on this topic), you may not need to store the exact intensity, depending on how exactly you use the LED drivers. It looks like you are using only 8 bits of intensity on the TLC5940, not the full 12. With 8 bits of driver intensity you only have 8 or 9 perceptibly different intensity values (because the intensity you tell the LED driver to use is 2^perceptible_intensity), and 8 different values can be stored in only three bits. Storing three-bit chunks packed into bytes can be a bit of a pain, but you can still treat each "pixel" in your array as a uint16_t and store the entire color information, reducing your memory consumption to 2/3 of what it was.
Furthermore, you can simply palettize your image: each pixel is a byte (uint8_t) that indexes a place in a palette, which could be three bytes per entry if you'd like. The palette need not be large, and in fact you don't have to have an explicit palette at all, which just means having the palette in code: your code knows how to transform a byte into a set of intensities. Then you generate the actual intensity values that the TLC5940 needs right before you shift them out.
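A rough sketch of that palettized approach might look like this; only Tlc.set(), Tlc.clear() and Tlc.update() are real Tlc5940 library calls, while the palette layout, the brightness curve and writePixel are illustrative assumptions:

#include "Tlc5940.h"

struct PaletteEntry { uint8_t r, g, b; };          // perceptual steps 0..7 per channel
const PaletteEntry palette[8] = {
  {0,0,0}, {7,0,0}, {0,7,0}, {0,0,7}, {7,7,0}, {0,7,7}, {7,0,7}, {7,7,7}
};

uint16_t toDriverValue(uint8_t step) {
  // Each step roughly doubles the duty cycle, matching the 2^perceptible_intensity idea above.
  return step == 0 ? 0 : (4095 >> (7 - step));     // 63, 127, ..., 4095 for steps 1..7
}

void writePixel(uint8_t ledIndex, uint8_t paletteIndex) {
  const PaletteEntry &p = palette[paletteIndex];   // one byte per pixel indexes the palette
  Tlc.set(ledIndex * 3 + 0, toDriverValue(p.r));   // expand to 12-bit values only at shift-out time
  Tlc.set(ledIndex * 3 + 1, toDriverValue(p.g));
  Tlc.set(ledIndex * 3 + 2, toDriverValue(p.b));
}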

Related

How to know which Metal texture format to use? short or half?

Apple's Metal examples alternate between using texture2D<float> and texture2D<half>, and I believe that the default pixel format of a MTKView is bgra8unorm. What determines whether I should use float or half? Do I need to specify something on the CPU, or will textures using 128-bit float4s be converted into halfs automatically? How about the other way around if I pass in a texture with a bgra8 format? I am asking because I am trying to load textures using MTKTextureLoader as well as from plain byte data, and I'm not sure what format to use for the plain byte data so that things are consistent. May I have some clarification?
It really depends on your use case.
Loading: You can probably safely load your data into a texture with the same format as that data. When your render destination has a different format, Metal will perform the conversion for you.
Intermediates: The format of intermediate textures should really depend on the "resolution" (as in "number of bits") you need, which depends on the input data and the color space. If you only handle sRGB data, 8-bit textures are probably enough (unless you do some complicated processing that requires a higher precision). If you want to support a wide gamut (e.g. in Display P3 color space), you need more precision (half should be fine) and also want to be able to store values outside of the [0...255] range. On iOS I'd recommend using half for memory efficiency (and since most devices don't support full float anyway), on macOS float is the default, I think.
View: The pixel format of the view should really depend on the display. Most of the screens support the Display P3 color space now. For that, you should use the bgra10_xr format, since it's optimized for that case. Otherwise bgra8unorm is fine.
In general, you should be using a texture format with the smallest memory footprint that fits your use case.
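For instance, a Metal Shading Language kernel (MSL is C++-based) can declare its textures as half even when the underlying storage is bgra8unorm; Metal converts on read and write. A trivial copy kernel as a sketch (names are arbitrary):

#include <metal_stdlib>
using namespace metal;

kernel void copyTexture(texture2d<half, access::read>  src [[texture(0)]],
                        texture2d<half, access::write> dst [[texture(1)]],
                        uint2 gid [[thread_position_in_grid]])
{
    half4 color = src.read(gid);   // 8-bit unorm storage is converted to half on read
    dst.write(color, gid);         // converted back to the destination's format on write
}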

Why emulate for certain number of cycles?

I have seen, in more than one place, the following way of emulating, i.e. a cycle count is passed into the emulate function:
int CPU_execute(int cycles) {
    int cycle_count = cycles;
    do {
        /* fetch, decode and execute one opcode here,
           subtracting its cycle cost from cycle_count */
    } while (cycle_count > 0);
    return cycles - cycle_count;   /* cycles actually consumed (can overshoot the request) */
}
I am having a hard time understanding why you would take this approach to emulation, i.e. why you would emulate for a certain number of cycles. Can you give some scenarios where this approach is useful?
Any help is heartily appreciated!
Emulators tend to be interested in fooling the software written for multiple chip devices — in terms of the Z80 and the best selling devices you're probably talking about at least a graphics chip and a sound chip in addition to the CPU.
In the real world those chips all act concurrently. There'll be some bus logic to allow them all to communicate but they're otherwise in worlds of their own.
You don't normally run emulation of the different chips as concurrent processes because the cost of enforcing synchronisation events is too great, especially in the common arrangement where something like the same block of RAM is shared between several of the chips.
So instead the most basic approach is to cooperatively multitask the different chips — run the Z80 for a few cycles, then run the graphics chip for the same amount of time, etc, ad infinitum. That's where the approach of running for n cycles and returning comes from.
It's usually not an accurate way of reproducing the behaviour of a real computer bus but it's easy to implement and often you can fool most software.
In the specific code you've posted the author has further decided that the emulation will round the number of cycles up to the end of the next whole instruction. Again that's about simplicity of implementation rather than being anything to do with the actual internals of a real machine. The number of cycles actually run for is returned so that other subsystems can attempt to adapt.
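A bare-bones sketch of that cooperative loop, reusing CPU_execute from the question (video_run and audio_run are hypothetical stand-ins for the other chip emulations, and the 224-cycle timeslice is just an example):

int  CPU_execute(int cycles);      // from the question
void video_run(int cycles);        // hypothetical: advance the video emulation
void audio_run(int cycles);        // hypothetical: advance the audio emulation

void run_frame(int scanlines_per_frame) {
    const int CYCLES_PER_SLICE = 224;                  // e.g. one scanline's worth of cycles
    for (int line = 0; line < scanlines_per_frame; line++) {
        int consumed = CPU_execute(CYCLES_PER_SLICE);  // may overshoot by part of an instruction
        video_run(consumed);                           // let the other chips catch up
        audio_run(consumed);
    }
}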
Since you mentioned z80, I happen to know just the perfect example of a platform where this kind of precise emulation is sometimes necessary: the ZX Spectrum. The standard graphics output area on the ZX Spectrum was a box of 256 x 192 pixels situated in the centre of the screen, surrounded by a fairly wide "border" area filled with a solid color. The color of the border was controlled by outputting a value to a special output port. The computer designer's idea was that one would simply choose the border color most appropriate to what is happening on the main screen.
ZX Spectrum did not have a precision timer. But programmers quickly realised that the "rigid" (by modern standards) timings of z80 allowed one to do drawing that was synchronised with the movement of the monitor's beam. On ZX Spectrum one could wait for the interrupt produced at the beginning of each frame and then literally count the precise number of cycles necessary to achieve various effects. For example, a single full scanline on ZX Spectrum was scanned in 224 cycles. Thus, one could change the border color every 224 cycles and generate pixel-thick lines on the border.
Graphics capacity of the ZX Spectrum was limited in the sense that the screen was divided into 8x8 blocks of pixels, each of which could only use two colors at any given time. Programmers overcame this limitation by changing these two colors every 224 cycles, effectively increasing the color resolution 8-fold.
I can see that the discussion under another answer focuses on whether one scanline may be a sufficiently accurate resolution for an emulator. Well, some of the border scroller effects I've seen on the ZX Spectrum are, basically, timed to a single z80 cycle. An emulator that wants to reproduce the correct output of such code would also have to be precise to a single machine cycle.
If you want to sync your processor with other hardware, it can be useful to do it like that. For instance, if you want to sync it with a timer, you would want to control how many cycles can pass before the timer interrupts the CPU.

Fast pixel drawing library

My application produces an "animation" in a per-pixel manner, so I need to draw it efficiently. I've tried different strategies/libraries with unsatisfactory results, especially at higher resolutions.
Here's what I've tried:
SDL: ok, but slow;
OpenGL: inefficient pixel operations;
xlib: better, but still too slow;
svgalib, directfb, (other frame buffer implementations): they seem perfect but definitely too tricky to set up for the end user.
(NOTE: I'm maybe wrong about these assertions, if it's so please correct me)
What I need is the following:
fast pixel drawing with performances comparable to OpenGL rendering;
it should work on Linux (cross-platform as a bonus feature);
it should support double buffering and vertical synchronization;
it should be portable for what concerns the hardware;
it should be open source.
Can you please give me some enlightenment/ideas/suggestions?
Are your pixels sparse or dense (e.g. a bitmap)? If you are creating dense bitmaps out of pixels, then another option is to convert the bitmap into an OpenGL texture and use OpenGL APIs to render at some framerate.
The basic problem is that graphics hardware will be very different on different hardware platforms. Either you pick an abstraction layer, which slows things down, or code more closely to the type of graphics hardware present, which isn't portable.
I'm not totally sure what you're doing wrong, but it could be that you are writing pixels one at a time to the display surface.
Don't do that.
Instead, create a rendering surface in main memory, in the same format as the display surface, render to it, and then copy the whole rendered image to the display in a single operation. Modern GPUs are very slow per transaction, but can move lots of data very quickly in a single operation.
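For instance, assuming SDL2, the streaming-texture path does exactly that: render into a main-memory buffer, then upload and present it with a couple of bulk calls (error handling and cleanup omitted for brevity):

#include <SDL2/SDL.h>
#include <vector>
#include <cstdint>

int main() {
    const int W = 1280, H = 720;
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window*   win = SDL_CreateWindow("pixels", SDL_WINDOWPOS_CENTERED,
                                         SDL_WINDOWPOS_CENTERED, W, H, 0);
    SDL_Renderer* ren = SDL_CreateRenderer(win, -1,
                            SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
    SDL_Texture*  tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                          SDL_TEXTUREACCESS_STREAMING, W, H);

    std::vector<uint32_t> pixels(W * H);
    for (int frame = 0; frame < 600; frame++) {
        // Per-pixel work happens here, in ordinary main memory.
        for (int i = 0; i < W * H; i++)
            pixels[i] = 0xFF000000u | (uint32_t)((i + frame * 7) & 0xFFFFFF);
        // One bulk copy to the GPU, then present with vsync.
        SDL_UpdateTexture(tex, nullptr, pixels.data(), W * (int)sizeof(uint32_t));
        SDL_RenderCopy(ren, tex, nullptr, nullptr);
        SDL_RenderPresent(ren);
    }
    SDL_Quit();
    return 0;
}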
It looks like you are confusing the windowing layer (SDL and xlib) with the rendering library (OpenGL).
Just pick a windowing layer (SDL, GLUT, or xlib if you like a challenge), activate double-buffered mode, and make sure that you get direct rendering.
What kind of graphics card do you have? Most likely it can process pixels on the GPU. Look up how to create pixel shaders in OpenGL; pixel shaders do the per-pixel processing.

Programming graphics and sound on PC - Total newbie questions, and lots of them!

This isn't exactly a programming question (or is it?) but I was wondering:
How are graphics and sound processed from code and output by the PC?
My guess for graphics:
There is some reserved memory space somewhere that holds exactly enough room for a frame of graphics output for your monitor.
E.g. 800 x 600 in 24-bit color mode: 800 x 600 x 3 = ~1.4 MB of memory space.
Between each refresh, the program writes video data to this space. This action is completed before the monitor refresh.
Assume a simple 2D game: the graphics data is stored in machine code as many bytes representing color values. Depending on what the program(s) being run instruct the PC, the processor reads the appropriate data and writes it to the memory space.
When it is time for the monitor to refresh, it reads from each memory space byte-for-byte and activates hardware depending on those values for each color element of each pixel.
All of this of course happens crazy-fast, and repeats x times a second, x being the monitor's refresh rate. I've simplified my own likely-incorrect explanation by avoiding talk of double buffering, etc
Here are my questions:
a) How close is the above guess (the three steps)?
b) How could one incorporate graphics in pure C++ code? I assume the practical thing that everyone does is use a graphics library (SDL, OpenGL, etc.), but, for example, how do these libraries accomplish what they do? Would manual inclusion of graphics in pure C++ code (say, a 2D sprite) involve creating a two-dimensional array of bit values (or three-dimensional to include multiple RGB values per pixel)? Is this how it would be done waaay back in the day?
c) Also, continuing from above, do libraries such as SDL etc. that use bitmaps actually just build the bitmap/etc. files into the machine code of the executable and use them as though they were built in the same manner mentioned in question b above?
d) In my hypothetical step 3 above, are there any registers involved? Like, could you write some byte value to some register to output a single color of one byte on the screen? Or is it purely dedicated memory space (=RAM) + hardware interaction?
e) Finally, how is all of this done for sound? (I have no idea :) )
a.
A long time ago, that was the case, but it hasn't been for quite a while. Most hardware will still support that type of configuration, but mostly as a fall-back -- it's not how they're really designed to work. Now most have a block of memory on the graphics card that's also mapped to be addressable by the CPU over the PCI/AGP/PCI-E bus. The size of that block is more or less independent of what's displayed on the screen though.
Again, at one time that's how it mostly worked, but it's mostly not the case anymore.
Mostly right.
b. OpenGL normally comes in a few parts -- a core library that's part of the OS, and a driver that's supplied by the graphics chipset (or possibly card) vendor. The exact distribution of labor between the CPU and GPU varies somewhat though (between vendors, over time within products from a single vendor, etc.) SDL is built around the general idea of a simple frame-buffer like you've described.
c. You usually build bitmaps, textures, etc., into separate files in formats specifically for the purpose.
d. There are quite a few registers involved, though the main graphics chipset vendors (ATI/AMD and nVidia) tend to keep their register-level documentation more or less secret (though this could have changed -- there's constant pressure from open source developers for documentation, not just closed-source drivers). Most hardware has capabilities like dedicated line drawing, where you can put (for example) line parameters into specified registers, and it'll draw the line you've specified. Exact details vary widely though...
e. Sorry, but this is getting long already, and sound covers a pretty large area...
For graphics, Jerry Coffin's got a pretty good answer.
Sound is actually handled similarly to your (the OP's) description of how graphics is handled. At a very basic level, you have a "buffer" (some memory, somewhere).
Your software writes the sound you want to play into that buffer. It is basically an encoding of the position of the speaker cone at a given instant in time.
For "CD quality" audio, you have 44100 values per second (a "sample rate" of 44.1 kHz).
A little bit behind the write position, the audio subsystem reads from a read position in the buffer.
The distance between the two positions is known as the latency. A larger distance gives more of a delay, but also helps to avoid the case where the read position catches up to the write position, leaving the sound device with nothing to actually play!
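A toy sketch of that buffer, not tied to any real audio API, just to make the read/write positions and the latency concrete:

#include <cstdint>
#include <cstddef>

struct AudioRing {
    static const size_t SIZE = 4096;     // samples; at 44.1 kHz that is roughly 93 ms
    int16_t samples[SIZE] = {};
    size_t  writePos = 0;                // advanced by the application
    size_t  readPos  = 0;                // advanced by the audio subsystem

    void    write(int16_t s) { samples[writePos % SIZE] = s; writePos++; }
    int16_t read()           { return samples[readPos++ % SIZE]; }

    // Latency in samples: how far the writer is ahead of the reader.
    size_t latency() const { return writePos - readPos; }
};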

When using Direct3D, how much math is being done on the CPU?

Context: I'm just starting out. I'm not even touching the Direct3D 11 API, and instead looking at understanding the pipeline, etc.
From looking at documentation and information floating around the web, it seems like some calculations are handled by the application. That is, instead of simply presenting matrices to the GPU to multiply, the calculations are done by a math library that operates on the CPU. I don't have any particular resources to point to, although I guess I can point to the XNA Math Library or the samples shipped in the February DX SDK. When you see code like mViewProj = mView * mProj;, that projection is being calculated on the CPU. Or am I wrong?
If you were writing a program where you can have 10 cubes on the screen, and you can move or rotate the cubes as well as the viewpoint, what calculations would you do on the CPU? I think I would store the geometry for a single cube, plus transform matrices representing the actual instances. Then it seems I would use the XNA math library, or another of my choosing, to transform each cube in model space, get the coordinates in world space, and push the information to the GPU.
That's quite a bit of calculation on the CPU. Am I wrong?
Am I reaching conclusions based on too little information and understanding?
What terms should I Google for, if the answer is STFW?
Or if I am right, why aren't these calculations being pushed to the GPU as well?
EDIT: By the way, I am not using XNA, but documentation notes that the XNA Math Library replaces the previous DX math library. (I see the XNA Math Library in the SDK as purely a template library.)
"Am I reaching conclusions based on too little information and understanding?"
Not as a bad thing, as we all do it, but in a word: Yes.
What is being done by the GPU is, generally, dependent on the GPU driver and your method of access. Most of the time you really don't care or need to know (other than curiosity and general understanding).
For mViewProj = mView * mProj; this is most likely happening on the CPU. But it is not much of a burden (a few hundred cycles at most). The real work is applying the new view matrix to the "world": every vertex needs to be transformed, more or less, along with shading, textures, lighting, etc. All of this work is done on the GPU (if done on the CPU things slow down really fast).
Generally you make high level changes to the world, maybe 20 CPU bound calculations, and the GPU takes care of the millions or billions of calculations needed to render the world based on the changes.
In your 10-cube example: you supply a transform for each cube, and any math needed for you to create that transform is CPU-bound (with exceptions). You also supply a transform for the view; again, creating that matrix might be CPU-bound. Once you have your 11 new matrices, you apply them to the world. From a hardware point of view the 11 matrices need to be copied to the GPU... that will happen very, very fast... once copied, the CPU is done and the GPU recalculates the world based on the new data, renders it to a buffer and pops it on the screen. So for your 10 cubes the CPU-bound calculations are trivial.
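As a sketch of the CPU side of that 10-cube example (using DirectXMath-style calls, the successor to the XNA Math Library mentioned in the question; the scene values are made up):

#include <DirectXMath.h>
using namespace DirectX;

// Build one world-view-projection matrix per cube on the CPU; the GPU then applies
// each matrix to every vertex of the shared cube geometry.
void buildMatrices(float t, XMMATRIX outWorldViewProj[10]) {
    XMMATRIX view = XMMatrixLookAtLH(XMVectorSet(0.0f, 3.0f, -10.0f, 1.0f),   // eye
                                     XMVectorSet(0.0f, 0.0f,   0.0f, 1.0f),   // target
                                     XMVectorSet(0.0f, 1.0f,   0.0f, 0.0f));  // up
    XMMATRIX proj = XMMatrixPerspectiveFovLH(XM_PIDIV4, 16.0f / 9.0f, 0.1f, 100.0f);

    for (int i = 0; i < 10; i++) {
        // One rotation plus one translation per cube: a trivial amount of CPU work per frame.
        XMMATRIX world = XMMatrixRotationY(t + i)
                       * XMMatrixTranslation((i - 5) * 2.0f, 0.0f, 0.0f);
        outWorldViewProj[i] = world * view * proj;  // copied to a constant buffer for the shader
    }
}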
Look at some reflected code for an XNA project and you will see where your calculations end and XNA begins (XNA will do everything it possibly can on the GPU).

Resources