What is the "side scrolling hack" from old games? - graphics

I have heard that old arcade side scrolling games used a specific programming hack to enable performant side scrolling.
I understand that years ago the machines weren't powerful enough to repaint the whole screen every frame, as is done nowadays. There are techniques, such as dirty rectangles, which minimise the screen area that needs to be repainted when the background is stationary and only the sprites move.
The above approach only works when the background doesn't change (and hence most of the screen pixels remain stationary).
Vertical scrolling games, like old-school shoot'em ups, have it a bit harder, with the background changing every frame due to the scroll. However, one could take advantage of the way pixels are fed to the display (line by line). I imagine that one could use a bigger buffer and shift the data pointer a few lines "down" every frame, so that the picture is drawn starting from a different position, giving the impression of a smooth scroll. Still, only the sprites (and a small strip of background at the edge of the screen) would need to be redrawn, which is a serious optimisation.
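Roughly what I have in mind, as a sketch: the display hardware is pointed at a taller off-screen buffer, and each frame we just move the line it starts fetching from. Here set_display_start_line() and draw_background_line() are made-up placeholders for whatever the hardware or driver actually provides:

// vertical scroll by moving the display start, not by repainting everything
const int SCREEN_H = 200;            // visible lines
const int BUFFER_H = 256;            // off-screen buffer is taller than the screen

extern void set_display_start_line(int line);            // placeholder
extern void draw_background_line(unsigned char* line);   // placeholder

int scroll_line = 0;

void scroll_one_line(unsigned char* buffer, int pitch) {
    scroll_line = (scroll_line + 1) % BUFFER_H;           // wrap inside the buffer
    set_display_start_line(scroll_line);                  // whole picture moves for free
    // only the line that just became visible at the bottom edge needs new pixels
    int new_line = (scroll_line + SCREEN_H - 1) % BUFFER_H;
    draw_background_line(buffer + new_line * pitch);
}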
However, for side-scrolling games the trick is not that simple or obvious. Still, I'm aware that somebody, somewhere in the past, thought of an optimisation which (with some limitations) allowed the old machines to scroll the background horizontally without redrawing it every frame.
IIRC it was used in many old games, mostly 80's beat'em ups, as well as in demoscene productions.
Can you describe this technique and name its author?

I have written games for the good old C64 doing exactly this. And there are basically two things to be aware of:
These games were NOT using bitmapped graphics, but instead used "remapped" character fonts, which means that chunks of 8x8 pixels were actually thrown around as just one byte each.
The next thing to note is that there was hardware support for displacing the whole screen by up to seven pixels. Note that this didn't affect any graphics data in any way - it just made everything sent to the TV a little bit displaced.
So 2) made it possible to smooth-scroll 7 pixels away. Then you moved every character one position sideways - which for a full screen was exactly 1000 bytes, something the computer could cope with - while at the same time you moved the scroll register back 7 pixels. Since 8 - 7 = 1, it looked like you had scrolled yet another single pixel... and then it just continued that way. So 1) and 2) combined created the illusion of true smooth scrolling!
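The per-frame logic looked roughly like this, sketched here in C-style C++ rather than the 6502 assembly you would actually write on the machine. The addresses are quoted from memory (screen matrix at $0400, horizontal fine scroll in bits 0-2 of the VIC-II control register $D016), so double-check them against a VIC-II reference:

volatile unsigned char* const VIC_CTRL2 = (volatile unsigned char*)0xD016;
unsigned char* const SCREEN_RAM = (unsigned char*)0x0400;   // 40x25 = 1000 bytes

int fine_scroll = 7;   // 0..7 pixel displacement provided by the hardware

void scroll_left_one_pixel() {
    if (fine_scroll > 0) {
        --fine_scroll;                       // cheap: just nudge the whole picture
    } else {
        // fine scroll exhausted: move every character one column to the left
        for (int row = 0; row < 25; ++row)
            for (int col = 0; col < 39; ++col)
                SCREEN_RAM[row * 40 + col] = SCREEN_RAM[row * 40 + col + 1];
        // draw the new rightmost column here (omitted), then snap the register
        // back: 8 - 7 = 1 pixel of apparent movement, so the motion stays smooth
        fine_scroll = 7;
    }
    *VIC_CTRL2 = (*VIC_CTRL2 & 0xF8) | fine_scroll;   // keep the other bits intact
}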
After that, a third thing came into play: raster interrupts. The CPU gets an interrupt when the TV/monitor is about to begin drawing a scan line at a specified location. That technique made it possible to create a split screen, so that you weren't required to scroll the entire screen as in my first description.
And to go into even more detail: even if you didn't want a split screen, the raster interrupt was very important anyway, because it was just as important then as it is today (though today the framework hides this from you) to update the screen at the right time. Modifying the "scroll register" while the TV/monitor was drawing somewhere in the visible area would cause an effect called "tearing" - where you clearly notice that the two parts of the screen are one pixel out of sync with each other.
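The simplest possible version of "update at the right time" is to busy-wait until the raster beam is below the visible picture before touching the scroll register; a real game would take a raster interrupt instead of polling. $D012 (the low 8 bits of the VIC-II raster counter) is again quoted from memory:

volatile unsigned char* const VIC_RASTER = (volatile unsigned char*)0xD012;

void wait_for_lower_border() {
    while (*VIC_RASTER != 251) { }   // ~line 251 is just below the 200-line display
}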
What more is there to say? Well, the technique of remapped character sets made it possible to do some animations very easily. For example, conveyors and cog wheels and such could be animated by constantly changing the appearance of the "characters" representing them on screen. So a conveyor spanning the entire screen width could look as if it was moving everywhere, just by changing the definition of a single character.
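One character definition is 8 bytes, so every cell on screen that uses the conveyor's character code animates at once when you rewrite those 8 bytes. In this sketch the charset address, the character code and the frame data are all made up for illustration:

unsigned char* const CHARSET = (unsigned char*)0x3000;   // a relocated character set
const int CONVEYOR_CHAR = 64;                            // code used by all conveyor cells

extern const unsigned char conveyor_frames[4][8];        // four pre-drawn 8x8 frames

void animate_conveyor(int frame) {
    const unsigned char* src = conveyor_frames[frame & 3];
    unsigned char* dst = CHARSET + CONVEYOR_CHAR * 8;
    for (int i = 0; i < 8; ++i)
        dst[i] = src[i];                                  // every conveyor cell updates at once
}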

I did something similar way back in the 90s, using two different approaches.
The first one involved "windowing," which was supported by the VESA SVGA standard. Some cards implemented it correctly. Basically, if you had a frame buffer/video RAM larger than the displayable area, you could draw a large bitmap and give the system coordinates for a window within that area that you wanted to display. By changing those coordinates, you could scroll around without having to re-fill the frame buffer.
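If I remember the VBE interface right, the call behind this is function 4F07h, "Set/Get Display Start": it tells the card which pixel of a larger virtual screen should sit in the top-left corner of the monitor. This sketch assumes a DOS compiler with Borland-style int86()/<dos.h> and a card whose VBE actually implements the function; error checking is omitted:

#include <dos.h>

void vbe_set_display_start(unsigned x, unsigned y) {
    union REGS r;
    r.x.ax = 0x4F07;    /* VBE: Set/Get Display Start            */
    r.x.bx = 0x0000;    /* BL = 00h -> set display start          */
    r.x.cx = x;         /* first displayed pixel in the scan line */
    r.x.dx = y;         /* first displayed scan line              */
    int86(0x10, &r, &r);
}

// Scrolling then becomes one vbe_set_display_start(left, 0) per frame, provided
// the mode was set up with a virtual scan line wider than the visible screen.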
The other method relied on manipulating the BLT method used to get a completed frame into the frame buffer. Blitting a page to the frame buffer that was the same size as the screen was easy and efficient.
I found this old 286 assembler code (on a functioning 17-year-old floppy!) that copied a 64000-byte (320x200) screen from an off-screen page to the video buffer:
Procedure flip; assembler;
{ This copies the entire screen at "source" to destination }
asm
push ds              { preserve Turbo Pascal's data segment }
mov ax, [Dest]
mov es, ax           { ES = destination segment (the video buffer) }
mov ax, [Source]
mov ds, ax           { DS = source segment (the off-screen page) }
xor si, si           { DS:SI = start of the source page }
xor di, di           { ES:DI = start of the destination }
mov cx, 32000        { 32000 words = 64000 bytes = 320x200 pixels }
rep movsw            { copy CX words from DS:SI to ES:DI }
pop ds               { restore the data segment }
end;
The rep movsw moved CX words (where a word is two bytes in this case). This was very efficient since it's basically a single instruction that tells the CPU to move the whole thing as quickly as possible.
However, if you had a larger buffer (say, 1024*200 for a side scroller), you could just as easily use a nested loop, and copy a single row of pixels per loop. In the 1024-pixel wide buffer, for instance, you could copy bytes:
start            count
0 + left         320
1024 + left      320
...
199*1024 + left  320
where left is the x coordinate within the large background image that you want to start at (left side of the screen).
Of course, in 16-bit mode, some magic and manipulation of the segment registers (ES, DS) was required to get a buffer larger than 64KB (in reality, multiple adjacent 64K buffers), but it worked pretty well.
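The same row-by-row copy, written against flat memory so the 16-bit segment juggling disappears; the constants match the 1024-pixel-wide background and the 320x200 screen from the example above:

#include <cstring>

const int BUF_W = 1024, SCREEN_W = 320, SCREEN_H = 200;

void blit_window(const unsigned char* background, unsigned char* screen, int left) {
    for (int y = 0; y < SCREEN_H; ++y) {
        // one 320-byte row, starting "left" pixels into the wide buffer
        std::memcpy(screen + y * SCREEN_W,
                    background + y * BUF_W + left,
                    SCREEN_W);
    }
}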
There were probably better solutions to this problem (and definitely better ones to use today), but it worked for me.

Arcade games frequently featured customized video chips or discrete logic to allow scrolling without the CPU having to do (much) work. The approach would be similar to what danbystrom was describing on the C-64.
Basically the graphics hardware took care of fine-scrolling the characters (or tiles), and the CPU then handled replacing all the tiles once the scrolling registers had reached their limit. I am currently looking at the Irem M-52 board, which handles multiple scrolling backgrounds in hardware. Schematics can be found online.

For right scrolling on the Commodore Amiga we used the Copper to shift the screen by up to 16 pixels. When that shift was used up, we added 2 bytes to the start address of the screen buffer, while on the right side we used the Blitter to copy graphics from main memory into the screen buffer. We made the screen buffer slightly larger than the visible view so that we could copy the graphics in without you seeing any flicker from the copying at the right side of the viewport.
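The per-frame logic looked roughly like this (a C-style sketch, not the real thing): set_fine_scroll(), set_bitplane_start() and blit_new_column() are placeholders for writing BPLCON1, patching the bitplane pointers in the copper list, and firing the Blitter - the names and details are illustrative only:

extern void set_fine_scroll(int pixels);
extern void set_bitplane_start(long byte_offset);
extern void blit_new_column(long byte_offset);

int fine_scroll = 15;        // up to 16 pixels of shift available in hardware
long bitplane_start = 0;     // byte offset into the oversized screen buffer

void scroll_one_pixel() {
    if (fine_scroll > 0) {
        --fine_scroll;                       // cheap hardware shift
    } else {
        bitplane_start += 2;                 // move the screen start one 16-pixel word
        fine_scroll = 15;
        blit_new_column(bitplane_start);     // fill the strip hidden beyond the right edge
    }
    set_fine_scroll(fine_scroll);
    set_bitplane_start(bitplane_start);
}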

Related

SDL2 + openGL ES 2.0 frame rate performance boost with less CPU load

I am developing on a Linux system using the latest (at the moment) SDL2 (2.0.8) + OpenGL ES 2.0 (GLSL 1.0), eventually targeting a Raspberry Pi 3 board. So far I have done a few things like drawing text with FreeType, drawing lines, text boxes (editable), text lists, waveform boxes (all I need to pass to a function is an array of vertices) and other shapes with glDrawArrays().
Now, there are things that need to be refreshed, let's say, 10 times per second and others that only need refreshing once per second. What would be the best approach to skip re-rendering everything at the rate of 10 times per second? Obviously OpenGL works by drawing everything from scratch on every 'frame'. However, I know and you know that other approaches exist, including rendering on top of the screen you already have, or taking a screenshot and rendering only the fast-changing things on top of it, as well as other solutions.
What do you think would be the best approach to avoid re-doing everything before calling SDL_GL_SwapWindow()? How can I take a screenshot, render it to the invisible buffer, then render only the fast-changing objects on top and then call SDL_GL_SwapWindow()?
This is a screen shot of the app so far drawing basic things
Thanks in advance.
I eventually realised that I should not have posted the question in the first place, but since this is a place where people learn from others, I now feel somewhat better about it :). The thing I had to do was simply stop clearing the invisible buffer (I will call it that for simplicity) and render on top of it only the controls that change. Those that change are updated by covering the area they occupy with a rectangle and then drawing the new content in that area. I have already done it and the frame rate just 'exploded'.
I do not really think there is a better approach, since the way I do it requires almost no extra work. All I had to do was add a few if conditions that selectively render or skip each control at the point where the code iterates through the controls that have to be drawn on screen, deciding what to render and what not. However, a well-thought-out set of structures is required for every control, instead of endlessly declaring and defining global variables, which only makes things confusing and difficult to maintain.
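For what it's worth, the core of it ended up looking roughly like this. The Control struct and its draw callback are invented for the sketch, and the whole idea relies on the buffer you render into keeping its contents between frames (a framebuffer object of your own, or a driver that preserves the back buffer across SDL_GL_SwapWindow()):

#include <SDL.h>
#include <SDL_opengles2.h>

struct Control {
    int x, y, w, h;                 // rectangle the control occupies (window pixels)
    bool dirty;                     // does it need repainting this frame?
    void (*draw)(const Control&);   // whatever glDrawArrays calls the control needs
};

void render_frame(SDL_Window* window, Control* controls, int count) {
    // no glClear here: the previous frame's pixels are kept as the background
    for (int i = 0; i < count; ++i) {
        if (!controls[i].dirty) continue;
        // erase just this control's rectangle, then redraw it
        glEnable(GL_SCISSOR_TEST);
        glScissor(controls[i].x, controls[i].y, controls[i].w, controls[i].h);
        glClearColor(0.f, 0.f, 0.f, 1.f);
        glClear(GL_COLOR_BUFFER_BIT);
        glDisable(GL_SCISSOR_TEST);
        controls[i].draw(controls[i]);
        controls[i].dirty = false;
    }
    SDL_GL_SwapWindow(window);
}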
Regards to all.

What is the point of having metric mapping modes like MM_LOMETRIC, and MM_LOENGLISH?

Page 47 of the book Programming with MFC (second edition) by Jeff Prosise (Chapter 2: Drawing in a Window) has the following statement.
One thing to keep in mind when you use the metric mapping modes is that on display screens, 1 logical inch usually doesn't equal 1 physical inch. In other words, if you draw a line that's 100 units long in the MM_LOENGLISH mapping mode, the line probably won't be exactly 1 inch long.
My question is: if Windows cannot give any guarantee about the physical dimensions of things we draw using the metric mapping modes, then what is the point of having such mapping modes? Are metric mapping modes relevant only for printers, and completely irrelevant for monitors?
With modern monitors and digital connections like HDMI/DisplayPort, can't the Windows OS get the physical dimensions of the screen, thus making it possible to draw things using metric dimensions (inches rather than pixels; note that the current resolution of the monitor is already known to the OS)?
One of the ideas behind the logical inch is that the viewing distance to a monitor was typically larger than the distance to a printed page, so it made sense to have the default logical inch on a typical monitor be a bit larger than a physical inch, especially in an era when WYSIWYG was taking off. Rather than put all of the burden of adjusting for device resolution on the application, the logical inch lets a WYSIWYG application developer think in terms of distances and sizes on the printed page and not have to work in pixels or dots, which varied widely from device to device (and especially from monitor to printer).
Another issue was that, with the relatively limited resolutions of early monitors, it just wasn't practical to show legible text as small as typically printed text. For example, text was commonly printed at 6 lines per inch. At typical monitor resolutions, this might mean 12 pixels per line, which really limits font design and legibility (especially before anti-aliased and sub-pixel rendered text was practical). Making the logical inch default to 120-130% of an actual inch (on a typical monitor of the era) means lines of text would be 16 pixels high, making typographic niceties like serifs and italic more tenable (though still not pretty).
Also keep in mind that the user controls the logical inch and could very well set the logical inch so that it matches the physical inch if that suited their needs.
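As a small illustration of how the pieces fit together, here is a standard GDI sketch; it assumes a valid HDC obtained from WM_PAINT or GetDC():

#include <windows.h>

void DrawOneLogicalInch(HDC hdc) {
    int logPixelsX = GetDeviceCaps(hdc, LOGPIXELSX);   // pixels per *logical* inch

    SetMapMode(hdc, MM_LOENGLISH);          // units of 0.01 inch, y grows upward
    MoveToEx(hdc, 0, 0, NULL);
    LineTo(hdc, 100, 0);                    // 100 units = 1 logical inch wide

    // On screen this line is logPixelsX pixels long (classically 96), which is
    // only a physical inch if the monitor really has that many pixels per inch.
    (void)logPixelsX;
}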
The logical units are still useful today, even as monitors have resolutions approaching those of older laser printers. Consider designing slides for a presentation that will be projected and also printed as handouts. The projection size is a function of the projector's optics and its distance from the screen. There's no way, even with two-way communication between the OS and the display device, for the OS to determine the actual physical size (nor would it be useful for most applications).
I'm not a CSS expert, but it's my understanding that even when working in CSS's px units, you're working in a logical unit that may not be exactly the size of a physical pixel. It's supposed to take into account the actual resolution of the device and the typical viewing distance, allowing web designers to make the same 96-per-inch assumption that native application developers had long been using.

XOrg server code that draws the mouse pointer

I'm writing an OpenGL application in Linux using Xlib and GLX. I would like to use the mouse pointer to draw and to drag objects in the window. But no matter what method I use to draw or move graphic objects, there is always a very noticeable lag between the actual mouse pointer position (as drawn by the X server) and the positions of the objects I draw with the pointer coordinates I get from Xlib (XQueryPointer or the X events) or read directly from /dev/input/event*.
So my question is: what code is used by the XOrg server to actually draw the mouse pointer on the screen? So I could use the same type of code to place the graphics on the screen and have the mouse pointer and the graphic objects positions always perfectly aligned.
Even a pointer to the relevant source file(s) of XOrg would be great.
So my question is: what code is used by the XOrg server to actually draw the mouse pointer on the screen?
If everything goes right, no code at all is drawing the mouse pointer. So-called "hardware cursor" support has been around for decades. Essentially it's what's known as a "sprite engine" in the hardware: it takes a small picture and a pair of values (x, y) telling it where the picture shall appear on the screen. In every frame the graphics hardware sends to the display, the cursor image is overlaid at that position.
The graphics system is constantly updating the position values based on the input device movements.
Of course there is also graphics hardware that does not have this kind of "sprite engine". But the trick there is to update often, to update fast and to update late.
But no matter what method I use to draw or move graphic objects, there is always a very noticeable lag between the actual mouse pointer position (as drawn by the X server) and the positions of the objects I draw with the pointer coordinates I get from Xlib (XQueryPointer or the X events) or read directly from /dev/input/event*.
Yes, that happens if you read it and integrate it into your image at the wrong time. The key ingredient to minimizing latency is to draw as late as possible and to integrate as much input for as long as possible before you absolutely have to draw things to meet the V-Sync deadline. And the most important trick is not to draw what has already happened, but to draw what will be the state of affairs right at the moment the picture appears on screen. I.e. you have to predict the input for the next couple of frames drawn and use that.
The Kalman filter has become the de facto standard method for this.
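As a much simpler stand-in for a Kalman filter, plain linear extrapolation already shows the idea of drawing where the pointer will be rather than where it was:

struct PointerSample { double t, x, y; };   // t in seconds

// Estimate velocity from the last two samples and extrapolate to the time the
// frame will actually hit the screen. Crude, but it illustrates the prediction.
void predict_pointer(const PointerSample& prev, const PointerSample& last,
                     double present_time, double* out_x, double* out_y) {
    double dt = last.t - prev.t;
    double vx = dt > 0 ? (last.x - prev.x) / dt : 0.0;
    double vy = dt > 0 ? (last.y - prev.y) / dt : 0.0;
    double ahead = present_time - last.t;   // how far into the future we draw
    *out_x = last.x + vx * ahead;
    *out_y = last.y + vy * ahead;
}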

How to present to a different window using IDXGISwapChain and ID3D11Device/ID3D11DeviceContext?

Previously, when I've built tools, I've used D3D version 9, where the call to Present() can take a target window and rectangle, and you can thus draw from a single device into many different windows. This is great when using D3D to accelerate desktop applications, and/or building tools rather than games!
I've also built a game renderer with D3D11 before, which is also great, because the state management and threading interfaces are well designed, and you can even target D3D 9 level hardware that's still pretty common in the wild (as opposed to D3D 10, which can only target 10-and-better).
However, now I want to build a tool with D3D11. Unfortunately, the IDXGISwapChain that comes back from D3D11CreateDeviceAndSwapChain() seems to "remember" its HWND, and only wants to present to that window. This is highly inconvenient, because I may have a large number of windows that each need fairly simple graphics drawn to them, and only in response to a WM_PAINT (again, this is for a tool, not a game).
What I want to do is to save back buffer RAM. Specifically, I used to be able to create a single back buffer, the size of the desktop, that I knew could cover all rendering needs, and then that would be the single copy allocated. Even if there are 10 overlapping windows, they all render through the same back buffer, so there's no waste of memory beyond the initial allocation. I can create textures that are not swap chains, and use them as "render targets," but I can't find a good way of presenting to an arbitrary rectangle of an arbitrary client window, without reading back the bitmap and copying it into a DIBSection, which would be really inefficient. Also, there is no way to create many swap chains and have them share the same back buffer.
The best I can do is to create one swap chain per window, and resize the back buffer of each swap chain to be really small, except when I render to the swap chain, at which point I resize it to match the window. However, this seems inefficient, because resizing the targets is not a "free" operation AFAICT. So, is there a better way?
The answer I ended up with was to create one swap chain (and back buffer) per separate display area, rather than one desktop-sized buffer shared between them. I imagine that, in a world where desktop composition and transparency can happen to "anything" behind my back, that's probably helpful to the system.
Learn to love the VVM system, I guess :-) (VVM for Virtual Video Memory)
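For reference, a minimal sketch of the per-window swap chain setup looks something like this; the description values are just reasonable defaults, not the only valid ones, and all error handling is stripped out:

#include <d3d11.h>
#include <dxgi.h>

IDXGISwapChain* CreateSwapChainForWindow(ID3D11Device* device, HWND hwnd,
                                         UINT width, UINT height) {
    // walk from the device back to the DXGI factory that created it
    IDXGIDevice*  dxgiDevice = nullptr;
    IDXGIAdapter* adapter    = nullptr;
    IDXGIFactory* factory    = nullptr;
    device->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
    dxgiDevice->GetAdapter(&adapter);
    adapter->GetParent(__uuidof(IDXGIFactory), (void**)&factory);

    DXGI_SWAP_CHAIN_DESC desc = {};
    desc.BufferDesc.Width  = width;
    desc.BufferDesc.Height = height;
    desc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count  = 1;
    desc.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount       = 1;
    desc.OutputWindow      = hwnd;       // this chain presents only to this window
    desc.Windowed          = TRUE;
    desc.SwapEffect        = DXGI_SWAP_EFFECT_DISCARD;

    IDXGISwapChain* swapChain = nullptr;
    factory->CreateSwapChain(device, &desc, &swapChain);

    factory->Release();
    adapter->Release();
    dxgiDevice->Release();
    return swapChain;
}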

Get mouse deltas under linux (xorg)

Is there a convenient way to get mouse deltas (e.g. mickeys) under X/linux? I know that I could read from /dev/input/mice, but that requires root access and seems a bit too low-level for me.
If this is for a game, i.e. an application with an actual X window, the typical approach used to be:
Grab the mouse, so all mouse input goes to your window
Warp the mouse pointer to the center of your window, to give maximum amount of space to move
On each mouse movement event, subtract the center of the window from the reported position; this gives you a "delta event"
Goto 2
I write "used to be" because there might be better ways to solve this now, haven't looked into it for a while.
This of course won't give you a resolution that is higher than what X is reporting to applications, i.e. pixels. If you're after sub-pixel reporting, I think you need to go lower, perhaps read the device directly as you suggest.
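A rough Xlib sketch of the steps above (error handling and the rest of the event loop omitted):

#include <X11/Xlib.h>

void relative_mouse_loop(Display* dpy, Window win, int width, int height) {
    int cx = width / 2, cy = height / 2;

    // 1) grab the pointer so all mouse input goes to our window
    XGrabPointer(dpy, win, True, PointerMotionMask,
                 GrabModeAsync, GrabModeAsync, win, None, CurrentTime);
    // 2) warp it to the centre of the window
    XWarpPointer(dpy, None, win, 0, 0, 0, 0, cx, cy);

    XEvent ev;
    for (;;) {
        XNextEvent(dpy, &ev);
        if (ev.type == MotionNotify) {
            // 3) the delta is the reported position minus the centre
            int dx = ev.xmotion.x - cx;
            int dy = ev.xmotion.y - cy;
            if (dx || dy) {
                // ... use dx/dy here ...
                // 4) re-centre for the next movement (this itself generates a
                //    zero-delta MotionNotify, which the if above skips)
                XWarpPointer(dpy, None, win, 0, 0, 0, 0, cx, cy);
            }
        }
    }
}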
