Simulate a simple graphics card - graphics

Ok. I can find simulation designs for simple architectures (Edit: definitely not something like x86). For example, use an int as the program counter, a byte array as the memory, and so on. But how can I simulate the graphics card's functionality (the simplest graphics card imaginable)?
For example, use an array to represent each pixel and "paint" each pixel one by one.
But when should painting happen: synchronized with the CPU, or asynchronously? Who stores graphics data in that array? Is there an instruction for storing a pixel and another for painting a pixel?
Please note that all the question marks ('?') don't mean "you are asking a lot of questions"; they spell out the problem itself: how do you simulate a graphics card?
Edit: LINK to a basic implementation design for CPU+Memory simulation

Graphics cards typically carry a number of KBs or MBs of memory that stores the colors of individual pixels, which are then displayed on the screen. The card scans this memory a number of times per second, turning the numeric representation of pixel colors into video signals (analog or digital) that the display understands and visualizes.
The CPU has access to this memory, and whenever it changes it, the card eventually translates the new color data into the appropriate video signals and the display shows the updated picture. The card does all of this processing asynchronously and doesn't need much help from the CPU. From the CPU's point of view it's pretty much "write the new pixel color into the graphics card's memory at the location corresponding to the pixel's coordinates and forget about it". It may be a little more complex in reality (due to poor synchronization artifacts such as tearing, snow and the like), but that's the gist of it.
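To make the asynchronous part concrete, here is a minimal sketch of what the card's scan-out conceptually does, independently of the CPU (the framebuffer layout, the OutputScanline function and the refresh rate are assumptions for illustration, not part of the original answer):

#include <cstdint>

const int Width  = 1024;
const int Height = 768;
uint8_t framebuffer[Width * Height * 3];   // 3 bytes per pixel: R, G, B

// Hypothetical function that turns one row of pixel data into a video signal.
void OutputScanline(const uint8_t* row, int width);

// The "card" runs something like this on its own, e.g. ~60 times per second,
// regardless of whether the CPU has written anything new.
void ScanOutOneFrame()
{
    for (int y = 0; y < Height; ++y)
        OutputScanline(&framebuffer[y * Width * 3], Width);
}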
When you simulate a graphics card, you need to somehow mirror the memory of the simulated card into the physical graphics card's memory. If the OS gives you direct access to the physical graphics card's memory, it's an easy task. Simply implement writes to the memory of your emulated computer something like this:
void MemoryWrite(unsigned long Address, unsigned char Value)
{
    // If the write lands inside the simulated card's video memory window,
    // mirror it into the physical card's memory so the change shows up on screen.
    if ((Address >= SimulatedCardVideoMemoryStart) &&
        (Address - SimulatedCardVideoMemoryStart < SimulatedCardVideoMemorySize))
    {
        PhysicalCard[Address - SimulatedCardVideoMemoryStart] = Value;
    }

    // The emulated computer's own memory is updated in any case.
    EmulatedComputerMemory[Address] = Value;
}
The above, of course, assumes that the simulated card has exactly the same resolution (say, 1024x768) and pixel representation (say, 3 bytes per pixel, first byte for red, second for green and third for blue) as the physical card. In real life things can be slightly more complex, but again, that's the general idea.
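For example, under those assumptions, plotting a pixel from the emulated program's point of view boils down to computing a byte offset; a small illustrative helper (not part of the original answer) could look like this:

// Store an RGB pixel at (x, y) through the emulated memory bus,
// assuming 1024x768 and 3 bytes per pixel with red first.
void PutPixel(int x, int y, unsigned char r, unsigned char g, unsigned char b)
{
    unsigned long offset = (unsigned long)(y * 1024 + x) * 3;
    MemoryWrite(SimulatedCardVideoMemoryStart + offset + 0, r);
    MemoryWrite(SimulatedCardVideoMemoryStart + offset + 1, g);
    MemoryWrite(SimulatedCardVideoMemoryStart + offset + 2, b);
}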
You can access the physical card's memory directly in MSDOS or on a bare x86 PC without any OS if you make your code bootable by the PC BIOS and limit it to using only the BIOS service functions (interrupts) and direct hardware access for all the other PC devices.
Btw, it will probably be very easy to implement your emulator as a DOS program and run it either directly in Windows XP (Vista and 7 have extremely limited support for DOS apps in their 32-bit editions and none in the 64-bit editions; you may, however, install XP Mode, which is XP in a VM on 7), or better yet in something like DOSBox, which is available for multiple OSes.
If you implement the thing as a Windows program, you will have to use either GDI or DirectX in order to draw something on the screen. Unless I'm mistaken, neither of these two options lets you access the physical card's memory directly such that changes in it would be automatically displayed.
Drawing individual pixels on the screen using GDI or DirectX may be expensive if there's a lot of rendering, and redrawing all of the simulated card's pixels every time one of them changes amounts to the same performance problem. The best solution is probably to update the screen 25-50 times a second, updating only the parts that have changed since the last redraw: subdivide the simulated card's buffer into smaller buffers representing rectangular areas of, say, 64x64 pixels, mark these buffers as "dirty" whenever the emulator writes to them, and mark them as "clean" once they have been drawn to the screen. You may set up a periodic timer driving the screen redraws and do them in a separate thread. You should be able to do something similar in Linux, but I don't know much about graphics programming there.
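A minimal sketch of that dirty-tile bookkeeping might look like this (the 64x64 tile size and the DrawTileToScreen call are assumptions; the actual drawing would go through GDI, DirectX or whatever API you end up using):

#include <cstdint>

const int Width = 1024, Height = 768, Tile = 64;
const int TilesX = Width / Tile, TilesY = Height / Tile;

uint8_t framebuffer[Width * Height * 3];
bool    dirty[TilesX * TilesY];            // one flag per 64x64 tile

// Called from the emulated memory-write path for every changed pixel.
void MarkPixelDirty(int x, int y)
{
    dirty[(y / Tile) * TilesX + (x / Tile)] = true;
}

// Hypothetical function that blits one tile to the window/screen.
void DrawTileToScreen(int tx, int ty, const uint8_t* fb);

// Called 25-50 times per second from a timer or a separate thread.
void RedrawDirtyTiles()
{
    for (int ty = 0; ty < TilesY; ++ty)
        for (int tx = 0; tx < TilesX; ++tx)
            if (dirty[ty * TilesX + tx])
            {
                DrawTileToScreen(tx, ty, framebuffer);
                dirty[ty * TilesX + tx] = false;   // clean again
            }
}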

Related

Texture Memory Usage

I am trying to find out how much texture memory is consumed by my application. These are the texture types and my calculations:
RGB textures -> textureWidth * textureHeight * 3 (memory usage)
RGBA textures -> textureWidth * textureHeight * 4 (memory usage)
So I am wondering: can the graphics driver allocate much more memory than the amounts calculated above?
A few simple answers:
To the best of my knowledge, it's been around 2 decades since (the majority of) hardware devices supported packed 24bit RGB data. In modern hardware this is usually represented in an "XRGB" (or equivalent) format where there is one padding byte per pixel. It is painful in hardware to efficiently handle pixels that straddle cache lines etc. Further, since many applications (read "games") use texture compression, having support for fully packed 24bit seems a bit redundant.
Texture dimensions: If a texture's dimensions are not 'nice' for the particular hardware (e.g. if a row is not a multiple of, say, 16 bytes, or the texture doesn't align to 4x4 or 8x8 blocks), then the driver may pad the physical size of the texture.
Finally, if you have MIP mapping (and you do want this for performance as well as quality reasons), it will expand the texture size by around 33%.
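Putting those factors together, a rough upper-bound estimate could look like this (a sketch only: the 4 bytes per pixel cover both RGBA and padded "XRGB" storage of RGB data, and the row alignment and mip-chain factor are just the assumptions discussed above - real drivers may differ):

#include <cstddef>

// Rough estimate of driver-side memory for a 2D texture.
size_t EstimateTextureMemory(size_t width, size_t height,
                             size_t bytesPerPixel = 4,
                             size_t rowAlignment  = 16,
                             bool   mipmapped     = true)
{
    // Pad each row up to the assumed hardware alignment.
    size_t rowBytes = ((width * bytesPerPixel + rowAlignment - 1)
                       / rowAlignment) * rowAlignment;
    size_t base = rowBytes * height;

    // A full mip chain adds roughly one third on top of the base level.
    return mipmapped ? base + base / 3 : base;
}

For a 1024x768 "RGB" texture this gives roughly 4.2 MB, rather than the ~2.4 MB the packed 24-bit formula suggests.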
In addition to the answer from Simon F, it's also worth noting that badly written applications can force the driver to allocate memory for multiple copies of the same texture. This can occur if it attempts to modify the texture while it is still referenced by an in-flight rendering operation. This is commonly known as "resource copy-on-write" or "resource ghosting".
Blog here explains in more detail:
https://community.arm.com/developer/tools-software/graphics/b/blog/posts/mali-performance-6-efficiently-updating-dynamic-resources

For emulating the Gameboy, why does it matter that the memory is broken up into different areas?

So I'm writing a Gameboy emulator, and I'm not 100% sure why other projects took the time to break the memory up into proper categories. I don't know if there is a major technical dilemma I'm missing (maybe handling illegal parameters in instructions?), but it seems like the only thing that matters is that whatever a write instruction stores at an address is retrievable by the corresponding read instruction. So, as a sub-question: if I'm working under the assumption that the assembly is perfectly legal (meaning nothing tries to read/write where it can't), can I just make one big array and read and write to it?
Note that this is a conceptual question and that I am aware a big array would be a memory hog, I'm not necessarily looking for the best way to do it, simply trying to learn how it works and why other emulator developers did it the way they did.
You are going to have read-only memory, read/write memory and memory-mapped I/O (peripherals etc.). So you need to decode the address to some extent to break it into those major categories; then, for the peripherals, you have to emulate each of them, so you have to get very detailed in your address decoding.
For the peripherals you will need to detect a read or write to some address, which you cannot do by simply landing the writes in an array (two writes of the same value, for example, make a difference; you can't just scan some array looking for changes, you have to trigger on the reads and writes themselves and perform the hardware action).
If you wish to be cycle accurate you will also need to know the timings for the RAMs and ROMs in order to mimic them; depending on how many banks of each there are, and whether timing depends on that, you will need to decode the address even further.
The hardware decodes these addresses to the same level, so if you are emulating hardware then you need to... emulate hardware... and do the same amount of address decoding.
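As a bare-bones illustration of that decoding, the emulated bus usually ends up being a dispatch function rather than a raw array (the region boundaries are the usual Gameboy ones, but the JoypadRead/TimerWrite handlers are placeholders for illustration):

#include <cstdint>

uint8_t rom[0x8000];       // read-only region
uint8_t wram[0x2000];      // ordinary read/write work RAM

// Placeholder peripheral handlers: reads and writes have side effects,
// so they must be invoked rather than replaced by plain array accesses.
uint8_t JoypadRead();
void    TimerWrite(uint16_t addr, uint8_t value);

uint8_t BusRead(uint16_t addr)
{
    if (addr < 0x8000)                    return rom[addr];             // ROM
    if (addr >= 0xC000 && addr < 0xE000)  return wram[addr - 0xC000];   // work RAM
    if (addr == 0xFF00)                   return JoypadRead();          // I/O register
    return 0xFF;                                                        // unmapped
}

void BusWrite(uint16_t addr, uint8_t value)
{
    if (addr < 0x8000)                    return;   // "ROM" writes go to the bank controller (next answer)
    if (addr >= 0xC000 && addr < 0xE000)  { wram[addr - 0xC000] = value; return; }
    if (addr >= 0xFF04 && addr <= 0xFF07) { TimerWrite(addr, value); return; }
}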
I'm going to be Gameboy-specific here. Look at the Gameboy's address space map. The address space itself is divided; it's not something that emulators do. The hardware itself operates that way.
Here are some of the regions that can't be implemented as just an array:
0x0000-0x3FFF. First bank of the ROM. It's read-only, but not quite; see the next region.
0x4000-0x7FFF. Switchable ROM bank; it's also not quite read-only. Cartridges that don't fit into the Gameboy's address space contain a memory bank controller. The ROM writes to specific "read-only" ROM regions to select which ROM bank is mapped into the 0x4000-0x7FFF address range, so you have to detect these writes and then redirect reads into the selected ROM bank (see the sketch after this list).
0xA000-0xBFFF. Switchable RAM bank. Same thing as with switchable ROM banks, but now for RAM. Cartridges may contain additional RAM that is mapped into the Gameboy's address space. Which bank of that RAM is mapped is controlled, again, by writes to specific read-only regions.
0xFF00-0xFF4B. IO ports. Here you have hardware registers mapped into the address space. The Gameboy has several hardware components, each with its own registers and even memory (display controller, sound processor, timers etc.). To control that hardware, the ROM reads and writes the IO ports. You obviously have to detect these accesses so you can emulate the hardware they correspond to. It's not just the CPU and memory you have to emulate; I would even say those are the smallest and easiest part of it. For example, it is much harder to get the display controller and sound channels right. They have complicated logic, bugs and very tricky behaviour that isn't documented very well but is crucial for accurate emulation. The wave channel in particular gave me a hard time.
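For the bank-switching part specifically, the cartridge handler ends up looking roughly like this (a simplified MBC1-style sketch that assumes ROM banking only; real bank controllers have more modes and registers):

#include <cstdint>
#include <vector>

std::vector<uint8_t> cartridgeRom;   // full cartridge image: many 16 KiB banks
int currentRomBank = 1;              // bank currently mapped at 0x4000-0x7FFF

uint8_t CartRead(uint16_t addr)
{
    if (addr < 0x4000)                                   // fixed bank 0
        return cartridgeRom[addr];
    // Switchable region: redirect into the currently selected 16 KiB bank.
    return cartridgeRom[currentRomBank * 0x4000 + (addr - 0x4000)];
}

void CartWrite(uint16_t addr, uint8_t value)
{
    // Writes to 0x2000-0x3FFF don't modify ROM; they select the ROM bank.
    if (addr >= 0x2000 && addr < 0x4000)
    {
        currentRomBank = value & 0x1F;                   // MBC1 uses the low 5 bits
        if (currentRomBank == 0) currentRomBank = 1;     // bank 0 maps to bank 1
    }
}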

BIOS always jumps to the first sector (512 B) - why? Is this some kind of BIOS limitation?

First, I am not an expert in booting, but I would like to understand it better.
1) The system boots, control goes to the BIOS, and the BIOS goes to the first boot sector, which is the first 512 bytes, and from there it reads the first 440 bytes (a sketch of this layout follows below). This is called the bootloader or bootstrap; this code in turn does everything for us: it jumps (to the next stages), reads the partition table, follows the fdisk signature (boot flag) if necessary, etc. I have a very basic question in mind. Maybe it will be a dumb question :(
2) Why can the BIOS read only the first sector (512 bytes)? Because of this tiny space, boot loaders can't fit there and have to jump from stage to stage. Why can't the BIOS read more than 512 bytes? Suppose, for the sake of argument, the BIOS could read the first 100 MB. 100 MB is more than enough to fit a boot loader; there would be no need to jump between stages, and we could write nice full-featured boot loaders that give a nice GUI to the end user. (UEFI does exactly this by creating a separate partition.)
3) Why only 512 bytes? Why can't the BIOS go beyond that?
4) I hear the BIOS was designed for 8-bit processors or something like that (maybe I am completely wrong). Can you please explain this to me?
Sorry for the long description, but I am new to Stack Overflow. I wanted to add a nice disk layout diagram, but it says I should have at least 10 reputation. Thanks in advance.
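For reference, the classic MBR layout described in point 1 can be sketched as a struct (sizes follow the conventional modern MBR; the field names are made up for illustration):

#include <cstdint>

#pragma pack(push, 1)
struct MbrSector                      // exactly one 512-byte sector
{
    uint8_t  bootstrapCode[440];      // the code the BIOS jumps into
    uint32_t diskSignature;           // optional disk ID
    uint16_t reserved;
    struct
    {
        uint8_t  bootFlag;            // 0x80 = active/bootable partition
        uint8_t  chsFirst[3];
        uint8_t  type;
        uint8_t  chsLast[3];
        uint32_t lbaFirst;
        uint32_t lbaSectors;
    } partitions[4];                  // 4 entries x 16 bytes = 64 bytes
    uint16_t signature;               // must be 0xAA55 for the BIOS to boot it
};
#pragma pack(pop)

static_assert(sizeof(MbrSector) == 512, "an MBR is exactly one sector");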
When PCs first came on the scene they had no more than 640k of RAM, and those were the deluxe models.
The first IBM XT (Affectionately named 'The Tank' because it was light military green and had a solid steel case) only had an 8k BIOS chip, with 512k (Half a megabyte) becoming the standard when the compatible wars started.
With these first PCs there was no concept of the hard disk; hard disks were large multi-platter things that looked like a cake shelf with a glass cover (like you might see in a coffee shop) and were usually so heavy that they took 2 or 3 people to lift.
Often these large platter cases were only ever connected to mainframes at the time, and were way too large to even consider providing for a desktop PC, so the floppy disk was used instead.
The first round of these floppy disks could hold no more than approximately 300k and were 5.25 inches square by about 2.5 millimeters thick; some were double sided, so they could hold 600k. There was also a range of different types of software and disk encoder chips that could read/write different densities, but the bottom line was that space on them was very, very limited.
Couple that with the fact that most BIOSes at the time were only around 16k to 32k in size: you had to fit as much as possible in to get the machine up and running, and in the case of the IBM, fit a ROM-based BASIC interpreter in there too, so that with no external operating system the computer was still usable for general computing tasks.
All of these restrictions meant that smaller was better.
Rather than have a flattened disk layout with a large monolithic loader on it, it was better to 'format' these floppy disks so that most of the space was user space and the end user could effectively customize the boot software (e.g. remove parts of the OS they didn't use), so the initial loader that kick-started everything was restricted to the first 512 bytes of the disk.
The other main reason was the sheer variety of different disk systems available back then (remember, this was way before the industry was standardized), so something placed right at the beginning of the disk was guaranteed to be found no matter how weirdly the running OS set the rest of the disk up for its own use. No seeking was involved, you didn't have to look at format marks or attempt to understand some weird directory format; you simply moved the drive head to its rest position, then read 512 bytes, simple as that.
Once Phoenix produced the first clone of the IBM BIOS, and won against IBM in court when the company tried to sue them for theft of intellectual property, the flood gates opened. Almost overnight, everyone started to make BIOSes, and the PC market as we know it today exploded into a mess of standards and interfaces of all different types.
Pretty soon the vendor lock-in started, so the IEEE/ANSI/ISO and other standards bodies started to lay down the law by writing specifications for how EVERYONE had to remain compatible with everyone else; those standards have held true right through to today's modern era of computing.
By the time massive hard drives, bootable CDs, USB drives and all manner of other things arrived on the scene, these standards (of which the 512-byte boot sector was a part) were SO DEEPLY entrenched in the grander scheme of things that it was impossible to change them.
Only a few brave companies dared venture into that territory, and they had limited success; Sun Microsystems, for example, was one of the brave few. If you look at a Sun Raq3 (you can pick them up on eBay for next to nothing), it has a boot loader that mimics exactly what the PC disk-based loader does, but it boots out of ROM until the second stage, whereupon it immediately looks for a /boot partition on a standard Linux-based disk layout. So even though the disks in these machines still have the standard 512-byte boot block, it's actually not used.
Hope that gives you the insight you want; given that I lived through a lot of it, I can remember a lot of it too, including the machines that didn't use the 512-byte boot block.

Is it expensive to do trilinear texture mapping in a shader?

Imagine a big procedural world that's worth more than 300k triangles (about 5-10k per 8x8x8 m chunk). Since it's procedural, I need to create all the data in code. I've managed to make a good normal-smoothing algorithm, and now I'm going to use textures (everyone needs textures; you don't want to walk around a plain-colored world, do you?). The problem is that I don't know where it's best to calculate the UV sets (I'm using triplanar texture mapping).
I have three approaches:
1. Do the calculations on the CPU, then upload the results to the GPU (the mesh is not modified every frame - 90% of the time it stays static - and the calculation is done per chunk when a chunk changes); a CPU-side sketch of this is shown below.
2. Do the calculations on the GPU in a vertex shader (the calculation is done for every triangle in every frame, but the meshes are fairly big - is it expensive to do this every frame?);
3. Move the algorithm to OpenCL (the calculation is done per chunk when a chunk changes; I already use OpenCL to do the meshing of my data) and call the kernel when the mesh changes (inexpensive, but all my OpenCL experience is based on modifying existing code; still, I have some C background, so it may just take a while before I get it working).
Which approach is better, bearing in mind that my little experiment is already a bit heavy for mid-range hardware?
I'm using C# (.NET 4.0) with SlimDX and Cloo.
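For approach 1, the per-vertex work might look something like the sketch below (written in C++ for illustration even though the question uses C#/SlimDX; picking the dominant axis is a simplified form of triplanar mapping, and all the names are made up):

#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// Pick the projection plane from the dominant axis of the normal and
// derive UVs from the vertex position; 'scale' controls texture tiling.
Vec2 TriplanarUV(const Vec3& position, const Vec3& normal, float scale)
{
    float ax = std::fabs(normal.x);
    float ay = std::fabs(normal.y);
    float az = std::fabs(normal.z);

    if (ax >= ay && ax >= az)                 // project onto the YZ plane
        return { position.y * scale, position.z * scale };
    if (ay >= ax && ay >= az)                 // project onto the XZ plane
        return { position.x * scale, position.z * scale };
    return { position.x * scale, position.y * scale };    // XY plane
}

Full triplanar mapping blends all three projections weighted by the normal instead of picking just one, but the per-vertex cost is in the same ballpark, which is what matters for the CPU-vs-GPU decision.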

In DirectX, if I reuse a slot, does the GPU keep the previous resource in its memory? Also, can the original resources be safely altered?

I was writing this question about directx and the following questions were part of it, but I realized I needed to separate them out.
If something isn't in a "slot" (register) on the GPU, will it have to be retransferred to the GPU to be used again? I.e., if I put texture A in register t0, then later put texture B in register t0, is texture A no longer available on the GPU? Or is it still resident in GPU memory, but I will have to place a call to load it into a texture register to get at it? Or something else entirely?
In a similar vein, do calls to PSSetShader, PSSetShaderResources, IASetVertexBuffers, etc. block and copy data to the GPU before returning, so that after the call one can alter or even free the resources they were based on, because the data is now resident on the GPU?
I guess this is more than one question, but I expect I'll get in trouble if I try asking too many DirectX questions in one day (though I think these are honestly decent questions about which the MSDN documentation remains pretty silent, even if they are all newbie questions).
if I put texture A in register t0, then later put texture B in register t0, is texture A no longer available on the GPU?
It is no longer bound to the texture register, so it will not get applied to any polygons. You will have to bind it to a texture register again to use it.
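In D3D11 terms, that rebinding is just another call on the device context, something like this minimal sketch (the srvA/srvB views and the use of slot 0 are assumptions for illustration):

#include <d3d11.h>

// context, srvA and srvB are assumed to have been created elsewhere.
void BindTextures(ID3D11DeviceContext* context,
                  ID3D11ShaderResourceView* srvA,
                  ID3D11ShaderResourceView* srvB)
{
    // Texture A occupies slot t0 for the first draw...
    context->PSSetShaderResources(0, 1, &srvA);
    // ...then texture B replaces it in t0. A is no longer bound,
    // but its data typically stays in video memory.
    context->PSSetShaderResources(0, 1, &srvB);
    // To use A again, just bind it again; no re-upload is needed
    // as long as the driver has kept it resident.
    context->PSSetShaderResources(0, 1, &srvA);
}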
Or is it still resident in the GPU memory, but I will have to place a call to load it into a texture register to get at it?
Typically they will stay in video memory until enough other resources have been loaded that the memory needs to be reclaimed. This was more obvious in Direct3D 9, when you had to specify which memory pool to place a resource in. Now everything is effectively in what was the D3DPOOL_MANAGED pool in Direct3D 9. When you set the texture register to use the texture, it will be fast as long as the texture is still in video memory.
In a similar vein, do calls to PSSetShader, PSSetShaderResources, IASetVertexBuffers, etc. block and copy data to the GPU before returning, so that after the call one can alter or even free the resources they were based on, because the data is now resident on the GPU?
DirectX manages the resources for you and tries to keep them in video memory as long as it can.
