How to make an ARM9 custom device emulator? - linux

I am working on an ARM9 processor at 266 MHz with FPU support and 32 MB RAM, and I run Linux on it. I want to emulate it on a PC (I have both Linux and Windows available). I want to profile my cycle counts and run my cross-compiled executables directly in the emulator. Is there an open-source project available to create an emulator easily, and how much code/effort would I need to make a custom emulator with it? It would be great if you could point me to tutorials or other references to get a kick-start.
Thanks & Regards,
Sunny.

Do you want to emulate just the processor or an entire machine?
Emulating a CPU is fairly easy: define a structure containing all the CPU registers, create an array to simulate RAM, and then run a loop like this:
// Pseudocode sketch - the registers, opcodes and timings are invented
struct cpu_state {
    uint8_t A, B;           // general-purpose registers
    uint16_t PC;            // program counter
    struct { int Z; } FLAGS;
} CPU;

uint8_t RAM[65536];         // simulated memory
uint64_t cpu_ticks = 0;     // counter for CPU cycles
uint8_t opcode;

while (true) {
    opcode = RAM[CPU.PC++];        // fetch opcode and increment program counter
    switch (opcode) {
    case 0x12:                     // invented opcode for "MOV A,B"
        CPU.A = CPU.B;
        cpu_ticks += 4;            // imagine you need 4 ticks for this operation
        set_cpu_flags_mov();
        break;
    case 0x23:                     // invented opcode for "ADD A, #"
        CPU.A += RAM[CPU.PC++];    // get operand from memory
        cpu_ticks += 8;
        set_cpu_flags_add();
        break;
    case 0x45:                     // invented opcode for "JP Z, #"
        if (CPU.FLAGS.Z) CPU.PC = RAM[CPU.PC];  // jump to operand address
        else CPU.PC++;                          // skip operand and continue
        cpu_ticks += 12;
        set_cpu_flags_jump();
        break;
    ...
    }
    handle_interrupts();
}
Emulating an entire machine is much, much harder... you need to emulate LCD controllers, memory-mapped registers, memory bank controllers, DMA, input devices, sound, I/O stuff... and you'll probably also need a dump of the BIOS and operating system. I don't know the ARM processor, but if it has pipelines, caches and such things, the timing gets more complicated.
If you have all the hardware parts fully documented there's no problem, but if you need to reverse engineer or guess how the emulated machine works... you will have a hard time.
Start here: http://infocenter.arm.com/help/index.jsp and download the "Technical Reference Manual" for your processor.
And for general emulation questions: http://www.google.es/search?q=how+to+write+an+emulator

You should take a look at QEMU.
I don't understand, however, why you need a complete emulator.
You can already do a lot of profiling without an emulator. What gains do you expect from having a system emulator?

Related

Accessing memory pointers in hardware registers

I'm working on enhancing the stock ahci driver provided in Linux in order to perform some needed tasks. I'm at the point of attempting to issue commands to the AHCI HBA for the hard drive to process. However, whenever I do so, my system locks up and reboots. Explaining the full process of issuing a command to an AHCI drive is far too much for this question. If needed, reference this link for the full discussion (the process is spread across several pieces, but ch 4 has the necessary data structures).
Essentially, one writes the appropriate structures into memory regions defined by either the BIOS or the OS. The first memory region I should write to is the Command List Base Address contained in the register PxCLB (and PxCLBU if 64-bit addressing applies). My system is 64-bit, so I'm trying to read both 32-bit registers. My code is essentially this:
void __iomem * pbase = ahci_port_base(ap);
u32 __iomem *temp = (u32*)(pbase + PORT_LST_ADDR);
struct ahci_cmd_hdr *cmd_hdr = NULL;
cmd_hdr = (struct ahci_cmd_hdr*)(u64)
((u64)(*(temp + PORT_LST_ADDR_HI)) << 32 | *temp);
pr_info("%s:%d cmd_list is %p\n", __func__, __LINE__, cmd_hdr);
// problems with this next line, makes the system reboot
//pr_info("%s:%d cl[0]:0x%08x\n", __func__, __LINE__, cmd_hdr->opts);
The function ahci_port_base() is found in the ahci driver (at least it is for CentOS 6.x). Basically, it returns the proper address for that port in the AHCI memory region. PORT_LST_ADDR and PORT_LST_ADDR_HI are both macros defined in that driver. The address that I get after combining the high and low halves is usually something like 0x0000000037900000. Is this memory address in a space where I cannot simply dereference it?
I'm hitting my head against the wall at this point because this link shows that accessing it in this manner is essentially how it's done.
The address that I get after combining the high and low halves is usually something like 0x0000000037900000. Is this memory address in a space where I cannot simply dereference it?
Yes, you are correct - that's a bus address, and you can't just dereference it because paging is enabled. (You shouldn't be dereferencing the iomapped addresses directly either - you should be using readl() / writel() for those - but the breakage here is more subtle.)
It looks like the right way to access the ahci_cmd_hdr in that driver is:
struct ahci_port_priv *pp = ap->private_data;
cmd_hdr = pp->cmd_slot;
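For completeness, a minimal sketch of the fixed debug print built on that - assuming the stock driver's layout, where cmd_slot is the kernel-virtual mapping of the command list and opts is stored little-endian:
struct ahci_port_priv *pp = ap->private_data;
struct ahci_cmd_hdr *cmd_hdr = pp->cmd_slot; // CPU-visible address, not the bus address

// Safe to dereference now; le32_to_cpu() because opts is little-endian in memory
pr_info("%s:%d cl[0]:0x%08x\n", __func__, __LINE__, le32_to_cpu(cmd_hdr->opts));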

Converting EFI memory Map to E820 map

I am new to Linux and learning how Linux comes to know about the available physical memory. I came to know there is a BIOS system call, int 0x15, which will give you the E820 memory map.
Now I have found a piece of code which says it is the definition for converting an EFI memory map to an E820 memory map. What does the above mean?
Does it mean the underlying motherboard firmware is EFI-based, but since this code runs on x86 we need to convert it to an E820 memory map?
If so, does x86 know only about E820 memory maps?
What is the difference between E820 and EFI memory maps?
Looking forward to a detailed answer on the same.
In both cases, what you have is your firmware (BIOS or EFI), which is responsible for detecting what memory (and how much) is actually physically plugged in, and the operating system, which needs to know this information in some format.
Does it mean the underlying motherboard firmware is EFI-based, but since this code runs on x86 we need to convert it to an E820 memory map?
Your confusion here is thinking that EFI and x86 are incompatible - they aren't. EFI firmware has its own mechanism for reporting available memory: you can use the GetMemoryMap boot service (before you invoke ExitBootServices) to retrieve the memory map from the firmware. Critically, however, this memory map is in the format the EFI firmware wishes to report (an array of EFI_MEMORY_DESCRIPTOR) rather than E820. In this scenario you would not also attempt int 15h, since you already have the information you need.
I suspect that the Linux kernel uses the E820 format as its internal representation of memory on the x86 architecture. When booting via EFI, the kernel must use the EFI firmware's boot services, but it chooses to convert the answer it gets back into the E820 format.
This is not something a kernel you are writing necessarily has to do. You simply need to know how memory is mapped.
It is also the case that some bootloaders will provide this information for you - for example GRUB. Part of the Multiboot specification allows you to instruct the bootloader that it must provide this information to your kernel.
For more on this, the ever-useful osdev wiki has code samples etc. The relevant sections for getting memory maps from GRUB are here.
Further points:
The OS needs to understand what memory is mapped where for several reasons. One is to avoid using physical memory where firmware services reside; another is for communicating with devices that share memory with the CPU. The video buffer is a common example of this.
Secondly, listing the memory map in EFI is not too difficult. If you haven't already discovered it, the UEFI shell that comes with some firmware has a memmap command to display the memory map. If you want to implement this yourself, a quick and dirty way to do it looks like this:
EFI_STATUS EFIAPI PrintMemoryMap(EFI_SYSTEM_TABLE* SystemTable)
{
    EFI_STATUS status = EFI_SUCCESS;
    UINTN MemMapSize = sizeof(EFI_MEMORY_DESCRIPTOR) * 16;
    UINTN MemMapSizeOut = MemMapSize;
    UINTN MemMapKey = 0;
    UINTN MemMapDescriptorSize = 0;
    UINT32 MemMapDescriptorVersion = 0;
    UINTN DescriptorCount = 0;
    UINTN i = 0;
    UINT8* buffer = NULL;
    EFI_MEMORY_DESCRIPTOR* MemoryDescriptorPtr = NULL;

    do
    {
        buffer = AllocatePool(MemMapSize);
        if ( buffer == NULL ) break;

        MemMapSizeOut = MemMapSize; // tell GetMemoryMap how big the buffer really is
        status = gBS->GetMemoryMap(&MemMapSizeOut, (EFI_MEMORY_DESCRIPTOR*)buffer,
                                   &MemMapKey, &MemMapDescriptorSize,
                                   &MemMapDescriptorVersion);
        Print(L"MemoryMap: Status %x\n", status);
        if ( status != EFI_SUCCESS )
        {
            FreePool(buffer);
            MemMapSize += sizeof(EFI_MEMORY_DESCRIPTOR) * 16;
        }
    } while ( status != EFI_SUCCESS );

    if ( buffer != NULL )
    {
        DescriptorCount = MemMapSizeOut / MemMapDescriptorSize;
        MemoryDescriptorPtr = (EFI_MEMORY_DESCRIPTOR*)buffer;
        Print(L"MemoryMap: DescriptorCount %d\n", DescriptorCount);
        for ( i = 0; i < DescriptorCount; i++ )
        {
            MemoryDescriptorPtr = (EFI_MEMORY_DESCRIPTOR*)(buffer + (i * MemMapDescriptorSize));
            Print(L"Type: %d PhysicalStart: %lx VirtualStart: %lx NumberOfPages: %d Attribute %lx\n",
                  MemoryDescriptorPtr->Type, MemoryDescriptorPtr->PhysicalStart,
                  MemoryDescriptorPtr->VirtualStart, MemoryDescriptorPtr->NumberOfPages,
                  MemoryDescriptorPtr->Attribute);
        }
        FreePool(buffer);
    }
    return status;
}
This is a reasonably straightforward function. GetMemoryMap complains bitterly if you don't pass in a large enough buffer, so we keep incrementing the buffer size until we have enough space; then we loop and print. Be aware that sizeof(EFI_MEMORY_DESCRIPTOR) is in fact not the stride between descriptors in the output buffer - use the returned descriptor size as shown above, or you'll end up with a much larger table than you really have (and the address spaces will all look wrong).
It wouldn't be massively difficult to decide on a common format with E820 from this table.
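As a rough illustration (not the kernel's actual code), the type mapping might look like the sketch below. The E820_ENTRY struct is invented here for the example; the numeric type values (1 = usable RAM, 2 = reserved, 3 = ACPI reclaimable, 4 = ACPI NVS) follow the ACPI address range types:
typedef struct {
    UINT64 Base;
    UINT64 Length;
    UINT32 Type;
} E820_ENTRY; // hypothetical E820-style entry

static UINT32 EfiTypeToE820(UINT32 EfiType)
{
    switch (EfiType) {
    case EfiConventionalMemory:
    case EfiLoaderCode:
    case EfiLoaderData:
    case EfiBootServicesCode: // reclaimable once ExitBootServices is called
    case EfiBootServicesData:
        return 1;             // usable RAM
    case EfiACPIReclaimMemory:
        return 3;             // ACPI reclaimable
    case EfiACPIMemoryNVS:
        return 4;             // ACPI NVS
    default:
        return 2;             // everything else: reserved
    }
}

// Each EFI descriptor then becomes one E820 entry (adjacent entries of the
// same type could be merged afterwards); EFI pages are always 4 KiB.
void DescriptorToE820(const EFI_MEMORY_DESCRIPTOR* d, E820_ENTRY* e)
{
    e->Base   = d->PhysicalStart;
    e->Length = d->NumberOfPages * 4096;
    e->Type   = EfiTypeToE820(d->Type);
}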

instruction set emulator guide

I am interested in writing emulators, e.g. for the Game Boy and other handheld consoles, but I read that the first step is to emulate the instruction set. I found a link here that said beginners should emulate the Commodore 64's 8-bit microprocessor. The thing is, I don't know a thing about emulating instruction sets. I know the MIPS instruction set, so I think I can manage to understand other instruction sets, but the problem is: what is meant by emulating them?
NOTE: If someone can provide me with a step-by-step guide to instruction set emulation for beginners, I would really appreciate it.
NOTE #2: I am planning to write in C.
NOTE #3: This is my first attempt at learning the whole emulation thing.
Thanks
EDIT: I found this site, a detailed step-by-step guide to writing an emulator, which seems promising. I'll start reading it, and I hope it helps other people who are looking into writing emulators too.
Emulator 101
An instruction set emulator is a software program that reads binary data from a software device and carries out the instructions that data contains as if it were a physical microprocessor accessing physical data.
The Commodore 64 used a 6502 microprocessor. I wrote an emulator for this processor once. The first thing you need to do is read the datasheets on the processor and learn about its behavior. What sort of opcodes does it have? What about memory addressing and I/O? What are its registers? How does it start executing? These are all questions you need to be able to answer before you can write an emulator.
Here is a general overview of how it might look in C (not 100% accurate):
uint8_t RAM[65536];  // declare a memory buffer for emulated RAM (64k)
uint8_t A;           // declare accumulator
uint8_t X;           // declare X register
uint8_t Y;           // declare Y register

uint16_t PC = 0;     // declare program counter, start executing at address 0
uint8_t FLAGS = 0;   // start with all flags cleared

// Return 1 if the carry flag is set, 0 otherwise; in this example the 3rd bit
// is the carry flag (not true for the actual 6502)
#define CARRY_FLAG(flags) ((0x4 & (flags)) >> 2)

#define ADC 0x69
#define LDA 0xA9

while (executing) {
    switch (RAM[PC]) {  // grab the opcode at the program counter
    case ADC:           // add with carry
        A = A + RAM[PC + 1] + CARRY_FLAG(FLAGS);
        UpdateFlags(A);
        PC += ADC_SIZE;
        break;
    case LDA:           // load accumulator
        A = RAM[PC + 1];
        UpdateFlags(A);
        PC += LDA_SIZE;
        break;
    default:
        break;          // invalid opcode!
    }
}
According to this reference, ADC actually has 8 opcodes on the 6502, which means you will have 8 different ADC cases in your switch statement, one for each addressing mode. You will have to deal with endianness (byte order) and, of course, pointers. I would get a solid understanding of pointers and type casting in C if you don't already have one. To manipulate the flags register you also need a solid understanding of bitwise operations in C. If you are clever you can make use of C macros, and even function pointers, to save yourself some work, as in the CARRY_FLAG example above.
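For instance, a minimal sketch of the function-pointer approach (the handler names and op_table are invented for illustration; each handler would advance PC itself):
typedef void (*op_handler)(void); // one handler per opcode

static void op_adc_imm(void) { /* A = A + RAM[PC+1] + carry; update flags; PC += 2 */ }
static void op_lda_imm(void) { /* A = RAM[PC+1]; update flags; PC += 2 */ }
static void op_invalid(void) { /* report/trap an unknown opcode */ }

static op_handler op_table[256]; // indexed directly by the opcode byte

static void init_op_table(void)
{
    for (int i = 0; i < 256; i++)
        op_table[i] = op_invalid; // default everything to "invalid"
    op_table[0x69] = op_adc_imm;  // ADC immediate
    op_table[0xA9] = op_lda_imm;  // LDA immediate
}
The fetch-execute loop then shrinks to a single op_table[RAM[PC]](); per iteration.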
Every time you execute an instruction, you must advance the program counter by the size of that instruction, which is different for each opcode. Some opcodes don't take any arguments, so their size is just 1 byte, while others take operand bytes (an immediate value or a 16-bit address), as in my LDA example above. All this should be pretty well documented.
Branch instructions (JMP, JE, JNE, etc.) are simple: if some flag is set in the flags register, load the PC with the address specified. This is how "decisions" are made in a microprocessor, and emulating them is simply a matter of changing the PC, just as the real microprocessor would do.
The hardest part of writing an instruction set emulator is debugging. How do you know everything is working as it should? There are plenty of resources to help you. People have written test programs that exercise every instruction: you can execute them one instruction at a time and compare against the reference output. If something differs, you know you have a bug somewhere and can fix it.
This should be enough to get you started. The important thing is that you have A) a good solid understanding of the instruction set you want to emulate, and B) a solid understanding of low-level data manipulation in C, including type casting, pointers, bitwise operations, byte order, etc.

Linux Kernel: udelay() returns too early?

I have a driver which requires microsecond delays. To create this delay, my driver is using the kernel's udelay function. Specifically, there is one call to udelay(90):
iowrite32(data, addr + DATA_OFFSET);
iowrite32(trig, addr + CONTROL_OFFSET);
udelay(30);
trig |= 1;
iowrite32(trig, addr + CONTROL_OFFSET);
udelay(90); // This is the problematic call
We had reliability issues with the device. After a lot of debugging, we traced the problem to the driver resuming before 90 us had passed. (See "proof" below.)
I am running kernel version 2.6.38-11-generic SMP (Kubuntu 11.04, x86_64) on an Intel Pentium Dual Core (E5700).
As far as I know, the documentation states that udelay will delay execution for at least the specified delay, and is uninterruptible. Is there a bug in this version of the kernel, or did I misunderstand something about the use of udelay?
To convince ourselves that the problem was caused by udelay returning too early, we fed a 100 kHz clock to one of the I/O ports and implemented our own delay as follows:
// Wait until n falling edges are observed
void clk100_delay(void *addr, u32 n)
{
    int i;
    for (i = 0; i < n; i++) {
        u32 prev_clk = ioread32(addr);
        while (1) {
            u32 clk = ioread32(addr);
            if (prev_clk && !clk) {
                break;  // falling edge: was high, now low
            } else {
                prev_clk = clk;
            }
        }
    }
}
...and the driver now works flawlessly.
As a final note, I found a discussion indicating that frequency scaling could cause the *delay() family of functions to misbehave, but this was on an ARM mailing list - I assumed such problems would be non-existent on a Linux x86 based PC.
I don't know of any bug in that version of the kernel (but that doesn't mean that there isn't one).
udelay() isn't "uninterruptible" - it does not disable preemption, so your task can be preempted by an RT task during the delay. However, the same is true of your alternative delay implementation, so that is unlikely to be the problem.
Could your actual problem be a DMA coherency / memory ordering issue? Your alternative delay implementation accesses the bus, so it might be hiding the real problem as a side effect.
The E5700 has X86_FEATURE_CONSTANT_TSC but not X86_FEATURE_NONSTOP_TSC. The TSC is the likely clock source for udelay. Unless your task is bound to one of the cores with an affinity mask, it may have been preempted and rescheduled to another CPU during the udelay. Or the TSC might not be stable during lower-power CPU modes.
Can you try disabling interrupts or disabling preemption during the udelay? Also, try reading the TSC before and after.
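For example, a quick instrumented experiment along those lines might look like this sketch (standard kernel APIs; assumed to sit inside the driver around the problematic call):
#include <linux/delay.h>    // udelay()
#include <linux/irqflags.h> // local_irq_save()/local_irq_restore()
#include <asm/timex.h>      // get_cycles(), cycles_t

unsigned long flags;
cycles_t t0, t1;

local_irq_save(flags); // keep interrupts (and thus preemption) out of the window
t0 = get_cycles();     // read the TSC before the delay...
udelay(90);
t1 = get_cycles();     // ...and after
local_irq_restore(flags);

// On a 3 GHz E5700, 90 us should be roughly 270,000 TSC cycles
pr_info("udelay(90) took %llu TSC cycles\n", (unsigned long long)(t1 - t0));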

How can I write directly to the screen?

I'm a teenager who has become very interested in assembly language. I'm trying to write a small operating system in Intel x86 assembly, and I was wondering how to write directly to the screen, that is, without relying on the BIOS or any other operating system. I was looking through the sources of coreboot, Linux, and Kolibri, among others, in the hope of finding and understanding some piece of code that does this. I have not yet succeeded, though I believe I'll take another look at the Linux source code, it being the most understandable of the sources I've searched through.
If anybody knows this, or knows where in some piece of source code I could look, I would appreciate it if they told me.
Or better yet, if someone knows how to identify which I/O port on an Intel x86 CPU connects to which piece of hardware, that would be appreciated too. I ask because I could not find this information in the chapter on input/output in the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture, nor in the sections for the IN and OUT instructions in Volume 3, and it has been too arduous to search for the relevant instructions in the sources I have.
PART 1
For old VGA modes, there's a fixed address to write to the (legacy) display memory area. For text modes this area starts at 0x000B8000. For graphics modes it starts at 0x000A0000.
For high-resolution video modes (e.g. those set by the VESA/VBE interface) this doesn't work, because the legacy display memory area is limited to 64 KiB and most high-resolution video modes need a lot more space (e.g. 1024 * 768 at 32 bpp = 3 MiB). To get around that, there are 2 different methods supported by VBE.
The first method is called "bank switching", where only part of the video card's display memory is mapped into the legacy area at any time (and you can change which part is mapped). This can be quite messy - for example, to draw one pixel you might need to calculate which bank the pixel is in, then switch to that bank, then calculate the offset within the bank. To make this worse, for some video modes (e.g. 24-bpp modes where there are 3 bytes per pixel) the first part of a pixel's data might be in one bank and the second part of the same pixel's data in the next bank. The main benefit of this is that it works with real-mode addressing, as the legacy display memory area is below 0x00100000.
The second method is called the "Linear Framebuffer" (or just "LFB"), where the video card's entire display memory can be accessed without any messy bank switching. You have to ask the VESA/VBE interface where this area is (it's typically in the "PCI hole", somewhere between 0xC0000000 and 0xFFF00000). This means you can't access it in real mode; you need to use protected mode, long mode, or "unreal mode".
To find the address of a pixel when you're using an LFB mode, you'd do something like "pixel_address = display_memory_address + y * bytes_per_line + x * bytes_per_pixel". The "bytes_per_line" value comes from the VESA/VBE interface (and may not be the same as "horizontal_resolution * bytes_per_pixel", because there can be padding between horizontal lines).
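As a small illustration of that formula (names invented here; the base address and pitch would come from the VBE mode info block):
#include <stdint.h>

// Plot one pixel in a 32-bpp LFB mode; 'lfb' is the mapped framebuffer
// base and 'pitch' is bytes_per_line as reported by VBE
static inline void put_pixel32(uint8_t* lfb, uint32_t pitch,
                               int x, int y, uint32_t color)
{
    *(uint32_t*)(lfb + (uint32_t)y * pitch + (uint32_t)x * 4) = color;
}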
For "bank switched" VBE/VESA modes, it becomes something more like:
pixel_offset = y * bytes_per_line + x * bytes_per_pixel;
bank_number = pixel_offset / bank_size;
pixel_starting_address_within_bank = pixel_offset % bank_size;
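Putting those three lines to work - set_vbe_bank() is a hypothetical wrapper around the VBE bank-switch call, and 32 bpp plus a flat mapping of the legacy window are assumed:
void set_vbe_bank(uint32_t bank); // hypothetical: switches the window via VBE

void put_pixel_banked(int x, int y, uint32_t color,
                      uint32_t bytes_per_line, uint32_t bank_size)
{
    uint32_t pixel_offset   = (uint32_t)y * bytes_per_line + (uint32_t)x * 4;
    uint32_t bank_number    = pixel_offset / bank_size;
    uint32_t offset_in_bank = pixel_offset % bank_size;

    set_vbe_bank(bank_number);
    *(volatile uint32_t*)(0xA0000u + offset_in_bank) = color; // legacy window base
}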
For some old VGA modes (e.g. the 256-colour "mode 0x13") it's very similar to an LFB, except there is no padding between lines, so you can do "pixel_address = display_memory_address + (y * horizontal_resolution + x) * bytes_per_pixel". For text modes it's basically the same thing, except 2 bytes determine each character and its attribute - e.g. "char_address = display_memory_address + (y * horizontal_resolution + x) * 2". For the other old VGA modes (monochrome/2-colour, 4-colour and 16-colour modes) the video card's memory is arranged completely differently: it's split into "planes", where each plane contains one bit of the pixel, so (for example) to update one pixel in a 16-colour mode you need to write to 4 separate planes. For performance reasons the VGA hardware supports different write modes and different read modes, and it can get complicated (too complicated to describe adequately here).
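For a taste of the planar modes, plane selection for writes goes through the Sequencer's Map Mask register (index 2, reached via ports 0x3C4/0x3C5); a minimal sketch, with outb() assumed to be your port-output wrapper:
extern void outb(uint16_t port, uint8_t value); // assumed port-I/O wrapper

// In 16-colour planar modes, choose which of the 4 planes subsequent
// writes to display memory will modify
void vga_select_planes(uint8_t plane_mask) // bits 0-3, one per plane
{
    outb(0x3C4, 0x02);              // Sequencer index 2 = Map Mask
    outb(0x3C5, plane_mask & 0x0F); // enable the requested planes
}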
PART 2
For I/O ports (on 80x86, "PC compatibles") there are 3 general categories. The first is "de facto standard" legacy devices that use fixed I/O ports. This includes things like the PIC chips, the ISA DMA controller, the PS/2 controller, the PIT chip, serial/parallel ports, etc. Almost anything that describes how to program one of these devices will tell you which I/O ports the device uses.
The next category is legacy/ISA devices, where the I/O ports a device uses are determined by jumpers on the card itself, and there's no sane way to determine from software which ports those are. To get around this, the end user has to tell the OS which I/O ports each device uses. Thankfully this crusty stuff has all become obsolete (although that doesn't necessarily mean nobody is using it).
The third category is "plug & play", where there's some method of asking the device which I/O ports it uses (and, in most cases, of changing them). An example of this is PCI, where there's a "PCI configuration space" that tells you lots of information about each PCI device. For this category, there is no way anyone can determine which devices will be using which I/O ports without doing it at run time, and changing some BIOS settings can cause any or all of these devices to change I/O ports.
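To make that concrete, on PC compatibles the configuration space of any PCI device can be read through two fixed legacy ports ("configuration mechanism #1"); the BAR registers found there tell you which I/O or memory ranges the device was assigned. A minimal sketch, with inl()/outl() assumed to be your port-I/O wrappers:
#include <stdint.h>

extern void outl(uint16_t port, uint32_t value); // assumed port-I/O wrappers
extern uint32_t inl(uint16_t port);

// Read one 32-bit dword from PCI configuration space using
// configuration mechanism #1 (0xCF8 = address port, 0xCFC = data port)
uint32_t pci_config_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t offset)
{
    uint32_t address = (1u << 31)             // enable bit
                     | ((uint32_t)bus << 16)
                     | ((uint32_t)dev << 11)
                     | ((uint32_t)fn  << 8)
                     | (offset & 0xFC);       // dword-aligned register offset
    outl(0xCF8, address);
    return inl(0xCFC);
}
Offset 0x10 is BAR0, the first base address register of the device.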
Also note that an Intel CPU is only a CPU. Nothing prevents those CPUs from being used in something radically different from a "PC compatible" computer. Intel's CPU manuals will never tell you anything about hardware that exists outside of the CPU itself (including the chipset or devices).
PART 3
Probably the best place to go for more information (that's intended for OS developers/hobbyists) is http://osdev.org/ (their wiki and their forums).
To write directly to the screen, you should probably write to the VGA Text Mode area. This is a block of memory which is a buffer for text mode.
The text-mode screen consists of 80x25 characters, and each character cell is 16 bits wide. In the attribute byte, if the top bit is set the character blinks on-screen; the next 3 bits give the background color; the final 4 bits give the foreground (text) color. The other byte is the value of the character, usually interpreted as code page 737 or 437, but this can vary from system to system.
Here is a Wikipedia page detailing this buffer, and here is a link to codepage 437
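Put together, one cell can be composed like this minimal sketch of the layout just described:
#include <stdint.h>

// Build one text-mode cell: blink bit, background (3 bits) and foreground
// (4 bits) in the high byte, the code-page character in the low byte
static inline uint16_t vga_cell(int blink, uint8_t bg, uint8_t fg, char ch)
{
    uint8_t attr = (blink ? 0x80 : 0x00) | ((bg & 0x7) << 4) | (fg & 0xF);
    return (uint16_t)attr << 8 | (uint8_t)ch;
}

// e.g. *(volatile uint16_t*)0xB8000 = vga_cell(0, 0x0, 0x7, 'A');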
Almost all BIOSes will set the mode to text mode before your system is booted, but some laptop BIOSes will not. If you are not already in text mode, you can set it with int 10h very simply:
xor ah, ah
mov al, 0x03
int 0x10
(The above code uses BIOS interrupts, so it has to be run in Real Mode. I suggest putting this in your bootsector.)
Finally, here is a set of routines I wrote for writing strings in protected mode.
unsigned int terminalX;
unsigned int terminalY;
uint8_t terminalColor;
volatile uint16_t *terminalBuffer;

unsigned int strlen(const char* str) {
    int len = 0;  // must start at zero
    int i = 0;
    while(str[i] != '\0') {
        len++;
        i++;
    }
    return len;
}

void initTerminal() {
    terminalColor = 0x07;  // grey on black
    terminalBuffer = (volatile uint16_t *)0xB8000;
    terminalX = 0;
    terminalY = 0;
    for(int y = 0; y < 25; y++) {
        for(int x = 0; x < 80; x++) {
            terminalBuffer[y * 80 + x] = (uint16_t)terminalColor << 8 | ' ';
        }
    }
}

void setTerminalColor(uint8_t color) {
    terminalColor = color;
}

void putCharAt(int x, int y, char c) {
    unsigned int index = y * 80 + x;
    if(c == '\r') {
        terminalX = 0;
    } else if(c == '\n') {
        terminalX = 0;
        terminalY++;
    } else if(c == '\t') {
        terminalX = (terminalX + 8) & ~(7);
    } else {
        terminalBuffer[index] = (uint16_t)terminalColor << 8 | c;
        terminalX++;
        if(terminalX == 80) {
            terminalX = 0;
            terminalY++;
        }
    }
}

void writeString(const char *data) {
    for(int i = 0; data[i] != '\0'; i++) {
        putCharAt(terminalX, terminalY, data[i]);
    }
}
You can read up about this on this page.
A little beyond my scope but you might want to look into VESA.
This is not so simple. While the BIOS provides int 10h to write text to the screen, graphics differ from one adapter to the next. For example, you can find information on VGA here: http://www.wagemakers.be/english/doc/vga, and on some ancient SVGA adapters here: http://www.intel-assembler.it/portale/5/assembly-game-programming-encyclopedia/assembly-game-programming-encyclopedia.asp.
For general I/O ports, you have to go through the BIOS, which means interrupts. Many blue moons ago I used the references from Don Stoner to help write some real-mode assembly, but I burnt out on it after a few months and forgot most of what I knew.
