68000 Assembly Language - How to know whether an address is an absolute long or short operand - 68000

For example: MOVE.W $1234,$8000
Could someone tell me what the source is using (long or short) and what the destination is using (long or short)? Can you explain how to find this out?
Thanks.

It is probably whatever the assembler decides to use.
To force it, use an appropriate suffix:
move.w ($1234).w, ($8000).l
to use a short (also called "near") source address but a long (aka "far") destination address.
In my (semi-ancient) experience, you don't need to care about this very often, just let the assembler do its job.

Unless you explicitly hint the assembler (the notation may differ slightly between assemblers; $1234.w, for example, hints that short mode should be used), what happens by default depends on the assembler you're using.
A common and sensible choice is to use the shorter variant where possible; e.g. anything between -32768 and 32767 inclusive is assembled as short, anything else as long. Applying this rule, $1234 would be assembled as short, while $8000 would assemble as long (because $8000.w would yield an effective address of $FFFF8000 when evaluated by the processor; as explicitly stated in the 68k family manual, address operands less than 32 bits in size are sign extended to 32 bits before being used).
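The sign extension is easy to mimic outside the assembler; here is a quick C sketch (nothing 68k-specific about it, just an illustration) showing why $8000 cannot be reached with a short operand:
#include <stdint.h>
#include <stdio.h>

/* Mimic how the CPU turns a 16-bit (short) absolute operand into a 32-bit address. */
static uint32_t sign_extend16(uint16_t a) {
    return (a & 0x8000) ? (0xFFFF0000u | a) : a;   /* replicate bit 15 upward */
}

int main(void) {
    printf("%08X\n", (unsigned)sign_extend16(0x1234));  /* 00001234 - short mode works   */
    printf("%08X\n", (unsigned)sign_extend16(0x8000));  /* FFFF8000 - not address $8000  */
    return 0;
}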

Related

How to do modulo with less memory in ARM embedded Rust

I have an embedded project in Rust on the STM32F446 MCU. Consider the next line:
leds::set_g(self.next_update_time % 2000 == 0)
A modulo operation is used here, and from reading online it appears that the Cortex M4 doesn't have a modulo instruction. Instead, a function that does this in software gets added to the binary. Using cargo bloat (based on Google's Bloaty), it can be found:
File  .text  Size  Crate              Name
...
0.1%  6.9%   990B  compiler_builtins  __udivmoddi4
...
Much to my surprise, it takes just under a kilobyte of memory. I think that's a lot. The code behind it is quite long as well, see this link. I assume this implementation is made to be fast. Luckily I have the memory to spare.
Using opt-level = 'z' doesn't change this.
But what if I couldn't afford this, how could I let it take up less memory?
Of course resorting to a solution like this would work, but then I'd lose the ability to use the % operator.
Not sure how clever the Rust linker is, but in many embedded linker implementations you would be able to swap in your own implementation of __udivmoddi4, which uses a smaller (but slower) method, in preference to the version provided by the compiler.
In general generic division and modulo are expensive on embedded platforms, but division by a constant can often be specialized with a "fixed" implementation by a smart compiler (often with special cases for common divisors - 3, 5, 7, 10, etc).
If you can control the application then changing the code to divide or modulo by 2^N is obviously preferable (it collapses to either a "right shift" instruction for divide, or an "and" instruction for modulo). E.g. in this case 2048 might be acceptably close to 2000, and turns 1 KB of code into 4 bytes of code.
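As a concrete sketch of that rewrite (shown in C since the trick is language-agnostic; the 64-bit operand type is an assumption, based on __udivmoddi4 being the 64-bit helper that gets pulled in):
#include <stdint.h>

/* Modulo by a power of two needs no division helper at all:
   for unsigned x, x % 2048 is exactly x & 2047. */
static int is_update_due(uint64_t next_update_time) {
    return (next_update_time & 2047) == 0;   /* same as next_update_time % 2048 == 0 */
}
(For an unsigned operand and a constant power-of-two divisor the compiler will typically do this rewrite for you; the point is that 2000 cannot be reduced this way.)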
FWIW the Rust version of this does seem a little on the fat side - the GCC implementation for example is much smaller.

What does Int use three bits for? [duplicate]

Why is GHC's Int type not guaranteed to use exactly 32 bits of precision? This document claims it has at least 30 bits of signed precision. Is it somehow related to fitting Maybe Int or something similar into 32 bits?
It is to allow implementations of Haskell that use tagging. When using tagging you need a few bits as tags (at least one, two is better). I'm not sure there currently are any such implementations, but I seem to remember Yale Haskell used it.
Tagging can somewhat avoid the disadvantages of boxing, since you no longer have to box everything; instead the tag bit will tell you if it's evaluated etc.
The Haskell language definition states that the type Int covers at least the range [-2^29, 2^29 - 1].
There are other compilers/interpreters that use this property to improve the performance of the resulting program.
All internal references to (aligned) Haskell data point to memory addresses that are multiples of 4 (or 8) on 32-bit (or 64-bit) systems. So references need only 30 bits (or 61 bits) and therefore leave 2 (or 3) bits free for "pointer tagging".
In the case of data, GHC uses those tags to store information about the referenced data, i.e. whether that value is already evaluated and, if so, which constructor it has.
In the case of 30-bit Ints (so, not GHC), you could use one bit to decide whether a word is a pointer to an unevaluated Int or the Int value itself.
Pointer tagging could also be used for one-bit reference counting, which can speed up the garbage collection process. That can be useful in cases where a direct one-to-one producer-consumer relationship was created at runtime: it would result directly in memory reuse instead of feeding the garbage collector.
So, using 2 bits for pointer tagging, there could be some wild combination of intense optimisation...
In the case of Ints I could imagine these 4 tags:
a singular reference to an unevaluated Int
one of many references to the same possibly still unevaluated Int
30 bits of that Int itself
a reference (possibly one of many) to an evaluated 32-bit Int.
I think this is because of early ways to implement GC and all that stuff. If you have 32 bits available and you only need 30, you could use those two spare bits to implement interesting things, for instance using a zero in the least significant bit to denote a value and a one for a pointer.
Today the implementations don't use those bits, so an Int has at least 32 bits on GHC. (That's not entirely true; IIRC one can set some flags to get 30- or 31-bit Ints.)
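To make the tagging idea concrete, here is a minimal C sketch of the one-bit scheme described above (names and layout are made up for illustration; this is not GHC's actual representation):
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word;   /* one tagged machine word */

/* Tag an integer: shift left and set the low bit to mark "immediate value".
   The shift is what costs one bit of Int range. */
word tag_int(intptr_t n)   { return ((word)n << 1) | 1; }

/* Aligned pointers already have a zero low bit, so they are stored as-is. */
word tag_ptr(void *p)      { assert(((word)p & 1) == 0); return (word)p; }

int      is_int(word w)    { return (w & 1) != 0; }
intptr_t untag_int(word w) { return (intptr_t)w >> 1; }  /* arithmetic shift; fine on common two's-complement compilers */
void    *untag_ptr(word w) { return (void *)w; }

int main(void) {
    word w = tag_int(-21);
    return (is_int(w) && untag_int(w) == -21) ? 0 : 1;
}
Each tag bit stolen this way halves the usable Int range, which is why the report only guarantees [-2^29, 2^29 - 1]: that leaves room for two tag bits on a 32-bit machine.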

6502 and little-endian conversion

For fun I'm implementing an NES emulator. I'm currently reading through documentation for the 6502 CPU and I'm a little confused.
I've seen documentation stating that because the 6502 is little-endian, when using absolute addressing mode you need to swap the bytes. I'm writing this on an x86 machine, which is also little-endian, so I don't understand why I couldn't simply cast to a uint16_t*, dereference it, and let the compiler work out the details.
I've written some simple tests in google test and they seem to agree with me.
// implementation of READ16
#define READ16(addr) (*(uint16_t*)addr)
TEST(MemMacro, READ16) {
    uint8_t arr[] = {0xFF, 0xCC};
    uint8_t *mem = &arr[0];
    EXPECT_EQ(0xCCFF, READ16(mem));
}
This passes, so it appears my supposition is correct, but I thought I'd ask someone with more experience than I.
Is this correct for pulling out the operand in 6502 absolute addressing mode? Am I possibly missing something?
It will work for simple cases on little-endian systems, but tying your implementation to those feels unnecessary when the corresponding portable implementation is simple. Sticking to the macro, you could do this instead:
#define READ16(addr) (addr[0] + (addr[1] << 8))
(Just to be pedantic, you should also make sure that addr[1] can't be out-of-bounds, and would need to add some more parentheses if addr could be a complex expression.)
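For what it's worth, a fully parenthesized variant of the same idea (still a sketch; it assumes addr points at two readable bytes) could look like:
#define READ16(addr) ((uint16_t)((addr)[0] | ((addr)[1] << 8)))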
However, as you keep developing your emulator, you will find that it's most natural to use a pair of general-purpose read_mem() and write_mem() functions that operate on single bytes. Remember that the address space is split up into multiple regions (RAM, ROM, and memory-mapped registers from the PPU and APU), so having e.g. a single array that you index into won't work well. The fact that memory regions can be remapped by mappers also complicates things. (You won't have to worry about that for simple games though -- I recommend starting with Donkey Kong.)
What you need to do is to figure out what region or memory-mapped register the address belongs to inside your read_mem() and write_mem() functions (this is called address decoding), and do the right thing for the address.
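Here is a rough sketch of what that decoding could look like for the CPU side of the NES memory map (register dispatch and mapper handling stubbed out; the names are placeholders, not from any particular emulator):
#include <stdint.h>

uint8_t ram[0x800];        /* 2 KB internal RAM                        */
uint8_t prg_rom[0x8000];   /* assuming a simple NROM-style cartridge   */

uint8_t read_mem(uint16_t addr) {
    if (addr < 0x2000)     /* $0000-$1FFF: RAM, mirrored every 2 KB    */
        return ram[addr & 0x07FF];
    if (addr < 0x4000)     /* $2000-$3FFF: PPU registers, mirrored     */
        return 0;          /* dispatch to your PPU register reads here */
    if (addr < 0x4020)     /* $4000-$401F: APU and I/O registers       */
        return 0;          /* dispatch to your APU/controller reads    */
    if (addr >= 0x8000)    /* $8000-$FFFF: PRG ROM (mapper-dependent)  */
        return prg_rom[addr - 0x8000];
    return 0;              /* unmapped / open bus, heavily simplified  */
}

/* Reading a little-endian 16-bit operand then falls out naturally: */
uint16_t read_mem16(uint16_t addr) {
    return (uint16_t)(read_mem(addr) | (read_mem(addr + 1) << 8));
}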
Returning to the original question, the fact that you'll end up using read_mem() to read the individual bytes of the address anyway means that the uint16_t casting trickery is even less likely to be useful. This is the simplest and most robust approach w.r.t. handling corner cases, and what every emulator I've seen does in practice (Nestopia, Nintendulator, and FCEUX).
In case you've missed it, the #nesdev channel on EFNet is very active and a good resource by the way. I assume you're already familiar with the NESDev wiki. :)
I've also been working on an emulator which can be found here.

How do I take operands as registers from the byte value?

I have a fairly simple program so far to start off my emulation experience. I load in an instruction and determine how many (if any) operands there are, then I grab those operands and use them. For things like jumps and pushes it's somewhat straightforward, until I get to registers. How do I know when an operand is a register? Or how can I tell if it's the value at an address instead of just an address (i.e. when they use something like ld (hl),a)?
I'm rather new to emulation and all, but I have a decent bit of experience with assembly, even for the z80.
Question
How do I tell the difference between what is meant as a register and what is meant as an address or dereference of an address?
Because you decode the instruction. For example, ld (hl), a is 0x77, or 0b01110111; the leading 01 tells you it's an ld reg8, reg8 and that you have to decode two groups of 3 bits, each a reg8. So 110 and 111, and you look them up in the reg8 decoding table, where 110 means (hl) and 111 means a. Alternatively you could just make a Giant Switch of Death and directly decode 0x77 to ld (hl), a, but that's more of a difference in implementation than anything deep or significant.
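To make that concrete, a minimal decoding sketch in C might look like this (the table contents follow the standard z80 reg8 encoding; everything else is illustrative):
#include <stdint.h>
#include <stdio.h>

/* reg8 encoding used by the 01dddsss (ld r, r') group */
static const char *reg8[8] = { "b", "c", "d", "e", "h", "l", "(hl)", "a" };

static void decode(uint8_t opcode) {
    if ((opcode >> 6) == 0x01 && opcode != 0x76) {  /* 0x76 would be ld (hl),(hl) but is halt */
        int dst = (opcode >> 3) & 0x07;             /* bits 5..3: destination reg8 */
        int src = opcode & 0x07;                    /* bits 2..0: source reg8      */
        printf("ld %s, %s\n", reg8[dst], reg8[src]);
    } else {
        printf("not an ld r, r' opcode\n");
    }
}

int main(void) {
    decode(0x77);   /* prints: ld (hl), a */
    return 0;
}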
The instruction completely specifies what the operands are, so this "how do I tell" question strikes me as a bit silly - the answer is already staring you right in the face when you're decoding the instruction.
See also: decoding z80 opcodes

Why is bounds checking not implemented in some of the languages?

According to Wikipedia (http://en.wikipedia.org/wiki/Buffer_overflow):
Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows.
So, why is bounds checking not implemented in some languages like C and C++?
Basically, it's because it means that every time you access an array through an index, you have to do an if statement.
Let's consider a simple C for loop:
int ary[X] = {...};  // Purposefully leaving size and initializer unknown
for (int ix = 0; ix < 23; ix++) {
    printf("ary[%d]=%d\n", ix, ary[ix]);
}
If we have bounds checking, the generated code for ary[ix] has to be something like:
LOOP:
    CMP IX, 23     ; while test
    JGE END        ; exit the loop once IX reaches 23
    CMP IX, X      ; bounds check: compare IX against the array size X
    JGE ERROR      ; if IX >= X jump to ERROR
    LD R1, IX      ; put the value of IX into register 1
    LD R2, ARY+R1  ; put the array value in R2
    LA R3, Str42   ; Str42 is the format string
    JSR PRINTF     ; now we call the printf routine
    INC IX         ; add 1 to IX
    J LOOP         ; go back to the top of the loop
;;; somewhere else in the code
ERROR:
    HCF            ; halt and catch fire
If we don't have that bounds check, then we can write instead:
LOOP:
    CMP IX, 23     ; while test
    JGE END        ; exit the loop once IX reaches 23
    LD R1, IX      ; put the value of IX into register 1
    LD R2, ARY+R1  ; put the array value in R2
    LA R3, Str42   ; Str42 is the format string
    JSR PRINTF     ; call the printf routine
    INC IX         ; add 1 to IX
    J LOOP         ; go back to the top of the loop
This saves a couple of instructions on every array access inside the loop, which (especially in the old days) meant a lot.
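In C terms, the bounds-checked machine code above behaves as if the compiler had silently rewritten the loop like this (a sketch, not what any real C compiler actually emits):
for (int ix = 0; ix < 23; ix++) {
    if (ix >= X)                          /* the implicit bounds check         */
        abort();                          /* or whatever the language mandates */
    printf("ary[%d]=%d\n", ix, ary[ix]);
}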
In fact, on the PDP-11 machines it was even better, because they had auto-increment and auto-decrement addressing modes. On a PDP, all of the register bookkeeping turned into something like
CZ -(IX), END ; decrement IX, compare it to zero, and jump to END if zero
(And anyone who happens to remember the PDP better than I do, don't give me trouble about the precise syntax etc; you're an old fart like me, you know how these things slip away.)
It's all about the performance. However, the assertion that C and C++ have no bounds checking is not entirely correct. It is quite common to have "debug" and "optimized" versions of each library, and it is not uncommon to find bounds-checking enabled in the debugging versions of various libraries.
This has the advantage of quickly and painlessly finding out-of-bounds errors when developing the application, while at the same time eliminating the performance hit when running the program for realz.
I should also add that the performance hit is non-negligible, and many languages other than C++ will provide various high-level functions operating on buffers that are implemented directly in C and C++ specifically to avoid the bounds checking. For example, in Java, if you compare the speed of copying one array into another using pure Java vs. using System.arraycopy (which does bounds checking once, but then straight-up copies the array without bounds-checking each individual element), you will see a decently large difference in the performance of those two operations.
Leaving bounds checking out makes the language easier to implement and faster, both to compile and at run time. It also simplifies the language definition, as quite a few things can be left out when bounds checking is skipped.
Currently, when you do:
int *p = (int*)malloc(sizeof(int));
*p = 50;
C (and C++) just says, "Okey dokey! I'll put something in that spot in memory".
If bounds checking were required, C would have to say, "Ok, first let's see if I can put something there? Has it been allocated? Yes? Good. I'll insert now." By skipping the test to see whether there is something which can be written there, you are saving a very costly step. On the other hand, (she wore a glove), we now live in an era where "optimization is for those who cannot afford RAM," so the arguments about the speed are getting much weaker.
The primary reason is the performance overhead of adding bounds checking to C or C++. While this overhead can be reduced substantially with state-of-the-art techniques (to 20-100% overhead, depending upon the application), it is still large enough to make many folks hesitate. I'm not sure whether that reaction is rational -- I sometimes suspect that people focus too much on performance, simply because performance is quantifiable and measurable -- but regardless, it is a fact of life. This fact reduces the incentive for major compilers to put effort into integrating the latest work on bounds checking into their compilers.
A secondary reason involves concerns that bounds checking might break your app. Particularly if you do funky stuff with pointer arithmetic and casting that violate the standard, bounds checking might block something your application is currently doing. Large applications sometimes do amazingly crufty and ugly things. If the compiler breaks the application, then there's no point in blaming the crufty code for the problem; people aren't going to keep using a compiler that breaks their application.
Another major reason is that bounds checking competes with ASLR + DEP. ASLR + DEP are perceived as solving, oh, 80% of the problem or so. That reduces the perceived need for full-fledged bounds checking.
Because it would cripple those general purpose languages for HPC requirements. There are plenty of applications where buffer overflows really do not matter one iota, simply because they do not happen. Such features are much better off in a library (where in fact you can already find examples for C/C++).
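To illustrate the "put it in a library" option, here is a minimal sketch of a bounds-checked array type in C (the names are made up; assert is used so that the checks vanish when compiled with NDEBUG defined):
#include <assert.h>
#include <stddef.h>

typedef struct {
    int   *data;
    size_t len;
} checked_array;

int checked_get(checked_array a, size_t i) {
    assert(i < a.len);        /* the bounds check lives in the library */
    return a.data[i];
}

void checked_set(checked_array a, size_t i, int v) {
    assert(i < a.len);
    a.data[i] = v;
}
Compiling with NDEBUG removes the checks again, which mirrors the debug/optimized library split mentioned earlier.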
For domain specific languages it may make sense to bake such features into the language definition and trade the resulting performance hit for increased security.
