Can you add two address registers in 68000 using ADDA - 68000

For example if I write "ADDA A1, A5" would this be valid?

Sure you can. That's what the instruction is good for.
Note you must put the .l postfix after the instruction if you want a full 32-bit addition. Without the postfix, the assembler will assume word (16-bit) size and add the sign-extended word source operand to the destination.

Related

RISC-V: S-format instructions table

I have this table of S-format instructions. Can you explain to me what imm[11:5] and funct3 are? I know funct indicates its size in bits and sometimes it is 000 or 010. I don't know exactly why it's there. Also, imm[11:5] is also 7-bits of all 0s.
Please help!
imm[4:0] and imm[11:5] denote closed-intervals into the bit-representation of the immediate operand.
The S-format is used to encode store instructions, i.e.:
sX rs2, offset(r1)
There are different types of store instructions, e.g. store-byte (sb), store-half-word (sh), store-word (sw), etc. The funct3 part is used to encode the type (i.e. 0b00 -> sb, 0b010 -> sw, 0b011 -> sd, etc.). This allows to just use one (major) opcode while still having multiple types of store instructions - instead of having to waste several (major) opcodes. IOW, funct3 encodes the minor opcode of the instruction.
The immediate operand encodes the offset. If you ask yourself why it's split like this - it allows to increase similarity of the remaining parts in the encoding with other instruction formats. For example, the opcode, rs1, and funct3 parts are located at the exact same place in the R-type, I-type and B-type instruction formats. The rs2 part placement is shared with the R-type and B-type instruction formats. Those similarities help to simplify the instruction decoder.
That means the offset is 12 bit wide and in pseudo-code:
offset = sign_ext(imm[11:5] << 5 | imm[4:0])
See also the first figure in Section 2.6 (Load and Store Instruction) of the RISC-V Base specification (2019-06-08 ratified):

Obfuscation of checksum guards

As part of my project, I have to insert small codes in a C program called checksum guards. What these guards do is they calculate the checksum value of a portion of code using a function(add, xor, etc.) which operates on the instruction opcodes. So, if somebody has tampered with the instructions(add, modify, delete) in that region of code, the checksum value will change and intrusion will be detected.
Here is the research paper which talks about this technique:
https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2001-49.pdf
Here is the guard template:
guard:
add ebp, -checksum
mov eax, client_addr
for:
cmp eax, client_end
jg end
mov ebx, dword[eax]
add ebp, ebx
add eax, 4
jmp for
end:
Now, I have two questions.
Would putting the guards in the assembly better than putting it in the source program?
Assuming I put them in the assembly(at an appropriate place) what kind of obfuscation should I use to prevent the guard template to be easily visible? (Since when I have more than 1 guard, the attacker should not easily find out all the guard templates and disable all the guards together as that would leave the code with no security)
Thank you in advance.
From attacker's (without sources) point of view the first question doesn't matter; he's tampering with the final binary machine code, whether it was produced from .c or .s will make zero difference. So I would worry mainly how to generate the correct binary with appropriate checksums. I'm not aware of any way how to get proper checksum inside the C source. But I can imagine to have some external tool running over assembler files created by C compiler, in some post-process way - before compiling the .s files into .o. But... Keep in mind, that some calls and addresses are just relative offsets, and the binary loaded into memory is patched by the OS loader according to linker's table, to make those point to the real memory addresses. Thus the data bytes will change (opcodes will stay fixed).
Yours guard template doesn't take that into account, and does checksum whole opcodes with data bytes as well (Some advanced guards have opcodes definitions, and checksum/encrypt/decipher only the opcodes themselves without operand bytes).
Otherwise it's neat, that the result is damaged ebp value, ruining any C code around (*) working with stack variables. But it's still artificial test, you can simply comment out both add ebp,-checksum and add ebp,ebx making the guard harmless.
(*) notice you have to put the guard in between some classic C code to get some real runtime problems from invalid ebp value. If you would put it at the end of subroutine, which ends with pop ebp, everything would work well.
So to the second question:
You definitely want more malicious ways to guard correct value, than only ebp damage. Usually the hardest (to remove) way is to make checksum value part of some calculation, eventually skewing results just slightly, so serious usage of the SW will be impossible, but it will take time to notice by the user.
You can also use some genuine code loop to add your checksumming to it, so simply skipping whole loop will skip also valid code (but I can imagine this one only added by hand into generated assembly from C, so you will have to redo it after every new compilation of particular C source).
Then the particular guard template can be obfuscated by any imaginable mutation (different registers used, modified order of instructions, instruction variants), try to search about viruses with mutation encoding to get some ideas.
And I didn't read that paper, but from the Figures I would say the main point is to make those guarding areas to overlap, so patching off one of them will affect another one, which sounds to me like that extra sugar to make it somewhat functional (although this still looks like normal challenge to 8bit game crackers ;), not even "hard" level). But that also means you would need either very cunning external tool to calculate that cyclic tree of dependencies, and insert the guard templates in correct order, or do it again manually completely.
Of course when doing manually, you have to do it after each new C compilation, so it's worth of the effort only on something very precious and expensive, or rock solid stable, where you will not produce another revision for next 10y or so... :D

68000 Assembly Language - How to know wether an address is an absoulute long or short operand

For example: MOVE.W $1234,$8000
Could someone tell me what the source is using ( Long or short ) and what the destination is using (Long or short). Can you explain how to found this out.
Thanks.
It is probably whatever the assembler decides to use.
To force it, use an appropriate suffix:
move.w ($1234).w, ($8000).l
to use short (also called "near") source but long (aka "far") destination address.
In my (semi-ancient) experience, you don't need to care about this very often, just let the assembler do its job.
Unless explicitly specified by hinting the assembler (the notation may slightly differ depending on the assembler used, $1234.w would hint the assembler to use short mode), it depends on the assembler you're using what is done by default.
A common and sensible choice is to use the shorter variant where possible; e.g. anyhing between -32768 to 32767 inclusive is assembled as short, anything else as long. Applying this rule, $1234 would be assembled as short, while $8000 would assemble as long (because $8000.w would yield an effective address of $FFFF8000 when evaluated by the processor; as explicitly stated in the 68k family manual, address operands less than 32 bits in size, are sign extended to 32 bits before being used).

How do I take operands as registers from the byte value?

I have a fairly simple program so far to start off my emulation experience. I load in an instruction and determine how many (if any) operands there are, then I grab those operands and use them. For things like jumps and pushes it's somewhat straightforward until I get to registers.. How do I know when an operand is a register? Or how can I tell if it's the value at an address instead of just an address (i.e when they use something like ld (hl),a)
I'm rather new to emulation and all, but I have a decent bit of experience with assembly, even for the z80.
Question
How do I tell the difference between what is meant as a register and what is meant as an address or dereference of an address?
Because you decode the instruction. For example in ld (hl), a, which is 0x77, or 0b01110111, the first 01 tell you it's an ld reg8, reg8 and that you have to decode two groups of 3 bits, each a reg8. So 110 and 111, and you look them up in the reg8 decoding table, where 110 means (hl) and 111 means a. Alternatively you could just make a Giant Switch of Death and directly decode 0x77 to ld (hl), a, but that's more of a difference in implementation than anything deep or significant.
The instruction completely specifies what the operands are, so this "how do I tell" question strikes me as a bit silly - the answer is already staring you right in the face when you're decoding the instruction.
See also: decoding z80 opcodes

Decoding 68k instructions

I'm writing an interpreted 68k emulator as a personal/educational project. Right now I'm trying to develop a simple, general decoding mechanism.
As I understand it, the first two bytes of each instruction are enough to uniquely identify the operation (with two rare exceptions) and the number of words left to be read, if any.
Here is what I would like to accomplish in my decoding phase:
1. read two bytes
2. determine which instruction it is
3. extract the operands
4. pass the opcode and the operands on to the execute phase
I can't just pass the first two bytes into a lookup table like I could with the first few bits in a RISC arch, because operands are "in the way". How can I accomplish part 2 in a general way?
Broadly, my question is: How do I remove the variability of operands from the decoding process?
More background:
Here is a partial table from section 8.2 of the Programmer's Reference Manual:
Table 8.2. Operation Code Map
Bits 15-12 Operation
0000 Bit Manipulation/MOVEP/Immediate
0001 Move Byte
...
1110 Shift/Rotate/Bit Field
1111 Coprocessor Interface...
This made great sense to me, but then I look at the bit patterns for each instruction and notice that there isn't a single instruction where bits 15-12 are 0001, 0010, or 0011. There must be some big piece of the picture that I'm missing.
This Decoding Z80 Opcodes site explains decoding explicitly, which is something I haven't found in the 68k programmer's reference manual or by googling.
I've decided to simply create a look-up table with every possible pattern for each instruction. It was my first idea, but I discarded it as "wasteful, inelegant". Now, I'm accepting it as "really fast".

Resources