Decoding 68k instructions - emulation

I'm writing an interpreted 68k emulator as a personal/educational project. Right now I'm trying to develop a simple, general decoding mechanism.
As I understand it, the first two bytes of each instruction are enough to uniquely identify the operation (with two rare exceptions) and the number of words left to be read, if any.
Here is what I would like to accomplish in my decoding phase:
1. read two bytes
2. determine which instruction it is
3. extract the operands
4. pass the opcode and the operands on to the execute phase
I can't just pass the first two bytes into a lookup table like I could with the first few bits in a RISC arch, because operands are "in the way". How can I accomplish part 2 in a general way?
Broadly, my question is: How do I remove the variability of operands from the decoding process?
More background:
Here is a partial table from section 8.2 of the Programmer's Reference Manual:
Table 8.2. Operation Code Map
Bits 15-12 Operation
0000 Bit Manipulation/MOVEP/Immediate
0001 Move Byte
...
1110 Shift/Rotate/Bit Field
1111 Coprocessor Interface...
This made great sense to me, but then I looked at the bit patterns for each instruction and noticed that there isn't a single instruction where bits 15-12 are 0001, 0010, or 0011. There must be some big piece of the picture that I'm missing.
The "Decoding Z80 Opcodes" site explains decoding explicitly, which is something I haven't found in the 68k Programmer's Reference Manual or by googling.

I've decided to simply create a look-up table with every possible pattern for each instruction. It was my first idea, but I discarded it as "wasteful, inelegant". Now, I'm accepting it as "really fast".
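A minimal sketch of that look-up table in Python, assuming each instruction is described by a (mask, match) pair over the first 16-bit word. The four patterns listed are only illustrative, the names PATTERNS, build_table and TABLE are made up for this example, and invalid addressing-mode combinations are not filtered out:

```python
# Each entry is (mask, match, name): an opcode word w is this instruction
# when (w & mask) == match. Only a handful of patterns are listed here.
PATTERNS = [
    (0xF000, 0x1000, "MOVE.B"),  # 0001 ....: the "Move Byte" row of the opcode map
    (0xF100, 0x7000, "MOVEQ"),   # 0111 rrr0 dddddddd
    (0xFFFF, 0x4E71, "NOP"),
    (0xFFFF, 0x4E75, "RTS"),
]

def build_table(patterns):
    """Expand the patterns into a 65536-entry table indexed by the opcode word."""
    table = [None] * 0x10000
    # Apply the least specific masks first so that more specific patterns
    # (more fixed bits) overwrite them where encodings overlap.
    for mask, match, name in sorted(patterns, key=lambda p: bin(p[0]).count("1")):
        for word in range(0x10000):
            if (word & mask) == match:
                table[word] = name
    return table

TABLE = build_table(PATTERNS)

def decode(word):
    """Return the instruction name for an opcode word, or None if unknown."""
    return TABLE[word]
```

The table is built once at start-up, so the decode step itself is a single index operation. The operand fields are then pulled out of the same word with shifts and masks that depend on the instruction found.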

Related

RISC-V: S-format instructions table

I have this table of S-format instructions. Can you explain to me what imm[11:5] and funct3 are? I know funct3 is 3 bits wide and is sometimes 000 or 010, but I don't know exactly why it's there. Also, imm[11:5] is 7 bits of all 0s.
Please help!
imm[4:0] and imm[11:5] denote closed intervals of bit positions in the binary representation of the immediate operand.
The S-format is used to encode store instructions, i.e.:
sX rs2, offset(rs1)
There are different types of store instructions, e.g. store-byte (sb), store-half-word (sh), store-word (sw), etc. The funct3 part is used to encode the type (i.e. 0b000 -> sb, 0b010 -> sw, 0b011 -> sd, etc.). This makes it possible to use just one (major) opcode while still having multiple types of store instructions, instead of having to waste several (major) opcodes. IOW, funct3 encodes the minor opcode of the instruction.
The immediate operand encodes the offset. If you ask yourself why it's split like this: the split increases the similarity of the remaining parts of the encoding with other instruction formats. For example, the opcode, rs1, and funct3 parts are located at the exact same place in the R-type, I-type and B-type instruction formats. The rs2 part placement is shared with the R-type and B-type instruction formats. Those similarities help to simplify the instruction decoder.
That means the offset is 12 bit wide and in pseudo-code:
offset = sign_ext(imm[11:5] << 5 | imm[4:0])
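As a rough illustration of that pseudo-code, here is a small Python sketch that packs and unpacks the S-format fields. The function names encode_s, decode_s and sign_extend are made up for this example. The field positions follow the layout described above, and 0b0100011 is the major opcode for stores:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def encode_s(funct3, rs1, rs2, offset):
    """Pack an S-format store instruction into a 32-bit word."""
    imm = offset & 0xFFF                  # 12-bit two's complement offset
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) \
        | (funct3 << 12) | ((imm & 0x1F) << 7) | 0b0100011

def decode_s(word):
    """Split an S-format word back into its fields."""
    imm_11_5 = (word >> 25) & 0x7F        # upper 7 bits of the immediate
    imm_4_0 = (word >> 7) & 0x1F          # lower 5 bits of the immediate
    return {
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "offset": sign_extend((imm_11_5 << 5) | imm_4_0, 12),
    }

# sw x5, 8(x2): funct3 = 0b010, rs1 = x2, rs2 = x5, offset = 8
word = encode_s(0b010, 2, 5, 8)
fields = decode_s(word)
```

For small non-negative offsets such as 8, imm[11:5] is all zeros, which is why that field shows up as 0000000 in many example encodings.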
See also the first figure in Section 2.6 (Load and Store Instruction) of the RISC-V Base specification (2019-06-08 ratified).

DEFLATE (RFC1951) dynamic huffman "incomplete length"

I've been studying RFC1951 and 'puff.c', and have a question about the issue of "incomplete length".
As near as I can tell, defining a "dynamic" Huffman code table that allows for more codes than specified by HLIT+257 will produce an error, at least in puff.c. For example, 'puff.c' produces an error if, as a simple debugging test, I use a Huffman table of all 9-bit codes to define only 257 lit/lens. Is this outcome purposeful or a bug? And can I assume that any "inflator" based on the 'zlib' library will produce the same error?
I can't find any specification in RFC 1951 that should REQUIRE the use of a sufficiently tight Huffman code. Certainly, I can see that using an "under-subscribed" Huffman table might be inefficient, in terms of compression, but I'm not sure why such a table should be prohibited from use.
My interest isn't simply hypothetical. I really want to use an under-subscribed, literal-only, Huffman code (but NOT the example cited above) to compress some application specific images into PNG files. But I want to make sure it will work with any PNG image viewer.
The RFC specifies that the codes are Huffman codes, which by definition are complete codes. (Complete means that all bit patterns are used.)
zlib will reject incomplete or oversubscribed codes, except in the special case noted in the RFC:
If only one distance code is used, it is encoded using one bit, not
zero bits; in this case there is a single code length of one, with one
unused code.
There the incomplete code 0 for the single symbol, with code 1 unused, is permitted.
(That, by the way, is unnecessary. If there is only one distance symbol, then you don't need any bits to specify it. You know that that distance symbol must be used with any length. If that symbol needs extra bits, then those extra bits immediately follow the length. But, oh well -- for that case Phil Katz put an extraneous zero bit in every match, and now we're stuck with it.)
The fact that the RFC even had to note this special case is another clue that incomplete codes are not accepted otherwise.
There is sort of another exception in deflate, in that the fixed literal/length code is incomplete, with two unused codes at the end.
The bottom line is, no, you will not be able to use an incomplete code in a dynamic header (except the special case) and expect zlib or any compliant deflate decoder to be able to decode it.
As for why this strictness is useful, constraints on dynamic headers permit rapid detection of non-deflate streams or corrupted deflate streams. Similarly, a dynamic header with no end code is not permitted by zlib, so as to avoid the case of a bogus dynamic header permitting any following random bits to be decodable forever, never detecting an error. The unused fixed codes also help in this regard, since eventually they trigger an error in random input.
By the way, if you want to define a fixed, complete Huffman code for your case, it's super simple, and would reduce the size of almost all of your codes by one bit. Just encode eight bits for the symbols 0..253, using that symbol number directly as the code (reversing the bits of course), and nine bits for symbols 254..257, using the codes 508..511 (bits reversed).
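To make the arithmetic concrete, here is a small Python sketch that checks completeness with the Kraft sum (a code is complete exactly when the sum of 2^-length over all symbols equals 1) and assigns canonical codes with the procedure described in RFC 1951 section 3.2.2. The function names kraft_sum and canonical_codes are made up for this example, and the bit reversal needed when writing the codes to the stream is left out:

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2**-length over all used symbols; exactly 1 for a complete code."""
    return sum(Fraction(1, 2 ** n) for n in lengths if n > 0)

def canonical_codes(lengths):
    """Assign canonical Huffman codes from code lengths (RFC 1951, section 3.2.2)."""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    code = 0
    next_code = [0] * (max_len + 1)
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = []
    for n in lengths:
        if n:
            codes.append(next_code[n])
            next_code[n] += 1
        else:
            codes.append(None)            # unused symbol, no code assigned
    return codes

# The table from the question: 257 symbols, all with 9-bit codes.
undersubscribed = kraft_sum([9] * 257)

# The suggested table: 8 bits for symbols 0..253, 9 bits for symbols 254..257.
lengths = [8] * 254 + [9] * 4
complete = kraft_sum(lengths)
codes = canonical_codes(lengths)
```

Here undersubscribed comes out as 257/512, so almost half of the code space is unused, while complete is exactly 1. The canonical assignment gives symbols 0..253 the codes 0..253 and symbols 254..257 the codes 508..511, matching the description above.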

How do I take operands as registers from the byte value?

I have a fairly simple program so far to start off my emulation experience. I load in an instruction and determine how many (if any) operands there are, then I grab those operands and use them. For things like jumps and pushes it's somewhat straightforward, until I get to registers. How do I know when an operand is a register? Or how can I tell if it's the value at an address instead of just an address (i.e. when they use something like ld (hl),a)?
I'm rather new to emulation and all, but I have a decent bit of experience with assembly, even for the z80.
Question
How do I tell the difference between what is meant as a register and what is meant as an address or dereference of an address?
You can tell because you decode the instruction. For example, in ld (hl), a, which is 0x77, or 0b01110111, the leading 01 tells you it's an ld reg8, reg8 and that you have to decode two groups of 3 bits, each a reg8. So you take 110 and 111 and look them up in the reg8 decoding table, where 110 means (hl) and 111 means a. Alternatively you could just make a Giant Switch of Death and directly decode 0x77 to ld (hl), a, but that's more of a difference in implementation than anything deep or significant.
The instruction completely specifies what the operands are, so this "how do I tell" question strikes me as a bit silly - the answer is already staring you right in the face when you're decoding the instruction.
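As a rough sketch of the field-based approach in Python (the names REG8 and decode_ld_block are made up for this example, and only the 01 block of the unprefixed opcodes is handled):

```python
# reg8 decoding table: index 6 is the memory operand addressed by HL.
REG8 = ["b", "c", "d", "e", "h", "l", "(hl)", "a"]

def decode_ld_block(opcode):
    """Decode an opcode of the form 01 ddd sss as ld dst, src."""
    if opcode >> 6 != 0b01:
        return None                  # not in the ld reg8, reg8 block
    if opcode == 0x76:
        return "halt"                # the ld (hl), (hl) slot is used for halt
    dst = REG8[(opcode >> 3) & 7]    # bits 5-3 select the destination
    src = REG8[opcode & 7]           # bits 2-0 select the source
    return f"ld {dst}, {src}"

text = decode_ld_block(0x77)
```

With 0x77 the two 3-bit fields are 110 and 111, so text is "ld (hl), a". Whether an operand is a register or a dereference falls straight out of the table entry.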
See also: decoding z80 opcodes

Compressed trie implementation?

I am going through a Udacity course and in one of the lectures (https://www.youtube.com/watch?v=gPQ-g8xkIAQ&feature=player_embedded), the professor gives the function high_common_bits which (taken verbatim from the lecture) looks like this in pseudocode:
function high_common_bits(a,b):
return:
- high order bits that a+b in common
- highest differing bit set
- all remaining bits clear
As an example:
a = 10101
b = 10011
high_common_bits(a,b) => 10100
He then says that this function is used in highly-optimized implementations of tries. Does anyone happen to know which exact implementation he's referring to?
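For reference, the pseudocode can be written out in Python roughly as follows. This is only a sketch of the function as described in the lecture, not taken from any particular trie implementation. What to return when a and b are equal is not specified there, so returning a unchanged is an assumption:

```python
def high_common_bits(a, b):
    """Keep the high bits a and b share, set the highest differing bit, clear the rest."""
    diff = a ^ b
    if diff == 0:
        return a                           # assumption: no differing bit, return a unchanged
    top = 1 << (diff.bit_length() - 1)     # highest bit where a and b differ
    low_mask = (top << 1) - 1              # that bit and everything below it
    return (a & ~low_mask) | top

result = high_common_bits(0b10101, 0b10011)
```

For the example values, a XOR b is 00110, the highest differing bit is 00100, and result is 0b10100.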
If you are looking for a highly optimized bitwise compressed trie (aka Radix Tree), the BSD routing table uses one in its implementation. The code is not easy to read though.
He was talking about Succinct Tries, tries in which each node requires only two bits to store (the theoretical minimum).
Steve Hanov wrote a very approachable blog post on Succinct Tries. You can also read the original paper by Guy Jacobson (written as recently as 1989), which introduced them.
A compressed trie stores a prefix in one node, then branches from that node to each possible item that's been seen that starts with that prefix.
In this case he's apparently doing a bit-wise trie, so it's storing a prefix of bits -- i.e., the bits at the beginning that the items have in common go in one node, then there are two branches from that node, one to a node for the next bit being a 0, and the other for the next bit being a 1. Presumably those nodes will be compressed as well, so they won't just store the next single bit, but instead store a number of bits that all matched in the items inserted into the trie so far.
In fact, the next bit following a given node may not be stored in the following nodes at all. That bit can be implicit in the link that's followed, so the next nodes store only bits after that.
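A minimal sketch of such a bit-wise compressed trie in Python, assuming keys are given as strings of '0' and '1' characters. The names Node, insert and contains are made up for this example, and this is not how any particular production implementation does it. Each node stores the run of bits shared by everything below it, and the branching bit is implicit in the child link:

```python
class Node:
    def __init__(self, prefix=""):
        self.prefix = prefix     # bits shared by every key stored below this node
        self.children = {}       # '0' or '1' -> Node; the branching bit lives in the key
        self.terminal = False    # True if a key ends exactly at this node

def insert(node, bits):
    # Length of the common prefix of node.prefix and bits.
    n = 0
    while n < len(node.prefix) and n < len(bits) and node.prefix[n] == bits[n]:
        n += 1
    if n < len(node.prefix):
        # Split: move the unmatched tail of this node into a new child.
        tail = node.prefix[n:]
        child = Node(tail[1:])
        child.children, child.terminal = node.children, node.terminal
        node.prefix, node.children, node.terminal = node.prefix[:n], {tail[0]: child}, False
    rest = bits[n:]
    if not rest:
        node.terminal = True
    elif rest[0] in node.children:
        insert(node.children[rest[0]], rest[1:])
    else:
        leaf = Node(rest[1:])
        leaf.terminal = True
        node.children[rest[0]] = leaf

def contains(node, bits):
    if not bits.startswith(node.prefix):
        return False
    rest = bits[len(node.prefix):]
    if not rest:
        return node.terminal
    child = node.children.get(rest[0])
    return child is not None and contains(child, rest[1:])

root = Node()
insert(root, "10101")
insert(root, "10011")
```

After the two inserts, the shared leading bits 10 sit on the path above a single branching node, and the two keys only diverge below it.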

Word, Doubleword, Quadword

This is my second question in a row, and it is also about assembly (x86, 32-bit).
"Programming from the Ground Up" says that 4 bytes are 32 bits, and that this is a word.
But Intel's "Basic Architecture" guide says that a word is 16 bits (2 bytes) and that 4 bytes is a doubleword.
Memory uses 4-byte words: to get to the next word I have to skip 4 bytes, and within each word I can use 4 offsets (0-3) to read a byte. That seems to contradict Intel's naming, yet this memory layout also comes from Intel, so what is wrong here?
And how do I operate on words, doublewords, and quadwords in assembly? How do I define a number as a quadword?
To answer your first question, the processor word size is a function of the architecture. Thus, a 32-bit processor has a 32-bit word. In software types, including assembly, there is usually a need to identify the size unambiguously, so for historical reasons the word type is 16 bits. So both sources are probably correct if you read them in context: the first one is referring to the processor word, while the Intel guide is referring to the word type.
We've got different "word"s: program words, memory words, OS-specific words, architecture-specific words (program space word, flash word, eeprom word), even address words.
It's just a matter of convention what size the word "word" refers to.
I usually find the size of the word by looking at the number of hex digits the context is using to show them. Intel's most common type, 4 digits (0x0000), is two bytes.
And for further information, even byte is a convention. In many systems in the past bytes have been 7 or 9 bits. Most architectures nowadays have 8-bit bytes. The correct name for an always-8-bit structure is an octet.
