How to disassemble these instructions - riscv

I am writing a little disassembler using riscv-spec-v2.0 and have some questions about the following instructions and how to correctly disassemble them:
1.
FENCE instruction has "pred" and "succ" bit fields in imm
2.
AMO instructions have "aq" and "rl" bits in in funct7
3.
Float instructions have a "rm" bit field in funct3
All of these bit fields seem to lack mappings in the assembler.
E.g. page 50 just says "FENCE" but not what to do with the intermediate.
Or page 33 has an example of putting .aq or .rl at the end but not what to do if both are present.
4.
SCALL, SBREAK are the same as ECALL, EBREAK
but there is also ERET: so why not drop SCALL and SBREAK
and just use ECALL, EBREAK and ERET because other wise it
is hard to disassemble these opcodes.

The current RISC-V assembler is terse for common defaults:
"FENCE" with no arguments is treated as a full fence (all bits set)
OK to have both on same instruction
Rounding mode not shown if not specified
ECALL and EBREAK will be the new standard names (will be clarified in the revised user ISA manual)

Related

RISC-V user level reference or reference implementation

Summary: What is the definitive reference or reference implementation for the RISC-V user-level ISA?
Context: The RISC-V website has "The RISC-V Instruction Set Manual" which explains the user-level instructions very well, but does not give an exact specification for them. I am trying to build a user-level ISA simulator now and intend to write an FPGA implementation later, so the exact behavior is important to me.
A reference implementation would be sufficient, but should preferably be as simple as possible -- i.e. I would try to understand a pipelined implementation only as a last resort. What is important is to have an understanding of the specified ISA and not of a single CPU implementation or compiler implementation.
One example to show my problem is the AUIPC instruction: The prose explanation says that "AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd." I wanted to know whether this refers to the old or new PC, i.e. the position of the AUIPC instruction or the next instruction. I looked at the "RISCV Angel" implementation, but that seems to mask out the lower bits of the (old) PC -- not just of the immediate -- which I could not find any reason for in the spec, not even in the change history of the spec (since Angel is a bit older). Instead of an answer, I now have two questions about AUIPC. Many other instructions pose similar problems to me.
AFAICT the RISC-V Instruction Set Manual you cite is the closest thing there is to a definitive reference. If there are things that are unclear or incorrect in there then you could open issues at the Github site where that document is maintained: https://github.com/riscv/riscv-isa-manual
As far as AIUPC is concerned, the answer is implied, but not stated explicitly, by this sentence at the bottom of page 9 in the current manual:
There is one additional user-visible register: the program counter pc holds the address of the current instruction.
Based on that statement I would expect that the pc value that is seen and manipulated by the AIUPC instruction is the address of the AIUPC instruction itself.
This interpretation is supported by the discussion of the JALR instruction:
The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the least-signicant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.
Given that the address of the following instruction is expressed as pc+4, it seems clear that the pc value visible during the execution of JALR is the address of the JALR instruction itself.
The latest draft of the manual (at https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190321-ba17106/riscv-spec.pdf) makes the situation slightly clearer. In place of this in the current manual:
AUIPC appends 12 low-order zero bits to the 20-bit U-immediate, sign-extends the result to 64 bits, then adds it to the pc and places the result in register rd.
the latest draft says:
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc of the AUIPC instruction, then places the result in register rd.

How to tell compiler to pad a specific amount of bytes on every C function?

I'm trying to practice some live instrumentation and I saw there was a linker option -call-nop=prefix-nop, but it has some restriction as it only works with GOT function (I don't know how to force compiler to generate GOT function, and not sure if it's good idea for performance reason.) Also, -call-nop=* cannot pad more than 1 byte.
Ideally, I'd like to see a compiler option to pad any specific amount of bytes, and compiler will still perform all the normal function alignment.
Once I have this pad area, I can at run time to reuse these padding area to store some values or redirect the control flow.
P.S. I believe Linux kernel use similar trick to dynamically enable some software tracepoint.
-pg is intended for profile-guided optimization. The correct option for this is -fpatchable-function-entry
-fpatchable-function-entry=N[,M]
Generate N NOPs right at the beginning of each function, with the function entry point before the Mth NOP. If M is omitted, it defaults to 0 so the function entry points to the address just at the first NOP. The NOP instructions reserve extra space which can be used to patch in any desired instrumentation at run time, provided that the code segment is writable. The amount of space is controllable indirectly via the number of NOPs; the NOP instruction used corresponds to the instruction emitted by the internal GCC back-end interface gen_nop. This behavior is target-specific and may also depend on the architecture variant and/or other compilation options.
It'll insert N single-byte 0x90 NOPs and doesn't make use of multi-byte NOPs thus performance isn't as good as it should, but you probably don't care about that in this case so the option should work fine
I achieved this goal by implement my own mcount function in an assembly file and compile the code with -pg.

Linux x86 CPU Instruction Layout Confusion

In x86, I understand multi-byte objects are stored in memory little endian style.
Now generally speaking, when it comes to CPU instructions, the OPCODE determines the purpose of the instruction and data/memory addresses may follow the opcode in it's encoded format. My understanding is the Opcode portion of the instruction should be the most significant byte and thus appear at the highest address of any given instruction encoding representation.
Can someone explain the memory layout on this x86 linux gdb example? I would imagine that the opcode 0xb8 would appear at a higher address due to it being the most significant byte.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x08048080 <+0>: mov eax,0x11223344
(gdb) x/1xb _start+0
0x8048080 <_start>: 0xb8
(gdb) x/1xb _start+1
0x8048081 <_start+1>: 0x44
(gdb) x/1xb _start+2
0x8048082 <_start+2>: 0x33
(gdb) x/1xb _start+3
0x8048083 <_start+3>: 0x22
(gdb) x/1xb _start+4
0x8048084 <_start+4>: 0x11
It appears the instruction mov eax, 0x11223344 is encoding as 0x11 0x22 0x33 0x44 0xb8.
Questions.
1.) How does the CPU know how many bytes the instruction will take up if the first byte it see's is not an opcode?
2.) I'm wondering if perhaps x86 cpu instructions do not even have endian-ness and are considering some type of string? (probably way off here)
x86 is a variable length instruction set, you start with a single byte which has no endianness, it is wherever it is.
Then depending on the opcode there may be more bytes and those might for example be a 32 bit immediate, and (if that group of bytes is an immediate or address of some sort) THOSE bytes will be little endian. Say you have the five bytes ABCDE (no endianess, think array or string). The A byte is the opcode, the B byte would then be the lower 8 bits of the immediate and the E the upper 8 bits of the immediate.
Opcode is a hard to use term, in these older 8/16 bit CISC processors like x86 the entire byte was an opcode that you basically looked up in a table to see what it meant (and inside the processor they did use a table to figure out how to execute it). When you look at MIPS or ARM or other (certainly RISC) instruction sets like those, only a portion of the 32 bits are the "opcode" and in neither of those cases is it the same set of bits from one instruction to another, you have to look at various places in the instruction (yes there is overlap to make the decoding sane), MIPS is a lot more consistent you have one blob in one place you look at but one of those patterns requires you to look at another blob of bits to fully decode. ARM you start at a common bit and as you work your way across you are further decoding the instruction, then you may have to grab some random looking spots to fully decode. The rest of the bits are operands, what register to use or immediate or whatever the kind of thing that in a CISC you needed a look up table for (are implied by the opcode but not defined by bits in the opcode).
1) the next byte after the prior instruction will be interpreted as an opcode even if not intended to be one (if execution continues to that byte and doesnt branch). I dont remember my x86 table off hand to know if there are any undefined instructions or not, if undefined then it will hit a handler, otherwise it will decode what it finds as machine code and if it is not properly formed instructions will likely crash, sometimes you get lucky and it just messes something up and keeps going, or even more lucky and you cant tell that it almost crashed.
2) you are right for these 8/16 bit CISC or similar instruction sets they are treated more like strings that you parse through linearly.

How do I do bcalls in hex?

So I have a TI-84 Plus C Silver Edition. I just started working on writing assembly programs on it using the opcodes. I found a good reference chart here, but was wondering how to do bcalls, specifically how to print a character to the screen. It seems as if the hex code for the call is 3 bytes long, but call takes in 2 bytes. So how do I call it?
Also, does anyone know the memory location programs are loaded into when they are run for my calculator? I haven't been able to find it yet.
Based on the definitions here: http://wikiti.brandonw.net/index.php?title=84PCSE:OS:Include_File , a "bcall" is a RST 28 instruction followed by the specific number of the bcall. So to print a character you would do (given that PutC is 44FB):
rst 28h
dw 44FBh
Presumably the character to print is in A register.
TI uses rst 28h for their bcall which translates to hexadecimal as EF. Bcalls are 2 bytes, but keep in mind that the Z80 and eZ80 are little-endian processors. So as mentioned earlier, _PutC is 44FB, so you would have to use the FB first, then the 44, making bcall(_PutC) equivalent to EFFB44.
I think the calc you are using has an eZ80. While the eZ80 is backwards compatible with the Z80 instruction set, the table to which you linked is not complete for the eZ80. If you are looking to get really wild, you can use the docs provided by Zilog here though I must warn you that if you are not fairly comfortable with Z80 Assembly, the reading material will be too dense.

operand of LIDT is displacement/absolute address

I stumbled upon a statement in Intel Software developers manual:
"For LGDT, LIDT, LLDT, LTR, SGDT, SIDT, SLDT, STR, the exit qualification receives the value of the instruction’s displacement field, which is sign-extended to 64 bits if necessary (32 bits on processors that do not support Intel 64 architecture). If the instruction has no displacement (for example, has a register operand), zero is stored into the exit qualification. "
Now if I have an instruction LIDT 0xf290, then is "0xf290" a displacement? I think answer is yes.
So, my confusion is what all constitute as displacement? I was under impression that displacement is something which is calculated with respect to current eip value.
For eg. jmp xxx (In intrasegment jumps this will be a displacement. But for intersegment jumps, it should be absolute address.) If that is the case then why LIDT loads a relative address?
A displacement is just an offset from some origin, which may be a Base+Index*Scale, or 0. The other operand x86 has that can hold large values is immediate, which is useful for things like adding constants (e.g. ADD $42, %eax).
Incidentally, it appears that relative jumps use the immediate field, probably because they modify EIP by a constant.

Resources