x64 Jump instructions - 64-bit

I understand the near jumps that are just offsets, with either an immediate value, register, or address. There should be far (absolute address) jumps with an immediate value, register, or address too, right? The far instructions list m16:16, m16:32, and m16:64, but what do these mean? This is on page 853 of the Intel x64 manual.

It is not the case that "far == absolute"; while all far jumps are absolute, not all absolute jumps are far (see FF /4, jmp near absolute, optionally indirect).
A far jump is an absolute jump that also loads cs: the m16 part is a 16-bit segment selector that goes into cs, and the 16-, 32-, or 64-bit part is the offset jumped to. So m16:16, m16:32, and m16:64 denote a far pointer in memory: a 16-bit selector plus an offset of the given width. The precise semantics of the far jump are quite complicated.
So the possible jumps are:
         | near | far
---------+------+-----
relative | yes  | no
absolute | yes  | yes
indirect | yes  | yes
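
To picture the m16:xx notation concretely: in memory the offset is stored first, with the 16-bit selector after it. A minimal sketch as a packed C struct (my own illustration with hypothetical names; for m16:32 the offset field would be 32 bits wide):

#include <stdint.h>

/* Sketch of an m16:64 far-pointer operand as JMP FAR m16:64 reads it:
   the 64-bit offset sits at the lower address, the selector follows. */
#pragma pack(push, 1)
typedef struct {
    uint64_t offset;    /* loaded into rip */
    uint16_t selector;  /* loaded into cs  */
} far_ptr_m16_64;       /* 10 bytes total */
#pragma pack(pop)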

Related

In RISC-V, what is the difference between jump and tail?

RISC-V assembly features two mnemonics jump and tail, both of which perform an unconditional jump to another symbol. What is the difference between the two?
Both are pseudo-instructions that get expanded by the assembler but the difference is unclear.
It seems that the GNU assembler expands tail XXX as
auipc x6, (high part of the pc-relative offset to XXX)
jalr x0, (low part of the offset)(x6)
whereas jump XXX, RR is expanded as
auipc RR, (high part of the pc-relative offset to XXX)
jalr x0, (low part of the offset)(RR)
In short, jump lets you choose the temporary register that gets clobbered by the computation of the destination.
In any case, the GNU linker relaxes the pair to a single jal (dropping the auipc) when the target turns out to be close enough.
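
For intuition, here is the hi/lo arithmetic the assembler performs so that the auipc/jalr pair reaches the target, as a small C sketch (my own illustration, not assembler source):

#include <stdint.h>

/* Split a pc-relative offset into the 20-bit auipc part and the 12-bit
   jalr part. Adding 0x800 before shifting keeps lo12 in [-2048, 2047],
   the signed range of jalr's immediate. */
static void split_pcrel(int64_t offset, int32_t *hi20, int32_t *lo12) {
    *hi20 = (int32_t)((offset + 0x800) >> 12);
    *lo12 = (int32_t)(offset - ((int64_t)*hi20 << 12));
}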

RISC-V user level reference or reference implementation

Summary: What is the definitive reference or reference implementation for the RISC-V user-level ISA?
Context: The RISC-V website has "The RISC-V Instruction Set Manual" which explains the user-level instructions very well, but does not give an exact specification for them. I am trying to build a user-level ISA simulator now and intend to write an FPGA implementation later, so the exact behavior is important to me.
A reference implementation would be sufficient, but should preferably be as simple as possible -- i.e. I would try to understand a pipelined implementation only as a last resort. What is important is to have an understanding of the specified ISA and not of a single CPU implementation or compiler implementation.
One example to show my problem is the AUIPC instruction: The prose explanation says that "AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd." I wanted to know whether this refers to the old or new PC, i.e. the position of the AUIPC instruction or the next instruction. I looked at the "RISCV Angel" implementation, but that seems to mask out the lower bits of the (old) PC -- not just of the immediate -- which I could not find any reason for in the spec, not even in the change history of the spec (since Angel is a bit older). Instead of an answer, I now have two questions about AUIPC. Many other instructions pose similar problems to me.
AFAICT the RISC-V Instruction Set Manual you cite is the closest thing there is to a definitive reference. If there are things that are unclear or incorrect in there then you could open issues at the Github site where that document is maintained: https://github.com/riscv/riscv-isa-manual
As far as AUIPC is concerned, the answer is implied, but not stated explicitly, by this sentence at the bottom of page 9 in the current manual:
There is one additional user-visible register: the program counter pc holds the address of the current instruction.
Based on that statement I would expect that the pc value that is seen and manipulated by the AUIPC instruction is the address of the AUIPC instruction itself.
This interpretation is supported by the discussion of the JALR instruction:
The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the least-significant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd.
Given that the address of the following instruction is expressed as pc+4, it seems clear that the pc value visible during the execution of JALR is the address of the JALR instruction itself.
The latest draft of the manual (at https://github.com/riscv/riscv-isa-manual/releases/download/draft-20190321-ba17106/riscv-spec.pdf) makes the situation slightly clearer. In place of this in the current manual:
AUIPC appends 12 low-order zero bits to the 20-bit U-immediate, sign-extends the result to 64 bits, then adds it to the pc and places the result in register rd.
the latest draft says:
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc of the AUIPC instruction, then places the result in register rd.
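
To make that interpretation concrete, here is a minimal C sketch of both semantics, assuming pc holds the address of the instruction being executed (RV64 widths; the function names are mine):

#include <stdint.h>

/* AUIPC: rd = pc of the AUIPC itself + sign-extended (imm20 << 12). */
static uint64_t exec_auipc(uint64_t pc, uint32_t imm20) {
    int64_t offset = (int32_t)(imm20 << 12);   /* low 12 bits are zero */
    return pc + (uint64_t)offset;
}

/* JALR: rd = pc + 4 (the next instruction); the target is rs1 plus the
   sign-extended 12-bit immediate, with the least-significant bit cleared. */
static uint64_t exec_jalr(uint64_t pc, uint64_t rs1, int32_t imm12,
                          uint64_t *rd) {
    *rd = pc + 4;
    return (rs1 + (uint64_t)(int64_t)imm12) & ~(uint64_t)1;
}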

How to disassemble these instructions

I am writing a little disassembler using riscv-spec-v2.0 and have some questions about the following instructions and how to correctly disassemble them:
1. The FENCE instruction has "pred" and "succ" bit fields in imm.
2. AMO instructions have "aq" and "rl" bits in funct7.
3. Float instructions have an "rm" bit field in funct3.
All of these bit fields seem to lack mappings in the assembler. E.g. page 50 just says "FENCE" but not what to do with the immediate, and page 33 has an example of putting .aq or .rl at the end but not what to do if both are present.
4. SCALL and SBREAK are the same as ECALL and EBREAK, but there is also ERET: so why not drop SCALL and SBREAK and just use ECALL, EBREAK, and ERET? Otherwise these opcodes are hard to disassemble.
The current RISC-V assembler is terse for common defaults:
1. "FENCE" with no arguments is treated as a full fence (all bits set).
2. It is fine to have both aq and rl on the same instruction; the combined suffix is written .aqrl.
3. The rounding mode is not shown if it is not specified explicitly.
4. ECALL and EBREAK will be the new standard names (this will be clarified in the revised user ISA manual).
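
As a rough sketch of how a disassembler might render those fields against riscv-spec-v2.0 (my own illustration, not official tooling):

#include <stdio.h>
#include <stdint.h>

/* FENCE: pred is insn[27:24], succ is insn[23:20]; within each field the
   bits mean I, O, R, W from high to low. pred = succ = 1111 is the full
   fence, which the assembler prints as plain "fence". */
static void fmt_iorw(uint8_t mask, char out[5]) {
    int n = 0;
    if (mask & 8) out[n++] = 'i';
    if (mask & 4) out[n++] = 'o';
    if (mask & 2) out[n++] = 'r';
    if (mask & 1) out[n++] = 'w';
    out[n] = '\0';
}

static void print_fence(uint32_t insn) {
    char pred[5], succ[5];
    fmt_iorw((insn >> 24) & 0xf, pred);
    fmt_iorw((insn >> 20) & 0xf, succ);
    if (((insn >> 20) & 0xff) == 0xff)
        printf("fence\n");              /* terse form for the full fence */
    else
        printf("fence %s,%s\n", pred, succ);
}

/* AMO: aq is insn[26] and rl is insn[25] (the top bits of funct7); both
   set prints as the combined suffix ".aqrl". */
static const char *amo_suffix(uint32_t insn) {
    int aq = (insn >> 26) & 1, rl = (insn >> 25) & 1;
    return aq && rl ? ".aqrl" : aq ? ".aq" : rl ? ".rl" : "";
}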

Emulation: Unconditional jumps and PC increment through CPU cycles

I'm writing a simple GB emulator (wow, now that's something new, isn't it), since I'm really taking my first steps in emulation.
What I don't seem to understand is how to correctly implement the CPU cycle and unconditional jumps.
Consider the instruction JP nn (unconditional jump to the memory address given by the operand), like JP 1000h. If I have a basic loop of:
increment PC
read opcode
execute command
Then after the JP opcode has been read and the command executed (read 1000h from memory and set PC = 1000h), the PC gets incremented and becomes 1001h, thus resulting in bad emulation.
tl;dr How do you emulate jumps in emulators, so that PC value stays correct, when having cpu loops that increment PC?
The PC should be incremented as an 'atomic' operation every time it is used to return a byte. This means immediate operands as well as opcodes.
In your example, the PC would be used three times, once for the opcode and twice for the two operand bytes. By the time the CPU has fetched the three bytes and is in a position to load the PC, the PC is already pointing to the next instruction opcode after the second operand but, since actually implementing the instruction reloads the PC, it doesn't matter.
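
A minimal C sketch of that increment-on-fetch scheme (the flat memory array is a placeholder; 0xC3 is the GB opcode for JP nn):

#include <stdint.h>

static uint8_t  mem[0x10000];
static uint16_t pc;

/* Every byte fetched through the PC bumps it exactly once. */
static uint8_t fetch8(void) {
    return mem[pc++];
}

static uint16_t fetch16(void) {         /* little-endian operand */
    uint16_t lo = fetch8();
    uint16_t hi = fetch8();
    return (uint16_t)(hi << 8 | lo);
}

static void step(void) {
    uint8_t opcode = fetch8();          /* PC now points at the operand */
    switch (opcode) {
    case 0xC3: {                        /* JP nn */
        uint16_t target = fetch16();    /* PC now points past the operand */
        pc = target;                    /* the reload makes that moot */
        break;
    }
    /* ... other opcodes ... */
    }
}

By the time the jump executes, the PC already points at the following instruction, and overwriting it with the target is exactly what the real CPU does.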
Move increment PC to the end of the loop, and have it performed conditionally depending on the opcode?
I know next to nothing about emulation, but two obvious approaches spring to mind.
Instead of hardcoding PC += 1 into the main loop, let the evaluation of each opcode return the next PC value (or the offset, or a flag saying whether to increment it, etc.). Then the difference between jumps and other opcodes (their effect on the program counter) is definable along with everything else about them; see the sketch after this list.
Knowing that the main loop will always increment the PC by 1, just have the implementation of jumps set the PC to target - 1 rather than target.
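
A matching sketch of the first approach, where each handler returns the next PC and the main loop never increments it (again a skeleton over the same placeholder memory, not a full core):

#include <stdint.h>

static uint8_t mem[0x10000];

static uint16_t op_nop(uint16_t pc) {   /* ordinary opcode: fall through */
    return (uint16_t)(pc + 1);
}

static uint16_t op_jp(uint16_t pc) {    /* JP nn: next PC is the operand */
    return (uint16_t)(mem[(uint16_t)(pc + 2)] << 8 |
                      mem[(uint16_t)(pc + 1)]);
}

static void run(uint16_t pc) {
    for (;;) {
        switch (mem[pc]) {
        case 0x00: pc = op_nop(pc); break;
        case 0xC3: pc = op_jp(pc);  break;
        default:   return;              /* unimplemented opcode */
        }
    }
}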

What does @plt mean here?

0x00000000004004b6 <main+30>: callq 0x400398 <printf@plt>
Anyone knows?
UPDATE
Why two disas printf give me different result?
(gdb) disas printf
Dump of assembler code for function printf@plt:
0x0000000000400398 <printf@plt+0>: jmpq *0x2004c2(%rip) # 0x600860 <_GLOBAL_OFFSET_TABLE_+24>
0x000000000040039e <printf@plt+6>: pushq $0x0
0x00000000004003a3 <printf@plt+11>: jmpq 0x400388
(gdb) disas printf
Dump of assembler code for function printf:
0x00000037aa44d360 <printf+0>: sub $0xd8,%rsp
0x00000037aa44d367 <printf+7>: mov %rdx,0x30(%rsp)
0x00000037aa44d36c <printf+12>: movzbl %al,%edx
0x00000037aa44d36f <printf+15>: mov %rsi,0x28(%rsp)
0x00000037aa44d374 <printf+20>: lea 0x0(,%rdx,4),%rax
0x00000037aa44d37c <printf+28>: lea 0x3f(%rip),%rdx # 0x37aa44d3c2 <printf+98>
It's a way to get code fix-ups (adjusting addresses based on where code sits in virtual memory, which may be different across different processes) without having to maintain a separate copy of the code for each process. The PLT, or procedure linkage table, is one of the structures which makes dynamic loading and linking easier to use (another is the GOT, or global offsets table).
Refer to the following diagram, which shows both your calling code and the library code (that you call) mapped to different virtual addresses in two different processes, A and B. There is only one copy of each piece of code in real memory, with the different virtual addresses within each process mapping to that real address:
Process A
Addresses (virtual):
     0x1234                       0x8888
  +-------------+   +---------+   +---------+
  |             |   | Private |   |         |
  |             |   | PLT/GOT |   |         |
  |   Shared    |   +---------+   |  Shared |
==| application |=================| library |==
  |    code     |   +---------+   |   code  |
  |             |   | Private |   |         |
  |             |   | PLT/GOT |   |         |
  +-------------+   +---------+   +---------+
     0x2020                       0x6666
Process B
When the shared library is brought into the address space, entries are constructed in the process-specific (private) PLT and/or GOT which will, on first use, perform some fix-up to make things faster. Subsequent usage will then bypass the fix-up as it will no longer be needed.
The process goes something like this.
printf@plt is actually a small stub which (eventually) calls the real printf function, modifying things on the way to make subsequent calls faster.
The real printf function is mapped into an arbitrary location in a given process (virtual address space), as is the code that is trying to call it.
So, in order to allow proper code sharing of calling code (left side above) and called code (right side), you cannot apply any fix-ups to the calling code directly since that will "damage" how it works in the other processes (that wouldn't matter if it mapped to the same location in every process but that's a bit of a restriction, especially if something else had already been mapped there).
So the PLT is a smaller process-specific area at a reliably-calculated-at-runtime address that isn't shared between processes, so any given process is free to change it however it wants to, without adverse effects on other processes.
Let's follow the process through in a bit more detail. The diagram above doesn't show the address of the PLT/GOT since it can be found using a location relative to the current program counter. This is evidenced by your PC-relative lookup:
<printf@plt+0>: jmpq *0x2004c2(%rip) ; 0x600860 <_GOT_+24>
By using position independent code in the called library, along with the PLT/GOT, the first call to the function printf@plt (so in the PLT) is a multi-stage operation, in which the following actions take place:
It calls the GOT version (via a pointer) which initially points back to some set-up code in the PLT.
That set-up code loads the relevant shared library if not yet done, then modifies the GOT pointer so that subsequent calls go directly to the real printf (at the process-specific virtual address) rather than the PLT set-up code.
It then calls the loaded printf code at that address.
On subsequent calls, because the GOT pointer has been modified, the multi-stage approach is simplified:
It calls the GOT version (via pointer), which now points to the real printf.
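
Here is a tiny C analogy of that lazy-binding dance (purely conceptual: the names are invented and this is not how glibc actually implements it):

#include <stdio.h>

typedef void (*fn_ptr)(void);

static void real_printf_stand_in(void) { puts("the real function"); }

static fn_ptr got_slot;                  /* stands in for one GOT entry */

static void plt_resolver_stub(void) {
    got_slot = real_printf_stand_in;     /* the "fix-up": patch the slot */
    got_slot();                          /* then complete the first call */
}

int main(void) {
    got_slot = plt_resolver_stub;        /* initial state of the GOT entry */
    got_slot();                          /* first call: resolve, patch, call */
    got_slot();                          /* later calls: straight through */
    return 0;
}

The real mechanism performs the same patch, except it is the dynamic linker that writes the resolved address into the process's private GOT entry.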
A good article can be found here, detailing how glibc is loaded at run time.
Regarding the update: what you have seen makes sense. The first time you run the disas command, printf has not been called yet, so it is not resolved. Once your program calls printf for the first time, the GOT is updated, printf is resolved, and the GOT entry points to the real function. Thus, the next disas command shows the real printf assembly.
