Can anyone explain what the * does in the GNU assembler? Example:
jmp *0x804a004
This is an entry in a procedure linkage table (PLT). Can someone clarify what this instruction does and what the * stands for?
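The "*" marks an indirect jump or call. jmp *0x804a004 does not jump to 0x804a004 itself; it reads the 32-bit value stored at address 0x804a004 (here a GOT entry) and jumps to that value. If you don't specify the "*", "as" treats the operand as the branch target itself and encodes it relative to the program counter (PC-relative addressing).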
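As a rough C analogy (not what the assembler emits), a direct branch is like calling a function by name, while the * form is like calling through a pointer that is first read from memory. The names real_target, other_target, slot and rebind_slot below are made up for illustration; slot plays the role of the word stored at 0x804a004.

```c
/* Hypothetical functions that a PLT entry might end up reaching. */
static int real_target(void)  { return 42; }
static int other_target(void) { return 7; }

/* Stand-in for the memory word at 0x804a004: it holds an address,
   which the dynamic linker would normally fill in. */
static int (*slot)(void) = real_target;

/* Direct branch: the destination is fixed in the instruction itself
   (what "jmp real_target", without a *, would encode). */
int branch_direct(void) { return real_target(); }

/* Indirect branch: load the destination from the slot, then go there
   (the analogue of "jmp *slot"). */
int branch_indirect(void) { return slot(); }

/* Changing the slot redirects the indirect branch only. */
void rebind_slot(void) { slot = other_target; }
```

After rebind_slot(), branch_indirect() reaches other_target while branch_direct() is unchanged. That is the property a PLT relies on: the code stays fixed and only the table entry is rewritten.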
This question is about Linux (Ubuntu) executables.
I'll detail things as I understand them to make it clearer if anything's off (so please correct me where applicable):
The GOT acts as an extra level of indirection to enable accessing data from a text section which needs to be position-independent, for instance because the text section might be read-only and the actual addresses of the data may be unknown at (static) linking time.
The GOT then holds addresses to the actual data's location, which is known at loading time, and so the dynamic linker (which is invoked by the loader) is able to modify the appropriate GOT entries and make them point to the actual data.
The main thing that confuses me – not the only one at the moment, mind you :-) – is that this means the addresses in the text section now point to a value "of a different type":
If there was no GOT, I'd have expected this address (for instance in a RIP-relative addressing mode) to point to the actual value I'm after. With a GOT, though, I expect it to point to the appropriate GOT entry, which in turn holds the address to the value I'm after. In this case, there's an extra "dereferencing" required here.
Am I somehow misunderstanding this? If I use RIP-relative addressing, shouldn't the computed address (RIP+offset) be the actual address used in the instruction? So (in AT&T syntax):
mov fun_data(%rip), %rax
To my understanding, without GOT, this should be "rax = *(rip + (fun_data - rip))", or in short: rax = *fun_data.
With GOT, however, I expect this to be equivalent to rax = **fun_data, since *fun_data is just the GOT entry to the real fun_data.
Am I wrong about this, or is it just that the loader somehow knows to access the real data if the pointer is into the GOT? (In other words: that in a PIE, I suppose, some pointers effectively become pointers-to-pointers?)
Am I wrong about this
No.
or is it just that the loader somehow knows to access the real data if the pointer is into the GOT?
The compiler knows that a double dereference is required and emits it explicitly.
Compile this source with and without -fPIC and observe for yourself:
extern int dddd;
int fn() { return dddd; }
Without -fPIC, you get (expected):
movl dddd(%rip), %eax
With -fPIC you get "double dereference":
movq dddd@GOTPCREL(%rip), %rax # move pointer to dddd into RAX
movl (%rax), %eax # dereference it
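The same two-step access can be modeled in plain C. This is only a sketch: got_dddd is a hypothetical stand-in for the GOT entry, which in a real program is created by the linker and filled in by the dynamic linker rather than declared in source.

```c
/* The real variable; in a shared object its final address is only
   known at load time. */
int dddd = 7;

/* Hypothetical stand-in for the GOT entry: a pointer to dddd. */
static int *got_dddd = &dddd;

/* Non-PIC flavour: a single load straight from the variable. */
int fn_direct(void) { return dddd; }

/* PIC flavour: load the pointer from the "GOT", then load through it. */
int fn_via_got(void)
{
    int *p = got_dddd;   /* like: movq dddd@GOTPCREL(%rip), %rax */
    return *p;           /* like: movl (%rax), %eax              */
}
```

Both functions return the same value; the PIC version just spends one extra load to find out where dddd lives.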
I am following the gem5 workflow to add a custom instruction. My question is how to interpret the operands listed in "const struct riscv_opcode riscv_opcodes[]" in riscv-opc.h.
For example:
{"mod", "I", "d,s,t", MATCH_MOD, MASK_MOD, match_opcode, 0 }
How is "d,s,t" interpreted here?
Can anyone explain this whole statement?
Reference: https://nitish2112.github.io/post/adding-instruction-riscv/
According to the comment at the top of the array describing the instructions :
/* name, isa, operands, match, mask, match_func, pinfo. */
The line says that
{"mod", "I", "d,s,t",
mod belongs to the base integer ISA ("I") and is a triadic instruction, meaning that it takes 3 registers, described by the operand codes d, s and t.
d is the destination register (rd); s and t are the source registers (rs1 and rs2).
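To make the three operand codes concrete, here is a small sketch that pulls the corresponding register numbers out of a 32-bit R-type instruction word. The bit positions are the standard RISC-V R-type fields; the function names field_d, field_s and field_t are invented for this example and are not part of binutils.

```c
#include <stdint.h>

/* "d": destination register rd, bits 11..7 */
unsigned field_d(uint32_t insn) { return (insn >> 7) & 0x1f; }

/* "s": first source register rs1, bits 19..15 */
unsigned field_s(uint32_t insn) { return (insn >> 15) & 0x1f; }

/* "t": second source register rs2, bits 24..20 */
unsigned field_t(uint32_t insn) { return (insn >> 20) & 0x1f; }
```

For the word 0x00c58533 (add a0,a1,a2) these return 10, 11 and 12, i.e. x10, x11 and x12.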
This question is almost 4 years old, but since I spent a lot of time figuring out similar things, I would like to post my knowledge in case anyone needs it.
"mod" is the instruction label, "I" is the type of the instruction and in this case, it is an integer instruction. It takes three registers, the destination "d" register and the source registers "s" and "t".
The MASK_MOD and MATCH_MOD are used for the instruction matching in the assembler. The matching is done by the function match_opcode, which is passed in the next field. It performs the following check:
((insn ^ MATCH_MOD) & MASK_MOD) == 0
This means that the instruction (32 bits in length) is XOR'ed with the MATCH_MOD and then the result is AND'ed with the MASK_MOD. The result must always be zero to match the "mod" instruction that you are adding. This means you have to define the instruction opcode, FUNCT7, and FUNCT3 accordingly in the riscv-opcodes/opcodes file included in RISCV-GNU-Toolchain. You should also define the MASK_MOD and MATCH_MOD in riscv-isa-sim/riscv/encoding.h.
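A minimal sketch of that check in C follows. The mask 0xfe00707f selects the funct7, funct3 and opcode bits of an R-type instruction; the match value 0x0200006b (opcode 0x6b, funct3 0, funct7 1) is an assumption chosen for illustration, so substitute whatever values your generated encoding.h actually contains. matches_mod and encode_mod are hypothetical helper names.

```c
#include <stdint.h>

/* funct7 (31..25) | funct3 (14..12) | opcode (6..0) */
#define EXAMPLE_MASK_MOD  0xfe00707fu
/* Assumed encoding for illustration: funct7=1, funct3=0, opcode=0x6b */
#define EXAMPLE_MATCH_MOD 0x0200006bu

/* Same test as match_opcode: all masked bits must equal the match value. */
int matches_mod(uint32_t insn)
{
    return ((insn ^ EXAMPLE_MATCH_MOD) & EXAMPLE_MASK_MOD) == 0;
}

/* Build an instruction word by filling the register fields, which lie
   outside the mask and therefore never affect the match. */
uint32_t encode_mod(unsigned rd, unsigned rs1, unsigned rs2)
{
    return EXAMPLE_MATCH_MOD
         | ((uint32_t)(rd  & 0x1f) << 7)
         | ((uint32_t)(rs1 & 0x1f) << 15)
         | ((uint32_t)(rs2 & 0x1f) << 20);
}
```

Any register combination passed to encode_mod still matches, while an unrelated word such as 0x00c58533 (add a0,a1,a2) does not, because its opcode and funct7 bits differ.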
lea 0x1c(%ebp),%eax
So, I understand vaguely what the lea instruction does, and I know those are registers, but what is this structure: 0x1c(%ebp)? I got this code out of objdump.
It is one of the many x86 addressing modes. Specifically, this is referred to as "displacement" addressing.
Since you said you used objdump and didn't specify that you used the -M flag, I'm going to assume this is in GAS (AT&T) syntax, as opposed to Intel syntax. This means that the first operand is the source, and the second operand is the destination.
The lea 0x1C(%ebp),%eax instruction means, "Take the value in %ebp, add 0x1C (28 in decimal), then store that value in %eax". Unlike mov with the same operand, lea only computes the address; it does not read memory.
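The difference between computing the address and loading from it can be sketched in C. This is a toy model, assuming a little-endian byte array mem stands in for memory; lea_disp and mov_disp are made-up names.

```c
#include <stdint.h>

/* lea disp(%base), %dst : dst = base + disp, memory is not touched. */
uint32_t lea_disp(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* mov disp(%base), %dst : dst = the 32-bit little-endian word stored
   at address base + disp in the model memory. */
uint32_t mov_disp(const uint8_t *mem, uint32_t base, int32_t disp)
{
    uint32_t a = lea_disp(base, disp);
    return  (uint32_t)mem[a]
         | ((uint32_t)mem[a + 1] << 8)
         | ((uint32_t)mem[a + 2] << 16)
         | ((uint32_t)mem[a + 3] << 24);
}
```

With base = 0x1000, lea_disp(base, 0x1c) is simply 0x101c, whatever happens to be stored there.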
I am trying to understand what the following code segment from tls.h in glibc is doing and why:
/* Macros to load from and store into segment registers. */
# define TLS_GET_FS() \
({ int __seg; __asm ("movl %%fs, %0" : "=q" (__seg)); __seg; })
I think I understand the basic operation: it is moving the value stored in the fs register to __seg. However, I have some questions:
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
More importantly I think that the scope of the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful? I'm sure that the authors of glibc have a good reason for doing this but I can't figure out what it is from looking at the source code.
I tried generating assembly for this code and I got the following:
#APP
# 13 "fs-test.cpp" 1
movl %fs, %eax
# 0 "" 2
#NO_APP
So in my case it looks like eax was used for __seg. But I don't know if that is always what happens or if it was just what happened in the small test file that I compiled. If it is always going to use eax why wouldn't the assembly be written that way? If the compiler might pick other registers then how will the programmer know which one to access since __seg goes out of scope at the end of the macro? Finally I did not see this macro used anywhere when I grepped for it in the glibc source code, so that further adds to my confusion about what its purpose is. Any explanation about what the code is doing and why is appreciated.
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
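Yes. fs is a 16-bit segment register, and on current x86 processors movl from %fs into a 32-bit register fills the upper 16 bits with zeros. Note that the "=q" constraint asks for a general-purpose register (a, b, c or d in 32-bit mode), not a quadword memory location.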
the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful?
You have to read about the GCC statement-expression extension. The value of a statement expression is the value of the last expression in it, which is why the macro ends with __seg;. That value is only useful if the macro's result is used, for example assigned to something else, like this:
int foo = TLS_GET_FS();
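Here is a small sketch of the same pattern without any inline assembly, so it can be compiled anywhere GCC or Clang accept the GNU statement-expression extension. GET_ANSWER, seg_ and use_answer are invented names; the arithmetic merely stands in for the movl from %fs.

```c
/* GNU C statement expression: the value of ({ ... }) is the value of
   its last expression statement, here seg_. */
#define GET_ANSWER() ({ int seg_; seg_ = 6 * 7; seg_; })

int use_answer(void)
{
    /* seg_ is out of scope here, but its value has been copied out
       as the value of the whole ({ ... }) expression. */
    int foo = GET_ANSWER();
    return foo;
}
```

The caller never needs to know which register or temporary held seg_; the compiler picks one and delivers the value wherever the expression result is used.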
Finally I did not see this macro used anywhere when I grepped for it in the glibc source code
The TLS_{GET,SET}_FS macros do not in fact appear to be used. They were probably used in some version, then accidentally left behind when the code referencing them was removed.
I am struggling to find a way to retrieve the first character of the first command line argument in GAS. To clarify what I mean, here is how I do it in NASM:
main:
pop ebx
pop ebx
pop ebx ; get first argument string address into EBX register
cmp byte [ebx], 45 ; compare the first char of the argument string to ASCII dash ('-', dec value 45)
...
EDIT: Literal conversion to AT&T syntax and compiling it in GAS won't produce expected results. EBX value will not be recognized as a character.
I'm not sure I understand why you want, in 2011, to code an entire application in assembly (unless fun is your main motivation, and coding thousands of assembly lines is fun to you).
And if you do that, you probably don't want to call the entry point of your program main (in C on GNU/Linux, that function is called from crt0.o or similar), but more probably _start.
And if you want to understand the detailed way of starting an application in assembly, read the Assembly Howto and the Linux ABI supplement for x86-64 and similar documents for your particular system.
Ok I figured it out myself. Entry point should NOT be called main, but _start. Thanks Basile for a hint, +1.
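For reference, the reason the three pops work at _start is the initial process stack layout on Linux: the word at the stack pointer is argc, followed by the argv[0] pointer, then argv[1], and so on. The C sketch below models that layout with an array of pointer-sized words; first_arg_char and the fake stack in the usage note are illustrative assumptions, not code from the question.

```c
#include <stdint.h>

/* sp models the stack pointer at process entry:
   sp[0] = argc, sp[1] = argv[0], sp[2] = argv[1], ... */
char first_arg_char(const uintptr_t *sp)
{
    uintptr_t argc = sp[0];          /* first pop  */
    if (argc < 2)
        return '\0';                 /* no first argument was given */
    /* sp[1] is argv[0] (second pop); sp[2] is argv[1] (third pop). */
    const char *arg1 = (const char *)sp[2];
    return arg1[0];                  /* like: cmp byte [ebx], 45 */
}
```

With a fake stack holding { 2, "prog", "-v", 0 }, the function returns '-' (ASCII 45). When main is called from the C runtime instead, the stack holds a return address and main's own arguments, so the same pops pick up different values.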