Confusion on assembly output of virtual table in Visual C++ 2015

Confusion on assembly output of virtual table in Visual C++ 2015 - visual-c++

I'm confused by the assembly output of Visual C++ 2015 (x86).
I want to know the virtual table layout in VC, so I write the following simple class with a virtual function.
#include <stdio.h>
struct Foo
{
virtual int GetValue()
{
uintptr_t vtbl = *(uintptr_t *)this;
uintptr_t slot0 = ((uintptr_t *)vtbl)[0];
uintptr_t slot1 = ((uintptr_t *)vtbl)[1];
printf("vtbl = 0x%08X\n", vtbl);
printf(" [0] = 0x%08X\n", slot0);
printf(" [1] = 0x%08X\n", slot1);
return 0xA11BABA;
}
};
extern "C" void Check();
int main()
{
Foo *pFoo = new Foo;
int x = pFoo->GetValue();
printf("x = 0x%08X\n", x);
printf("\n");
Check();
}
And to check the layout, I implement an assembly function (the magic name comes from the assembly output vtab.asm of vtab.cpp, and is the mangled version of Foo::GetValue).
.model flat
extern _printf : proc
extern ?GetValue#Foo##UAEHXZ : proc
.const
FUNC_ADDR db "Address of Foo::GetValue = 0x%08X", 10, 0
.code
_Check proc
push ebp
mov esp, ebp
push offset ?GetValue#Foo##UAEHXZ
push offset FUNC_ADDR
call _printf
add esp, 8
pop ebp
ret
_Check endp
end
Then, I compile and run.
ml /c check.asm
cl /Fa vtab.cpp check.obj
vtab
And get the following output on my computer.
vtbl = 0x00FF2174
[0] = 0x00FE1300
[1] = 0x6C627476
x = 0x0A11BABA
Address of Foo::GetValue = 0x00FE1300
It clearly shows the virtual function GetValue is at offset 0 of the virtual table. But the assembly output of vtab.cpp seems to imply GetValue is at offset 4 (see the following comments start with three semicolons).
; COMDAT ??_7Foo##6B#
CONST SEGMENT
??_7Foo##6B# DD FLAT:??_R4Foo##6B# ; Foo::`vftable'
DD FLAT:?GetValue#Foo##UAEHXZ ;;; GetValue at offset 4
CONST ENDS
; Function compile flags: /Odtp
; COMDAT ??0Foo##QAE#XZ
_TEXT SEGMENT
_this$ = -4 ; size = 4
??0Foo##QAE#XZ PROC ; Foo::Foo, COMDAT
; _this$ = ecx
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
mov DWORD PTR [eax], OFFSET ??_7Foo##6B# ;;; Init ptr to virtual table
mov eax, DWORD PTR _this$[ebp]
mov esp, ebp
pop ebp
ret 0
??0Foo##QAE#XZ ENDP ; Foo::Foo
Thanks for your answering!
Update
#Hans Passant This seems to be a bug. I ml /c the assembly output vtab.asm (with a few symbols deletion) and link it with check.obj to get an exe vtab2.exe. But vtab2.exe won't run correctly. Then I modify the following code
; COMDAT ??_7Foo##6B#
CONST SEGMENT
??_7Foo##6B# DD FLAT:??_R4Foo##6B# ; Foo::`vftable'
DD FLAT:?GetValue#Foo##UAEHXZ
CONST ENDS
to
; COMDAT ??_7Foo##6B#
CONST SEGMENT
__NOT_USED_ DD FLAT:??_R4Foo##6B# ; Foo::`vftable'
??_7Foo##6B# DD FLAT:?GetValue#Foo##UAEHXZ
CONST ENDS
and ml and link again to get vtab3.exe. Now vtab3.exe runs correctly and produces an output similar to vtab.exe.

I don't think Microsoft would consider this a bug. Yes, the assembly output should have the vtable symbol on the second element of the vtable so that the RTTI entry appears at offset -4 of the table. However the table should also be in a COMDAT section, but instead there's only a comment in the assembly output (; COMDAT) that indicates this. That's because while the PECOFF object file format supports COMDAT sections, the assembler (MASM, invoked as ml) doesn't. There's no way for the compiler to generate an assembly file that actually corresponds to the contents of the object file it creates.
Or to put it another way, the assembly output isn't meant to be assembled. It's just meant to be informative. Even with your fix applied the assembly output doesn't generate the same object file the compiler does. If you did this in a more realistic project where Foo was used in more than one object file you'd get multiple definition errors when linking. If you want to see the real output of the compiler you need to look at the object file.
For example if you use dumpbin /all vtab.obj and go through its output, you'll see something like:
SECTION HEADER #C
.rdata name
...
40301040 flags
Initialized Data
COMDAT; sym= "const Foo::`vftable'" (??_7Foo##6B#)
4 byte align
Read Only
RAW DATA #C
00000000: 00 00 00 00 00 00 00 00 ........
RELOCATIONS #C
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000000 DIR32 00000000 34 ??_R4Foo##6B# (const Foo::`RTTI Complete Object Locator')
00000004 DIR32 00000000 1F ?GetValue#Foo##UAEHXZ (public: virtual int __thiscall Foo::GetValue(void))
...
COFF SYMBOL TABLE
...
026 00000000 SECTC notype Static | .rdata
Section length 8, #relocs 2, #linenums 0, checksum 0, selection 6 (pick largest)
028 00000004 SECTC notype External | ??_7Foo##6B# (const Foo::`vftable')
It's not easy to understand, but all the information about the actual layout of the vtable is given. The symbol for the vtable, ??_7Foo##6B# (const Foo::`vftable'), is at offset 00000004 of SECTC or section number 0xC. Section #C is 8 bytes long and has relocations for the RTTI locator and Foo::GetValue that are applied at offsets 00000000 and 00000004 of the section. So you can see that in the object file the vtable symbol does in fact point to the entry containing the pointer to the first virtual method.
Open Watcom has a utility that can show you the contents of an object file in a more assembly-like fashion, though notably not in the syntax that MASM uses. Running wdis t279.obj shows:
.new_section .rdata, "dr2"
0000 00 00 00 00 .long ??_R4Foo##6B#
0004 ??_7Foo##6B#:
0004 00 00 00 00 .long ?GetValue#Foo##UAEHXZ

Related

.text segment bigger than .text section in executable. Why?

I have the following 'uppercaser.asm' assembly program in NASM which converts all lowercase letters input from user into uppercase:
section .bss
Buff resb 1
section .data
section .text
global _start
_start:
nop ; This no-op keeps the debugger happy
Read: mov eax,3 ; Specify sys_read call
mov ebx,0 ; Specify File Descriptor 0: Standard Input
mov ecx,Buff ; Pass offset of the buffer to read to
mov edx,1 ; Tell sys_read to read one char from stdin
int 80h ; Call sys_read
cmp eax,0 ; Look at sys_read's return value in EAX
je Exit ; Jump If Equal to 0 (0 means EOF) to Exit
; or fall through to test for lowercase
cmp byte [Buff],61h ; Test input char against lowercase 'a'
jb Write ; If below 'a' in ASCII chart, not lowercase
cmp byte [Buff],7Ah ; Test input char against lowercase 'z'
ja Write ; If above 'z' in ASCII chart, not lowercase
; At this point, we have a lowercase character
sub byte [Buff],20h ; Subtract 20h from lowercase to give uppercase...
; ...and then write out the char to stdout
Write: mov eax,4 ; Specify sys_write call
mov ebx,1 ; Specify File Descriptor 1: Standard output
mov ecx,Buff ; Pass address of the character to write
mov edx,1 ; Pass number of chars to write
int 80h ; Call sys_write...
jmp Read ; ...then go to the beginning to get another character
Exit: mov eax,1 ; Code for Exit Syscall
mov ebx,0 ; Return a code of zero to Linux
int 80H ; Make kernel call to exit program
The program is then assembled with the -g -F stabs option for the debugger and linked for 32-bit executables in ubuntu 18.04.
Running readelf --segments uppercaser for the segments and readelf -S uppercaser for the sections I see a difference in size of text segment and text section.
readelf --segments uppercaser
Elf file type is EXEC (Executable file)
Entry point 0x8048080
There are 2 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x08048000 0x08048000 0x000db 0x000db R E 0x1000
LOAD 0x0000dc 0x080490dc 0x080490dc 0x00000 0x00004 RW 0x1000
Section to Segment mapping:
Segment Sections...
00 .text
01 .bss
readelf -S uppercaser
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 08048080 000080 00005b 00 AX 0 0 16
[ 2] .bss NOBITS 080490dc 0000dc 000004 00 WA 0 0 4
[ 3] .stab PROGBITS 00000000 0000dc 000120 0c 4 0 4
[ 4] .stabstr STRTAB 00000000 0001fc 000011 00 0 0 1
[ 5] .comment PROGBITS 00000000 00020d 00001f 00 0 0 1
[ 6] .shstrtab STRTAB 00000000 00022c 00003e 00 0 0 1
[ 7] .symtab SYMTAB 00000000 0003d4 0000f0 10 8 11 4
[ 8] .strtab STRTAB 00000000 0004c4 000045 00 0 0 1
In the sections description one can see that the size of .text section is 5Bh=91 bytes (the same number one is getting with the size command) whereas in the segments description we see that the size is 0x000DB, a difference of 128 bytes. Why is that?
From the elf man pages for the Elf32_Phdr (program header) structure:
p_filesz
This member holds the number of bytes in the file image of
the segment. It may be zero.
p_memsz
This member holds the number of bytes in the memory image
of the segment. It may be zero.
Is the difference somehow related to the .bss section?

Notice that the first program segment at file address 0 starts at virtual address 0x08048000, not at VA 0x08048080 which corresponds with the .text section.
In fact the segment displayed by readelf as 00 .text covers ELF file header (52 bytes), alignment, two program headers (2*32 bytes) and the netto contents of .text section, alltogether mapped from file address 0 to VA 0x08048000.

Basic input with x64 assembly code

I am writing a tutorial on basic input and output in assembly. I am using a Linux distribution (Ubuntu) that is 64 bit. For the first part of my tutorial I spoke about basic output and created a simple program like this:
global _start
section .text
_start:
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
That works great. The system prints the string and exits cleanly. For the next part of my tutorial, I simply want to read one character in from the keyboard. From my understanding of this web site we change the rdi register to be 0 for a sys_read call.
I first subtract 8 from the current rsp and then load that address into the rsi register. (That is where I want to store the char). When I compile and run my program it appears to work... but the terminal seems to mimick the input I type in again.
Here is the program:
global _start
section .text
_start:
sub rsp,8 ; allocate space on the stack to read
mov rdi,0 ; set rdi to 0 to indicate a system read
mov rsi,[rsp-8]
mov rdx,1
syscall
mov rax,1
mov rdi,1
mov rsi,message
mov rdx,13
syscall
mov rax,60
xor rdi,rdi
syscall
section .data
message: db "Hello, World", 10
and this is what happens in my terminal...
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ nasm -felf64 rps.asm && ld rps.o && ./a.out
5
Hello, World
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ 5
5: command not found
matthew#matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$
The input 5 is repeated back to the terminal after the program has exited. What is the proper way to read in a single char using NASM and Linux x64?

In your first code section you have to set the SYS_CALL to 0 for SYS_READ (as mentioned rudimentically in the other answer).
So check a Linux x64 SYS_CALL list for the appropriate parameters and try
_start:
mov rax, 0 ; set SYS_READ as SYS_CALL value
sub rsp, 8 ; allocate 8-byte space on the stack as read buffer
mov rdi, 0 ; set rdi to 0 to indicate a STDIN file descriptor
lea rsi, [rsp] ; set const char *buf to the 8-byte space on stack
mov rdx, 1 ; set size_t count to 1 for one char
syscall

it appears to work... but the terminal seems to mimick the input I type in again.
No, the 5 + newline that bash reads is the one you typed. Your program waited for input but didn't actually read the input, leaving it in the kernel's terminal input buffer for bash to read after your program exited. (And bash does its own echoing of terminal input because it puts the terminal in no-echo mode before reading; the normal mechanism for characters to appear on the command line as you type is for bash to print what it reads.)
How did your program manage to wait for input without reading any? mov rsi, [rsp-8] loads 8 bytes from that address. You should have used lea to set rsi to point to that location instead of loading what was in that buffer. So read fails with -EFAULT instead of reading anything, but interestingly it doesn't check this until after waiting for there to be some terminal input.
I used strace ./foo to trace system calls made by your program:
execve("./foo", ["./foo"], 0x7ffe90b8e850 /* 51 vars */) = 0
read(0, 5
NULL, 1) = -1 EFAULT (Bad address)
write(1, "Hello, World\n", 13Hello, World
) = 13
exit(0) = ?
+++ exited with 0 +++
Normal terminal input/output is mixed with the strace output; I could have used -o foo.trace or whatever. The cleaned-up version of the read system call trace (without the 5\n mixed in) is:
read(0, NULL, 1) = -1 EFAULT (Bad address)
So (as expected for _start in a static executable under Linux), the memory below RSP was zeroed. But anything that isn't a pointer to writeable memory would have produced the same result.
zx485's answer is correct but inefficient (large code-size and an extra instruction). You don't need to worry about efficiency right away, but it's one of the main reasons for doing anything with asm and there's interesting stuff to say about this case.
You don't need to modify RSP; you can use the red-zone (memory below RSP) because you don't need to make any function calls. This is what you were trying to do with rsp-8, I think. (Or else you didn't realize that it was only safe because of special circumstances...)
The read system call's signature is
ssize_t read(int fd, void *buf, size_t count);
so fd is an integer arg, so it's only looking at edi not rdi. You don't need to write the full rdi, just the regular 32-bit edi. (32-bit operand-size is usually the most efficient thing on x86-64).
But for zero or positive integers, just setting edi also sets rdi anyway. (Anything you write to edi is zero-extended into the full rdi) And of course zeroing a register is best done with xor same,same; this is probably the best-known x86 peephole optimization trick.
As the OP later commented, reading only 1 byte will leave the newline unread, when the input is 5\n, and that would make bash read it and print an extra prompt. We can bump up the size of the read and the space for the buffer to 2 bytes. (There'd be no downside to using lea rsi, [rsp-8] and leave a gap; I'm using lea rsi, [rsp-2] to pack the buffer right below argc on the stack, or below the return value if this was a function instead of a process entry point. Mostly to show exactly how much space is needed.)
; One read of up to 2 characters
; giving the user room to type a digit + newline
_start:
;mov eax, 0 ; set SYS_READ as SYS_CALL value
xor eax, eax ; rax = __NR_read = 0 from unistd_64.h
lea rsi, [rsp-2] ; rsi = buf = rsp-2
xor edi, edi ; edi = fd = 0 (stdin)
mov edx, 2 ; rdx = count = 2 char
syscall ; sys_read(0, rsp-2, 2)
; total = 16 bytes
This assembles like so:
+ yasm -felf64 -Worphan-labels -gdwarf2 foo.asm
+ ld -o foo foo.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
$ objdump -drwC -Mintel
0000000000400080 <_start>:
400080: 31 c0 xor eax,eax
400082: 48 8d 74 24 ff lea rsi,[rsp-0x1]
400087: 31 ff xor edi,edi
400089: ba 01 00 00 00 mov edx,0x1
40008e: 0f 05 syscall
; next address = ...90
; I left out the rest of the program so you can't actually *run* foo
; but I used a script that assembles + links, and disassembles the result
; The linking step is irrelevant for just looking at the code here.
By comparison, zx485's answer assembles to 31 bytes. Code size is not the most important thing, but when all else is equal, smaller is better for L1i cache density, and sometimes decode efficiency. (And my version has fewer instructions, too.)
0000000000400080 <_start>:
400080: 48 c7 c0 00 00 00 00 mov rax,0x0
400087: 48 83 ec 08 sub rsp,0x8
40008b: 48 c7 c7 00 00 00 00 mov rdi,0x0
400092: 48 8d 34 24 lea rsi,[rsp]
400096: 48 c7 c2 01 00 00 00 mov rdx,0x1
40009d: 0f 05 syscall
; total = 31 bytes
Note how those mov reg,constant instructions use the 7-byte mov r64, sign_extended_imm32 encoding. (NASM optimizes those to 5-byte mov r32, imm32 for a total of 25 bytes, but it can't optimize mov to xor because xor affects flags; you have to do that optimization yourself.)
Also, if you are going to modify RSP to reserve space, you only need mov rsi, rsp not lea. Only use lea reg1, [rsp] (with no displacement) if you're padding your code with longer instructions instead of using a NOP for alignment. For source registers other than rsp or rbp, lea won't be longer but it is still slower than mov. (But by all means use lea to copy-and-add. I'm just saying it's pointless when you can replace it with a mov.)
You could save even more space by using lea edx, [rax+1] instead of mov edx,1 at essentially no performance cost, but that's not something compilers normally do. (Although perhaps they should.)

You need to set eax to the system call number for read.

How shared library finds GOT section?

While I was reading http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/#id1
question came:
How does PIC shared library after being loaded somewhere in virtual address space of the process knows how to reference external variables?
Here is code of shared library in question:
#include <stdio.h>
extern long var;
void
shara_func(void)
{
printf("%ld\n", var);
}
Produce object code, then shared object(library):
gcc -fPIC -c lib1.c # produce PIC lib1.o
gcc -fPIC -shared lib1.o -o liblib1.so # produce PIC shared library
Disassemble shara_func in shared library:
objdump -d liblib1.so
...
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
6db: 48 8b 00 mov (%rax),%rax
6de: 48 89 c6 mov %rax,%rsi
6e1: 48 8d 3d 19 00 00 00 lea 0x19(%rip),%rdi # 701 <_fini+0x9>
6e8: b8 00 00 00 00 mov $0x0,%eax
6ed: e8 be fe ff ff callq 5b0 <printf#plt>
6f2: 90 nop
6f3: 5d pop %rbp
6f4: c3 retq
...
I see that instruction at 0x6d4 address moves some address that is relative to PC to rax, I suppose that is the entry in GOT, GOT referenced relatively from PC to get address of external variable var at runtime(it is resolved at runtime depending where var was loaded).
Then after executing instruction at 0x6db we get external variable's actual content placed in rax, then move value from rax to rsi - second function parameter passed in register.
I was thinking that there is only one GOT in process memory, however,
see that library references GOT? How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded? Or does each shared library has its own GOT that is loaded with her? I would be very glad if you clarify my confusion.

I was thinking that there is only one GOT in process memory, however, see that library references GOT?
We clearly see .got section as part of the library. With readelf we can find what are the sections of the library and how they are loaded:
readelf -e liblib1.so
...
Section Headers:
[21] .got PROGBITS 0000000000200fd0 00000fd0
0000000000000030 0000000000000008 WA 0 0 8
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000078c 0x000000000000078c R E 200000
LOAD 0x0000000000000df8 0x0000000000200df8 0x0000000000200df8
0x0000000000000230 0x0000000000000238 RW 200000
...
Section to Segment mapping:
Segment Sections...
00 ... .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
01 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
02 .dynamic
So, there is section .got, but runtime linker ld-linux.so.2 (registered as interpreter for dynamic ELFs) does not load sections; it loads segments as described by Program header with LOAD type. .got is part of segment 01 LOAD with RW flags. Other library will have own GOT (think about compiling liblib2.so from the similar source, it will not know anything about liblib1.so and will have own GOT); so it is "Global" only for the library; but not to the whole program image in memory after loading.
How shared library knows offset to process's GOT when it(PIC library) does not know where in process memory it would be loaded?
It is done by static linker when it takes several ELF objects and combine them all into one library. Linker will generate .got section and put it to some place with known offset from the library code (pc-relative, rip-relative). It writes instructions to program header, so the relative address is known and it is the only needed address to access own GOT.
When objdump is used with -r / -R flags, it will print information about relocations (static / dynamic) recorded in the ELF file or library; it can be combined with -d flag. lib1.o object had relocation here; no known offset to GOT, mov has all zero:
$ objdump -dr lib1.o
lib1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <shara_func>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # b <shara_func+0xb>
7: R_X86_64_REX_GOTPCRELX var-0x4
b: 48 8b 00 mov (%rax),%rax
e: 48 89 c6 mov %rax,%rsi
In library file this was converted to relative address by gcc -shared (it calls ld variant collect2 inside):
$ objdump -d liblib1.so
liblib1.so: file format elf64-x86-64
00000000000006d0 <shara_func>:
6d0: 55 push %rbp
6d1: 48 89 e5 mov %rsp,%rbp
6d4: 48 8b 05 fd 08 20 00 mov 0x2008fd(%rip),%rax # 200fd8 <_DYNAMIC+0x1c8>
And finally, there is dynamic relocation into GOT to put here actual address of var (done by rtld - ld-linux.so.2):
$ objdump -R liblib1.so
liblib1.so: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
...
0000000000200fd8 R_X86_64_GLOB_DAT var
Let's use your lib, adding executable with definition, compiling it and running with rtld debugging enabled:
$ cat main.c
long var;
int main(){
shara_func();
return 0;
}
$ gcc main.c -llib1 -L. -o main -Wl,-rpath=`pwd`
$ LD_DEBUG=all ./main 2>&1 |less
...
311: symbol=var; lookup in file=./main [0]
311: binding file /test3/liblib1.so [0] to ./main [0]: normal symbol `var'
So, linker was able to bind relocation for var to the "main" ELF file where it is defined:
$ gdb -q ./main
Reading symbols from ./main...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x4006da
(gdb) r
Starting program: /test3/main
Breakpoint 1, 0x00000000004006da in main ()
(gdb) disassemble shara_func
Dump of assembler code for function shara_func:
0x00007ffff7bd56d0 <+0>: push %rbp
0x00007ffff7bd56d1 <+1>: mov %rsp,%rbp
0x00007ffff7bd56d4 <+4>: mov 0x2008fd(%rip),%rax # 0x7ffff7dd5fd8
0x00007ffff7bd56db <+11>: mov (%rax),%rax
0x00007ffff7bd56de <+14>: mov %rax,%rsi
No changes in mov in your func. rax after func+4 is 0x601040, it is third mapping of ./main according to /proc/$pid/maps:
00601000-00602000 rw-p 00001000 08:07 6691394 /test3/main
And it was loaded from main after this program header (readelf -e ./main)
LOAD 0x0000000000000df0 0x0000000000600df0 0x0000000000600df0
0x0000000000000248 0x0000000000000258 RW 200000
It is part of .bss section:
[26] .bss NOBITS 0000000000601038 00001038
0000000000000010 0000000000000000 WA 0 0 8
After stepping to func+11, we can check value in GOT:
(gdb) b shara_func
(gdb) r
(gdb) si
0x00007ffff7bd56db in shara_func () from /test3/liblib1.so
1: x/i $pc
=> 0x7ffff7bd56db <shara_func+11>: mov (%rax),%rax
(gdb) p $rip+0x2008fd
$6 = (void (*)()) 0x7ffff7dd5fd8
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Who did write correct value to this GOT entry?
(gdb) watch *0x7ffff7dd5fd8
Hardware watchpoint 2: *0x7ffff7dd5fd8
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /test3/main
Hardware watchpoint 2: *0x7ffff7dd5fd8
Old value = <unreadable>
New value = 6295616
0x00007ffff7de36bf in elf_machine_rela (..) at ../sysdeps/x86_64/dl-machine.h:435
(gdb) bt
#0 0x00007ffff7de36bf in elf_machine_rela (...) at ../sysdeps/x86_64/dl-machine.h:435
#1 elf_dynamic_do_Rela (...) at do-rel.h:137
#2 _dl_relocate_object (...) at dl-reloc.c:258
#3 0x00007ffff7ddaf5b in dl_main (...) at rtld.c:2072
#4 0x00007ffff7df0462 in _dl_sysdep_start (start_argptr=start_argptr#entry=0x7fffffffde20,
dl_main=dl_main#entry=0x7ffff7dd89a0 <dl_main>) at ../elf/dl-sysdep.c:249
#5 0x00007ffff7ddbe7a in _dl_start_final (arg=0x7fffffffde20) at rtld.c:307
#6 _dl_start (arg=0x7fffffffde20) at rtld.c:413
#7 0x00007ffff7dd7cc8 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00601040 0x00000000
Runtime linker of glibc did (rtld.c), just before calling main - here is the source (bit different version) - http://code.metager.de/source/xref/gnu/glibc/sysdeps/x86_64/dl-machine.h
329 case R_X86_64_GLOB_DAT:
330 case R_X86_64_JUMP_SLOT:
331 *reloc_addr = value + reloc->r_addend;
332 break;
With reverse stepping we can get history of code and old value = 0:
(gdb) b _dl_relocate_object
(gdb) r
(gdb) dis 3
(gdb) target record-full
(gdb) c
(gdb) disp/i $pc
(gdb) rsi
(gdb) rsi
(gdb) rsi
(gdb) x/2x 0x7ffff7dd5fd8
0x7ffff7dd5fd8: 0x00000000 0x00000000
=> 0x7ffff7de36b8 <_dl_relocate_object+1560>: add 0x10(%rbx),%rax
=> 0x7ffff7de36bc <_dl_relocate_object+1564>: mov %rax,(%r10)
=> 0x7ffff7de36bf <_dl_relocate_object+1567>: nop

Can _start be the thumb function?

Help me please with gnu assembler for arm926ejs cpu.
I try to build a simple program(test.S):
.global _start
_start:
mov r0, #2
bx lr
and success build it:
arm-none-linux-gnueabi-as -mthumb -o test.o test.S
arm-none-linux-gnueabi-ld -o test test.o
but when I run the program in the arm target linux environment, I get an error:
./test
Segmentation fault
What am I doing wrong?
Can _start function be the thumb func?
or
It is always arm func?

Can _start be a thumb function (in a Linux user program)?
Yes it can. The steps are not as simple as you may believe.
Please use the .code 16 as described by others. Also look at ARM Script predicate; my answer shows how to detect a thumb binary. The entry symbol must have the traditional _start+1 value or Linux will decide to call your _start in ARM mode.
Also your code is trying to emulate,
int main(void) { return 2; }
The _start symbol must not do this (as per auselen). To do _start to main() in ARM mode you need,
#include <linux/unistd.h>
static inline void exit(int status)
{
asm volatile ("mov r0, %0\n\t"
"mov r7, %1\n\t"
"swi #7\n\t"
: : "r" (status),
"Ir" (__NR_exit)
: "r0", "r7");
}
/* Wrapper for main return code. */
void __attribute__ ((unused)) estart (int argc, char*argv[])
{
int rval = main(argc,argv);
exit(rval);
}
/* Setup arguments for estart [like main()]. */
void __attribute__ ((naked)) _start (void)
{
asm(" sub lr, lr, lr\n" /* Clear the link register. */
" ldr r0, [sp]\n" /* Get argc... */
" add r1, sp, #4\n" /* ... and argv ... */
" b estart\n" /* Let's go! */
);
}
It is good to clear the lr so that stack traces will terminate. You can avoid the argc and argv processing if you want. The start shows how to work with this. The estart is just a wrapper to convert the main() return code to an exit() call.
You need to convert the above assembler to Thumb equivalents. I would suggest using gcc inline assembler. You can convert to pure assembler source if you get inlines to work. However, doing this in 'C' source is probably more practical, unless you are trying to make a very minimal executable.
Helpful gcc arguements are,
-nostartfiles -static -nostdlib -isystem <path to linux user headers>
Add -mthumb and you should have a harness for either mode.

Your problem is you end with
bx lr
and you expect Linux to take over after that. That exact line must be the cause of Segmentation fault.
You can try to create a minimal executable then try to bisect it to see the guts and understand how an executable is expected to behave.
See below for a working example:
.global _start
.thumb_func
_start:
mov r0, #42
mov r7, #1
svc #0
compile with
arm-linux-gnueabihf-as start.s -o start.o && arm-linux-gnueabihf-ld
start.o -o start_test
and dump to see the guts
$ arm-linux-gnueabihf-readelf -a -W start_test
Now you should notice the odd address of _start
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x8055
Start of program headers: 52 (bytes into file)
Start of section headers: 160 (bytes into file)
Flags: 0x5000000, Version5 EABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00008054 000054 000006 00 AX 0 0 4
[ 2] .ARM.attributes ARM_ATTRIBUTES 00000000 00005a 000014 00 0 0 1
[ 3] .shstrtab STRTAB 00000000 00006e 000031 00 0 0 1
[ 4] .symtab SYMTAB 00000000 000190 0000e0 10 5 6 4
[ 5] .strtab STRTAB 00000000 000270 000058 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00008000 0x00008000 0x0005a 0x0005a R E 0x8000
Section to Segment mapping:
Segment Sections...
00 .text
There is no dynamic section in this file.
There are no relocations in this file.
There are no unwind sections in this file.
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00008054 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 FILE LOCAL DEFAULT ABS start.o
4: 00008054 0 NOTYPE LOCAL DEFAULT 1 $t
5: 00000000 0 FILE LOCAL DEFAULT ABS
6: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 _bss_end__
7: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 __bss_start__
8: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 __bss_end__
9: 00008055 0 FUNC GLOBAL DEFAULT 1 _start
10: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 __bss_start
11: 0001005c 0 NOTYPE GLOBAL DEFAULT 1 __end__
12: 0001005a 0 NOTYPE GLOBAL DEFAULT 1 _edata
13: 0001005c 0 NOTYPE GLOBAL DEFAULT 1 _end
No version information found in this file.
Attribute Section: aeabi
File Attributes
Tag_CPU_arch: v4T
Tag_THUMB_ISA_use: Thumb-1

here answer.
Thanks for all.
http://stuff.mit.edu/afs/sipb/project/egcs/src/egcs/gcc/config/arm/README-interworking
Calls via function pointers should use the BX instruction if the call is made in ARM mode:
.code 32
mov lr, pc
bx rX
This code sequence will not work in Thumb mode however, since the mov instruction will not set the bottom bit of the lr register. Instead a branch-and-link to the _call_via_rX functions should be used instead:
.code 16
bl _call_via_rX
where rX is replaced by the name of the register containing the function address.

memory access when writing a linux kernel module in assembler

i try writing a kernel module in assembler. In one time i needed a global vars. I define a dword in .data (or .bss) section, and in init function i try add 1 to var. My program seccesfully make, but insmod sey me:
$ sudo insmod ./test.ko
insmod: ERROR: could not insert module ./test.ko: Invalid module format
it's my assembler code in nasm:
[bits 64]
global init
global cleanup
extern printk
section .data
init_mess db "Hello!", 10, 0
g_var dd 0
section .text
init:
push rbp
mov rbp, rsp
inc dword [g_var]
mov rdi, init_mess
xor rax, rax
call printk
xor rax, rax
mov rsp, rbp
pop rbp
ret
cleanup:
xor rax, rax
ret
if i write adding in C code, all work good:
static i = 0;
static int __init main_init(void) { i++; return init(); }
But in this objdump -d test.ko write a very stainght code for me:
0000000000000000 <init_module>:
0: 55 push %rbp
1: ff 05 00 00 00 00 incl 0x0(%rip) # 7 <init_module+0x7>
7: 48 89 e5 mov %rsp,%rbp
a: e8 00 00 00 00 callq f <init_module+0xf>
f: 5d pop %rbp
10: c3 retq
What does this mean (incl 0x0(%rip))? How can I access memory? Please, help me :)
(My system is archlinux x86_64)
my C code for correct make a module:
#include <linux/module.h>
#include <linux/init.h>
MODULE_AUTHOR("Actics");
MODULE_DESCRIPTION("Description");
MODULE_LICENSE("GPL");
extern int init(void);
extern int cleanup(void);
static int __init main_init(void) { return init(); }
static void __exit main_cleanup(void) { cleanup(); }
module_init(main_init);
module_exit(main_cleanup);
and my Makefile:
obj-m := test.o
test-objs := inthan.o module.o
KVERSION = $(shell uname -r)
inthan.o: inthan.asm
nasm -f elf64 -o $# $^
build:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules

Kernel mode lives in the "negative" (ie. top) part of the address space, where 32 bit absolute addresses can not be used (because they are not sign-extended). As you have noticed, gcc uses rip-relative addresses to work around this problem which gives offsets from the current instruction pointer. You can make nasm do the same by using the DEFAULT REL directive. See the relevant section in the nasm documentation.

you can always use inline assembly
asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
54 : "=&r" (flag), "=r" (roksum) \
55 : "1" (addr), "g" ((long)(size)), \
56 "rm" (limit));

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string