How to control the order of sections in MSVC x86 compiled binaries? - visual-c++

For a very low-level application, I need the entry point to be the lowest (first) item in the binary.
To do this I've done the following:
Placed the entry point in "function_ordering.txt" and passed it to the /ORDER linker option.
Generated a map file, which shows that the following sections exist:
Preferred load address is 00040000
Start Length Name Class
0001:00000000 000210fdH .text$di CODE
0001:00021100 002d7b0cH .text$mn CODE
0001:002f8c10 00027874H .text$x CODE
0001:00320490 0000830fH .text$yd CODE
0002:00000000 000005d8H .idata$5 DATA
0002:000005d8 00000004H .CRT$XCA DATA
0002:000005dc 00000004H .CRT$XCAA DATA
0002:000005e0 00000004H .CRT$XCL DATA
0002:000005e4 00001554H .CRT$XCU DATA
0002:00001b38 00000004H .CRT$XCZ DATA
0002:00001b3c 00000004H .CRT$XIA DATA
0002:00001b40 00000004H .CRT$XIAA DATA
0002:00001b44 00000004H .CRT$XIC DATA
0002:00001b48 00000004H .CRT$XIY DATA
0002:00001b4c 00000004H .CRT$XIZ DATA
0002:00001b50 00057fb8H .rdata DATA
The map file also showed that WinMain was placed AFTER this data. So I figured I needed to put the function in its own section to make it "first".
So I put it in its own section:
#pragma code_seg("test")
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
return 0;
}
But this put the function even further away! This is because my test section appeared after the other sections.
So then I did this:
#pragma code_seg(".text$di")
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
return 0;
}
Which resulted in:
Address Publics by Value Rva+Base Lib:Object
0000:00000000 ___guard_fids_count 00000000 <absolute>
0000:00000000 ___guard_flags 00000000 <absolute>
0000:00000000 ___guard_fids_table 00000000 <absolute>
0000:00000000 __except_list 00000000 <absolute>
0000:00000ec7 ___safe_se_handler_count 00000ec7 <absolute>
0000:00009876 __ldused 00009876 <absolute>
0000:00009876 __fltused 00009876 <absolute>
0000:00000000 ___ImageBase 00040000 <linker-defined>
0001:00000000 _WinMain@16 00041000 f lol.obj
BOOM it worked. But this seems very much like a nasty hack.
So the question is.. is there a "supported" or "documented" way to get my section to appear first?

Related

x86 Linux ELF Loader Troubles

I'm trying to write an ELF executable loader for x86-64 Linux, similar to this, which was implemented on ARM. Chris Rossbach's advanced OS class includes a lab that does basically what I want to do. My goal is to load a simple (statically-linked) "hello world" type binary into my process's memory and run it without execveing. I have successfully mmap'd the ELF file, set up the stack, and jumped to the ELF's entry point (_start).
// put ELF file into memory. This is just one line of a complex
// for() loop that loads the binary from a file.
mmap((void*)program_header.p_vaddr, program_header.p_memsz, map, MAP_PRIVATE|MAP_FIXED, elffd, program_header.p_offset);
newstack = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); // Map a page for the stack
if((long)newstack < 0) {
fprintf(stderr, "ERROR: mmap returned error when allocating stack, %s\n", strerror(errno));
exit(1);
}
topstack = (unsigned long*)((unsigned char*)newstack+4096); // Top of new stack
*((unsigned long*)topstack-1) = 0; // Set up the stack
*((unsigned long*)topstack-2) = 0; // with argc, argv[], etc.
*((unsigned long*)topstack-3) = 0;
*((unsigned long*)topstack-4) = argv[1];
*((unsigned long*)topstack-5) = 1;
asm("mov %0,%%rsp\n" // Install new stack pointer
"xor %%rax, %%rax\n" // Zero registers
"xor %%rbx, %%rbx\n"
"xor %%rcx, %%rcx\n"
"xor %%rdx, %%rdx\n"
"xor %%rsi, %%rsi\n"
"xor %%rdi, %%rdi\n"
"xor %%r8, %%r8\n"
"xor %%r9, %%r9\n"
"xor %%r10, %%r10\n"
"xor %%r11, %%r11\n"
"xor %%r12, %%r12\n"
"xor %%r13, %%r13\n"
"xor %%r14, %%r14\n"
:
: "r"(topstack-5)
:"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11", "r12", "r13", "r14");
asm("push %%rax\n"
"pop %%rax\n"
:
:
: "rax");
asm("mov %0,%%rax\n" // Jump to the entry point of the loaded ELF file
"jmp *%%rax\n"
:
: "r"(jump_target)
: );
I then step through this code in gdb. I've pasted the first few instructions of the startup code below. Everything works great until the first push instruction (starred). The push causes a segfault.
0x60026000 xor %ebp,%ebp
0x60026002 mov %rdx,%r9
0x60026005 pop %rsi
0x60026006 mov %rsp,%rdx
0x60026009 and $0xfffffffffffffff0,%rsp
0x6002600d * push %rax
0x6002600e push %rsp
0x6002600f mov $0x605f4990,%r8
I have tried:
(1) Using the stack from the original process.
(2) mmapping a new stack (as in the above code). (1) and (2) both cause segfaults.
(3) Pushing and popping to/from the stack before jumping to the loaded ELF file. This does not cause a segfault.
(4) Changing the protection flags for the stack in the second mmap to PROT_READ | PROT_WRITE | PROT_EXEC. This doesn't make a difference.
I suspect this maybe has something to do with the segment descriptors (maybe?). It seems like the code from the ELF file that I'm loading does not have write access to the stack segment, no matter where it is located. I have not tried to modify the segment descriptor for the newly loaded binary or change the architectural segment registers. Is this necessary? Does anybody know how to fix this?
It turned out that when I was stepping through the loaded code in gdb, the debugger would consistently blow by the first push instruction when I typed nexti and instead continue execution. It was not in fact the push instruction that was causing the segfault but a much later instruction in the C library start code. The problem was caused by a failed call to mmap in the initial binary load that I didn't error check.
Regarding gdb randomly deciding to continue execution instead of stepping: this can be fixed by loading the symbols from the target executable after jumping to the newly loaded executable.
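Since the root cause was an unchecked mmap, here is a minimal sketch of a wrapper that makes such failures visible. The name checked_mmap is hypothetical (it is not part of the code above); the only facts it relies on are that mmap reports failure by returning MAP_FAILED and sets errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: same arguments as mmap, but returns NULL on failure
 * and prints the reason. mmap signals an error with MAP_FAILED, which is
 * neither NULL nor something to test with "< 0". */
static void *checked_mmap(void *addr, size_t len, int prot, int flags,
                          int fd, off_t offset)
{
    assert(len > 0); /* a zero-length mapping is always an error */

    void *p = mmap(addr, len, prot, flags, fd, offset);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap(%p, %zu) failed: %s\n",
                addr, len, strerror(errno));
        return NULL;
    }
    return p;
}
```

In the loader loop, every PT_LOAD mapping would go through a check like this, so a bad p_vaddr, length, or protection value stops the load immediately instead of crashing much later in the C library start-up code.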

Reading ELF header of loaded shared object during runtime

I wrote some code to search for a symbol in a shared library's ELF header. The code works if I parse the shared object file stored on my disk.
Now, I wanted to use this code to parse the ELF header of a loaded shared library. As an example the libdl library is mapped into the current process:
b7735000-b7738000 r-xp 00000000 08:01 315560 /lib/i386-linux-gnu/libdl.so.2
b7738000-b7739000 r--p 00002000 08:01 315560 /lib/i386-linux-gnu/libdl.so.2
b7739000-b773a000 rw-p 00003000 08:01 315560 /lib/i386-linux-gnu/libdl.so.2
The first mapping contains the ELF header. I tried to read this header and to extract the dlopen symbol in the .dynsym section. However, the header is slightly different from the one of the 'plain' .so file on the disk. For example the offset of the .shstrtab section is 0. Therefore, it is not possible to get the name of a section.
I wanted to ask why the ELF header is changed during loading of the library and where I can find the 'missing' sections. Is it even possible to parse the ELF header after the library was loaded?
Does anybody know any article explaining the layout of a shared library/its ELF header when it is mapped into a process?
Currently I'm using the following functions to iterate over the ELF header. If libdl_start points to the memory-mapped libdl.so.2 file, the code works fine. However, if it points to the region mapped by the linker, get_dynstr_section does not find the .dynstr section.
int get_libdl_functions()
{
Elf32_Ehdr *ehdr = libdl_start;
Elf32_Shdr *shdr, *shdrs_start = (Elf32_Shdr *)(((char *)ehdr) + ehdr->e_shoff);
Elf32_Sym *symbol, *symbols_start;
char *strtab = get_dynstr_section();
int sec_it = 0, sym_it = 0;
rt_info->dlopen = NULL;
rt_info->dlsym = NULL;
if(strtab == NULL)
return -1;
for(sec_it = 0; sec_it < ehdr->e_shnum; ++sec_it) {
// Iterate over all sections to find .dynsym
shdr = shdrs_start + sec_it;
if(shdr->sh_type == SHT_DYNSYM)
{
// Ok we found the right section
symbols_start = (Elf32_Sym *)(((char *)ehdr) + shdr->sh_offset);
for(sym_it = 0; sym_it < shdr->sh_size / sizeof(Elf32_Sym); ++sym_it) {
symbol = symbols_start + sym_it;
if(ELF32_ST_TYPE(symbol->st_info) != STT_FUNC)
continue;
// strncmp returns 0 on a match, so compare against 0 explicitly
if(!rt_info->dlopen && strncmp(strtab + symbol->st_name, DL_OPEN_NAME, sizeof DL_OPEN_NAME) == 0) {
//printf("Offset of dlopen: 0x%x\n", symbol->st_value);
rt_info->dlopen = ((char *)ehdr) + symbol->st_value;
} else if(!rt_info->dlsym && strncmp(strtab + symbol->st_name, DL_SYM_NAME, sizeof DL_SYM_NAME) == 0) {
//printf("Offset of dlsym: 0x%x\n", symbol->st_value);
rt_info->dlsym = ((char *)ehdr) + symbol->st_value;
}
if(rt_info->dlopen != NULL && rt_info->dlsym != NULL)
return 0;
}
}
}
return -1;
}
void *get_dynstr_section()
{
Elf32_Ehdr *ehdr = libdl_start;
Elf32_Shdr *shdr, *shdrs_start = (Elf32_Shdr *)(((char *)ehdr) + ehdr->e_shoff);
char *strtab = ((char *)ehdr) + ((shdrs_start + ehdr->e_shstrndx))->sh_offset;
int sec_it = 0;
for(sec_it = 0; sec_it < ehdr->e_shnum; ++sec_it) {
// Iterate over all sections to find .dynstr section
shdr = shdrs_start + sec_it;
if(shdr->sh_type == SHT_STRTAB && strncmp(strtab + shdr->sh_name, DYNSTR_NAME, sizeof DYNSTR_NAME) == 0)
return ((char *)ehdr) + shdr->sh_offset;
}
return NULL;
}
You do NOT need to mmap the shared library again (the system already did it), but you cannot rely on the section headers. The section headers are only for the linking view of an ELF file and often aren't allocated into a program segment. You will need to look at it from the execution view.
The section .dynstr is always loaded into memory; otherwise dynamic linking wouldn't work. To get at it, go through the program headers to find the PT_DYNAMIC segment. It will have elements DT_SYMTAB and DT_STRTAB that correspond to .dynsym and .dynstr.
You may also have to adjust the address values using a base address. It's very common, especially with ASLR, for shared objects to be mapped at different virtual addresses than they were linked at. You can find this adjustment amount by subtracting the lowest virtual address in a PT_LOAD entry from the lowest mapped segment in the memory map. Or even better use the link map maintained by ld.so. It contains the base address, the path of the shared object, and a pointer to the shared object's dynamic area. Consult for how this is laid out.
If you are running Linux, you might be very interested in the function dl_iterate_phdr(). It's great for finding things about the libraries mapped into the current process image. If you want to examine another process you have to roll your own.
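The execution-view walk described above can be sketched with dl_iterate_phdr(), which glibc declares in <link.h>. The function names show_dynamic_tables and list_dynamic_tables are made up for this illustration. One assumption to flag: the sketch prints the DT_SYMTAB and DT_STRTAB values exactly as stored in memory, and whether the dynamic loader has already added the base address to them differs between platforms, so check before dereferencing them.

```c
#define _GNU_SOURCE /* dl_iterate_phdr() is a GNU extension */
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Callback invoked once per loaded object; `data` counts the objects seen. */
static int show_dynamic_tables(struct dl_phdr_info *info, size_t size, void *data)
{
    int *objects_seen = data;
    (void)size;
    assert(info != NULL);
    ++*objects_seen;

    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_DYNAMIC)
            continue;

        /* p_vaddr is the link-time address; dlpi_addr is the load bias. */
        const ElfW(Dyn) *dyn =
            (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        ElfW(Addr) symtab = 0, strtab = 0;
        for (; dyn->d_tag != DT_NULL; ++dyn) {
            if (dyn->d_tag == DT_SYMTAB)
                symtab = dyn->d_un.d_ptr; /* corresponds to .dynsym */
            else if (dyn->d_tag == DT_STRTAB)
                strtab = dyn->d_un.d_ptr; /* corresponds to .dynstr */
        }
        printf("%s: base=%p DT_SYMTAB=%p DT_STRTAB=%p\n",
               (info->dlpi_name && info->dlpi_name[0]) ? info->dlpi_name
                                                       : "(main program)",
               (void *)(uintptr_t)info->dlpi_addr,
               (void *)(uintptr_t)symtab, (void *)(uintptr_t)strtab);
    }
    return 0; /* returning 0 keeps the iteration going */
}

/* Prints one line per object with a PT_DYNAMIC segment; returns the
 * number of objects visited. */
static int list_dynamic_tables(void)
{
    int objects_seen = 0;
    dl_iterate_phdr(show_dynamic_tables, &objects_seen);
    return objects_seen;
}
```

No section header is consulted anywhere, which is why this keeps working on a library that is only present as a loaded image.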
why the ELF header is changed during loading of the library
It isn't. Your question is based on a false assumption, but since you didn't show any actual code, it's hard to guess what you've done wrong.
Update:
In this code:
*shdrs_start = (Elf32_Shdr *)(((char *)ehdr) + ehdr->e_shoff);
you assume that section headers are loaded into memory. But section headers are not required at runtime, and if they end up loaded into memory, it's only by accident.
You need to read them into memory from disk (or mmap them) yourself, using the e_shoff you got from ehdr.
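A rough sketch of reading the section headers from the file on disk, as suggested above. The helper name read_section_headers is hypothetical, and the sketch assumes the file has the same ELF class as the running process (it uses the ElfW() macro from <link.h>) and fewer than 0xff00 sections, so the extended-numbering case where e_shnum is 0 is not handled.

```c
#include <assert.h>
#include <link.h>   /* ElfW() and the Elf32/Elf64 types */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the section header table of `path` and stores
 * the number of entries in *count; returns NULL (and *count == 0) on any
 * error. The caller frees the result. */
static ElfW(Shdr) *read_section_headers(const char *path, size_t *count)
{
    assert(path != NULL && count != NULL);
    *count = 0;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    ElfW(Ehdr) ehdr;
    ElfW(Shdr) *shdrs = NULL;
    if (fread(&ehdr, sizeof ehdr, 1, f) == 1 &&
        memcmp(ehdr.e_ident, ELFMAG, SELFMAG) == 0 &&
        ehdr.e_shnum > 0 &&
        ehdr.e_shentsize == sizeof(ElfW(Shdr)) &&
        fseek(f, (long)ehdr.e_shoff, SEEK_SET) == 0) {
        shdrs = malloc((size_t)ehdr.e_shnum * sizeof *shdrs);
        if (shdrs != NULL &&
            fread(shdrs, sizeof *shdrs, ehdr.e_shnum, f) != ehdr.e_shnum) {
            free(shdrs); /* short read: the table is truncated */
            shdrs = NULL;
        }
        if (shdrs != NULL)
            *count = ehdr.e_shnum;
    }
    fclose(f);
    return shdrs;
}
```

The path to pass in can be taken from the memory map shown in the question (for example /lib/i386-linux-gnu/libdl.so.2); the sh_offset values in the returned table are file offsets, not addresses in the loaded image.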

How can I select a static library to be linked while ARM cross compiling?

I have an ARM cross compiler in Ubuntu (arm-linux-gnueabi-gcc) and the default architecture is ARMv7. However, I want to compile an ARMv5 binary. I do this by giving the compiler the -march=armv5te option.
So far, so good. Since my ARM system uses BusyBox, I have to compile my binary statically linked. So I give gcc the -static option.
However, I have a problem with libc.a which the linker links to my ARMv5 binary. This file is compiled with the ARMv7 architecture option. So, even if I cross-compile my ARM binary with ARMv5, I can't run it on my BusyBox based ARMv5 box.
How can I solve this problem?
Where can I get the ARMv5 libc.a static library, and how can I link it?
Thank you in advance.
You have two choices,
Get the right compiler.
Write your own 'C' Library.
Get the right compiler.
You are always safest to have a compiler match your system. This applies to x86 Linux and various distributions. You are lucky if different compilers work. It is more difficult when you cross-compile as often the compiler will not be automatically synced. Try to run a program on a 1999 x86 Mandrake Linux compiled on your 2014 Ubuntu system.
As well as instruction compatibility (which you have identified), there are ABI and OS dependencies. Specifically, the armv7 is most likely hardfloat (has a hardware FPU and a register call convention for floating point) and you need softfloat (emulated FPU). The specific glibc (or uClibc) has specific calls and expectations of the Linux OS. For instance, the way threads work has changed over time.
Write your own
You can always use -fno-builtin and -ffreestanding as well as -static. Then you cannot use any libc functions, but you can program them yourself.
There are external sources, like Mark Martinec's snprintf, and building blocks like write(), which is easy to implement,
#define _SYS_IOCTL_H 1
#include <linux/unistd.h>
#include <linux/ioctl.h>
static inline int write(int fd, void *buf, int len)
{
int rval;
asm volatile ("mov r0, %1\n\t"
"mov r1, %2\n\t"
"mov r2, %3\n\t"
"mov r7, %4\n\t"
"swi #0\n\t"
"mov %0, r0\n\t"
: "=r" (rval)
: "r" (fd),
"r" (buf),
"r" (len),
"Ir" (__NR_write)
: "r0", "r1", "r2", "r7");
return rval;
}
static inline void exit(int status)
{
asm volatile ("mov r0, %0\n\t"
"mov r7, %1\n\t"
"swi #0\n\t"
: : "r" (status),
"Ir" (__NR_exit)
: "r0", "r7");
}
You also have to provide the start-up machinery that the 'C' library normally takes care of,
/* Called from assembler startup. */
int main (int argc, char*argv[])
{
write(1 /* stdout */, "Hello world\n", sizeof("Hello world\n") - 1); /* -1: do not write the trailing NUL */
return 0;
}
/* Wrapper for main return code. */
void __attribute__ ((unused)) estart (int argc, char*argv[])
{
int rval = main(argc,argv);
exit(rval);
}
/* Setup arguments for estart [like main()]. */
void __attribute__ ((naked)) _start (void)
{
asm(" sub lr, lr, lr\n" /* Clear the link register. */
" ldr r0, [sp]\n" /* Get argc... */
" add r1, sp, #4\n" /* ... and argv ... */
" b estart\n" /* Let's go! */
);
}
If this is too daunting, because you need to implement a lot of functionality, then you can try to get the source of various libraries and rebuild it with -fno-builtin, making sure that the result does not get linked with the Ubuntu libraries, which are incompatible.
Projects like crosstool-ng can allow you to build a correct compiler (maybe with more advanced code generation) that suits the armv5 system exactly. This may seem like a pain, but the alternatives above aren't easy either.

numa_police_memory

I'm debugging NUMACTL on a MIPS machine. In the numa_police_memory() API, we have:
void numa_police_memory(void *mem, size_t size)
{
int pagesize = numa_pagesize_int();
unsigned long i;
for (i = 0; i < size; i += pagesize)
asm volatile("" :: "r" (((volatile unsigned char *)mem)[i]));
}
It seems "asm volatile("" :: "r" (((volatile unsigned char *)mem)[i]));" is used to touch the virtual memory so that all the memory obtained by a previous mmap gets allocated on some specific physical memory. But how does this asm code work? I can't read assembly language! Why is the first double quote empty???
Thanks
Interestingly, there is no assembly code in this snippet at all, though the asm statement is used. It contains a blank assembly "program", an empty list of outputs, and a list of inputs. The input specification forces ((volatile unsigned char *)mem)[i] to be in a register. So all this bit of magic will do is generate a load of the first byte of each page (pre-fault the pages).
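To make the trick concrete, here is a small stand-alone variant. It is a sketch, not the libnuma source: touch_pages is a made-up name, it asks sysconf() for the page size instead of calling numa_pagesize_int(), and it returns the number of loads it issued so the effect can be observed. It needs GCC or Clang for the extended asm syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Reads one byte from every page in [mem, mem + size) and returns how many
 * pages were touched. The empty asm template emits no instructions; its
 * "r" input operand makes the compiler load the byte into a register,
 * and `volatile` keeps that load from being optimized away. */
static size_t touch_pages(const void *mem, size_t size)
{
    assert(mem != NULL);

    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    for (size_t i = 0; i < size; i += pagesize) {
        __asm__ volatile("" :: "r"(((const volatile unsigned char *)mem)[i]));
        ++touched;
    }
    return touched;
}
```

Nothing is ever written, so the contents of the buffer stay unchanged; the only side effect is the page fault that each first access triggers.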

Reading x86 MSR from kernel module

My main aim is to get the address values of the last 16 branches maintained by the LBR registers when a program crashes. I tried two ways till now -
1) msr-tools
This allows me to read the MSR values from the command line. I make system calls to it from the C program itself and try to read the values. But the register values seem unrelated to the addresses in the program itself. Most probably the registers are getting polluted by other branches in system code. I tried turning off recording of branches in ring 0 and far jumps, but that doesn't help. I'm still getting unrelated values.
2) accessing through kernel module
Ok I wrote a very simple module (I've never done this before) to access the msr registers directly and possibly avoid register pollution.
Here's what I have -
#define LBR 0x1d9 //IA32_DEBUGCTL MSR
//I first set this to some non 0 value using wrmsr (msr-tools)
static void __init do_rdmsr(unsigned msr, unsigned unused2)
{
uint64_t msr_value;
__asm__ __volatile__ (" rdmsr"
: "=A" (msr_value)
: "c" (msr)
);
printk(KERN_EMERG "%lu \n",msr_value);
}
static int hello_init(void)
{
printk(KERN_EMERG "Value is ");
do_rdmsr (LBR,0);
return 0;
}
static void hello_exit(void)
{
printk(KERN_EMERG "End\n");
}
module_init(hello_init);
module_exit(hello_exit);
But the problem is that every time I use dmesg to read the output I get just
Value is 0
(I have tried for other registers - it always comes as 0)
Is there something that I am forgetting here?
Any help? Thanks
Use the following:
unsigned long long x86_get_msr(int msr)
{
unsigned long msrl = 0, msrh = 0;
/* NOTE: rdmsr always returns its value in the EDX:EAX pair */
asm volatile ("rdmsr" : "=a"(msrl), "=d"(msrh) : "c"(msr));
return ((unsigned long long)msrh << 32) | msrl;
}
You can use Ilya Matveychikov's answer... or... OR :
#include <asm/msr.h>
int err;
unsigned int msr, cpu;
unsigned long long val;
/* rdmsr without exception handling */
rdmsrl(msr, val); /* macro: stores the result in val */
/* rdmsr with exception handling */
err = rdmsrl_safe(msr, &val);
/* rdmsr on a given CPU (instead of current one) */
err = rdmsrl_safe_on_cpu(cpu, msr, &val);
And there are many more functions, such as :
int msr_set_bit(u32 msr, u8 bit)
int msr_clear_bit(u32 msr, u8 bit)
void rdmsr_on_cpus(const struct cpumask *mask, u32 msr_no, struct msr *msrs)
int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
Have a look at /lib/modules/<uname -r>/build/arch/x86/include/asm/msr.h
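For comparison with the msr-tools route tried in the question, this is roughly how a user-space program can read an MSR through the msr character device. It is a sketch under assumptions: read_msr_user is a made-up name, the msr kernel module must be loaded so that /dev/cpu/N/msr exists, and the process needs sufficient privileges (typically root). The MSR number is passed as the file offset and the value comes back as one 64-bit word.

```c
#define _FILE_OFFSET_BITS 64 /* MSR numbers above 0x7fffffff need a 64-bit off_t */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Reads MSR `msr` on logical CPU `cpu` via /dev/cpu/<cpu>/msr.
 * Returns 0 and fills *value on success, -1 if the device cannot be
 * opened or the read fails. */
static int read_msr_user(unsigned cpu, uint32_t msr, uint64_t *value)
{
    char path[64];
    assert(value != NULL);

    snprintf(path, sizeof path, "/dev/cpu/%u/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

A call such as read_msr_user(0, 0x1d9, &v) would read IA32_DEBUGCTL on CPU 0. Because the read targets an explicit CPU, it avoids one source of confusion with per-core registers like the LBR stack, but the values can still be overwritten by whatever code runs on that core between the crash and the read.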
