Trying to load position independent code on cortex-m3

Trying to load position independent code on cortex-m3 - position

I have an embedded application which will have a bootloader which will decide to run 1 of two applications directly from internal flash. I am trying to make these apps position independent so that they both can be compiled for the same base address. There is no operating system, so no dynamic linker is available. So far I have tried building with -fpie option (using gcc) with not too much success. The function calls appear to be correct but the global data does not have the correct address. The locally defined global data seems to have it's address offset by the amount that the app is offset from its original base address. The global data which is declared in other files has a completely wrong address(and if I build with -fpic then both the locally declared global data and global data in other files are completely wrong). I suspect I need to do some manipulation of the GOT section when my app starts but I am not sure.

I finally got it working. It looks like I need to do the following:
All code needs to be complied with -fpic (previously I was trying -fpie)
Also I had my linker script needed modification. I was forcing the GOT into the sram section and it was located after the dynamic section which was in flash. Looks like everything works properly if the GOT section is located prior to the dynamic section in the flash. Not sure why this matters but it seemed to fix everything - prior to this it was as if the code did not locate the GOT properly since the GOT had the correct values stored in it but the address of all my variables were incorrect.

PIE (and PIC) code needs a relocation process after loading at some address (different from default) and before it will be runned. I suggest you consult the code of ld.so. Also, you should check the relocations table in your binary (e.g. using readelf -r).
Here is a good presentation on PIE (it is about OpenBSD, but the process is same). http://www.openbsd.org/papers/nycbsdcon08-pie/ or http://www.dcbsdcon.org/speakers/slides/miller_dcbsdcon2009.pdf
I guess you should not only to change a GOT, but to also to find all Relocations and to do them.
Basically, processing of PIE binary by ld.so is almost the same as processing a dynamic library with PIC, with relocating not a library, but the executable image itself.
The "Wrong addresses" you see is a place, where an actual value would be written by relocation solving. As for i386 http://books.google.com/books?id=Id9cYsIdjIwC&pg=PA174 there are relocations:
R_386_GOTPC
R_386_GOT32
R_386_GOTOFF
R_386_RELATIVE
Linker should resolve all of them before the code can access a global data.
Readelf -r sample:
Dynamically linked one
$ readelf -r fdyn
Relocation section '.rel.dyn' at offset 0x27c contains 1 entries:
Offset Info Type Sym.Value Sym. Name
08049ff0 00000106 R_386_GLOB_DAT 00000000 __gmon_start__
Relocation section '.rel.plt' at offset 0x284 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0804a000 00000107 R_386_JUMP_SLOT 00000000 __gmon_start__
0804a004 00000207 R_386_JUMP_SLOT 00000000 __libc_start_main
PIE:
$ readelf -r fPIE
Relocation section '.rel.dyn' at offset 0x388 contains 6 entries:
Offset Info Type Sym.Value Sym. Name
00001fe8 00000008 R_386_RELATIVE
00001ff0 00000008 R_386_RELATIVE
00002010 00000008 R_386_RELATIVE
00001fe0 00000106 R_386_GLOB_DAT 00000000 __gmon_start__
00001fe4 00000206 R_386_GLOB_DAT 00000000 _Jv_RegisterClasses
00001fec 00000406 R_386_GLOB_DAT 00000000 __cxa_finalize
Relocation section '.rel.plt' at offset 0x3b8 contains 3 entries:
Offset Info Type Sym.Value Sym. Name
00002000 00000107 R_386_JUMP_SLOT 00000000 __gmon_start__
00002004 00000307 R_386_JUMP_SLOT 00000000 __libc_start_main
00002008 00000407 R_386_JUMP_SLOT 00000000 __cxa_finalize

Related

implementation of elf packer for shared libraries on aarch64

I had implemented an elf packer for executable for aarch64.
Now, I would like to pack a shared library.
For executable binaries, the entry point is replaced with the address
of the loader. While the job of the loader is to decrypt the .text section
and finally jump to the decrypted .text section.
However, there is no entry point for shared libraries.
Where should my loader be placed ?

The elf loader can be triggered by patching the entry of .rela.dyn with the address of the loader.
https://github.com/0xN3utr0n/Noteme/blob/master/injection.c
This trick has been tested on x86_64 and x86_86.
But failed on aarch64.

Can I do `ret` instruction from code at _start in MacOS? Linux?

I am wondering if it is legal to return with ret from a program's entry point.
Example with NASM:
section .text
global _start
_start:
ret
; Linux: nasm -f elf64 foo.asm -o foo.o && ld foo.o
; OS X: nasm -f macho64 foo.asm -o foo.o && ld foo.o -lc -macosx_version_min 10.12.0 -e _start -o foo
ret pops a return address from the stack and jumps to it.
But are the top bytes of the stack a valid return address at the program entry point, or do I have to call exit?
Also, the program above does not segfault on OS X. Where does it return to?

MacOS Dynamic Executables
When you are using MacOS and link with:
ld foo.o -lc -macosx_version_min 10.12.0 -e _start -o foo
you are getting a dynamically loaded version of your code. _start isn't the true entry point, the dynamic loader is. The dynamic loader as one of its last steps does C/C++/Objective-C runtime initialization, and then calls your specified entry point specified with the -e option. The Apple documentation about Forking and Executing the Process has these paragraphs:
A Mach-O executable file contains a header consisting of a set of load commands. For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program. If you use Xcode, this is always /usr/lib/dyld, the standard OS X dynamic linker.
When you call the execve routine, the kernel first loads the specified program file and examines the mach_header structure at the start of the file. The kernel verifies that the file appear to be a valid Mach-O file and interprets the load commands stored in the header. The kernel then loads the dynamic linker specified by the load commands into memory and executes the dynamic linker on the program file.
The dynamic linker loads all the shared libraries that the main program links against (the dependent libraries) and binds enough of the symbols to start the program. It then calls the entry point function. At build time, the static linker adds the standard entry point function to the main executable file from the object file /usr/lib/crt1.o. This function sets up the runtime environment state for the kernel and calls static initializers for C++ objects, initializes the Objective-C runtime, and then calls the program’s main function
In your case that is _start. In this environment where you are creating a dynamically linked executable you can do a ret and have it return back to the code that called _start which does an exit system call for you. This is why it doesn't crash. If you review the generated object file with gobjdump -Dx foo you should get:
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000001 0000000000001fff 0000000000001fff 00000fff 2**0
CONTENTS, ALLOC, LOAD, CODE
SYMBOL TABLE:
0000000000001000 g 03 ABS 01 0010 __mh_execute_header
0000000000001fff g 0f SECT 01 0000 [.text] _start
0000000000000000 g 01 UND 00 0100 dyld_stub_binder
Disassembly of section .text:
0000000000001fff <_start>:
1fff: c3 retq
Notice that start address is 0. And the code at 0 is dyld_stub_binder. This is the dynamic loader stub that eventually sets up a C runtime environment and then calls your entry point _start. If you don't override the entry point it defaults to main.
MacOS Static Executables
If however you build as a static executable, there is no code executed before your entry point and ret should crash since there is no valid return address on the stack. In the documentation quoted above is this:
For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program.
A statically built executable doesn't use the dynamic loader dyld with crt1.o embedded in it. CRT = C runtime library which covers C++/Objective-C as well on MacOS. The processes of dealing with dynamic loading are not done, C/C++/Objective-C initialization code is not executed, and control is transferred directly to your entry point.
To build statically drop the -lc (or -lSystem) from the linker command and add -static option:
ld foo.o -macosx_version_min 10.12.0 -e _start -o foo -static
If you run this version it should produce a segmentation fault. gobjdump -Dx foo produces
start address 0x0000000000001fff
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000001 0000000000001fff 0000000000001fff 00000fff 2**0
CONTENTS, ALLOC, LOAD, CODE
1 LC_THREAD.x86_THREAD_STATE64.0 000000a8 0000000000000000 0000000000000000 00000198 2**0
CONTENTS
SYMBOL TABLE:
0000000000001000 g 03 ABS 01 0010 __mh_execute_header
0000000000001fff g 0f SECT 01 0000 [.text] _start
Disassembly of section .text:
0000000000001fff <_start>:
1fff: c3 retq
You should notice start_address is now 0x1fff. 0x1fff is the entry point you specified (_start). There is no dynamic loader stub as an intermediary.
Linux
Under Linux when you specify your own entry point it will segmentation fault whether you are building as a static or shared executable. There is good information on how ELF executables are run on Linux in this article and the dynamic linker documentation. The key point that should be observed is that the Linux one makes no mention of doing C/C++/Objective-C runtime initialisation unlike the MacOS dynamic linker documentation.
The key difference between the Linux dynamic loader (ld.so) and the MacOS one (dynld) is that the MacOS dynamic loader performs C/C++/Objective-C startup initialization by including the entry point from crt1.o. The code in crt1.o then transfers control to the entry point you specified with -e (default is main). In Linux the dynamic loader makes no assumption about the type of code that will be run. After the shared objects are processed and initialized control is transferred directly to the entry point.
Stack Layout at Process Creation
FreeBSD (on which MacOS is based) and Linux share one thing in common. When loading 64-bit executables the layout of the user stack when a process is created is the same. The stack for 32-bit processes is similar but pointers and data are 4 bytes wide, not 8.
Although there isn't a return address on the stack, there is other data representing the number of arguments, the arguments, environment variables, and other information. This layout is not the same as what the main function in C/C++ expects. It is part of the C startup code to convert the stack at process creation to something compatible with the C calling convention and the expectations of the function main (argc, argv, envp).
I wrote more information on this subject in this Stackoverflow answer that shows how a statically linked MacOS executable can traverse through the program arguments passed by the kernel at process creation.

Suplementing what Michael Petch already answered:
from runnable Mach-o executable perspective program launch happens either due to load command LC_MAIN (most modern executables since 10.7) which utilises DYLD in the process or backward compatible load command LC_UNIXTHREAD. The former is the variant where your ret is allowed and in fact preferable because you return control to DYLD __mh_execute_header. This will be followed by a buffer flush.
Alternatively to ret you can use system exit call either through undocumented syscall kernel API (64bit, int 0x80 for 32bit) or DYLD wrapper C lib doing it(documented).
If your executable is not utilising LC_MAIN you're left with legacy LC_UNIXTHREAD where you have no alternative to system exit call , ret will cause a segmentation fault.

How to list dependencies of c/c++ static library?

For a static library (.a file), how to list the module-level dependencies of it?
I know for a shared library (.so), we can use objdump or readelf to do this:
objdump -p test.so
or
readelf -d test.so
I can get something like
NEEDED libOne.so
NEEDED libc.so.6
But for a static library, I can only get the dependencies in symbol-level, for example, by running
objdump -T test.a
I will get some thing like:
00000000 DF UND 00000000 QByteArray::mid(int, int) const
00000000 DF UND 00000000 QUrl::fromEncoded(QByteArray const&)
00000000 DF UND 00000000 QFileInfo::fileName() const
But I need the information in module-level, does anyone know how to get that information?

A static library have no such list of dependencies.
A static library is nothing more than an archive of object files. And as object files doesn't know what libraries they depend on, neither can a static library.

#Some programmer dude is right, but what you can do is build a simple program using this library statically and then check with ldd -v what are the dependencies.

linux application get Killed

I have a "Seagate Central" NAS with an embedded linux on it
$ cat /etc/*release
MontaVista Linux 6, (.dev-snapshot-20130726)
When I try to run my own application on this NAS, it will be "Killed"
without any notifications on dmesg or /var/log/messages
$ cat /proc/cpuinfo
Processor : ARMv6-compatible processor rev 4 (v6l)
BogoMIPS : 279.34
Features : swp half thumb fastmult vfp edsp java
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xb02
CPU revision : 4
Hardware : Cavium Networks CNS3420 Validation Board
Revision : 0000
Serial : 0000000000000000
My toolchain is
Sourcery_CodeBench_Lite_for_ARM_GNU_Linux/arm-none-linux-gnueabi
and my compile switches are
-march=armv6k -mcpu=mpcore -mfloat-abi=softfp -mfpu=vfp
How can I find out which process is killing my application, or what setting I have to change?
PS: I have created a simple HelloWorld application which is also not working !
$ ldd Hello
$ not a dynamic executable
readelf -a Hello
=> http://pastebin.com/kT9FvkjE
readelf -a zip
=> http://pastebin.com/3V6kqA9b
UPDATE 1
I have comiled a new binary with hard float
Readelf output
http://pastebin.com/a87bKksY
But no success ;(
I guess it is really a "lock" topic, which is blocking my application to execute. How can I find out what application kills mine ?
Or how can I disable such kind of function ?

Use these compiler switches:
-march=armv6k -Wl,-z,max-page-size=0x10000,-z,common-page-size=0x10000,-Ttext-segment=0x10000
See also this link regarding the toolchain.
You can run readelf -a against one of the built-in binaries (e.g. /usr/bin/nano) to see the proper text-segment offset in the section headers and page size / alignment in the program headers. The above compiler flags make self-compiled programs match the structure of built in binaries, and have been tested to work. It seems the Seagate Central NAS uses a page size / offset of 0x10000 while the default for ARM gcc is 0x8000.
Edit: I see you ran readelf already. Your pastebin shows
HelloWorld:[ 1] .interp PROGBITS 00008134 000134 000013 00 A 0 0 1
zip:[ 1] .interp PROGBITS 00010134 000134 000013 00 A 0 0 1
The value 10134-134=10000 (hex) yields the correct text-segment linker option. Further down (LOAD...) are the alignment specifiers, which are 0x8000 for your HelloWorld, but 0x10000 for the zip built-in. In my experience, soft-float has not caused problems.

Do you see any output at all?
Is your application dynamically linked?
If so, run the dynamic linker with the verbose option (you'll have to figure out the name of the dynamic linker on your platform, for Arch linux, it is ldd):
ldd --verbose 'your_program_name'
That will tell you if you're missing any dependencies (shared libs etc)
Run readelf -a 'your_program_name'
Make sure the file mentioned in Requesting program interpreter: /lib/ld-linux.so.2 exists. In this case, that filename is /lib/ld-linux.so.2
If this fails to help you figure out the problem, post the complete output of ldd --verbose 'your_program_name' and readelf -a 'your_program_name' in your question.
Another issue may be that the NAS software just kills foreign programs. I'm not sure why it would, but we're talking about a big corporation here (Seagate) and they have odd ideas of how the world works at times...
Edit, after looking at the pastebin of readelf:
From what I see, your Hello executable differs in 2 ways from the zip executable:
It is not dynamically linked, so that throws out a whole load of problems to look for.
There's a difference in how the 2 programs are built. zip does not use softfloats and Hello does. I suspect the soft-float dependency is due to one or both of these compiler switches: -mfloat-abi=softfp -mfpu=vfp
Hello Flags: 0x5000202, has entry point, Version5 EABI, soft-float ABI
zip Flags: 0x5000002, has entry point, Version5 EABI
I'd start with either:
Removing the soft-float option from the Hello build or:
make sure the soft-float emulation libraries are on the machine. I don't know what libs this would depend on, but I do remember MontaVista supplying them the last time I touched their software. It's been 8+ years since I touched MontaVista so it's clouded in a bit of old-memory fog.

This is an old thread, but I just wanted to add that I succeeded in compiling a "hello world" for this old NAS today.
Running ld-linux.so.3 <app> told me that
ELF load command alignment not page-aligned
Googling this, I found this: https://github.com/JuliaLang/julia/issues/33293, which pointed me to linker-options:
-Wl,-z,max-page-size=0x10000
Compiling with these options yielded en ELF that actually did work!

Are you sure your compilation options are correct ?
Try the following :
strace your application (if strace is present on the NAS)
downloas one of the NAS binary and run arm-none-linux-gnueabi-readelf -a on it, do the same on your helloworld program and see if the abi tag differ.
It looks like an illegal instruction issue, a floating point issue or an incompatible libc issue.
Edit : according to readelf output, nas program are compiled without soft float, you should try that.

What's the memory before 0x08048000 used for in 32 bit machine?

In Linux, I learned that every process stores data starting at 0x08048000 in 32 bit machine (and 0x00400000 in 64 bit machine).
But I don't know the reason why starting from there. What's the memory before 0x08048000 used for?
Update:
Some people think it's mapped for the kernel. However as far as I know, Linux kernel uses the high-end memory starting after the user stack.

The answer is really: a bunch of things. There is no magical meaning to the load address of the executable and pretty much anything can be mapped to the lower addresses. Common examples including: C library (such as the C library), the dynamic loader ld.so and the kernel VDSO (kernel mapped dynamic code library that provides some of the interface to the kernel in x86 Linux). But you can pretty much map anything you desire there, using the mmap() system call.
For example on my specific machine the map is as follows (acquired but "cat /proc/self/maps"):
gby#watson:~$ cat /proc/self/maps
001c0000-00317000 r-xp 00000000 08:01 245836 /lib/libc-2.12.1.so
00317000-00318000 ---p 00157000 08:01 245836 /lib/libc-2.12.1.so
00318000-0031a000 r--p 00157000 08:01 245836 /lib/libc-2.12.1.so
0031a000-0031b000 rw-p 00159000 08:01 245836 /lib/libc-2.12.1.so
0031b000-0031e000 rw-p 00000000 00:00 0
00376000-00377000 r-xp 00000000 00:00 0 [vdso]
00852000-0086e000 r-xp 00000000 08:01 245783 /lib/ld-2.12.1.so
0086e000-0086f000 r--p 0001b000 08:01 245783 /lib/ld-2.12.1.so
0086f000-00870000 rw-p 0001c000 08:01 245783 /lib/ld-2.12.1.so
08048000-08051000 r-xp 00000000 08:01 2244617 /bin/cat
08051000-08052000 r--p 00008000 08:01 2244617 /bin/cat
08052000-08053000 rw-p 00009000 08:01 2244617 /bin/cat
09ab5000-09ad6000 rw-p 00000000 00:00 0 [heap]
b7502000-b7702000 r--p 00000000 08:01 4456455 /usr/lib/locale/locale-archive
b7702000-b7703000 rw-p 00000000 00:00 0
b771b000-b771c000 r--p 002a1000 08:01 4456455 /usr/lib/locale/locale-archive
b771c000-b771e000 rw-p 00000000 00:00 0
bfbd9000-bfbfa000 rw-p 00000000 00:00 0 [stack]

The start address for loading executable code is determined by the ELF headers for the executable. For example:
/bin/ls
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08049bb0
There's nothing stopping an executable from specifying a different load address; for whatever reason the default linker settings put it there. You could override with a custom linker script.
By default, on linux/x86, you won't see low addresses below 0x08000000 used for much; although the kernel may use it if requested in a mmap call, or if it runs out of room for mmaps. Additionally, there have been proposals to use addresses in the 0x00000000 - 0x01000000 range for library mappings, to make buffer overflows more difficult (by embedding a NUL byte to terminate strings).

The load address is arbitrary, but was standardized back with SYSV for x86. It's different for every architecture. What goes above and below is also arbitrary, and is often taken up by linked in libraries and mmap() regions.

Typical Load Map for x86 for 32 bit Small static app looks like :
Address Contents
0x08048000 code
0x08052000 data
0x0805A000 bss (zero data)
0x08072000 end of data (brk marker)
0xBFFFE000 stack
All this being in reverse - stack being on top and moving down while data moving up. My guess is the address before 0x08048000 are statically linked something similar to the MBR per OS.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string