Short version of question: What parameter do I need to pass to the clone system call on an x86_64 Linux system if I want to allocate a new TLS area for the thread that I am creating?
Long version:
I am working on a research project and for something I am experimenting with I want to create threads using the clone system call instead of using pthread_create. However, I also want to be able to use thread local storage. I don't plan on creating many threads right now, so it would be fine for me to create a new TLS area for each thread that I create with the clone system call.
I was looking at the man page for clone and it has the following information about the flag for the TLS parameter:
CLONE_SETTLS (since Linux 2.5.32)
The newtls argument is the new TLS (Thread Local Storage) descriptor.
(See set_thread_area(2).)
So I looked at the man page for set_thread_area and noticed the following which looked promising:
When set_thread_area() is passed an entry_number of -1, it uses a
free TLS entry. If set_thread_area() finds a free TLS entry, the value of
u_info->entry_number is set upon return to show which entry was changed.
However, after experimenting with this some, it appears that set_thread_area is not implemented on my system (Ubuntu 10.04 on an x86_64 platform). When I run the following code I get an error that says: set_thread_area() failed: Function not implemented
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/unistd.h>
#include <asm/ldt.h>

int main()
{
    struct user_desc u_info;
    u_info.entry_number = -1;

    int rc = syscall(SYS_set_thread_area, &u_info);
    if (rc < 0) {
        perror("set_thread_area() failed");
        exit(-1);
    }
    printf("entry_number is %d\n", u_info.entry_number);
    return 0;
}
I also used strace to see what happens when pthread_create is called, and I don't see any calls to set_thread_area. I have also been looking at the nptl pthread source code to try to understand what it does when creating threads. But I don't completely understand it yet, and I think it is more complex than what I'm trying to do, since I don't need something as robust as the pthread implementation. I'm assuming that the set_thread_area system call is for x86 and that there is a different mechanism used for x86_64. But for the moment I have not been able to figure out what it is, so I'm hoping this question will help me get some ideas about what I need to look at.
I am working on a research project and for something I am experimenting with I want to create threads using the clone system call instead of using pthread_create
In the exceedingly unlikely scenario where your new thread never calls any libc functions (either directly, or by calling something else which calls libc; this also includes dynamic symbol resolution via the PLT), you can pass whatever TLS storage you desire as the new_tls parameter to clone.
You should ignore all references to set_thread_area -- it only applies to the 32-bit/ix86 case.
If you are planning to use libc in your newly-created thread, you should abandon your approach: libc expects TLS to be set up a certain way, and there is no way for you to arrange for such setup when you call clone directly. Your new thread will intermittently crash when libc discovers that you didn't set up TLS properly. Debugging such crashes is exceedingly difficult, and the only reliable solution is ... to use pthread_create.
The other answer is absolutely correct in that setting up a thread outside of libc's control is guaranteed to cause trouble at a certain point. You can do it, but you can no longer rely on libc's services, definitely not on any of the pthread_* functions or thread-local variables (defined as such using __thread or thread_local).
That being said, you can set one of the segment registers used for TLS (GS and FS) even on x86-64. The system call to look for is arch_prctl(ARCH_SET_GS, ...).
You can see an example comparing setting up TLS registers on i386 and x86-64 in this piece of code.
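To make that concrete, here is a minimal sketch of my own (not taken from the linked code): it points the otherwise unused GS base at a block whose layout is made up for illustration, using arch_prctl through syscall(2) since glibc provides no wrapper, and then reads back through %gs. On x86_64, glibc keeps its own TLS behind FS, which is why GS is free for this kind of experiment; passing CLONE_SETTLS to clone has the kernel do the equivalent of arch_prctl(ARCH_SET_FS, newtls) for the new thread.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/prctl.h>          /* ARCH_SET_GS, ARCH_GET_GS */

/* A made-up "TLS block"; the layout here is entirely ours. */
static struct { long value; } my_block = { 42 };

int main(void)
{
    /* glibc has no arch_prctl wrapper, so go through syscall(2). */
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, &my_block) != 0) {
        perror("arch_prctl(ARCH_SET_GS)");
        return 1;
    }

    long v;
    /* Load my_block.value through a GS-relative access at offset 0. */
    __asm__ volatile ("movq %%gs:0, %0" : "=r" (v));
    printf("read %ld via %%gs\n", v);
    return 0;
}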
The mmap() docs mention the flag MAP_UNINITIALIZED, but the flag doesn't seem to be defined.
I tried this on CentOS 7 and Xenial; neither distro has the flag defined in sys/mman.h as claimed.
Astonishingly, the internet doesn't seem to be aware of this. What's the story?
Edit: I understand from the docs that the flag is only honoured on embedded or low-security devices, but that doesn't mean the flag shouldn't be defined... How do you use it in portable code? Google has revealed code where it is defined as 0 when it is not supported, except in my case it's not defined at all.
In order to understand what to do about the fact that #include <sys/mman.h> does not define MAP_UNINITIALIZED, it is helpful to understand how the interface to the kernel is defined.
To build a kernel module, you will need the kernel headers used to build the kernel for the exact version of the kernel for which you wish to build the module. As you wish to run in userspace, you won't need these.
The headers that define the kernel API for userspace are largely in /usr/include/linux and /usr/include/asm (see this for how they are generated). One of the more important consumers of these headers is the C standard library, e.g., glibc, which must be built against some version of these headers. Since the linux kernel API is backwards compatible, you may have a glibc (or other library implementation) built against an older version of these headers than the kernel you are running. I'm by no means an expert on how all the various distros distribute glibc, but it is my impression that the kernel headers defining its userspace API are generally the version that glibc has been built against.
Finally, glibc defines its API through headers also installed under /usr/include, such as /usr/include/sys. I don't know exactly what, if any, backward or forward compatibility is provided for applications built with older or newer glibc headers, but I'm guessing that the library .so version number gets bumped when backward compatibility would be broken.
So now we can understand your problem to be that the glibc headers don't actually define MAP_UNINITIALIZED for the distros/versions that you tried.
However, the linux kernel API has exposed MAP_UNINITIALIZED, as this patch demonstrates. If the glibc headers don't define it for you, you can use the linux kernel API headers and #include <linux/mman.h> if this defines it. Note that you will still need to #include <sys/mman.h> in order to get the prototype for mmap, among other things.
If your linux kernel API headers don't define MAP_UNINITIALIZED but you have a kernel version that implements it, you can define it yourself:
#define MAP_UNINITIALIZED 0x4000000
You don't have to worry that you are effectively using "newer" headers than your glibc was built with, because the glibc implementation of mmap is very thin:
#include <sys/types.h>
#include <sys/mman.h>
#include <errno.h>
#include <sysdep.h>
#ifndef MMAP_PAGE_SHIFT
#define MMAP_PAGE_SHIFT 12
#endif
__ptr_t
__mmap (__ptr_t addr, size_t len, int prot, int flags, int fd, off_t offset)
{
  if (offset & ((1 << MMAP_PAGE_SHIFT) - 1))
    {
      __set_errno (EINVAL);
      return MAP_FAILED;
    }
  return (__ptr_t) INLINE_SYSCALL (mmap2, 6, addr, len, prot, flags, fd,
                                   offset >> MMAP_PAGE_SHIFT);
}
weak_alias (__mmap, mmap)
It is just passing your flags straight through to the kernel.
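Putting the pieces together, a minimal sketch might look like the following. The fallback value mirrors the define above, and (per the man page excerpt quoted below) the flag is only acted on by kernels built with CONFIG_MMAP_ALLOW_UNINITIALIZED, so requesting it elsewhere should simply have no effect:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>       /* mmap prototype and the MAP_* flags glibc knows about */

/* Fallback if neither sys/mman.h nor linux/mman.h defines it; the value
   comes from the kernel patch mentioned above. */
#ifndef MAP_UNINITIALIZED
#define MAP_UNINITIALIZED 0x4000000
#endif

int main(void)
{
    size_t len = 1 << 20;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mapped %zu bytes at %p\n", len, p);
    return 0;
}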
The kernel normally needs to clear the memory, to protect the privacy of both kernel space and other processes' memory.
The man page continues:
This flag is honored only if the kernel was configured with the CONFIG_MMAP_ALLOW_UNINITIALIZED option. Because of the security implications, that option is normally enabled only on embedded devices (i.e., devices where one has complete control of the contents of user memory).
The ELF Handling For Thread-Local Storage document gives assembly sequences for the various models (local exec/initial exec/general dynamic) for various architectures. But not ARM -- is there anywhere I can see such code sequences for ARM? I'm working on a compiler and want to generate code that will operate properly with the platform linkers (both program and dynamic).
For clarity, let's assume an ARMv7 CPU and a pretty new kernel and glibc (say 3.13+ / 2.19+), but I'd also be interested in what has to change for older hw/sw if that's easy to explain.
I don't exactly understand what you want. However, the assembler sequences (for ARMv6+ and a capable kernel) are,
mrc p15, 0, rX, c13, c0, 2 @ get the user r/w register
This is called TPIDRURW in some ARM manuals. Your TLS tables/structure must be parented from this value (probably a pointer). Using the mrc is faster, but you can also call the kernel helper (see below) if you don't set HWCAP_TLS in your ELF; the helper works on all ARM CPUs supported by Linux.
The intent of the address 0xffff0fe8 seems to be that you can use those 4 bytes instead of using the above assembler directly (with rX == r0), as maybe it is different for some machine somewhere.
Which mechanism applies depends on the CPU type. There is a helper in the vector page at 0xffff0fe0 in entry-armv.S; the value lives in the process/thread structure if the hardware doesn't support it. Documentation is in kernel_user_helpers.txt.
Usage example:
#include <stdio.h>

typedef void * (__kuser_get_tls_t)(void);
#define __kuser_get_tls (*(__kuser_get_tls_t *)0xffff0fe0)

void foo()
{
    void *tls = __kuser_get_tls();
    printf("TLS = %p\n", tls);
}
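If the hardware register is known to be present (ARMv6K/ARMv7 with a kernel that sets HWCAP_TLS), you can skip the helper and issue the mrc yourself. Here is a minimal sketch using GCC-style inline asm (the function name is mine); on CPUs without the register this instruction may fault or return garbage, which is exactly the case the 0xffff0fe0 helper papers over:

#include <stdio.h>

/* Read the user read/write thread register (TPIDRURW) directly. */
static void *get_tls_direct(void)
{
    void *tp;
    __asm__ ("mrc p15, 0, %0, c13, c0, 2" : "=r" (tp));
    return tp;
}

int main(void)
{
    printf("TPIDRURW = %p\n", get_tls_direct());
    return 0;
}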
You do a syscall to set the TLS stuff. clone is a way to set up a thread context. The thread_info holds all the registers for a thread; it may share an mm (memory management, or process memory view) with other task_structs. I.e., the thread_info has a tp_value for each created thread.
Here is a discussion of the ARM implementation. ELF/nptl/glibc and the Linux kernel are all involved (and/or search terms to investigate more). The syscall for get_tls() was probably too expensive, and the current mainline has a vector page helper (mapped by all threads/processes).
Some glibc source: tls-macros.h, tlsdesc.c, etc. Most likely a full/concise answer will depend on the version of:
Your ARM CPU.
Your Linux kernel.
Your glibc.
Your compiler (and flags!).
I'm currently working on the back-end of ACM-like public programming contest system. In such system, any user can submit a code source, which will be compiled and run automatically (which means, no human-eye pre-moderation is performed) in attempt to solve some computational problem.
The back-end is a GNU/Linux dedicated machine, where a user will be created for each contestant, all such users being part of a users group. Sources sent by any particular user will be stored in that user's home directory, then compiled and executed to be verified against various test cases.
What I want is to prohibit usage of Linux system calls for the sources. That's because problems require platform-independent solutions, while enabling system calls for insecure source is a potential security breach. Such sources may be successfully placed in the FS, even compiled, but never run. I also want to be notified whenever source containing system calls was sent.
By now, I see the following places where such checker may be placed:
Front-end/pre-compilation analysis - source already checked in the system, but not yet compiled. Simple text checker against system calls names. Platform-dependent, compiler-independent, language-dependent solution.
Compiler patch - crash GCC (or any other compiler included in the tool-chain) whenever a system call is encountered. Platform-dependent, compiler-dependent, language-independent solution (if we place the checker "far enough"). Compatibility may also be lost. In fact, I dislike this alternative most.
Run-time checker - whenever a system call is invoked from the process, terminate this process and report. This solution is compiler and language independent, but depends on the platform - I'm OK with that, since I will deploy the back-end on similar platforms in the short and mid term.
So the question is: does GNU/Linux provide a way for an administrator to prohibit system call usage for a user group, user or particular process? It may be a security policy or a lightweight GNU utility.
I tried to Google, but Google disliked me today.
Mode 1 seccomp allows a process to limit itself to exactly four syscalls: read, write, sigreturn, and _exit. This can be used to severely sandbox code, as seccomp-nurse does.
Mode 2 seccomp (at the time of writing, found in Ubuntu 12.04, or patch your own kernel) provides more flexibility in filtering syscalls. You can, for example, first set up filters, then exec the program under test. Appropriate use of chroot or unshare can be used to prevent it from re-execing anything else "interesting".
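As a concrete illustration of the mode 2 approach, a launcher along the following lines installs a whitelist filter and then execs the submission. This is a sketch of my own, assuming headers new enough to provide linux/seccomp.h and PR_SET_NO_NEW_PRIVS; the whitelist is purely illustrative and architecture-dependent, and a real filter should also check seccomp_data.arch before trusting the syscall number:

#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define ALLOW(name) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_##name, 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }

    struct sock_filter filter[] = {
        /* Load the syscall number from the seccomp data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW(read), ALLOW(write), ALLOW(brk),
        ALLOW(mmap), ALLOW(munmap),
        ALLOW(exit), ALLOW(exit_group),
        /* Anything else kills the offending process. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process is allowed to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("PR_SET_NO_NEW_PRIVS");
        return 1;
    }
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
        perror("SECCOMP_MODE_FILTER");
        return 1;
    }

    /* The filter survives execve, so the tested program inherits it. */
    execv(argv[1], argv + 1);
    perror("execv");
    return 127;
}

Note that even a trivial dynamically linked binary needs considerably more syscalls than this just to start (the strace experiment in the next answer gives a feel for how many), so the list has to be tuned per architecture and toolchain.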
I think you need to define system call better. I mean,
cat <<EOF > hello.c
#include <stdio.h>
int main(int argc, char **argv) {
    fprintf(stdout, "Hello world!\n");
    return 0;
}
EOF
gcc hello.c
strace -q ./a.out
demonstrates that even an apparently trivial program makes ~27 system calls.
You (I assume) want to allow calls to the "standard C library", but those in turn will be implemented in terms of system calls. I guess what I'm trying to say is that run-time checking is less feasible than you might think (using strace or similar anyway).
I'm working on a piece of software that monitors other processes' system calls using ptrace(2). Unfortunately, most modern operating systems implement some kind of fast user-mode syscalls, which are called vsyscalls in Linux.
Is there any way to disable the use of vsyscalls/vDSO for a single process or, if that is not possible, for the whole operating system?
Try echo 0 > /proc/sys/kernel/vsyscall64
If you're trying to ptrace gettimeofday calls and they aren't showing up, what time source is the system using (pmtimer, acpi, tsc, hpet, etc.)? I wonder if you'd humor me by trying to force your timer to something older like pmtimer. It's possible one of the many gtod timer-specific optimizations is causing your ptrace calls to be avoided, even with vsyscall set to zero.
Is there any way to disable the use of vsyscalls/vDSO for a single process or, if that is not possible, for the whole operating system?
It turns out there IS a way to effectively disable linking vDSO for a single process without disabling it system-wide using ptrace!
All you have to do is to stop the traced process before it returns from execve and remove the AT_SYSINFO_EHDR entry from the auxiliary vector (which comes directly after the environment variables in the memory region pointed to by rsp). PTRACE_EVENT_EXEC is a good place to do this.
AT_SYSINFO_EHDR is what the kernel uses to tell the system linker where vDSO is mapped in the process's address space. If this entry is not present, ld seems to act as if the system hasn't mapped a vDSO.
Note that this doesn't somehow unmap the vDSO from your process's memory; it merely ignores it when linking other shared libraries. A malicious program will still be able to interact with it if the author really wanted to.
I know this answer is a bit late, but I hope this information will spare some poor soul a headache.
For newer systems echo 0 > /proc/sys/kernel/vsyscall64 might not work. In Ubuntu 16.04 vDSO can be disabled system-wide by adding the kernel parameter vdso=0 in /etc/default/grub under the parameter: GRUB_CMDLINE_LINUX_DEFAULT.
IMPORTANT: The parameter GRUB_CMDLINE_LINUX_DEFAULT might be overwritten by other configuration files in /etc/default/grub.d/..., so double check where you add your custom configuration.
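For example, the line in /etc/default/grub would end up looking something like the following (here "quiet splash" just stands in for whatever defaults your system already has), after which you regenerate the grub configuration and reboot:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash vdso=0"
sudo update-grub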
Picking up on Tenders McChiken's approach, I did create a wrapper that disables vDSO for an arbitrary binary, without affecting the rest of the system: https://github.com/danteu/novdso
The general procedure is quite simple (a rough sketch follows the list):
use ptrace to wait for return from execve(2)
find the address of the auxvector
overwrite the AT_SYSINFO_EHDR entry with AT_IGNORE, telling the application to ignore the following value
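Here is a rough sketch of that procedure for x86_64 (my own, error handling mostly omitted). Instead of PTRACE_EVENT_EXEC it relies on the plain SIGTRAP stop a PTRACE_TRACEME child receives right after execve, at which point rsp points at argc on the fresh stack, with argv, envp and then the aux vector laid out above it:

#define _GNU_SOURCE
#include <elf.h>          /* AT_NULL, AT_IGNORE, AT_SYSINFO_EHDR */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

static long peek(pid_t pid, unsigned long addr)
{
    return ptrace(PTRACE_PEEKDATA, pid, addr, 0);
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        execv(argv[1], argv + 1);
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);               /* child is stopped right after execve */

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, 0, &regs);

    unsigned long p = regs.rsp;
    long nargs = peek(pid, p);              /* argc */
    p += (nargs + 2) * sizeof(long);        /* skip argc, argv[], trailing NULL */
    while (peek(pid, p) != 0)               /* skip envp[] up to its NULL */
        p += sizeof(long);
    p += sizeof(long);                      /* p now points at auxv[0] */

    /* auxv is an array of (type, value) pairs terminated by AT_NULL. */
    for (;; p += 2 * sizeof(long)) {
        long type = peek(pid, p);
        if (type == AT_NULL)
            break;
        if (type == AT_SYSINFO_EHDR) {
            ptrace(PTRACE_POKEDATA, pid, p, AT_IGNORE);
            break;
        }
    }

    ptrace(PTRACE_DETACH, pid, 0, 0);       /* let it run; ld.so never sees the vDSO */
    waitpid(pid, &status, 0);
    return 0;
}

As noted earlier, this only hides the vDSO from the dynamic linker; the mapping itself is still present in the child's address space.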
I know this is an older question, but nobody has mentioned a third useful way of disabling the vDSO on a per-process basis: you can override the libc functions with your own versions that perform the actual system call, using LD_PRELOAD.
A simple shared library for overriding the gettimeofday and time functions, for example, could look like this:
vdso_override.c:
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/syscall.h>
int gettimeofday(struct timeval *restrict tv, struct timezone *restrict tz)
{
    return syscall(__NR_gettimeofday, (long)tv, (long)tz, 0, 0, 0, 0);
}

time_t time(time_t *tloc)
{
    return syscall(__NR_time, (long)tloc, 0, 0, 0, 0, 0);
}
This uses the libc wrapper to issue a raw system call (see syscall(2)), so the vDSO is circumvented. You would have to overwrite all system calls that the vDSO exports on your architecture in this way (listed at vdso(7)).
Compile with
gcc -fpic -shared -o vdso_override.so vdso_override.c
Then run any program in which you want to disable VDSO calls as follows:
LD_PRELOAD=./vdso_override.so <some program>
This of course only works if the program you are running is not actively trying to circumvent this. While you can override a symbol using LD_PRELOAD, if the target program really wants to, there is a way to find the original symbol and use that instead.
This is part of a series of at least two closely related, but distinct questions. I hope I'm doing the right thing by asking them separately.
I'm trying to get my Visual C++ 2008 app to work without the C Runtime Library. It's a Win32 GUI app without MFC or other fancy stuff, just plain Windows API.
So I set Project Properties -> Configuration -> C/C++ -> Advanced -> Omit Default Library Names to Yes (compiler flag /Zl) and rebuilt.
Then the linker complains about an unresolved external _WinMainCRTStartup. Fair enough, I can tell the linker to use a different entry point, say MyStartup. From what I gather around the web, _WinMainCRTStartup does some initialization stuff, and I probably want MyStartup to do a subset of that.
So my question is: What functions does _WinMainCRTStartup perform, and which of these can I omit if I don't use the CRT?
If you are knowledgeable about this stuff, please have a look at my other question too. Thanks!
Aside: Why do I want to do this in the first place?
My app doesn't explicitly use any CRT functions.
I like lean and mean apps.
It'll teach me something new.
The CRT's entry point does the following (this list is not complete):
Initializes global state needed by the CRT. If this is not done, you cannot use any functions or state provided by the CRT.
Initializes some global state that is used by the compiler. Run-time checks such as the security cookie used by /GS definitely stand out here. You can call __security_init_cookie yourself, however. You may need to add other code for other run-time checks.
Calls constructors on C++ objects. If you are writing C++ code, you may need to emulate this.
Retrieves the command line and startup information provided by the OS and passes it to your main. By default, no parameters are passed to the entry point of the program by the OS - they are all provided by the CRT.
The CRT source code is available with Visual Studio and you can step through the CRT's entry point in a debugger and find out exactly what it is doing.
A true Win32 program written in C (not C++) doesn't need any initialization at all, so you can start your project with WinMainCRTStartup() instead of WinMain(HINSTANCE,...).
It's also possible but a bit harder to write console programs as true Win32 applications; the default name of the entry point is _mainCRTStartup().
Disable all extra code generation features like stack probes, array checks etc. Debugging is still possible.
Initialization
Sometimes you need the first HINSTANCE parameter. For Win32 (except Win32s), it is fixed to (HINSTANCE)0x400000.
The nCmdShow parameter is always SW_SHOWDEFAULT.
If necessary, retrieve the command line with GetCommandLine().
Termination
When your program spawns threads, e.g. by calling GetOpenFileName(), returning from WinMainCRTStartup() with return keyword will hang your program — use ExitProcess() instead.
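To tie the initialization and termination notes together, a bare-bones entry point might look like the sketch below. It keeps the default WinMainCRTStartup name so no /ENTRY switch is needed (you could equally point /ENTRY at a name of your own), and it skips the command-line splitting the CRT normally does for you:

/* Minimal CRT-free GUI entry point; error handling omitted. */
#include <windows.h>

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int);

void WINAPI WinMainCRTStartup(void)
{
    /* No CRT ran before us, so fetch everything from the OS ourselves. */
    HINSTANCE hInst = GetModuleHandle(NULL);   /* (HINSTANCE)0x400000 for a non-relocated Win32 EXE */
    LPSTR cmd = GetCommandLineA();             /* note: includes the program name, unlike the CRT's lpCmdLine */

    int ret = WinMain(hInst, NULL, cmd, SW_SHOWDEFAULT);

    /* Never just 'return' here: if any extra thread was spawned (e.g. by a
       common dialog), returning could leave the process hanging. */
    ExitProcess((UINT)ret);
}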
Caveats
You will run into considerable trouble when:
using stack frames (i.e. local variables) larger than 4 KBytes (per function)
using floating-point arithmetic (e.g. float->int conversion)
using 64-bit integers on 32-bit machines (multiply, bit-shift operations)
using C++ new, delete, and static objects with non-zero-out-all-members constructors
using standard library functions like fopen(), printf() of course
Troubleshoot
There is a C standard library available on all Windows systems (since Windows 95), the MSVCRT.DLL.
To use it, import its entry points, e.g. using my msvcrt-light.lib (google for it). But there are still some caveats, especially when using compilers newer than MSVC6:
stack frames are still limited to 4 KBytes
_ftol_sse or _ftol2_sse must be routed to _ftol
_iob_func must be routed to _iob
Its initialization seems to run at load time. At least the file functions will run seamlessly.
Old question, but the answers are either incorrect or focus on one specific problem.
There are a number of C and C++ features that simply will not be available on Windows (or most operating systems, for that matter) if the programs actually started at main/WinMain.
Take this simple example:
class my_class
{
public:
    my_class() { m_val = 5; }
    int my_func() { return m_val; }

private:
    int m_val;
};

my_class g_class;

int main(int argc, char **argv)
{
    return g_class.my_func();
}
In order for this program to function as expected, the constructor for my_class must be called before main. If the program started exactly at main, it would require a compiler hack (note: GCC does this in some cases) to insert a function call at the very beginning of main. Instead, on most OSes and in most cases, a different function constructs g_class and then calls main (on Windows, this is either mainCRTStartup or WinMainCRTStartup; on most other OSes I'm used to, it is a function called _start).
There are other things C++ and even C require to be done before or after main in order to work.
How are stdin and stdout (std::cin and std::cout) usable as soon as main starts?
How does atexit work?
The C standard requires the standard library have a POSIX-like signal API, which on Windows must be "installed" before main().
On most OSes, there is no system-provided heap; the C runtime implements its own heap (Microsoft's C runtime just wraps the Kernel32 Heap functions).
Even the arguments passed to main, argc and argv, must be gotten from the system somehow.
You might want to take a look at Matt Pietrek's (ancient) articles on implementing his own C runtime for specifics on how this works with Windows + MSVC (note: MinGW and Cygwin implement specific things differently, but actually fall back to MSVCRT for most things):
http://msdn.microsoft.com/en-us/library/bb985746.aspx