ucontext across threads - linux

Are contexts (the objects manipulated by functions in ucontext.h) allowed to be shared across threads? That is, can I swapcontext with the second argument being a context created in makecontext on another thread? A test program seems to show this working on Linux. I can't find documentation one way or the other on this, whereas Windows fibers appear to explicitly support such a use case. Is this safe and OK to do in general? Is it standard POSIX behavior that this should work?

Actually, there was an NGPT - threading library for linux, which uses not a current 1:1 threading model (each user thread is the kernel thread or LWP), but a M:N threading model (several user threads corresponds to another, smaller number of kernel threads).
According to ftp://ftp.uni-duisburg.de/Linux/NGPT/ngpt-0.9.4.tar.gz/ngpt-0.9.4/pth_sched.c:170 pth_scheduler it was possible of moving user thread contexts between native (kernel) threads:
/*
* See if the thread is unbound...
* Break out and schedule if so...
*/
if (current->boundnative == 0)
break;
/*
* See if the thread is bound to a different native thread...
* Break out and schedule if not...
*/
if (current->boundnative == this_sched->lastrannative)
break;
To save and restore user threads, the ucontext can be used ftp://ftp.uni-duisburg.de/Linux/NGPT/ngpt-0.9.4.tar.gz/ngpt-0.9.4/pth_mctx.c:64 and seems this was a preferred method (mcsc):
/*
* save the current machine context
*/
#if PTH_MCTX_MTH(mcsc)
#define pth_mctx_save(mctx) \
( (mctx)->error = errno, \
getcontext(&(mctx)->uc) )
#elif
....
/*
* restore the current machine context
* (at the location of the old context)
*/
#if PTH_MCTX_MTH(mcsc)
#define pth_mctx_restore(mctx) \
( errno = (mctx)->error, \
(void)setcontext(&(mctx)->uc) )
#elif PTH_MCTX_MTH(sjlj)
...
#if PTH_MCTX_MTH(mcsc)
/*
* VARIANT 1: THE STANDARDIZED SVR4/SUSv2 APPROACH
*
* This is the preferred variant, because it uses the standardized
* SVR4/SUSv2 makecontext(2) and friends which is a facility intended
* for user-space context switching. The thread creation therefore is
* straight-foreward.
*/
So, even if NGPT is dead and unused, it selected *context() for switching user threads even between kernel threads. I assume, that using *context() family is safe enough on Linux.
There can be some problems when mixing ucontexts and other native threads library. I will consider a NPTL, which is standard linux native threading library since glibc 2.4. The main problem is THREAD_SELF - pointer to struct pthread of the current thread. TLS (Thread-local storage) also works via THREAD_SELF. The THREAD_SELF is usually stored on register (r2 on powerpc, %gs on x86, etc). get/setcontext might save and restore this register breaking internals of native pthread library (e.g. thread-local storage, thread identification, etc).
The glibc setcontext will not save/restore %gs register to be compatible with pthreads:
/* Restore the FS segment register. We don't touch the GS register
since it is used for threads. */
movl oFS(%eax), %ecx
movw %cx, %fs
You should check, does setcontext saves THREAD_SELF register on the architecture you are interested in. Also, your code can be not portable between OSes and libcs.

From the man page
In a System V-like environment, one
has the type ucontext_t defined in
and the four functions
getcontext(2), setcontext(2),
makecontext() and swapcontext() that
allow user-level context switching
between multiple threads of control
within a process.
Sounds like that's what it's for.
EDIT: although this discussion seems to indicate that you shouldn't be mixing them.

Related

Alternate to setpriority(PRIO_PROCESS, thread_id, priority)

Given - Thread id of a thread.
Requirement - Set Linux priority of the thread id.
Constraint - Cant use setpriority()
I have tried to use below
pthread_setschedprio(pthread_t thread, int prio);
pthread_setschedparam(pthread_t thread, int policy,
const struct sched_param *param);
Both the above APIs use pthread_t as an argument. I am not able to construct (typecast) pthread_t from thread id. I understand converting this is not possible due to different types.
Is there a way to still accomplish this ?
Some aspects of the pthread_setschedprio interface are available for plain thread IDs with the sched_setparam function (declared in <thread.h>). The sched_setparam manual page says that the process is affected (which is the POSIX-mandated behavior), but on Linux, it's actually the thread of that ID.
Keep in mind that calling sched_setparam directly may break the behavior expected from PI mutexes and other synchronization primitives because the direct call does not perform the additional bookkeeping performed by the pthread_* functions.

Golang, processes and shared memory

Today a friend of mine told me that Go programs can scale themselves on multiple CPU cores. I were quite surprised to hear that knowing that system task schedulers do not know anything about goroutines and hence can't run them on multiple cores.
I did some search and found out that Go programs can spawn multiple OS tasks to run them on different cores (the number is controlled by GOMAXPROCS environment variable). But as far as I know forking a process leads to complete copy of process data and different processes run in different address spaces.
So what about global variables in Go programs? Are they safe to use with multiple goroutines? Do they somehow synchronize between system processes? And if they do then how? I am mainly concerned about linux and freebsd implementations.
I figured it out! It's all in go sources.
There is a Linux system call that I were unaware of.
It's called "clone". It is more flexible than fork and it allows
a child process to live in its parent's address space.
Here is a short overview of the thread creation process.
First there is a newm function in src/runtime/proc.go. This
function is responsible for creating a new working thread
(or machine as it is called in comments).
// Create a new m. It will start off with a call to fn, or else the scheduler.
// fn needs to be static and not a heap allocated closure.
// May run with m.p==nil, so write barriers are not allowed.
//go:nowritebarrier
func newm(fn func(), _p_ *p) {
// ... some code skipped ...
newosproc(mp, unsafe.Pointer(mp.g0.stack.hi))
}
This function calls newosproc which is OS-specific.
For Linux it can be found in src/runtime/os_linux.go. Here
are relevant parts of that file:
var (
// ...
cloneFlags = _CLONE_VM | /* share memory */
_CLONE_FS | /* share cwd, etc */
_CLONE_FILES | /* share fd table */
_CLONE_SIGHAND | /* share sig handler table */
_CLONE_THREAD /* revisit - okay for now */
)
// May run with m.p==nil, so write barriers are not allowed.
//go:nowritebarrier
func newosproc(mp *m, stk unsafe.Pointer) {
// ... some code skipped ...
ret := clone(cloneFlags, /* ... other flags ... */)
// ... code skipped
}
And the clone function is defined in architecture-specific
files. For amd64 it is in src/runtime/sys_linux_amd64.s.
It is the actual system call.
So Go programs do run in multiple OS threads which enables
spanning across CPUs, but they use one shared address space.
Phew... I love Go.

A multiprocessor architecture and Ring 3

void
sema_down (struct semaphore *sema)
{
old_level = intr_disable ();
while (sema->value == 0)
{
list_push_back (&sema->waiters, &thread_current ()->elem);
thread_block ();
}
sema->value--;
intr_set_level (old_level);
}
The above piece of code is a mechanim locking mutex in PintOS. PintOS is targeted for uniprocessor systems. Because of that fact it is sufficient to just disable interrupts. There is no possibility that the other will take a mutex.
So, let's consider a multiprocessor design:
void
sema_down (struct semaphore *sema)
{
old_level = intr_disable ();
while (!lock cmpxchg(1,0)) // it is just pseudocode-idea
{
list_push_back (&sema->waiters, &thread_current ()->elem);
thread_block ();
}
intr_set_level (old_level);
}
old_level = intr_disable ();. It turned off interrupts but it is crucial only in that CPU's context.
It can be a prototype of function acquiring mutex in MP architecture. But, there is a problem with list_push_back. It must be also safe-multithreading. But, we cannot make it safe with a mutex because we are just implementing it now!
The main question is:
Is it possible that two ( or more) CPUs are executing code on Ring 0 level ( kernel)?
And, subquestions that are dependent on the answer to the first one:
If not, there is no problem I described above. But- how it can be implemented?
If yes ( it seems impossible or very hard to realize), what about my above considerations ( It is just only example of potential problem).
Do we have to use spinlocks or lock-free structures?
Yes, in SMP multiple CPU can execute the same code, even at Ring 0.
Every CPU is symmetrical thus it can execute the same code path as the others (including kernel code), unless the software implements some sort of synchronization.
The Linux kernel also faced this problem, and initially implemented a not-so-good solution: A Big Kernel Lock that was acquired and released upon entering and exiting the kernel.
It was not a good solution because only one CPU at a time could execute the kernel code, but it was quick to implement and it was the equivalent of the item number one in you list.
A better solution is to use finer locks across the whole kernel.
Since is it the kernel that implements the sleeping locks like mutexes or semaphores shown in your example, it cannot rely on those primitives themselves1 and must use spinlocks or other, simpler, form of locking.
Luckily this is not a problem, a spinlock (and its variants) is actually better than a mutex when there is a low contention or the critical path is really short (like updating a list).
You can take a look at mutex_init from Linux to see that a spinlock is used to synchronize the queue of waiting tasks.
49 void
50 __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
51 {
52 atomic_set(&lock->count, 1);
53 spin_lock_init(&lock->wait_lock);
54 INIT_LIST_HEAD(&lock->wait_list);
55 mutex_clear_owner(lock);
56 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
57 osq_lock_init(&lock->osq);
58 #endif
59
60 debug_mutex_init(lock, name, key);
61 }
So the answer to your second item is yes.
1 You can't sleep while waiting to sleep while waiting for a lock.

Use of Tcl in c++ multithreaded application

I am facing crash while trying to create one tcl interpreter per thread. I am using TCL version 8.5.9 on linux rh6. It crashes in different functions each time seems some kind of memory corruption. Going through net it seems a valid approach. Has anybody faced similar issue? Does multi-threaded use of Tcl need any kind of special support?
Here is the following small program causing crash with tcl version 8.5.9.
#include <tcl.h>
#include <pthread.h>
void* run (void*)
{
Tcl_Interp *interp = Tcl_CreateInterp();
sleep(1);
Tcl_DeleteInterp(interp);
}
main ()
{
pthread_t t1, t2;
pthread_create(&t1, NULL, run, NULL);
pthread_create(&t2, NULL, run, NULL);
pthread_join (t1, NULL);
pthread_join (t2, NULL);
}
The default Tcl library isn't built thread enabled. (well, not with 8.5.9 afaik, 8.6 is).
So did you check that your tcl lib was built thread enabled?
If you have a tclsh built against the lib, you can simply run:
% parray ::tcl_platform
::tcl_platform(byteOrder) = littleEndian
::tcl_platform(machine) = intel
::tcl_platform(os) = Windows NT
::tcl_platform(osVersion) = 6.2
::tcl_platform(pathSeparator) = ;
::tcl_platform(platform) = windows
::tcl_platform(pointerSize) = 4
::tcl_platform(threaded) = 1
::tcl_platform(wordSize) = 4
If ::tcl_platform(threaded) is 0, your build isn't thread enabled. You would need to build a version with thread support by passing --enable-threads to the configure script.
Did you use the correct defines to declare you want the thread enabled Macros from tcl.h?
You should add -DTCL_THREADS to your compiler invocation, otherwise the locking macros are compiled as no-ops.
You need to use a thread-enabled build of the library.
When built without thread-enabling, Tcl internally uses quite a bit of global static data in places like memory management. It's pretty pervasive. While it might be possible to eventually make things work (provided you do all the initialisation and setup within a single thread) it's going to be rather unadvisable. That things crash in strange ways in your case isn't very surprising at all.
When you use a thread-enabled build of Tcl, all that global static data is converted to either thread-specific data or to appropriate mutex-guarded global data. That then allows Tcl to be used from many threads at once. However, a particular Tcl_Interp is bound to the thread that created it (as it uses lots of thread-specific data). In your case, that will be no problem; your interpreters are happily per-thread entities.
(Well, provided you also add a call to initialise the Tcl library itself, which only needs to be done once. Put Tcl_FindExecutable(NULL); inside main() before you create any of those threads.)
Tcl 8.5 defaulted to not being thread-enabled on Unix for backward-compatibility reasons ā€” on Windows and Mac OS X it was thread-enabled due to the different ways they handle low-level events ā€” but this was changed in 8.6. I don't know how to get a thread-enabled build on RH6 (other than building it yourself from source, which should be straight-forward).

Can AIO run without creating thread?

I would like aio to signal to my program when a read operation completes, and according to this page, such notification can be received by either a signal sent by the kernel, or by starting a thread running a user function. Either behavior can be selected by setting the right value of sigev_notify.
I gave it a try and soon discover that even when set to receive the notification by signal, another thread was created.
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7ff9700 (LWP 6347) "xnotify" 0x00007ffff7147e50 in gettimeofday () from /lib64/libc.so.6
* 1 Thread 0x7ffff7fc3720 (LWP 6344) "xnotify" 0x0000000000401834 in update (this=0x7fffffffdc00)
The doc also states that: The implementation of these functions can be done using support in the kernel (if available) or using an implementation based on threads at userlevel.
I would like to have no thread at all, is this possible?
I checked on my kernel, and that looks okay:
qdii#localhost /home/qdii $ grep -i aio /usr/src/linux/.config
CONFIG_AIO=y
Is it possible to run aio without any (userland) thread at all (apart from the main one, of course)?
EDIT:
I digged deeper into it. librt seems to provide a collection of aio functions: looking through the glibc sources exposed something fishy: inside /rt/aio_read.c is a function stub :
int aio_read (struct aiocb *aiocbp)
{
__set_errno (ENOSYS);
return -1;
}
stub_warning (aio_read)
I found a first relevant implementation in the subdirectory sysdeps/pthread, which directly called __aio_enqueue_request(..., LIO_READ), which in turn created pthreads. But as I was wondering why there would be a stup in that case, I thought maybe the stub could be implemented by the linux kernel itself, and that pthread implementation would be some sort of fallback code.
Grepping aio_read through my /usr/src/linux directory gives a lot of results, which Iā€™m trying to understand now.
I found out that there are actually two really different aio libraries: one is part of glibc, included in librt, and performs asynchronous access by using pthreads. The other aio library implements the same interface as the first one, but is built upon the linux kernel itself and can use signals to run asynchronously.

Resources