How can I build a ThreadId given that I know the actual number? - haskell

It often happens when debugging or playing around in GHCi that I know the actual ThreadId number (for example from using Debug.Trace), but that's all I have.
The problem is that all thread APIs, such as killThread, require a ThreadId and not an Int.
I've tried Hoogle but came up empty. Is there a way to do this? I'm concerned mostly with debugging, so I don't mind if it's a nasty hack or if it's through a GHC-only library.

You can't. ThreadId is abstract. The Int you have is actually nothing more than a counter (source):
static StgThreadID next_thread_id = 1;
...
StgTSO *
createThread(Capability *cap, W_ size)
{
    StgTSO *tso;
    ...
    ACQUIRE_LOCK(&sched_mutex);
    tso->id = next_thread_id++;  // while we have the mutex
    ...
    RELEASE_LOCK(&sched_mutex);
    ...
}
...
int
rts_getThreadId(StgPtr tso)
{
    return ((StgTSO *)tso)->id;
}
It's rts_getThreadId that gets called in ThreadId's Show instance. There's no mapping back to the actual TSO. If you want to know which ThreadId belongs to which Int, you need to keep track of them yourself. You could, for example, parse the Int out of each ThreadId's Show output and fill a Map as you fork threads.

Related

Is Entrance into a Windows Critical Section an atomic operation?

I wrote an FFI for critical sections, and I wrote a test for it in Haxe.
Tests run in order defined (public functions are tests)
This test test_critical_section will intermittently hang and fail:
1  var criticalSection:CriticalSection;
2
3  #if master
4  public function test_init_critical_section() {
5      return assert(attempt({
6          criticalSection = synch.SynchLib.critical_section_init(SPIN_COUNT);
7          trace('criticalSection: $criticalSection');
8      }));
9  }
10 var criticalValue = 0;
11 var done = 0;
12 var numThreads = 50;
13 function work_in_critical_section(ID:Int, a:AssertionBuffer) {
14     sys.thread.Thread.create(() -> {
15         inline function threadMsg(msg:String)
16             trace('Thread ID $ID: $msg');
17
18
19         threadMsg("Attempting to enter critical section");
20         criticalSection.critical_section_enter();
21         threadMsg("Entering critical section. Doing work.");
22         Sys.sleep(Std.random(100)/500); // simulate work in section
23         criticalValue += 10;
24         done++;
25         a.assert(criticalValue == done * 10);
26         threadMsg("Leaving critical section. Work done. done: " + done);
27         criticalSection.critical_section_leave();
28         if (done == numThreads) {
29             a.assert(criticalValue == numThreads * 10);
30             a.done();
31
32         }
33     });
34 }
35 @:timeout(30000)
36 public function test_critical_section() {
37     var a = new AssertionBuffer();
38     for (i in 0...numThreads)
39         work_in_critical_section(i, a);
40     return a;
41 }
But when I add Sys.sleep(ID/5); just before entrance into the critical section (on the blank line 18), the test passes every single time (with any number of threads). Without it, the test fails randomly (more often with a higher number of threads).
My conclusion from this test is that entering a critical section is not atomic, and multiple threads simultaneously attempting to enter may leave the critical section in an undefined state (leading to undefined/hanging behavior).
Is this the right conclusion, or am I simply misusing critical sections (and thus the test needs to be rewritten)? And if it is the right conclusion, does this not mean that entrance into the critical section needs its own atomic locking/synchronization mechanism? (And further, if that is the case, what is the point of critical sections? Why would I not just use whatever that atomic synchronization mechanism is?)
To me this seems problematic. For example, suppose 10 threads meet at a synchronization barrier (with a capacity of 10) and all 10 then need to proceed through a critical section immediately after the 10th thread arrives. Does that mean I'd have to synchronize/serialize access to the critical section entrance method (for instance, by sleeping so that only one thread attempts to enter the section at a given tick, as done to fix the failing test above)?
The FFI is written on top of synchapi.h (see EnterCriticalSection).
You read done outside the critical section. That is a race condition. If you want to look at the value of done, you need to do it before you leave the critical section.
You might see a write to done from another thread, triggering the assert before the write to criticalValue is visible to the thread that saw the write to done.
If the critical section protects criticalValue and done, then it is an error to access either of them without being in the critical section unless you are sure every thread that might access them has terminated. Your code violates this rule.
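To make that rule concrete, here is a small C sketch (my own illustration, not the asker's FFI) using the synchapi.h calls the FFI wraps; the names worker, criticalValue and done only mirror the Haxe test. Every read and write of the shared counters, including the invariant check, happens between EnterCriticalSection and LeaveCriticalSection:

#include <windows.h>
#include <stdio.h>

#define NUM_THREADS 50

static CRITICAL_SECTION cs;
static int criticalValue = 0;
static int done = 0;

static DWORD WINAPI worker(LPVOID arg)
{
    (void)arg;
    EnterCriticalSection(&cs);      /* only one thread at a time past this point */
    criticalValue += 10;
    done++;
    /* check the invariant while still holding the section,
       not after leaving it as the Haxe test did */
    if (criticalValue != done * 10)
        fprintf(stderr, "invariant violated\n");
    LeaveCriticalSection(&cs);
    return 0;
}

int main(void)
{
    HANDLE threads[NUM_THREADS];
    InitializeCriticalSectionAndSpinCount(&cs, 4000);

    for (int i = 0; i < NUM_THREADS; i++)
        threads[i] = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);
    for (int i = 0; i < NUM_THREADS; i++)
        CloseHandle(threads[i]);

    /* reading without the lock is safe here only because every worker has terminated */
    printf("criticalValue = %d, done = %d\n", criticalValue, done);
    DeleteCriticalSection(&cs);
    return 0;
}

The final read in main is only allowed because WaitForMultipleObjects guarantees all workers have finished, which is exactly the exception stated above.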

How can sys_sigsuspend be atomic in Linux kernel 2.6.11?

I'm reading Linux 2.6.11.
The implementation of sys_sigsuspend is the following:
/*
 * Atomically swap in the new signal mask, and wait for a signal.
 */
asmlinkage int
sys_sigsuspend(int history0, int history1, old_sigset_t mask)
{
    struct pt_regs * regs = (struct pt_regs *) &history0;
    sigset_t saveset;

    mask &= _BLOCKABLE;
    spin_lock_irq(&current->sighand->siglock);
    saveset = current->blocked;
    siginitset(&current->blocked, mask);
    recalc_sigpending();
    spin_unlock_irq(&current->sighand->siglock);

    regs->eax = -EINTR;
    while (1) {
        current->state = TASK_INTERRUPTIBLE;
        schedule();
        if (do_signal(regs, &saveset))
            return -EINTR;
    }
}
In ULK3 the author says:
the sigsuspend( ) system call does not allow signals to be sent after unblocking and before the schedule( ) invocation, because other processes cannot grab the CPU during that time interval.
Between spin_unlock_irq and schedule the syscall can be interrupted and preempted, so another process has enough time to send a signal, which is not blocked, to this process.
But in that case the signal would be lost, because the process only schedules after the signal has been delivered.
That's why sigsuspend should be atomic, but it is NOT, according to its implementation.
The sigsuspend implementation is correct, but the explanation in ULK seems to be misleading.
While a process executes kernel code, that execution is never interrupted by user signals. Instead, such signals accumulate inside the current task structure. The moment the process leaves kernel code and returns to user mode, all accumulated (and unblocked) signals are delivered.
The kernel's schedule() function checks whether any signals have accumulated. If they have, and current->state is TASK_INTERRUPTIBLE, schedule() returns. So signals collected before the schedule() call are not lost.
Atomicity of the sigsuspend() system call means that if a signal temporarily unblocked by the call is sent, the call is guaranteed to see it and return. This atomicity is achieved simply by placing both the unblocking and the signal check inside the same kernel function.
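The user-space side of that contract is the classic race-free wait pattern. As a hedged illustration (my own example, not from ULK or the kernel source): block the signal, test the flag, and let sigsuspend unblock and sleep in one atomic step, so a signal arriving between the test and the wait cannot be lost:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

int main(void)
{
    struct sigaction sa;
    sigset_t block, orig;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigaction(SIGUSR1, &sa, NULL);

    /* block SIGUSR1 so it cannot slip in between the flag test and the wait */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &orig);

    printf("waiting for SIGUSR1, pid %d\n", (int)getpid());
    while (!got_usr1) {
        /* atomically restore the old mask (unblocking SIGUSR1) and sleep;
           a pending or newly sent SIGUSR1 is delivered here and sigsuspend returns */
        sigsuspend(&orig);
    }

    sigprocmask(SIG_SETMASK, &orig, NULL);
    printf("got SIGUSR1\n");
    return 0;
}

If the flag test and a plain pause() were used instead, a SIGUSR1 arriving between them would be handled but the process would still go to sleep, which is exactly the race the atomic mask swap in sys_sigsuspend avoids.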

How does the Linux kernel get info about the processors and the cores?

Assume we have a blank computer without any OS and we are installing Linux. Where in the kernel is the code that identifies the processors and the cores and gets information about/from them?
This info eventually shows up in places like /proc/cpuinfo but how does the kernel get it in the first place?!
Short answer
The kernel uses the special CPU instruction cpuid and saves the results in an internal structure - cpuinfo_x86 for x86.
Long answer
The kernel source is your best friend.
Start from the entry point - the file /proc/cpuinfo.
Like any proc file, it has to be created somewhere in the kernel and declared with some file_operations. This is done in fs/proc/cpuinfo.c. The interesting piece is seq_open, which uses a reference to cpuinfo_op. These ops are declared in arch/x86/kernel/cpu/proc.c, where we see the show_cpuinfo function, defined in the same file on line 57.
Here you can see
seq_printf(m, "processor\t: %u\n"
           "vendor_id\t: %s\n"
           "cpu family\t: %d\n"
           "model\t\t: %u\n"
           "model name\t: %s\n",
           cpu,
           c->x86_vendor_id[0] ? c->x86_vendor_id : "unknown",
           c->x86,
           c->x86_model,
           c->x86_model_id[0] ? c->x86_model_id : "unknown");
The variable c is declared on the first line of show_cpuinfo as a pointer to struct cpuinfo_x86. That structure is declared in arch/x86/include/asm/processor.h. If you search for references to that structure, you will find the function cpu_detect; that function calls cpuid, which finally resolves to native_cpuid, which looks like this:
static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
                                unsigned int *ecx, unsigned int *edx)
{
    /* ecx is often an input as well as an output. */
    asm volatile("cpuid"
        : "=a" (*eax),
          "=b" (*ebx),
          "=c" (*ecx),
          "=d" (*edx)
        : "0" (*eax), "2" (*ecx)
        : "memory");
}
And here you see the assembler instruction cpuid. This little instruction does the real work.
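If you want to poke at the same instruction from user space, GCC and Clang ship a <cpuid.h> wrapper. Here is a small sketch (my own example, independent of the kernel code above); leaf 0 returns the highest supported leaf in EAX and the vendor string in EBX, EDX, ECX:

#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    /* leaf 0: highest supported leaf in EAX, vendor id spread over EBX/EDX/ECX */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;

    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    printf("max leaf: %u, vendor_id: %s\n", eax, vendor);
    return 0;
}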
This information comes from the BIOS + a hardware DB. You can get the info directly with dmidecode, for example (if you need more info, check the dmidecode source code):
sudo dmidecode -t processor

Race condition on ticket-based ARM spinlock

I found that spinlocks in the Linux kernel all use "ticket-based" spinlocks now. However, after looking at the ARM implementation, I'm confused, because the "load-add-store" operation does not look atomic at all. Please see the code below:
74 static inline void arch_spin_lock(arch_spinlock_t *lock)
75 {
76     unsigned long tmp;
77     u32 newval;
78     arch_spinlock_t lockval;
79
80     __asm__ __volatile__(
81 "1: ldrex   %0, [%3]\n"     /* Why this load-add-store is not atomic? */
82 "   add     %1, %0, %4\n"
83 "   strex   %2, %1, [%3]\n"
84 "   teq     %2, #0\n"
85 "   bne     1b"
86     : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
87     : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
88     : "cc");
89
90     while (lockval.tickets.next != lockval.tickets.owner) {
91         wfe();
92         lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
93     }
94
95     smp_mb();
96 }
As you can see, on lines 81~83 it loads lock->slock into "lockval", increments it, and then stores it back to lock->slock.
However, I don't see anywhere that this is ensured to be atomic. So it seems possible that:
Two users on different CPUs read lock->slock into their own "lockval" at the same time; then each increments its "lockval" and stores it back.
This would leave the two users holding the same ticket "number" in hand, and once the "owner" field reaches that number, both of them would acquire the lock and operate on the shared resources!
I don't think the kernel can have such a bug in its spinlock. Am I wrong somewhere?
STREX is a conditional store; this code has Load-Link/Store-Conditional semantics, even if ARM doesn't use that name.
The operation either completes atomically or fails.
The assembler block tests for failure (the tmp variable indicates whether the store failed) and reattempts the modification using the freshly loaded value (which was updated by another core).
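To see why two CPUs can never draw the same ticket, here is a simplified C11 model of a ticket lock (my own sketch, not the kernel code): the whole ldrex/add/strex retry loop behaves like a single atomic fetch-and-add on the "next" field, and the while loop on "owner" corresponds to the wfe() wait:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* simplified model: "next" is the ticket dispenser, "owner" is "now serving" */
struct ticket_lock {
    atomic_uint next;
    atomic_uint owner;
};

static void ticket_lock(struct ticket_lock *l)
{
    /* the ldrex/add/strex retry loop collapses to this one atomic RMW:
       if another CPU raced us, one strex fails and is retried, so every
       caller still draws a distinct ticket */
    unsigned int my_ticket = atomic_fetch_add(&l->next, 1);

    /* spin (the kernel uses wfe() here) until our ticket is being served */
    while (atomic_load(&l->owner) != my_ticket)
        ;
}

static void ticket_unlock(struct ticket_lock *l)
{
    /* hand the lock to the holder of the next ticket */
    atomic_fetch_add(&l->owner, 1);
}

static struct ticket_lock lock;    /* zero-initialised: next == owner == 0 */
static int shared_counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        ticket_lock(&lock);
        shared_counter++;          /* protected by the lock */
        ticket_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("%d (expected %d)\n", shared_counter, 4 * 100000);
    return 0;
}

The C11 atomics default to sequentially consistent ordering, which also covers the role the smp_mb() plays in the kernel version.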

Choppy SDL+OpenGL animation when vsync is on

Uint32 prev = SDL_GetTicks();
while ( true )
{
    Draw();
    Uint32 now = SDL_GetTicks();
    Uint32 delta = now - prev;
    printf( "%u\n" , delta );
    Update( delta / 1000.0f );
    prev = now;
    ProcessEvents();
}
The application is a simple moving square. My loop looks like this, and with vsync off the whole thing runs quite smoothly; turning it on instead causes occasional jumps in the animation. I've inserted some prints and here's what I've found:
[...]
16
15
16
66 #
2 #
0 #
0 #
16
16
21
[...]
I know there are several issues with this kind of loop but none of them seem to apply to this simple example (am I wrong?). What causes this behavior and how can I overcome it?
I'm using an ATI card on a Linux system, but I'm expecting a portable explanation/solution.
It seems that it was a lack of glFinish(). I've read somewhere that calls to that function are in most cases useless (here or here, for example). Maybe I'm misunderstanding some fundamental concepts, but it worked for me, and now the Draw() function ends with:
    [...]
    glFinish();
    SDL_GL_SwapBuffers();
}
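For context, here is a minimal self-contained SDL 1.2 sketch of how the fix slots into the loop (my own reconstruction of the question's setup, not the asker's code; the SDL_GL_SWAP_CONTROL attribute is an assumption about how vsync was enabled). The idea is that glFinish blocks until the GPU has actually finished the frame, so the SDL_GetTicks delta fed to the update measures a whole frame rather than whenever the driver happened to flush its command buffer:

#include <SDL/SDL.h>
#include <GL/gl.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    (void)argc; (void)argv;
    SDL_Init(SDL_INIT_VIDEO);
    SDL_GL_SetAttribute(SDL_GL_SWAP_CONTROL, 1);   /* ask for vsync (SDL 1.2) */
    SDL_SetVideoMode(640, 480, 0, SDL_OPENGL);

    float x = 0.0f;                                /* position of the moving square */
    Uint32 prev = SDL_GetTicks();
    int running = 1;
    while (running)
    {
        /* draw the square */
        glClear(GL_COLOR_BUFFER_BIT);
        glLoadIdentity();
        glTranslatef(x, 0.0f, 0.0f);
        glBegin(GL_QUADS);
        glVertex2f(-0.1f, -0.1f); glVertex2f(0.1f, -0.1f);
        glVertex2f(0.1f, 0.1f);   glVertex2f(-0.1f, 0.1f);
        glEnd();

        glFinish();                /* wait for the GPU before measuring the frame */
        SDL_GL_SwapBuffers();

        /* update with the real frame time */
        Uint32 now = SDL_GetTicks();
        Uint32 delta = now - prev;
        prev = now;
        printf("%u\n", delta);
        x += 0.2f * (delta / 1000.0f);
        if (x > 1.0f) x = -1.0f;

        /* process events */
        SDL_Event e;
        while (SDL_PollEvent(&e))
            if (e.type == SDL_QUIT)
                running = 0;
    }

    SDL_Quit();
    return 0;
}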
