Node using all processors without clustering. How come?

I have a nodejs application that gets data from one server and pushes it into another. For testing, I sent 1000 requests to my node server and watched what happened in the system monitor. There I could see that all 4 processors were 100% occupied.
Now, from what I have read about nodejs, by default it uses only 1 thread (which means 1 processor?). So how come all my computer's processors were occupied? Is this load balancing happening at the OS level? (I am on Ubuntu 14.)
And if the balancing is done by the OS, what is the difference between this automatic OS-level load balancing and explicitly using clusters to divide the load? What are the advantages/disadvantages of each?
Any help would be deeply appreciated :)

Though the application is driven by a single thread, there are helper threads inside node that facilitate execution within the runtime environment. Examples are the JIT compiler thread and the GC helper threads. Though they won't consume CPU in proportion to the application load, they will be driven by characteristics internal to the virtual machine.
Hooking a live debugger onto the process shows how many threads there are and what they are doing:
(gdb) info threads
6 Thread 0x7ffff61d8700 (LWP 23181) 0x00000034d080d930 in sem_wait () from /lib64/libpthread.so.0
5 Thread 0x7ffff6bd9700 (LWP 23180) 0x00000034d080d930 in sem_wait () from /lib64/libpthread.so.0
4 Thread 0x7ffff75da700 (LWP 23179) 0x00000034d080d930 in sem_wait () from /lib64/libpthread.so.0
3 Thread 0x7ffff7fdb700 (LWP 23178) 0x00000034d080d930 in sem_wait () from /lib64/libpthread.so.0
2 Thread 0x7ffff7ffc700 (LWP 23177) 0x00000034d080d930 in sem_wait () from /lib64/libpthread.so.0
* 1 Thread 0x7ffff7fdd720 (LWP 23168) 0x00000034d04e5239 in syscall () from /lib64/libc.so.6
(gdb)
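If you want to see this on your own node process, you can attach gdb to it while it is running (assuming gdb is installed and you have ptrace permission for the process; pgrep is just one way of finding the pid):
$ gdb -p $(pgrep -n node)
(gdb) info threads
(gdb) detach
(gdb) quit
Note that the process is paused while gdb is attached; detach resumes it.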

Related

Deadlock while a multi-threaded process exits from a signal handler

There are two threads in a process. When the main thread receives SEGV, from the signal handler I send an internal signal to the other, auxiliary thread using pthread_kill; the handler for this internal signal traps the auxiliary thread in a sleep state, so that I can then do the mandatory cleanup and dump a stack trace to a file from the main thread, treating the process as effectively single-threaded (since the auxiliary thread is asleep).
But I have encountered a case where, while the main thread is exiting, the process is left behind (doesn't exit) and seems to be deadlocked between the two threads.
Please help me understand why, and which part of the code is causing the deadlock.
Thanks in advance!
Auxiliary Thread stack:
Thread 2 (Thread 0x7fc565b5b700 (LWP 13831)):
#0 0x00007fc5668e81fd in nanosleep () from /lib64/libc.so.6
#1 0x00007fc566915214 in usleep () from /lib64/libc.so.6
#2 0x00000000009699a2 in SignalHandFun() at ...........
#3 <signal handler called>
#4 0x00007fc56691820a in mmap64 () from /lib64/libc.so.6
#5 0x00007fc5668a5bfc in _IO_file_doallocate_internal () from /lib64/libc.so.6
#6 0x00007fc5668b386c in _IO_doallocbuf_internal () from /lib64/libc.so.6
#7 0x00007fc5668b215b in _IO_new_file_underflow () from /lib64/libc.so.6
#8 0x00007fc5668b38ae in _IO_default_uflow_internal () from /lib64/libc.so.6
#9 0x00007fc566894bad in _IO_vfscanf_internal () from /lib64/libc.so.6
#10 0x00007fc5668a2cd8 in fscanf () from /lib64/libc.so.6
.....
......
.....
#15 0x00007fc567259806 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fc56691b64d in clone () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
Main Thread stack:
Thread 1 (Thread 0x7fc5679c0720 (LWP 13795)):
#0 0x00007fc56692878e in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fc5668b504b in _L_lock_1309 () from /lib64/libc.so.6
#2 0x00007fc5668b3d9a in _IO_flush_all_lockp () from /lib64/libc.so.6
#3 0x00007fc5668b4181 in _IO_cleanup () from /lib64/libc.so.6
#4 0x00007fc566872630 in __run_exit_handlers () from /lib64/libc.so.6
#5 0x00007fc5668726b5 in exit () from /lib64/libc.so.6
#6 0x00000000009698e3 in SignalHandFun() at ....
#7 <signal handler called>
#8 0x000000b1000000b0 in ?? ()
#9 0x0000000000000000 in ?? ()
I assume that you send a signal to another thread because you want to do some work that cannot be done with async-signal-safe functions.
The problem is that if your signal handler is called on a thread that has any locks acquired (such as in your case, the internal libio list lock), then any thread that attempts to acquire the same lock will block indefinitely: You cannot return from your SIGSEGV handler, so the lock will never become available for locking again, and no thread waiting on the lock will make progress. In your case, the exit function needs to acquire the libio list lock because it has to go through the list of all open file streams and flush them, while a thread opening a new file acquires the lock while it puts the new file on the list.
While this is an implementation detail and could conceivably be addressed inside glibc at some (far) point in the future (the small improvements we have made relatively recently will not help in your case), the only way out is to call _exit before the final process exit procedure in glibc runs, after the cleanup you need to do. In your case, it may be possible to do so from an atexit handler registered as early as possible, but this depends on your application.
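A minimal sketch of that idea, assuming the cleanup can be restricted to async-signal-safe functions (the names segv_handler and dump_minimal_state are invented for illustration):

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical cleanup: async-signal-safe calls only (write,
 * close, ...) -- no stdio, no malloc, nothing that may take a
 * libc-internal lock. */
static void dump_minimal_state(void)
{
    static const char msg[] = "fatal signal, cleanup done\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
}

static void segv_handler(int sig)
{
    (void)sig;
    dump_minimal_state();
    /* _exit() terminates immediately, skipping atexit handlers
     * and the stdio flush in _IO_cleanup, so the libio list lock
     * is never acquired. */
    _exit(1);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigaction(SIGSEGV, &sa, NULL);
    /* ... application code ... */
    return 0;
}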
Regarding crash handlers, we published some advice here:
Using the fork function in signal handlers
The article focuses on fork, but the deadlock issues are pretty much the same in your case.

How does more than one thread execute on a processor core

I want to know how a multi-threaded program with more threads than cores executes on a processor. For example, my program has 12 threads and I am running it on an Intel Core i5 machine, which has four cores. Will each core run 3 threads? I am confused because I have seen programs with 30 threads running on a 4-core machine.
Thanks
Each core can execute one thread at a time. So if there are 30 threads and 4 cores, 26 threads will be waiting to be context-switched in and executed. Something like: threads 1-4 run for 200 ms, then threads 5-8 run for 200 ms, and so on.
A processor core executes one thread at a time, so on a quad core, 4 threads execute simultaneously. Not all of those are user-space threads: kernel threads also run, to schedule the next thread or to do other kernel tasks.
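A quick way to observe this is a toy program that starts more busy-looping threads than there are cores (the thread count here is arbitrary); build it with cc spin.c -pthread and watch it with top -H:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 12

/* Each thread just burns CPU; with 12 threads on 4 cores the
 * kernel time-slices them, roughly 3 runnable threads per core. */
static void *spin(void *arg)
{
    volatile unsigned long n = 0;
    (void)arg;
    for (;;)
        n++;
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];
    printf("online cores: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, spin, NULL);
    pause(); /* per-thread CPU usage: top -H -p <pid> */
    return 0;
}

All cores will sit near 100% while each individual thread gets only a fraction of a core.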

POSIX Threads on a Multiprocessor System

I have written software which takes advantage of POSIX threads so that I can utilize shared memory within the process. My question: if I have a machine running Ubuntu with 4 processors, each with 16 cores, is it more efficient to run 4 processes with 16 threads each, or 1 process with 64 threads? Each processor has a dedicated 32 GB of RAM.
My main worry is that a lot of memory copying will happen behind the scenes with 1 process.
In summary, on a machine with 4 processors of 16 cores each:
1 process with 64 threads, or 4 processes with 16 threads each?
If the process requires more than 32 GB of RAM (the amount dedicated to one processor), does the answer differ?
Thanks for your help
It depends on what your application does.
A thread in a single-threaded process runs faster than a thread in a multi-threaded process, since the latter requires synchronization between threads in library functions like malloc(), fprintf(), and so on. Also, more threads in a multi-threaded process are likely to cause more lock contention, slowing each other down. If threads don't need to communicate and don't share data, they don't need to be in the same process.
In your case, you may get better parallelism with 4 processes of 16 threads each rather than 1 process with 64 threads.
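A sketch of the 4-processes-times-16-threads layout using fork plus pthreads (worker here is a placeholder for your actual computation):

#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROCS   4
#define NTHREADS 16

/* Placeholder for the real work; nothing is shared across
 * process boundaries, so no cross-process synchronization. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

int main(void)
{
    for (int p = 0; p < NPROCS; p++) {
        if (fork() == 0) {          /* one process per CPU package */
            pthread_t t[NTHREADS];
            for (int i = 0; i < NTHREADS; i++)
                pthread_create(&t[i], NULL, worker, NULL);
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(t[i], NULL);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)          /* reap the 4 children */
        ;
    return 0;
}

On a NUMA box like the one described, you could additionally launch each process under numactl --cpunodebind=N --membind=N so that each process and its 32 GB stay on one socket; if a single process needs more than 32 GB, its memory necessarily spans nodes and some remote-memory accesses are unavoidable.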

Getting info about threads in gdb/ddd

I am debugging a multi-threaded application using ddd.
Every second, I can see in the DDD console output that a new thread is created
[New Thread 0x455fc940 (LWP 27373)]
and destroyed immediately afterwards.
[Thread 0x455fc940 (LWP 27373) exited]
After a few minutes I have this output
[New Thread 0x455fc940 (LWP 27363)]
[Thread 0x455fc940 (LWP 27363) exited]
[New Thread 0x455fc940 (LWP 27367)]
[Thread 0x455fc940 (LWP 27367) exited]
[New Thread 0x455fc940 (LWP 27373)]
[Thread 0x455fc940 (LWP 27373) exited]
...and so on, with the LWP number increasing.
The threads come and go too fast to be displayed in the window I get by clicking Status->Threads. Can you point me to how to get information about such a thread?
Do you know why this LWP number keeps increasing?
More importantly, how can I find the function that is launched in that thread?
Thank you all
AFG
LWP is an acronym and stands for Light Weight Process. It is in effect the thread ID of each newly spawned thread.
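Outside the debugger, you can watch the same LWP ids with ps (the pid below is a placeholder):
$ ps -Lf -p <pid>
The LWP column in the output is the per-thread id that gdb and ddd are reporting.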
On what to do about those spawning and dying threads: you could try setting a breakpoint at clone, which is the system call (if I am correct?) that starts a new thread at a given function.
Note: when breaking at clone you know where the thread will be started from, but you don't actually have a thread yet; you can, however, then set breakpoints at the functions passed as arguments to clone...
That is, start your program from gdb or ddd with the start command, which sets a temporary breakpoint at the program entry point (i.e. main), then set a breakpoint at clone, continue, and see what happens ;).
Update: setting a breakpoint at clone works for me... at least in my test. I should add that this is Linux-specific, and is actually what pthread_create uses.
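For example (the program name is arbitrary):
$ gdb ./myprog
(gdb) start
(gdb) break clone
(gdb) continue
When the breakpoint at clone is hit, a backtrace (bt) shows who is about to spawn the thread.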
Set a breakpoint at pthread_create.
(gdb) break pthread_create
Breakpoint 1 at 0x20c49ba5cabf44
Now when you run it, it will stop execution when the next call to create a thread happens, and you can type where to see who the caller was.

How to continue one thread at a time when debugging a multithreaded program in GDB?

I have a program which uses two threads. I have put a breakpoint in each thread. While running the program under gdb, I want to switch between the threads and make them run.
(Thread T1 is active and running while thread T2 is paused on its breakpoint. I want to stop T1 from running and run T2.)
Is there any way that I can schedule the threads in gdb?
By default, GDB stops all threads when any breakpoint is hit, and resumes all threads when you issue any command (such as continue, next, step, finish, etc.) which requires that the inferior process (the one you are debugging) start to execute.
However, you can tell GDB not to do that:
(gdb) help set scheduler-locking
Set mode for locking scheduler during execution.
off == no locking (threads may preempt at any time)
on == full locking (no thread except the current thread may run)
step == scheduler locked during every single-step operation.
In this mode, no other thread may run during a step command.
Other threads may run while stepping over a function call ('next').
So you'll want to set breakpoints, then set scheduler-locking on, then continue or finish in thread 1 (thread 2 is still stopped), then Ctrl-C to regain control of GDB, switch to thread 2, continue (thread 1 is still stopped), etc.
Beware: by setting scheduler-locking on it is very easy to cause the inferior process to self-deadlock.
If you're using GDB 7 or later, try "non-stop mode".
http://sourceware.org/gdb/current/onlinedocs/gdb/Non_002dStop-Mode.html
The "scheduler-locking on" command previously mentioned allows you step one thread with the others stopped. Non-stop mode allows you to step one thread with the others active.
Use break conditions:
(gdb) break frik.c:13 thread 28 if bartab > lim
see Debugging with GDB
Edit:
(gdb) break <thread_function_entry_point> thread 2
(gdb) break <thread_function_entry_point> thread 1
(gdb) thread 1
(gdb) continue
(gdb) ... thread 1 finishes
(gdb) thread 2
(gdb) continue
You can put these commands inside a .gdbinit file.
