Migrating a running process/thread to a different core - Linux

Is there any way to migrate a currently running process to a different CPU core by triggering the migration from another process?
Here is what I am trying to do, in more detail.
I am working on a heterogeneous processor system. I have a multi-threaded application which runs on the system. I want to migrate one of the threads to a different core (with different capabilities) whenever my manager process decides.
Can my manager process trigger the migration of a thread, given the particular TID within the target application's PID?
If so, can it be done instantaneously, i.e. will the running thread be migrated to another core (say from core 0 to core 1) immediately when my manager process triggers it?

I guess this should be possible (if you are using the POSIX threads API) using pthread_setaffinity_np(3):
The pthread_setaffinity_np() function sets the CPU affinity mask of the thread thread to the CPU set pointed to by cpuset. If the call is successful, and the thread is not currently running on one of the CPUs in cpuset, then it is migrated to one of those CPUs.
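As a rough sketch (not from the man page text above): inside the application, pthread_setaffinity_np() pins a pthread_t it owns; a separate manager process would more likely call sched_setaffinity(2) on the target thread's TID, since that call accepts a thread ID wherever it takes a pid_t. Function names below are illustrative.

#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // glibc: exposes CPU_SET and the *_np affinity calls
#endif
#include <pthread.h>
#include <sched.h>
#include <sys/types.h>
#include <cstdio>

// Inside the multi-threaded application: pin the calling thread to one core.
void pin_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
}

// From the manager process: restrict another process's thread, identified by its
// kernel TID (visible under /proc/<pid>/task/), to one core.
void pin_tid_to_core(pid_t tid, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(tid, sizeof(set), &set) != 0)
        std::perror("sched_setaffinity");
}

As the quoted man page says, if the thread is currently running on a core outside the new mask it gets migrated; from the caller's point of view that happens at the kernel's next scheduling decision, which is effectively immediate.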

Related

Setting Thread Affinities in Docker Containers

I have an application which has one thread that runs with 100% CPU usage. This thread reads and processes data from RAM that is populated by a group of 10 threads that wait on system IO to retrieve data.
In my Linux environment I isolate one CPU and then pin the core, spinning thread to that CPU. All other reader threads are scheduled by the OS across the remaining schedulable CPUs.
If I need to run additional instances of my application, I isolate another CPU and put the second instance's core reader thread onto that isolated CPU. Again, the reading threads are I/O bound and happily go along with the system scheduler.
This is simple enough to set up manually, with isolation configured as needed and each instance of the application having a reserved core for its critical thread.
How can this transition to Docker? I'd like to run Docker containers so that I can run many instances across various Docker hosts (including AWS Batch). But in a Docker scenario, how can every instance allocate CPUs without stepping on each other's cores? When we call pthread_setaffinity_np() to set the core, how can we know which core to use in a dynamically allocated container environment? A sketch of one approach follows this question.
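One hedged way to approach the last part: if each container is launched with its own --cpuset-cpus range, that restriction shows up in the container's CPU affinity mask, so the process can ask the kernel which cores it is allowed to use and pin its critical thread to one of them. The function names below are illustrative, not part of the question.

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Return the first CPU this (possibly containerized) process may run on, or -1.
int first_allowed_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;                 // e.g. the core handed to us via --cpuset-cpus
    return -1;
}

// Pin the calling (critical) thread to that core.
void pin_critical_thread()
{
    int cpu = first_allowed_cpu();
    if (cpu < 0)
        return;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0)
        std::printf("critical thread pinned to CPU %d\n", cpu);
}

The coordination of which --cpuset-cpus range each container gets (and which host CPUs are isolated) still has to live outside the containers, in whatever launches them.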

Monitor activity of a specific CPU core on Linux

I have a performance issue in my application. I have a critical thread running on a specific CPU core (its affinity is set to that core). It always performs the same task in a loop, but from time to time the loop takes more time than it should.
This might be due to the code of the application itself or to some external factors (like other processes running on the same cpu core).
I would like to be able to create some kind of timeline of things that run on this core (which process, which thread and ideally which function).
What do you recommend I use to do this? (I am working on Linux.)
Thanks in advance,
Serb.

When a workerThread is created in nodejs, does it utilize the same core on which the nodejs process is running?

Let's assume I have a nodejs server program with one API, and it does some manipulation on a video file sent via an HTTP request.
const saveVideoFile = (req, res) => {
  processAndSaveVideoFile(); // can run for minimum of 10 minutes
  res.send({ status: "video is being processed" });
};
I decided to make use of a workerThread to do this processing, as my machine has 3 cores (core1, core2, core3) and hyperthreading is not enabled.
Assume that my nodejs program is running on core1. When I fire up a single workerThread, will the workerThread run on core2/core3 or on core1?
I read that a workerThread is not the same as a childProcess. A childProcess forks a new process, which lets the childProcess be scheduled on the available free cores (core2 or core3).
I read that a workerThread shares memory with the mainThread. Let's assume that I create 2 workerThreads (wt1, wt2). Will my nodejs program, wt1 and wt2 all run on the same core, i.e. core1?
Also, in nodejs we have the event loop (main thread) and other threads doing background operations, i.e. I/O. Is it correct to assume that all of these utilize the resources of a single core (core1)? If so, is creating and using additional workerThreads overkill on the nodejs server?
Below is an excerpt from this blog
We can run things in parallel in Node.js. However, we need not to create threads. The operating system and the virtual machine collectively run the I/O in parallel and the JS code then runs in a single thread when it is time to send the data back to the JavaScript code.
I keep reading this same information about nodejs in many articles and video presentations. But what I do not understand is this:
The operating system and the virtual machine collectively run the I/O in parallel
How can the operating system run the I/O requests from a nodejs program in parallel without using any child processes or threads spawned from nodejs? If those I/O requests are running in parallel, does it mean that all 3 cores (core1, core2, core3) will be utilized?
There is a lot of content on nodejs, but it doesn't clear up the doubts behind my questions above. If you have an idea of how these things actually work, please share the details.
A worker thread in node.js is an actual OS thread running in a different instance of V8. As such, it's totally up to the operating system to decide how to allocate it among available CPU cores. If there are cores with available time, then it will not generally be run on the same core as the main nodejs thread when that thread is busy because the OS will allocate busy threads across the various cores.
But, again this is entirely up to the OS and is not something that nodejs controls and the exact strategy for which cores are used will vary by OS. But, in all modern operating systems, the design goal is that available cores are used for threads that are currently executing. Now, if there are more threads active at once than there are cores, the threads will be time-sliced and all the cores will be active.
Also, in nodejs we have the event loop (main thread) and other threads doing background operations, i.e. I/O. Is it correct to assume that all of these utilize the resources of a single core (core1)? If so, is creating and using additional workerThreads overkill on the nodejs server?
No, it is not correct to assume those threads all use the same core.
A workerThread in nodejs has its own event loop. For the most part, it does not share memory. In fact, if you want to share memory, you have to very specifically allocate shared memory (e.g. a SharedArrayBuffer) and pass it to the workerThread.
Is it overkill? Well, it depends upon what you're doing. There are very useful things to do with workerThreads and there are things that they would not be necessary for.
The operating system and the virtual machine collectively run the I/O in parallel
I/O in node.js is either asynchronous at the OS level (such as networking) or run in separate threads (such as disk I/O). That means it runs separately from the main thread in node.js that runs your Javascript and can run in parallel with it, synchronizing only at the completion of an event. "Parallel" in this case means that both make progress at the same time. If there are multiple cores, then they can truly be running at exactly the same time. If there were only one core, the OS would timeslice between the various threads and they would both make progress (in an interleaved fashion that seems parallel, but really they are taking turns).
How can the operating system run the I/O requests from a nodejs program in parallel without using any child processes or threads spawned from nodejs? If those I/O requests are running in parallel, does it mean that all 3 cores (core1, core2, core3) will be utilized?
The OS has its own threads for managing things like a network interface or a disk interface. The job of those threads is to interface with the hardware and bring data to the appropriate application, or take data from the application and send it to the hardware. These are OS-level threads that exist independently of node.js. Yes, other cores can be used by those OS-level threads. It is important to realize that many operations, such as networking, are inherently non-blocking. Thus, if you're waiting for some data to arrive on a network interface, you don't need to have a thread doing something the whole time.
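Not node.js code, but a rough C++ sketch of what "inherently non-blocking" means at the OS level: one thread can ask the kernel to wake it only when a socket has data, rather than dedicating a busy thread to each connection. Here sock_fd is a hypothetical already-connected socket; real event loops watch many descriptors at once.

#include <poll.h>
#include <unistd.h>
#include <cstdio>

// Wait up to one second for data on a single socket without burning any CPU.
void wait_and_read(int sock_fd)
{
    pollfd pfd{};
    pfd.fd = sock_fd;
    pfd.events = POLLIN;               // "tell me when there is something to read"

    int ready = poll(&pfd, 1, 1000);   // the calling thread sleeps here
    if (ready > 0 && (pfd.revents & POLLIN)) {
        char buf[4096];
        ssize_t n = read(sock_fd, buf, sizeof buf);
        if (n > 0)
            std::printf("got %zd bytes\n", n);
    }
}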
I want to add that it appears you've combined questions about several different things. Mentioned in your questions are:
Worker Threads
Internal node.js threads
Operating system threads
These are all different things.
A worker thread is a new thread you can start to run specific pieces of Javascript in another thread, so you can have more than one Javascript thread running at the same time. In node.js, this is done by creating a whole new instance of V8, setting up a whole new global environment and loaded-modules environment, and using almost entirely separate memory.
Internal node.js threads are used by node.js as part of implementing its event loop and its standard library. Specifically, disk I/O and some crypto operations are run in internal native threads and they communicate with your Javascript via events/callbacks through the event loop.
Operating system threads are threads that the OS uses to implement its own system APIs. Since the OS is responsible for lots of things, these threads can have many different uses. Depending upon native implementations, they may be used to facilitate things like disk I/O or networking I/O. These threads are the responsibility of the OS to create and use and are not directly controlled by node.js.
Some additional questions asked in comments:
What is the difference between the workerThread and childProcess concepts in nodejs? Is childProcess = workerThread without shared memory?
A child process can be any type of program - it does not have to be a node.js program. A worker thread is node.js code.
A worker thread can share memory if shared memory is specifically allocated and shared with the worker thread, and if it is carefully managed for concurrency issues.
Copying memory back and forth between a worker thread and the main thread is more efficient than doing so with a child process.
If the main program exits, worker threads will exit. If the main program exits, a child process can be configured to exit or to continue.
If a worker thread calls process.exit(), the main thread will exit too. If a child process exits, it cannot cause the main program to exit without the main program's cooperation.
How is nodejs able to magically interact with the OS-level threads without nodejs itself creating any threads? I need additional details on this; your explanation is the common one present in most places, including the blog I shared.
nodejs just calls an OS API. It's the OS API that manages communicating with its own threads (if threads are needed for that specific OS API). How it does that communication internally is implementation dependent and will vary by OS. It will even vary by OS which OS APIs use threads and which don't.

Monitor multiple thread performance

I have created a Windows service with multiple threads (approx. 4-5). In this service, threads are created at specific intervals and then aborted. Once a thread is created, it performs some I/O operations and DB operations.
I have a GUI for this service that provides the configuration it requires. In this GUI I want to add one more feature that shows me the performance of the Windows service with respect to all of its threads. I want to show CPU utilization (and, if a multicore processor is available, the utilization of all the processors) along with memory utilization.
Windows Task Manager shows CPU (on a per-core basis) plus memory utilization; I want to build the same thing, but only for the threads run by my Windows service.
Can anybody help me out with how to get CPU % and memory utilization per thread?
I don't think you can get meaningful memory utilization per thread, since memory is accounted per process; per-thread CPU time is obtainable (for example via GetThreadTimes), but it takes work to build a Task Manager-style view from it. Instead, you can easily get both for your service as a whole.
My question is, why would you need to build your own functionality when Sysinternals Process Explorer gives you more detail? Any specific needs?
If you need to monitor thread activity, you would be better off logging some information using Log4net or another logging tool. That will give you an idea of what the threads are doing.
To be more specific, you could publish the logs using the TelnetAppender, which your application can receive. This will help you look into the process in real time.
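That said, if you do want raw per-thread numbers, the CPU-time half is available from Win32. A hedged sketch (memory stays per process, and deriving a live CPU % means sampling these values periodically) might look like this:

#include <windows.h>
#include <tlhelp32.h>
#include <cstdio>

// Convert a FILETIME (100-nanosecond units) into a plain 64-bit integer.
static ULONGLONG filetime_to_ticks(const FILETIME& ft)
{
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

// Print accumulated kernel/user CPU time for every thread of the current process.
void dump_thread_cpu_times()
{
    DWORD pid = GetCurrentProcessId();
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE)
        return;

    THREADENTRY32 te;
    te.dwSize = sizeof(te);
    for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
        if (te.th32OwnerProcessID != pid)
            continue;                                   // skip other processes' threads
        HANDLE h = OpenThread(THREAD_QUERY_INFORMATION, FALSE, te.th32ThreadID);
        if (h == NULL)
            continue;
        FILETIME ftCreate, ftExit, ftKernel, ftUser;
        if (GetThreadTimes(h, &ftCreate, &ftExit, &ftKernel, &ftUser)) {
            std::printf("tid %lu: kernel %llu us, user %llu us\n",
                        te.th32ThreadID,
                        filetime_to_ticks(ftKernel) / 10,
                        filetime_to_ticks(ftUser) / 10);
        }
        CloseHandle(h);
    }
    CloseHandle(snap);
}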

Why doesn't the operating system perform round-robin between two threads when it only has 1 CPU?

Our system is implemented as an HTTP native-code module in C++ for IIS on Windows Server 2008 R2 64-bit.
Scenario:
Request 1 arrives at the web server; IIS starts a new thread (t1). The thread is executing.
Request 2 arrives before request 1 has finished. IIS starts a new thread (t2). This thread goes into a loop waiting for a shared resource to be available; this behavior is programmed by us. Since thread two (t2) spins in this loop, it starts consuming 100% of the CPU.
Problem: the operating system does not perform round-robin between the two threads and the system just hangs. If it switched execution to the first thread, the shared resource would be released and the second thread could run as well.
Even stranger: this behavior only occurs when the machine has 1 CPU. If we add another CPU to the machine, it works perfectly, switching between the two threads as expected. Nothing hangs.
A workaround (and better programming, too) that makes it work with only 1 CPU is to put a sleep(100) in the loop that checks for availability of the shared resource.
Why doesn't the operating system perform round-robin between the two threads when it only has 1 CPU? Is it related to VMware?
This thread goes into a loop waiting for a shared resource to be available
This sounds like the wrong way to synchronize things; you need to signal between threads so that the OS gets a hint to perform the context switch.
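As an illustration (a minimal sketch, not your actual IIS module code), the waiting thread can block on a condition variable and be woken by the releasing thread, so it never monopolizes the only CPU:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool resource_free = false;

// Thread t2: instead of spinning on `while (!resource_free) {}`, sleep until notified.
void wait_for_resource()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return resource_free; });
    // ... use the shared resource ...
}

// Thread t1: release the resource and wake the waiter.
void release_resource()
{
    {
        std::lock_guard<std::mutex> lock(m);
        resource_free = true;
    }
    cv.notify_one();
}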
