My server has several CPUs (0-7). I need to run parallel code, with each process affiliated with one CPU, so how do I find out which CPU each process is using?
For example, if two processes (#0 and #1) exist, process #0 might use CPU 5 and process #1 CPU 7.
How can I find that out by programming in C or Fortran?
Use the sched_getcpu() call.
Keep in mind that a process/thread can be scheduled freely to run on any available CPU/core, so one of your processes could run on core 1 one second and on core 2 the next millisecond. You can restrict which processors a process is allowed to run on with sched_setaffinity().
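A minimal C sketch of the sched_getcpu() approach (glibc; note the value can already be stale by the time you print it, unless you pin the process first):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int cpu = sched_getcpu();   /* CPU this thread is executing on right now */
    if (cpu == -1)
        perror("sched_getcpu");
    else
        printf("process %ld is running on CPU %d\n", (long)getpid(), cpu);
    return 0;
}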
I'm not aware of any system call on Linux that will give you general information about what CPU a thread is running on. @nos is correct that sched_getcpu() will tell you which CPU a thread is running on, but only for the calling thread.
You can do this by querying the /proc file system. However, if you find yourself building your application around this functionality, it is likely that you need to reexamine your design.
The file /proc/<pid>/stat contains a field that gives you the last CPU the process ran on. You would just need to parse the output (use man proc to see the field list).
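As a rough sketch of that parsing (field 39, "processor", is the last CPU; the helper name last_cpu_of is just made up for this example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the last CPU <pid> ran on, or -1 on error. The comm field
 * can contain spaces, so skip past its closing ')' before counting
 * the remaining space-separated fields. */
static int last_cpu_of(long pid)
{
    char path[64], buf[2048];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    char *paren = strrchr(buf, ')');        /* end of comm (field 2) */
    if (!paren)
        return -1;

    int field = 2;
    for (char *p = paren + 1; *p; p++)
        if (*p == ' ' && ++field == 39)     /* field 39 = "processor" */
            return atoi(p + 1);
    return -1;
}

int main(int argc, char **argv)
{
    long pid = argc > 1 ? atol(argv[1]) : (long)getpid();
    printf("pid %ld last ran on CPU %d\n", pid, last_cpu_of(pid));
    return 0;
}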
In general it is the task of the operating system to abstract such things from applications.
Normally I see my applications (as simple as doing a grep on a huge file) change CPU core every once in a while.
Now if you want to force an application on a specific core you can manually set the CPU affinity.
I've written some pretty strange software in the past and I've never had the desire to know and/or control this.
Why would you want to know?
More generally, why do you want to know? The Linux kernel is very good at scheduling processes/threads to make the best use of the available cores.
Generally, you have to change the CPU affinity, because a process can migrate between processors: CPU Affinity (Linux Journal, 2003).
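For example, a minimal sketch that pins the calling process to CPU 5 (the CPU number is an arbitrary choice taken from the question):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(5, &set);                                    /* allow CPU 5 only */
    if (sched_setaffinity(0, sizeof set, &set) == -1) {  /* 0 = this process */
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned; now running on CPU %d\n", sched_getcpu());
    return 0;
}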
Related
Does fork always create a process on a separate processor?
Is there a way I could direct the fork to a particular processor? For example, if I have 2 processors and want the fork to create a parallel process, but on the same processor that contains the parent: does NodeJS provide any method for this? I am looking for control over the allocation of the processes. ... Is this even a good idea?
Also, what is the maximum number of processes that can be forked, and why?
I've no Node.js wisdom to impart, simply some info on what OSes generally do.
Any modern OS will schedule processes / threads on CPUs and cores according to the prevailing burden on the machine. The whole point is that they're very good at this, so one would have to try very hard to come up with scheduling / core-affinity decisions that beat the OS. Almost no one bothers. Unless you're running on very specific hardware (which one might, perhaps, get to understand very well), you'd be making a lot of complex decisions for every single different machine the code runs on.
If you do want to try, then I'm assuming you'll have to dig deep below Node.js to make calls to the underlying C library. Most OSes (including Linux) provide means for a process to control core affinity (it's exposed in Linux's glibc).
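To illustrate what that looks like at the C level (not a Node API): a child inherits its parent's affinity mask across fork(), so pinning the parent first keeps both on the same core. A minimal sketch, with CPU 0 as an arbitrary choice:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof set, &set);  /* pin the parent to CPU 0 */

    pid_t pid = fork();                      /* the child inherits the mask */
    printf("%s is on CPU %d\n", pid == 0 ? "child" : "parent",
           sched_getcpu());
    if (pid > 0)
        wait(NULL);
    return 0;
}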
This is hard to word/ask so please bear with me:
When we see the output of assembly, this is what is going to be executed on the core(s) of the CPU. However, if a CPU has multiple cores, is all of the assembly executed on the same core? At what point would assembly from the same program begin executing on a different core?
So if I had (assembly pseudo):
ADD x, y, z
SUB p, x, q
how will I know whether ADD and SUB will execute on the same core? Is this linked to affinity? I thought affinity only pinned a process to a CPU, not a core?
I am asking this because I want to understand whether you can reasonably predict whether consecutive assembly instructions execute on the same core, and whether I can ensure that they only execute on the same core. I am trying to understand how the decision is made to move execution of the same program code from one core to a different core.
If execution can move (even when using affinity) from CPU A core 1 to core 2, is this where QPI link speed will take effect, and does it matter whether the caches are shared among the different CPU cores?
This is a rough overview that hopefully will provide you with the details you need.
Assembly code is translated into machine code, i.e. binary data, that is run by a CPU.
A CPU is the same as a core on a multi-core processor; i.e. a CPU is not the same as a processor (chip).
Every CPU has an instruction pointer that points to the instruction to execute next. This is incremented for every instruction executed.
So in a multi-core processor you would have one instruction pointer per core. To support more processes than there are available CPUs (or cores), the operating system will interrupt running processes and store their state (including the instruction pointer) at regular intervals. It will then restore the state of already interrupted processes and let them execute for a bit.
What core the execution is continued on is up to the operating system to decide, and is controlled by the affinity of the running thread (and probably some other settings also).
So to answer your question, there is no way of knowing if two adjacent assembly statements will run on the same core or not.
I'm talking mostly about Linux, but I guess what I am saying should be applicable to other OSes. However, without access to the Windows source code, no one can reliably say how it behaves in detail.
I think your "abstraction" of what a computer is doing is inadequate. Basically, a (mono-threaded) process (or just a thread) is running on some "virtual" CPU, whose instruction set is the unprivileged x86 machine instructions augmented by the ability to enter the kernel through syscalls (usually via a special instruction like SYSENTER). So from an application point of view, system calls to the Linux kernel are "atomic". See this answer and that one.
Indeed, the processor is getting (at arbitrary instants) some interrupts (on Linux, cat /proc/interrupts repeated twice with a one-second delay would show you how often it is getting interrupted, basically many thousands of times per second), and these interrupts are handled by the kernel. The kernel schedules tasks (e.g. threads or processes) preemptively (they can be interrupted and restarted by the kernel at any time).
From an application point of view, interrupts don't really exist (but the kernel can send signals to the process).
Cores, interrupts and caches are handled by the hardware and/or the kernel, so from the application's point of view they don't really exist, except by "slowing down" the process. Cache coherency is mostly dealt with in hardware, and together with out-of-order execution it makes the execution time of a given (even tiny) binary program unpredictable. In other words, you cannot statically predict exactly how many milliseconds some given routine or loop will need; you can only measure it dynamically (read more about worst-case execution time).
Reading the Advanced Linux Programming book and the Linux Assembly Howto would help.
You cannot normally predict where each individual instruction will execute. As long as an individual thread is executing continuously, it will run inside the same core/processor, but you cannot predict on which instruction the thread will be switched out. The OS makes that decision, along with the decision of when to switch it back in and on which core/processor to put it, based on the workload of the system and priority levels, among other things.
You can usually ask the OS to keep a thread on the same core; this is called affinity. It is normally a bad idea and should only be done when absolutely necessary, because it takes away the OS's flexibility to decide what to run where based on the workload; affinity will almost always result in a performance penalty.
Requesting processor-affinity is an extraordinary request that requires extraordinary proof that it would result in better performance. Don't try to outsmart the OS; the OS knows things about the current running environment that you don't know about.
I'm studying threads and am slightly confused about one thing.
If I have a single process with multiple threads running on a dual/quad core CPU, will different threads run concurrently on different cores?
Thanks in advance.
It Depends.
At least on Linux, each task gets assigned to a set of CPUs that it can execute on (processor affinity). And, at least on Linux, the scheduler will try to schedule a task on the same processor as last time, so that it gets the best benefit of CPU cache re-use. The hilarious thing is that it doesn't always rebalance when the system is under load, so it is possible to run one core quite hot and contested and leave three cores cool and relatively idle. (I've seen this exact behavior with the Folding@home client.)
You can force the affinity you need with the pthread_setaffinity_np(3) routine for threaded applications, or sched_setaffinity(2) for more traditional Unix-style fork(2)ed applications. Or you can use the taskset(1) program to set the affinity before or after starting an application. (That's the approach I took with my silly Folding@home client: it was easy to modify the initscript to call taskset(1) to set the affinity of each client process correctly, so each client got its own core and didn't compete for resources with the other clients on different sibling HyperThreaded 'faked' execution cores.)
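A minimal sketch of the threaded variant (the helper pin_self_to is made up for this example; compile with -pthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one core. */
static void pin_self_to(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    int err = pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", err);
}

int main(void)
{
    pin_self_to(2);                          /* core 2 is an arbitrary pick */
    printf("thread is now on CPU %d\n", sched_getcpu());
    return 0;
}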
Yes
It depends on the language, the library, and the operating system, and whether the threaded application ever actually has multiple runnable threads at the same point in time, but usually the answer is "yes".
You can never be sure of that, but if the application is processor-intensive (such as a game) then most likely yes.
Note that if threads running on different cores share data, you need to synchronize their view of memory. In Java, for example, declaring a shared variable volatile ensures that every core reads the newly updated value from memory instead of a stale cached copy; in C you would use atomics or locks for the same effect.
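A minimal C11 sketch of that visibility idea, assuming a POSIX system and compiling with -pthread:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool done;                     /* zero-initialized = false */

static void *worker(void *arg)
{
    (void)arg;
    /* ... do some work ... */
    atomic_store(&done, true);               /* publish: visible on all cores */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    while (!atomic_load(&done))              /* safe cross-core read */
        ;                                    /* spin; fine for a tiny demo */
    pthread_join(t, NULL);
    puts("worker finished");
    return 0;
}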
Sometimes the threads will run concurrently, sometimes not. It's all up to the package you use and the operating system, and how CPU-intensive each thread is.
I think you are losing the idea behind concurrency: it's not that you are looking to run processes on multiple cores; instead, you need to avoid blocking on one task the entire time. A perfect example of this is a threaded network listener. You want to perform an accept(), which actually creates a new client->server socket. After this you want to do some processing with that socket while still being able to take new connections. This is where you would spawn a thread to perform the processing, so that the accept() can get back to waiting for a new connection.
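A rough C sketch of that listener pattern (port 8080 and the echo "processing" are arbitrary choices; error handling mostly omitted; compile with -pthread):

#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static void *handle_client(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(fd, buf, n);                   /* trivial echo as "processing" */
    close(fd);
    return NULL;
}

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);

    for (;;) {
        int client = accept(srv, NULL, NULL);
        if (client == -1)
            continue;
        pthread_t t;
        pthread_create(&t, NULL, handle_client, (void *)(intptr_t)client);
        pthread_detach(t);                   /* main loop returns to accept() */
    }
}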
I want to have a real-time process take over my computer. :)
I've been playing a bit with this. I created a process which is essentially a while (1) loop (it never blocks nor yields the processor) and used schedtool to run it with the SCHED_FIFO policy (I also tried chrt). However, the process was letting other processes run as well.
Then someone told me about sched_rt_runtime_us and sched_rt_period_us. So I set the runtime to -1 in order to make the real-time process take over the processor (and also tried making both values the same), but it didn't work either.
I'm on Linux 2.6.27-16-server, in a virtual machine with just one CPU. What am I doing wrong?
Thanks,
EDIT: I don't want a fork bomb. I just want one process to run forever, without letting other processes run.
There's another protection I didn't know about.
If you have just one processor and want a SCHED_FIFO process like this (one that never blocks nor yields the processor voluntarily) to monopolize it, besides giving it a high priority (not really necessary in most cases, but doesn't hurt) you have to:
- Set sched_rt_runtime_us to -1, or to the same value as sched_rt_period_us.
- If you have group scheduling configured, set /cgroup/cpu.rt_runtime_us to -1 (in case you mount the cgroup filesystem on /cgroup).
Apparently, I had group scheduling configured and wasn't bypassing that last protection.
If you have N processors and want your N processes to monopolize them, you just do the same but launch all of them from your shell (the shell shouldn't get stuck until you launch the last one, since until then it still has a processor to run on). If you want to be really sure each process will go to a different processor, set its CPU affinity accordingly.
Thanks to everyone for the replies.
I'm not sure about schedtool, but if you successfully change the scheduler using sched_setscheduler to SCHED_FIFO and then run a task which does not block, one core will be entirely allocated to the task. If this is the only core, no SCHED_OTHER tasks will run at all (i.e. nothing except a few kernel threads).
I've tried it myself.
So I speculate that either your "non blocking" task was blocking, or your schedtool program failed to change the scheduler (or changed it for the wrong task).
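For reference, a minimal sketch of that experiment (needs root or CAP_SYS_NICE; on a single core with sched_rt_runtime_us set to -1 it will starve everything else, so have a way to recover):

#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Priority 1 is enough: any SCHED_FIFO task outranks all SCHED_OTHER tasks. */
    struct sched_param sp = { .sched_priority = 1 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {  /* 0 = this process */
        perror("sched_setscheduler");
        return 1;
    }
    for (;;)
        ;                                    /* never blocks, never yields */
}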
You can also make your process SCHED_FIFO. Even priority 1, the lowest real-time priority, outranks every normal SCHED_OTHER task, so the process would run forever and won't be preempted by them.
I'm writing a Linux application which observes other applications and tracks their resource consumption. I'm planning to work with Java, but the programming language isn't important to me; the goal is what matters, so I can switch to another technology or use modules. My application runs a selected third-party application as a child process. Mostly the child software solves some algorithmic task: graphs, string search, etc. The observer program tracks the child's resources until it finishes the job.
If the child application is multi-threaded, is it perhaps possible to track how many resources each thread consumes? The application could be written using any non-distributed-memory threading technology: Java threads, Boost threads, POSIX threads, OpenMP, or any other.
In modern Linux systems (2.6), each thread has a separate identifier that gets nearly the same treatment as the pid. It is shown in the process table (at least, in the htop program) and it also has its own separate /proc entry, i.e. /proc/<tid>/stat.
Check man 5 proc and pay particular attention to stat, statm, status etc. You should find the information you're interested in there.
The only obstacle is obtaining this thread identifier. It is different from the process id! I.e., getpid() calls in all threads return the same value. To get the actual thread identifier, you should use (within a C program):
pid_t tid = syscall(SYS_gettid);  /* needs <unistd.h> and <sys/syscall.h> */
By the way, the Java virtual machine (at least, its OpenJDK Linux implementation) does this internally and uses it for debugging purposes in its back end, but doesn't expose it through the Java interface.
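As a rough sketch of how an observer might walk the per-thread entries (the loop just prints each thread's raw stat line; fields 14 and 15 are utime and stime, per man 5 proc):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long pid = argc > 1 ? atol(argv[1]) : (long)getpid();
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/task", pid);

    DIR *d = opendir(path);
    if (!d) { perror("opendir"); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                        /* skip "." and ".." */
        char stat_path[128], line[1024];
        snprintf(stat_path, sizeof stat_path, "/proc/%ld/task/%s/stat",
                 pid, e->d_name);
        FILE *f = fopen(stat_path, "r");
        if (!f)
            continue;                        /* thread may have exited */
        if (fgets(line, sizeof line, f))
            printf("tid %s: %s", e->d_name, line);
        fclose(f);
    }
    closedir(d);
    return 0;
}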
Memory is not allocated to threads, and it is often shared across threads. This makes it generally impossible, and largely meaningless, to talk about the memory consumption of a thread.
An example could be a program with 11 threads; 1 creating objects and 10 using those objects. Most of the work is done on those 10 threads, but all memory was allocated on the one thread that created the objects. Now how does one account for that?
If you're willing to use Perl, take a look at this: Sys-Statistics-Linux.
I used it together with some of the GD graphing packages to generate system resource usage graphs for various processes.
One thing to watch out for: you'll really need to read up on /proc and understand jiffies. Last time I looked they weren't documented correctly in the man pages, so you'll probably need to read the kernel source:
http://lxr.linux.no/#linux+v2.6.18/include/linux/jiffies.h
Also, remember that in Linux the only difference between a thread and a process is that threads share memory; other than that, they're identical in how the kernel implements them.