Execute a process for n CPU cycles - Linux

How do I execute a process for n CPU cycles on Linux? I have a batch processing system on a multi-core server and would like to ensure that each task gets exactly the same amount of CPU time. Once that CPU budget is consumed, I would like to stop the process. So far I have tried to do something with the utime and stime fields of /proc/<pid>/stat, but I did not succeed.

I believe it is impossible to give the exact same number of cycles to several processes; a CPU cycle is often less than a nanosecond. You could, however, execute a process for x CPU seconds. For that, use setrlimit(2) with RLIMIT_CPU.
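As a minimal sketch of that approach (the path /path/to/task and the 10-second budget are placeholders, and error checking is trimmed): fork, apply the limit in the child, then exec the task. The child receives SIGXCPU at the soft limit and SIGKILL at the hard limit.

#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* child: allow 10 s of CPU time (soft limit), 11 s hard limit */
        struct rlimit rl = { .rlim_cur = 10, .rlim_max = 11 };
        setrlimit(RLIMIT_CPU, &rl);    /* limits CPU time, not wall-clock time */
        execl("/path/to/task", "task", (char *)NULL);  /* placeholder path */
        _exit(127);                    /* exec failed */
    }
    int status;
    waitpid(pid, &status, 0);          /* SIGXCPU/SIGKILL show up in status */
    return 0;
}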
Your batch processor could also manage time itself; see time(7). You could use timers (see timer_create(2) and timerfd_create(2)), have an event loop around poll(2), and measure time with clock_gettime(2).
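For the measuring side, a small sketch using clock_gettime(2) with CLOCK_PROCESS_CPUTIME_ID, which counts CPU time consumed by the calling process (to read another process's clock, you would first obtain it with clock_getcpuclockid(3)):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;
    /* CPU time consumed by this process, not wall-clock time */
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return 1;
    printf("CPU time used: %ld.%09ld s\n", (long)ts.tv_sec, ts.tv_nsec);
    return 0;
}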
I'm not sure it is useful to write your own batch processing system. You could use an existing one such as batch, SLURM, or GNQS (see also commercial products like PBS Works, LSF, ...).

Related

How can I measure the queuing time of a process (CPU intensive) before it gets executed?

Actually, I am trying to run some experiments where I need to run benchmarks under heavy load. Starting with CPU load, I schedule a sysbench daemon that generates 1000 primes. I set its priority to low so that it only runs once the CPU is not busy with other tasks, so as to reduce its impact on the regular workload. Since the priority of the process is low, the process keeps waiting in the queue until it finds a free CPU core to run on. The problem is that its result includes the wait period (in the queue) in the reported execution time, which renders the result invalid.
Is there some way that I could actually calculate the wait period and subtract it from the result to get a valid result?
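One hedged possibility, assuming a kernel built with schedstats support: /proc/<pid>/schedstat exposes three values, namely nanoseconds spent on-CPU, nanoseconds spent waiting on a runqueue, and the number of timeslices run. Reading the wait figure would let you subtract queueing delay from the measured elapsed time. A sketch in C:

#include <stdio.h>

int main(void)
{
    unsigned long long run_ns, wait_ns, slices;
    /* use /proc/<pid>/schedstat to inspect another process */
    FILE *f = fopen("/proc/self/schedstat", "r");
    if (!f) { perror("fopen"); return 1; }
    if (fscanf(f, "%llu %llu %llu", &run_ns, &wait_ns, &slices) != 3) {
        fclose(f);
        return 1;
    }
    fclose(f);
    printf("on-cpu: %llu ns, runqueue wait: %llu ns\n", run_ns, wait_ns);
    return 0;
}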

How can I best utilize a multithreaded CPU in Matlab?

I just bought the Matlab Parallel Computing Toolbox.
The command matlabpool open opens as many parallel workers as there are cores in my CPU.
But each of my CPU cores has two threads. According to Windows Task Manager, each worker can only use half the performance of one CPU core, which seems to mean one worker = one thread = "half a core".
Therefore, even after all workers are opened, only half of the CPU's total power is utilized.
Is there any other command that could help with that?
By default, the local cluster type for matlabpool considers only "real" cores when choosing the default number of workers to launch. This is because for MATLAB workloads, hyperthreading often does not provide much benefit. However, this value is only a default - you can edit the cluster type and run anything up to 12 local workers.
You need to understand HyperThreading to answer this question.
Matlab launches a worker thread for every CPU. Suppose you now use a directive like parfor to distribute computation over multiple threads. Every thread will now be crunching numbers happily.
Suppose you are doing a sum of a large vector of numbers. What actually happens is the following:
sum = sum + a[0]
  // array a is not in the CPU cache yet;
  // a small part of a is fetched from main memory into the CPU cache
sum = sum + a[1]
sum = sum + a[2]
...
During the fetch of a, the CPU stalls, waiting for system memory. This is called a pipeline bubble, and it is not good for performance. Sometimes part of the array a has been swapped out to the hard drive. The operating system then needs to read that part back from the drive into main memory, after which it is transferred to the CPU cache. When this happens, the operating system will not let the CPU sit idle for what may be 200 ms or more; it will use that time to execute another task instead (like the backup running on your system, or refreshing your screen, or ...).
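For concreteness, a small C sketch of the same summation; on a large array the loop is memory-bound, so the CPU spends much of its time waiting on cache-line fetches rather than on arithmetic:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    size_t n = 10 * 1000 * 1000;          /* ~80 MB of doubles */
    double *a = malloc(n * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < n; i++) a[i] = 1.0;

    clock_t t0 = clock();
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];                      /* waits on memory, not the ALU */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("sum = %.0f in %.3f s of CPU time\n", sum, secs);
    free(a);
    return 0;
}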
Switching tasks on a CPU results in a performance penalty. To switch to a different task, the operating system must save the CPU registers in main memory, and load the CPU registers of the other task back into the CPU first. This takes time.
With HyperThreading, each core keeps two copies of the CPU registers. This means that two threads can 'occupy' the core at the same time. Only one executes at any given instant, but during a stall the core can switch to the second thread without the cost of a full context switch.
Forget how Microsoft Windows reports CPU usage. It's misleading: CPU usage is a lot more complicated than a single number like 47%. The real question is rather: should Matlab register two threads per core, or only one?
Arguments pro:
During a stall, the CPU can quickly switch to the other thread and continue executing.
Arguments contra:
There are more threads, and the problem is divided in smaller pieces. This may actually reduce performance, as you need to put more pieces together to get the final result.
A context switch will still 'poison' the L1 and L2 cache, loading in pieces of memory that are of no use to the other thread on the CPU.
If there are no stalls, you have more overhead.
On a desktop, the operating system will also want to run: redrawing the screen, moving your mouse, etc. When all logical CPUs are in use, the operating system must perform an actual (costly) context switch to do so.
Your computation is only complete once all pieces of the problem have been calculated. Using all the cores/threads increases the risk of one thread taking longer and holding up the final result.
My guess is that the Matlab developers considered the arguments contra more important than the arguments pro. My own performance tests certainly suggest that there is little performance gain from HyperThreading for CPU-intensive calculations.

Reserve a processor for only one process (already at max priority)

I have used this piece of code to try to set the same high priority while executing a program:
/* needs _GNU_SOURCE and <pthread.h>, <sched.h>, <sys/resource.h>, <errno.h> */
cpu_set_t cmask;
CPU_ZERO(&cmask);                    /* clear the mask before setting a bit */
CPU_SET(CPU_NUM, &cmask);
/* pthread_setaffinity_np() takes a pthread_t (e.g. pthread_self()), and it
   returns an error number rather than -1 on failure */
if (pthread_setaffinity_np(pthread_self(), sizeof(cmask), &cmask) != 0) {
    LOG_ERROR("Could not set cpu affinity to core %d", CPU_NUM);
    goto exit_err;
}
errno = 0;
setpriority(PRIO_PROCESS, 0, -19);   /* -19 requires root / CAP_SYS_NICE */
The purpose of the program is to perform a computation on fixed-size chunks (80 bytes each) of input.
But when executing the program, the time elapsed for this computation varies by 30% to 150% between runs.
When plotting the computation time values, I was expecting a fairly smooth graph where the deviation would be something like 10%-15%, but instead it is more than 40%!
So I would like to ask: is the CPU interleaving the execution of my program with other work, and if so, could I force the CPU to run ONLY this specific program?
Thanks in advance!
P.S. I haven't found a thread that answers my question yet...
The most relevant is :) :
Linux reserve a processor for a group of processes (dynamically)
To try to reduce jitter, some of the things you can do are listed below (a sketch of the scheduling and pinning steps follows the list):
Ensure you've turned off CPU frequency scaling.
Set scheduling policy to SCHED_FIFO for that program.
Try and pin your process to a single processor if you have more than one.
Try to run as few other processes as possible while you're measuring your program.
Don't trigger sources of time related non-determinism (e.g. disk I/O).
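A minimal sketch of the scheduling-policy and pinning steps above, assuming root or CAP_SYS_NICE; the core number (2) and priority (10) are arbitrary choices:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);                         /* pin to core 2 (arbitrary) */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    struct sched_param sp = { .sched_priority = 10 };  /* 1..99 for SCHED_FIFO */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    /* ... run the measured computation here ... */
    return 0;
}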
It is probably useful to skim through How to build a Linux RT application, because accurate measurement is in the same domain - it's possible to be more extreme though:
Ensure your program doesn't use dynamic memory allocations.
Use a realtime Linux kernel.
Prevent Linux from scheduling unrelated userspace programs on a given CPU (e.g. via the isolcpus= kernel boot parameter).
Even disable timer ticks on a given CPU (CONFIG_TASK_ISOLATION).
Modern desktop/server processors are so complicated that trying to precisely measure a single program's execution time with low variance is extremely hard. Things like the various caches and pipeline starting states can perturb execution times in any number of ways so there are always going to be limits.

Process and thread CPU usage in Linux top

We just discovered a peculiar feature of the Linux "top" tool.
The feature is that the summed CPU time of all threads is less than the time displayed for the entire process. This is observed when our application spawns more than 50 threads and runs for several minutes.
So the question is: what is that extra time, consumed not by any particular thread but by the process itself? How is that possible?
As I understand it, the information about process and thread CPU usage is taken from the /proc/<pid>/stat and /proc/<pid>/task/<tid>/stat files. Who fills these files, and why is the time in <pid>/stat not the sum of all the <tid>/stat times?
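For reference, a hedged sketch of reading utime+stime (in clock ticks) from one of those stat files; the helper name proc_cpu_ticks is illustrative. utime and stime are fields 14 and 15, and the parse skips past the ')' closing the comm field because comm may itself contain spaces:

#include <stdio.h>
#include <string.h>

static long proc_cpu_ticks(const char *path)
{
    char buf[4096];
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    char *p = strrchr(buf, ')');   /* end of comm (field 2) */
    if (!p) return -1;

    long utime, stime;             /* fields 14 and 15, in clock ticks */
    if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %ld %ld",
               &utime, &stime) != 2)
        return -1;
    return utime + stime;
}

int main(void)
{
    printf("%ld ticks\n", proc_cpu_ticks("/proc/self/stat"));
    return 0;
}

Summing proc_cpu_ticks() over /proc/<pid>/task/*/stat and comparing the result with /proc/<pid>/stat should reproduce the discrepancy described above.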

How can an OS actually measure CPU power?

Currently I think that a processor has only two states: running and not running. If it's running, it will use its full power to process a task. If there are multiple processes, each process gets a portion of the CPU.
But how can computing power be divided into "portions"? Suppose a CPU has 1 million transistors: are only half of the transistors used when the CPU is at 50%?
Or is this related to the processing time allocated to each process? That is, suppose "100%" means a process holds a CPU for 200 milliseconds; does a process with the default nice (priority) value of 0 then receive 50% of the computing power, i.e. 100 milliseconds? Which is the correct idea?
Let me explain this using the example of Intel x86 CPUs and Windows NT (and its derivatives). One of the built-in system processes on these OSes is the System Idle Process. This process represents how much CPU time is consumed by the operating system's "idle loop". That idle loop does nothing but execute the HLT instruction of the CPU. That instruction, in turn, commands the CPU to do nothing until the next interrupt arrives.
Therefore, if the scheduler decides that there are no processes that require CPU time at a given moment, the time is given to the System Idle Process. If, say, 99% of the time in the last n seconds was spent "executing" that process, it means that the CPU was really utilized for only 1% of those n seconds.
I believe it is entirely analogous on Linux, except that Linux doesn't have a separate process to model the "idleness" of the CPU; the kernel simply accounts idle time directly.
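On Linux that accounting is visible in the first "cpu" line of /proc/stat, so utilization over an interval can be computed as 1 - delta(idle)/delta(total). A hedged sketch (it reads only the first seven fields, ignoring steal/guest time):

#include <stdio.h>
#include <unistd.h>

static int read_jiffies(unsigned long long *idle, unsigned long long *total)
{
    unsigned long long v[7] = {0};
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    /* fields: user nice system idle iowait irq softirq ... */
    int n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);
    fclose(f);
    if (n < 4) return -1;
    *idle = v[3];                            /* 4th field is idle time */
    *total = 0;
    for (int i = 0; i < 7; i++) *total += v[i];
    return 0;
}

int main(void)
{
    unsigned long long i0, t0, i1, t1;
    if (read_jiffies(&i0, &t0)) return 1;
    sleep(1);                                /* sample over one second */
    if (read_jiffies(&i1, &t1)) return 1;
    printf("utilization: %.1f%%\n",
           100.0 * (1.0 - (double)(i1 - i0) / (double)(t1 - t0)));
    return 0;
}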
On a side note: it is, of course, possible to have an OS that doesn't execute the HLT instruction at all. That was the case with Windows 98 and earlier (including, obviously, MS-DOS), whose idle loop simply consisted of a jmp $. That caused the CPU to use much more power.
