The result of the Linux time command

I got the following result from the Linux time command.
real 119m10.626s
user 133m0.952s
sys 20m32.155s
From what I have read, it seems that user + sys should be less than real, but that is not the case here.
Does somebody know why?

Multiple CPUs.
A multi-threaded application can run simultaneously on multiple CPU cores, and thus accumulate CPU time (user + sys) at a multiple of the real (wall-clock) time.
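As an illustration (a hypothetical script, not from the original question), the following Julia snippet keeps several threads busy with floating-point work; run it as time julia -t 4 busy_work.jl and the reported user time should come out at roughly four times the real time:
using Base.Threads

function busy(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)            # arbitrary floating-point work
    end
    return s
end

# One CPU-bound loop per thread: wall-clock (real) time passes once,
# but CPU (user) time accumulates on every core in parallel.
results = zeros(nthreads())
@threads for t in 1:nthreads()
    results[t] = busy(200_000_000)
end
println(sum(results))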

Related

Weird CPU usage: 100% utilization, but temperature abnormally low

I have encountered weird behavior with my algorithm/CPU, and I was wondering what could be causing it.
The CPU I am using: AMD 2990WX (32c/64t); OS: Ubuntu 18.04 LTS with the 4.15.0-64-generic kernel.
The algorithm (Julia 1.0.3):
using Distributed                  # provides @sync and @distributed

@sync @distributed for var in range(0.1, step=0.1, stop=10.0)
    res = do_heavy_stuff(var)      # solves a differential equation,
                                   # basically multiplying 200x200 matrices many times
    save(filename, "RES", res)     # save() presumably from JLD/FileIO; filename defined elsewhere
end
The function do_heavy_stuff(var) takes ~3 hours to run on a single CPU core.
When I launch it in parallel with 10 processes (julia -p 10 my_code.jl), it takes ~4 hours for each parallel loop, meaning every 4 hours I get 10 files saved. The slowdown is expected, as the CPU frequency goes down from 4.1 GHz to 3.4 GHz.
If I launch 3 separate instances with 10 processes each, for a total utilization of 30 cores, it still takes ~4 hours for one loop cycle, meaning I get 30 runs completed and saved every 4 hours.
However, if I run 2 instances at once with 30 processes each (julia -p 30 my_code.jl), one with a nice value of 0 and the other with a nice value of +10, I see (using htop) that CPU utilization is 60+ threads, but the algorithm becomes extremely slow (after 20 hours, still zero files saved). Furthermore, I see that the CPU temperature is abnormally low (~45 °C instead of the expected 65 °C).
From this information I can guess that using (almost) all threads of my CPU makes it do something useless that eats up CPU cycles, while no floating-point operations are being done. I see no I/O to the SSD, and I utilize only half of my RAM.
I ran mpstat (mpstat -A): https://pastebin.com/c19nycsT and I can see that all of my cores are just sitting idle, which explains the low temperature. However, I still don't understand what exactly the bottleneck is. How do I troubleshoot from here? Is there any way to see (without touching hardware) whether the problem is RAM bandwidth or something else?
EDIT: It came to my attention that I was using mpstat wrong. Apparently mpstat -A gives CPU stats since the launch of the computer, while what I needed was short-interval results, which can be obtained with mpstat -P ALL 2. Unfortunately, I only learned this after I killed the code in question, so I have no real data from mpstat. However, I am still interested: how would one troubleshoot such a situation, where the cores seem to be doing something but results are not showing? How do I find the bottleneck?
Since you are using multiprocessing, there are two likely reasons for the observed behavior:
Long delays on I/O. When you are processing lots of disk data or reading data from the network, your processes naturally stall. In this case CPU utilization can be low while execution times are long.
High variance in the execution time of do_heavy_stuff. This variance could arise from unstable I/O or from different model parameters resulting in different execution times. Why this is a problem requires understanding how @distributed shares the workload among worker processes: each worker gets an equal share of the for loop's range. For example, if you have 4 workers, the first one gets var in the range 0.1:0.1:2.5, the second one 2.6:0.1:5.0, and so on. Now if some of the var values result in heavy tasks, the first worker might get 5 hours of work while the other workers get 1 hour. This means that @sync completes after 5 hours with only one CPU actually working the whole time (see the sketch below).
Looking at your post, I would strongly bet on the second reason.
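As a rough sketch of the difference (not from the original answer; the worker count and the sleep-based stand-in for do_heavy_stuff are made up), compare the static split of @sync @distributed with the dynamic scheduling of pmap:
using Distributed
addprocs(4)                                        # hypothetical worker count

@everywhere fake_heavy(var) = (sleep(var); var)    # stand-in whose cost grows with var

vals = 0.5:0.5:4.0

# Static split: each worker gets a contiguous chunk of vals, so the worker
# holding the largest values finishes last while the others sit idle.
@time @sync @distributed for var in vals
    fake_heavy(var)
end

# Dynamic scheduling: pmap hands out one value at a time to whichever
# worker is free, which evens out unbalanced workloads.
@time pmap(fake_heavy, vals)
The more uneven the per-iteration costs are, the bigger the gap between the two approaches becomes.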

cpu time jumps a lot in virtual machine

I have a C++ program running with 20 threads (boost threads) on an RHEL 6.5 system virtualized on a Dell server. The result is deterministic, but the CPU time and wall time vary a lot between runs. Sometimes it takes 200 s of CPU time to finish, sometimes up to 300 s. This bothers me, as performance is one of the criteria for our testing.
I've replaced the originally used boost::timer::cpu_timer for wall/CPU time calculation with the system APIs clock_gettime and getrusage. It doesn't help.
Is it because of 'steal time' taken by the hypervisor (VMware)? Is steal time included in the user/sys time collected by getrusage?
Does anyone have knowledge on this? Many thanks.
It would be useful if you provided some extra information. For example, are your threads dependent on each other, i.e. is there any synchronization going on among them?
Since you are using a virtual machine, how is your CPU shared with other users of the server? It might be that even a single CPU core is shared, so you do not get the same allocation of CPU resources on each run [this is the steal time you mention above].
Also, you mention that the CPU time differs: this is the time spent in user code. If you have synchronization among threads (such as a mutex), then depending on how the operating system wakes the threads up, the overall time can vary.
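Not from the original answer, but a rough Julia sketch of that last point (the workload and sizes are made up): the same work done with and without a shared lock, where the contended version's wall-clock time depends on how the scheduler wakes up the waiting threads:
using Base.Threads

const lk = ReentrantLock()              # one shared lock for the contended version

function contended(out)                 # every iteration must take the shared lock
    @threads for i in eachindex(out)
        lock(lk) do
            out[i] = sqrt(float(i))
        end
    end
end

function independent(out)               # same work, no shared lock
    @threads for i in eachindex(out)
        out[i] = sqrt(float(i))
    end
end

out = zeros(1_000_000)
contended(out); independent(out)        # warm-up so compilation is not timed
# The contended timings typically fluctuate more from run to run, because they
# depend on lock hand-off and on how the OS wakes up the waiting threads.
for _ in 1:5
    println("contended:   ", @elapsed contended(out))
    println("independent: ", @elapsed independent(out))
end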

Why does running multiple copies of the same command take a very long time

This isn't so much a programming question as a problem I've encountered lately and am trying to understand.
For example, running an ls command in Linux takes maybe... 1 second.
But when I spawn off a few thousand ls commands simultaneously, I notice that some of the processes are not running and take a very long time to finish.
Why is that so? And how can we work around that?
Thanks in advance.
UPDATE:
I did a ps and saw that a couple of the ls commands were in the D< state. I read up a bit and understand that this is an uninterruptible sleep. What is that? When does it happen? How can I avoid it?
The number of processes or threads that can be executing concurrently is limited by the number of cores in your machine.
If you spawn thousands of processes or threads simultaneously, the kernel can only run n of them (where n equals the number of available cores) at the same time; the rest will have to wait to be scheduled.
If you want to run more processes or threads truly concurrently, then you need to increase the number of available cores in the system (i.e. by adding CPUs or enabling hyperthreading if available).
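As a loose sketch of that limit (in Julia rather than with ls processes; assuming Julia is started with one thread per core, e.g. julia -t auto), spawning four times as many CPU-bound tasks as there are cores takes roughly four times as long as a single batch, because the extra tasks have to wait their turn:
using Base.Threads

function cpu_work()
    s = 0.0
    for i in 1:50_000_000
        s += sqrt(i)                    # pure computation, keeps one core fully busy
    end
    return s
end

function run_batch(n)                   # spawn n CPU-bound tasks and wait for all
    tasks = [Threads.@spawn(cpu_work()) for _ in 1:n]
    foreach(wait, tasks)
end

cpu_work()                              # warm-up so compilation is not timed
println("1x cores: ", @elapsed run_batch(nthreads()))
println("4x cores: ", @elapsed run_batch(4 * nthreads()))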

Why do I receive a different runtime every time I run "time ./a.out" on the same program?

I am currently trying to reduce the runtime of a kmeans program; however, every time I run the "time ./a.out" command the terminal gives me a different answer, even though I haven't changed any of the code. Does anyone have any idea why this is?
real 0m0.100s
user 0m0.082s
sys 0m0.009s
bash-4.1$ time ./a.out
real 0m0.114s
user 0m0.084s
sys 0m0.006s
bash-4.1$ time ./a.out
real 0m0.102s
user 0m0.087s
sys 0m0.005s
bash-4.1$ time ./a.out
real 0m0.099s
user 0m0.082s
sys 0m0.008s
bash-4.1$ time ./a.out
real 0m0.101s
user 0m0.083s
sys 0m0.006s
This is after running the same command consecutively.
On a modern system, many processes run in parallel (or rather quasi-parallel). That means the system switches between all the processes. Note that it does not wait for one process to finish before switching to the next; that would force the other processes to wait and be blocked. Instead, each process gets a bit of time now and then until it has finished.
The more processes there are, the slower the system becomes overall, and the slower each individual process gets when measured in absolute duration. That is what you see.
The typical strategy for this is called "round robin". You may want to google that term to read more about this topic.
First, let us understand that the time command records the elapsed time and the CPU resources used by the program, that is, how much time the program spends running on the processor. As you have noted, the reported times differ between runs in all categories: real time, user time, and system time.
Second, let us understand that modern systems share the processor among all the processes running on the system (only one process controls any given core at any given time) and use many different schemes for deciding how these processes share the processor and system resources, hence the different real and user times. These times depend on how your system swaps programs in and out.
The sys time depends on the program itself and on what resources it requests. As with any process, if a resource has already been claimed by another process, yours will be put to sleep waiting on it. Depending on the resource and on how your particular system handles sharing, a process may spend some time waiting idly and be put to sleep only after a timer expires, or be put to sleep immediately if the system can guess that the resource will take longer than the timer. Again, this is highly dependent on how your particular system handles these tasks, on your processor, and on the resources being requested.
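To make the user/sys split concrete, here is a hypothetical Julia script (sketch.jl, made up for this illustration); run it as time julia sketch.jl user and as time julia sketch.jl sys and compare which category dominates in the output of time:
function user_heavy()                   # pure computation: shows up as user time
    s = 0.0
    for i in 1:200_000_000
        s += sqrt(i)
    end
    return s
end

function sys_heavy()                    # many small system calls: shows up as sys time
    open("/dev/null", "w") do io
        for i in 1:1_000_000
            write(io, UInt8(i % 256))
            flush(io)                   # force a write() system call every iteration
        end
    end
end

println(ARGS == ["sys"] ? sys_heavy() : user_heavy())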

The number of pthread_mutexes in a running system

I have a strange question. I have to count the number of pthread_mutexes in a running system, for example Debian, Ubuntu, a system on a microcontroller, etc. I have to do it without LD_PRELOAD, interrupting, overloading functions, and so on. I have to calculate it at an arbitrary point in time.
Does somebody have an idea how I can do it? Can you show me a way?
For counting the threads:
ps -eLf will give you a list of all the threads and processes currently running on the system.
However, you ask for a list of all threads that HAVE executed on the system, presumably since some arbitrary point in the past - are you sure that is what you mean? You could run ps as a cron job and poll the system every X minutes, but you would miss threads that were born and died between jobs. You would also have a huge amount of data to deal with.
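If you want to script this, a rough Linux-only sketch (not part of the original answer) is to read /proc directly, which is roughly what ps does under the hood; every entry under /proc/<pid>/task is one thread:
function count_threads()
    total = 0
    for entry in readdir("/proc")
        all(isdigit, entry) || continue            # only numeric PID directories
        taskdir = joinpath("/proc", entry, "task")
        try
            total += length(readdir(taskdir))      # one subdirectory per thread
        catch
            # the process exited between the two readdir calls; skip it
        end
    end
    return total
end

println("threads currently on the system: ", count_threads())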
As for counting the mutexes, it's impossible.
