While working with a Tomcat process on Linux, we observed that the TIME field shows 5506:34 (cumulative CPU time). From what we found while exploring, this is supposed to be the CPU time spent running over the entire lifetime of the process.
Since this is a Java process, we also observed that memory was almost full and the process needed a restart.
My question is: what exactly is this cumulative CPU time? And why is this specific process taking more CPU time when there are other processes too?
It is the total time the CPU has spent executing the process. If the process uses multiple threads, their times are accumulated.
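As a rough illustration (a minimal sketch in Python, using a hypothetical PID of 1234), the value top shows in the TIME/TIME+ column can be reproduced by summing the utime and stime fields of /proc/<pid>/stat and dividing by the clock tick rate:

    import os

    pid = 1234  # hypothetical PID; substitute your Tomcat process ID
    ticks_per_sec = os.sysconf('SC_CLK_TCK')  # usually 100 on Linux

    with open(f'/proc/{pid}/stat') as f:
        # The comm field may contain spaces, so split after the closing ')'
        fields = f.read().rsplit(')', 1)[1].split()

    # utime and stime are fields 14 and 15 of the stat file (1-based),
    # i.e. indices 11 and 12 once the "pid (comm)" prefix has been stripped.
    utime_ticks = int(fields[11])
    stime_ticks = int(fields[12])

    print('cumulative CPU time: %.2f s' % ((utime_ticks + stime_ticks) / ticks_per_sec))

A process that keeps several threads busy accumulates this counter faster than wall-clock time, which is why a long-running, multi-threaded Tomcat can show a TIME value far larger than neighbouring processes.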
Related
I would like to measure the CPU usage of a running database server when I execute a query.
The goal is to get the wallclock time, total CPU time, user CPU time and kernel(system) CPU time, so I can estimate how much time is spent on computation, and how much time is on I/O.
The server is dedicated to this experiment and the CPU usage is close to 0% when no query is running, so my plan is to
start the monitor
run the query
stop the monitor and collect the CPU usage during the interval
The monitor can either give a sum of the CPU time in that period or a list of sampled values, which I can sum up myself.
I have searched for similar problems and tried several solutions but they do not satisfy my need.
pidstat: pidstat seems good, but the granularity is too coarse. The smallest interval is 1 second, and I need a finer interval such as 100 ms.
mpstat: the same problem as pidstat; the interval is too large.
top: top can run in batch mode, but the sampling interval is also large (2-3 s), and it does not provide a user/kernel time breakdown.
Thank you all for your suggestions!
Try using the "time" or "times" command. You might have to work around some limitations to use them.
time program_name
https://man7.org/linux/man-pages/man1/time.1.html
"times" needs to be executed from the same shell from where database server has been started.
https://man7.org/linux/man-pages/man2/times.2.html
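If wrapping the server in time(1) or times is awkward (the server is usually already running as a daemon), another option is to read /proc/<pid>/stat once before and once after the query and take the difference of the utime and stime counters; the deltas are exact for the interval and are only limited by the clock-tick resolution (typically 10 ms), not by a sampler's 1-second interval. A minimal Python sketch; the PID and run_query() below are placeholders for your setup:

    import os, time

    TICKS = os.sysconf('SC_CLK_TCK')

    def cpu_times(pid):
        # Return (user, system) CPU seconds consumed so far by the process,
        # summed over all of its threads.
        with open(f'/proc/{pid}/stat') as f:
            fields = f.read().rsplit(')', 1)[1].split()
        return int(fields[11]) / TICKS, int(fields[12]) / TICKS

    def measure(pid, run_query):
        u0, s0 = cpu_times(pid)
        w0 = time.monotonic()
        run_query()                      # placeholder: submit the query and wait for it
        wall = time.monotonic() - w0
        u1, s1 = cpu_times(pid)
        return {'wall': wall, 'user': u1 - u0, 'system': s1 - s0}

    # e.g. measure(1234, lambda: os.system('psql -f query.sql'))  # hypothetical client call

The wall-clock time minus (user + system) then gives an upper bound on the time spent waiting, e.g. on I/O.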
I am observing strange effects with the CPU percentage shown in e.g. top or htop on Linux (Ubuntu 16.04) for one particular application. The application uses many threads (around 1000). Each thread has one computational task. About half of these tasks need to be computed once per "trigger" - the trigger is an external event received exactly every 100ms. The other threads are mostly sleeping (waiting for user interaction) and hence do not play a big role here. So to summarise: many threads wake up basically simultaneously within a short period of time, do their (relatively short) computation and go back to sleep again.
Since the machine running this application has 8 virtual CPUs (4 cores with 2 threads each, it's an i7-3612QE), only 8 threads can really run at a time, so many threads will have to wait. Also, some of these tasks have interdependencies, so they have to wait anyway, but I think as an approximation one can think of this application as a bunch of threads going to the runnable state at the same time every 100ms, each doing only a short computation (way below 1ms of CPU time each).
Now coming to the strange effect: if I look at the CPU percentage in "top", it shows something like 250%. As far as I know, top looks at the CPU time (user + system) the kernel accounts for this process, so 250% would mean the process uses 3 virtual CPUs on average. So far so good. Now, if I use taskset to force the entire process onto a single virtual CPU, the CPU percentage drops to 80%. The application has internal accounting which tells me that all data is still being processed. So the application is doing the same amount of work, but it seemingly uses fewer CPU resources. How can that be? Can I really trust the kernel's CPU time accounting, or is this an artefact of the measurement?
The CPU percentage also goes down if I start other processes which take a lot of CPU, even if they do nothing ("while(true);") and run at low priority (nice). If I launch 8 of these CPU-eating processes, the application again reaches 80%. With fewer CPU-eaters, I get gradually higher CPU%.
Not sure if this plays a role: I have used the VTune profiler, which tells me my application is actually quite inefficient (only about 1 IPC), mostly because it is memory bound. This does not change if I restrict the process to a single virtual CPU, so I assume the effect is not caused by a huge increase in efficiency when running everything on the same core (which would be strange anyway).
My question was essentially already answered by myself in the last paragraph: the process is memory bound. Hence the limiting resource is not the CPU but the memory bandwidth. Allowing such a process to run on multiple CPU cores in parallel mainly has the effect that more CPU cores sit waiting for data to arrive from RAM. This is counted as CPU load, since the CPU is executing the thread, just rather slowly. All my other observations are consistent with this.
I am trying to run some experiments where I need to run benchmarks under heavy load. Starting with CPU load, I schedule a sysbench daemon that generates 1000 primes. I set its priority to low so that it only runs once the CPU is not busy with other tasks, to reduce its impact on the regular workload. Since the priority of the process is set to low, the process keeps waiting in the queue until it finds a free CPU core to run on. The problem is that its result shows the execution time including the wait period (in the queue), which renders the result invalid.
Is there some way that I could actually calculate the wait period and subtract it from the result to get a valid result?
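One way to measure the wait period directly (a sketch, assuming the kernel exposes /proc/<pid>/schedstat, i.e. it was built with schedstats support; per-thread values live under /proc/<pid>/task/<tid>/schedstat) is to read that file before and after the benchmark. Its first field is the time actually spent running on a CPU and its second field is the time spent waiting on a run queue, both in nanoseconds:

    def schedstat(pid):
        # /proc/<pid>/schedstat: field 1 = time spent on the CPU (ns),
        # field 2 = time spent waiting on a run queue (ns),
        # field 3 = number of timeslices run.
        with open(f'/proc/{pid}/schedstat') as f:
            run_ns, wait_ns, _slices = f.read().split()
        return int(run_ns), int(wait_ns)

    # Sample before and after the sysbench run (the PID is hypothetical):
    # r0, w0 = schedstat(4321)
    # ... benchmark runs ...
    # r1, w1 = schedstat(4321)
    # print('ran %.3f s, waited %.3f s' % ((r1 - r0) / 1e9, (w1 - w0) / 1e9))

Subtracting the wait delta from the wall-clock time reported by the benchmark should give a figure much closer to the pure execution time, assuming the process is either running or queued for the whole interval.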
We just discovered a peculiar feature of the Linux "top" tool.
The feature is that the summed CPU time of all threads is less than the time displayed for the entire process. This is observed when our application spawns more than 50 threads and runs for several minutes.
So the question is: what is that extra time consumed not by any thread but by the process itself? How is that possible?
As I understand it, the information about process and thread CPU usage is taken from the /proc/<pid>/stat and /proc/<pid>/task/<tid>/stat files. Who fills these files, and why is the time in <pid>/stat not the sum of all the <tid>/stat times?
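To see where the difference comes from (a minimal sketch; as far as I understand, the usual explanation is threads that have already exited, whose CPU time is folded into the process-wide counters in /proc/<pid>/stat but no longer has an entry under task/), you can compare the process totals against the sum over the live threads:

    import os

    def utime_stime(path):
        # Parse utime + stime (in clock ticks) from a stat file.
        with open(path) as f:
            fields = f.read().rsplit(')', 1)[1].split()
        return int(fields[11]) + int(fields[12])

    def compare(pid):
        process_ticks = utime_stime(f'/proc/{pid}/stat')
        thread_ticks = sum(
            utime_stime(f'/proc/{pid}/task/{tid}/stat')
            for tid in os.listdir(f'/proc/{pid}/task'))
        tck = os.sysconf('SC_CLK_TCK')
        print('process: %.2f s, live threads: %.2f s, difference: %.2f s'
              % (process_ticks / tck, thread_ticks / tck,
                 (process_ticks - thread_ticks) / tck))

    # compare(1234)  # hypothetical PID

The files are generated on demand by the kernel's procfs code from the scheduler's accounting; because the reads happen at slightly different moments, a small difference is expected even without exited threads.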
In Linux, we can use "cat /proc/process-id/sched" to get the scheduling information; nr_switches, nr_voluntary_switches and nr_involuntary_switches tell us how many times the process has been scheduled. Is there a similar method to get a thread's scheduling counts?
Thanks in advance!
It's hard to know what you mean by "scheduling times". If you mean kernel/user run ticks, then /proc/xxx/stat looks like it has some details about the runtimes.
Under Linux, the threads of a process can be found under /proc/xxx/task/yyy. Each directory corresponds to a thread associated with the parent process.
utime %lu Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). This includes guest time, guest_time (time spent running a virtual CPU, see below), so that applications that are not aware of the guest time field do not lose that time from their calculations.
stime %lu Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
I'd check the proc manpages for a list of the available files.
man proc
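For per-thread numbers, the same files exist underneath the task directory: /proc/<pid>/task/<tid>/sched shows the same nr_switches / nr_voluntary_switches / nr_involuntary_switches lines (when the kernel exposes the sched file at all), and /proc/<pid>/task/<tid>/status carries voluntary_ctxt_switches and nonvoluntary_ctxt_switches. A small Python sketch (the PID is hypothetical) that collects the status counters for every thread:

    import os

    def ctxt_switches(pid):
        # Return {tid: (voluntary, nonvoluntary)} read from each thread's status file.
        result = {}
        for tid in os.listdir(f'/proc/{pid}/task'):
            counters = {}
            with open(f'/proc/{pid}/task/{tid}/status') as f:
                for line in f:
                    if line.startswith(('voluntary_ctxt_switches', 'nonvoluntary_ctxt_switches')):
                        key, value = line.split(':')
                        counters[key] = int(value)
            result[tid] = (counters.get('voluntary_ctxt_switches', 0),
                           counters.get('nonvoluntary_ctxt_switches', 0))
        return result

    # for tid, (vol, invol) in ctxt_switches(1234).items():
    #     print(tid, vol, invol)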