Script to spawn a list of commands up to n% of CPU utilization and/or m% of memory consumption - Linux

I am looking for a script that will spawn each command in a list until CPU/memory/network reaches a specified bound.
Some commercial scheduling tools will run as many jobs as they can until CPU utilization hits 90%. At that point, they wait until utilization drops below a specified level and then start another job. This maximizes utilization so the whole set finishes faster.
An obvious example is copying files. With 100+ files to copy, it is ludicrous to copy them one at a time: so little CPU time is used per copy that many copies should run in parallel, and I/O and network bandwidth become the constraints to manage.
I would like to not reinvent the wheel if there is already something available. Anyone know of something like this?
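A minimal, hypothetical Python sketch of the throttling loop described above (not an existing tool): the 90% limit, the 2-second poll interval, and the placeholder command list are all assumptions to adjust.
```python
#!/usr/bin/env python3
"""Hypothetical sketch: keep starting jobs from a list while overall CPU
utilization stays below a limit. Thresholds and the job list are placeholders."""
import subprocess
import time

CPU_LIMIT = 90.0   # percent; do not start new jobs above this
POLL_SECS = 2.0    # utilization sampling interval

def cpu_percent(interval):
    """Overall CPU utilization over `interval` seconds, read from /proc/stat."""
    def sample():
        with open("/proc/stat") as f:
            # cpu  user nice system idle iowait irq softirq steal
            fields = [int(x) for x in f.readline().split()[1:9]]
        return fields[3] + fields[4], sum(fields)   # (idle + iowait, total)
    idle1, total1 = sample()
    time.sleep(interval)
    idle2, total2 = sample()
    busy = (total2 - total1) - (idle2 - idle1)
    return 100.0 * busy / (total2 - total1)

# Placeholder job list; in the copy example this would be one cp per file.
commands = [["cp", f"file{i}", "/dest/"] for i in range(100)]
running = []

while commands or running:
    running = [p for p in running if p.poll() is None]   # drop finished jobs
    if commands and cpu_percent(POLL_SECS) < CPU_LIMIT:
        running.append(subprocess.Popen(commands.pop(0)))
    else:
        time.sleep(POLL_SECS)
```
For I/O-heavy jobs like the copy example, the same loop applies if the CPU check is swapped for an iowait or bandwidth measurement.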

Related

Why would a process use over twice as many CPU resources on a machine with double the cores?

I'm hoping someone could point me in the right direction here so I can learn more about this.
We have a process running on our iMac Pro 8-core machine which utilises ~78% CPU.
When we run the same process on our new Mac Pro 16-core machine it utilises ~176% CPU.
What reasons could there be for this? We were hoping the extra cores would allow us to run more processes simultaneously; however, if each one uses over double the CPU resources, surely that means we will be able to run fewer processes on the new machine?
There must be something obvious I'm missing about the architecture. Could someone please help? I understand I haven't included any code examples; I'm asking in a more general sense about scenarios that could lead to this.
I suspect that the CPU thread manager tries to use as much CPU as it can/needs. If there are more processes needing CPU time, then the cycles will be shared out more sparingly to each. Presumably your task runs correspondingly faster on the new Mac?
The higher CPU utilization just indicates that it's able to make use of more hardware. That's fine. You should expect it to use that hardware for a shorter period, and so more things should get done in the same overall time.
As to why, it completely depends on the code. Some code decides how many threads to use based on the number of cores. If there are non-CPU bottlenecks (the hard drive or GPU for example), then a faster system may allow the process to spend more time computing and less time waiting for non-CPU resources, which will show up as higher CPU utilization, and also faster throughput.
If your actual goal is to have more processes rather than more throughput (which may be a very reasonable goal), then you will need to tune the process to use fewer resources even when they are available. How you do that completely depends on the code. Whether you even need to do that will depend on testing how the system behaves when there is contention between many processes. In many systems it will take care of itself. If there's no problem, there's no problem. A higher or lower CPU utilization number is not in itself a problem. It depends on the system, where your bottlenecks are, and what you're trying to optimize for.
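To illustrate the "threads based on the number of cores" point, here is a small sketch; it is not the asker's code, and the work function and inputs are placeholders.
```python
"""Illustrative sketch only: sizing a worker pool from the visible core count,
which is why the same program can show roughly double the CPU% on a machine
with double the cores."""
import os
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # Placeholder CPU-bound work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    workers = os.cpu_count() or 1      # 8 on the old machine, 16 on the new one
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(crunch, [2_000_000] * workers))
    print(f"used {workers} workers")
```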

CPU percentage and heavy multi-threading

I am observing strange effects with the CPU percentage as shown in e.g. top or htop on Linux (Ubuntu 16.04) for one particular application. The application uses many threads (around 1000). Each thread has one computational task. About half of these tasks need to be computed once per "trigger" - the trigger is an external event received exactly every 100ms. The other threads are mostly sleeping (waiting for user interaction) and hence do not play a big role here. So to summarise: many threads wake up essentially simultaneously within a short period of time, do their (relatively short) computation, and go back to sleep again.
Since the machine running this application has 8 virtual CPUs (4 cores with 2 threads each; it's an i7-3612QE), only 8 threads can actually run at a time, so many threads will have to wait. Some of these tasks also have interdependencies, so they have to wait anyway, but I think as an approximation one can think of this application as a bunch of threads going to the runnable state at the same time every 100ms and each doing only a short computation (well below 1ms of CPU time each).
Now to the strange effect: if I look at the CPU percentage in top, it shows something like 250%. As far as I know, top looks at the CPU time (user + system) the kernel accounts for this process, so 250% would mean the process uses 3 virtual CPUs on average. So far so good. Now, if I use taskset to force the entire process onto a single virtual CPU, the CPU percentage drops to 80%. The application has internal accounting which tells me that all data is still being processed. So the application is doing the same amount of work, but it seemingly uses fewer CPU resources. How can that be? Can I really trust the kernel's CPU time accounting, or is this an artefact of the measurement?
The CPU percentage also goes down if I start other processes which take a lot of CPU, even if they do nothing ("while(true);") and are running at low priority (nice). If I launch 8 of these CPU-eating processes, the application again reaches 80%. With fewer CPU-eaters, I get gradually higher CPU%.
Not sure if this plays a role: I have used the profiler VTune, which tells me my application is actually quite inefficient (only about 1 IPC), mostly because it's memory bound. This does not change if I restrict the process to a single virtual CPU, so I assume the effect is not caused by a huge increase in efficiency when running everything on the same core (which would be strange anyway).
I essentially answered my own question in the last paragraph: the process is memory bound, so the limiting resource is not the CPU but memory bandwidth. Allowing such a process to run on multiple CPU cores in parallel mainly means that more CPU cores sit waiting for data to arrive from RAM. This is counted as CPU load, since the CPU is executing the thread, just very slowly. All my other observations are consistent with this.
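For completeness, the taskset experiment can also be reproduced from inside the process; a minimal sketch (CPU 0 is an arbitrary choice), useful for confirming that throughput barely changes when a memory-bound workload is confined to one core.
```python
"""Sketch: in-process equivalent of the taskset experiment, pinning the whole
process to a single logical CPU (CPU 0 is an arbitrary choice)."""
import os

print("allowed CPUs before:", sorted(os.sched_getaffinity(0)))
os.sched_setaffinity(0, {0})          # pid 0 = this (the calling) process
print("allowed CPUs after: ", sorted(os.sched_getaffinity(0)))

# Start the worker threads after this point; for a memory-bound workload,
# throughput should stay roughly constant while the reported CPU% drops.
```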

100% CPU usage profile output: what could cause this based on our profile log?

We have a massively scaled Node.js project (~1M+ users) that is suddenly taking a massive beating on our CPU (EPYC 24-core, 2 GHz).
We've been trying to debug what's using all our CPU using a profiler, (and I can show you the output down below) and it's behaving really weirdly whatever it is.
We have a master process that spawns 48 clusters; after they're all loaded, the CPU usage slowly grows to its maximum. After killing a cluster, the load average doesn't dip at all. However, after killing the master process, it all goes back to normal.
The master process obviously isn't maxing out all threads on its own, and killing a cluster really should make a difference, shouldn't it?
We even stopped the user input of the application and a cluster entirely and it didn't reduce cpu usage at all.
We've got plenty of log files we could send if you want them.
Based on the profile, it looks like the code is spending a lot of time getting the current time from the system. Do you maybe have Date.now() (or oldschool, extra-inefficient +new Date()) calls around a bunch of frequently used, relatively quick operations? Try removing those, you should see a speedup (or drop in CPU utilization, respectively).
As for stopping user input not reducing CPU load: are you maybe scheduling callbacks? Or promises, or other async requests? It's not difficult to write a program that only needs to be kicked off and then keeps the CPU busy forever on its own.
Beyond these rough guesses, there isn't enough information here to dig deeper. Is there anything other than time-related stuff on the profile, further down? In particular, any of your own code? What does the bottom-up profile say?
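The "clock call per tiny operation" pattern the answer suspects looks roughly like the sketch below. It is written in Python purely for illustration (the project in question is Node.js), and the function names are made up.
```python
"""Illustrative only: Python stands in for the Node.js code to show the shape
of the problem the profile suggests. Function names are made up."""
import time

def per_item_timing(items):
    # A system-time call wrapped around every cheap operation: the timing
    # itself ends up dominating the profile (analogous to Date.now() per op).
    for item in items:
        start = time.time()
        _ = item * item            # the actual (cheap) work
        _ = time.time() - start

def per_batch_timing(items):
    # Cheaper: time the whole batch once, or drop timing from the hot path.
    start = time.time()
    for item in items:
        _ = item * item
    return time.time() - start

if __name__ == "__main__":
    per_item_timing(range(1_000_000))
    print("batch took", per_batch_timing(range(1_000_000)), "seconds")
```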

Execute a process for n CPU cycles

How do I execute a process for n CPU cycles on Linux? I have a batch processing system on a multi-core server and would like to ensure that each task gets exactly the same amount of CPU time. Once that amount of CPU time is consumed, I would like to stop the process. So far I have tried to do something with utime and stime from /proc/<pid>/stat, but I did not succeed.
I believe it is impossible to give the exact same number of cycles to several processes (a CPU cycle is often less than a nanosecond). You could, however, execute a process for x CPU seconds; for that, use setrlimit(2) with RLIMIT_CPU.
Your batch processor could also manage time itself; see time(7). You could use timers (see timer_create(2) and timerfd_create(2)), run an event loop around poll(2), and measure time with clock_gettime(2).
I'm not sure it is useful to write your own batch processing system. You could use the existing batch command, Slurm, or GNQS (see also commercial products like PBS Works, LSF, etc.).
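A minimal sketch of the setrlimit(2)/RLIMIT_CPU suggestion using Python's standard library; the 10-second budget and the busy-loop command are placeholders.
```python
"""Sketch of the RLIMIT_CPU approach: run each batch task with a hard cap on
CPU seconds. The 10-second budget and the placeholder command are assumptions."""
import resource
import subprocess

CPU_SECONDS = 10   # per-task CPU budget

def limit_cpu():
    # Runs in the child between fork() and exec(); only the child is limited.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))

proc = subprocess.run(
    ["python3", "-c", "while True: pass"],   # placeholder CPU-bound task
    preexec_fn=limit_cpu,
)
# The kernel delivers SIGXCPU at the limit (fatal by default), so a negative
# return code here means the task was stopped for exceeding its CPU budget.
print("exit status:", proc.returncode)
```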

Will an IO blocked process show 100% CPU utilization in 'top' output?

I have an analysis that can be parallelized over a varying number of processes. It is expected to be both IO and CPU intensive (very high-throughput short-read DNA alignment, if anyone is curious).
The system running this is a 48-core Linux server.
The question is how to determine the optimum number of processes such that total throughput is maximized. At some point the processes will presumably become IO bound such that adding more processes will be of no benefit and possibly detrimental.
Can I tell from standard system monitoring tools when that point has been reached?
Would the output of top (or maybe a different tool) enable me to distinguish between a IO bound and CPU bound process? I am suspicious that a process blocked on IO might still show 100% CPU utilization.
When a process is blocked on IO, it isn't running, so no time is accounted against it. If there's another process that can run, then that will run instead; if there isn't, the time is counted as 'IO wait', which is accounted as a global statistic.
IO wait would be a useful thing to monitor. It shows up in top's header as the wa figure in the CPU summary line. You can monitor it in more detail with tools like iostat and vmstat. Server Fault might be a better place to ask about that.
Even a single IO-bound process will rarely show high CPU utilization, because the operating system has scheduled its IO and is usually just waiting for it to complete. So top cannot accurately distinguish between an IO-bound process and a non-IO-bound process that merely uses the CPU periodically. In fact, a system horribly overloaded with IO-bound processes, barely able to accomplish anything, can exhibit very low CPU utilization.
Using only top, as a first pass, you can indeed merely keep adding threads/processes until CPU utilization levels off to determine the approximate configuration for a given machine.
You can use tools like iostat and vmstat to show how much time processes spend blocked on I/O. There's generally no harm in adding more processes than you need, but the benefit decreases. You should measure throughput versus the number of processes as the real gauge of overall efficiency.
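As a rough, hypothetical aid, the iowait counter mentioned above can also be sampled directly while stepping up the number of alignment processes; the 5-second interval is an arbitrary choice.
```python
"""Sketch: sample the global iowait percentage from /proc/stat (the same
counter top and vmstat report). Watching it while adding worker processes
helps spot the point where the run becomes IO bound."""
import time

def cpu_times():
    with open("/proc/stat") as f:
        # cpu  user nice system idle iowait irq softirq steal
        return [int(x) for x in f.readline().split()[1:9]]

def iowait_percent(interval=5.0):
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    delta = [a - b for a, b in zip(after, before)]
    return 100.0 * delta[4] / sum(delta)     # field 4 (0-based) is iowait

if __name__ == "__main__":
    while True:
        print(f"iowait: {iowait_percent():.1f}%")
```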

Resources