Linux acceptable load average - linux

I have a linux dedicated server machine(8cores 8gbRAM) where i run some crawler php scripts. The load on the system ends up being arround 200, which sounds a lot. Since i am not using the machine to host content, what could be the sideeffects of such high level of load for the purposes stated above.

Machines were made to work so there are no issues with high load average, per se. But, a high load average can be an indicator of a performance issue, often. Such investigation is usually application specific, but here is a very general guideline:
Since load average is a combined metric of (CPU, IO .. etc) you want to examine all separately. I would start with making sure the machine is not thrashing, by checking swap usage (vmstat comes in handy), and disk performance (using iostat). You may also check if your operations are CPU intensive.

You should read your load average value as a 3 component value (1 minute load, 5 minutes load and 15 minutes load respectively).
Take a look at the example taken from Wiki:
For example, one can interpret a load average of "1.73 0.60 7.98" on a single-CPU system as:
during the last minute, the system was overloaded by 73% on average (1.73 runnable processes, so that 0.73 processes had to wait for a turn for a single CPU system on average).
during the last 5 minutes, the CPU was idling 40% of the time on average.
during the last 15 minutes, the system was overloaded 698% on average (7.98 runnable processes, so that 6.98 processes had to wait for a turn for a single CPU system on average).
Full article
Please note that this value depends on the resources of your machine.
Cheers!

Related

What is the load on the system

I have a Red hat server where I can see the load average on the system is 23 24 23 (1min 5min 15min) using the top command. And i can see in /proc/cpuinfo there are 24 processors entry (0-23). But in each processor entry the cpu cores value is 6 and in each processor entry the physical id is either 1 or 0.
I want to know if my system is overloaded. Can anyone please tell me.
It seems you have a system with two processors, each with 6 cores. Each of the cores can likely run hyperthread => 2 x 6 x 2 = 24. In /proc/cpuinfo, top etc, you will see each hyperthread: that's the number of parallel processes or threads that your hardware can run.
The quick answer is that your system is probably not overloaded and that it processes a relatively stable amount of work over time (as the 1, 5 and 15 minute values are about the same). A rule of thumb is that the load average value should stay below the number of hyperthreads -- this is not, however, exactly true.
You'll find a more in-depth discussion here:
https://unix.stackexchange.com/questions/303699/how-is-the-load-average-interpreted-in-top-output-is-it-the-same-for-all-di
and here:
https://superuser.com/questions/23498/what-does-load-average-mean-on-unix-linux
and perhaps here:
https://linuxhint.com/load_average_linux/
However, please keep in mind that load average does not tell you everything about your system -- though it is usually a pretty good indicator in my experience. You'd have to check many other factors to determine overloadedness, like memory pressure, I/O wait times, I/O bandwidth utilization. It also depends on what kind of processing the system is performing.

Estimate Core capacity required based on load?

I have quad core ubuntu system. say If I see the load average as 60 in last 15 mins during peak time. Load average goes to 150 as well.
This loads happens generally only during peak time. Basically I want to know if there is any standard formula to derive the number of cores ideally required to handle the given load ?
Objective :-
If consider the load as 60 then it means 60 task were in queue on an average at any point of time in last 15 mins ? Adding cpu can help me to server the
request faster or save system from hang or crashing .
Linux load average (as printed by uptime or top) includes tasks in I/O wait, so it can have very little to do with CPU time that could potentially be used in parallel.
If all the tasks were purely CPU bound, then yes 150 sustained load average would mean that potentially 150 cores could be useful. (But if it's not sustained, then it might just be a temporary long queue that wouldn't get that long if you had better CPU throughput.)
If you're getting crashes, that's a huge problem that isn't explainable by high loads. (Unless it's from the out-of-memory killer kicking in.)
It might help to use vmstat or dstat to see how much CPU time is spent in user/kernel space when your load avg. is building up, or if it's probably mostly I/O.
Or of course you probably know what tasks are running on your machine, and whether one single task is I/O bound or CPU bound on an otherwise-idle machine. I/O throughput usually scales a bit positively with queue depth, except on magnetic hard drives when that turns sequential read/write into seek-heavy workloads.

Does load average affect performance?

For example, first I run a benchmark program when the load average is 0.00,
then, I run some cpu-consuming task to generate some load to 10.00, then kill it.
next, now cpu usage is 0 but load average is 10.00, if I run the benchmark program again, will the load average affect the result?
No, but that doesn't mean your benchmark will run the same.
The answer to your question is no. The load average is a reported value. It is meant to give you an idea of the state of the system, averaged over several periods of time. Since it is averaged, it takes time for it to go back to 0 after a heavy load was placed on the system.
This is just a report, however. Your system isn't really loaded, and the CPU isn't currently taken. A new benchmark you'll run is unaffected by the system's state 5 minutes ago.
With that said, what is true for CPU may not be true for memory. If your loader uses a lot of memory, the kernel might push less used memory into the swap. It will also reduce the amount of memory it has for file cache. Depending on your benchmark, that might affect the benchmark's performance.

How to convert process cpu usage in clock ticks to percentage?

I setup collectd on my Debian 6 virtual machine for monitoring and performance analysis. Collectd's processes plugin provides statistics about a process' cpu usage, though what units these statistics have is not documented anywhere. It's certainly not jiffies or milliseconds, since the total cpu usage of several processes could go as high as 400,000 (of some unknown unit) per second on a 4 core virtual machine.
By looking at collectd's source code (https://github.com/collectd/collectd/blob/master/src/processes.c - in the ps_read_process function) , I figured out this data is read from the /proc/$pid/stat file of the process. The proc man page (link- http://man7.org/linux/man-pages/man5/proc.5.html) says the cpu usage there is measured in clock ticks.
This is nice, but clock ticks are a little arbitrary for monitoring and performance analysis. I'd like to convert the clock ticks value to something more meaningful, ideally percentage of total cpu time. How can I do that in a portable way, without just assuming my processor provides 3GHZ of clock ticks?
Further inspection of collectd's code revealed that the cpu usage is converted to microseconds.
Also, turns out a similar question was already asked and answered here.

What Would Happen if CPU Load Average is High

I read some articles about the CPU load average. They were talking about the definition, the differences between the CPU usage, and the optimal value (roughly equals to the number of cores). They also mentioned that if the number is high, you will be in trouble (waking up at mid-night etc.), but what would actually be happening if the number is high?
For example, I have been running 4, 6 and 8 sessions on a 4 core Linux server. Although the time it took to finish the task were different (4 fasted, 8 slowest), the results seem OK. The CPU load averages were roughly 4, 8 and 10. I understand that 10 might not be a good number, but then what?
It's just that: if you run absurdly high load averages, the overall efficiency will suffer: the CPU processing power will go to waste.
This is caused by several factors; the most immediate being more CPU time needed for scheduling the competing tasks. One not at all insignificant factor is that several competing processes will also overutlize the CPU cache; each task switch effectively throwing out the cache contents and replacing them with new ones. Further choke points come in forms of bottlenecks in memory and storage bandwidths.

Resources