Linux webserver - htop shows extreme CPU usage?

I am a bit confused about what the tool "htop" shows as CPU usage and load average. I was asked to have a look at a webserver which is performing incredibly slowly.
I googled a bit and kept finding the statement that anything above 1.00 load average is bad when you only have one CPU in the machine.
However, my "htop" experience looks like this:
htop screenshot
Can someone please tell me what exactly is going on here? Is this bad or do I misunderstand everything?
Thank you for your help.

In your screenshot the CPU usage bars are colored green and red. Press '?' in htop to bring up the help screen. There you will see that green stands for CPU time used by normal-priority userspace processes, while red stands for time spent in the kernel.
Basically, in your screenshot all the CPU cores are 100% busy, and they are spending most of that time in the kernel.
Yes, this is bad. Further investigation is needed to tell what exactly is going on here.
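As a starting point for that investigation, a minimal sketch (assuming the sysstat package is installed for mpstat/pidstat) is to break CPU time down into user vs. system and see which processes drive the system portion:

    # Per-CPU split of user vs. system (kernel) time, 5 one-second samples
    mpstat -P ALL 1 5

    # Per-process CPU usage split into %usr and %system
    pidstat -u 1 5

    # Context switches and interrupts per second, another hint at kernel-heavy load
    vmstat 1 5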

The htop screenshot shows each core of the CPU and the usage of each one. What you really want to look at are the processes and how much CPU they are consuming.
There's an article here which explains it in more detail: http://www.thegeekstuff.com/2011/09/linux-htop-examples
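For a quick snapshot outside of htop, plain ps works too (nothing assumed beyond standard procps); inside htop you can sort by the CPU% column with F6:

    # Ten most CPU-hungry processes right now (plus the header line)
    ps aux --sort=-%cpu | head -n 11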
Good luck!

Related

Why would free be reporting a freakishly high swap used value?

I have recently seen, on two occasions, RHEL 6 boxes showing a swap used value, reported by 'free', somewhere in the 10^15 bytes range. This is, of course, far in excess of what is allocated. Further, the '-/+ buffers/cache' line shows around 3 GB free. Both of these machines subsequently became unstable and had to be rebooted.
Does anyone have any ideas as to what might cause this? Someone told me it could be indicative of a memory leak, but I cannot find any supporting information online.
Are the systems running kernel-2.6.32-573.1.1.el6.x86_64?
Could be this bug:
access.redhat.com/solutions/1571043
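To quickly check whether a box matches that case, something like the following (plain coreutils/procps, nothing specific to RHEL) shows the kernel version and the raw swap figures:

    # Kernel version -- compare against the one named in the Red Hat solution
    uname -r

    # Raw memory and swap numbers as free and the kernel report them
    free -m
    grep -i swap /proc/meminfo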
Linux moves unused memory pages to swap (see Linux swappiness), so as long as you still have free memory (taking buffers/cache into account) you are good to go.
You probably have a process that is not used much and has been swapped out.
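If you want to see which processes those swapped-out pages belong to, the per-process VmSwap field is a quick way to check (a rough sketch; VmSwap is available on reasonably recent kernels):

    # Swap usage per process (kB), largest first
    for f in /proc/[0-9]*/status; do
        awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, name}' "$f"
    done | sort -rn | head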

Linux process virtual memory column in TOP/HTOP

I wrote a C++ process which is running inside a VMware machine with 512 MB of assigned RAM.
I see in TOP/HTOP that the VIRT column shows a value of 490 MB, while other processes show only a few kilobytes in the same field.
Do you know why? Do I have to set something up for my process?
Thank you very much!
VIRT really doesn't matter; use -a and look at the resident size instead. VIRT counts even pages that have been swapped out, and I think it's probably useless for what you're trying to figure out.
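For a quick side-by-side comparison of the two numbers (standard ps; <pid> is a placeholder for your process id), RSS here is what top/htop call RES:

    # Virtual size (VSZ) vs. resident size (RSS), both in kB
    ps -o pid,vsz,rss,comm -p <pid>

    # Or across all processes, sorted by resident size
    ps -eo pid,vsz,rss,comm --sort=-rss | head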
Here is a good explanation that I'm going to go through and learn from...
Edit (2016-04-07): I've just seen it, and it is brilliant! Look at /proc/<pid>/smaps to see how physical RAM is actually used by your process.
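For example, something along these lines (a rough sketch; replace <pid> with the actual process id) sums what the kernel reports per mapping:

    # Total resident memory backed by physical RAM, summed over all mappings
    awk '/^Rss:/ {sum += $2} END {print sum " kB resident"}' /proc/<pid>/smaps

    # Pss ("proportional set size") divides shared pages among the processes sharing them
    awk '/^Pss:/ {sum += $2} END {print sum " kB proportional"}' /proc/<pid>/smaps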
Edit (2016-04-08): I'm going deeper into the problem, and I discovered that each time I create a thread the process's VIRT increases. I have also seen that all other Linux processes with threads allocate a lot of VIRT, so I think it is absolutely normal!
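That matches how thread creation typically works: each new thread gets a full stack reservation (the size shown by ulimit -s, often 8 MB of virtual address space), and that reservation counts toward VIRT even though most of it is never touched. A rough way to check (standard procfs paths; <pid> is a placeholder):

    # Default stack size reserved per thread (kB) -- typically 8192
    ulimit -s

    # Number of threads in the process
    ls /proc/<pid>/task | wc -l

    # VIRT vs. resident size; VIRT grows by roughly one stack reservation per thread
    grep -E '^(VmSize|VmRSS)' /proc/<pid>/status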

Understanding CPU frequency, thread selection and more

With a 1270v3 and a single-threaded app I've hit the limit of its performance, but when I watch monitoring tools like atop I don't understand how all of this works. I tried to find a good article on this sort of topic, but they are either written in language I don't understand or don't cover what I would like to know. I hope it is alright to ask this kind of thing here.
From my understanding, a single-threaded app only uses one thread for all/most of the work, so its performance is defined by the single-thread performance of the CPU.
A moment before I wrote this question I played around with the CPU frequency and noticed that although there are only two instances of the app running, the usage is shared across all cores.
So I assume that the thread jumps around between these cores.
I then set the CPU scaling to performance with cpufreq-set -g performance. The result was that all CPU cores/threads stayed at about 2 GHz as before, except for one that is permanently at 3.5 GHz (100%). Since I only changed the scaling for one core, why is the usage still shared across all cores? I mean, the app is running at about 300%, so why doesn't it stick to the core running at 100%?
Furthermore, since I noticed that only one of the CPUs got scaled up, I looked at the help page and found -r, which should apply the performance setting to all cores. Unfortunately nothing changes. (Is this a bug in Ubuntu 14.04?) So I used -c with the number 8 (8 threads) -> didn't work. 4 -> works, but only scales 2 cores out of 8. 7 -> scaled 4 cores. So I'm wondering, does this not support hyper-threading, or is the whole program that buggy?
However, as I understand it, the CPU at max frequency and the thread together only appear to jump around in the monitoring tools because they display average usage, which then looks shared. Did I get that right?
Would forcing one CPU to 3.5 GHz and pinning the app to that core improve performance, or is everything I'm wondering about just an artifact of how the averages are computed from the data shown each second?
If so, am I right that I should run best with cpufreq-set -c 7 -g performance if power consumption doesn't matter?
Thanks for reading so far, I hope you have a moment to help me understand the whole thing.
Atop example screenshots:
http://i.imgur.com/VFEBvLx.png
http://i.imgur.com/cBKOnJM.png
http://i.imgur.com/bgQfwfU.png
I believe a lot of your confusion has to do with the fuzzy mapping of the capabilities of cpufreq to the actual capabilities of the hardware.
Here’s a description of what is taking place on the HW and in the OS.
A processor is a collection of cores on the same silicon substrate. The cores are what we used to call CPUs with some enhancements. CPUs now have the capability of running multiple HW threads (hyperthreading), each hardware thread being equivalent to one of the old type CPUs. Putting this all together, the 1270v3 is a quad core (if I recall correctly), meaning it has 4 cores on the same silicon substrate. Each core can support two HW threads, each HW thread being equivalent to what the OS calls a CPU (and I’ll call a virtual CPU). So from the OS perspective, the 1270v3 has 8 (virtual) CPUs.
The OS doesn’t see packages, cores or HW threads. It sees CPUs, and to it there appear to be 8 of them.
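If you want to see this mapping on your own box, lscpu (from util-linux) shows how the 8 "CPUs" the OS sees break down into sockets, cores and hardware threads; a quick check might look like:

    # Summary: logical CPUs, threads per core, cores per socket, sockets
    lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'

    # Per logical CPU: which physical core and socket it belongs to
    lscpu -e=CPU,CORE,SOCKET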
To further complicate the issue, modern processors have various HW supporting power saving states called P-states, C-states and package C-states. Why do I mention these? It’s because they are intimately associated with the frequency of the processor. And cpufreq professes to provide the user with control over the processor’s frequency.
Now, I’m not familiar with cpufreq outside of reading the manpage and other material on the web. From my reading, it has a lot of idiosyncrasies, so I’ll talk about what it is doing from a broad perspective.
In a general sense, cpufreq has a lot of generic capability that may or may not be supported by the HW or the kernel. This is confusing because it looks like the functionality is there but then things don’t happen as you would expect. For example, cpufreq gives the impression that you can set each CPU’s frequency independently. In reality, on a hyperthreaded system, two “CPUs” are associated with each core and must have the same frequency.
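You can verify that grouping on the machine itself; the kernel exposes it in sysfs (standard topology and cpufreq paths on Linux, though exact availability depends on the driver):

    # Which logical CPUs are hyperthread siblings on the same physical core
    for c in /sys/devices/system/cpu/cpu[0-9]*; do
        echo "$(basename "$c"): $(cat "$c"/topology/thread_siblings_list)"
    done

    # cpufreq's own view of which CPUs must change frequency together
    cat /sys/devices/system/cpu/cpu0/cpufreq/related_cpus 2>/dev/null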
A lot of the functionality that cpufreq is supposed to control is associated with the power-efficiency characteristics of the processor, but again, its mapping to the processor's actual hardware capabilities is incomplete and misleading. Though cpufreq seems to allow you to set max and min frequencies, the processor hardware doesn't work this way. In modern Intel processors, such as the 1270v3, there is something called P-states. These P-states are frequency-voltage pairs that lower the processor's frequency (and drop its voltage) to reduce the processor's power consumption, at the cost of the application taking longer to run. These frequency-voltage pairings aren't arbitrary, though cpufreq gives the impression that they are.
What does this all mean? In addition to the thread migration issues that the commenter mentioned, cpufreq isn’t going to behave the way you expect because it appears to have capability that it actually doesn’t, and the capability that it does actually have maps only roughly to the actual capabilities of the HW and OS.
I embedded some further comments in your text.
From my understanding a single-threaded app only uses one thread for all/most of the work. [Yes, but this doesn't mean that the thread is locked to a specific virtual CPU or core.] So the performance is defined by the single-thread power of the CPU. [It's not that simple. The OS migrates threads around, it has its own maintenance processes, etc.] A moment before I wrote this question I played around with the CPU frequency and noticed that although there are only two instances of the app running, the usage is shared across all cores. So I assume that the thread jumps around between these cores. So I set the CPU scaling to performance with cpufreq-set -g performance. The result was that all CPU cores/threads stayed at about 2 GHz as before, except one that is permanently at 3.5 GHz (100%). As I only changed the scaling for one core, why is the usage still shared across all cores? I mean, the app is running at about 300%, so why doesn't it stick to the CPU core at 100%? [Since I can't see what you are observing, I don't really understand what you are asking.]
Furthermore, as I noticed that only one of the CPUs got scaled up, I looked at the help page and found -r, which should apply the performance setting to all cores. Unfortunately nothing changes. (Is this a bug in Ubuntu 14.04?) So I used -c with the number 8 (8 threads) -> didn't work. 4 -> works, but only scales 2 cores out of 8. 7 -> scaled 4 cores. [I haven't used cpufreq so can't directly speak to its behavior, but the manpage implies that "-c" refers to a specific virtual CPU and not the number of virtual CPUs.] So I'm wondering, does this not support hyper-threading, or is the whole program that buggy? [Again, I'm not sure from your explanation what you are doing, but the n -> n/2 behavior makes sense to me. You can change the frequency of a core, but since each core has two hyperthreads/virtual CPUs, those two virtual CPUs must scale together.]
However, as I understand it, the CPU at max frequency and the thread together only appear to jump around in the monitoring tools because they display average usage, which then looks shared. Did I get that right? [Again, I'm not sure what you are observing. Both physically and in atop, the CPU designation should not change, meaning CPU001 will always refer to the same virtual CPU. The core with the max frequency shouldn't physically jump around, though the user thread may. Something to note is that monitoring tools can be pretty heavy users of the CPU. This heavy usage can make figuring out your processor usage difficult if it causes threads to jump around to different virtual CPUs.]
Would forcing one CPU to 3.5 GHz and pinning the app to that core improve performance, or is everything I'm wondering about just an artifact of how the averages are computed from the data shown each second? [I found a pretty good explanation of atop with a lot of helpful screenshots: http://www.unixmen.com/linux-basics-monitor-system-resources-processes-using-atop/] If so, am I right that I should run best with cpufreq-set -c 7 -g performance if power consumption doesn't matter? [It all depends upon what other processes are running on your system. If your system is mostly idle except for your processes, then forcing a core to a certain frequency won't make a difference. I'm also suspicious of what a "governor" does. The language appears to refer to power-efficiency/performance ("balanced", "powersave", "performance", etc.) but the details don't match the capability of today's hardware.]
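If you do want to experiment with both levers, a rough sketch (the ./myapp name is only a placeholder for your single-threaded binary, and whether this helps depends on what else the machine is doing) would be to set the performance governor on every logical CPU and then pin the app with taskset:

    # Apply the performance governor to all 8 logical CPUs
    for c in $(seq 0 7); do
        sudo cpufreq-set -c "$c" -g performance
    done

    # Confirm what the kernel actually applied
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    # Pin the (hypothetical) single-threaded app to logical CPU 7
    taskset -c 7 ./myapp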

How do I interpret the memory usage information from htop

We have multiple servers in our lab and I'm trying to determine which one currently has more resources available. I tried to interpret the information htop displays, but I don't fully understand all those numbers.
I have taken a screenshot of each server after running htop:
Server #1:
Server #2:
Does server #1 have more memory available than server #2? Should I look at Avg or Mem? Or what other parameter should I look at?
Thanks!
htop author here.
Does server #1 have more memory available than server #2?
Yes.
From the htop faq:
The memory meter in htop says a low number, such as 9%, when top shows something like 90%! (Or: the MEM% number is low, but the bar looks almost full. What's going on?)
The number showed by the memory meter is the total memory used by processes. The additional available memory is used by the Linux kernel for buffering and disk cache, so in total almost the entire memory is in use by the kernel. I believe the number displayed by htop is a more meaningful metric of resources used: the number corresponds to the green bars; the blue and brown bars correspond to buffers and cache, respectively (as explained in the Help screen accessible through the F1 key). Numeric data about these is also available when configuring the memory meter to display as text (in the Setup screen, F2).
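The same split shows up in plain free output; on older procps versions the "-/+ buffers/cache" line is the one that corresponds to htop's green portion (newer versions print an "available" column instead):

    # Memory used by processes vs. memory merely holding buffers and cache
    free -m

    # Roughly: green bar  ~ used minus buffers/cache
    #          blue bar   ~ buffers
    #          brown bar  ~ cache (as the F1 help screen explains)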
Hope that clears things up! Cheers!

PerfMon and w3wp?

I'm looking at "Process:w3wp*:% Processor Time" in PerfMon and am struggling to follow something. I have traces running for w3wp and then w3wp#1 - w3wp#6, which are the six sites running on the server.
w3wp's trace doesn't appear to be related to the total of #1-#6?
e.g.
#1 can have a % Processor Time higher than w3wp, and conversely w3wp can be near 100% when ALL the other percentages are very low.
I'm trying to find a performance bottleneck in our server, and the obvious one is that the CPU tops out. We are going to add another CPU (as it's on a VM), but I'd like to try to understand what I am looking at... and what can be done to alleviate the issue?
Why is w3wp often close to 100% even though the individual sites are very low? What might be causing w3wp to be so high if it's not a particular site?
P.S. If anyone has a way I can attach an image here, I can post the graph. TY
P.P.S. IIS 7 on Windows Server 2008.
Answered by Nick Craver in this thread: .net performance counter - Process(w3wp)\% Processor Time
