We have multiple servers in our lab and I'm trying to determine which one currently has more resources available. I tried to interpret the information htop displays, but I don't fully understand all those numbers.
I have taken a screenshot of each server after running htop:
Server #1:
Server #2:
Does server #1 have more memory available than server #2? Should I look at Avg or Mem? Or what other parameter should I look at?
Thanks!
htop author here.
Does server #1 have more memory available than server #2?
Yes.
From the htop faq:
The memory meter in htop says a low number, such as 9%, when top shows something like 90%! (Or: the MEM% number is low, but the bar looks almost full. What's going on?)
The number showed by the memory meter is the total memory used by processes. The additional available memory is used by the Linux kernel for buffering and disk cache, so in total almost the entire memory is in use by the kernel. I believe the number displayed by htop is a more meaningful metric of resources used: the number corresponds to the green bars; the blue and brown bars correspond to buffers and cache, respectively (as explained in the Help screen accessible through the F1 key). Numeric data about these is also available when configuring the memory meter to display as text (in the Setup screen, F2).
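For a quick cross-check outside htop, the same split is visible in the output of free; a minimal example (column names vary between procps versions, and the numbers shown in the comments are only illustrative):
free -m
#               total    used    free   shared  buff/cache   available
# Mem:           7977     740     316      124        6920        6820
# "used" roughly corresponds to htop's green bars; "buff/cache" to the
# blue and brown bars, which the kernel gives back when applications
# need memory, so "available" is the number to compare between servers.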
Hope that clears things up! Cheers!
I am using Ubuntu to run some calculation, so I'd like to check the memory usage during the calculation. But the information from gnome-system-monitor and psensors is different.
As shown in the following screenshot, gnome-system-monitor reports that only 30.4% of memory is used, but psensors shows only 13% of memory still free.
My question is:
Which one is right?
"30.4% of memory is used", it's implies that your computer RAM used 30.4%. You will get same result on another system monitor like bpytop(https://snapcraft.io/install/bpytop/ubuntu).
I have a Linux-based embedded device on which I am running a Qt GUI application as well as a second application controlling some hardware. The two communicate with each other via TCP.
I have recently run a system test where I stimulate the Qt application using Squish for an entire week. At the start and end of my test I extract the smaps and pmap files for each of my two processes. Likewise I extract the meminfo file.
How might I compare the before and after files to get a rough idea as to whether I have a memory leak problem for the device as a whole? Also, if a leak were detected, how might I make a rough, rough estimate as to when the device will stop functioning correctly?
How might I compare the before and after files to get a rough idea as to whether I have a memory leak problem for the device as a whole?
Answer:
I think you need to know how much memory the device driver uses in kernel space, so the slab info is the place worth looking into.
Initially, you can run (for example from a script):
cat /proc/meminfo | grep -i Slab
to monitor the value (watching for any increase after the system has been running for a long time).
If it does increase, you can compare the leaking state against a snapshot taken before the leak, using cat /proc/slabinfo and looking at each entry.
If you are familiar with the names of the entries (and you are sure the leak must be in this driver and have a rough idea of how it allocates memory), you can make a reasonable guess.
For example:
size-4194304 0 0 4194304 1
kmem_cache 136 180 128 30
if "kmem_cache entry" is leaking, then your driver might call kmem_cache_create and kmem_cache_alloc() but not free
if it is "size-4194304", you driver might call get_zeroed_page or _ _get_free_pages but not free.
how might I make a rough, rough estimate as to when the device will stop functioning correctly?
Answer:
If your user space doesn't leak any memory and the leak only happens in your driver, you can use the "free" command to see how much free system memory is available.
Then you can roughly calculate how long it will take to be used up:
total free memory (kB) / leak rate (kB/s) = number of seconds.
However, the kernel will start complaining (for example, via the OOM killer) well before memory is completely exhausted.
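A minimal sketch of that arithmetic, assuming the leak rate has to be measured first (the one-hour sampling window is an arbitrary choice, and normal cache growth also moves MemFree, so treat the result as a very rough bound):
#!/bin/sh
# Estimate how long until free memory runs out, assuming a steady leak:
# (total free memory in kB) / (leak rate in kB/s) = seconds remaining.
before=$(awk '/MemFree/ {print $2}' /proc/meminfo)   # kB free now
sleep 3600                                           # sample window: 1 hour
after=$(awk '/MemFree/ {print $2}' /proc/meminfo)    # kB free later
leak_rate=$(( (before - after) / 3600 ))             # kB lost per second
if [ "$leak_rate" -gt 0 ]; then
    echo "Roughly $(( after / leak_rate )) seconds of free memory left"
else
    echo "No measurable loss of free memory over this window"
fi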
I have a single web app running on a single server. All users use this one app and nothing else. I need to figure out how much memory each instance of httpd takes up, so that I know how much RAM my new server will need for X users.
The ps -aux command gives me a % of memory used. I read online that this % is out of "available memory". What does "available memory" mean to Linux?
I found several articles that explain how not to calculate memory usage in Linux, but I could not find one that teaches how to calculate how much memory each httpd process needs. Please assist.
The %MEM field in ps is described thus in the ps man page:
%MEM ratio of the process's resident set size to the physical
memory on the machine, expressed as a percentage.
Calculating the memory required by each httpd process is not straightforward - it will highly depend on your webapp itself. httpd processes will also share significant amounts of memory with each other.
The simplest way will be to test. Perform tests with different numbers of users using your webapp simultaneously (e.g. 5, 10, 20 users) and sample the used memory (from the first number on the -/+ buffers/cache: line in the output of the free command). Plot the results, and you should be able to extrapolate to larger numbers of users.
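A hedged sketch of that test loop (the user counts, the log path and the httpd process name are placeholders; drive the load with whatever tool you already use):
#!/bin/sh
# For each simulated user count, record system-wide used memory and the
# per-process RSS / %MEM of every httpd, then plot and extrapolate.
for users in 5 10 20; do
    echo "=== $users concurrent users ===" >> /tmp/httpd-mem.log
    # ...start the load test for $users users here and let it settle...
    # First number on the "-/+ buffers/cache:" line of free (older procps);
    # on newer versions use the "used"/"available" columns instead.
    free -k | awk '/buffers\/cache/ {print "used_kB:", $3}' >> /tmp/httpd-mem.log
    ps -C httpd -o pid=,rss=,pmem= >> /tmp/httpd-mem.log
done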
I have been running a crypto-intensive application that generates pseudo-random strings with a special structure and mathematical requirements. It has generated around 1.7 million voucher numbers per node over the last 8 days. The generation process is CPU intensive, with very low memory requirements.
Mnesia running on OTP R14B02 was the storage database, and the generation was done within each virtual machine. I had 3 nodes in the cluster, with all Mnesia tables of type disc_only_copies. Suddenly, as activity on the Solaris boxes increased (other users logged on remotely and started web servers, FTP sessions, and other tasks), my bash shell started reporting a fork: not enough space error.
My Erlang VMs also went down with the error below:
Crash dump was written to: erl_crash.dump
temp_alloc: Cannot reallocate 8388608 bytes of memory (of type "root_set").
Usually we get memory allocation errors rather than memory reallocation errors, and normally memory of type "heap" is the problem. This time, the reported memory type is "root_set".
Qn 1. What is this "root_set" memory?
Qn 2. Does it have anything to do with CPU-intensive activity? (The reason I ask is that when I start the task, the machine's response to mouse or keyboard interrupts becomes very slow, meaning either the CPU is too busy or there is some other problem I cannot explain for now.)
Qn 3. Can such an error be avoided? And how?
The fork: not enough space message suggests this is a problem with the operating system setup, but:
Q1 - The Root Set
The Root Set is what the garbage collector uses as a starting point when it searches for data that is live in the heap. It usually starts from the registers of the VM and from the stack, if the stack has references to heap data that still needs to be live. There may be other roots in Erlang I am not aware of, but these are the basic ones you start from.
That it is a reallocation error of exactly 8 megabytes could mean one of two things: either you don't have 8 megabytes free, or the heap is fragmented beyond recognition, so that while there are 8 megabytes free in total, there is no contiguous block of that size.
Q2 - CPU activity impact
The problem has nothing to do with the CPU per se. You are running out of memory. A large root set could indicate that you have some very deep recursions going on where you keep around a lot of pointers to data. You may be able to rewrite the code such that it is tail-calling and uses less memory while operating.
You should be more worried about the slow response times from the keyboard and mouse. That could indicate something is not right. Does a vmstat 1, a sysstat, a htop, a dstat or similar show anything odd while the process is running? You are also on the hunt to figure out if the kernel or the C libc is doing something odd here due to memory being constrained.
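For example, a minimal way to keep such a log while the generation task runs (vmstat's exact columns differ between Linux and Solaris, but both show free memory and swap/scan activity; the log path is a placeholder):
# Timestamp every vmstat sample so memory pressure can be correlated
# with what the Erlang nodes were doing at the time.
vmstat 1 | while read line; do
    echo "$(date '+%H:%M:%S') $line"
done >> /tmp/vmstat.log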
Q3 - How to fix
I don't know how to fix it without knowing more about what the application is doing. Since you have a crash dump, your first instinct should be to open it in the crash dump viewer and look at it. The goal is to find a process using a lot of memory, or one that has a deep stack. From there on, you can seek to limit the amount of memory that process uses, either by rewriting the code so it can give memory up earlier, by tuning the garbage collection setup for the process (see the spawn options in the Erlang man pages), or by adding more memory to the system.
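As a starting point, a hedged sketch (crashdump_viewer ships with OTP's observer application, and how it presents the dump differs between OTP releases):
# Start an Erlang shell on the machine that holds the dump...
erl
# ...then, at the Erlang prompt, launch the crash dump viewer and load
# erl_crash.dump through its interface:
#   crashdump_viewer:start().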
Could somebody please tell me what "Memory Page Out Rate" is?
I have seen it in the "HP OpenView" server monitoring tool and tried googling it.
I would appreciate it if some expert could clarify.
If the page-out rate is as high as 200+ per second, can it crash the server?
Thanks in advance
This link may help:
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/index.jsp?topic=/com.ibm.itm.doc/main_unix65.htm
"Page Out Rate (KB per Second) The number of kilobytes that the virtual memory manager pages out per second averaged over the previous 30-second interval. Note: the value -1 indicates Not Available, -2 indicates Not Collected, 2147483648 indicates Value_Exceeds_Minimum, and -2147483647 indicates Value_Exceeds_Maximum."
A page-out rate of 200 kB/s could be fine on some systems and service-affecting on others. It all depends on how fast your disks/SAN can keep up.
To be honest, you'd be better off asking this question on one of the more suitable sites; SO is for programming-related questions, not sysadmin queries:
https://stackoverflow.com/questions/321618/where-can-i-ask-questions-that-arent-programming-questions
What you are referring to is the GBL_MEM_PAGEOUT variable of HP OVPM.
To my understanding it means that some process allocated memory and never de-allocated it, and that process is not currently active.
So basically it should be as low as possible.
However, according to the HP OVPM Metrics document, the definition is this:
GBL_MEM_PAGEOUT
PLATFORMS: HP-UX AIX NCR Sinix DEC SunOS Win3X/95
HP-UX AIX NCR Sinix DEC
The total number of page outs to the disk during the interval. This includes pages paged out to paging space and to the file system.
HP-UX
This metric is available on HP-UX 11.0 and beyond.
AIX
This is the same as the "page outs" value from the "vmstat -s" command. Remember that "vmstat -s" reports cumulative counts. This metric cannot be compared to the "po" value from the "vmstat" command. The "po" value only reports the number of pages paged out to paging space.
To determine the count (that is, GBL_MEM_PAGEOUT) for the current interval, subtract the previous value from the current value. To determine the rate (that is, GBL_MEM_PAGEOUT_RATE) for the current interval, subtract the previous value from the current value and then divide by the length of the interval. Keep in mind that whenever any comparisons are made with other tools, both tools must be interval synchronized with each other in order to be valid.
HP-UX SunOS
The total number of page outs to the disk during the interval. This includes pages paged out to paging space and to the file system.
Win3x/95
The total number of page outs to the disk during the interval (Windows 95 only).
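To illustrate the subtraction-and-divide arithmetic described above, here is a sketch for the AIX case, where "vmstat -s" reports a cumulative "page outs" counter (note this yields pages per second, whereas the Tivoli metric quoted earlier is expressed in kilobytes per second; the 30-second interval mirrors the one used by that metric):
#!/bin/sh
# GBL_MEM_PAGEOUT_RATE-style calculation:
# (current cumulative count - previous count) / interval length.
interval=30
before=$(vmstat -s | awk '/page outs/ {print $1}')
sleep "$interval"
after=$(vmstat -s | awk '/page outs/ {print $1}')
echo "page-out rate: $(( (after - before) / interval )) pages/second"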