I have a single web app running on a single server. All users use this one app and nothing else. I need to figure out how much memory each instance of httpd takes up. This way I'll know how much RAM my new server will need for X users.
The ps -aux command gives me the % of memory used. I read online that this % is out of "available memory". What does "available memory" mean to Linux?
I found several articles that explain how not to calculate memory usage in Linux, but I could not find one that teaches how to calculate how much memory each httpd needs. Please assist.
The %MEM field in ps is described thus in the ps man page:
%MEM ratio of the process's resident set size to the physical
memory on the machine, expressed as a percentage.
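As a quick illustration of that definition, here is a minimal sketch of the ratio. The function name and the sample figures are made up for the example; ps itself reads the real values from the kernel.

```python
def percent_mem(rss_kb, mem_total_kb):
    """Resident set size as a percentage of physical memory."""
    if mem_total_kb <= 0:
        raise ValueError("mem_total_kb must be positive")
    return 100.0 * rss_kb / mem_total_kb


# Hypothetical figures: 51200 kB resident on a machine with 2 GiB of RAM.
print(round(percent_mem(51200, 2 * 1024 * 1024), 1))
```

Note that the denominator is physical memory, not whatever happens to be free at the moment.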
Calculating the memory required by each httpd process is not straightforward - it will highly depend on your webapp itself. httpd processes will also share significant amounts of memory with each other.
The simplest way will be to test. Perform tests with different numbers of users using your webapp simultaneously (eg. 5 users, 10 users, 20 users) and sample the used memory (from the first number on the -/+ buffers/cache: line in the output of the free command). Plot the results, and you should be able to extrapolate to larger numbers of users.
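If you prefer to compute the extrapolation rather than read it off a plot, something like the sketch below could work. It assumes memory grows roughly linearly with the number of users, which you should check against your own measurements; the sample numbers are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, users):
    """Estimated used memory for a given number of users."""
    return intercept + slope * users


# Hypothetical samples: concurrent users vs. used memory in MB (from free).
users = [5, 10, 20]
used_mb = [400, 500, 700]
slope, intercept = fit_line(users, used_mb)
print(predict(slope, intercept, 100))
```

The slope is the approximate cost per extra user and the intercept is the baseline the server uses with nobody connected. If the points clearly do not lie on a line, a straight-line extrapolation will mislead you.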
I have been trying to develop an application that would model a system using graph theory (see [1]). Graph theory can be used to model runnables in order to figure out their partitions (grouped runnables), and to map them to cores.
To achieve this, we need a lot of information. Since I don't know in detail how the Linux OS (Raspbian in particular, for us) schedules everything, and I'm interested in finding out how our algorithm will improve core utilization, I thought I could obtain information about the processes and try to model them myself.
For that purpose, we need:
Instruction size, how many instructions CPU runs to complete the task (very important)
Memory needed for the process, physical memory and virtual memory
Core load for debugging the processes.
Read/write accesses: which process it is communicating with, whether it is a read or a write access, what kind of interface it is, and what instruction size and memory are needed to read and/or write.
I think I can extract some of this information by using the 'top' command in Linux. It gives the core load, memory usage, and virtual and physical memory. I should also mention that I intend to use 'taskset' to place processes on cores and see their information (see [2]).
Now, the first question I have is: how do I effectively obtain the instruction sizes, r/w accesses and the other things I listed above?
The second question: is there any way to see the runnables of a process, i.e. the simple functions it runs, along with their information and their r/w accesses with each other? This question is about finding a way to model a process itself, rather than the interactions between processes.
Any help is greatly appreciated, as it will help our open-source multi-core platform research.
Thank you very much in advance.
[1] http://math.tut.fi/~ruohonen/GT_English.pdf
[2] To place a process to a core, I use:
# Shell assignments must not have spaces around '='
pid=$(pgrep -u root -f "$process_name" -n)
sudo taskset -pc "$core" "$pid" &&
echo "Process $process_name with PID=$pid has been placed on core $core"
I submit jobs using headless NetLogo to an HPC server with the following script:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
--model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
--experiment test \
--table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is the snapshot of a cluster queue using:
qstat -g c
I would like to know whether I can increase the CQLOAD for my simulations, and what it signifies. I couldn't find a clear explanation online.
CPU USAGE CHECK:
qhost -u abhishekb
When I run the behaviour space on my PC through the GUI, assigning high priority to the task makes it use nearly 99% of the CPU, which makes it run faster. I wish to accomplish the same here.
A typical HPC environment is designed to run only one MPI process (or OpenMP thread) per CPU core, which therefore has access to 100% of the CPU time, and this cannot be increased further. In contrast, on a classical desktop/server machine, a number of processes compete for CPU time, and it is indeed possible to increase the performance of one of them by setting the appropriate priorities with nice.
It appears that CQLOAD is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, even the load average per core for your runs just reflects the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core would mean that the code spends 70% of the time doing calculations, while the remaining 30% is probably spent waiting to communicate with the other computing nodes (which is also necessary).
Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.
1. CPU percentage load
I agree with @rth's answer regarding trying to use Linux job priority / renice to increase the CPU percentage: it's almost certain not to work and (as you've found) you're unlikely to be able to do it, as you won't have superuser privileges on the nodes (it's pretty unlikely you can even log into the worker nodes, probably only the head node).
The CPU usage of your model as it runs is mainly a function of your code structure - if it runs at 100% CPU locally, it will probably run like that on the node while it's running.
Here are some answers to the more specific parts of your question:
2. CQLOAD
You ask
CQLOAD (what does it mean too?)
The docs for this are hard to find, but you link to the spec of your cluster, which tells us that the scheduling engine for it is Sun's "Grid Engine". Man pages are here (you can access them locally too - in particular by typing man qstat)
If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:
OUTPUT FORMATS
...
an average of the normalized load average of all queue
hosts. In order to reflect each hosts different
significance the number of configured slots is used as a
weighting factor when determining cluster queue load.
Please note that only hosts with a np_load_value are
considered for this value. When queue selection is applied
only data about selected queues is considered in this
formula. If the load value is not available at any of the
hosts '-NA-' is printed instead of the value from the
complex attribute definition.
This means that CQLOAD gives an indication of how utilized the processors in the queue are. Your output screenshot above shows 0.84, so the average load on (in-use) processors in all.q is 84%. This doesn't seem too low.
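To make the "weighting factor" in the man page excerpt concrete, here is one plausible reading of it as a slot-weighted mean. The excerpt does not spell out the exact formula, so treat this as an assumption; the function name and host figures are invented.

```python
def cluster_queue_load(hosts):
    """Slot-weighted mean of per-host normalized load averages.

    hosts is an iterable of (normalized_load, slots) pairs; a load of
    None stands for a host that reports no load value and is skipped.
    Returns None when no host reports a load.
    """
    weighted_sum = 0.0
    total_slots = 0
    for load, slots in hosts:
        if load is None:
            continue
        weighted_sum += load * slots
        total_slots += slots
    if total_slots == 0:
        return None
    return weighted_sum / total_slots


# Hypothetical queue: two reporting hosts and one without a load value.
print(cluster_queue_load([(0.9, 24), (0.6, 8), (None, 24)]))
```

Under this reading, a host with many slots pulls the queue figure towards its own load much more than a small host does.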
3. Number of nodes reserved
In a related question, you state that colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder whether the real problem here is that you're reserving a lot of nodes (even if just for a short time) for a job that they can see could work with fewer.
You might want to experiment with using fewer nodes (unless your results are very slow) - that is achieved by altering the line #$ -pe mpi 24 - maybe take the number 24 down. You can work out how many nodes you need (roughly) by timing how long 1 model run takes on your computer and then use
N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
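As a worked example of that formula, here is a small sketch. The timings are invented, and rounding up to a whole number is my addition, since you cannot reserve a fraction of a node.

```python
import math


def nodes_needed(time_per_run, number_of_runs, target_time):
    """N = (time per run * number of runs) / target time, rounded up.

    All three times must be in the same unit (e.g. minutes).
    """
    if target_time <= 0:
        raise ValueError("target_time must be positive")
    return int(math.ceil(float(time_per_run) * number_of_runs / target_time))


# Hypothetical: 2 minutes per run, 120 runs, results wanted in 30 minutes.
print(nodes_needed(2, 120, 30))
```

This ignores scheduling and startup overhead, so it is only a rough starting point for the number you put after -pe mpi.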
So you want to make your program run faster on Linux by giving it a higher priority than all other processes?
In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program, or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive), where lower values give the process a higher priority. For security reasons, you can only decrease a process's niceness if you are the superuser (root).
So if you want to make a process run with higher priority then from within bash do
[abhishekb@hpc ~]$ start_process &
[abhishekb@hpc ~]$ jobs -x sudo renice -n -20 -p %+
Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.
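For completeness, the direction that does not need root (raising niceness, i.e. lowering priority) can also be done from inside a program. Below is a minimal Python sketch using the standard os.nice call; the wrapper name make_nicer is made up for the example.

```python
import os


def make_nicer(increment):
    """Raise this process's niceness by increment and return the new value.

    Only non-negative increments are accepted here, because lowering
    niceness (raising priority) normally requires root.
    """
    if increment < 0:
        raise ValueError("lowering niceness normally requires root")
    return os.nice(increment)


# An increment of 0 changes nothing and just reports the current niceness.
print(make_nicer(0))
```

Calling make_nicer(5) from a normal user account would lower the priority of the calling process; a normal user generally cannot undo that without root.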
We are seeing occasional huge writes to disk in the MongoDB log, effectively locking MongoDB for a long time. Many people are reporting similar issues on the net, but I have found no good answers so far.
Tue Mar 11 09:42:49.818 [DataFileSync] flushing mmaps took 75264ms for 46 files
The average mmap flush on my server is around 100 ms according to the mongo statistics.
A large percentage of our MongoDB data is updated within a few hours. This leads me to wonder whether we need to tune the Linux sysctl virtual memory parameters as described in the performance guide for Neo4J, another memory-mapped tool: http://docs.neo4j.org/chunked/stable/linux-performance-guide.html
There are a lot of blocks going out to IO, way more than expected for the write speed we
are seeing in the benchmark. Another observation that can be made is that the Linux kernel
has spawned a process called "flush-x:x" (run top) that seems to be consuming a lot of
resources.
The problem here is that the Linux kernel is trying to be smart and write out dirty pages
from the virtual memory. As the benchmark will memory map a 1GB file and do random writes
it is likely that this will result in 1/4 of the memory pages available on the system to
be marked as dirty. The Neo4j kernel is not sending any system calls to the Linux kernel to
write out these pages to disk however the Linux kernel decided to start doing so and it
is a very bad decision. The result is that instead of doing sequential like writes down
to disk (the logical log file) we are now doing random writes writing regions of the
memory mapped file to disk.
TOP shows that we indeed have a flush process that has been running a very long time, so this seems to match.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
28352 mongod    20   0  153g 3.2g 3.1g S  3.3 42.3 299:18.36 mongod
 3678 root      20   0     0    0    0 S  0.3  0.0  26:27.88 flush-253:1
The recommended Neo4J sysctl settings are
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Do these settings have any relevance for a MongoDB installation at all?
The short answer is "yes". What values to choose depends very much on your write patterns. This gives background on exactly how MongoDB manages its mappings - it's not anything unexpected.
One wrinkle is that in a web-facing database application, you may care about latency more than throughput. vm.dirty_background_ratio gives the threshold for starting to write dirty pages in the background, and vm.dirty_ratio gives the threshold at which new writes are blocked until enough dirty pages have been flushed.
If you are hammering a relatively small working set, you can be OK with setting both of those values fairly high, and relying on Mongo's (or the OS's) periodic time-based flush-to-disk to commit the writes.
If you're conducting a high volume of inserts and also some modifications, which sounds like it might be your situation, it's a balancing act that depends on inserts vs. rewrites - starting to flush too early will write out pages that are about to be rewritten anyway, "wasting" I/O. Starting to flush too late will result in pauses as you flush huge writes.
If you're doing mostly inserts, then you may very well want a large dirty_ratio (to avoid blocking) and a relatively small dirty_background_ratio (small enough to always be writing as you're inserting to reduce latency, and just large enough to linearize some of the writes).
The correct solution is to replay some dummy data with various options for those sysctl parameters, and optimize it by brute force, bearing in mind your average latency / total throughput objectives.
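To get a feel for what a given pair of ratios means in megabytes, and to list the combinations worth replaying, a sketch along these lines may help. It assumes the ratios are applied to a figure you supply for the memory the kernel counts as available (roughly free plus reclaimable pages); the exact accounting depends on the kernel version, and the function names are made up.

```python
def dirty_thresholds_mb(available_mb, background_ratio, dirty_ratio):
    """Approximate dirty-page thresholds in MB for the two sysctl ratios.

    available_mb is an assumed input: the memory the kernel counts as
    available for this calculation, not necessarily total RAM.
    """
    for ratio in (background_ratio, dirty_ratio):
        if not 0 <= ratio <= 100:
            raise ValueError("ratios are percentages between 0 and 100")
    background_mb = available_mb * background_ratio / 100.0
    blocking_mb = available_mb * dirty_ratio / 100.0
    return background_mb, blocking_mb


def candidate_settings(background_values, dirty_values):
    """All (background, dirty) pairs where background flushing starts first."""
    return [(b, d) for b in background_values for d in dirty_values if b < d]


# Hypothetical machine with 8192 MB counted as available.
for b, d in candidate_settings([5, 10, 50], [10, 20, 80]):
    print(b, d, dirty_thresholds_mb(8192, b, d))
```

Each printed pair is one configuration to apply with sysctl before replaying your dummy data and recording latency and throughput.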
We have multiple servers in our lab and I tried to determine which one currently has more resources available. I tried to interpret the information htop displays, but I don't fully understand all those numbers.
I have taken a screen shot for each server after issuing htop:
Server #1:
Server #2:
Does server #1 have more memory available than server #2? Should I look at Avg or Mem? Or what other parameter should I look at?
Thanks!
htop author here.
Does server #1 have more memory available than server #2?
Yes.
From the htop faq:
The memory meter in htop says a low number, such as 9%, when top shows something like 90%! (Or: the MEM% number is low, but the bar looks almost full. What's going on?)
The number showed by the memory meter is the total memory used by processes. The additional available memory is used by the Linux kernel for buffering and disk cache, so in total almost the entire memory is in use by the kernel. I believe the number displayed by htop is a more meaningful metric of resources used: the number corresponds to the green bars; the blue and brown bars correspond to buffers and cache, respectively (as explained in the Help screen accessible through the F1 key). Numeric data about these is also available when configuring the memory meter to display as text (in the Setup screen, F2).
Hope that clears things up! Cheers!
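The difference between the two percentages in the FAQ entry can be illustrated with a rough calculation. This is a simplified sketch, not htop's actual code (the exact accounting differs between versions), and the figures are invented.

```python
def used_by_processes_percent(total, free, buffers, cached):
    """Memory used by processes, excluding kernel buffers and disk cache."""
    return 100.0 * (total - free - buffers - cached) / total


def used_including_cache_percent(total, free):
    """Everything that is not free, including buffers and disk cache."""
    return 100.0 * (total - free) / total


# Hypothetical server, all values in MB.
total, free, buffers, cached = 8000, 500, 300, 6400
print(used_by_processes_percent(total, free, buffers, cached))
print(used_including_cache_percent(total, free))
```

With these numbers the first figure is low and the second is high, which is the same kind of gap the FAQ describes between the htop meter and a total that counts cache as used.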
So, the title describes almost everything needed to answer me. Just one more thing: please only suggest libraries that are installed with Python by default, as the app I'm developing is part of the Ubuntu App Showdown.
Running Python 2.7, Ubuntu 12.04.
You are asking for a number that is nearly impossible to calculate and has very little value.
Any Linux system that has been running for a while will have hardly any 'free' RAM available. Just cat /proc/meminfo - the MemFree entry is usually on the order of just a few megabytes.
So, where did that memory go?
The kernel caches all disk access, for starters.
That's usually visible in the Cached entry. Disk cache will be pruned when you require more memory, so you could add that number to MemFree .
But, if an application allocates (malloc() in C) 2 gigabytes on a system with exactly 2 gigabytes of RAM, that usually will just be granted: you get a valid pointer back.
However, none of the RAM is actually reserved for your application - that only happens when your application starts touching memory pages - each touched page will be allocated.
The maximum size you can ask for is available as CommitLimit.
But the application code itself might not be in RAM either - the binary file and libraries are mmap()ed, so again only pages that are touched are loaded into RAM.
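If you still want a number, the entries mentioned above can be read with nothing but the standard library. A minimal sketch follows; the helper names and the sample text are made up, and MemFree plus Cached is only the rough estimate described above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of name -> value.

    Values are kept in the unit the file reports (kB for most entries).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def rough_available_kb(values):
    """MemFree plus Cached: free RAM plus disk cache that can be pruned."""
    return values.get("MemFree", 0) + values.get("Cached", 0)


# Invented sample in the same layout as /proc/meminfo.
SAMPLE = """MemTotal:        2048000 kB
MemFree:           51200 kB
Cached:          1024000 kB
CommitLimit:     3072000 kB
"""

info = parse_meminfo(SAMPLE)
print(rough_available_kb(info))
print(info["CommitLimit"])
```

On a real system you would pass in the contents of the file instead of SAMPLE, e.g. parse_meminfo(open("/proc/meminfo").read()).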
If you run a tool like top - you get all kinds of memory info per process, including VIRT, RES and SHR.
VIRT is for 'virtual' - all the memory pages the app would need if it claimed every page it has asked for.
RES is 'resident' - the amount of memory actually in RAM.
SHR is 'shared' - the amount of pages that are shared with other applications, like e.g. libraries that are loaded in multiple applications.
So, what is the value of knowing how much memory is available?
You can start an application that could require significantly more RAM than your system has, and yet it runs...
You might even be able to run the application twice or thrice - code pages are shared anyway...
Note: the above answer cuts quite a few corners, the real mechanisms are significantly more complex. And I haven't even started bringing swap space into the story.
But this will do for you, I hope...