Understanding cpu shares in cloud foundry - linux

According to this documentation, CPU shares is calculated as
process_cpu.shares = min( 1024*(application_memory / 8 GB), 1024)
According to this formula, if an application is assigned 1GB memory, then it should get 128 CPU shares. 1024*(1/8). However, if we SSH into the application and check the cpu.shares we get 122
cat /sys/fs/cgroup/cpu/cpu.shares
122
Here are the observations:
Memory
Calculated cpu.share
Observed cpu.share
Difference
1GB
128
122
~5% difference
1.5GB/1536MB
192
184
~5% difference
2GB
256
256
3GB
384
384
4GB
512
512
5GB
640
634
~1% difference
5.5GB/5632MB
704
696
~2% difference
8GB
1024
1024
Why is it that for some values there is this discrepancy (Such as 1G,1.5G,5G etc..) while others (like 2,3,4 - 6,7,8) are consistent with the calculation. I believe that I am missing something from Cgroups perspective for this calculation. Is this specific to CF or is it something to do with the way Linux calculates resource allocation in Cgroups in general? Is there a headroom always reserved?

Related

Linux Huge pages memory usage calculation

I read the article about Linux Huge pages technology and misunderstood some important detail.
Here is the phrase:
For example, if you use HugePages with 64-bit hardware, and you want
to map 256 MB of memory, you may need one page table entry (PTE). If
you do not use HugePages, and you want to map 256 MB of memory, then
you must have 256 MB * 1024 KB/4 KB = 65536 PTEs.
I don't understand what is 1024 KB in this formula. I think it should be just 256 MB / 4 KB to calculate the number of table entries. Is there a typo in formula or am I wrong?
I agree that it is confusing. After reading it several times, I believe that it is as simple as a matter of unit conversion. At school the mathematics/physics/chemistry teachers always told us to use the same units when doing operations in order to obtain coherent results.
The value 256 is expressed in megabytes (MB). To divide it by 4 expressed in kilo-bytes (KB), you need to convert it into kilo-bytes. Hence, the multiplication by 1024KB (= 1MB). So, literally the operation is: (256 x 1024) / 4 = 65536 which is the simplification of: (256 x 1024 x 1024) / (4 x 1024)

Linux shmmax and shmall - how to set correct unit?

I have a server which has 16 GB memory.
Now I need to set my shmmax and shmall, because the server default is (checked with ipcs -l)
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509465599
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767
It seems terrible, shmall and shmmax is bigger than my 16 GB.
So I want to change the setting to
shmmax -> 16GB/4
shmall -> 16GB/2
But I can't be sure what unit I have set
shmmax --> 4420960256
shmall --> 8620960256
But is the unit for my number? byte or KB?
Because ipcs -l is showing KB....
echo "kernel.shmmax=4420960256" >> /etc/sysctl.conf
echo 4420960256> /proc/sys/kernel/shmmax
echo "kernel.shmall=8620960256" >> /etc/sysctl.conf
echo 8620960256> /proc/sys/kernel/shmall
thanks for help, but the postgresql just crash and get killed by yesterday, it shows :
This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 4420960256 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
my setting =>
shared_buffers = 4GB
effective_cache_size = 12GB
Just use:
lsipc
On my Ubuntu 16.04 LTS I get:
RESOURCE DESCRIPTION LIMIT USED USE%
MSGMNI Number of message queues 32000 0 0.00%
MSGMAX Max size of message (bytes) 8192 - -
MSGMNB Default max size of queue (bytes) 16384 - -
SHMMNI Shared memory segments 4096 20 0.49%
SHMALL Shared memory pages 2097152 4915 0.23%
SHMMAX Max size of shared memory segment (bytes) 4294967296 - -
SHMMIN Min size of shared memory segment (bytes) 1 - -
SEMMNI Number of semaphore identifiers 128 0 0.00%
SEMMNS Total number of semaphores 32000 0 0.00%
SEMMSL Max semaphores per semaphore set. 250 - -
SEMOPM Max number of operations per semop(2) 100 - -
SEMVMX Semaphore max value 32767 - -
which clearly states the measure unit for the values I have specified in /etc/sysctl.conf. So for me SHMMAX is in bytes while SHMALL in pages (see getconf PAGE_SIZE).
Just leave the setting the way it is – essentially, that means “unlimited” in your case. One less limit you could bang your head against!
The amount of shared memory allocated by PostgreSQL is fixed and mostly determined by shared_buffers. Just make sure you don't set that to exceed your RAM (4GB would be perfect), and there is no danger whatsoever.
For the record: experimentation on my system shows that the unit of kernel.shmmax is bytes, while the unit of kernel.shmall is memory pages (check getconf PAGESIZE).

File sizes are reported differently

Why are file sizes all different?
In Windows 10 I can see all of these sizes:
11,116 KB
10.8 MB
11,382,240 Bytes
11,382,784 Bytes
If I use the Console Window:
D:\My Programs\2017\MeetSchedAssist\Inno\Output>dir *.exe
Volume in drive D is DATA
Volume Serial Number is A8B0-A5C6
Directory of D:\My Programs\2017\MeetSchedAssist\Inno\Output
03/04/2018 08:50 11,382,240 MeetSchedAssistSetup.exe
1 File(s) 11,382,240 bytes
0 Dir(s) 719,837,487,104 bytes free
D:\My Programs\2017\MeetSchedAssist\Inno\Output>
I understand that perhaps on the physical media it has to round it to physically take a certain amount of space, but that line above:
Size: 10.8 MB (11,382,240 bytes)
Huh? Why does it not say 11.38 MB?
Once upon a time it has been defined that
1 kB = 1024 B
1 MB = 1024 kB
If you divide your bytes figure all the way down to MB, you'll get all those figures.
Now that they noticed that many people tend to walk into that trap, they have redefined the unit multiples and defined new ones
1 kiB = 1024 B
1 MiB = 1024 kiB
1 kB = 1000 B
1 MB = 1000 kB
but this scheme is not so widespread (seems to be more common with total size specs of storage media).
Funny sidenote: I guess I am not the only one who has learned it the old way and now mixes it up with the current definition all the time. I'd say problems like this are the root cause for humanity being mostly conservatively oriented.

/etc/sysctl.conf shmall shmmax calculation Oracle DB

I have a question about what COULD be the possible impact to a system, if we have the following kernel shmmax and shmall kernel settings:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 135456844800
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 107374182400
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
# free -mt
total used free shared buffers cached
Mem: 258065 230766 27299 11935 1481 58723
-/+ buffers/cache: 170561 87504
Swap: 16383 52 16331
Total: 274449 230818 43631
On that specific server we run only Oracle DB, are the settings correct and if not what could be the impact? Thanks a lot in advance!

How to check usage of different parts of memory?

I have computer with 2 Intel Xeon CPUs and 48 GB of RAM. RAM is divided between CPUs - two parts 24 GB + 24 GB. How can I check how much of each specific part is used?
So, I need something like htop, which shows how fully each core is used (see this example), but rather for memory than for cores. Or something that would specify which part (addresses) of memory are used and which are not.
The information is in /proc/zoneinfo, contains very similar information to /proc/vmstat except broken down by "Node" (Numa ID). I don't have a NUMA system here to test it for you and provide a sample output for a multi-node config; it looks like this on a one-node machine:
Node 0, zone DMA
pages free 2122
min 16
low 20
high 24
scanned 0
spanned 4096
present 3963
[ ... followed by /proc/vmstat-like nr_* values ]
Node 0, zone Normal
pages free 17899
min 932
low 1165
high 1398
scanned 0
spanned 223230
present 221486
nr_free_pages 17899
nr_inactive_anon 3028
nr_active_anon 0
nr_inactive_file 48744
nr_active_file 118142
nr_unevictable 0
nr_mlock 0
nr_anon_pages 2956
nr_mapped 96
nr_file_pages 166957
[ ... more of those ... ]
Node 0, zone HighMem
pages free 5177
min 128
low 435
high 743
scanned 0
spanned 294547
present 292245
[ ... ]
I.e. a small statistic on the usage/availability total followed by the nr_* values also found on a system-global level in /proc/vmstat (which then allow a further breakdown as of what exactly the memory is used for).
If you have more than one memory node, aka NUMA, you'll see these zones for all nodes.
edit
I'm not aware of a frontend for this (i.e. a numa vmstat like htop is a numa-top), but please comment if anyone knows one !
The numactl --hardware command will give you a short answer like this:
node 0 cpus: 0 1 2 3 4 5
node 0 size: 49140 MB
node 0 free: 25293 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 49152 MB
node 1 free: 20758 MB
node distances:
node 0 1
0: 10 21
1: 21 10

Resources