Linux shmmax and shmall - how to set the correct unit?

I have a server with 16 GB of memory.
Now I need to set my shmmax and shmall, because the server defaults are (checked with ipcs -l):
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509465599
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767
It seems terrible: shmall and shmmax are bigger than my 16 GB.
So I want to change the setting to
shmmax -> 16GB/4
shmall -> 16GB/2
But I can't be sure which unit I should use when setting
shmmax --> 4420960256
shmall --> 8620960256
Is my number in bytes or KB?
Because ipcs -l is showing KB....
echo "kernel.shmmax=4420960256" >> /etc/sysctl.conf
echo 4420960256 > /proc/sys/kernel/shmmax
echo "kernel.shmall=8620960256" >> /etc/sysctl.conf
echo 8620960256 > /proc/sys/kernel/shmall
Thanks for the help, but PostgreSQL just crashed and got killed yesterday; it shows:
This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 4420960256 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
My settings:
shared_buffers = 4GB
effective_cache_size = 12GB

Just use:
lsipc
On my Ubuntu 16.04 LTS I get:
RESOURCE  DESCRIPTION                                 LIMIT       USED  USE%
MSGMNI    Number of message queues                    32000       0     0.00%
MSGMAX    Max size of message (bytes)                 8192        -     -
MSGMNB    Default max size of queue (bytes)           16384       -     -
SHMMNI    Shared memory segments                      4096        20    0.49%
SHMALL    Shared memory pages                         2097152     4915  0.23%
SHMMAX    Max size of shared memory segment (bytes)   4294967296  -     -
SHMMIN    Min size of shared memory segment (bytes)   1           -     -
SEMMNI    Number of semaphore identifiers             128         0     0.00%
SEMMNS    Total number of semaphores                  32000       0     0.00%
SEMMSL    Max semaphores per semaphore set.           250         -     -
SEMOPM    Max number of operations per semop(2)       100         -     -
SEMVMX    Semaphore max value                         32767       -     -
which clearly states the unit of measure for the values I have specified in /etc/sysctl.conf. So for me, SHMMAX is in bytes while SHMALL is in pages (see getconf PAGE_SIZE).
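For instance, to turn the SHMALL figure above into bytes (a quick sketch; the usual 4096-byte pages assumed):
echo $(( 2097152 * $(getconf PAGE_SIZE) ))   # prints 8589934592, i.e. 8 GiB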

Just leave the setting the way it is – essentially, that means “unlimited” in your case. One less limit you could bang your head against!
The amount of shared memory allocated by PostgreSQL is fixed and mostly determined by shared_buffers. Just make sure you don't set that to exceed your RAM (4GB would be perfect), and there is no danger whatsoever.
For the record: experimentation on my system shows that the unit of kernel.shmmax is bytes, while the unit of kernel.shmall is memory pages (check getconf PAGESIZE).
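If you do want explicit limits, a sketch of the arithmetic (the 8 GiB target is only an example):
# shmmax is per-segment, in bytes; shmall is system-wide, in pages
sysctl -w kernel.shmmax=8589934592
sysctl -w kernel.shmall=$(( 8589934592 / $(getconf PAGESIZE) ))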

Related

Understanding CPU shares in Cloud Foundry

According to this documentation, CPU shares are calculated as
process_cpu.shares = min( 1024*(application_memory / 8 GB), 1024)
According to this formula, if an application is assigned 1GB of memory, it should get 128 CPU shares: 1024*(1/8). However, if we SSH into the application and check cpu.shares, we get 122:
cat /sys/fs/cgroup/cpu/cpu.shares
122
Here are the observations:
Memory         Calculated cpu.shares   Observed cpu.shares   Difference
1GB            128                     122                   ~5%
1.5GB/1536MB   192                     184                   ~5%
2GB            256                     256                   -
3GB            384                     384                   -
4GB            512                     512                   -
5GB            640                     634                   ~1%
5.5GB/5632MB   704                     696                   ~2%
8GB            1024                    1024                  -
Why is there a discrepancy for some values (such as 1GB, 1.5GB, 5GB) while others (2GB, 3GB, 4GB, 8GB) are consistent with the calculation? I believe I am missing something about cgroups here. Is this specific to CF, or does it come from the way Linux allocates resources in cgroups in general? Is there always some headroom reserved?
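One way to dig into the gap is to compare the observed value against the quoted formula from inside the container. A minimal sketch, assuming cgroup v1 and the usual mount points (the min(..., 1024) clamp is omitted since all test values above stay at or below 8 GB):
mem=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
echo "expected: $(( mem * 1024 / (8 * 1024 * 1024 * 1024) ))"
echo "observed: $(cat /sys/fs/cgroup/cpu/cpu.shares)"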

How to increase number of child processes

$ cat /sys/fs/cgroup/pids/parent/pids.max
max
I created it following https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/pids.html
Consider this Python code demonstrating the problem:
from os import fork, getpid
from time import sleep

i = 0
print("pid = %d" % getpid())
with open("/proc/%d/limits" % getpid(), "r") as f:
    print(f.read())
try:
    # in the parent, fork() returns the child's pid (truthy), so this
    # keeps forking until the kernel refuses with EAGAIN
    while fork():
        i += 1
except BaseException as e:
    print(i)
    print(e)
    sleep(10)
print("done")
exit(1)
My output:
pid = 18091
Limit                     Soft Limit  Hard Limit  Units
Max cpu time              unlimited   unlimited   seconds
Max file size             unlimited   unlimited   bytes
Max data size             unlimited   unlimited   bytes
Max stack size            8388608     unlimited   bytes
Max core file size        0           unlimited   bytes
Max resident set          unlimited   unlimited   bytes
Max processes             999999      999999      processes
Max open files            1024        1048576     files
Max locked memory         67108864    67108864    bytes
Max address space         unlimited   unlimited   bytes
Max file locks            unlimited   unlimited   locks
Max pending signals       31412       31412       signals
Max msgqueue size         819200      819200      bytes
Max nice priority         0           0
Max realtime priority     0           0
Max realtime timeout      unlimited   unlimited   us
10227
[Errno 11] Resource temporarily unavailable
done
In Linux you have the pid_max limit:
$ cat /proc/sys/kernel/pid_max
32768
However, if your Linux is running systemd you might hit user-slice limits.
For root:
$ cat /sys/fs/cgroup/pids/user.slice/user-0.slice/pids.max
For the currently logged-in user:
$ cat /sys/fs/cgroup/pids/user.slice/user-$(id -u).slice/pids.max
10813
The systemd equivalent would be:
$ systemd-analyze dump | sed -n "/-> Unit user-$(id -u).slice:/,/-> Unit /p"| grep -e "TasksMax="
TasksMax=10813
According to man logind.conf:
UserTasksMax=
Sets the maximum number of OS tasks each user may run concurrently. This controls the TasksMax= setting of the per-user slice unit, see systemd.resource-control(5) for details. If assigned the special value "infinity", no tasks limit is applied. Defaults to 33%, which equals 10813 with the kernel's defaults on the host, but might be smaller in OS containers.
This limit is set in /etc/systemd/logind.conf, even though it might be commented out there; 33% of the default pid_max of 32768 is 10813.
#UserTasksMax=33%
When you modify the UserTasksMax limit or increase sysctl kernel.pid_max, you'll have to restart the systemd-logind service:
service systemd-logind restart
On a 64-bit system you should be able to increase the value up to 2^22, i.e. 4194304:
sysctl kernel.pid_max=4194304
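Putting it together, a sketch for raising both limits persistently (paths and values taken from above; adjust to taste):
# raise the kernel-wide pid limit, now and across reboots
echo "kernel.pid_max=4194304" >> /etc/sysctl.conf
sysctl -w kernel.pid_max=4194304
# lift the per-user task limit, then restart logind to apply it
sed -i 's/^#\?UserTasksMax=.*/UserTasksMax=infinity/' /etc/systemd/logind.conf
service systemd-logind restart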

File sizes are reported differently

Why are file sizes all different?
In Windows 10 I can see all of these sizes:
11,116 KB
10.8 MB
11,382,240 Bytes
11,382,784 Bytes
If I use the Console Window:
D:\My Programs\2017\MeetSchedAssist\Inno\Output>dir *.exe
Volume in drive D is DATA
Volume Serial Number is A8B0-A5C6
Directory of D:\My Programs\2017\MeetSchedAssist\Inno\Output
03/04/2018 08:50 11,382,240 MeetSchedAssistSetup.exe
1 File(s) 11,382,240 bytes
0 Dir(s) 719,837,487,104 bytes free
D:\My Programs\2017\MeetSchedAssist\Inno\Output>
I understand that perhaps on the physical media the size has to be rounded up so the file physically occupies a whole number of allocation units, but that line above:
Size: 10.8 MB (11,382,240 bytes)
Huh? Why does it not say 11.38 MB?
Once upon a time it was defined that
1 kB = 1024 B
1 MB = 1024 kB
If you divide your bytes figure all the way down to MB, you'll get all those figures.
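Using the figures from the question (the awk one-liner is just for illustration):
awk 'BEGIN { b = 11382240;
             printf "%.2f KB (shown as 11,116 KB)\n", b/1024;
             printf "%.3f MB (displayed truncated as 10.8 MB)\n", b/1024/1024 }'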
Now that it has been noticed that many people tend to walk into that trap, the unit multiples have been redefined and new ones introduced:
1 KiB = 1024 B
1 MiB = 1024 KiB
1 kB = 1000 B
1 MB = 1000 kB
but this scheme is not yet very widespread (it seems to be most common in the total-size specs of storage media).
Funny sidenote: I guess I am not the only one who learned it the old way and now mixes it up with the current definition all the time. I'd say problems like this are the root cause of humanity being mostly conservatively oriented.

/etc/sysctl.conf shmall shmmax calculation Oracle DB

I have a question about what COULD be the possible impact on a system if we have the following shmmax and shmall kernel settings:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 135456844800
# Controls the maximum total amount of shared memory, in pages
kernel.shmall = 107374182400
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
# free -mt
                  total     used     free   shared  buffers   cached
Mem:             258065   230766    27299    11935     1481    58723
-/+ buffers/cache:        170561    87504
Swap:             16383       52    16331
Total:           274449   230818    43631
On that specific server we run only Oracle DB. Are the settings correct, and if not, what could the impact be? Thanks a lot in advance!
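Given the units established above (shmmax in bytes, shmall in pages), a quick sketch of what those values actually grant, assuming 4096-byte pages:
echo "shmmax: $(( 135456844800 / 1024**3 )) GiB per segment"      # ~126 GiB
echo "shmall: $(( 107374182400 * 4096 / 1024**3 )) GiB in total"  # ~409600 GiB, i.e. ~400 TiB
That shmall is far beyond the ~256 GB of RAM shown by free, which hints the value may have been computed in bytes rather than pages.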

mmap2 fails to allocate a stack for a thread, but works under GDB

I am running an application on an ARMv7-A machine with Fedora 18 and 2GB of RAM.
The application terminates:
130413 15:49:34 19344 Xrd: PhyConnection: Can't run reader thread: out of system resources. Critical error.
If I strace it, I see that the allocation of a stack for a new thread fails:
mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = -1 ENOMEM (Cannot allocate memory)
gettimeofday({1365921367, 588018}, NULL) = 0
gettid() = 6309
writev(2, [{"130414 02:36:07 6309 ", 21}, {"Xrd", 3}, {"", 0}, {": ", 2}, {"PhyConnection: Can't run reader "..., 80}, {"\n", 1}], 6) = 107
130414 02:36:07 6309 Xrd: PhyConnection: Can't run reader thread: out of system resources. Critical error.
munmap(0x48172000, 292) = 0
munmap(0x48225000, 292) = 0
Actual code:
if (fReaderthreadhandler[i]->Run(this)) {
    Error("PhyConnection",
          "Can't run reader thread: out of system resources. Critical error.");
    // HELP: what do we do here
    exit(-1);
}
The application had 300-350MB of virtual memory, of which ~250MB was resident. The high-memory limit is 1.3GB. The virtual address space is not limited:
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 1024
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 15870
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
But it does work from GDB! I also looked at what limits are reported from inside GDB, and they are the same; thus GDB does not adjust the soft limits, which would be inherited.
Summary:
I have enough memory to run the application. It even works fine inside GDB.
It doesn't seem that it hit any of the resource limits.
Works in GDB, but not outside.
Any hints of what could be wrong here?
Works in GDB, but not outside.
One thing that is different "inside GDB" is address layout (randomization).
In order to make debugging easier, GDB disables ASLR by default. You can turn it back on with
(gdb) set disable-randomization off
and then run the app several times, checking whether it still works reliably.
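You can also check whether ASLR is enabled system-wide (2, full randomization, is the usual default):
cat /proc/sys/kernel/randomize_va_space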
I have enough memory to run the application.
The allocation (mapping) that is failing requests 8MB of contiguous memory, which you may not have if your address space is fragmented. If you don't actually need 8MB of stack (most applications don't), you could fit many more threads in by setting ulimit -s (or calling setrlimit(RLIMIT_STACK, ...) from within the application) to a significantly smaller value.
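For instance (the 1 MiB figure and the binary name are only placeholders; pick what your threads actually need):
ulimit -s 1024          # shrink the default thread-stack reservation from 8 MiB to 1 MiB
./your_application      # threads started from this shell inherit the smaller stack size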
