Elasticsearch process memory locking failed - linux

I have set bootstrap.memory_lock=true.
I updated /etc/security/limits.conf and added memlock unlimited for the elasticsearch user.
Elasticsearch had been running fine for many months, but it suddenly failed a day ago. The logs show the error below, and the process never starts:
ERROR: bootstrap checks failed
memory locking requested for elasticsearch process but memory is not locked
I ran ulimit -as and can see that max locked memory is set to unlimited. What is going wrong here? I have been trying for hours, all in vain. Please help.
OS is RHEL 7.2
Elasticsearch 5.1.2
ulimit -as output:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 83552
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Here is what I have done to lock the memory on my ES nodes on RedHat/Centos 7 (it will work on other distributions if they use systemd).
You must make the change in 4 different places:
1) /etc/sysconfig/elasticsearch
On sysconfig: /etc/sysconfig/elasticsearch you should have:
ES_JAVA_OPTS="-Xms4g -Xmx4g"
MAX_LOCKED_MEMORY=unlimited
(replace 4g with HALF your available RAM as recommended here)
2) /etc/security/limits.conf
On security limits config: /etc/security/limits.conf you should have
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
3) /usr/lib/systemd/system/elasticsearch.service
On the service script: /usr/lib/systemd/system/elasticsearch.service you should uncomment:
LimitMEMLOCK=infinity
Run systemctl daemon-reload after changing the service script.
4) /etc/elasticsearch/elasticsearch.yml
On elasticsearch config finally: /etc/elasticsearch/elasticsearch.yml you should add:
bootstrap.memory_lock: true
That's it. Restart your node and the RAM will be locked; since the heap can no longer be swapped out, you may notice a performance improvement.

OS = Ubuntu 16
ElasticSearch = 5.6.3
I used to have the same problem.
I set this in elasticsearch.yml:
bootstrap.memory_lock: true
and got this in my logs:
memory locking requested for elasticsearch process but memory is not locked
I tried several things, but you actually need to do only one thing (according to https://www.elastic.co/guide/en/elasticsearch/reference/master/setting-system-settings.html).
In the file
/etc/systemd/system/elasticsearch.service.d/override.conf
add:
[Service]
LimitMEMLOCK=infinity
A little explanation.
The funny thing is that systemd does not care about ulimit settings at all (https://fredrikaverpil.github.io/2016/04/27/systemd-and-resource-limits/). Services started by systemd do not go through a PAM login session, so /etc/security/limits.conf is never applied to them. You can easily verify this.
Set in /etc/security/limits.conf:
elasticsearch - memlock unlimited
Check that max locked memory is unlimited for the elasticsearch user:
$ sudo su elasticsearch -s /bin/bash
$ ulimit -l
Disable bootstrap.memory_lock: true in /etc/elasticsearch/elasticsearch.yml:
# bootstrap.memory_lock: true
Start the elasticsearch service via systemd:
# service elasticsearch start
Check which memlock limit the elasticsearch service has after it has started:
# systemctl show elasticsearch | grep -i limitmemlock
LimitMEMLOCK=65536
Although we set an unlimited memlock size in limits.conf, systemd completely ignores it.
So we come to the conclusion.
To start elasticsearch via systemd with
bootstrap.memory_lock: true
enabled, we don't need to care about ulimit settings, but we need to
set the limit explicitly in the systemd config file.
End of story.
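The drop-in described above can be written by a small helper. This is a minimal sketch, not an official procedure: the function name write_memlock_override is hypothetical, and the target directory is passed as an argument so that it can be tried against a scratch directory first. On a real host the directory would be /etc/systemd/system/elasticsearch.service.d (as named above) and the call would need root.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that raises the memlock limit.
# $1 = drop-in directory, e.g. /etc/systemd/system/elasticsearch.service.d
write_memlock_override() {
    dir=$1
    mkdir -p "$dir" || return 1
    # Same two lines as in the answer above.
    printf '[Service]\nLimitMEMLOCK=infinity\n' > "$dir/override.conf"
}
```

After calling it on the real directory, run systemctl daemon-reload and restart the service; systemctl show elasticsearch | grep -i limitmemlock should then report the new limit instead of 65536.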

Try setting
MAX_LOCKED_MEMORY=unlimited
in the /etc/sysconfig/elasticsearch file, and
LimitMEMLOCK=infinity
in /usr/lib/systemd/system/elasticsearch.service.

Make sure that the process that starts Elasticsearch actually runs with the unlimited setting. For example, it won't work if you start Elasticsearch as a different user from the one configured in /etc/security/limits.conf, or as root while relying on a wildcard entry in limits.conf (wildcard entries do not apply to root).
Test it to be sure:
for example, you could put ulimit -a ; exit just after the "#Start Daemon" line in /etc/init.d/elasticsearch and start it with bash /etc/init.d/elasticsearch start (adapt this to your start mechanism).

Check the actual limits while the process is running (even if only briefly) with:
cat /proc/<pid>/limits
You will find lines similar to this:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
<truncated>
Then, depending on the runner or container (in my case it was supervisord's minfds value), you can raise the setting that actually imposes the limit.
I hope this gives a hint for more general cases.
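To pull a single row out of that table instead of reading it by eye, a small awk wrapper can help. This is only a sketch: the name proc_limit is made up for this example, and it assumes the usual /proc/<pid>/limits layout, where the limit name starts the line and is followed by the soft and hard values.

```shell
#!/bin/sh
# Hypothetical helper: print "soft hard" for one limit from a limits table.
# $1 = limit name exactly as shown in the first column, e.g. "Max locked memory"
# $2 = file to read (defaults to the limits of the process doing the reading)
proc_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)   # text after the limit name
            split(rest, f, " ")                   # f[1] = soft, f[2] = hard
            print f[1], f[2]
        }
    ' "${2:-/proc/self/limits}"
}
```

For a running service this would be called as proc_limit "Max locked memory" /proc/<pid>/limits, with the PID of the process you are debugging.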

I followed this post.
On Ubuntu 18.04 with Elasticsearch 6.x, there was no LimitMEMLOCK=infinity entry in /usr/lib/systemd/system/elasticsearch.service.
Adding it to that file and setting MAX_LOCKED_MEMORY=unlimited in /etc/default/elasticsearch did the trick.
The JVM options can be added in the /etc/elasticsearch/jvm.options file.

If you use the tar distribution and want to monitor it with monit, you have to tell monit to use unlimited; all other places for this configuration are ignored.
Add ulimit -s unlimited at the beginning of /etc/init.d/monit, then run systemctl daemon-reload, service monit restart, and monit start $yourMonitLabel.

One thing it can be is that your /tmp is mounted with noexec (https://discuss.elastic.co/t/not-able-to-start-elasticsearch-due-to-failed-memory-lock/158009/6). Check your logs and see whether they complain about .UnsatisfiedLinkError: Native library
This happens especially on CentOS/RedHat, but maybe on others too. It might be fixed in ES 7?

Related

Error: EMFILE: too many open files, watch, unless I use sudo

Description
Recently I've run into a problem. I am not able to run yarn start in the element-web directory; I get these errors. Originally I thought it had something to do with element-web itself, so I created an issue. Some time after that I tried to run wintersmith preview in the bibviz directory and got the same errors. This was weird, so I tried to create an Angular project and run ng serve, and got the errors again. I went back to the issue to close it, as it wasn't an element-web issue, and found that another issue had been created for the same problem. It had already been closed by turt2live, who said it looks like you've run out of memory on your system. Based on this I tried turning off most programs running in the background, and now all the commands worked.
I am sure that ng serve used to work in the past.
My PC has 16 GB of RAM and the commands already fail when I am on 7/16 GB. I can't see any memory spikes when running the commands. Running the commands with sudo also completely eliminates the problem. This doesn't make any sense to me.
Research led me to ulimits, but they seem to have no effect. I have also installed watchman, with no effect.
Can someone tell me what I am missing?
Thank you in advance!
Info
I am on Debian 11 Bullseye. This is the output of a few commands that could be useful.
As a regular user:
> uname -a
Linux Simon-s-PC 5.8.0-3-amd64 #1 SMP Debian 5.8.14-1 (2020-10-10) x86_64 GNU/Linux
> sudo sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 46482
-n: file descriptors 8192
-l: locked-in-memory size (kbytes) unlimited
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 63664
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 95
-N 15: unlimited
> yarn --version
1.22.5
With sudo su:
> sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 63664
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 2043392
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 63664
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
I think I've found a solution:
Set limits in /etc/sysctl.conf by adding:
fs.inotify.max_user_watches=524288
fs.inotify.max_user_instances=512
Open a new terminal or reload sysctl.conf variables with
sudo sysctl --system
Run yarn start
Everything should work fine now, hopefully. If it doesn't work try setting the limits higher.
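To check whether the instance limit is really the one being hit, it can help to count how many inotify instances are currently open. This is a rough sketch, assuming a Linux /proc where inotify descriptors appear as symlinks to anon_inode:inotify. The function name is invented for this example, and without root it only sees processes whose fd directories are readable by the current user. It starts one readlink per descriptor, so it can be slow on a busy desktop.

```shell
#!/bin/sh
# Hypothetical helper: count open inotify instances visible to the caller.
count_inotify_instances() {
    n=0
    for fd in /proc/[0-9]*/fd/*; do
        # Each inotify instance shows up as an fd linking to anon_inode:inotify.
        if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Run as the affected user, the count is roughly what is charged against that user's fs.inotify.max_user_instances. Since the limit is per user, this may also be why the same commands work under sudo.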

Why can't I create 50k processes in Linux?

Using Linux
$ uname -r
4.4.0-1041-aws
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
With limits allowing up to 200k processes
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 563048
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ cat /proc/sys/kernel/pid_max
200000
$ cat /proc/sys/kernel/threads-max
1126097
And enough free memory to give 1MB each to 127k processes
$ free
total used free shared buff/cache available
Mem: 144156492 5382168 130458252 575604 8316072 137302624
Swap: 0 0 0
And I have fewer than 1k existing processes/threads.
$ ps -elfT | wc -l
832
But I cannot start 50k processes
$ echo '
seq 50000 | while read _; do
sleep 20 &
done
' | bash
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
...
Why can't I create 50k processes?
It was caused by systemd.
In addition to kernel.pid_max and ulimit, I also needed to change a third limit.
/etc/systemd/logind.conf
[Login]
UserTasksMax=70000
And then restart.
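The point of this answer is that fork hits whichever ceiling is lowest. Here is a minimal sketch of that reasoning, assuming the values have already been read (for instance from /proc/sys/kernel/pid_max, ulimit -u, and the systemd task limit that applies to the user). The function name and the way the values are passed in are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print the smallest of several limits.
# "unlimited" and "infinity" are treated as "no ceiling from this source".
effective_task_limit() {
    min=
    for v in "$@"; do
        case "$v" in
            unlimited|infinity) continue ;;
        esac
        if [ -z "$min" ] || [ "$v" -lt "$min" ]; then
            min=$v
        fi
    done
    # If every source was unlimited, say so.
    echo "${min:-unlimited}"
}
```

With the values from this thread (pid_max 200000, ulimit -u unlimited, UserTasksMax=70000), effective_task_limit 200000 unlimited 70000 prints 70000.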
Building on @Basile's answer, you probably ran out of PIDs.
cat /proc/sys/kernel/pid_max gives me 32768 on my machine (2^15, one more than the maximum value of a signed short), which is less than 50k.
EDIT: I missed that /proc/sys/kernel/pid_max is set to 200000. That probably isn't the issue in this case.
Because each process requires some resources: some RAM (including some kernel memory), some CPU, etc.
Each process has its own virtual address space, including its own call stack (and some of it requires physical resources, including several pages of RAM; read more about resident set size; on my desktop the RSS of a bash process is about 6 MB). So a process is actually quite heavyweight.
BTW, this is not specific to Linux.
Read more about operating systems, e.g. Operating Systems: Three Easy Pieces.
Try also cat /proc/$$/maps and cat /proc/$$/status and read more about proc(5). Read about failure of fork(2) and of execve(2). The "Resource temporarily unavailable" message corresponds to EAGAIN (see errno(3)), and several reasons can make fork fail with EAGAIN. On my system, cat /proc/sys/kernel/pid_max gives 32768 (and reaching that limit makes fork fail with EAGAIN).
BTW, imagine that you could fork ten thousand processes. Then the context-switch time would dominate the running time.
Your Linux system looks like an AWS instance. Amazon won't let you create that many processes, because their hardware is not expecting that many.
(On some costly supercomputer or server with e.g. a terabyte of RAM and a hundred cores, perhaps you could run 50K processes; I guess that would need a particular kernel or kernel configuration. I recommend getting help from Amazon support.)

Ulimit change after reboot has no effect

I have changed /etc/security/limits.conf and rebooted the machine remotely. However, after the reboot, the nproc parameter still has the old value.
[ost@compute-0-1 ~]$ cat /etc/security/limits.conf
* - memlock -1
* - stack -1
* - nofile 4096
* - nproc 4096 <=====================================
[ost@compute-0-1 ~]$
Broadcast message from root@compute-0-1.local
(/dev/pts/0) at 19:27 ...
The system is going down for reboot NOW!
Connection to compute-0-1 closed by remote host.
Connection to compute-0-1 closed.
ost@cluster:~$ ssh compute-0-1
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Last login: Tue Sep 27 19:25:25 2016 from cluster.local
Rocks Compute Node
Rocks 6.1 (Emerald Boa)
Profile built 19:00 23-Aug-2016
Kickstarted 19:08 23-Aug-2016
[ost@compute-0-1 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 516294
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024 <=========================
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Please see that I set max user processes to 4096 but after the reboot, the value is still 1024.
Please take a look at the file /etc/pam.d/sshd.
If it exists, open it and insert the following line:
session required pam_limits.so
Then the new value will take effect, including after a reboot.
PAM (Pluggable Authentication Modules) handles authentication and session setup. The limits in /etc/security/limits.conf are applied by the pam_limits module, so that module has to be enabled for SSH logins.
More details are in man pam_limits.
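Before editing the file, you can check whether the module is already enabled. This is a small sketch under stated assumptions: the function name is hypothetical, it only looks at the one file it is given, and it does not follow include lines that may pull in pam_limits.so from another PAM file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a PAM config file has an active
# (uncommented) session line that loads pam_limits.so.
# $1 = PAM file, e.g. /etc/pam.d/sshd
pam_limits_enabled() {
    grep -Eq '^[[:space:]]*-?session[[:space:]]+[^#]*pam_limits\.so' "$1"
}
```

For example: pam_limits_enabled /etc/pam.d/sshd || echo "pam_limits is not enabled for sshd".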
Thanks!

forkpty fails for jailed linux user

I have an Ubuntu 12.04 setup on the server. Every registered user is also registered as a Linux user and jailed with limited system resource access through /etc/security/limits.conf.
I tried running a server as one of the registered users. The app is a Node.js app, http://github.com/pocha/terminal-codelearn. It uses https://github.com/chjj/pty.js to create a pseudo terminal for every user who comes to the app.
The app fails with a 'forkpty(3) failed' error that points to line 184 of https://github.com/chjj/pty.js/blob/65dd89fd8f87de914ff1814362918d7bd87c9cbf/src/unix/pty.cc
pid_t pid = pty_forkpty(&master, name, NULL, &winp);
if (pid) {
for (i = 0; i < argl; i++) free(argv[i]);
delete[] argv;
for (i = 0; i < envc; i++) free(env[i]);
delete[] env;
free(cwd);
}
switch (pid) {
case -1:
return ThrowException(Exception::Error(
String::New("forkpty(3) failed.")));
I am able to deploy the app successfully on http://nitrous.io. They probably jail users in a similar way. I ran ulimit -a there and matched every value except pending signals. Somehow, on my server the maximum pending signals value does not exceed about 90k, while it is 548k on the Nitrous server.
Below is the ulimit -a output from Nitrous server
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 548288
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 512
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 256
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
The app fails on Heroku with exactly the same error.
Can anybody help me make the app run on my server the way it works on nitrous.io?
I know that Heroku fails to forkpty because they're not actually running POSIX, just something very POSIX-like. So some things, like forkpty, just don't work. I don't think there's a way around that :( wish there were.
I am not sure I understand the POSIX point. But I figured out that in my jailed environment there was no /dev/ptmx and no /dev/pts/*. I googled how to create them, created them, and it started working.

Too many open files error on Ubuntu 8.04

mysqldump: Couldn't execute 'show fields from `tablename`': Out of resources when opening file './databasename/tablename#P#p125.MYD' (Errcode: 24) (23)
Checking error 24 in the shell gives:
>>perror 24
OS error code 24: Too many open files
How do I solve this?
First, to identify the limits for a particular user or group, do the following:
root@ubuntu:~# sudo -u mysql bash
mysql@ubuntu:~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 71680
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 71680
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
mysql@ubuntu:~$
The important line is:
open files (-n) 1024
As you can see, your operating system vendor ships this version with the basic Linux configuration - 1024 files per process.
This is obviously not enough for a busy MySQL installation.
Now, to fix this you have to modify the following file:
/etc/security/limits.conf
mysql soft nofile 24000
mysql hard nofile 32000
Some flavors of Linux also require additional configuration to get this to stick to daemon processes versus login sessions. In Ubuntu 10.04, for example, you need to also set the pam session limits by adding the following line to /etc/pam.d/common-session:
session required pam_limits.so
Quite an old question, but here are my two cents.
What you could be experiencing is that the MySQL server did not set its open_files_limit variable correctly.
You can see how many files you are allowing MySQL to open with:
mysql> SHOW VARIABLES LIKE 'open_files_limit';
It is probably set to 1024 even if you have already raised the OS limits.
You can use the option --open-files-limit=XXXXX on the mysqld command line.
Cheers
Add --single-transaction to your mysqldump command.
It is also possible that some code that accesses the tables did not close them properly, so that over time the open-file limit is reached.
Please refer to http://dev.mysql.com/doc/refman/5.0/en/table-cache.html for a possible reason as well.
Restarting MySQL should make this problem go away (although it might happen again unless the underlying problem is fixed).
You can increase your OS limits by editing /etc/security/limits.conf.
You can also install the "lsof" (LiSt Open Files) command to see which processes hold which files.
I don't think there is any need to configure PAM. On my system (Debian 7.2 with Percona 5.5.31-rel30.3-520.squeeze) I have:
Before the my.cnf changes:
# cat /proc/12345/limits | grep "open files"
Max open files 1186 1186 files
After adding "open_files_limit = 4096" to my.cnf and restarting mysqld, I got:
# cat /proc/23456/limits | grep "open files"
Max open files 4096 4096 files
12345 and 23456 are the mysqld PIDs, of course.
SHOW VARIABLES LIKE 'open_files_limit' shows 4096 now.
Everything looks OK, while "ulimit" shows no change:
# su - mysql -c bash
# ulimit -n
1024
There is no guarantee that "24" is an OS-level error number, so don't assume that this means that too many file handles are open. It could be some type of internal error code used within mysql itself. I'd suggest asking on the mysql mailing lists about this.
