Monitoring Linux server sockets or files - linux

I have the famous "SocketException: Too many open files" bug.
I am running an Apache HTTP server, a Tomcat server and a MySQL database on my server.
I checked the open-files limit with ulimit -n, which gave me 1024.
If I check how many files are opened with lsof -u tomcat, it gives me 5,
and the same for mysql. I'm not sure what the problem is, but I also get a readlink "Permission denied" error.
I want to monitor the socket connections and open files on my server. I thought about using the described Linux commands in a shell script and sending the results to me by mail.
The other option, I think, is to use netstat and count the connections, but it loads very slowly and gives me "getnameinfo failed".
What would be the better command to monitor the bug I have?
EDIT:

SHOW GLOBAL STATUS LIKE '%open%';
Variable_name             Value
Com_ha_open               0
Com_show_open_tables      0
Open_files                8
Open_streams              0
Open_table_definitions    87
Open_tables               64
Opened_files              673
Opened_table_definitions  87
Opened_tables             628
Slave_open_temp_tables    0

SHOW GLOBAL VARIABLES LIKE '%open%';
Variable_name             Value
have_openssl              DISABLED
innodb_open_files         300
open_files_limit          2000
table_open_cache          64

SHOW GLOBAL VARIABLES LIKE '%connect%';
Variable_name             Value
character_set_connection  latin1
collation_connection      latin1_swedish_ci
connect_timeout           10
init_connect
max_connect_errors        10
max_connections           400
max_user_connections      0

SHOW GLOBAL STATUS LIKE '%connect%';
Variable_name             Value
Aborted_connects          1
Connections               35954
Max_used_connections      102
Ssl_client_connects       0
Ssl_connect_renegotiates  0
Ssl_finished_connects     0
Threads_connected         11
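For the monitoring part of the question, a script does not have to parse lsof or netstat output at all. A minimal Python sketch, assuming Linux /proc: it counts each process's open descriptors, and how many of them are sockets, by reading the /proc/<pid>/fd symlinks. It should be run as root; as an ordinary user, the fd directories of other users' processes cannot be read, which is probably also where lsof's readlink "Permission denied" messages and the suspiciously low counts come from. The function names are illustrative, not taken from any tool.

```python
import os


def fd_counts(pid):
    """Return (open_fds, socket_fds) for one PID, or None if /proc denies access."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        names = os.listdir(fd_dir)
    except OSError:  # PermissionError for other users' processes, or the process exited
        return None
    total = sockets = 0
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:  # the descriptor was closed while we were scanning
            continue
        total += 1
        if target.startswith("socket:["):
            sockets += 1
    return total, sockets


def process_name(pid):
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip()
    except OSError:
        return "?"


def snapshot():
    """List (open_fds, socket_fds, pid, name) for every readable process, busiest first."""
    rows = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            counts = fd_counts(entry)
            if counts is not None:
                rows.append((counts[0], counts[1], int(entry), process_name(entry)))
    rows.sort(reverse=True)
    return rows


if __name__ == "__main__":
    for total, sockets, pid, name in snapshot()[:10]:
        print(f"{pid:>7}  {name:<16} fds={total:<6} sockets={sockets}")
```

Its output could be mailed from cron as the question proposes; the mailing step is left out here.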

You may check your ulimit values with 'ulimit -a' to determine your Open Files capacity.
At the OS command prompt, run ulimit -n 8192 to raise the Open Files limit dynamically; this applies to the current shell and the processes started from it.
To make this change persist across OS restarts, the following URL can be your guide:
https://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-descriptors-limit/
Where their example uses a capacity of 500000, please use 8192 for your system.
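The same "raise the soft limit dynamically" step can be seen from inside a program. A minimal Python sketch using the standard resource module (the function name is illustrative): it lifts this process's soft RLIMIT_NOFILE toward 8192 but never above the hard limit, which is the most an unprivileged process may do. Just like ulimit -n, it changes nothing for Tomcat or MySQL processes that are already running; those still need the persistent configuration (and, for MySQL, its own open_files_limit).

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Like `ulimit -n` in a shell, this affects only the current process and the
    children it starts afterwards, not services that are already running.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard  # already high enough; never lower it
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(8192))
```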
Suggestions to consider for the [mysqld] section of your my.cnf:
thread_cache_size=100      # to support your Max_used_connections of 102
max_user_connections=400   # from 0, to match the requested max_connections
table_open_cache=800       # from 64, to reduce the Opened_tables count
innodb_open_files=800      # from 300, to match the requested table_open_cache
Implementing these details should avoid the 'too many open files' message.

Related

Is it possible to rotate a tcpdump log?

I have the following command:
sudo tcpdump -ni enp0s3 -W 1 -C 1 -w file.cap
With this command I am saying: "listen on network interface enp0s3 and capture all packets into a file whose maximum size is 1 MB". It works; however, when the file reaches 1 MB, it is reset and the capture starts all over again from 0 KB, deleting all the packets.
I want only the oldest packets to be deleted once the file reaches 1 MB, with new ones added in their place. I don't want all packets to be deleted and the capture to restart at 0 KB. In other words, I want the file to always stay around 1 MB, with new incoming packets replacing the oldest ones.
You can use -U -W 2 with the -C size limit. tcpdump will then alternate between two files, and you can concatenate them (or work on the older one).
An alternative would be to write to a stream or pipe instead of to files at all.
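The pipe alternative can give exactly the "keep roughly the newest 1 MB" behaviour asked for. A Python sketch, assuming tcpdump writes the classic pcap format to stdout (-w -) and uses -U so packets are flushed as they arrive; the PcapRing class and the 1_000_000-byte budget are illustrative, not a tcpdump feature. It keeps packet records in a deque, drops the oldest whenever the total exceeds the budget, and writes file.cap once the capture ends (for example on Ctrl-C).

```python
import struct
import sys
from collections import deque

# Classic pcap magic numbers as they appear on disk (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


class PcapRing:
    """Keep only the newest packet records whose total size fits in `max_bytes`."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.global_header = None
        self.records = deque()
        self.size = 0

    def feed(self, stream):
        """Read a classic pcap stream (e.g. from `tcpdump -U -w -`) until EOF."""
        header = stream.read(24)
        if len(header) < 24:
            raise ValueError("truncated pcap global header")
        if header[:4] in LITTLE_ENDIAN_MAGICS:
            fmt = "<IIII"
        elif header[:4] in BIG_ENDIAN_MAGICS:
            fmt = ">IIII"
        else:
            raise ValueError("not a classic pcap stream")
        self.global_header = header
        while True:
            record_header = stream.read(16)  # ts_sec, ts_frac, incl_len, orig_len
            if len(record_header) < 16:
                break  # end of stream
            captured_len = struct.unpack(fmt, record_header)[2]
            data = stream.read(captured_len)
            if len(data) < captured_len:
                break  # stream ended mid-packet; drop the partial record
            self._append(record_header + data)

    def _append(self, record):
        self.records.append(record)
        self.size += len(record)
        # Evict the oldest packets until we are back under the budget.
        while self.size > self.max_bytes and len(self.records) > 1:
            self.size -= len(self.records.popleft())

    def write(self, path):
        with open(path, "wb") as f:
            f.write(self.global_header)
            f.writelines(self.records)


if __name__ == "__main__":
    ring = PcapRing(1_000_000)  # roughly 1 MB of packet records
    try:
        ring.feed(sys.stdin.buffer)
    except KeyboardInterrupt:
        pass
    if ring.global_header is not None:
        ring.write("file.cap")
```

It would be used as sudo tcpdump -ni enp0s3 -U -w - | python3 pcap_ring.py. As written it only writes the snapshot when the stream ends; periodic snapshots would need an extra timer.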

Fluent Bit not saving any data on filesystem

I am new to Fluent Bit and am currently doing a POC. I tried multiple things but couldn't make Fluent Bit save any data to the filesystem.
[SERVICE]
    flush 1
    daemon Off
    log_level trace
    parsers_file parsers.conf
    plugins_file plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 2020
    storage.metrics on
    storage.path /var/log/fluent-bit/buffer
    storage.max_chunks_up 4
    storage.sync full
    storage.backlog.mem_limit 1M

[INPUT]
    name cpu
    tag cpu.local
    # Read interval (sec) Default: 1
    interval_sec 1

[INPUT]
    name exec
    tag d-disk
    command df -h --type=ext4 | grep -v Filesystem
    interval_sec 1
    interval_nsec 0

[INPUT]
    name mem
    tag memory
    interval_sec 1

[OUTPUT]
    name stdout
    match memory
When I go to /var/log/fluent-bit/buffer and run ls -a, I see nothing.
My aim is to make Fluent Bit save data on disk.
You have to explicitly specify the buffering mechanism to use: each input buffers in memory unless told otherwise (see the Fluent Bit buffering and storage documentation for details).
Try adding storage.type filesystem to your INPUT sections.
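Concretely, every [INPUT] section needs its own storage.type filesystem line, while storage.path in [SERVICE] (already present in the question) tells Fluent Bit where to write the chunks. With many inputs, a small Python sketch like this one could patch a classic-format config; add_filesystem_storage is a hypothetical helper, and editing the file by hand works just as well.

```python
import sys


def add_filesystem_storage(config_text, indent="    "):
    """Add 'storage.type filesystem' to every [INPUT] section of a classic
    Fluent Bit config that does not already set storage.type."""
    lines = config_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        out.append(lines[i])
        if lines[i].strip().upper() != "[INPUT]":
            i += 1
            continue
        # The section body runs until the next "[...]" header or the end of the file.
        j = i + 1
        while j < len(lines) and not lines[j].strip().startswith("["):
            j += 1
        body = lines[i + 1:j]
        keys = {line.split()[0].lower() for line in body if line.split()}
        if "storage.type" not in keys:
            out.append(indent + "storage.type filesystem")
        out.extend(body)
        i = j
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sys.stdout.write(add_filesystem_storage(sys.stdin.read()))
```

For example: python3 add_storage.py < fluent-bit.conf > fluent-bit.conf.new, then review the result before replacing the original.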

Elasticsearch process memory locking failed

I have set bootstrap.memory_lock=true.
I updated /etc/security/limits.conf, adding memlock unlimited for the elasticsearch user.
My Elasticsearch was running fine for many months. Suddenly it failed one day ago. In the logs I can see the error below, and the process never starts:
ERROR: bootstrap checks failed
memory locking requested for elasticsearch process but memory is not locked
I ran ulimit -as and I can see max locked memory set to unlimited. What is going wrong here? I have been trying for hours, but all in vain. Please help.
OS is RHEL 7.2
Elasticsearch 5.1.2
ulimit -as output:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 83552
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Here is what I have done to lock the memory on my ES nodes on Red Hat/CentOS 7 (it will work on other distributions if they use systemd).
You must make the change in 4 different places:
1) /etc/sysconfig/elasticsearch
In /etc/sysconfig/elasticsearch you should have:
ES_JAVA_OPTS="-Xms4g -Xmx4g"
MAX_LOCKED_MEMORY=unlimited
(replace 4g with HALF of your available RAM, as recommended in the Elasticsearch heap-size documentation)
2) /etc/security/limits.conf
In /etc/security/limits.conf you should have:
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
3) /usr/lib/systemd/system/elasticsearch.service
In the service unit /usr/lib/systemd/system/elasticsearch.service you should uncomment:
LimitMEMLOCK=infinity
Run systemctl daemon-reload after changing the service unit.
4) /etc/elasticsearch/elasticsearch.yml
Finally, in the Elasticsearch config /etc/elasticsearch/elasticsearch.yml you should add:
bootstrap.memory_lock: true
That's it: restart your node and the RAM will be locked. You should notice a major performance improvement.
OS = Ubuntu 16
Elasticsearch = 5.6.3
I used to have the same problem.
I set this in elasticsearch.yml:
bootstrap.memory_lock: true
and I got this in my logs:
memory locking requested for elasticsearch process but memory is not locked
I tried several things, but you actually need to do only one thing (according to https://www.elastic.co/guide/en/elasticsearch/reference/master/setting-system-settings.html):
in the file
/etc/systemd/system/elasticsearch.service.d/override.conf
add
[Service]
LimitMEMLOCK=infinity
A little explanation:
The really funny thing is that systemd does not care about ulimit settings at all (https://fredrikaverpil.github.io/2016/04/27/systemd-and-resource-limits/). You can easily check this.
Set in /etc/security/limits.conf:
elasticsearch - memlock unlimited
Check that max locked memory is unlimited for the elasticsearch user:
$ sudo su elasticsearch -s /bin/bash
$ ulimit -l
Disable bootstrap.memory_lock: true in /etc/elasticsearch/elasticsearch.yml by commenting it out:
# bootstrap.memory_lock: true
Start the elasticsearch service via systemd:
# service elasticsearch start
Check which max locked memory setting the elasticsearch service has after it has started:
# systemctl show elasticsearch | grep -i limitmemlock
OMG! Although we have set an unlimited max memlock size via ulimit, systemd completely ignores it:
LimitMEMLOCK=65536
So we come to a conclusion: to start Elasticsearch via systemd with
bootstrap.memory_lock: true
enabled, we don't need to care about ulimit settings, but we need to set the limit
explicitly in the systemd config file.
End of story.
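The systemctl show check above could be scripted, for example as a pre-start sanity check. A minimal Python sketch; unit_memlock_limit is an illustrative name, and the accepted "unlimited" spellings are an assumption: newer systemd prints infinity, while older releases may print the raw number 18446744073709551615 instead.

```python
import subprocess

# Assumed spellings of "unlimited" in `systemctl show` output (newer vs. possibly older systemd).
UNLIMITED = {"infinity", "18446744073709551615"}


def parse_systemctl_show(text):
    """Turn `systemctl show` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def unit_memlock_limit(unit="elasticsearch"):
    """Return the LimitMEMLOCK value systemd applies to `unit`, e.g. '65536' or 'infinity'."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=LimitMEMLOCK"],
        capture_output=True, text=True, check=True,
    )
    return parse_systemctl_show(result.stdout).get("LimitMEMLOCK")


if __name__ == "__main__":
    value = unit_memlock_limit()
    print("LimitMEMLOCK =", value)
    if value not in UNLIMITED:
        print("memory locking will probably fail; set LimitMEMLOCK=infinity "
              "in a drop-in and run systemctl daemon-reload")
```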
Try setting
MAX_LOCKED_MEMORY=unlimited in /etc/sysconfig/elasticsearch
and
LimitMEMLOCK=infinity in /usr/lib/systemd/system/elasticsearch.service
Make sure that the process that starts Elasticsearch really gets the unlimited setting. For example, if you start Elasticsearch as a different user than the one configured in /etc/security/limits.conf, or as root while relying on a wildcard entry in limits.conf (wildcards do not apply to root), it won't work.
Test it to be sure:
you could, for example, put ulimit -a ; exit just after the "#Start Daemon" line in /etc/init.d/elasticsearch and start it with bash /etc/init.d/elasticsearch start (adapt this to your start mechanism).
Check the actual limit while the process is running (however briefly) with:
cat /proc/<pid>/limits
You will find lines similar to this:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
<truncated>
Then, depending on the runner or container (in my case it was supervisord's minfds value), you can lift the actual limit in its configuration.
I hope this gives a little hint for more general cases.
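Reading /proc/<pid>/limits, as suggested above, is easy to script too. A minimal Python sketch, assuming the kernel's fixed-width layout (limit name padded to 25 characters, followed by the soft and hard values); parse_limits and read_limits are illustrative names. Pass the Elasticsearch (or mysqld) PID on the command line.

```python
import sys


def parse_limits(text):
    """Parse /proc/<pid>/limits text into {limit_name: (soft, hard)}; None means unlimited."""
    def to_value(field):
        return None if field == "unlimited" else int(field)

    limits = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        if not line.strip():
            continue
        # The kernel pads the limit name to 25 characters; soft and hard values follow.
        name, fields = line[:25].strip(), line[25:].split()
        limits[name] = (to_value(fields[0]), to_value(fields[1]))
    return limits


def read_limits(pid="self"):
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())


def show(value):
    return "unlimited" if value is None else str(value)


if __name__ == "__main__":
    limits = read_limits(sys.argv[1] if len(sys.argv) > 1 else "self")
    for name in ("Max locked memory", "Max open files"):
        soft, hard = limits[name]
        print(f"{name}: soft={show(soft)} hard={show(hard)}")
```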
I followed this post.
On Ubuntu 18.04 with Elasticsearch 6.x, there was no LimitMEMLOCK=infinity entry in /usr/lib/systemd/system/elasticsearch.service.
So adding it to that file and setting MAX_LOCKED_MEMORY=unlimited in /etc/default/elasticsearch did the trick.
The JVM options can be added in the /etc/elasticsearch/jvm.options file.
If you use the tar distribution and want to monitor it with monit, you
have to tell monit to use unlimited; all other places for this configuration are ignored.
Add ulimit -l unlimited at the beginning of /etc/init.d/monit, then run systemctl daemon-reload, then service monit restart and monit start $yourMonitLabel.
One thing it "can" be is that your /tmp is mounted with noexec: https://discuss.elastic.co/t/not-able-to-start-elasticsearch-due-to-failed-memory-lock/158009/6. Check your logs and see whether they complain about UnsatisfiedLinkError: Native library.
This happens especially on CentOS/Red Hat, but maybe on others too? It might be fixed in ES 7?
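The noexec hypothesis can be checked without reading mount output by hand. A Python sketch that finds the mount containing a directory in /proc/mounts and reports whether it carries noexec. It assumes the relevant directory is /tmp; where Elasticsearch actually extracts its native library depends on version and settings (newer versions may use a dedicated temp directory), so pass that path if yours differs. The function names are illustrative.

```python
import os


def mount_options(path, mounts_text):
    """Return the mount options (as a set) of the mount that contains `path`."""
    path = os.path.realpath(path)
    best_point, best_options = "", set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        point = fields[1].replace("\\040", " ")  # /proc/mounts encodes spaces as \040
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # Prefer the deepest mount point; on ties the later (top-most) mount wins.
            if len(point) >= len(best_point):
                best_point, best_options = point, set(fields[3].split(","))
    return best_options


def is_noexec(path="/tmp"):
    with open("/proc/mounts") as f:
        return "noexec" in mount_options(path, f.read())


if __name__ == "__main__":
    print("/tmp is mounted noexec" if is_noexec() else "/tmp allows exec")
```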

Too many open files error on Ubuntu 8.04

mysqldump: Couldn't execute 'show fields from `tablename`': Out of resources when opening file './databasename/tablename#P#p125.MYD' (Errcode: 24) (23)
On checking error 24 in the shell, it says:
>>perror 24
OS error code 24: Too many open files
How do I solve this?
First, to identify the limits for a particular user or group, do the following:
root@ubuntu:~# sudo -u mysql bash
mysql@ubuntu:~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 71680
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 71680
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
mysql@ubuntu:~$
The important line is:
open files (-n) 1024
As you can see, your operating system vendor ships this version with the basic Linux configuration of 1024 files per process.
This is obviously not enough for a busy MySQL installation.
To fix this, modify the following file:
/etc/security/limits.conf
mysql soft nofile 24000
mysql hard nofile 32000
Some flavors of Linux also require additional configuration to get this to stick to daemon processes versus login sessions. In Ubuntu 10.04, for example, you need to also set the pam session limits by adding the following line to /etc/pam.d/common-session:
session required pam_limits.so
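Before restarting anything, it can help to confirm what limits.conf actually says for the mysql user. A minimal Python sketch; limits_for is a hypothetical helper and is deliberately simplified: it only matches exact user names (no @group, * or range domains), ignores /etc/security/limits.d, and simply lets later lines override earlier ones, which may not mirror every pam_limits precedence rule. Remember that these values only reach sessions that go through pam_limits.

```python
def limits_for(text, user, item):
    """Return {'soft': ..., 'hard': ...} for `item` as configured for `user`.

    Simplified on purpose: only exact user-name domains are matched ('@group',
    '*' and ranges are ignored), and later lines override earlier ones.
    """
    result = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # strip comments, then split on whitespace
        if len(fields) != 4:
            continue
        domain, kind, name, value = fields
        if domain != user or name != item:
            continue
        # A type of "-" sets both the soft and the hard limit.
        for key in (("soft", "hard") if kind == "-" else (kind,)):
            result[key] = value
    return result


if __name__ == "__main__":
    with open("/etc/security/limits.conf") as f:
        print(limits_for(f.read(), "mysql", "nofile"))
```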
Quite an old question, but here are my two cents.
What you could be experiencing is that the MySQL server didn't set its open_files_limit variable correctly.
You can see how many files you are allowing MySQL to open with:
mysql> SHOW VARIABLES LIKE 'open_files_limit';
It is probably set to 1024 even if you have already set the limits to higher values.
You can use the --open-files-limit=XXXXX option on the mysqld command line.
Cheers
Add --single-transaction to your mysqldump command.
It is also possible that some code accessing the tables didn't close them properly, so that over time the open-files limit was reached.
Please refer to http://dev.mysql.com/doc/refman/5.0/en/table-cache.html for another possible reason.
Restarting MySQL should make this problem go away (although it might happen again unless the underlying problem is fixed).
You can increase your OS limits by editing /etc/security/limits.conf.
You can also install lsof (LiSt Open Files) to see which files are opened by which processes.
There is no need to configure PAM, I think. On my system (Debian 7.2 with Percona 5.5.31-rel30.3-520.squeeze) I have:
Before the my.cnf changes:
# cat /proc/12345/limits | grep "open files"
Max open files            1186                 1186                 files
After adding "open_files_limit = 4096" to my.cnf and restarting mysqld, I got:
# cat /proc/23456/limits | grep "open files"
Max open files            4096                 4096                 files
12345 and 23456 are the mysqld process PIDs, of course.
SHOW VARIABLES LIKE 'open_files_limit' now shows 4096.
All looks OK, while ulimit shows no change:
# su - mysql -c bash
# ulimit -n
1024
There is no guarantee that "24" is an OS-level error number, so don't assume that this means that too many file handles are open. It could be some type of internal error code used within mysql itself. I'd suggest asking on the mysql mailing lists about this.

How to tie a network connection to a PID without using lsof or netstat?

Is there a way to tie a network connection to a PID (process ID) without forking to lsof or netstat?
Currently lsof is being used to poll which connections belong to which process ID. However, lsof and netstat can be quite expensive on a busy host, and I would like to avoid forking these tools.
Is there someplace similar to /proc/$pid where one can look to find this information? I know what the network connections are by examining /proc/net, but I can't figure out how to tie them back to a PID. Over in /proc/$pid, there doesn't seem to be any network information.
The target hosts are Linux 2.4 and Solaris 8 to 10. If possible, I'd like a solution in Perl, but I am willing to use C/C++.
Additional notes:
I would like to emphasize that the goal here is to tie a network connection to a PID. Getting one or the other is trivial, but putting the two together in a low-cost manner appears to be difficult. Thanks for the answers so far!
I don't know how often you need to poll or what you mean by "expensive", but with the right options both netstat and lsof run a lot faster than with the default configuration.
Examples:
netstat -ltn
shows only listening tcp sockets, and omits the (slow) name resolution that is on by default.
lsof -b -n -i4tcp:80
omits all blocking operations, name resolution, and limits the selection to IPv4 tcp sockets on port 80.
On Solaris you can use pfiles(1) to do this:
# ps -fp 308
UID PID PPID C STIME TTY TIME CMD
root 308 255 0 22:44:07 ? 0:00 /usr/lib/ssh/sshd
# pfiles 308 | egrep 'S_IFSOCK|sockname: '
6: S_IFSOCK mode:0666 dev:326,0 ino:3255 uid:0 gid:0 size:0
sockname: AF_INET 192.168.1.30 port: 22
For Linux, this is more complex (gruesome):
# pgrep sshd
3155
# ls -l /proc/3155/fd | fgrep socket
lrwx------ 1 root root 64 May 22 23:04 3 -> socket:[7529]
# fgrep 7529 /proc/3155/net/tcp
6: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 7529 1 f5baa8a0 300 0 0 2 -1
00000000:0016 is 0.0.0.0:22. Here's the equivalent output from netstat -a:
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
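The Linux steps above can be done for all processes at once without forking: build a map from socket inode to PID out of the /proc/<pid>/fd links, then look up each /proc/net/tcp row's inode in it. A Python sketch (the question prefers Perl or C, but the same /proc reads translate directly). Assumptions: IPv4 TCP only (/proc/net/tcp6 and /proc/net/udp have a similar row layout, tcp6 with 128-bit addresses), root is needed to see other users' processes, and it has not been checked against 2.4 kernels. Solaris is not covered.

```python
import os
import socket
import struct

TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def socket_owners():
    """Map socket inode -> PID by reading the /proc/<pid>/fd symlinks ("socket:[inode]")."""
    owners = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or it belongs to another user and we are not root
            continue
        for fd in fds:
            try:
                target = os.readlink(f"{fd_dir}/{fd}")
            except OSError:
                continue
            if target.startswith("socket:["):
                owners[int(target[8:-1])] = int(pid)
    return owners


def decode_ipv4(field):
    """Decode e.g. '00000000:0016' from /proc/net/tcp into ('0.0.0.0', 22)."""
    ip_hex, port_hex = field.split(":")
    # The address is the raw 32-bit value printed in host byte order, so packing it
    # back in native order ("=") restores the original network-order bytes.
    return socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16))), int(port_hex, 16)


def tcp_connections(path="/proc/net/tcp"):
    """Yield (local, remote, state, pid) for each IPv4 TCP socket; pid is None if unknown."""
    owners = socket_owners()
    with open(path) as f:
        next(f)  # skip the column header
        for line in f:
            fields = line.split()
            inode = int(fields[9])
            yield (decode_ipv4(fields[1]), decode_ipv4(fields[2]),
                   TCP_STATES.get(fields[3], fields[3]), owners.get(inode))


if __name__ == "__main__":
    for local, remote, state, pid in tcp_connections():
        print(f"{local[0]}:{local[1]:<6} {remote[0]}:{remote[1]:<6} {state:<12} {pid}")
```

If a socket is shared by several processes (for example after fork), this sketch simply reports the last PID it saw; sockets in TIME_WAIT have no owning process and show None.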
Why don't you look at the source code of netstat and see how it gets the information? It's open source.
For Linux, have a look at the /proc/net directory
(for example, cat /proc/net/tcp lists your TCP connections). Not sure about Solaris.
I guess netstat basically uses this same information, so I don't know if you will be able to speed it up a whole lot. Be sure to try the netstat -an flags so that it does NOT resolve IP addresses to hostnames in real time (this can take a lot of time due to DNS queries).
The easiest thing to do is
strace -f netstat -na
On Linux (I don't know about Solaris), this will give you a log of all the system calls made. It's a lot of output, some of which will be relevant. Take a look at the files in the /proc file system that it opens; this should show you how netstat does it. Incidentally, ltrace lets you do the same thing for calls into the C library. It's not useful for you in this instance, but it can be useful in other circumstances.
If it's not clear from that, then take a look at the source.
Take a look at these answers which thoroughly explore the options available:
How I can get ports associated to the application that opened them?
How to do like "netstat -p", but faster?
