NRPE not pulling data: NRPE: Unable to read output - Linux

I'm trying to get a memory metric from a client machine. I installed NRPE on the client machine, and it works well for the default checks like load and users.
Manual output from the client machine:
root@Nginx:~# /usr/lib/nagios/plugins/check_mem -w 50 -c 40
OK - 7199 MB (96%) Free Memory
But when I try from the server, the other metrics work while the memory metric does not:
[ec2-user@ip-10-0-2-179 ~]$ /usr/lib64/nagios/plugins/check_nrpe -H 107.XX.XX.XX -c check_mem
NRPE: Unable to read output
The other metrics work well:
[ec2-user@ip-10-0-2-179 ~]$ /usr/lib64/nagios/plugins/check_nrpe -H 107.XX.XX.XX -c check_load
OK - load average: 0.00, 0.01, 0.05|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.050;5.000;20.000;0;
I made sure the check_mem command has execute permission for everyone:
root@Nginx:~# ll /usr/lib/nagios/plugins/check_mem
-rwxr-xr-x 1 root root 2394 Sep 6 00:00 /usr/lib/nagios/plugins/check_mem*
Here are the command definitions from my client-side NRPE config:
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 200 -c 250
command[check_http]=/usr/lib/nagios/plugins/check_http -I 127.0.0.1
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 30 -c 20
command[check_mem]=/usr/lib/nagios/plugins/check_mem -w 30 -c 20
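As an aside for anyone debugging the same thing: "NRPE: Unable to read output" generally means the command ran but wrote nothing to stdout. A quick way to reproduce what the daemon sees (a sketch, assuming the daemon runs as the nagios user; check the nrpe_user setting in nrpe.cfg) is to run the plugin as that user:
# Run the plugin as the NRPE daemon user instead of root; a missing
# interpreter, unreadable dependency, or PATH problem shows up here
# even when the same command works fine as root.
sudo -u nagios /usr/lib/nagios/plugins/check_mem -w 30 -c 20
echo "exit code: $?"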
Can anyone help me fix the issue?

Related

Device node in LXC is not accessible when connected via SSH

I have a problem where a physical hardware device passed through to an LXC container cannot be read from or written to when I am connected via SSH.
The device node of my physical hardware device looks like this:
myuser@myhost:~$ ls -la /dev/usb/hiddev0
crw-rw-rw- 1 root root 180, 0 Jul 30 10:27 /dev/usb/hiddev0
This is how I create and start my container:
myuser@myhost:~$ sudo lxc-create -q -t debian -n mylxc -- -r stretch
myuser@myhost:~$ sudo lxc-start -n mylxc
Then I add the device node to the LXC:
myuser@myhost:~$ sudo lxc-device -n mylxc add /dev/usb/hiddev0
Afterwards the device is available in the LXC and I can read from it after having attached to the LXC:
myuser@myhost:~$ sudo lxc-attach -n mylxc
root@mylxc:/# ls -la /dev/usb/hiddev0
crw-r--r-- 1 root root 180, 0 Aug 27 11:26 /dev/usb/hiddev0
root@mylxc:/# cat /dev/usb/hiddev0
����������^C
root@mylxc:/#
I then enable root access via SSH without a password:
myuser@myhost:~$ sudo lxc-attach -n mylxc
root@mylxc:/# sed -i 's/#\?PermitRootLogin.*/PermitRootLogin yes/g' /etc/ssh/sshd_config
root@mylxc:/# sed -i 's/#\?PermitEmptyPasswords.*/PermitEmptyPasswords yes/g' /etc/ssh/sshd_config
root@mylxc:/# sed -i 's/#\?UsePAM.*/UsePAM no/g' /etc/ssh/sshd_config
root@mylxc:/# passwd -d root
passwd: password expiry information changed.
root@mylxc:/# /etc/init.d/ssh restart
Restarting ssh (via systemctl): ssh.service.
root@mylxc:/# exit
When I connect via SSH now, the device node is there, but I cannot access it:
myuser@myhost:~$ ssh root@<lxc-ip-address>
root@mylxc:~# ls -la /dev/usb/hiddev0
crw-r--r-- 1 root root 180, 0 Aug 27 11:26 /dev/usb/hiddev0
root@mylxc:~# cat /dev/usb/hiddev0
cat: /dev/usb/hiddev0: Operation not permitted
In both cases (lxc-attach and ssh) I am the root user (verified via whoami), so this cannot be the problem.
Why am I not allowed to access the device when I am connected via SSH?
EDIT
In the meantime I found out that the error disappears when I call all the LXC initialization commands directly one after another in a script, i.e.:
sudo lxc-create -q -t debian -n mylxc -- -r stretch
sudo lxc-start -n mylxc
sudo lxc-device -n mylxc add /dev/usb/hiddev0
...
followed by all the SSH configuration described above. The device is then correctly accessible via SSH.
As soon as some time passes between lxc-start and lxc-device, the error appears, e.g.:
sudo lxc-create -q -t debian -n mylxc -- -r stretch
sudo lxc-start -n mylxc
sleep 1
sudo lxc-device -n mylxc add /dev/usb/hiddev0
...
Why is the timing relevant here? What happens during the first second inside the LXC that makes the device inaccessible?
With help from the lxc-users mailing list I found out that the restriction is intended. Access to devices has to be allowed explicitly in the LXC's config using their major/minor numbers:
lxc.cgroup.devices.allow = c 180:* rwm
The unrestricted access via lxc-attach appears to be a bug in my case; devices should never be accessible in the LXC unless explicitly allowed.
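A sketch of how such a rule can be derived and applied (the config path /var/lib/lxc/mylxc/config is an assumption; it varies by distro and LXC version):
# The "180, 0" in the ls -la output above is major 180, minor 0;
# stat prints the same numbers in hex:
stat -c 'major:minor = %t:%T (hex)' /dev/usb/hiddev0
# Allow character devices with major 180 (any minor) in the container config:
echo 'lxc.cgroup.devices.allow = c 180:* rwm' | sudo tee -a /var/lib/lxc/mylxc/config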

Jetty Websocket Scaling

What is the maximum number of websockets anyone has opened using the Jetty websocket server? I recently load tested this and was able to open 200k concurrent connections using an 8-core Linux VM as the server and 16 clients with 4 cores each. Each client was able to make 12,500 concurrent connections, after which it started to get socket timeout exceptions. I had also tweaked the number of open files as well as the TCP settings of both client and server, as follows:
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
sudo sysctl -w net.core.somaxconn=8192
sudo sysctl -w net.core.netdev_max_backlog=16384
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sudo sysctl -w net.ipv4.tcp_syncookies=1
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sudo sysctl -w net.ipv4.tcp_tw_recycle=1
sudo sysctl -w net.ipv4.tcp_congestion_control=cubic
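The matching open-files increase is not shown above; a typical way to raise it looks like this (the values are illustrative, not necessarily the ones actually used):
# Per-shell file descriptor limit for the benchmark process:
ulimit -n 1000000
# Persistent equivalent in /etc/security/limits.conf:
# *    soft    nofile    1000000
# *    hard    nofile    1000000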
By contrast, a single 2-core machine running Node.js was able to scale up to 90k connections.
My questions are as follows:
Can the throughput of the Jetty VM be increased any further?
What is the reason for Node.js's higher performance over Jetty?

Can't jcmd, jps, or jstat the Cassandra process within the Docker container

$ jcmd -l
418 sun.tools.jcmd.JCmd -l
$ jstat -gcutil -t 10 250ms 1
10 not found
I am aware of the bug in the JDK related to attaching jstat as root to a process running as a different user.
Here, this Docker container has one user, root, and as can be seen from the ps output below, Cassandra is running as root.
$ whoami
root
I have tried to do the following:
$ sudo -u root jcmd -l
Any help is appreciated.
The Docker container is debian:jessie, running Java version:
openjdk version "1.8.0_66-internal"
Here's the output of ps -ef:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:40 ? 00:00:00 /bin/bash /run.sh
root 10 1 11 17:40 ? 00:02:25 java -ea -javaagent:/usr/share/c
root 375 0 0 17:49 ? 00:00:00 bash
root 451 375 0 18:00 ? 00:00:00 ps -ef
Aside: jstack successfully dumps out the stack traces of the threads.
I know at least two possible reasons why this can happen.
Java is run with the -XX:+PerfDisableSharedMem option. This option sometimes helps reduce JVM safepoint pauses, but it also makes the JVM invisible to jps and jstat. This is a very likely cause, because you are running Cassandra, and recent Cassandra versions have this option on by default.
The Java process has a different mount namespace, so the /tmp of the Java process is not physically the same directory as the /tmp of your shell. The directory /tmp/hsperfdata_root must be accessible in order to use jps or jstat. This is also a plausible reason, since you are using Docker containers.
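Both hypotheses can be checked from inside the container (a sketch; PID 10 is taken from the ps output above):
# 1. Is the JVM running with -XX:+PerfDisableSharedMem?
tr '\0' ' ' < /proc/10/cmdline | grep -o PerfDisableSharedMem
# 2. Compare mount namespaces; different inode numbers mean the Java
#    process sees a different /tmp than your shell does:
ls -l /proc/10/ns/mnt /proc/self/ns/mnt
# 3. The perf data directory that jps/jstat rely on must be visible:
ls -l /tmp/hsperfdata_root/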

How to optimize Redis-server for high-load?

Server: Intel® Core™ i7-3930K 6 core, 64 GB DDR3 RAM, 2 x 3 TB 6 Gb/s HDD SATA3
OS (uname -a): Linux *** 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux
Redis server 2.8.19
This server runs Redis, whose task is to serve requests from two PHP servers.
Problem: the server cannot cope with peak loads and either stops processing incoming requests or processes them very slowly.
The optimizations I have attempted so far:
cat /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/enabled
ulimit -SHn 100032
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
cat /etc/sysctl.conf
vm.overcommit_memory=1
net.ipv4.tcp_max_syn_backlog=65536
net.core.somaxconn=32768
fs.file-max=200000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
cat /etc/redis/redis.conf
tcp-backlog 32768
maxclients 100000
These are the settings I found on redis.io and in various blog posts.
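A quick way to confirm that these values actually took effect at runtime (a sketch using the stock redis-cli):
redis-cli config get maxclients
redis-cli config get tcp-backlog
redis-cli info clients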
Tests
redis-benchmark -c 1000 -q -n 10 -t get,set
SET: 714.29 requests per second
GET: 714.29 requests per second
redis-benchmark -c 3000 -q -n 10 -t get,set
SET: 294.12 requests per second
GET: 285.71 requests per second
redis-benchmark -c 6000 -q -n 10 -t get,set
SET: 175.44 requests per second
GET: 192.31 requests per second
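(Note that -n is the total number of requests, so these ten-request runs mostly measure connection setup rather than throughput; a longer run, sketched below, should give more representative numbers.)
redis-benchmark -c 1000 -q -n 100000 -t get,set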
As the number of clients increases, query performance drops, and worst of all, the Redis server stops processing incoming requests and the PHP servers raise dozens of exceptions like:
Uncaught exception 'RedisException' with message 'Connection closed' in [no active file]:0\n
Stack trace:\n
#0 {main}\n thrown in [no active file] on line 0
What should I do? What else can be optimized? How many clients should a machine like this be able to handle?
Thank you!

Does iperf have a bandwidth ceiling?

I am trying to run iperf with a throughput of 1 Gbit/s. I'm using UDP, so I expect the overhead to be pretty much minimal. Still, I see it capped at about 600 Mbit/s despite my attempts.
I have been running:
iperf -c 172.31.1.1 -u -b 500M -l 1100
iperf -c 172.31.1.1 -u -b 1000M -l 1100
iperf -c 172.31.1.1 -u -b 1500M -l 1100
Yet anything above 600M seems to hit a limit of about 600 Mbit/s. For example, the output for 1000M is:
[ 3] Server Report:
[ 3] 0.0-10.0 sec 716 MBytes 601 Mbits/sec 0.002 ms 6544/689154 (0.95%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
I'm running this on a server with a 10 Gbit/s port, even sending it right back to itself, so there should be no interface bottlenecks.
I'm unsure whether I am running up against an iperf limit or whether there is another way to get a true 1 Gbit/s test.
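One thing worth ruling out (a sketch, not a confirmed fix): a single iperf2 UDP stream is driven by one client thread and can be CPU-bound on a single core, so spreading the target rate over parallel streams may get past a per-stream ceiling:
# Four parallel streams (-P); in iperf2, -b applies per stream,
# so this targets roughly 1 Gbit/s in total:
iperf -c 172.31.1.1 -u -b 250M -l 1100 -P 4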
