My client machine runs syslog-ng and my remote machine runs rsyslog.
My server/remote machine manages many clients, and I need to differentiate which machine is sending which logs.
Normally I would use syslog-ng on the server side too, but these machines aren't meant to have it.
I'd also like to mention this isn't for Apache or web servers, just physical machines.
On the client side
I tried altering and adding different options, toggling them between yes and no:
options {
keep_hostname(yes);
create_dirs(no);
use_dns(no);
};
For example, setting keep_hostname to no worked, but only when I changed the hostname to the machine's IP address, which is not what I want.
Using a template
template("$(ISODATE) $(FULLHOST_FROM) $(SOURCEIP) $(HOST) $(HOSTNAME) ${PROGRAM}: ${MESSAGE}\n")
output:
day time localhost abc[ID] .source.s_local SourceIP=127.0.0.1 localhost localhost (root) CMD (xyz.conf)#ID
This isn't the output I want: everything is being printed in the message section when I want it in place of the "host" field, and I don't understand why the source IP is the loopback address.
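For reference, this is how I attach the template (using the ${} macro form; the destination address here is a placeholder):
template t_remote {
    template("${ISODATE} ${FULLHOST_FROM} ${SOURCEIP} ${HOST} ${PROGRAM}: ${MESSAGE}\n");
};
# placeholder address for the central log server
destination d_syslog_tcp {
    syslog("192.0.2.10" transport("tcp") port(514) template(t_remote));
};
log { source(s_local); destination(d_syslog_tcp); };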
Using structured logging
rewrite r_sourceip {
    set("${SOURCEIP}" value("HOST"));
};
log { source(s_local); rewrite(r_sourceip); destination(d_syslog_tcp); };
output:
day date time 127.0.0.1 syslog-ng.service: Succeeded.
The IP is displayed in the logs as the loopback address instead of the machine's IP.
I tried installing rsyslog on my client, but it doesn't work:
sudo add-apt-repository ppa:adiscon/v8-stable
sudo apt-get update
sudo apt-get install rsyslog
I kept running into many errors that I couldn't fix, possibly due to differences in OS version or type:
add-apt-repository: command not found
wget: command not found
On the server side
Using a template that creates a folder named after the client and stores the logs in that particular folder, which is not the solution I want:
$template DynaFile,"/var/log/%FROMHOST-IP%/%syslogfacility-text%.log"
*.* -?DynaFile
I want the logs to appear like this:
day date time `client's ip address` syslog-ng.service: Succeeded.
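From what I understand, a legacy-style rsyslog template along these lines should put the sender's IP in the host position (the output file path is just an example):
$template HostAsIP,"%timegenerated% %fromhost-ip% %syslogtag%%msg%\n"
*.* /var/log/remote.log;HostAsIP
But given the loopback problem, I'm not sure this alone would help.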
Can someone suggest a solution, and explain why I keep getting the loopback address as my client's IP?
We have set up an OKD 3.11 cluster with 100+ nodes. Everything was working fine, but then a worker node stopped resolving the registry service's internal URL. This causes new pods scheduled to that node to fail with an ImagePullBackoff error.
Failed to pull image "docker-registry.default.svc:5000/app-name/app-name:latest": rpc error: code = Unknown desc = Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 10.*.*.71:53: server misbehaving
We tried running nslookup on the worker node, with the following results.
This doesn't work (while it works on other nodes):
[root@worker22 ~]# nslookup docker-registry.default.svc.cluster.local
Server: 10.*.*.71
Address: 10.*.*.71#53
** server can't find docker-registry.default.svc.cluster.local: SERVFAIL
This works just fine:
[root@worker22 ~]# nslookup docker-registry.default.svc.cluster.local 127.0.0.1
Server: 127.0.0.1
Address: 127.0.0.1#53
Name: docker-registry.default.svc.cluster.local
Address: 172.*.*.212
Adding server=/cluster.local/172.30.0.1 to the dnsmasq conf file /etc/dnsmasq.d/origin-upstream-dns.conf works as a workaround, but I can't find what is causing this.
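(To be explicit, the entire workaround is this one dnsmasq forwarding rule; 172.30.0.1 is the cluster DNS service IP in our setup:)
# /etc/dnsmasq.d/origin-upstream-dns.conf
server=/cluster.local/172.30.0.1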
I have tried adding -q to the dnsmasq service's ExecStart, and the query log shows that dnsmasq won't query the OpenShift DNS running locally at 127.0.0.1:53.
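(I enabled that query logging with a systemd drop-in roughly like the following; the dnsmasq path matches a stock CentOS unit:)
# /etc/systemd/system/dnsmasq.service.d/log-queries.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/dnsmasq -k -q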
Dnsmasq config/resolv.conf is in order on the node.
I have tried restarting dnsmasq/NetworkManager/Docker, and I have tried respawning the ovs/sdn pods, but still no help.
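(The restart cycle amounted to something like the following; the sdn/ovs pod labels are my assumption for a stock 3.11 install:)
systemctl restart dnsmasq NetworkManager docker
oc delete pod -n openshift-sdn -l app=sdn
oc delete pod -n openshift-sdn -l app=ovs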
I found some documented evidence that dnsmasq can behave like that.
Some Red Hat articles suggest that a long-running dnsmasq service may misbehave and stop resolving names; similar cases have been reported for OpenShift environments as well.
The links below suggest that restarting the service solves the problem for some time before the issue resurfaces. As stated earlier, in my case a service restart didn't help, but the oldest remedy in IT worked: rebooting the node solved the problem.
Reference:
https://access.redhat.com/solutions/3393141
https://bugzilla.redhat.com/show_bug.cgi?id=1560489
I want to set up 2 RabbitMQ servers to work in a cluster.
When trying to run
rabbitmqctl join_cluster rabbit@my_rabbit_1.my.domain.name on my_rabbit_1,
I get: unable to connect to epmd (port 4369) on my_rabbit_2.my.domain.name: nxdomain (non-existing domain)
I use rabbitmq:latest (Debian), .erlang.cookie is the same on both, and hosts resolve fine: I can ping in both directions, and nmap -6 -p 4369 my_rabbit_2.my.domain.name returns 4369/tcp open epmd.
EDIT:
tcpdump shows that, while resolving the hostname, rabbit or epmd does not perform both types of DNS query (AAAA for the IPv6 address and A for the IPv4 address), but only the A query, which fails repeatedly with nxdomain since there is no IPv4 address available. It does not try the AAAA query, except when running a command like rabbitmqctl -n rabbit@local.machine.domain.name: then it runs the AAAA query and succeeds. Hence the problem. How do I solve that?
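A capture along these lines is enough to see which query types go out (the interface name is a placeholder):
# expect repeated A queries and no AAAA queries while joining
tcpdump -ni eth0 udp port 53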
Finally found the solution that worked for me. The Erlang documentation says that -proto_dist specifies the protocol for Erlang distribution, which defaults to inet_tcp (TCP over IPv4). So in an IPv6-only environment you have to set the -proto_dist inet6_tcp flag for erl.
This can be done by adding the following lines to your rabbitmq-env.conf (see the RabbitMQ configuration docs):
# For rabbitmq-server
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-proto_dist inet6_tcp"
# For rabbitmqctl
RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
Note that rabbitmqctl and rabbitmq-server use different erl settings: I was unable to create the cluster using rabbitmqctl join_cluster rabbit@host.in.my.domain without the RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp" setting. It should not be necessary in production mode. Also note that the RabbitMQ configuration docs advise against using this setting except for debugging.
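With both variables in place, the usual clustering sequence went through (node name as in the example above):
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@host.in.my.domain
rabbitmqctl start_app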
unable to connect to epmd (port 4369) on my_rabbit_2.my.domain.name: nxdomain (non-existing domain)
This error is raised when the RabbitMQ server is running on a hostname other than the one you think it is running on, or when the hostname doesn't resolve to what you think it does.
Amusingly enough, I had this exact same issue last night when one instance in our cluster failed, came back on a new hostname, and somehow corrupted its internal authentication store.
Without the exact DNS entries etc. for your setup, all I can offer are general troubleshooting steps.
See this StackOverflow question for a resolution that may help you - in particular the answer by Kishor Pawar.
Are you sure you configured RabbitMQ to listen on IPv6? Is there a reason you can't bind it to IPv4 as well on 127.0.0.1 for management operations?
I downloaded the Tardis branch of RedQueryBuilder and ran mvn clean install.
It runs through things for a bit, then gets to this part:
[INFO] Running com.redspr.redquerybuilder.core.client.GwtTestDom
[INFO] logging for HtmlUnit thread
[INFO] [ERROR] I/O error on HTTP request
[INFO] org.apache.http.conn.HttpHostConnectException: Connection to http://50.19.99.237:53655 refused
Just wondering if there's a quick answer, like "oh, your GWT is out of date", or some other easy-to-solve issue.
So here was the problem; it had nothing to do with GAE.
The problem was that the name of my host, in /etc/hostname, had no corresponding entry in /etc/hosts. It was complicated by the fact that I had "search mydomain.com" in my resolv.conf, which was further complicated by the fact that mydomain.com is wildcarded, so any unknown hostname resolves to a particular IP address.
So what happened was: the test suite would look up myhost; since it didn't find myhost in DNS or in /etc/hosts, it looked up myhost.mydomain.com as a result of my resolv.conf, which returned a valid IP address because of the wildcard. The test suite then got a connection refused, because it was connecting to a totally different host. So the solution was to add 127.0.0.1 myhost myhost.mydomain.com to my /etc/hosts, and it built and ran fine.
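In other words, the fix was a single /etc/hosts line (names as in the placeholders above):
127.0.0.1   myhost myhost.mydomain.com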
Long story short: if the host defined in /etc/hostname does not have a valid DNS or /etc/hosts entry, the build will fail, as it will either get a host unknown or, in my case, a connection refused because of my DNS jiggery-pokery.
I have the NRPE daemon running under xinetd on an Amazon EC2 instance, and the Nagios server on my local machine.
Running check_nrpe -H [amazon public IP] gives this error:
CHECK_NRPE: Error - Could not complete SSL handshake.
Both NRPE installs are the same version. Both are compiled with this option:
./configure --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/i386-linux-gnu/
"allowed host" entry contains my local IP address.
What could be the possible reason of this error now??
If you are running NRPE as a service, make sure you have this line in your nrpe.cfg on the client side:
# example 192.168 IP; yours will probably differ
allowed_hosts=127.0.0.1,192.168.1.100
You say that is done; however, if you are running NRPE under xinetd, make sure to edit the only_from directive in the file /etc/xinetd.d/nrpe.
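For reference, a typical /etc/xinetd.d/nrpe stanza looks roughly like this (the paths and the allowed IP are examples; yours may differ):
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    wait            = no
    user            = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    only_from       = 127.0.0.1 192.168.1.100
    disable         = no
}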
Don't forget to restart the xinetd service:
service xinetd restart
To check whether you have access at all, try a simple telnet to the address and port, or a ping or traceroute to see where it is blocking:
telnet IP port
ping IP
traceroute -p $port IP
Also check on the target server that the NRPE daemon is running properly:
netstat -at | grep nrpe
You also need to check the versions of OpenSSL installed on both servers, as I have seen version mismatches break checks during the SSL handshake.
Check your /var/log/system.log. In my case, it turned out my monitored IP was set to something other than the one I set in the nrpe.cfg file. I don't know the cause of this change, though.
@jgritty was right.
You should edit nrpe.cfg and the xinetd nrpe config file to allow your master Nagios server access:
vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,172.16.16.150
and
vim /etc/xinetd.d/nrpe
only_from = 127.0.0.1 172.16.16.150
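After changing either file, restart the services so the changes take effect (service names vary by distro):
service xinetd restart
service nagios-nrpe-server restart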
That's somewhat of a catch-all error message for NRPE. Check your firewall rules and make sure the port is open. Also try disabling SELinux to see whether that lets the connection through. It's likely not an SSL issue, just the connection being refused.
It looks like you are running your Nagios server in a virtual machine on a host-only network. If so, this would stop any external access. Ensure that you have a NAT or bridged network available.
So many answers, and none of them hit the reason I ran into this issue.
It turns out that Nagios has terrible cross-version support, and this was caused by my having a version 2 "client" (the machine being monitored) and a version 3 "server" (the monitoring machine).
Once I upgraded the client to version 3, the problem went away and I could do a check_nrpe -H [client IP] without issues.
Note that I am not sure whether client/server are the right terms with Nagios, as in the case of an NRPE call the server is really the machine being called, but I digress.
Make sure that you have restarted the Nagios Client Plugin as well.
I'm running nrpe using the xinetd service.
Make sure also (in addition to the basic steps above) that your nagios user is authenticating properly. In my case, the following was showing in /var/log/messages:
Jun 6 15:05:52 gse2 xinetd[33237]: Unknown user: nagios [file=/etc/xinetd.d/nrpe] [line=9]
Jun 6 15:05:52 gse2 xinetd[33237]: Error parsing attribute user - DISABLING SERVICE [file=/etc/xinetd.d/nrpe] [line=9]
Jun 6 15:05:52 gse2 xinetd[33237]: Unknown group: nagios [file=/etc/xinetd.d/nrpe] [line=10]
Jun 6 15:05:52 gse2 xinetd[33237]: Error parsing attribute group - DISABLING SERVICE [file=/etc/xinetd.d/nrpe] [line=10]
Jun 6 15:05:52 gse2 xinetd[33237]: Service nrpe missing attribute user - DISABLING
It escaped me at first, but then I checked the ypbind service and found it was not started.
After starting ypbind, the nagios user and group authenticated properly, and the error went away.
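(Roughly, the check and the fix were:)
service ypbind status    # was stopped
service ypbind start
id nagios                # the NIS-backed user should now resolve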
In some edge cases, restarting nagios-nrpe-server doesn't help, because the process was not killed or was not properly restarted.
In that case, just kill it manually and start it again.
Besides the allowed_hosts entry you should assign, consider the network path for the SSL handshake error message.
If your Nagios server is on a local LAN with a class C IP address such as 192.168.x.x, then when the monitored target server sends the SSL reply back to your Nagios server, the reply first arrives at your public IP; it cannot cross the public IP into your Nagios server, whose IP is an internal one.
You need NAT to guide the SSL reply from the target server to the inner Nagios server.
Alternatively, use a "GET"-style method that simply pulls monitoring data from the Nagios client side, such as SNMP, to monitor the local resources of remote Linux servers.
SSL needs traffic in both directions.
Best Regards
For me, setting the following in /etc/nagios/nrpe.cfg on the client worked:
dont_blame_nrpe=1
It's an Ubuntu 16.04 machine.
For other possible problems, I recommend looking at the NRPE logs. Here is a good article for configuring logs.
If you are running Debian 9, there is a known issue regarding this problem, caused by OpenSSL dropping support for the method NRPE uses to initiate anonymous SSL connections.
The issue seems to be fixed, but the fix hasn't made it into the official packages yet.
Currently there seems to be no secure workaround.
Check the configuration in /etc/xinetd.d/nrpe and verify the server IP. If it shows only_from = 127.0.0.1, change it to the server IP.