Icinga2 Cluster HA - Service stalled in "pending" - linux

I am using an Icinga2 2.3.2 cluster HA setup with three nodes in the same zone (the layout is sketched at the end of this question) and the IDO database on a separate server. All machines run CentOS 6.5. Icinga Web 2 is installed on the active master.
Four local checks are configured for each node, including the cluster health check, as described in the documentation. Icinga Classic UI is installed on all three nodes, because I am not able to see the local checks configured for the nodes in Icinga Web 2.
Configs are syncing, checks are executing and all three nodes are connected to each other, but the status data sometimes does not sync to Icinga Classic UI.
Whenever the config changes on the master and it is reloaded, the config syncs. Some time later, when I check the Classic UI on all three nodes, a number of hosts and services are stalled in the "pending" state on one or two nodes, and the counts differ between nodes.
But everything is OK in the config master's Classic UI, and in Icinga Web 2 everything is OK as well. That is one scenario; sometimes the local checks are also stalled in the pending state.
I have attached the screenshot for reference.
Please help me fix this. Thanks in advance.
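For reference, the zone layout is roughly like the sketch below (the host names are placeholders for the actual FQDNs); all three endpoints sit in one HA zone:

    object Endpoint "icinga2-node1.localdomain" { host = "icinga2-node1.localdomain" }
    object Endpoint "icinga2-node2.localdomain" { host = "icinga2-node2.localdomain" }
    object Endpoint "icinga2-node3.localdomain" { host = "icinga2-node3.localdomain" }

    object Zone "master" {
        // all three nodes in the same HA zone, as described above
        endpoints = [ "icinga2-node1.localdomain", "icinga2-node2.localdomain", "icinga2-node3.localdomain" ]
    }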

I was facing a "service status pending after reload" issue in Nagios 3; simply adding a contact group, e.g. Linux support (an email group), under "Contact groups for service notification" resolved my issue.
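In the flat-file Nagios 3 object configuration the same fix looks roughly like this (the host, service, command and contact names here are only placeholders for illustration):

    define contactgroup {
        contactgroup_name   linux-support
        alias               Linux support (email group)
        members             linux-oncall
    }

    define service {
        use                     generic-service
        host_name               linux-host01
        service_description     Disk Usage
        check_command           check_nrpe!check_disk
        contact_groups          linux-support     ; adding this line resolved the pending state for me
    }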

Related

How to set up stomp_interface for a failover node for Cassandra OpsCenter

How do I set up a failover node for Cassandra OpsCenter? The OpsCenter data is stored on the OpsCenter node itself, so to set up a failover node I need to set up a second OpsCenter, separate from the current one, and sync the OpsCenter data and config files between the two OpsCenters.
The stomp_interface on the nodes in the cluster points at OpsCenter_1; how will it change automatically to OpsCenter_2 when failover occurs?
There are steps in the DataStax documentation that cover this in detail. At a minimum:
Mirror the configuration directories stored on the OpsCenter primary to the OpsCenter backup using the method you prefer.
On the backup OpsCenter, in the failover directory, create a primary_opscenter_location configuration file that contains the IP address of the primary OpsCenter daemon to monitor.
The stomp_interface setting on the agents gets changed (the address.yaml file is updated as well) when failover occurs. This is why the documentation recommends making sure no third-party configuration management is controlling it.
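As a rough sketch, assuming a package install with the default paths (the exact locations depend on your install, and the IP is a placeholder), the two pieces look like this:

    # On the backup OpsCenter: tell it which primary to monitor.
    # The file contains only the primary's IP address.
    $ cat /var/lib/opscenter/failover/primary_opscenter_location
    192.0.2.10

    # On each agent: address.yaml points at the active OpsCenter; the failover
    # process rewrites stomp_interface when the backup takes over.
    $ grep stomp_interface /var/lib/datastax-agent/conf/address.yaml
    stomp_interface: 192.0.2.10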
Three things:
If you have a firewall on, allow the corresponding ports to communicate (61620, 61621, 9160, 9042, 7199); see the sketch after this list.
Always verify that Cassandra is up and running, so the agent can actually connect to something.
Stop the agent, check address.yaml again, and restart the agent.
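For the firewall point, a minimal sketch on CentOS 6 with iptables (tighten the rules to your own subnets rather than accepting from everywhere):

    # open the OpsCenter/agent and Cassandra ports listed above
    for port in 61620 61621 9160 9042 7199; do
        iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
    done
    service iptables save     # persist the rules across reboots on CentOS 6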

How to set up Jenkins with HA?

Currently we are using Jenkins as our CI system, with one master server and slaves that are provisioned by SaltStack on OpenStack. If our Jenkins master goes down, we need to create a new master and pull the files from the old master onto the new one, which takes at least 30 minutes.
Is there any way to set up Jenkins with High Availability?
I already looked at the Gearman Plugin; however, if the Gearman server goes down for some reason, we would need to set up HA for Gearman as well.
Are there any other ways to set up High Availability for Jenkins?
Jenkins doesn't have a great HA story; the best you can do with the open source version is to put all of the files in $JENKINS_HOME on a shared file system, and then have a cold standby master machine that you can spin up if the active master goes down. That would reduce your failover time to however long it takes for the master to restart, which is usually just a few minutes.
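A minimal sketch of that cold-standby layout, assuming NFS for the shared filesystem (the server name and export path are placeholders):

    # /etc/fstab on both the active and the standby master
    nfs-server:/export/jenkins_home  /var/lib/jenkins  nfs  defaults,_netdev  0 0

    # normal operation: Jenkins runs only on the active master
    # failover: stop it there (if it is still reachable) and start the standby
    service jenkins start

Only one master may run against the shared $JENKINS_HOME at a time, otherwise the two processes will corrupt it.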
You could also look at CloudBees' Jenkins Enterprise offering, which includes a High Availability Plugin.
I used the Clusters from Scratch doc to create a Jenkins WAN-HA active/passive cluster. See the attached architecture diagram for Jenkins HA using Pacemaker.
/etc/init.d/jenkins will need to be converted into an OCF agent script. Currently I manually start Jenkins via systemd on the pcmk-2 server when pcmk-1 is down.
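If converting the init script is too much work, a rough alternative (an assumption on my side, not what I currently run) is to let Pacemaker drive the existing systemd unit together with a floating IP; the resource names and address are placeholders:

    pcs resource create jenkins_vip ocf:heartbeat:IPaddr2 ip=192.0.2.50 cidr_netmask=24 op monitor interval=30s
    pcs resource create jenkins systemd:jenkins op monitor interval=60s
    pcs constraint colocation add jenkins with jenkins_vip INFINITY
    pcs constraint order jenkins_vip then jenkins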

DataStax Community AMI installation doesn't join other nodes

I kicked off a 6-node cluster as per the documentation at http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installAMILaunch.html. All worked OK. It's meant to be a 6-node cluster, and I can see the 6 nodes running on the EC2 dashboard. I can see OpsCenter working on node 0. But the nodes are not seeing each other... I don't have access to OpsCenter via the browser, but I can ssh to each node and verify Cassandra is working.
What do I need to do so that they join the cluster? Note they are all in the same VPC and the same subnet, in the same IP range, with the same cluster name. All were launched using the AMI specified in the document.
Any help will be much appreciated.
I hope your listen_address is configured. Add the auto_bootstrap: false setting to each node and restart each node. Check the logs too; that will be of great help.
In my situation, setting the broadcast address to the public IP caused a similar issue. Make the broadcast address your private IP, or just leave it untouched. If a public broadcast address is a must-have, have your architect modify the firewall rules.
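To make the two answers above concrete, a sketch of the relevant cassandra.yaml lines on one node (the addresses are placeholders; use each node's own private IP):

    # cassandra.yaml excerpt
    listen_address: 10.0.0.11        # the node's private IP
    broadcast_address: 10.0.0.11     # keep it on the private IP, or leave it unset
    auto_bootstrap: false            # as suggested above; restart the node afterwards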

Icinga2 cluster node local checks not executing

I am using an Icinga2 2.3.2 cluster HA setup with three nodes in the same zone and the IDO database on a separate server. All machines run CentOS 6.5. Icinga Web 2 is installed on the active master.
Four local checks are configured for each node, including the cluster health check (sketched at the end of this question), as described in the documentation. Icinga Classic UI is installed on all three nodes, because I am not able to see the local checks configured for the nodes in Icinga Web 2.
Configs are syncing, checks are executing and all three nodes are connected to each other. But the configured local checks, specific to each individual node, are not running properly, as verified in the Classic UI.
a. All local checks are executed only once, whenever
- one of the nodes is disconnected or reconnected
- configuration changes are made on the master and icinga2 is reloaded
b. After that, only one check keeps running properly on one node, and the remaining checks do not.
I have attached screenshots of the Classic UI on all nodes.
Please help me fix this. Thanks in advance.
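For reference, the cluster health check from the documentation is defined roughly like this (the host name is a placeholder for one of the nodes):

    object Service "cluster" {
        check_command  = "cluster"
        check_interval = 5s
        retry_interval = 1s
        host_name      = "icinga2-node1.localdomain"
    }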

I cannot submit a job to the executing nodes in Condor apart from the central manager

I have a Condor pool which consists of 4 dedicated machines. One is set up as the central manager, submitting and executing node, while the other three are set up as executing nodes. I use CentOS 5.4 as the OS on all the machines. My problem is that when I submit a job from the central manager, it only runs on the central manager; when I specify in the JDL file that the job should run on any machine apart from the central manager, the job stays on hold and does not run. When I type condor_status, all nodes appear. I keep the daemons MASTER, STARTD in the daemon list on the executing nodes. Has anyone come across this problem?
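For reference, the executing nodes' local config and the requirement I add to the submit (JDL) file look roughly like this (the central manager host name is a placeholder):

    # condor_config.local on the three execute-only nodes
    DAEMON_LIST = MASTER, STARTD

    # line added to the submit description to keep the job off the central manager
    requirements = (Machine != "central-manager.example.com")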
There's not enough information to answer your question, but the first thing to do is to run condor_q -analyze <jobid> and see what it tells you. See the Condor manual Section 2.6.5: Why is the job not running?
One possible cause is that you're not telling Condor to transfer your input/output files for you, and your nodes have different "filesystem domains", so Condor is unable to find a host which shares a common filesystem with your submit host.
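If that is the cause, a minimal submit description with explicit file transfer enabled would look something like this (the executable and file names are placeholders):

    universe                = vanilla
    executable              = my_job.sh
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = input.dat
    output                  = job.out
    error                   = job.err
    log                     = job.log
    queue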
