DataStax OpsCenter does not see remote agents (Windows) - cassandra

I am new to Cassandra.
I'm trying to deploy a test environment:
Win Server 2012 (192.168.128.71) -> seed node
Win Server 2008 (192.168.128.70) -> simple node
Win Server 2008 (192.168.128.69) -> simple node
On all nodes I installed the same version of Cassandra (2.0.9 from DataStax).
Disabled the Windows firewall.
The cluster ring formed, but on each node I see:
Test Cluster (Cassandra 2.0.9) 1 of 3 agents connected
The node does not see the remote agents. On each machine, the agent service is running.
In the file datastax_opscenter_agent-stderr, I see the following lines:
log4j:ERROR Could not read configuration file [log4j.properties].
log4j:ERROR Ignoring configuration file [log4j.properties].
Please tell me the possible cause, or how I can diagnose it.
Thanks in advance!

The problem is that you have an OpsCenter server running on every machine in the cluster. Agents connect to their local OpsCenter server, so when you open the UI for one of them, you only see one agent connected.
To fix this, stop the server process (DataStax_OpsCenter_Community) on all machines except one, add stomp_interface: <server-ip> to address.yaml for the agents on all machines, then restart the agents.
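For illustration, assuming the remaining OpsCenter server is the seed node from the question (192.168.128.71), each agent's address.yaml would contain:

```yaml
# address.yaml for the DataStax agent on every node.
# stomp_interface must point at the single remaining OpsCenter server.
stomp_interface: 192.168.128.71
```

After editing the file, restart the DataStax agent service on each node so it reconnects to the central server.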

Related

Adding workstation nodes in HPC Pack 2016

I am using Microsoft HPC Pack 2016 Update 2 on a local-network, on-premise cluster. We have employed topology 5 (all nodes on the enterprise network). The head node is successfully set up and running. The problem is that after manually installing HPC Pack 2016 Update 2 on several Windows 10 workstations, all on the same local network, some cannot be found and added to the cluster using HPC Cluster Manager. I can't see them in HPC Cluster Manager on the head node, neither through "Resource Management > Nodes" nor using the Add Node wizard. The same install-and-add steps work for some of the workstations but not for others. Is there any way to track down the cause?
In my case the problem was due to a broken trust relationship between the workstation and the domain. This can be verified using the nltest /trusted_domains command. Resetting the trust relationship fixed the problem.
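As a sketch, the check and one common way to reset the secure channel can be run from an elevated PowerShell prompt on the affected workstation (adapt to your environment; the repair cmdlet is a suggestion, not something the answer above prescribes):

```
# List the domains this machine trusts; an error here suggests a broken trust relationship.
nltest /trusted_domains

# Verify the machine's secure channel to the domain and repair it if broken.
# -Credential is needed when the broken trust prevents normal domain authentication.
Test-ComputerSecureChannel -Repair -Credential (Get-Credential)
```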

How to set up stomp_interface for a failover node for Cassandra OpsCenter

How do I set up a failover node for Cassandra OpsCenter? OpsCenter data is stored on the OpsCenter node itself, so to set up a failover node I need to set up a second OpsCenter, separate from the current one, and sync OpsCenter data and config files between the two.
The stomp_interface on the nodes in the cluster points to OpsCenter_1; how will it change automatically to OpsCenter_2 when failover occurs?
There are steps in the DataStax documentation that cover this in detail. At a minimum:
Mirror the configuration directories stored on the primary OpsCenter to the backup OpsCenter using whatever method you prefer.
On the backup OpsCenter, create a primary_opscenter_location configuration file in the failover directory that indicates the IP address of the primary OpsCenter daemon to monitor.
The stomp_interface setting on the agents is changed (the address.yaml file is updated as well) when failover occurs. This is why the documentation recommends making sure no third-party configuration management tool controls that file.
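As a minimal sketch, the primary_opscenter_location file on the backup OpsCenter holds a single line with the primary's IP address (the address shown is illustrative):

```
192.168.128.71
```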
Three more things:
If you have a firewall on, allow the relevant ports to communicate (61620, 61621, 9160, 9042, 7199).
Always verify that Cassandra is up and running, so the agent can actually connect to something.
Stop the agent, check address.yaml again, then restart the agent.
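A quick way to verify the port list above from an agent machine is a small bash probe (the host IP is illustrative; this is a sketch, not part of OpsCenter):

```shell
# Print "open" or "closed" for a TCP port using bash's /dev/tcp pseudo-device.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Ports OpsCenter and its agents rely on (replace the IP with your server's).
for port in 61620 61621 9160 9042 7199; do
  echo "$port: $(check_port 192.168.128.71 $port)"
done
```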

MemSQL aggregator doesn't start on CentOS 6.7

We're currently evaluating MemSQL and have two setups: one running on CentOS 6.7, one on CentOS 7.1.
On CentOS 7.1, after a system reboot the master has all services started, but the CentOS 6.7 variant does not, and the aggregator shows as offline. We had to run memsql-ops cluster-start, as found in "MemSQL leaf down on single-server cluster". We're wondering if this is related to the init.d/systemctl differences between the machines. Any reply appreciated!
Cheers,
µatthias
Currently, Ops only sets up a SysV-style init script in /etc/init.d when it is installed as root. However, once Ops starts up correctly, it should immediately check whether or not MemSQL is running; if it is not running but should be, Ops will start the cluster automatically. Can you confirm that you didn't run memsql-ops cluster-stop before shutting down the cluster? If you did, Ops will not start the MemSQL cluster when it comes back up.
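On the CentOS 6.7 box you can check whether a SysV init script was actually registered (the "memsql" name pattern is an assumption; the answer above only states that the script lives in /etc/init.d):

```shell
# Look for a MemSQL Ops init script and its runlevel registration (SysV, CentOS 6).
ls /etc/init.d/ 2>/dev/null | grep -i memsql || echo "no memsql init script in /etc/init.d"
chkconfig --list 2>/dev/null | grep -i memsql || echo "memsql not registered with chkconfig"
```

If the script is missing, Ops was likely installed as a non-root user and will not be started by init at boot.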

Icinga2 cluster node local checks not executing

I am using an Icinga2 2.3.2 cluster HA setup with three nodes in the same zone and the IDO database on a separate server. All are CentOS 6.5. IcingaWeb2 is installed on the active master.
I configured four local checks for each node, including the cluster health check, as described in the documentation. I installed the Icinga Classic UI on all three nodes, because I am not able to see the local checks configured for the nodes in Icinga Web2.
Configs are syncing, checks are executing, and all three nodes are connected to each other. But the local checks specific to each node are not running properly, as verified in the Classic UI:
a. All local checks are executed only once whenever
- one of the nodes is disconnected or reconnected
- configuration changes are made on the master and icinga2 is reloaded
b. After that, only one check keeps running properly on one node; the remaining ones do not.
I have attached a screenshot of the Classic UI on all nodes.
Please help me fix this. Thanks in advance.

Unable to add compute nodes to HPC Cluster

I am trying to set up an HPC cluster environment with Azure VMs as the head node and compute nodes.
The head node is working properly. However, when I try to add compute nodes from HPC Cluster Manager on the head node, the compute nodes don't show up. If I open HPC Cluster Manager from a compute node, it asks for the head node, and when I provide the head node's name, it fails with the error below.
"Failed to communicate with remote SDM store. Connection Failed. A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.195.15.194:9893"
All the VMs have Windows Server 2012 R2 and are in the same VNet/domain.
Any pointers to resolve this issue?
The following two templates should help you out; they provide a step-by-step wizard experience.
https://azure.microsoft.com/en-us/documentation/templates/create-hpc-cluster-custom-image/
https://azure.microsoft.com/en-us/documentation/templates/create-hpc-cluster/
