1. We have a requirement to form a cluster using a custom discovery strategy.
We have extended DiscoveryStrategyFactory and read the IP addresses from the Windows hosts file.
2. It currently works locally, but the cluster does not form when the same program is run on another server.
console and hazelcast xml
Building a ThingsBoard cluster
I need help setting up a ThingsBoard cluster; the documentation online is very limited.
The cluster will contain 2 ZooKeeper nodes and 4 ThingsBoard nodes with Cassandra DB.
Should ZooKeeper be installed separately?
A step-by-step guide would be much appreciated!
I cannot provide you with detailed step-by-step instructions to set up a ThingsBoard cluster, but I can point you in the right direction by sharing the documents you need.
Bottom line, the following tasks must be completed:
Install and configure a ZooKeeper ensemble.
Check the ZooKeeper documentation for further installation details. Keep in mind that you need at least three ZK nodes in a clustered environment and that you should always use an odd number of ZK nodes (3, 5, 7, ...). It is a very bad idea to build a cluster out of two ZK nodes: a majority of two is two, so losing either node takes the whole ensemble down. Read up on the split-brain condition to see why a majority is required. Basically, you set up the number of individual nodes you wish to use and change the configuration file so that the nodes form an ensemble. This is documented quite well in the ZK docs.
Install and configure a Cassandra cluster.
Again, you set up the number of individual nodes you need and modify their configuration files to join them into a Cassandra cluster. Check the Cassandra documentation for details. Be sure to verify the configuration with the nodetool status command, as described at the end of that document. All your nodes should be up and running.
Install and configure a ThingsBoard cluster.
Use the instructions provided with ThingsBoard single node setup.
Install Java
Skip External database installation
ThingsBoard service installation
Configure ThingsBoard to use the external database - Cassandra
Go to Cluster setup and apply the configuration steps described there (ZK, Cassandra and RPC). Keep in mind to point to ALL members of your ZK and Cassandra clusters. You can also use IP addresses instead of host names.
Return to single node setup and run the installation script on ONE NODE only!
Start ThingsBoard service
If everything went well, you should be able to access your ThingsBoard nodes directly using the URL http://[NODE_IP]:8080. You can verify proper cluster operation by creating a tenant on one node and checking that it appears on another node.
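The reachability part of this check can be scripted. What follows is only a minimal sketch and not part of the ThingsBoard tooling: `node_url`, `http_probe` and `reachable_nodes` are hypothetical helper names, and a node counts as up as soon as anything answers HTTP on port 8080. It does not replace the tenant check described above.

```python
import urllib.error
import urllib.request


def node_url(ip, port=8080):
    """Build the web UI URL of one node, as in http://[NODE_IP]:8080."""
    return f"http://{ip}:{port}"


def http_probe(url, timeout=5):
    """Return True if anything answers HTTP at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, name resolution failure, ...
        return False


def reachable_nodes(ips, probe=http_probe):
    """Map each node IP to whether its web UI answered."""
    return {ip: probe(node_url(ip)) for ip in ips}
```

Calling `reachable_nodes([...])` with your own node IPs returns a dict such as `{ip: True/False}`. The `probe` parameter exists only so the function can be exercised without a live cluster.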
I don't know if using an even number of ThingsBoard nodes is a good idea. The documentation does not mention anything about this.
One final remark: you could/should consider putting a proxy in front of your ThingsBoard cluster to provide load balancing to your web clients and improve the user experience. This way you don't have to share the individual host addresses with your users, and you prevent a single node from being overloaded because everybody uses the same web address to access your dashboard(s). You could also proxy your MQTT broker to provide load balancing as well.
Good luck in setting up your cluster!
ZooKeeper needs at least 3 nodes to run in cluster mode with any fault tolerance. Every node votes, and a quorum is a majority of the ensemble, so with 3 nodes the ensemble keeps working as long as 2 of them are up.
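The majority arithmetic behind this can be written out. This is a minimal sketch, assuming ZooKeeper's default majority quorum; the function names are illustrative only.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 8):
    print(f"{n} nodes: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

A 2-node ensemble tolerates no failures, exactly like a single node, and a 4-node ensemble tolerates one failure, exactly like a 3-node one. That is why odd sizes are usually recommended.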
I am new to the Cassandra database. I have configured a multi-node Cassandra cluster with one node per machine.
I want to access the Cassandra DB from client applications using a single name (like the SCAN name in Oracle RAC). Is it possible to do it that way?
Our developers are used to the Oracle RAC SCAN name and expect something similar in multi-node Cassandra as well.
At the moment I can access the individual nodes using the separate IPs I have assigned.
Can anyone help me with this?
What I think you're describing is service discovery. You should have a look at Consul (https://www.consul.io/) to define the contact points for your application to connect to Cassandra.
Essentially, you can use Consul to manage a single DNS entry (https://www.consul.io/docs/agent/dns.html) for your applications to use without having to hardcode an IP etc.
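On the client side, using such a DNS entry could look roughly like the sketch below. It only resolves one name to the IPs behind it, and `resolve_contact_points` is a hypothetical helper, not part of any driver. It assumes the Cassandra nodes are registered in Consul under a service name such as `cassandra` (which would make the name `cassandra.service.consul`), that the client host can resolve `.consul` names, and that the nodes listen on port 9042, the default native protocol port.

```python
import socket


def resolve_contact_points(name, port=9042, resolver=socket.getaddrinfo):
    """Resolve one DNS name to the sorted, de-duplicated IPs behind it."""
    infos = resolver(name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})
```

The resulting list can be handed to the driver as its contact points, for example `Cluster(contact_points=...)` in the DataStax Python driver. Drivers typically discover the remaining nodes once they reach one contact point, so the name only has to resolve to some live nodes.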
I am trying to run a PySpark job on a Mesosphere cluster but I cannot seem to get it to run. I understand that Mesos does not support cluster deploy mode for PySpark applications and that it needs to be run in client mode. I believe this is where the problem lies.
When I try submitting a PySpark job I am getting the output below.
... socket.hpp:107] Shutdown failed on fd=48: Transport endpoint is not connected [107]
I believe that a Spark job running in client mode needs to connect to the nodes directly, and that this is being blocked?
What configuration would I need to change to be able to run a PySpark job in client mode?
When running PySpark in client mode (meaning the driver is running where you invoke Python) the driver becomes the Mesos Framework. When this happens, the host the framework is running on needs to be able to connect to all nodes in the cluster, and they need to be able to connect back, meaning no NAT.
If this is indeed the cause of your problems, there are two environment variables that might be useful. If you can get a VPN in place, you can set LIBPROCESS_IP and SPARK_LOCAL_IP both to the IP of the host machine that cluster nodes can use to connect back to the driver.
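In a Python driver script, setting the two variables could look like the sketch below. `pin_driver_ip` is a hypothetical helper name and the address passed to it is whatever IP the cluster nodes can actually route back to. The assumption here is that it runs before the SparkContext is created, so that the driver picks the values up.

```python
import os


def pin_driver_ip(driver_ip, environ=os.environ):
    """Set both variables to the address cluster nodes use to reach the driver."""
    environ["LIBPROCESS_IP"] = driver_ip
    environ["SPARK_LOCAL_IP"] = driver_ip
    return environ
```

Exporting the same two variables in the shell before invoking spark-submit should have the same effect.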
Obviously I need to start a ZooKeeper server on one of the cluster machines, and then I need the other client machines to connect to this server.
The way I did it: I used ssh to connect to the head node and found a ZK server running on port 2181. So I used ifconfig to get the machine's IP address, and I then had my worker nodes connect to that address.
However, my MR job now completes, but it runs slowly and the output is not correct. I suspect that I'm doing something wrong with ZooKeeper, especially since I didn't follow a tutorial and improvised my steps.
HDInsight has multiple ZooKeeper servers. I'm not sure whether specifying only one might be the cause of the problem you are seeing.
I wrote an example a while back that uses Storm to write to HBase (both servers on the same Azure Virtual Network), and as part of the configuration I had to specify the three ZooKeeper servers for the component that writes to HBase. (https://azure.microsoft.com/en-us/documentation/articles/hdinsight-storm-sensor-data-analysis/ is the article.)
From the cluster head node, you can probably ping zookeeper0, zookeeper1, and zookeeper2 to find the IP address of each.
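ZooKeeper clients generally accept a comma-separated list of `host:port` pairs, so all three servers can be given at once instead of a single IP. This is a minimal sketch that assumes the three host names above resolve from the worker nodes and that every server listens on port 2181; `zk_connect_string` is an illustrative helper name.

```python
def zk_connect_string(hosts, port=2181):
    """Join ZooKeeper hosts into a 'host:port,host:port,...' string."""
    return ",".join(f"{host}:{port}" for host in hosts)


servers = ["zookeeper0", "zookeeper1", "zookeeper2"]
print(zk_connect_string(servers))
# prints zookeeper0:2181,zookeeper1:2181,zookeeper2:2181
```

With the full list, a client can fail over to another server if the one it is connected to goes away.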
I have a Spark application deployed on the cluster. I want to run the application with some variables passed from another application running on a remote machine. For example, I will pass a query string from the remote application, and I want my Spark application to listen for it, process the query, and return the response to the caller.
Is this possible with any library or feature provided by Spark?
A Spark application is like any other application. An application can take remote commands in a million different ways. Perhaps most common is to make the application an HTTP server. Then it can be remote controlled through a web interface or a REST API.
If you're using Spark through Scala, the Play Framework is a popular option.
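To make the HTTP-server idea concrete, here is a minimal sketch in Python using only the standard library, as a stand-in for Play or any other web framework. The `/query` path, the `q` parameter and the `run_query` function are assumptions made for illustration. In a real application `run_query` would hand the query to the already-running Spark session instead of returning a canned answer.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def run_query(query):
    """Placeholder: a real application would pass the query to Spark here."""
    return {"query": query, "result": f"processed {query!r}"}


class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests of the form /query?q=<query string>.
        url = urlparse(self.path)
        params = parse_qs(url.query)
        if url.path != "/query" or "q" not in params:
            self.send_error(400, "expected /query?q=...")
            return
        body = json.dumps(run_query(params["q"][0])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The remote application then issues a GET request to `/query?q=...` on the driver host and reads the JSON response. The driver process must stay alive and be reachable from the caller for this to work.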