How can we create multiple instances of Ignite? - multithreading

How can we create multiple instances of Ignite when multiple threads are trying to access the same Ignite instance?
Ignite ignite = Ignition.start("conf/example-ignite-config.xml");
Here I have started/created one Ignite instance. But I want to create multiple instances from the same XML, so that no thread has to wait for the instance to become free.

Ignite instances are thread safe. You should call Ignition.start once and share the resulting instance between all threads.
Alternatively, once Ignite has been started, you can obtain the instance with the Ignition.ignite() methods.
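For example, a minimal sketch (the configuration path comes from the question; the worker logic and cache name are made up, and it assumes the XML defines a single unnamed grid so that Ignition.ignite() can resolve it):
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class SharedIgnite {
    public static void main(String[] args) {
        // Start the node once; the instance is thread safe.
        Ignition.start("conf/example-ignite-config.xml");

        Runnable worker = () -> {
            // Any thread can look up the already-started instance.
            Ignite ignite = Ignition.ignite();
            ignite.getOrCreateCache("myCache") // hypothetical cache name
                  .put(Thread.currentThread().getName(), 1);
        };
        new Thread(worker).start();
        new Thread(worker).start();
    }
}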

For this, I created a ClusterGroup from one Ignite instance started in client mode (i.e., a cluster group of the nodes started in client mode),
and whenever I need an Ignite node (grid) in client mode I just call the ClusterGroup's ignite() method, which returns the grid instance.
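In Ignite's public API that looks roughly like this (forClients() returns the cluster group of client-mode nodes):
ClusterGroup clients = ignite.cluster().forClients(); // nodes started in client mode
Ignite grid = clients.ignite(); // gets the instance of the grid this group belongs to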

Related

The proper way of defining map loaders from application - hazelcast

I have an application using Hazelcast in the embedded mode. I use MapLoaders and indexes.

I want to change my application so I can use an existing external Hazelcast cluster, to which I will connect using the client API, and here lies my problem: there is no way of defining those map loaders in the client app, so I need to use the server mode. Right now I'm using the JoinConfig so I can join the cluster and define my map loaders, but if I understand correctly, by joining the cluster my app will become a part of the cluster itself (and host some data partitions), and this is something I would like to avoid.

So is there another way of connecting to this external cluster so my app doesn't start hosting the cache data, or is my approach correct and I shouldn't mind being a part of the cluster?
If you use the Hazelcast Client, it just connects to the cluster; it does not become part of the cluster, which seems to be what you want.
Concerning Map Loaders, please check the related code sample.
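A minimal sketch of the client side (assuming Hazelcast 3.x; the address, map name, and key are placeholders). The MapLoader itself stays configured on the cluster members, and the client simply triggers it through normal map reads:
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class ClientOnly {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.getNetworkConfig().addAddress("10.0.0.1:5701"); // placeholder member address
        // A client connects to the cluster but never hosts data partitions.
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        // This get() runs the MapLoader configured on the members, not here.
        client.getMap("orders").get("some-key");
        client.shutdown();
    }
}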

Preventing Cassandra Node from Being Overwhelmed

When I create a Cassandra cluster builder in Java, I provide a list of multiple Cassandra nodes as shown below:
Cluster cluster = Cluster.builder().addContactPoints(host1, host2, host3, host4).build();
But from what I understand, the driver connects only to the first host in the list that is available, and that host becomes my connection point to the Cassandra cluster.
Now, my question is: if my Java application reads/writes a huge amount of data from/to Cassandra, doesn't it overwhelm the node that it is connected to?
Is there a way to configure my connection so that it uses multiple Cassandra nodes for its reads/writes? What is the common practice?
It uses the contact points to find the rest of the nodes in the cluster, then creates a pool of connections to all the hosts and balances the requests among them. It doesn't connect only to the hosts you provide unless you use the whitelist load balancing policy or a custom one.
If you're worried about overwhelming nodes, use the RoundRobinPolicy (DCAwareRoundRobinPolicy if you have multiple DCs) and it will distribute requests among all the nodes evenly. If you have hot spots of data and use the TokenAwarePolicy, the load may be uneven, but you shouldn't need to worry about it.
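For example, a sketch with the DataStax Java driver 3.x (the host and data center names are placeholders):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

// Contact points are only used for discovery; requests are then
// balanced across every node the driver finds in the cluster.
Cluster cluster = Cluster.builder()
        .addContactPoints("host1", "host2", "host3")
        .withLoadBalancingPolicy(new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("DC1") // placeholder data center name
                        .build()))
        .build();
Session session = cluster.connect();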

Is there a way to remove a rogue node from our hazelcast cluster?

We are currently running a Hazelcast cluster, using it to communicate information on a queue that is picked up by a single node in the cluster. We are vulnerable, however, to a "rogue" node that joins the cluster without the right version of the software to handle the requests properly.
Is there a way to proactively remove rogue nodes of this nature in a way that prevents them from re-joining the cluster? I haven't been able to find one in the documentation.
It looks like you are using the default hazelcast.xml. You should use a custom hazelcast.xml with updated group credentials, so that nodes without matching credentials cannot join the cluster.
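For instance, a sketch assuming Hazelcast 3.x, where the group name and password gate cluster membership (the values are placeholders):
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
    <group>
        <!-- Only members configured with these credentials can join. -->
        <name>prod-cluster</name>
        <password>prod-secret</password>
    </group>
</hazelcast>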

Inter application communication in apache spark streaming

I have two applications : App1 and App2.
On a single cluster I have to spawn 5 instances of App1 and 1 instance of App2.
What would be the best way to send data from the 5 App1 instances to the single App2 instance ?
Right now I am using Kafka to send data from one Spark application to the other, but the setup doesn't seem right, and I hope there is a better way to do this.
Apache Ignite might be useful to you:
Apache Ignite provides an implementation of the Spark RDD abstraction which allows you to easily share state in memory across multiple Spark jobs, either within the same application or between different Spark applications.
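A rough sketch with the ignite-spark module (the configuration path and cache name are placeholders): each App1 instance writes into a shared Ignite cache, and App2 reads the same cache back as an RDD.
import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.ignite.spark.JavaIgniteRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class App2Reader {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("App2"));

        // Points at the same Ignite cluster as the App1 instances,
        // so the cache contents are shared across the applications.
        JavaIgniteContext<Integer, String> ic =
                new JavaIgniteContext<>(sc, "ignite-config.xml"); // placeholder config
        JavaIgniteRDD<Integer, String> shared = ic.fromCache("sharedData"); // placeholder cache

        // App1 instances would call ic.fromCache("sharedData").savePairs(pairs)
        // to publish their data into the same cache.
        System.out.println("Records written by App1: " + shared.count());
    }
}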

How does a Spark Application work?

I am trying to implement a simple Spark SQL application that takes a query as input and processes the data. But because I need to cache the data, I have to maintain a single SQLContext object, and I cannot figure out how to keep using the same SQLContext while continuously accepting queries from users.
So how does an application work? When an application is submitted to the cluster, does it keep running on the cluster, or does it perform a specific task and shut down immediately afterwards?
A Spark application has a driver program that starts and configures the SparkContext. The driver program can be inside your application, and you can use the same SparkContext throughout the life of your application.
The SparkContext is thread safe, so multiple users can use it to run jobs concurrently.
There is an open source project, Zeppelin, that does just that.
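If you want to roll your own, a minimal sketch of such a long-lived driver (assuming the Spark 1.x API, where SQLContext is the entry point; the data path and table name are placeholders):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

import java.util.Scanner;

public class InteractiveSql {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("interactive-sql"));
        SQLContext sqlContext = new SQLContext(sc);

        // Load and cache the data once; every query below reuses it.
        sqlContext.read().parquet("/data/events.parquet") // placeholder path
                .cache()
                .registerTempTable("events"); // placeholder table name

        // The application keeps running for as long as the driver does,
        // so the same SQLContext serves every incoming query.
        Scanner in = new Scanner(System.in);
        while (in.hasNextLine()) {
            sqlContext.sql(in.nextLine()).show();
        }
        sc.stop();
    }
}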
