Access Cassandra via a Network Load Balancer (AWS) from a different AWS account - cassandra

I have installed Cassandra (three nodes) on Kubernetes in an AWS account.
I want to expose this Cassandra cluster via an AWS endpoint (through an NLB) to a different AWS account so that I can access it for reads and writes.
I am using Spark (in a different AWS account) to load data into Cassandra, but I am getting this WARN while loading the data:
WARN ChannelPool: [s0|/10.0.246.140:32034] Error while opening new channel (ConnectionInitException: [s0|connecting...] Protocol initialization request, step 1 (STARTUP {CQL_VERSION=3.0.0, DRIVER_NAME=DataStax Java driver for Apache Cassandra(R), DRIVER_VERSION=4.7.2, CLIENT_ID=b52c9022-561a-48d3-bd98-893c6c17f0c3, APPLICATION_NAME=Spark-Cassandra-Connector-application_1606197155514_0510}): failed to send request (java.nio.channels.NotYetConnectedException))
Has anybody exposed Cassandra via an NLB? Do I need to create separate routes in the NLB for each Cassandra node? If yes, how do I do that?

You need to define a K8s service and expose it through an Ingress controller such as Traefik so clients (such as your Spark app) can connect to your Cassandra cluster from outside the Kubernetes cluster.
If you're using the DataStax Cassandra Operator (cass-operator), it makes it a lot easier since it comes pre-configured with a service that you can use. See the Ingress examples we have included in Connecting to Cassandra from outside the Kubernetes cluster.
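Once the service is exposed, the client side is just a matter of pointing the driver at that endpoint. Here is a minimal sketch with the DataStax Java driver 4.x (the driver your Spark log shows), assuming a hypothetical NLB hostname cassandra-nlb.example.com, port 9042, and a datacenter named dc1:

import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;

public class NlbCassandraCheck {
    public static void main(String[] args) {
        // The contact point is the NLB endpoint; hostname, port, and DC name are assumptions.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("cassandra-nlb.example.com", 9042))
                .withLocalDatacenter("dc1") // must match the cluster's actual datacenter name
                .build()) {
            ResultSet rs = session.execute("SELECT release_version FROM system.local");
            System.out.println("Connected, Cassandra version: " + rs.one().getString("release_version"));
        }
    }
}

Keep in mind that the driver also discovers peer addresses from the cluster's system tables and tries to connect to each node directly, which is why broadcast/external addresses (or per-node routing, as in the Ingress examples above) need to be set up correctly rather than relying on a single load-balanced address alone.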
If you weren't already aware, have a look at open-source K8ssandra. It is a ready-made platform for running Apache Cassandra in Kubernetes using the DataStax Cassandra Operator under the hood but with all the tools bundled together:
Reaper for automated repairs
Medusa for backups and restores
Metrics Collector for monitoring with Prometheus + Grafana
Traefik templates for k8s cluster ingress
Since all these components are open source, they are free to use and don't require a licence or paid subscription, but they still come with robust community support. Cheers!

Related

Highly available, redundant Redis cluster on Kubernetes

The objective is to create a highly available Redis cluster on Kubernetes for a Node.js client. I have already created the architecture as below:
Created a Kubernetes cluster with a master (Kmaster) and 3 worker nodes.
Then I created StatefulSets and persistent volumes (6, one for each pod).
Then I created Redis pods, 2 on each node (3 masters, 3 replicas of the respective masters).
I need to understand the role of Redis Sentinel from here: how does it manage monitoring, scaling, and HA for the Redis cluster pods across the nodes? I understand Sentinel should sit on each node and do its job, but what is the right architecture here?
P.S. I have created a local setup for now, but ultimately this goes on Azure, so any suggestions with respect to Azure are also welcome.
Thanks!
From an Azure perspective, you have two options. Even if you are set on option two but are looking for the Sentinel architecture piece, there are business continuity and high-availability options in both IaaS (Linux VM scale sets) and PaaS services that go beyond the Sentinel component.
The first option is Azure Cache for Redis (PaaS), where you choose and deploy your desired service tier (the Premium tier is required for HA) and connect your client applications. Please see: Azure Cache for Redis FAQ and Caching Best Practice.
The second option is to deploy the solution you have detailed as an IaaS solution built from Azure VMs. There are a number of Redis Linux VM images to choose from in the Azure Marketplace, or there is the option to create a Linux VM OS image from your on-premises solution and migrate it to Azure. The Sentinel component is enabled on each server (master, slavea, slaveb, ...). There are networking and other considerations too. For building a system from scratch, please see: How to Setup Redis Replication (with Cluster-Mode Disabled) in CentOS 8 – Part 1 and How to Setup Redis For High Availability with Sentinel in CentOS 8 – Part 2.
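As for how the client fits in: a Sentinel-aware client never hard-codes the master address; it asks the Sentinels who the current master is and follows failovers automatically. A minimal sketch using the Jedis client, assuming hypothetical Sentinel addresses and the conventional master name mymaster:

import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelClientExample {
    public static void main(String[] args) {
        // Hypothetical Sentinel endpoints - replace with your own (typically one per node).
        Set<String> sentinels = new HashSet<>();
        sentinels.add("10.0.0.11:26379");
        sentinels.add("10.0.0.12:26379");
        sentinels.add("10.0.0.13:26379");

        // The pool asks the Sentinels for the current master and
        // transparently reconnects to the new master after a failover.
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            jedis.set("greeting", "hello");
            System.out.println(jedis.get("greeting"));
        }
    }
}

The same idea applies to a Node.js client such as ioredis, which accepts a list of Sentinel endpoints instead of a single Redis address.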

Is it possible to deploy specific Java dependencies to nodes on Hazelcast Cloud?

I'm submitting Java runnables to the Hazelcast executor service; the runnables need access to specific business objects and also to the Hazelcast client.
Is there any way to deploy a Hazelcast cluster on Hazelcast Cloud with specific dependencies?
#newlogic, according to the Hazelcast Cloud documentation, the User Code Deployment feature is enabled. All you need to do is enable it on the client as well and configure it, as documented here: https://docs.hazelcast.org/docs/latest/manual/html-single/index.html#client-user-code-deployment-beta
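To illustrate, here is a minimal sketch of the client side, assuming a hypothetical HelloTask runnable; the task class (and any business classes it depends on) is shipped to the members via client user code deployment, and the cluster/cloud connection settings are omitted:

import java.io.Serializable;

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;

public class UserCodeDeploymentExample {

    // Hypothetical task; it must be serializable so it can travel to the members.
    public static class HelloTask implements Runnable, Serializable {
        @Override
        public void run() {
            System.out.println("running on a cluster member");
        }
    }

    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        // Cluster name / cloud discovery settings omitted - configure them for your Hazelcast Cloud cluster.

        // Enable user code deployment on the client and register the classes to ship.
        clientConfig.getUserCodeDeploymentConfig()
                .setEnabled(true)
                .addClass(HelloTask.class);

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        IExecutorService executor = client.getExecutorService("default");
        executor.executeOnAllMembers(new HelloTask());
        client.shutdown();
    }
}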

Presto + how to manage Presto servers' stop/start/status actions

We installed the following Presto cluster on Linux Red Hat 7.2:
Presto latest version - 0.216
1 Presto coordinator
231 Presto workers
On each worker machine we can use the following command in order to verify the status:
/app/presto/presto-server-0.216/bin/launcher status
Running as 61824
and also stop/start it as follows:
/app/presto/presto-server-0.216/bin/launcher stop
/app/presto/presto-server-0.216/bin/launcher start
I also searched Google for a UI that can manage Presto status/stop/start,
but haven't seen anything about this.
It's very strange that Presto doesn't come with some user interface that can show the cluster status and perform stop/start actions if we need them.
As everyone knows, the only user interface for Presto just shows status and doesn't offer actions such as stop/start.
In the example screen (not shown here) we can see that only 5 of the 231 Presto workers are active, but that UI doesn't support stop/start actions and doesn't show on which workers Presto isn't active.
So what can we do about it?
It's a very bad idea to access each worker machine to see whether Presto is up or down.
Why doesn't Presto have a centralized UI that can perform stop/start actions?
Example of what we are expecting from the UI (partial list omitted).
Presto currently uses a discovery service where workers announce themselves to join the cluster, so if a worker node is not registered, there is no way for the coordinator or discovery server to know about its presence and/or restart it.
At Qubole, we use an external service alongside the Presto master that tracks nodes which do not register with the discovery service within a certain interval. This service is responsible for removing such nodes from the cluster.
One more thing we do is use the monit service on each of the Presto worker nodes, which ensures that the Presto server is restarted whenever it goes down.
You may have to do something similar for cluster management, as Presto does not provide it right now.
In my opinion and experience managing a PrestoSQL cluster, this comes down to the service discovery patterns in the architecture.
So far, the open-source releases of prestodb/prestosql use the following patterns:
Server-side service discovery - a client app like the Presto CLI, or any app that uses a Presto SDK, just needs to reach a coordinator without any awareness of the worker nodes.
Service registry - a place to keep track of the available instances.
Self-registration - a service instance is responsible for registering itself with the service registry. This is the key part, as it forces several behaviors:
Service instances must be registered with the service registry on startup and unregistered on shutdown
Service instances that crash must be unregistered from the service registry
Service instances that are running but incapable of handling requests must be unregistered from the service registry
So the life-cycle management of each Presto worker is left to each instance itself.
So what can we do about it?
The Presto cluster itself provides some observability, such as the HTTP APIs /v1/node and /v1/service/presto, to see instance status. Personally, I recommend using another cluster manager like k8s or Nomad to manage Presto cluster members.
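For example, a small client that polls those endpoints on the coordinator can replace logging in to each worker by hand. A minimal sketch (the coordinator address is a placeholder):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PrestoNodeCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical coordinator address - replace with your own.
        String coordinator = "http://presto-coordinator.example.com:8080";
        HttpClient client = HttpClient.newHttpClient();

        // /v1/node lists the workers currently registered with the discovery service,
        // /v1/service/presto shows the announced services.
        for (String path : new String[] {"/v1/node", "/v1/service/presto"}) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(coordinator + path)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(path + " -> " + response.body());
        }
    }
}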
It's a very bad idea to access each worker machine to see whether Presto is up or down.
Why doesn't Presto have a centralized UI that can perform stop/start actions?
No opinion on good/bad. Take k8s for example: you can manage all Presto workers as one k8s Deployment, with each Presto worker in one pod. It can use liveness, readiness, and startup probes to automate the instance lifecycle with a little YAML; see, e.g., the design of the livenessProbe in the stable/presto Helm chart. A cluster manager like k8s also provides a web UI so that you can touch resources and act like an admin. Or you can choose to write more Java code to extend Presto.

On what nodes should Kafka Connect distributed be deployed on Azure Kafka for HD Insight?

We are running a lot of connectors on premises and we need to move to Azure. These on-premises machines are running the Kafka Connect API on 4 nodes. We deploy this API by executing the following on all of these machines:
export CLASSPATH=/path/to/connectors-jars
/usr/hdp/current/kafka-broker/bin/connect-distributed.sh distributed.properties
We have Kafka deployed on Azure Kafka for HD Insight. We need at least 2 nodes running the distributed Connect API, and we don't know where to deploy them:
On head nodes (we still don't know what these are for)
On worker nodes (where the Kafka brokers live)
On edge nodes
We also have Azure AKS running containers. Should we deploy the distributed Connect API on AKS?
where the Kafka brokers live
Ideally, no. Connect uses lots of memory when batching lots of records. That memory is better left to the page cache for the broker.
On edge nodes
Probably not. That is where your users interact with your cluster. You wouldn't want them poking at your configurations or accidentally messing up the processes in other ways. For example, we had someone fill up an edge node's local disk because they were copying large volumes of data in and out of the "edge".
On head nodes
Maybe? But then again, those are only for cluster admin services, and probably have little memory.
A better solution: run dedicated instances in Azure, outside of HDInsight, that only run Kafka Connect. Perhaps run them as containers in Kubernetes, because they are completely stateless services and only need access to your sources, sinks, and Kafka brokers for transferring data. This way, they can be upgraded and configured separately from what Hortonworks and HDInsight provide.
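A consequence of that statelessness is that connectors are managed entirely through the Connect REST API, so it doesn't matter which machines the workers run on as long as they can reach the brokers. A minimal sketch, assuming a hypothetical worker reachable at connect.example.com on the default port 8083 and a hypothetical connector named my-connector:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectRestCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical Connect worker address - replace with your own.
        String connectUrl = "http://connect.example.com:8083";
        HttpClient client = HttpClient.newHttpClient();

        // List the connectors currently deployed in the Connect cluster.
        HttpRequest list = HttpRequest.newBuilder(URI.create(connectUrl + "/connectors")).GET().build();
        System.out.println(client.send(list, HttpResponse.BodyHandlers.ofString()).body());

        // Check the status of one connector and its tasks by name.
        HttpRequest status = HttpRequest
                .newBuilder(URI.create(connectUrl + "/connectors/my-connector/status"))
                .GET()
                .build();
        System.out.println(client.send(status, HttpResponse.BodyHandlers.ofString()).body());
    }
}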

What's the recommended ElasticSearch deployment on Windows Azure?

Bearing in mind that the ElasticSearch-ZooKeeper plugin doesn't support the v0.90 release:
With unicast, what's your strategy for updating your list of IPs, i.e. when upgrading or scaling up/down?
What client-side connectivity (from a web/worker role) to the cluster do you use? Do you:
a) implement your own round-robin/failover logic across all nodes in the cluster, or
b) spin up a local (non-data/non-master) Elasticsearch process on the client machine that joins the unicast cluster, so the application only connects to localhost?
Where do you store your data? An Azure Blob gateway?
Can you share your detailed story of your Elasticsearch experience on Azure, and any particular points/issues to watch out for?
Cheers
Just a note about this: we are on the way to releasing an Azure plugin for Elasticsearch. It will allow automatic discovery of your Elasticsearch nodes. I think we will have something public in the next few weeks.
Also, I recommend using local storage. Azure Blob storage will be used in the future for the snapshot (and restore) feature when Elasticsearch 1.0 is out.
Hope this helps
Update: the plugin is now available here: https://github.com/elasticsearch/elasticsearch-cloud-azure
It's in no way certified by ElasticSearch, but I wrote a blog post about my experience running ES on Azure: http://thomasardal.com/running-elasticsearch-in-a-cluster-on-azure/
