ArangoDB API access via Mesos cluster on AWS

I have an ArangoDB cluster "framework" running on a Mesos cluster on AWS. I am extremely new to Mesos.
I can access Mesos and I can access ArangoDB through the endpoint that Mesos provides:
http://mymesoscluster/service/arangodb3
I have a Java service, not running in the Mesos cluster, that I would like to use with ArangoDB. What I cannot find is how to reference ArangoDB from the API perspective.
For example, the Java driver code performs a login that needs a host:
@Bean
public ArangoDB arangoDB() {
    ArangoDB arangoDB = new ArangoDB.Builder()
            .host("????", 8529)
            .user("root")
            .password("somepassword")
            .build();
    return arangoDB;
}
I see some Mesos docs about API access with a token (Authentication HTTP API Endpoint), but I do not think that will get me past the ArangoDB driver login.
Perhaps it is not possible with the Java driver?

You need to make the service available from outside.
https://docs.mesosphere.com/1.11/deploying-services/expose-service/
https://docs.mesosphere.com/1.8/usage/service-discovery/marathon-lb/marathon-lb-advanced-tutorial/
Whichever way you access ArangoDB, be aware of authentication and SSL.
You can check all the services served in the cluster at:
master:5050/v1/axfr
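Once the service is exposed (for example through marathon-lb or an ELB on a public agent), the Java driver just needs the externally reachable host and port. A minimal sketch, assuming a hypothetical hostname my-public-agent.example.com and the default port 8529:

import com.arangodb.ArangoDB;
import org.springframework.context.annotation.Bean;

@Bean
public ArangoDB arangoDB() {
    // "my-public-agent.example.com" stands in for the externally reachable
    // address of the load balancer / public agent that fronts ArangoDB
    return new ArangoDB.Builder()
            .host("my-public-agent.example.com", 8529)
            .user("root")
            .password("somepassword")
            .build();
}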

Related

How to retrieve kubernetes cluster data with googleapis in node?

After researching the topic I found this documentation for retrieving k8s clusters from GCP. However, I could not find any code examples of utilizing those APIs, and when I import google from googleapis I can't find the function in it that would be used for that purpose. For example, to get SQL data there is sqladmin, but nothing for retrieving k8s data. So what property of google do I need?
This is confusing.
There are 2 distinct APIs that you must use:
Google's Kubernetes Engine API (see link). This API is used to create, read, update and delete Kubernetes clusters. Google provides a Node.js SDK documented here.
The standard, generic Kubernetes API (see link). This API is used to create, read, update and delete resources on (any) Kubernetes cluster and, for obvious(ly good) reasons, it is the API that you must use to interact with (an existing) Kubernetes Engine cluster too (because these are just like every other Kubernetes cluster). Kubernetes provides official and community-supported libraries that implement the Kubernetes API. You'll need to pick one of the community-supported libraries for Node.js (as there's no official library).
The general process is to:
Use the Kubernetes Engine API to, e.g., projects.locations.clusters.get the details of an existing GKE cluster
Use the returned Cluster object to build a configuration object (the equivalent of building a context object in a kubeconfig file)
Use the context object with the Kubernetes API library to authenticate to the cluster and program it e.g. list Deployments, create Services etc.
(I have code for this step but it's written in Golang, not JavaScript.)
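A rough sketch of steps 1 and 2 in Java (the same flow applies from Node.js with the community clients), assuming the google-cloud-container and official kubernetes client-java libraries; the project, location and cluster names are placeholders:

import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.container.v1.ClusterManagerClient;
import com.google.container.v1.Cluster;
import com.google.container.v1.GetClusterRequest;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.util.ClientBuilder;
import io.kubernetes.client.util.credentials.AccessTokenAuthentication;
import java.util.Base64;

public class GkeContextSketch {
    public static void main(String[] args) throws Exception {
        // 1. Kubernetes Engine API: read the existing GKE cluster
        try (ClusterManagerClient gke = ClusterManagerClient.create()) {
            Cluster cluster = gke.getCluster(GetClusterRequest.newBuilder()
                    .setName("projects/some-project-id/locations/us-east4-a/clusters/some-cluster")
                    .build());

            // 2. Build the equivalent of a kubeconfig context from the Cluster object
            GoogleCredentials creds = GoogleCredentials.getApplicationDefault()
                    .createScoped("https://www.googleapis.com/auth/cloud-platform");
            creds.refreshIfExpired();
            ApiClient client = new ClientBuilder()
                    .setBasePath("https://" + cluster.getEndpoint())
                    .setAuthentication(new AccessTokenAuthentication(creds.getAccessToken().getTokenValue()))
                    .setCertificateAuthority(Base64.getDecoder().decode(
                            cluster.getMasterAuth().getClusterCaCertificate()))
                    .build();

            // 3. Pass `client` to the generated APIs (e.g. new AppsV1Api(client))
            //    to list Deployments, create Services, etc.
            System.out.println("Cluster endpoint: " + cluster.getEndpoint());
        }
    }
}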
kubectl is the better tool for talking to a Kubernetes cluster.
But you can retrieve some data from GKE. See below:
GCLOUD_TOKEN=$(gcloud auth print-access-token)
curl -sS -X GET -H "Authorization: Bearer ${GCLOUD_TOKEN}" "https://container.googleapis.com/v1beta1/projects/some-project-id/locations/us-east4-a/clusters"
This way you hit the REST API endpoint authenticated: the token obtained from gcloud auth print-access-token is stored in the GCLOUD_TOKEN variable and used to fill the Authorization header.
So you get a JSON reply as desired.
The documentation you linked has more details.

Access Cassandra Via Network Load Balancer(AWS) in different AWS account

I have installed Cassandra (three nodes) on Kubernetes in an AWS account.
I want to open the same Cassandra via an AWS endpoint (through an NLB) to a different AWS account so that I can access this Cassandra for read/write purposes.
I am using Spark (in a different AWS account) to load data into Cassandra, but I am getting this WARN while loading the data:
WARN ChannelPool: [s0|/10.0.246.140:32034] Error while opening new channel (ConnectionInitException: [s0|connecting...] Protocol initialization request, step 1 (STARTUP {CQL_VERSION=3.0.0, DRIVER_NAME=DataStax Java driver for Apache Cassandra(R), DRIVER_VERSION=4.7.2, CLIENT_ID=b52c9022-561a-48d3-bd98-893c6c17f0c3, APPLICATION_NAME=Spark-Cassandra-Connector-application_1606197155514_0510}): failed to send request (java.nio.channels.NotYetConnectedException))
Has anybody opened Cassandra via an NLB? Do I need to make separate routes for each Cassandra node in the NLB? If yes, how do I do that?
You need to define a K8s service and expose it through an Ingress controller such as Traefik so clients (such as your Spark app) can connect to your Cassandra cluster from outside the Kubernetes cluster.
If you're using the DataStax Cassandra Operator (cass-operator), it makes it a lot easier since it comes pre-configured with a service that you can use. See the Ingress examples we have included in Connecting to Cassandra from outside the Kubernetes cluster.
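Once the cluster is reachable through an externally resolvable address, the Spark Cassandra Connector only needs that address as its contact point. A minimal sketch in Java, assuming a hypothetical NLB/ingress hostname cassandra.example.com and the default CQL port 9042:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf()
        .setAppName("cassandra-loader")
        // Hypothetical externally reachable address exposed by the NLB / ingress
        .set("spark.cassandra.connection.host", "cassandra.example.com")
        .set("spark.cassandra.connection.port", "9042")
        .set("spark.cassandra.auth.username", "cassandra")
        .set("spark.cassandra.auth.password", "cassandra");

SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

Keep in mind the driver also tries to reach the peer nodes it discovers, which is why the service/Ingress setup described above matters and why a single opaque NLB address alone may not be enough.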
If you weren't already aware, have a look at open-source K8ssandra. It is a ready-made platform for running Apache Cassandra in Kubernetes using the DataStax Cassandra Operator under the hood but with all the tools bundled together:
Reaper for automated repairs
Medusa for backups and restores
Metrics Collector for monitoring with Prometheus + Grafana
Traefik templates for k8s cluster ingress
Since all these components are open-source, they are free to use and don't require a licence or paid subscription, but still come with robust community support. Cheers!

Is it possible to deploy specific Java dependencies to nodes on Hazelcast Cloud?

I'm submitting Java runnables to the Hazelcast executor service; the runnables need access to specific business objects and also to the Hazelcast client.
Is there any way to deploy a Hazelcast cluster on Hazelcast Cloud with specific dependencies?
#newlogic, according to the Hazelcast Cloud documentation, the User Code Deployment feature is enabled. All you need to do is enable it on the client as well and configure it, as documented here: https://docs.hazelcast.org/docs/latest/manual/html-single/index.html#client-user-code-deployment-beta
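A minimal sketch of the client-side configuration in Java, assuming hypothetical class and jar names for your business objects (the usual Hazelcast Cloud connection settings, such as cluster name and discovery token, go in the same ClientConfig):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientUserCodeDeploymentConfig;
import com.hazelcast.core.HazelcastInstance;

ClientConfig config = new ClientConfig();
ClientUserCodeDeploymentConfig userCodeDeployment = config.getUserCodeDeploymentConfig();
userCodeDeployment.setEnabled(true)
        .addClass("com.example.MyRunnable")      // hypothetical runnable submitted to the executor
        .addJar("lib/business-objects.jar");     // hypothetical jar with its dependencies

HazelcastInstance client = HazelcastClient.newHazelcastClient(config);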

presto + how to manage presto servers stop/start/status action

We installed the following Presto cluster on Linux Red Hat 7.2:
presto latest version - 0.216
1 presto coordinator
231 presto workers
On each worker machine we can use the following command to verify the status:
/app/presto/presto-server-0.216/bin/launcher status
Running as 61824
and also stop/start as follows:
/app/presto/presto-server-0.216/bin/launcher stop
/app/presto/presto-server-0.216/bin/launcher start
I also searched Google for a UI that can manage Presto status/stop/start, but have not seen anything about this.
It is very strange that Presto does not come with a user interface that can show the cluster status and perform stop/start actions if we need to do so.
As we all know, the only user interface of Presto shows status and does not offer actions such as stop/start.
In the example screen we can see that only 5 of the 231 Presto workers are active, but this UI does not support stop/start actions and does not show on which workers Presto isn't active.
So what can we do about it?
It is a very bad idea to access each worker machine to see if Presto is up or down.
Why does Presto not have a centralized UI that can do stop/start actions?
Example of what we are expecting from the UI (partial list; screenshot omitted).
Presto currently uses a discovery service where workers announce themselves to join the cluster, so if a worker node is not registered there is no way for the coordinator or discovery server to know about its presence and/or restart it.
At Qubole, we use an external service alongside presto master that tracks nodes which do not register with discovery service within a certain interval. This service is responsible for removing such nodes from the cluster.
One more thing we do is use monit service on each of presto worker nodes, which ensures that presto server is restarted whenever it goes down.
You may have to do something similar for cluster management, as Presto does not provide it right now.
In my opinion and experience managing a PrestoSQL cluster, it comes down to the service discovery patterns in the architecture.
So far, the open source releases of prestodb/prestosql use the following patterns:
Server-side service discovery - a client app like the Presto CLI, or any app using the Presto SDK, just needs to reach a coordinator without awareness of the worker nodes.
Service registry - a place to keep track of available instances.
Self-registration - a service instance is responsible for registering itself with the service registry. This is the key part, and it forces several behaviors:
Service instances must be registered with the service registry on startup and unregistered on shutdown
Service instances that crash must be unregistered from the service registry
Service instances that are running but incapable of handling requests must be unregistered from the service registry
So it leaves the life-cycle management of each Presto worker to the instance itself.
So what can we do about it?
The Presto cluster itself provides some observability, like the HTTP API endpoints /v1/node and /v1/service/presto, to see instance status. Personally I recommend using another cluster manager like k8s or Nomad to manage Presto cluster members.
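For example, a quick way to see which workers are currently registered is to query the coordinator's /v1/node endpoint. A minimal sketch in Java, assuming a hypothetical coordinator address coordinator.example.com:8080:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PrestoNodeCheck {
    public static void main(String[] args) throws Exception {
        // "coordinator.example.com:8080" stands in for your coordinator host:port
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://coordinator.example.com:8080/v1/node"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the JSON list of nodes currently registered with the discovery service
        System.out.println(response.body());
    }
}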
It is a very bad idea to access each worker machine to see if Presto is up or down.
Why does Presto not have a centralized UI that can do stop/start actions?
No opinion on good/bad. Take k8s for example: you can manage all Presto workers as one k8s Deployment, with each Presto worker in one pod. It can use Liveness, Readiness and Startup Probes to automate the instance lifecycle with a few lines of YAML, e.g. the livenessProbe design of the helm chart stable/presto. A cluster manager like k8s also provides a web UI so that you can touch resources and act like an admin. Or you can choose to write more Java code to extend Presto.

About Livy session for Jupyterhub on AWS EMR Spark

My customer has an AD connector configured on JupyterHub installed on AWS EMR, so that different users are authenticated on JupyterHub via AD. The current understanding is that when different users submit their Spark jobs through Jupyter notebooks on JupyterHub to the shared underlying EMR Spark engine, the jobs are submitted via Livy to the Spark engine. Each Livy session will have a related Spark session mapped to it (that is my current understanding; correct me if I am wrong).
The question is whether different JupyterHub users will share the same Livy session (and thus the same Spark session) or have different Livy sessions (and thus different Spark sessions)?
The only limited material I can find is:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub.html
see this arch pic here
Thanks very much in advance!
As far as I know (tested on an HDP distribution), by default the Livy server will create a different Spark driver, and so a different session, for each user. The server is reachable through a kerberized HTTP interface, so the user has to come with a valid ticket and the corresponding session will be run under his name. It seems to be the way to go since, in this case, the user will have access to his own resources (data, YARN queue and so on). In this case, the Livy server impersonates the user: it runs a Spark job as if it were the user (see Granting Livy the Ability to Impersonate).
By checking the docs I've seen that you can configure the Livy server in EMR in exactly the same way.
By default, YARN jobs submitted this way run as user livy, regardless of the user who initiated the job. By setting up user impersonation you can have the user ID of the notebook user also be the user associated with the YARN job. Rather than having jobs initiated by both shirley and diego associated with the user livy, jobs that each user initiates are associated with shirley and diego respectively. This helps you to audit Jupyter usage and manage applications within your organization.
So you have the choice to use impersonation (run as distinct users) or not (run as a single livy user).
