How to properly connect client application to Scylla or Cassandra? - cassandra

Let's say I have a cluster of 3 nodes for ScyllaDB in my local network (it can be AWS VPC).
I have my Java application running in the same local network.
I am concerned how to properly connect app to DB.
Do I need to specify all 3 IP addresses of DB nodes for the app?
What if over time one or several nodes die and get resurrected on other IPs? Do I have to manually reconfigure application?
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
I would be much grateful for a code sample of how to connect Java app to multi-node cluster.

You need to specify contact points (you can use DNS names instead of IPs) - several nodes (usually 2-3), and driver will connect to one of them, and will discover the all nodes of the cluster after connection (see the driver's documentation). After connection is established, driver keeps the separate control connection opened, and via it receives the information about nodes that are going up & down, joining or leaving the cluster, etc., so it's able to keep information about cluster topology up-to-date.
If you're specifying DNS names instead of the IP addresses, then it's better to specify configuration parameter datastax-java-driver.advanced.resolve-contact-points as true (see docs), so the names will be resolved to IPs on every reconnect, instead of resolving at the start of application.

Alex Ott's answer is correct, but I wanted to add a bit more background so that it doesn't look arbitrary.
The selection of the 2 or 3 nodes to connect to is described at
https://docs.scylladb.com/kb/seed-nodes/
However, going forward, Scylla is looking to move away from differentiating between Seed and non-Seed nodes. So, in future releases, the answer will likely be different. Details on these developments at:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/

Answering the specific questions:
Do I need to specify all 3 IP addresses of DB nodes for the app?
No. Your app just needs one to work. But it might not be a bad idea to have a few, just in case one is down.
What if over time one or several nodes die and get resurrected on other IPs?
As long as your app doesn't stop, it maintains its own version of gossip. So it will see the new nodes being added and connect to them as it needs to.
Do I have to manually reconfigure application?
If you're specifying IP addresses, yes.
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
By abstracting the need for a specific IP, using something like Consul. If you wanted to, you could easily build a simple restful service to expose an inventory list or even the results of nodetool status.

Related

NodeJS TLS/TCP server in need of an external firewall

Problem:
I have an AWS EC2 instance running FreeBSD. In there, I'm running a NodeJS TLS/TCP server. I'd like to create a set of rules (in my NodeJS application) to be able to individually block IP addresses programmatically based on a few logical conditions.
I'd like to run an external (not on the same machine/instance) firewall or load-balancer, that I can control from NodeJS programmatically, such that when certain conditions are given, I can block a specific remote-address(IP) before it reaches the NodeJS instance.
Things I've tried:
I have initially looked into nginx as an option, running it on a second instance, and placing my NodeJS server behind it, but after skimming through the NGINX
Cookbook
Advanced Recipes for High Performance
Load Balancing I've learned that only the NGINX Plus (the paid version) allows for remote/API control & customization. While I believe that paying $3500/license is not too much (considering all NGINX Plus' features), I simply can not afford to buy it at this point in time; in addition the only feature I'd be using (at this point) would be the remote API control and the IP address blocking.
My second thought was to go with the AWS/ELB (elastic-load-balancer) by integrating AWS' SDK into my project. That sounded feasible, unfortunately, after reading a few forum threads and part of their documentation (unless I'm mistaken) it seems these two features I need are not available on the AWS/ELB. AWS seems to offer an entire different service called WAF that I honestly don't understand very well (both as a service and from a feature-stand-point).
I have also (briefly) looked into CloudFlare, as it was recommended in one of the posts, here on Sackoverflow, though I can't really tell if their firewall would allow this level of (remote) control.
Question:
What are my options? What would you guys recommend I did?
I think Nginx provide such kind of functionality please refer to link
If you want to block an IP with Node TCP you can just edit a nginx config file and deny IP address.
Frankly speaking, If I were you, I would use AWS WAF but if you don’t want to use it, you can simply use Node JS
In Node JS You should have a global array variable where you will store all blocked IP addresses and upon connection, you will check whether connected host IP is in blocked IP variable. However there occurs a problem when machine or application is restarted, you will lose all information about blocked IP-s. So as a solution to that you can just setup Redis (It is key-value database but there are also other datatypes) DB and store blocked IP-s there. Inasmuch as Redis DB is in RAM all interaction with DB will be instantly and as long as machine or node is restarted, Redis makes a backup on hard drive and it syncs from it and continue to work in RAM with old databases.

Puppet: Is it possible to set a master/client architecture without knowing my clients addressess?

Situation: I have to prepare to manage a large amount of remote servers.
Problem: They are going to be in different private networks. So I won't be able to reach them from outside, but they can easily reach my master node.
Is it sufficient that my client nodes know how to reach my master node in order for them to communicate?
Absolutely.
We have exactly that, "all in cloud" server infrastructure on multiple cloud providers, plus numbers of puppet manageable workstations on different continents and one puppet server responsible for hundreds of nodes and additional puppet dashboard server. They all communicate without any problems across Internet.
Something similar to this:
Puppet Infrastructure

Sync mongodb instances

What is the best solution to sync a mongodb instance in local server with dynamic IP (set by ISP) with a mongodb instance in public server (eg. Amazon AWS)? Can i do that from node.js ?
You can do this in a number of ways, but first to address the public/dynamic IP issue you will want to either use a hostname --> IP address mapping that you maintain (/etc/hosts or your own DNS servers) or look into one of the dynamic DNS solutions.
Once you have the changing IP address problems solved, the question is how to keep the systems in sync. The most obvious way is to have the two nodes in a replica set - if your connection is reliable enough this might work, though you will probably want to put an arbiter locally or remotely for whatever side of the connection you want to do writes on when the connection is flakey (in a 2 node set, if either node is down then they are both secondary and cannot take writes).
Another option is to use the mongo connector which lets you sync to arbitrary destinations, including another MongoDB instance.
That project will give you a pretty good idea of what you need to do (in python) to provide such a syncing service. You will need to write something similar in node.js to achieve a proper sync and essentially you will need to tail the oplog on one host and apply it to the other on a regular basis, depending on your requirements.

Connect to Neo4j Cluster via Rest-Interface

I am currently picking a database for my next project (node.js) and I have a question that needs to be answered before I can make a final decision.
I wanted to know if it was possible to setup a HA cluster and connect to it via the REST interface. The most likely solution that comes to my mind is to put HAproxy in front so that it evenly distributes the requests amongst the Neo4j servers. But what about transactions because HAproxy could easily send two requests that belong to the same transaction to different servers.
I believe for writing to an HA cluster, writing to the slaves isn't any faster than just writing to the master since the slaves need to sync with the master.

Separate service on NodeJS server

I want to know how to structure my NodeJS server.
I want to separate services proposed on my website to mount cluster in the future and to have many servers (each allowed to one special task).
Example :
The 'main' server which have one project : ExpressJS and Database
The 'communication server' which have one project : Chat + Forum
Others projects : For complex computing (generating chart / stats / emailing)
Could you explain me different approach for this type of complex website ?
Like Benjamin Gruenbaym said, the architecture belongs somewhere else.
If you are wondering about how to setup the applications on an individual server, there are a few things to keep in mind.
NodeJS runs in a single process, so it should ideally take up 1 core of the CPU. If you run a database on the same server, that is another core. So it may be fine to host all node applications on the same server, if it has a sufficient number of cores.
To run two different Node processes on the same machine, you simply start them one after another, but make sure that they listen on different ports.
To make sure that you can scale out your application later, it is important that you use domain names, instead of IP adresses when you identify your services to each other. So the nodeJS app should know about the database as mydatabase.mycompany.com, not as 192.168.1.10 or any other ip address. This will allow you to later move the database to another network address or to use a load balancer.

Resources