Connect to Neo4j Cluster via REST Interface - node.js

I am currently picking a database for my next project (node.js) and I have a question that needs to be answered before I can make a final decision.
I wanted to know if it is possible to set up an HA cluster and connect to it via the REST interface. The most likely solution that comes to mind is to put HAProxy in front so that it evenly distributes requests among the Neo4j servers. But what about transactions? HAProxy could easily send two requests that belong to the same transaction to different servers.

I believe that for writes to an HA cluster, writing to the slaves isn't any faster than writing directly to the master, since the slaves need to sync those writes with the master anyway.
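One way to address the transaction-routing concern above (an approach sketched here, not something from the original post) is to make HAProxy sticky, so that all requests from a given client land on the same Neo4j instance. A rough haproxy.cfg sketch; the names, IPs and ports are placeholders:

    frontend neo4j_http
        bind *:7474
        mode http
        default_backend neo4j_nodes

    backend neo4j_nodes
        mode http
        balance roundrobin
        # stick each client to one backend so requests of the same transaction hit the same node
        cookie NEO4J_NODE insert indirect nocache
        server neo4j1 10.0.0.1:7474 check cookie neo4j1
        server neo4j2 10.0.0.2:7474 check cookie neo4j2
        server neo4j3 10.0.0.3:7474 check cookie neo4j3

This only helps if the REST client keeps cookies between requests; balance source (stickiness by client IP) is a cruder alternative that needs no client support, and writes would typically still be routed to the master.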

Related

How to properly connect a client application to Scylla or Cassandra?

Let's say I have a 3-node ScyllaDB cluster in my local network (it could be an AWS VPC).
I have my Java application running in the same local network.
I am concerned about how to properly connect the app to the DB.
Do I need to specify all 3 IP addresses of the DB nodes for the app?
What if, over time, one or several nodes die and get resurrected on other IPs? Do I have to manually reconfigure the application?
How is it done properly in big real-world production cases with tens of DB servers, possibly in different data centers?
I would be very grateful for a code sample of how to connect a Java app to a multi-node cluster.
You need to specify contact points (you can use DNS names instead of IPs) for several nodes (usually 2-3); the driver will connect to one of them and discover all the nodes of the cluster after connecting (see the driver's documentation). Once the connection is established, the driver keeps a separate control connection open and receives through it information about nodes going up and down, joining or leaving the cluster, etc., so it is able to keep its view of the cluster topology up to date.
If you're specifying DNS names instead of IP addresses, it's better to set the configuration parameter datastax-java-driver.advanced.resolve-contact-points to false (see docs), so the names are resolved to IPs again on every reconnect instead of only once at application startup.
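Since a code sample was requested: below is a minimal connection sketch using the DataStax Java driver 4.x (Scylla also publishes a compatible fork of this driver). The addresses, port, datacenter name and query are placeholders for illustration, not values from the question.

    // Minimal sketch: connect with 2-3 contact points and let the driver discover the rest.
    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.ResultSet;
    import com.datastax.oss.driver.api.core.cql.Row;

    import java.net.InetSocketAddress;

    public class ClusterConnect {
        public static void main(String[] args) {
            try (CqlSession session = CqlSession.builder()
                    // 2-3 contact points are enough; the rest of the topology is discovered automatically
                    .addContactPoint(new InetSocketAddress("10.0.0.1", 9042))
                    .addContactPoint(new InetSocketAddress("10.0.0.2", 9042))
                    .withLocalDatacenter("datacenter1")   // must match the DC the app runs against
                    .build()) {

                ResultSet rs = session.execute("SELECT release_version FROM system.local");
                Row row = rs.one();
                System.out.println("Connected, server version: " + row.getString("release_version"));
            }
        }
    }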
Alex Ott's answer is correct, but I wanted to add a bit more background so that it doesn't look arbitrary.
The selection of the 2 or 3 nodes to connect to is described at
https://docs.scylladb.com/kb/seed-nodes/
However, going forward, Scylla is looking to move away from differentiating between Seed and non-Seed nodes. So, in future releases, the answer will likely be different. Details on these developments at:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/
Answering the specific questions:
Do I need to specify all 3 IP addresses of DB nodes for the app?
No. Your app just needs one to work. But it might not be a bad idea to have a few, just in case one is down.
What if over time one or several nodes die and get resurrected on other IPs?
As long as your app doesn't stop, it maintains its own version of gossip. So it will see the new nodes being added and connect to them as it needs to.
Do I have to manually reconfigure application?
If you're specifying IP addresses, yes.
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
By abstracting away the need for a specific IP, using something like Consul. If you wanted to, you could easily build a simple RESTful service to expose an inventory list or even the results of nodetool status.

Puppet: Is it possible to set up a master/client architecture without knowing my clients' addresses?

Situation: I have to prepare to manage a large number of remote servers.
Problem: They are going to be in different private networks. So I won't be able to reach them from outside, but they can easily reach my master node.
Is it sufficient that my client nodes know how to reach my master node in order for them to communicate?
Absolutely.
We have exactly that: an "all in cloud" server infrastructure spread over multiple cloud providers, plus a number of Puppet-managed workstations on different continents, one Puppet server responsible for hundreds of nodes, and an additional Puppet Dashboard server. They all communicate without any problems across the Internet.
Something similar to this: [Puppet Infrastructure diagram]
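Concretely, each agent only needs to be able to reach the master on TCP 8140 and to know its name; in the standard pull model the master never initiates connections to the agents. A minimal agent-side sketch (the hostname and path are examples; older installs use /etc/puppet/puppet.conf):

    # /etc/puppetlabs/puppet/puppet.conf on each client node
    [agent]
        server = puppet.example.com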

Redis deployment configuration - master slave replication

Currently I have two servers on which I have deployed a node.js/Express.js-based web services API. I am using Redis for caching JSON strings.
What would be the best option for deploying this setup into production? I see advice here to go with a dedicated server for Redis. OK, I'll take that and use a dedicated server for running the Redis master. Can I use the existing app servers as slave nodes? Note: these app servers are running a Node/Express application.
What other options do I have?
You can.
It all depends on the load that those other servers have; it's a problem of resource sharing. To be honest, my main issue with your architecture is not dedicated vs. non-dedicated servers, it's the fact that you are placing a Redis server (master or not) on a host that will most likely be facing the internet (the Express.js app), meaning it's quite exposed.
If you can simulate HTTP load on your Node/Express.js servers, compare the results of some benchmark tests run on your dedicated server vs. the non-dedicated ones.
On a running Redis server, type:
redis-benchmark -q -n 100000
If the app servers are being hammered and frequently using all cores, you should see a substantial difference in the benchmarks.
My suggestion: go ahead with your first setup, add monitoring of the Redis response times, and only act when you have to, which might be right away if the benchmarks show very poor results.
As a side note, consider not sharing hosts between services you expose to the internet and services that perform internal functions for your application.
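If you do go ahead and run replicas on the app servers, the replica side is just a couple of directives in redis.conf. A minimal sketch, with a placeholder master address (on Redis versions before 5 the directives are slaveof and slave-read-only):

    # redis.conf on each app server acting as a replica of the dedicated master
    replicaof 10.0.0.10 6379
    replica-read-only yes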

Master Slave Memcache replication

Scenario:
Our architecture is based on memcached, and with a growing user base we need to replicate the cache to multiple servers.
We were easily able to replicate between two servers using the repcached utility.
We want to load-balance memcached in such a manner that it works on shards which we define in our code.
The question here is how we can replicate to more than two servers.
We have changed the peer port for replication, but we are still not able to replicate keys.
As far as I have read, memcached with repcached is only capable of a 2-node replication setup.
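Since the question mentions defining shards in code: the usual way to spread keys across many memcached nodes is client-side sharding with consistent hashing, so each key lives on exactly one node and adding or removing a node only remaps a fraction of the keys (this is sharding, not replication). A minimal sketch using the spymemcached Java client; the hostnames are placeholders:

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.KetamaConnectionFactory;
    import net.spy.memcached.MemcachedClient;

    public class ShardedCacheExample {
        public static void main(String[] args) throws Exception {
            // Ketama consistent hashing spreads keys across the listed nodes
            MemcachedClient client = new MemcachedClient(
                    new KetamaConnectionFactory(),
                    AddrUtil.getAddresses("cache1:11211 cache2:11211 cache3:11211"));

            client.set("user:42", 3600, "{\"name\":\"Alice\"}");  // key is hashed to one shard
            Object value = client.get("user:42");                 // same key -> same shard
            System.out.println(value);

            client.shutdown();
        }
    }

If you also need redundancy within a shard, one option (not covered above) is to run a repcached pair per shard.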
Some other options out there:
alternative to memcached that can persist to disk
Good luck!

Keeping Multiple Servers in a Cluster In-Sync?

I'm currently managing a cluster of PHP-FPM servers, all of which tend to get out of sync with each other. The application I'm running on top of the app servers (Magento) allows admins to modify various files on the system, but now that the site is in a clustered setup, modifying a file only changes it on a single instance (one of the app servers) rather than on every machine in the cluster.
Is there an open-source application for Linux that may allow me to keep all of these servers in sync? I have no problem with creating a small VM instance that can listen for changes from machines to sync. In theory, the perfect application would have small clients that run on each machine to be synced, which would talk to the master server which would then decide how/what to sync from each machine.
I have already examined the possibility of running a centralized file server, but unfortunately my app servers are spread out between EC2 and physical machines, which makes this unfeasible. As there are multiple app servers (some of which are created dynamically depending on the load of the site), simply setting up an rsync cron job is not efficient: the cron job would have to be modified on each machine to send files to every other machine in the cluster, and that would just be a whole bunch of unnecessary data transfers and SSH connections.
I'm dealing with setting up a similar solution, and I'm halfway there. I would recommend you use lsyncd, which basically monitors the disk for changes and then immediately (or at whatever interval you want) syncs the files to a list of servers using rsync.
The only issue I'm having is keeping the server list up to date: since I can spin up additional servers at any time, I would need each machine in the cluster to be notified whenever a machine is added to or removed from the cluster.
I think lsyncd is a great solution that you should look into. The issue I'm having may turn out to be a problem for you as well, and that remains to be solved.
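For reference, a minimal lsyncd configuration sketch (paths and hostnames are examples, not from the original setup); lsyncd watches the source directory with inotify and pushes changes to each listed target over rsync/ssh, one sync block per target server:

    -- /etc/lsyncd/lsyncd.conf.lua
    settings {
        logfile    = "/var/log/lsyncd.log",
        statusFile = "/var/log/lsyncd.status",
    }

    -- repeat one sync block per app server that should receive the changes
    sync {
        default.rsyncssh,
        source    = "/var/www/magento",
        host      = "app2.example.com",
        targetdir = "/var/www/magento",
        delay     = 1,
    }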
Instead of keeping tens or hundreds of servers cross-synchronized, it would be much more efficient, more reliable, and above all simpler to maintain just one "admin node" and replicate changes from it to all your "worker nodes".
For instance, at our company we use a Development server -> Staging server -> Live backends workflow, where all changes are transferred across servers using a custom PHP + rsync front end. That allows the developers to push updates to a Staging server in the live environment, test the changes, and roll them out to the Live backends incrementally.
A similar approach could very well work in your case as well. Obviously it's not a plug-and-play solution, but I see it as the easiest way to go - both in terms of maintainability and scalability.
