Master-slave memcached replication - Linux

Scenario:
Our architecture is based on memcached, and with a growing user base we need to replicate the cache to multiple servers.
We were easily able to replicate between two servers using the repcached utility.
We want to load-balance memcached in such a way that it works with the shards we define in our code.
The question here is how we can replicate to multiple servers.
We have changed the peer port for replication, but we are still unable to replicate keys.

As far as I have been reading, memcached with repcached is only capable of a two-node replication setup.
Some other options out there:
alternative to memcached that can persist to disk
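To make the sharding part of the scenario concrete: the usual pattern is to shard purely on the client side, mapping each key deterministically to one memcached server (each shard could itself be a repcached master/backup pair). A minimal TypeScript sketch of such a key-to-shard mapping - the host list is hypothetical, and a consistent-hashing ring would reduce key remapping when shards are added or removed:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shard list; each entry could be a repcached master/backup pair.
const shards = ["cache1:11211", "cache2:11211", "cache3:11211"];

// Deterministically map a key to one shard (simple hash-modulo scheme).
function shardFor(key: string): string {
  const digest = createHash("md5").update(key).digest();
  const bucket = digest.readUInt32BE(0) % shards.length;
  return shards[bucket];
}

console.log(shardFor("user:42")); // the same key always lands on the same shard
```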
Good luck!

Related

How to properly connect client application to Scylla or Cassandra?

Let's say I have a cluster of 3 ScyllaDB nodes in my local network (it could be an AWS VPC).
I have my Java application running in the same local network.
I am not sure how to properly connect the app to the DB.
Do I need to specify all 3 IP addresses of the DB nodes for the app?
What if, over time, one or several nodes die and get resurrected on other IPs? Do I have to manually reconfigure the application?
How is it done properly in big, real production setups with tens of DB servers, possibly in different data centers?
I would be very grateful for a code sample of how to connect a Java app to a multi-node cluster.
You need to specify contact points (you can use DNS names instead of IPs) - several nodes, usually 2-3. The driver will connect to one of them and, after connecting, will discover all the nodes of the cluster (see the driver's documentation). Once the connection is established, the driver keeps a separate control connection open, through which it receives information about nodes going up and down, joining or leaving the cluster, and so on, so it can keep its view of the cluster topology up to date.
If you're specifying DNS names instead of IP addresses, it's better to set the configuration parameter datastax-java-driver.advanced.resolve-contact-points to false (see the docs), so the names are resolved to IPs on every reconnect instead of only once at application startup.
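The question asks for a Java sample; in the Java driver this boils down to passing those contact points and the local datacenter to CqlSession.builder(). As an illustration of the same idea, here is a minimal sketch using the Node.js DataStax driver (cassandra-driver), which works with both Cassandra and ScyllaDB; the addresses, datacenter name, and keyspace are placeholders:

```typescript
import { Client } from "cassandra-driver";

// Two or three contact points are enough; after connecting, the driver
// discovers the rest of the cluster and tracks topology changes on its own.
const client = new Client({
  contactPoints: ["10.0.0.1", "10.0.0.2", "10.0.0.3"], // placeholder IPs or DNS names
  localDataCenter: "datacenter1",                       // placeholder DC name
  keyspace: "my_keyspace",                              // placeholder keyspace
});

async function main(): Promise<void> {
  await client.connect();
  const rs = await client.execute("SELECT release_version FROM system.local");
  console.log("Connected, server version:", rs.first().get("release_version"));
  await client.shutdown();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```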
Alex Ott's answer is correct, but I wanted to add a bit more background so that it doesn't look arbitrary.
The selection of the 2 or 3 nodes to connect to is described at
https://docs.scylladb.com/kb/seed-nodes/
However, going forward, Scylla is looking to move away from differentiating between Seed and non-Seed nodes. So, in future releases, the answer will likely be different. Details on these developments at:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/
Answering the specific questions:
Do I need to specify all 3 IP addresses of DB nodes for the app?
No. Your app just needs one to work. But it might not be a bad idea to have a few, just in case one is down.
What if over time one or several nodes die and get resurrected on other IPs?
As long as your app doesn't stop, it maintains its own version of gossip. So it will see the new nodes being added and connect to them as it needs to.
Do I have to manually reconfigure application?
If you're specifying IP addresses, yes.
How is it done properly in big real production cases with tens of DB servers, possibly in different data centers?
By abstracting away the need for a specific IP, using something like Consul. If you wanted to, you could easily build a simple RESTful service to expose an inventory list or even the results of nodetool status.
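As a sketch of that last suggestion, a tiny inventory endpoint could look like this (TypeScript, using only Node's built-in http module; the node list and port are made up, and in practice the list would be fed from Consul, nodetool status output, or your own health checks):

```typescript
import http from "node:http";

// Hypothetical list of currently-live database nodes; in a real setup this
// would be refreshed from Consul, nodetool status output, or health checks.
const liveNodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"];

// Applications query this endpoint at startup to obtain fresh contact points.
http
  .createServer((req, res) => {
    if (req.url === "/nodes") {
      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify(liveNodes));
    } else {
      res.statusCode = 404;
      res.end();
    }
  })
  .listen(3000);
```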

Cassandra data clone to another Cassandra database (different servers)

My question is as mentioned in the title: I have a Cassandra database and want to use this data on another server. How can I move all of a keyspace's data?
I have snapshots, but I don't know whether I can restore them on another server.
Thanks for your help.
Unfortunately, you have limited options for moving data across clouds: primarily the COPY command or sstableloader (https://docs.datastax.com/en/cassandra/2.1/cassandra/migrating.html). Alternatively, if you plan to maintain a like-for-like setup (same number of nodes) across clouds, simply copying the snapshots under the data directory would work.
If you are moving to IBM SoftLayer, you may be able to use software-defined storage solutions that are deployed on bare metal and provide features like thin clones, which allow you to create clones of Cassandra clusters in a matter of minutes with incredible space savings. This is rather useful for creating clones for dev/test purposes. Check out Robin Systems; you may find them interesting.
The cleanest way to migrate your data from one cluster to another is using the sstableloader tool. This will allow you to stream the contents of your sstables from a local directory to a remote cluster. In this case the new cluster can also be configured differently and you also don't have to worry about assigned tokens.

Why Cluster NodeJS vs Docker, And What About Einhorn?

I have been a fan of the ease with which I can create/compose application functionality using NodeJS. NodeJS, to me, is easy.
When looking at how to take advantage of multi-core machines (and then also considering the additional complexity of port specific apps - like a web app on 80/443), my original solutions looked at NodeJS Cluster (or something like pm2) and maybe a load balancer.
But I'm wondering what would be the downside (or the reason why it wouldn't work) of instead running multiple containers (to address the multi-core situation) and then load balancing across their respective external ports? Past that, would it be better to just use Einhorn or... how does Einhorn fit into this picture?
So, the question is - for NodeJS only (because I'm also thinking about Go) - am I correct in considering "clustering" vs "multiple docker containers with load balancing" as two possible ways to utilize multiple cores?
As a separate question, is Einhorn just an alternative third-party way to achieve the same thing as NodeJS clustering (and one which could also be used to load balance a Go app, for example)?
Docker is starting to take on more and more of the clustering and load-balancing aspects we used to handle independently, either directly or by idiomatic usage patterns. With NodeJS for example, you can have one nginx or haproxy container load balance between multiple NodeJS containers. I prefer using something like fig, and also setting the restart-policy so that the containers are restarted automatically. This removes the need for other clustering solutions in most cases.
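For reference, the built-in NodeJS Cluster approach mentioned in the question is roughly the following (TypeScript sketch, hypothetical port 8080). The Docker approach replaces this in-process forking with one container per core behind an nginx or haproxy container:

```typescript
import cluster from "node:cluster";
import http from "node:http";
import os from "node:os";

if (cluster.isPrimary) {
  // Fork one worker per CPU core and replace any worker that dies.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on("exit", () => cluster.fork());
} else {
  // All workers listen on the same port; the primary distributes connections.
  http
    .createServer((req, res) => {
      res.end(`handled by worker pid ${process.pid}\n`);
    })
    .listen(8080);
}
```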

Sync mongodb instances

What is the best solution to sync a MongoDB instance on a local server with a dynamic IP (set by the ISP) with a MongoDB instance on a public server (e.g. Amazon AWS)? Can I do that from node.js?
You can do this in a number of ways, but first to address the public/dynamic IP issue you will want to either use a hostname --> IP address mapping that you maintain (/etc/hosts or your own DNS servers) or look into one of the dynamic DNS solutions.
Once you have the changing IP address problem solved, the question is how to keep the systems in sync. The most obvious way is to have the two nodes in a replica set - if your connection is reliable enough this might work, though you will probably want to put an arbiter locally or remotely on whichever side of the connection you want to take writes when the connection is flaky (in a 2-node set, if either node is down, both are secondary and cannot take writes).
Another option is to use mongo-connector, which lets you sync to arbitrary destinations, including another MongoDB instance.
That project will give you a pretty good idea of what you need to do (in Python) to provide such a syncing service. You would need to write something similar in node.js to achieve a proper sync; essentially, you need to tail the oplog on one host and apply it to the other on a regular basis, depending on your requirements.
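To make the oplog-tailing suggestion concrete, here is a rough TypeScript sketch using the official mongodb Node.js driver. It only prints operations; applying them to the remote instance and resuming from the last applied timestamp are left out. The connection string is a placeholder, and the local instance must run as a (possibly single-node) replica set so that local.oplog.rs exists:

```typescript
import { MongoClient } from "mongodb";

async function tailOplog(): Promise<void> {
  const source = new MongoClient("mongodb://localhost:27017"); // placeholder URI
  await source.connect();
  const oplog = source.db("local").collection("oplog.rs");

  // Start from the newest entry currently in the oplog.
  const latest = await oplog.find().sort({ $natural: -1 }).limit(1).next();

  // tailable + awaitData keeps the cursor open and waits for new entries.
  const cursor = oplog.find(latest ? { ts: { $gt: latest.ts } } : {}, {
    tailable: true,
    awaitData: true,
    noCursorTimeout: true,
  });

  for await (const op of cursor) {
    // op.op is the operation type (i/u/d), op.ns the namespace, op.o the payload.
    console.log(op.ns, op.op, op.o);
    // ...translate and apply the operation to the remote instance here...
  }
}

tailOplog().catch(console.error);
```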

Connect to Neo4j Cluster via Rest-Interface

I am currently picking a database for my next project (node.js) and I have a question that needs to be answered before I can make a final decision.
I wanted to know if it is possible to set up an HA cluster and connect to it via the REST interface. The most likely solution that comes to mind is to put HAProxy in front so that it evenly distributes the requests among the Neo4j servers. But what about transactions? HAProxy could easily send two requests that belong to the same transaction to different servers.
I believe that for writes to an HA cluster, writing to the slaves isn't any faster than writing directly to the master, since the slaves need to sync with the master.
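On the transaction-routing concern: one way to sidestep it is to not spread a transaction over multiple requests at all, and instead send the whole transaction in a single POST to the transactional Cypher endpoint, which begins and commits it in one round trip. A rough TypeScript sketch - the /db/data/transaction/commit path is the Neo4j 2.x/3.x REST endpoint, and the HAProxy hostname is made up; multi-request transactions would instead need sticky routing to a single server:

```typescript
// Send a whole transaction (one or more statements) in a single request, so a
// load balancer such as HAProxy cannot split it across different Neo4j servers.
async function runTransaction(statements: string[]): Promise<unknown> {
  const res = await fetch("http://haproxy.internal:7474/db/data/transaction/commit", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json",
      // An Authorization header would go here if auth is enabled.
    },
    body: JSON.stringify({
      statements: statements.map((statement) => ({ statement })),
    }),
  });
  if (!res.ok) throw new Error(`Neo4j request failed: ${res.status}`);
  return res.json();
}

// Example: a write and a read committed together in one round trip.
runTransaction([
  "CREATE (n:Person {name: 'Alice'})",
  "MATCH (n:Person) RETURN count(n)",
]).then((result) => console.log(JSON.stringify(result, null, 2)));
```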
