Keeping replicas of Redis instances in sync? - linux

Is it possible to create replicas of a Redis instance? If yes, what is the overhead of keeping them in sync (apart from the network traffic)?

See the Redis documentation on setting up replication scenarios.
Since there is a delay between synchronisations, you'll probably need additional application-side logic to keep accesses that use the same data on the same server instance. In some cases you may also need to issue additional 'slaveof' commands to one instance in case another goes down or comes back up.
If you need more concrete info, you should elaborate a bit on your use case, i.e. in what environment you're using Redis (e.g. a Rails app cluster, a custom client, ...).
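As a concrete illustration of the 'slaveof' point, the re-pointing can be scripted from the application side. A minimal sketch, assuming the Node ioredis client; the host names and ports are placeholders, not values from the question:

```typescript
// Hedged sketch: re-point a replica with SLAVEOF using ioredis.
import Redis from "ioredis";

// Connection to the instance whose replication target we want to change.
const replica = new Redis({ host: "redis-replica.internal", port: 6379 });

// Tell this instance to replicate from a (new) master.
async function repointReplica(masterHost: string, masterPort: number): Promise<void> {
  await replica.call("SLAVEOF", masterHost, String(masterPort));
}

// Promote the instance to a standalone master, e.g. when the old master is gone.
async function promoteToMaster(): Promise<void> {
  await replica.call("SLAVEOF", "NO", "ONE");
}
```

In practice you would trigger these calls from whatever health-check logic decides that an instance has gone down or come back up.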

Related

Multiple instances of Cassandra on each node in the cluster

Is it possible to have a cluster in Cassandra where each of the servers is running multiple instances of Cassandra (each instance being part of the same cluster)?
I'm aware that if there's a single server in the cluster, it's possible to run multiple instances of Cassandra on it, but is it also possible to have multiple such servers in the cluster? If yes, what would the configuration look like (listen address, ports, etc.)?
Even if it is possible, I understand that there might not be any performance benefit at all; I just wanted to know whether it's theoretically possible.
Yes, it's possible, and such a setup is often used for testing, for example with CCM, although it creates multiple interfaces on the loopback (127.0.0.2, ...). DataStax Enterprise also has a so-called Multi-instance feature.
You need to configure your instances carefully, separating listen addresses, ports, etc. Right now, using Docker would potentially be the simpler way to implement it.
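As a rough illustration of that separation, here is a hedged sketch of the cassandra.yaml settings that typically have to differ between two instances on the same host; all addresses, ports and paths below are placeholders, not a tested configuration (the JMX port, configured outside cassandra.yaml, usually needs separating as well):

```yaml
# Instance 1 (sketch, placeholder values)
listen_address: 127.0.0.1
rpc_address: 127.0.0.1
native_transport_port: 9042
storage_port: 7000
data_file_directories: [/var/lib/cassandra/instance1/data]
commitlog_directory: /var/lib/cassandra/instance1/commitlog

# Instance 2 - bind to a second loopback/interface address, or keep one address
# and change the ports instead
listen_address: 127.0.0.2
rpc_address: 127.0.0.2
native_transport_port: 9042      # can stay the same if the address differs
storage_port: 7000
data_file_directories: [/var/lib/cassandra/instance2/data]
commitlog_directory: /var/lib/cassandra/instance2/commitlog
```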
But why do you need to do it? Unless you have a really beefy machine, with a lot of RAM and multiple SSDs, this won't bring you additional performance.
Yes, it is possible; I have even worked with 5 instances running on one server in a production cluster.
Trust me, it is still running, but the recurring issues I had were constantly high GC, dropped mutations and high latency, so of course it is not good to have this kind of setup.
But to answer your question: yes, it is possible, and it can be run in production as well.

Concurrent users without database

I can't seem to get this concept right in my head. If I have a website that gets 1 million concurrent users, without any databases at all, will I need to scale? I'm using Node.js and Socket.IO. Also, is there a way I could simulate something like this on my localhost?
Having one million users, or connections, on Socket.IO doesn't by itself mean you have to scale, but depending on what they are doing, you probably will. Having a database adds storage but has nothing more to do with the need to scale the Node.js server.
You can create a test that opens as many connections as you want in a loop and then emits an event for each of them.
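A minimal sketch of such a test loop, assuming the socket.io-client package; the URL, connection count and event name are placeholders:

```typescript
// Hedged load-simulation sketch: open many Socket.IO connections and emit an event on each.
import { io } from "socket.io-client";

const TARGET = "http://localhost:3000";   // placeholder URL of the server under test
const CONNECTIONS = 1000;                 // raise gradually; one test box has its own limits

for (let i = 0; i < CONNECTIONS; i++) {
  const socket = io(TARGET, { transports: ["websocket"] });
  socket.on("connect", () => {
    socket.emit("chat message", `hello from client ${i}`);  // placeholder event name
  });
}
```

Keep in mind that a single load-generating machine can only open so many outbound connections, so simulating anything close to a million users realistically needs several generator machines or a dedicated load-testing tool.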
For scaling Node you can use the cluster module. A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load. https://nodejs.org/api/cluster.html#cluster_cluster
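The pattern from the linked documentation, adapted as a sketch to this scenario; it assumes Node 16+ (where isMaster became isPrimary), and the port is a placeholder:

```typescript
// Sketch of the cluster pattern: the primary forks one worker per CPU core,
// and incoming connections are distributed across the workers.
import cluster from "node:cluster";
import { cpus } from "node:os";
import http from "node:http";

if (cluster.isPrimary) {
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
  cluster.on("exit", (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a replacement`);
    cluster.fork();
  });
} else {
  // Each worker runs its own server instance.
  http.createServer((req, res) => res.end("ok")).listen(3000);
  console.log(`worker ${process.pid} started`);
}
```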
To simulate high load, there are open source tools you can use for free: http://www.opensourcetesting.org/category/performance/

Make Node/MEANjs Highly Available

I'm probably opening up a can of worms with regard to how many hundreds of directions can be taken with this, but I want high availability / disaster recovery with my MEANjs servers.
Right now, I have 3 servers:
MongoDB
App (Grunt'ing the main application; this is the front-end server)
A third server for other processing on the back-end
So at the moment, if I reboot my MongoDB server (or more realistically, it crashes for some reason), I suddenly see this in my App server terminal:
MongoDB connection error: Error: failed to connect to [172.30.3.30:27017]
[nodemon] app crashed - waiting for file changes before starting...
After MongoDB is back online, nothing happens on the app server until I re-grunt.
What's the best practice for this situation? You can see in the error that I'm using nodemon to monitor changes to the app. I bet upon init I could get my MongoDB server to update a file on the app server within nodemon's view to force a restart? Or is there some other tool I can use for this? Or should I be handling my connections to the db server more gracefully so the app doesn't "crash"?
Is there a way to redirect to a secondary MongoDB in case the primary isn't available? This would be more apt for HA/DR type stuff.
I would like to start with a side note: Given the description in the question and the comments to it, I am not convinced that using AWS is a wise option. A PaaS provider like Heroku, OpenShift or AppFog seems to be more suitable, especially when combined with a MongoDB service provider. Running MongoDB on EBS can be quite a challenge when you are new to MongoDB. And pretty expensive, too, as soon as you need provisioned IOPS.
Note: In the following paragraphs, I have simplified a few things for the sake of comprehensibility.
If you insist on running it on your own, however, you have an option. MongoDB itself comes with means of automatic, transparent failover, called a replica set.
A minimal replica set consists of two data-bearing nodes and a so-called arbiter. Write operations go only to the node currently elected "primary", and reads do, too, unless you explicitly allow or request reads to be performed on the current "secondary". The secondary constantly syncs to the primary. If the current primary goes down for some reason, the former secondary is elected primary.
The arbiter is there so that there is always a quorum (a qualified majority would be an equivalent term) of members to elect the current secondary to be the new primary. This quorum is mainly important for edge cases, but since you cannot rule out these edge cases, an uneven number of members is a hard requirement for a MongoDB replica set (setting aside some special cases).
The beauty of this is that almost all drivers, and the Node.js one for sure, are replica-set aware and handle the failover procedure quite gracefully. They simply send the reads and writes to the new primary, without any change needed anywhere else.
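With Mongoose, which MEAN stacks typically use, being replica-set aware is mostly a matter of listing the members in the connection string. A hedged sketch; the host names, database name and replica set name are placeholders:

```typescript
// Sketch: connect to a replica set so the driver can follow primary elections.
import mongoose from "mongoose";

const uri =
  "mongodb://mongo-a.internal:27017,mongo-b.internal:27017/myapp?replicaSet=rs0";

mongoose.connect(uri).catch((err) => console.error("initial connect failed", err));

// The driver re-routes operations to the newly elected primary on failover;
// these events are just for visibility.
mongoose.connection.on("disconnected", () => console.log("lost MongoDB connection"));
mongoose.connection.on("reconnected", () => console.log("reconnected to MongoDB"));
```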
You only need to deal with some cases during the failover process itself. Without going into much detail: you basically check for certain errors in the corresponding callbacks and redo the operation if you encounter one of those errors and redoing the operation is feasible.
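A hedged sketch of that check-and-redo idea; the error heuristic is deliberately crude (real code would inspect driver error codes) and the retry count and delay are arbitrary:

```typescript
// Retry an operation that may hit the brief window during a primary election.
function looksTransient(err: unknown): boolean {
  // "not master" / "not primary" style errors typically show up mid-failover.
  return err instanceof Error && /not (master|primary)|ECONNRESET/i.test(err.message);
}

async function withRetry<T>(op: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      if (!looksTransient(err)) throw err;
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, 1000)); // wait out the election
    }
  }
  throw lastError;
}

// Usage (hypothetical model): await withRetry(() => User.create({ name: "alice" }));
```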
As you might have noticed, the third member, the arbiter, does not hold much data. It is a very lightweight process and can basically run on the cheapest instance you can find.
So you have data replication and automatic, transparent failover with relative ease, at the cost of the cheapest VM you can find, since you would need two data-bearing nodes anyway with any other means.

How to share Azure Redis Cache between environments?

We want to save a few bucks and share our 1GB dedicated Azure Redis Cache between Development, Test, QA and maybe even production.
Is there a better way than prefixing all keys with an environment string like "Dev_[key]", "Test_[key]", etc.?
We are using the StackExchange Redis client for .NET.
PS: We tried using the cheap 250MB shared-infrastructure tier, but had very slow performance. Read operations were consistently between 600-800 ms... without any load (for a ~300KB object). Upgrading to the dedicated 1GB service changed that to 30-40 ms. See more here: StackExchange.Redis with Azure Redis is unusably slow or throws timeout errors
One approach is to use multiple Redis databases. I'm assuming this is available in your environment :)
Some advantages over prefixing your keys might be:
data is kept separate, you can flushdb in test and not touch the production data
keys are smaller and consume less memory
The main disadvantage would be not taking advantage of multiple cores, as you could if you ran multiple instances of Redis on the same server. Obviously not an issue in this case. Also note that this feature is not deprecated, as one of the answers suggests.
Another thing I've seen people complain about is that databases are numbered, they don't have meaningful names. Some people create a hash in database 0 that maps each number to a name.
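The OP is on StackExchange.Redis, where selecting a database is just GetDatabase(n), but the idea is the same in any client. A sketch with the Node ioredis client; the host, port and environment-to-database mapping are placeholders:

```typescript
// Sketch: one logical database per environment on a shared Redis cache.
import Redis from "ioredis";

const DB_BY_ENV: Record<string, number> = { prod: 0, dev: 1, test: 2, qa: 3 };
const env = process.env.APP_ENV ?? "dev";

const redis = new Redis({
  host: "mycache.redis.cache.windows.net",  // placeholder cache name
  port: 6380,
  tls: {},                                  // Azure Redis normally requires TLS on 6380
  password: process.env.REDIS_KEY,
  db: DB_BY_ENV[env],                       // SELECT is issued for you on connect
});

// Wiping the test data never touches the other environments' databases.
async function resetTestData(): Promise<void> {
  await redis.flushdb();
}
```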
Here is another idea to save some bucks: use separate Redis cache machines for each environment - no problems with the keys then - but stop them when you don't use them, e.g. on weekends and during nights. Probably more than 50% of the time you are not using them. I think it would be easy to start and stop them with a PowerShell script; we are using AWS and there it is possible.
Now, from what I see, Redis persistence in Azure is not enabled, but they have started working on it: http://feedback.azure.com/forums/169382-cache/status/191763 - it would be nice to take an RDB snapshot before stopping and load it again on start. So if you need to save some values and reload them on start, you have to do it manually (with your own service).
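A hedged sketch of that "do it manually" approach: persist a known set of keys before stopping the cache and restore them on start. The client (ioredis), key names and file path are placeholders:

```typescript
// Sketch: manual backup/restore of selected keys around cache stop/start.
import { promises as fs } from "node:fs";
import Redis from "ioredis";

const redis = new Redis({ host: "mycache.internal", port: 6379 }); // placeholder host
const KEYS_TO_KEEP = ["config:flags", "counters:daily"];           // placeholder keys

async function backup(path = "redis-backup.json"): Promise<void> {
  const entries: Record<string, string | null> = {};
  for (const key of KEYS_TO_KEEP) entries[key] = await redis.get(key);
  await fs.writeFile(path, JSON.stringify(entries));
}

async function restore(path = "redis-backup.json"): Promise<void> {
  const entries: Record<string, string | null> = JSON.parse(await fs.readFile(path, "utf8"));
  for (const [key, value] of Object.entries(entries)) {
    if (value !== null) await redis.set(key, value);
  }
}
```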

Node.js and redis on same server?

We're using cloud hosting (Linode) to host a node.js based (and socket.io) chat app, with redis as the main DB. We haven't launched yet, but we're looking at hosting redis and node.js on the same machine (8 GB instance, redis limited to 5 GB for instance). All communication will be held in redis (i.e. straight from client to redis, no variables for dialog in node.js). To avoid network travel times amongst other bottlenecks, we are looking at hosting redis and node.js on the same server. I can't find anything in the documentation that would state this is a bad idea, but our sysops guy isn't convinced. Are there any drawbacks to going down this route?
Refer to a very similar answer I posted on a similar matter on SO: Redis deployment configuration - master slave replication, where the OP was having a similar issue but his concern was more related to performance.
My main issue with your solution (simply a side note to the other answer) is the simple fact that your node.js application by design has to be internet-facing, whereas your Redis or other databases shouldn't be.
It doesn't mean you'll have a security issue by all means, but in my opinion it's a best practice to only expose the hosts you really need to, e.g., the ones usually serving content directly to the user.
By not deploying Redis to an internet-facing host you enforce a lot of security constraints simply through the topology design of your network.
Is it OK to host those services on the same box?
Yes, but do run benchmarks from time to time to check whether you need to scale horizontally or just bump up the flooded hosts.
Check: Redis deployment configuration - master slave replication
Will I have security problems by having Redis or another service facing the internet?
If you know what you are doing - no, you won't have security issues. I still wouldn't do it though.
There is very little speed increase in cutting out the "network bottleneck"; Redis is so fast that the latency is negligible. The issue with that approach is that if your box crashes, all of your data held in Redis is gone unless you are replicating to a slave. If it goes down, any other process not on that machine won't be able to access Redis until it restarts. This may be fine for a staging environment, but I'd guard against it in production.

Resources