Keeping Multiple Servers in a Cluster In-Sync? - linux

I'm currently managing a cluster of PHP-FPM servers, all of which tend to drift out of sync with each other. The application running on top of the app servers (Magento) lets admins modify various files on the system, but now that the site is in a clustered setup, modifying a file only changes it on a single instance (one of the app servers) rather than on every machine in the cluster.
Is there an open-source application for Linux that would let me keep all of these servers in sync? I have no problem with creating a small VM instance that can listen for changes from the machines being synced. In theory, the perfect application would have a small client running on each machine to be synced, which would talk to a master server that decides how and what to sync between the machines.
I have already looked at running a centralized file server, but unfortunately my app servers are spread out between EC2 and physical machines, which makes this unfeasible. As there are multiple app servers (some of which are created dynamically depending on the load of the site), simply setting up an rsync cron job is not efficient either: the job would have to be modified on each machine to send files to every other machine in the cluster, which would mean a whole lot of unnecessary data transfers and SSH connections.

I'm dealing with setting up a similar solution and I'm halfway there. I would recommend lsyncd, which basically monitors the disk for changes and then immediately (or at whatever interval you want) syncs the files to a list of servers using rsync.
The only issue I'm having is keeping the server list up to date. Since I can spin up additional servers at any time, each machine in the cluster needs to be notified whenever a machine is added to or removed from the cluster.
I think lsyncd is a great solution that you should look into. The issue I'm having may turn out to be a problem for you as well, and it remains to be solved.
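If it helps, here is a rough sketch of how I drive it, assuming lsyncd 2.x (which offers the -rsyncssh command-line shortcut) and a plain text file listing the target hosts; the file name and paths are just placeholders. Real setups usually move this into a single lsyncd config file with one sync block per target.

    #!/bin/bash
    # Start one lsyncd watcher per target host listed in /etc/cluster/servers.txt
    # (one hostname per line). Each instance watches SRC and pushes changes to
    # that host over rsync+ssh as soon as they appear on disk.
    SRC=/var/www/magento
    DEST=/var/www/magento

    while read -r host; do
        [ -z "$host" ] && continue
        lsyncd -rsyncssh "$SRC" "$host" "$DEST"
    done < /etc/cluster/servers.txt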

Instead of keeping tens or hundreds of servers cross-synchronized, it would be much more efficient, more reliable, and above all simpler to maintain just one "admin node" and replicate changes from it to all of your "worker nodes".
For instance, at our company we use a Development server -> Staging server -> Live backends workflow, where all changes are transferred between servers by a custom PHP + rsync front end. That lets the developers push updates to a Staging server in the live environment, test the changes, and roll them out to the Live backends incrementally.
A similar approach could work very well in your case too. Obviously it's not a plug-and-play solution, but I see it as the easiest way to go, both in terms of maintainability and scalability.
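To make that concrete, here is a minimal sketch of the push step from the admin node, assuming key-based SSH and a hard-coded worker list (our actual front end is more involved; the hostnames and paths are placeholders):

    #!/bin/bash
    # Push the admin node's document root to every worker node.
    SRC=/var/www/magento/
    for host in worker1 worker2 worker3; do
        # -a preserves permissions/timestamps, --delete removes stale files
        rsync -az --delete -e ssh "$SRC" "deploy@$host:/var/www/magento/"
    done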

Related

Puppet: Is it possible to set up a master/client architecture without knowing my clients' addresses?

Situation: I have to prepare to manage a large number of remote servers.
Problem: They are going to be on different private networks, so I won't be able to reach them from outside, but they can easily reach my master node.
Is it sufficient that my client nodes know how to reach my master node in order for them to communicate?
Absolutely.
We have exactly that: an "all in cloud" server infrastructure on multiple cloud providers, plus a number of Puppet-managed workstations on different continents, one Puppet master responsible for hundreds of nodes, and an additional Puppet Dashboard server. They all communicate across the Internet without any problems.
Something similar to this: [Puppet Infrastructure diagram]
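For what it's worth, the only thing each node needs is outbound access to the master (TCP 8140 by default). A quick check from any client node looks roughly like this; the master hostname is just a placeholder:

    # Run on a client node: fetch and apply the catalog from the master.
    # Only an outbound connection to the master on port 8140 is required.
    puppet agent --test --server puppet.example.com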

IIS (Win2012) on EC2 with auto scaling

We are looking at moving around 100 websites that we have on a dedicated web server at our current hosting company, and hosting these sites on an EC2 Windows 2012 server.
I've looked at the types of EC2 instances available. Am I better off going for an m1.small (or a t1.micro with auto scaling)? With regard to auto scaling, how does it work: if I upload a file to the master instance, when are the other instances updated? Is it when the instances are auto-scaled again?
Also, I will need to host a MailEnable (mail server) application. Any thoughts on best practice for this? Am I better off hosting one server for everything, or splitting it across instances?
When you are working with EC2, you need to start thinking about how your applications are designed and deployed differently.
Autoscaling works best when your instances follow a shared-nothing architecture. The instances themselves should never store persistent data, and they should be able to set themselves up automatically at launch.
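As a rough illustration of "set up automatically at launch", shown here as a Linux user-data script for brevity (the question above is about Windows, where the same idea applies via PowerShell user data); the bucket name, packages, and paths are assumptions:

    #!/bin/bash
    # Hypothetical EC2 user-data script: install the web stack and pull the
    # current site content at boot, so a freshly scaled-out instance needs no
    # manual setup and stores nothing persistent locally.
    yum -y install nginx php-fpm                     # package names vary by distro
    aws s3 sync s3://example-site-bucket/current /var/www/html
    service php-fpm start && service nginx start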
Some applications are not designed to work in this environment: they require local file storage or have similar constraints.
You probably won't be using micro instances; they are mostly designed for very specific low-utilization workloads.
You can run a mail server on EC2, but you will have to use an Elastic IP and get the instances that send mail whitelisted. By default, EC2 instances are on the Spamhaus block list.

Preventing single entry point in configuration management master/agent setup

I'm researching configuration management software like Puppet. My primary concern is preventing a single entry point to all of our internal servers. Take this scenario for example.
Suppose access is somehow gained to the master configuration server. From there, an attacker could fairly easily manipulate, or ultimately gain access to, the other servers controlled by the master.
The primary goal is to prevent a single point of entry into the network, even if said master configuration server is not exposed to the public internet.
tl;dr: How can I prevent single-point access to all of the other servers in a master/agent configuration management setup?
If you are thinking about delegating the task of defining the Puppet rules to other people (e.g. technicians), you can create a Puppet master (Master A), have a test machine connected to Master A, and then have them commit the code to Git or SVN.
You control a second Puppet master (Master B), onto which you pull the code from Git or SVN. All the machines in your network connect to Master B. Once you are happy with the code, you can have Puppet roll it out to all your machines.
This way, access to the configuration of all machines exists only on Master B, which only you handle and access.
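As a sketch of the pull onto Master B (the paths, branch, and schedule are assumptions, not part of the original setup):

    # Cron entry on Master B: pull the manifests that were reviewed and
    # committed through Master A's workflow into the production environment.
    */15 * * * * cd /etc/puppet/environments/production && git pull --ff-only origin master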
You can't with the default configuration.
The way Puppet is designed, the agents contact the master. Even if you are behind multiple firewalls, you need to allow the agents into your internal network. Even if you routinely allow connections from the DMZ into the internal network, you may still need to manage machines on the open internet, and what Puppet then requires is that you open your internal network to the open internet.
The risk of this client-pull design is that if someone hacks into a machine running an agent, they can contact the master; if the master has any vulnerability, they can hack into it, and from there they can control every machine with an agent and mount an attack on your internal network. So, if a vulnerability in the Puppet master's communication channel with the agents is exploited, Puppet becomes an attack vector (and a huge one, since you may be managing all of your infrastructure with it and you have allowed access from outside into your LAN).
With a master-push design this risk could be minimized, as the master would be a single point to protect, sitting inside the safe internal network, with connections only going from the inside out.
There is a pending feature request (four years old!) at PuppetLabs (http://projects.puppetlabs.com/issues/2045) titled "Push functionality in puppetmaster to clients". Reading the comments on that feature request and finding things like the following makes me wonder whether the Puppet developers really understand the problem:
Ultimately, it isn’t all that high a priority, either – almost every risk that opening the port to the master exposes is also exposed by having the master reach out and contact the client. There is little or no change in actual risk to the model proposed.
In any case, while we wait for the developers to acknowledge the problem, others are designing their own solutions (like https://github.com/tomas-edwardsson/puppet-push).
Update:
I've found a presentation by Bernd Strößenreuther titled Best practices on how to turn Your environment into a Puppet managed environment, available as a PDF at http://stroessenreuther.info/pub/Puppet_getting_started.pdf
He suggests establishing SSH connections from the master to the agents and opening a reverse tunnel, so that the agents can connect back to the master. These connections can be started periodically from a cron job. This way you never have to open your internal network to incoming connections, yet the agents still have access to the master's data.
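A rough sketch of that reverse-tunnel idea, run from the master (the hostnames, user, and key path are placeholders; you may also need an /etc/hosts alias on the agent so the master's certificate name still matches):

    #!/bin/bash
    # Open a reverse tunnel from the master to one agent: the agent's
    # localhost:8140 is forwarded back to the master's port 8140, so no
    # inbound connection into the internal network is ever needed.
    ssh -i /etc/puppet/tunnel_key -f -N \
        -R 8140:localhost:8140 tunnel@agent1.example.com
    # On the agent, point the Puppet run at the tunnel endpoint, e.g.:
    #   puppet agent --test --server localhost --masterport 8140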
Now, regarding the pull mechanism: it may seem like a bad design, but it is actually essential for highly automated environments. For example, in an elastic network (like EC2 with autoscaling), where servers are started and halted automatically, a server needs to be able to configure itself right away: it boots up and the first thing it does is contact the master for an up-to-date configuration. That would be harder if you had to push the configuration to each server periodically, because a new server would have to wait for the master (seconds, minutes, or hours), which is unacceptable in some applications.

Deploying updates to production node.js code

This may be a basic question, but how do I go about efficiently deploying updates to currently running node.js code?
I'm coming from a PHP and client-side JavaScript background, where I can just overwrite files when they need updating and the changes are instantly live on the production site.
But in node.js I have to overwrite the existing files, then shut down and re-launch the application. Should I be worried about the potential downtime? To me it seems like a riskier approach than the PHP (scripting) way, unless I have a server cluster where I can take down one server at a time for updates.
What kind of strategies are available for this?
In my case it's pretty much:
svn up; monit restart node
This Node server acts as a comet server with long-polling clients, so the clients just reconnect as they normally would. The first thing the Node server does is grab the current state from the database, so everything is running smoothly again in no time.
I don't think this is really any riskier than doing an svn up to update a bunch of PHP files. If anything it's a little safer. When you're updating a big PHP project, there's a chance (on a high-traffic site, basically a 100% chance) that the web server will receive requests while you're still updating, which means you would be running updated and out-of-date code in the same request. At least with the Node approach, you can update everything, restart the Node server, and know that all of your code is up to date.
I wouldn't worry too much about downtime; you should be able to keep it so short that chances are no one will notice (kill the process and re-launch it in a bash script or something if you want to keep it to a fraction of a second).
Of more concern however is that many Node applications keep a lot of state information in memory which you're going to lose when you restart it. For example if you were running a chat application it might not remember who users were talking to or what channels/rooms they were in. Dealing with this is more of a design issue though, and very application specific.
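If you do go the plain bash route suggested above, a minimal sketch looks something like this (the app path, pidfile, and start command are assumptions):

    #!/bin/bash
    # Update the code, then bounce the Node process as quickly as possible.
    svn up /srv/app
    [ -f /var/run/node-app.pid ] && kill "$(cat /var/run/node-app.pid)"
    cd /srv/app || exit 1
    nohup node server.js >> /var/log/node-app.log 2>&1 &
    echo $! > /var/run/node-app.pid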
If your node.js application can't skip a beat, meaning it is under continuous bombardment of incoming requests, you simply can't afford the downtime of even a quick restart (even with nodemon). I think in some cases you simply want a seamless restart of your node.js app.
To do this I use naught: https://github.com/superjoe30/naught
Zero downtime deployment for your Node.js server using builtin cluster API
Some Node.js cloud hosting providers (like Nodejitsu or Windows Azure) keep both versions of your site on disk, in separate directories, and just redirect traffic from the old version to the new one once the new version has been fully deployed.
This is usually a built-in feature of Platform as a Service (PaaS) providers. However, if you are managing your own servers, you will need to build something that lets traffic switch from one version to the next once the new one has been fully deployed.
An advantage of this approach is that rollbacks are easy, since the previous version remains intact on disk.
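If you are rolling this yourself, a common sketch of the same idea is to deploy into a fresh directory and flip a symlink (the paths are placeholders; the process manager and web server are assumed to resolve the symlink on restart):

    #!/bin/bash
    # Deploy the new build next to the old one, then switch a "current" symlink.
    RELEASE=/srv/app/releases/$(date +%Y%m%d%H%M%S)
    mkdir -p "$RELEASE"
    rsync -a /tmp/build/ "$RELEASE/"
    ln -sfn "$RELEASE" /srv/app/current   # near-atomic switch to the new version
    # restart or signal the Node process so it serves /srv/app/current;
    # rollback is just pointing the symlink back at the previous release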

Server Farm Sync

What is the preferred method of keeping a server farm synchronized? It's currently a pain to have to upload to multiple servers. I'm looking for a balance of ease of use and cost. I read somewhere that DFS can do it, but that requires the servers to run on a domain. Are there any performance issues with using DFS?
We use SVN to keep the server files in specific repositories and have a script that pulls the latest files out of SVN onto each of the servers in the web farm (6 servers). It uses the TortoiseSVN utility, as it has an easier command-line interface for the admins, and updates all the machines from a single server, usually the one with the lowest IP address in the pool.
We ensure that no server has any local modifications in the checked-out repository, to avoid conflicts, and we get a change log with the file histories in SVN, plus the benefit of rollbacks. We also include any admin scripts, so these get the benefits of versioning and change logs too.
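The update script itself can be very simple. An illustrative sketch, assuming a command-line svn client and SSH access (the setup above uses TortoiseSVN on Windows, but the shape is the same; hostnames and paths are placeholders):

    #!/bin/bash
    # Run from the deployment box: bring every farm member up to the latest
    # revision of the repository.
    for host in web1 web2 web3 web4 web5 web6; do
        ssh "deploy@$host" "svn update --non-interactive /var/www/site"
    done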
