I've been building a web app in Node (and have built others in asp.net-mvc), and recently I've been trying to understand the concept of scaling an app to many users. Assuming a web app gets thousands (or millions) of concurrent users, I understand that the load should be split across more than one instance of Node.
My question is: do all these instances run on the same server (virtual machine)? If so, do they (should they) access the same database? And if so, does this mean (if I use nginx) that nginx would be responsible for routing the different requests to the different Node instances?
And assuming uploaded files are saved on the file system, do the different instances access the same directories? If not, when a person uploads images to the file system and later connects and is routed to a different Node instance, how do they access the images they uploaded earlier? Is there some sort of sync process between the file systems?
Any resources/articles regarding this would be very helpful!
I'm building a web app that will scale out onto a Linux cluster with Tomcat and nginx. There will be one nginx web server load balancing multiple Tomcat app servers, with a database server behind them, all running on CentOS 6.
The app involves users uploading photos. I plan to keep all the images on the file system of the front nginx box and have pointers to them stored in the database. This way nginx can serve them at full speed without involving the app servers.
The app resizes the images in the browser before uploading, so file sizes will not be too extreme.
What is the most efficient/reliable way of writing the images from the app servers to the nginx front-end server? I can think of several ways to do it, but I suspect some kind of network file system would be best.
What are current best practices?
Assuming you do not use a CMS (Content Management System), you could use the following options:
If you have only one front-end web server, the suggestion would be to store the files locally on the web server, in a local Unix filesystem.
If you have multiple web servers, you could store the files on a SAN or NAS shared network device. This way you would not need to synchronize the files across the servers. Make sure the shared resource is redundant; otherwise, if it goes down, your site will be down.
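For the shared-device option, a minimal sketch of what the mount could look like, assuming an NFS export (the hostname, export path, and mount point are hypothetical). Every web server mounts the same export at the same path, so they all read and write one directory tree:

```
# /etc/fstab entry, identical on every web server (hypothetical names/paths)
nas.example.com:/export/uploads  /var/www/uploads  nfs  defaults,_netdev  0  0
```

With that in place, an image written by one app server is immediately visible to the others (and to nginx) without any extra synchronization step.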
I have read somewhere that it is best practice to divide data across different drives on a Linux server. I only remember a few things, like:
Mount /var/www to a new folder on a new volume so that all website data stays there
Also put logs on that volume
Use MySQL RDS as a separate instance
So that the application is stateless and nothing changes on the main machine.
I don't quite get the idea of being stateless.
How should I do it?
The idea of being stateless is that the instance stores no persistent data (database, user files, etc.). That way you can duplicate the instance, put it behind a load balancer, and scale horizontally to handle more traffic without duplicating persistent data.
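As a minimal sketch of what this looks like in code, assuming Redis as the external store and the node-redis (v4+) npm package (both are illustrative choices, not something from the question): any per-user data lives outside the process, so the instance itself can be cloned or thrown away at will.

```js
// Hypothetical sketch: per-user state lives in Redis, not in this process or on
// its disk, so the instance can sit behind a load balancer and be duplicated freely.
const http = require('http');
const { createClient } = require('redis');            // assumes the "redis" npm package (v4+)

const store = createClient({ url: process.env.REDIS_URL }); // external store, not the instance

async function main() {
  await store.connect();

  http.createServer(async (req, res) => {
    // Counting visits in Redis instead of a local variable keeps the process stateless.
    const visits = await store.incr(`visits:${req.socket.remoteAddress}`);
    res.end(`Visits from your address: ${visits}\n`);
  }).listen(process.env.PORT || 3000);
}

main();
```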
Using multiple volumes on a stateless machine is optional. It can help if you need to increase I/O throughput, but outside of that it doesn't provide many advantages. It can actually make things more difficult if you are trying to build stateless instances, since an AMI will only include the root volume.
I want to know how to structure my NodeJS server.
I want to separate the services offered by my website so that I can set up a cluster in the future and have multiple servers (each dedicated to one specific task).
Example:
The 'main' server, which has one project: ExpressJS and the database
The 'communication' server, which has one project: chat + forum
Other projects: for complex computing (generating charts / stats / emailing)
Could you explain the different approaches for this type of complex website?
As Benjamin Gruenbaym said, the architecture question belongs somewhere else.
If you are wondering about how to setup the applications on an individual server, there are a few things to keep in mind.
NodeJS runs in a single process, so it should ideally take up 1 core of the CPU. If you run a database on the same server, that is another core. So it may be fine to host all node applications on the same server, if it has a sufficient number of cores.
To run two different Node processes on the same machine, you simply start them one after another, but make sure that they listen on different ports.
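A minimal sketch (the file name and ports are just examples): the same tiny server started twice, each time with a different port, gives you two independent Node processes on one machine.

```js
// service.js -- start one process per service, each on its own port, e.g.:
//   PORT=3000 node service.js   # 'main' app
//   PORT=3001 node service.js   # chat/forum, stats worker, etc.
const http = require('http');

const port = process.env.PORT || 3000;
http.createServer((req, res) => {
  res.end(`handled by the process listening on port ${port}\n`);
}).listen(port);
```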
To make sure that you can scale out your application later, it is important that you use domain names instead of IP addresses when your services identify each other. So the NodeJS app should know about the database as mydatabase.mycompany.com, not as 192.168.1.10 or any other IP address. This will allow you to later move the database to another network address or to use a load balancer.
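For example, a sketch assuming the "mysql" npm package (the hostname is the placeholder used above): the database host is read from configuration as a name that DNS or a load balancer can later repoint, never hard-coded as an IP.

```js
// db.js -- hypothetical connection setup
const mysql = require('mysql');   // assumes the "mysql" npm package

module.exports = mysql.createConnection({
  host: process.env.DB_HOST || 'mydatabase.mycompany.com', // a hostname, not 192.168.1.10
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: 'app',
});
```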
I'm currently managing a cluster of PHP-FPM servers, all of which tend to get out of sync with each other. The application I'm running on top of the app servers (Magento) allows admins to modify various files on the system, but now that the site is in a clustered setup, modifying a file only changes it on a single instance (one of the app servers) out of the various machines in the cluster.
Is there an open-source application for Linux that may allow me to keep all of these servers in sync? I have no problem with creating a small VM instance that can listen for changes from machines to sync. In theory, the perfect application would have small clients that run on each machine to be synced, which would talk to the master server which would then decide how/what to sync from each machine.
I have already examined the possibility of running a centralized file server, but unfortunately my app servers are spread out between EC2 and physical machines, which makes this unfeasible. As there are multiple app servers (some of which are created dynamically depending on the load of the site), simply setting up an rsync cron job is not efficient, since the cron job would have to be modified on each machine to send files to every other machine in the cluster, and that would just be a whole bunch of unnecessary data transfers/SSH connections.
I'm dealing with setting up a similar solution, and I'm halfway there. I would recommend you use lsyncd, which basically monitors the disk for changes and then immediately (or at whatever interval you want) syncs the files to a list of servers using rsync.
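A minimal lsyncd configuration sketch, assuming lsyncd 2.x with its rsync-over-SSH backend (paths and hostnames are hypothetical); you add one sync block per app server that should receive the changes:

```lua
-- /etc/lsyncd.conf (hypothetical paths and hostnames)
settings {
    logfile    = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd.status",
}

-- Repeat one block like this for every app server in the cluster.
sync {
    default.rsyncssh,
    source    = "/var/www/magento/media",
    host      = "appserver2.example.com",
    targetdir = "/var/www/magento/media",
    delay     = 1,   -- batch changes for roughly a second before syncing
}
```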
The only issue I'm having is keeping the server list up to date: since I can spin up additional servers at any time, I would need each machine in the cluster to be notified whenever a machine is added to or removed from the cluster.
I think lsyncd is a great solution that you should look into. The issue I'm having may turn out to be a problem for you as well, and that remains to be solved.
Instead of keeping tens or hundreds of servers cross-synchronized, it would be much more efficient, reliable, and above all simpler to maintain just one "admin node" and replicate changes from it to all your "worker nodes".
For instance, at our company we use a Development server -> Staging server -> Live backends workflow, where all the changes are transferred across servers using a custom PHP+rsync front end. That allows the developers to push updates to a staging server in the live environment, test the changes, and roll them out to the live backends incrementally.
A similar approach could very well work in your case as well. Obviously it's not a plug-and-play solution, but I see it as the easiest way to go - both in terms of maintainability and scalability.
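In its simplest form, the push from the admin node is just a loop over the workers (hostnames and paths below are hypothetical), run after a change has been tested:

```sh
#!/bin/sh
# Hypothetical push script, run only on the admin node.
# --delete keeps each worker an exact mirror of the admin node's tree.
for host in worker1.example.com worker2.example.com worker3.example.com; do
    rsync -az --delete /var/www/magento/ "deploy@${host}:/var/www/magento/"
done
```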
I am planning to use the Cloud Foundry PaaS (from VMware) for hosting my node.js application. I have seen that it has support for Mongo and Redis in the service layer, and for the node.js framework. So far so good.
Now I need to store my media files (images uploaded by users) on a filesystem. I have the metadata stored in Mongo.
I have been searching the internet but have not yet found good information.
You cannot do that for the following reasons:
There are multiple host machines running your application. They each have their own filesystems. Each running process in your application would see a different set of files.
The host machines on which your particular application is running can change moment-to-moment. Indeed, they will change every time you re-deploy your application. Every time a process is started on a new host machine, it will see an empty set of files. Every time a process is stopped on an old host machine, all the files would be permanently deleted.
You absolutely must solve this problem in another way. Two options:
Store the media files in MongoDB GridFS.
Store the media files in an object store such as Amazon S3 or Rackspace Cloud Files.
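For the object-store option, a rough sketch assuming the aws-sdk (v2) npm package and an existing bucket (the bucket name, key layout, and helper function are all illustrative): upload the image bytes to S3 and keep only the resulting key/URL in MongoDB next to the rest of the metadata.

```js
// Hypothetical sketch: media bytes go to S3, never to the ephemeral local filesystem.
const fs = require('fs');
const AWS = require('aws-sdk');        // assumes the "aws-sdk" (v2) npm package

const s3 = new AWS.S3({ region: process.env.AWS_REGION });

function saveImage(localTmpPath, key, callback) {
  s3.upload(
    {
      Bucket: process.env.S3_BUCKET,   // bucket name is an assumption
      Key: key,                        // e.g. 'uploads/<userId>/<filename>'
      Body: fs.createReadStream(localTmpPath),
    },
    callback                           // (err, data) => store data.Location in Mongo
  );
}
```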
The filesystem in most cloud solutions is "ephemeral", so you cannot use the FS. You will have to use solutions like S3 or a DB for this purpose.