I have Nginx set up as a reverse proxy in front of my Express application.
So every request that comes to Nginx is proxied to Express, which runs on four ports. Both Nginx and Express run on the same host.
After reading that all static content should be served by Nginx, leaving Express for dynamic requests only, I gave it a shot and set up the Nginx config. It works perfectly: all JS, CSS, and HTML assets are now served by Nginx itself.
Now, how do I prove that this is a better setup in terms of numbers? Should I use some tool to simulate requests (to both the old and the new setup) and compare the average load times of assets?
Open your browser => Dev Tools => Network tab.
Here you can see the wait time and download time for every request, so you can open your webpage and compare it under both configs.
This is most useful in a local environment, where latency has minimal effect on the measurements.
Other than that, you can do a load test. Google "load testing tools"!
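For a quick start, ApacheBench (ab) works fine. A minimal sketch, assuming the old setup has Express listening on port 3000, the new one has Nginx on port 80, and /static/app.js is one of your assets (all placeholders):

    # Hit the same asset under each setup and compare the
    # "Requests per second" and "Time per request" lines.
    ab -n 1000 -c 50 http://localhost:3000/static/app.js   # old: Express serves it
    ab -n 1000 -c 50 http://localhost/static/app.js        # new: Nginx serves it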
In a word: benchmark. You have two configurations, and you need to understand the efficiency of each. To do so, you need to instrument the hosts to collect data on the finite resources (CPU, disk, memory, network, and their related sub-statistics) as well as response times.
Any performance-testing tool that exercises the HTTP interface and allows for the collection and aggregation of your monitoring data while under test should do the trick. From an examination of your logs, you should be able to collect the most common paths through your site, the number of users on your system in any given slice of time, and the average session duration (iteration interval). The most common traversals then become the basis for the business processes you will need to replicate with your performance-testing tool.
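For instance, pulling the most common request paths out of a standard Nginx/Apache access log can be a one-liner (field $7 is the path in the common log format; adjust for yours):

    awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20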
If you have not engaged in performance-testing efforts before, this would be a good time to tag someone in your organization who does this work on an ongoing basis. The learning curve is steep and (if you haven't done this before and have no training or mentor) fairly long. You can burn a lot of cycles on poor test/benchmark executions before you get it "right", to the point where you can genuinely compare the performance of configuration A to configuration B.
Related
I'm working on several different webapps written in Node. Each webapp has very little traffic (maybe a few HTTP requests per day), so I run them all on a single machine with haproxy as a reverse proxy. Each webapp seems to consume almost 100 MB of RAM, which adds up when you have many webapps. Because each webapp receives so little traffic, I was wondering whether there is a way to have all the webapps off by default, but set up so that they automatically start when there is an incoming HTTP request (and then shut down again if there haven't been any HTTP requests within some fixed time period).
Yes, there are a dozen different ways to handle this. Without more details I'm not sure of the best one for your case. One option is the Node VM module: https://nodejs.org/api/vm.html Another would be some kind of serverless setup. See: https://www.serverless.com/ Honestly, 100 MB is a drop in the bucket at today's RAM prices. A quick Google shows 16 GB of RAM for $32 - or, to put that differently, 160 Node apps. I'm guessing you could find better prices on eBay or somewhere similar.
Outside of its learning value, this would be a total waste of time: your time is worth more than the effort it would take to set this up. Even at US minimum wage, it would take you less than 4 hours to earn back the cost of the RAM. Better yet, go learn Docker/k8s and containerize each of those apps. That said, learning serverless would be a good use of time.
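If you do want to experiment anyway, here is a dependency-free sketch of the idea. The ports, the app.js entry point, and the idle timeout are all assumptions (systemd socket activation can achieve the same thing without a custom proxy):

    // Lazy-start proxy: spawn the real app on the first request,
    // proxy to it, and kill it again after an idle period.
    const http = require('http');
    const { spawn } = require('child_process');

    const APP_PORT = 3001;          // port the real webapp listens on (assumed)
    const IDLE_MS = 5 * 60 * 1000;  // shut the app down after 5 idle minutes

    let child = null;
    let starting = null;            // resolves once the app accepts connections
    let idleTimer = null;

    function ensureAppRunning() {
      if (!starting) {
        starting = new Promise((resolve) => {
          child = spawn('node', ['app.js'], { stdio: 'inherit' }); // hypothetical entry point
          child.on('exit', () => { child = null; starting = null; });
          (function probe() {       // poll until the port accepts connections
            const req = http.get({ port: APP_PORT, path: '/' }, (res) => {
              res.resume();
              resolve();
            });
            req.on('error', () => setTimeout(probe, 100));
          })();
        });
      }
      return starting;
    }

    http.createServer((req, res) => {
      ensureAppRunning().then(() => {
        clearTimeout(idleTimer);
        idleTimer = setTimeout(() => child && child.kill(), IDLE_MS);
        const upstream = http.request(
          { port: APP_PORT, path: req.url, method: req.method, headers: req.headers },
          (upRes) => { res.writeHead(upRes.statusCode, upRes.headers); upRes.pipe(res); }
        );
        upstream.on('error', () => { res.statusCode = 502; res.end('upstream error'); });
        req.pipe(upstream);
      });
    }).listen(8080);                // the port haproxy would point at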
Assuming traffic/server load is not a factor ...
(Taken further, we could even assume that there are zero visitors, and I just happen to visit one of my websites in a "vacuum")
... Would there theoretically be any difference in the loading time if I were to host only a single site on my VPS vs. hosting multiple sites using the "name-based" method?
(Even if it is minuscule, I would still like to know—and why, ideally!)
There are tons of different ways to look at this; the most important factor is what type of applications you are running.
What I mean by this is: if you are running a static webpage for each site and using simple name-based routing (nginx or Apache), you will see no difference other than the added disk space.
On the other hand, you could be running more advanced web applications. In most cases (provided traffic is not a factor), when a request is made, the web server processes it and returns the response, using processing time only during the request. This, too, will show no difference.
But! When the applications require additional programs and background processing, you will see a performance difference - minuscule at first, but as you add more "domains" the performance hit grows.
Static pages: no difference (besides disk space)
Web applications: difference based on non-request-based processing
You are asking about what is at the root of shared hosting, which is amazing for static sites and basic programs, but not so good when you scale up to larger applications.
Side note: this assumes the applications do not have different runtimes and requirements; running, say, Python + MySQL and Node.js + MongoDB at the same time on a weak server would take a performance hit, as those services are always running.
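For reference, the "name-based" routing mentioned above just means multiple server blocks sharing one IP, selected by the Host header - a sketch with placeholder hostnames and paths:

    server {
        listen 80;
        server_name site-a.example.com;
        root /var/www/site-a;
    }

    server {
        listen 80;
        server_name site-b.example.com;
        root /var/www/site-b;
    }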
Are there any special tuning tips for strengthening an API built on top of the hapijs framework?
Especially if you have lots of concurrent requests (10,000+/sec) that are hitting the DB?
I'm using PM2 to start my process in "cluster mode" to be able to load-balance across the different cores of the server.
I don't need to serve static content, so there's no apache/nginx proxy
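(For context, cluster mode with one instance per core is typically started like this; server.js is a placeholder for your entry point:)

    pm2 start server.js -i max    # -i max forks one instance per CPU core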
Update 17:11
Running tests with 1000 requests/sec (with loader.io) results in this curve - OK so far, but I'm wondering if there is still room for improvement.
(hardware: 64 GB / 20-core DigitalOcean droplet)
In the end I just used a combination of Node's http module and the body-parser module to achieve what I needed.
But I think this was only viable because my application had just two endpoints (one GET, one POST).
If your application logic is more complicated and you want to stick with hapi, think about using a load balancer and dividing the load across multiple VMs.
Loadtest results (new setup on an even smaller DO droplet):
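For reference, a minimal sketch of the kind of setup described above - plain http plus body-parser invoked directly as middleware (the endpoint paths are made up):

    const http = require('http');
    const bodyParser = require('body-parser');

    const parseJson = bodyParser.json();

    http.createServer((req, res) => {
      if (req.method === 'GET' && req.url === '/status') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        return res.end(JSON.stringify({ ok: true }));
      }
      if (req.method === 'POST' && req.url === '/items') {
        // body-parser is plain connect-style middleware, so it can be
        // called outside Express; it populates req.body.
        return parseJson(req, res, (err) => {
          if (err) { res.statusCode = 400; return res.end('bad json'); }
          res.writeHead(200, { 'Content-Type': 'application/json' });
          res.end(JSON.stringify({ received: req.body }));
        });
      }
      res.statusCode = 404;
      res.end();
    }).listen(3000);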
First, a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding up ideas. I started working on web-application programming just over a year ago, and thankfully discovered Node.js, which made web-app creation feel a lot more like traditional programming. Now, I have a Node.js app that I've been developing for some time and that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server it runs on. I want to know if my application has the right amount of resources allocated to it: does the number of requests it receives make its current server overkill, or does it warrant extra resources? Which analytics are important to look at for a Node.js app for that purpose? (I'm using both MongoDB and Redis on separate servers, if that makes a difference.)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running Node.js in production for nearly a year, starting from the 0.4 series and currently on 0.8. The web app is Express 2 and 3 based, with Mongo, Redis, and Memcached.
A few facts:
Node cannot handle a large V8 heap; when it grows over 200 MB you will start seeing increased CPU usage.
Node always seems to leak memory, or at least grow a large heap without actually using it. I suspect memory fragmentation, as V8 profiling and valgrind show no leaks in JS space nor in the resident heap. Early 0.8 was awful in this respect: RSS could be 1 GB with a 50 MB heap.
Hanging requests are hard to track. We wrote our own middleware to monitor these, especially since our app is long-poll based.
My suggestions:
Use multiple instances per machine, at least one per CPU. Balance with haproxy, nginx, or similar, with session affinity.
Write middleware to report hung connections, i.e. ones the code never responded to or whose latency was over a threshold (see the sketch after this list).
Restart instances often, at least weekly.
Write a poller that prints out memory stats via the process module once per minute.
Use supervisord and Fabric for easy process management.
Monitor CPU and the reported memory stats, and restart on a threshold.
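A sketch of what such hang-reporting middleware might look like in Express (the threshold and log format are arbitrary):

    // Flags requests that never finished within `thresholdMs`,
    // plus ones that finished but were slow.
    function hangReporter(thresholdMs) {
      return function (req, res, next) {
        var start = Date.now();
        var timer = setTimeout(function () {
          console.warn('no response after ' + thresholdMs + 'ms: ' + req.method + ' ' + req.url);
        }, thresholdMs);
        res.on('finish', function () {
          clearTimeout(timer);
          var elapsed = Date.now() - start;
          if (elapsed > thresholdMs) {
            console.warn('slow response (' + elapsed + 'ms): ' + req.method + ' ' + req.url);
          }
        });
        next();
      };
    }

    // app.use(hangReporter(5000));  // e.g. flag anything over 5 seconds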
Whatever the type of web app, Node.js or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN the load time begins to increase as the number of concurrent users increases. A tipping point is reached at a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users visit your website and how your visits grow month over month, which helps in predicting future expected visits and therefore the expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 developer tools available in all browsers. Open up your website in any browser and press F12 (or Ctrl+Shift+I in Opera), which should open up the browser's development tools. On Firefox make sure you have Firebug installed; on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests and the bandwidth usage per page load.
So the formula to work out daily server load is simple:
HTTP requests per page load × average page loads per user per day × expected number of daily users = total HTTP requests to the server per day
And...
MB transferred per page load × average page loads per user per day × expected number of daily users = total bandwidth required per day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
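For example, with made-up numbers: at 40 HTTP requests and 2 MB transferred per page load, 8 page loads per user per day, and 2,000 daily users, you get 40 × 8 × 2,000 = 640,000 requests per day and 2 × 8 × 2,000 = 32,000 MB (roughly 32 GB) of bandwidth per day.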
Node.js is single-threaded, so you should definitely start a process for every CPU your machine has. Cluster is by far the best way to do this, and it has the added benefit of being able to restart dead workers and to detect unresponsive ones.
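A minimal cluster setup along those lines (the request handler is just a stand-in):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // One worker per CPU core.
      os.cpus().forEach(() => cluster.fork());
      // Restart any worker that dies.
      cluster.on('exit', (worker) => {
        console.warn('worker ' + worker.process.pid + ' died, restarting');
        cluster.fork();
      });
    } else {
      // All workers share port 3000.
      http.createServer((req, res) => res.end('ok')).listen(3000);
    }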
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo are great tools for monitoring node processes.
Is there any benchmark or comparison which is faster: place nginx in front of node and let it serve static files directly or use just node and serve static files using it?
The nginx solution seems more manageable to me - any thoughts?
I'll have to disagree with the answers here. While Node will do fine, nginx will most definitely be faster when configured correctly. nginx is implemented efficiently in C following a similar pattern (returning to a connection only when needed) with a tiny memory footprint. Moreover, it supports the sendfile syscall to serve those files which is as fast as you can possibly get at serving files, since it's the OS kernel itself that's doing the job.
By now nginx has become the de facto standard as the frontend server. You can use it for its performance in serving static files, gzip, SSL, and even load-balancing later on.
P.S.: This assumes that files are really "static" as in at rest on disk at the time of the request.
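As an illustration, a typical static-file location in nginx (paths and domain are placeholders):

    server {
        listen 80;
        server_name example.com;

        location /static/ {
            root /var/www/myapp;   # files live under /var/www/myapp/static/
            sendfile on;           # kernel copies file -> socket directly
            tcp_nopush on;         # fill packets before sending
            expires 30d;           # far-future caching for static assets
        }
    }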
I did a quick ab -n 10000 -c 100 run serving a static 1406-byte favicon.ico, comparing nginx, Express.js (static middleware), and clustered Express.js. Hope this helps:
Unfortunately, I can't test 1000 or even 10000 concurrent requests, as nginx on my machine starts throwing errors.
EDIT: as suggested by artvolk, here are the results of cluster + static middleware (slower):
Either way, I'd set up Nginx to cache the static files... you'll see a HUGE difference there. Then, whether you serve them from node or not, you're basically getting the same performance and the same load relief on your node app.
I personally don't like the idea of my Nginx front end serving static assets in most cases, in that:
1) The project now has to be on the same machine - or has to be split into assets (on the nginx machine) and web app (on multiple machines, for scaling).
2) The Nginx config now has to maintain path locations for static assets and be reloaded when they change.
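A sketch of the caching approach instead (the zone name, sizes, extensions, and ports are all assumptions):

    # Cache static responses from the node app rather than serving files directly.
    proxy_cache_path /var/cache/nginx keys_zone=static_cache:10m max_size=500m;

    server {
        listen 80;

        location ~* \.(js|css|png|jpg|gif|ico)$ {
            proxy_cache static_cache;
            proxy_cache_valid 200 60m;       # keep good responses for an hour
            proxy_pass http://127.0.0.1:3000;
        }

        location / {
            proxy_pass http://127.0.0.1:3000;
        }
    }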
I have a different interpretation of #gremo's charts. It looks to me like both node and nginx top out at the same number of requests (between 9-10k). Sure, the latency of the nginx responses is lower by a constant 20 ms, but I don't think users will necessarily perceive that difference (if your app is built well).
Given a fixed number of machines, it would take quite a significant amount of load before I would convert a node machine to nginx considering that node is where most of the load will occur in the first place.
The one counterpoint to this is if you are already dedicating a machine to nginx for load balancing. If that is the case then you might as well have it serve your static content as well.
FWIW, I tested a rather large file download (~60 MB) on an AWS EC2 t2.medium instance to compare the two approaches.
Download time was roughly the same (~15s), memory usage was negligible in both cases (<= 0.2%), but I got a huge difference in CPU load during the download:
Using Node + express.static(): 3.0 ~ 5.0% (single node process)
Using nginx: 0.3 ~ 0.7% (nginx process)
That's a tricky question to answer. If you wrote a really lightweight node server just to serve static files, it would most likely perform better than nginx, but it's not that simple. (Here's a "benchmark" comparing a Node.js file server and lighttpd - which is similar in performance to nginx when serving static files.)
Performance in regard to serving static files often comes down to more than just the web server doing the work. If you want the highest performance possible, you'll use a CDN to serve your files, to reduce latency for end users and benefit from edge caching.
If you're not worried about that, node can serve static files just fine in most situations. Node lends itself to asynchronous code, which it also relies on: since it's single-threaded, any blocking I/O can block the whole process and degrade your application's performance. More than likely you're writing your code in a non-blocking fashion, but if you do anything synchronously you may cause blocking, which would degrade how fast other clients get their static files served. The easy solution is not to write blocking code, but sometimes that's not a possibility, or you can't always enforce it.
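To make the blocking point concrete, compare these two ways of serving the same file in plain node (the file path is a placeholder):

    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      // Blocking (avoid): stalls the event loop for every connected
      // client while the file is read:
      //   res.end(fs.readFileSync('./public/big-file.bin'));

      // Non-blocking: streams the file and leaves the event loop free.
      fs.createReadStream('./public/big-file.bin')
        .on('error', () => { res.statusCode = 404; res.end(); })
        .pipe(res);
    }).listen(3000);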
Use Nginx to cache static files served by Node.js. The Nginx server is deployed in front of the Node.js server(s) to perform:
SSL termination: terminate HTTPS traffic from clients, relieving your upstream web and application servers of the computational load of SSL/TLS encryption.
Basic load balancing: set up NGINX Open Source or NGINX Plus as a load balancer in front of two (or more) Node.js servers.
Content caching: caching responses from your Node.js app servers can both improve response time to clients and reduce load on the servers, because eligible responses are served immediately from the cache instead of being generated again on the server.
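A condensed sketch combining those roles (certificate paths, domain, and ports are placeholders):

    upstream node_app {
        server 127.0.0.1:3000;
        server 127.0.0.1:3001;
    }

    server {
        listen 443 ssl;
        server_name example.com;
        ssl_certificate     /etc/nginx/certs/example.crt;
        ssl_certificate_key /etc/nginx/certs/example.key;

        location / {
            proxy_pass http://node_app;   # SSL terminated here; plain HTTP upstream
        }
    }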
I am certain that pure node.js can outperform nginx in a lot of aspects.
All that said, I have to say Nginx has an in-built cache, whereas node.js doesn't come with one out of the box (you have to build your own file cache).
A custom file cache can outperform nginx and other servers precisely because it can be kept extremely simple.
Also, Nginx runs on multiple cores. To use the full potential of Node you have to cluster your node servers. If you are interested in how, please PM me.
You need to dig deeper to achieve performance nirvana with node; that is the only problem. Once that's done - hell yeah... it beats Nginx.
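A naive sketch of such an in-memory file cache in plain node (no invalidation, no path-traversal hardening, and everything must fit in RAM - purely to illustrate the idea):

    const http = require('http');
    const fs = require('fs');
    const path = require('path');

    const cache = new Map();   // file path -> Buffer

    function serveCached(filePath, res) {
      if (cache.has(filePath)) {
        return res.end(cache.get(filePath));   // served from memory, no disk I/O
      }
      fs.readFile(filePath, (err, data) => {
        if (err) { res.statusCode = 404; return res.end(); }
        cache.set(filePath, data);
        res.end(data);
      });
    }

    http.createServer((req, res) => {
      // NOTE: no sanitization of req.url here; illustration only.
      serveCached(path.join(__dirname, 'public', req.url), res);
    }).listen(3000);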