Heroku Autoscale with Node.js - node.js

I have seen many implementations in Ruby for dynamically scaling heroku dynos and workers such as heroku-autoscale. How about node.js? Is there any modules for managing heroku instances for node.js?

I don't know of anything that already exists, but you can always interrogate the logs to see how busy your application is at the current time.
However, be very very wary of auto-scaling as it is a very complex topic. For instance, let's say you get busy, and your database is a bottleneck. You see a slow site, and thus crank on more dynos to try and speed up the queue. This creates more traffic on the database and thus compounds the issue, and costs you more money.
Only auto-scale if you can be 100% sure in every way that scaling that particular process will alleviate any problems you might be having.

Related

NodeJS Monitoring Website (Worker Threads?/Multi Process?)

I am doing small project of application that will monitor some servers.
It will base on telnet port check, ping, and also it will use libraries to connect directly to databases (MSSQL, Oracle, MySQL) to check their status.
I wonder what will be the best effective solution for this idea, currently with around 30 servers it works quite smooth, around 2.5sec to check status for all of them (running async). However I am worried that in the future with more servers it might get worse. Hence thinking about using some alternative like Worker Threads maybe? or some multi processing? Any ideas? Everything is happening in internal network so I do not expect huge latency.
Thank you in advance.
Have you ever tried the PM2 cluster mode:
https://pm2.keymetrics.io/docs/usage/cluster-mode/
The telnet stuff is TCP, which Node.js does very well using OS-level networking events. The connections to databases can vary. In the case of Oracle, you'll likely be using the node-oracledb. Those are SQL*Net connections that rely on the OCI libs and Node.js' thread pool. The thread pool defaults to four threads, but you can grow it up to 128 per Node.js process. See this doc for info:
https://oracle.github.io/node-oracledb/doc/api.html#-143-connections-threads-and-parallelism
Having said all that, other than increasing the size of the thread pool, I wouldn't recommend you make any changes. Why fight fires before they're burning? No need to over-engineer things. You're getting acceptable performance given the current number of servers you have.
How many servers do you plan to add in, say, 5 years? What's the difference in timing if you run the status checks for half of the servers vs all of them? Perhaps you could use that kind of data to make an educated guess as to where things would go.
As you add new ones, keep track of the total time to check the status. Is it slipping? If so, look into where the time is being spent and write the solution that will help.

I'm not sure how to correctly configure my server setup

This is kind of a multi-tiered question in which my end goal is to establish the best way to setup my server which will be hosting a website as well as a service (using Socket.io) for an iOS (and eventually an Android) app. Both the app service and the website are going to be written in node.js as I need high concurrency and scaling for the app server and I figured whilst I'm at it may as well do the website in node because it wouldn't be that much different in terms of performance than something different like Apache (from my understanding).
Also the website has a lower priority than the app service, the app service should receive significantly higher traffic than the website (but in the long run this may change). Money isn't my greatest priority here, but it is a limiting factor, I feel that having a service that has 99.9% uptime (as 100% uptime appears to be virtually impossible in the long run) is more important than saving money at the compromise of having more down time.
Firstly I understand that having one node process per cpu core is the best way to fully utilise a multi-core cpu. I now understand after researching that running more than one per core is inefficient due to the fact that the cpu has to do context switching between the multiple processes. How come then whenever I see code posted on how to use the in-built cluster module in node.js, the master worker creates a number of workers equal to the number of cores because that would mean you would have 9 processes on an 8 core machine (1 master process and 8 worker processes)? Is this because the master process usually is there just to restart worker processes if they crash or end and therefore does so little it doesnt matter that it shares a cpu core with another node process?
If this is the case then, I am planning to have the workers handle providing the app service and have the master worker handle the workers but also host a webpage which would provide statistical information on the server's state and all other relevant information (like number of clients connected, worker restart count, error logs etc). Is this a bad idea? Would it be better to have this webpage running on a separate worker and just leave the master worker to handle the workers?
So overall I wanted to have the following elements; a service to handle the request from the app (my main point of traffic), a website (fairly simple, a couple of pages and a registration form), an SQL database to store user information, a webpage (probably locally hosted on the server machine) which only I can access that hosts information about the server (users connected, worker restarts, server logs, other useful information etc) and apparently nginx would be a good idea where I'm handling multiple node processes accepting connection from the app. After doing research I've also found that it would probably be best to host on a VPS initially. I was thinking at first when the amount of traffic the app service would be receiving will most likely be fairly low, I could run all of those elements on one VPS. Or would it be best to have them running on seperate VPS's except for the website and the server status webpage which I could run on the same one? I guess this way if there is a hardware failure and something goes down, not everything does and I could run 2 instances of the app service on 2 different VPS's so if one goes down the other one is still functioning. Would this just be overkill? I doubt for a while I would need multiple app service instances to support the traffic load but it would help reduce the apparent down time for users.
Maybe this all depends on what I value more and have the time to do? A more complex server setup that costs more and maybe a little unnecessary but guarantees a consistent and reliable service, or a cheaper and simpler setup that may succumb to downtime due to coding errors and server hardware issues.
Also it's worth noting I've never had any real experience with production level servers so in some ways I've jumped in the deep end a little with this. I feel like I've come a long way in the past half a year and feel like I'm getting a fairly good grasp on what I need to do, I could just do with some advice from someone with experience that has an idea with what roadblocks I may come across along the way and whether I'm causing myself unnecessary problems with this kind of setup.
Any advice is greatly appreciated, thanks for taking the time to read my question.

Should each website be its own `node.js` process

We host about 150 websites (possibly scaling to 300+) that we are considering migrating to node.js. Most of the sites are fairly low traffic <1mil pageviews per month.
Should each website be it's own node.js process, or should we serve all websites using the same node.js process (or small set of load balanced processes). Is there a technical limit or a reasonable limit to the number of node processes per server?
Process per site: Feels inefficient, but I don't know if it actually is inefficient. Would ensure one buggy site doesn't affect other sites.
Process per core/small set of processes: Likely higher performance, but what happens when I need to update a sites codebase, won't it take down other sites? Also, code failures in one site would affect other sites.
Ideally, I would prefer one process per site so that we could host all sites from each worker server. That way when load increases we can just spin up another identical worker server and load balance between the two without having to arbitrarily say SiteA goes to ServerA and SiteB goes to ServerB. Any node.js gurus available to offer some wisdom?
All static file requests will be handled likely by Nginx or something like Varnish.
There are a lot of issues at play here. The big picture answer is, it depends... as it always does when you bring in the whole "performance" discussion. That being said, the simplest way to get a solid Node set up is to note the following basic facts about NodeJS, and I will also comment on their implications as they pertain to your questions.
The concurrency you get with Node works really good in certain situations, namely IO heavy operations. What we're really talking about here is minimizing the amount of downtime to wait for the next request. Because of this, Node works really well in an environment where there is one process per core on a machine. Node does really well at maximizing the amount of CPU available to serve requests under heavy load. This being said, if you have literally ZERO other work going on in your even loop, you can see minor performance increases (in terms of max requests/second/processor core) by having multiple node processes per core. But, I've never seen any benefit from increasing this number past 3. Even under circumstances where the entire event loop was literally just a file server.
On the process per site comment. This is a bad idea for many reasons. For one, a well put together node server can process thousands of requests per second. Our (company name omitted) servers, hosted through Amazon EC2 on medium clusters (lots of ram, mid CPU clock, 4 cores), typically fail around 3000 requests per second per cluster. Our servers do a fair bit of CPU work, for simple file servers I'm sure you can do much better. Strictly speaking, sure, per site, you will be able to serve more requests by launching each site in its own process/core/escalating quickly here! But it's not necessary from a cost and over complication of your architecture point of view. What I WOULD recommend, is investing in a setup with a lot of RAM. The ability for your server to cache often requested files will effect your performance infinitely more than launching an abundance of processes for a given machine.
On the whole RAM thing. The number of processes you want to launch for a given core is dependant on two things. One is how much synchronous work done in your event loop. The more synchronous work, the more time between a given request coming in and the event loop being ready to adress the next one. If you have a busy event loop, you will be in a situation where you require more processes/CPU Core. The other thing that can effect this, particularly relevant for file servers, is the amount of RAM. Node runs much better in a high ram environment, but you can say this about ANY file server really... What this has to do with, is the number of active asynchronous operations. One downside of the way node works, is under heavy loads, you can get a large number of event handlers active at once. This is great for concurrency/simplicity, however, if your server is busy waiting around for a lot of async disk/IO to happen it will slow down and crash much sooner than if you had plenty of RAM. If you don't have enough RAM to handle all of these event handlers, you will want to keep to the 1 process/core arrangement. Otherwise, it is easier for Node to spin up many event handlers simultaneously, and again cause you to crash sooner than you would otherwise.
I don't really have enough information to tell you what you SHOULD do. This depends entirely too much on the architecture of your specific server, sites, size of your sites, amount of data... etc. But these three pieces of knowledge are the basic things that help you get the most out of your Node server. To be honest, your idea about load balancing mixed with the considerations above, should do nicely for you. Surely, microoptimizations are possible, but if you do these things, you should easily see requests/second in the thousands before you start experiencing crashes because of DDOS type of conditions.
No, don't do it. Keep it simple! And check out http://12factor.net/.
A few hundred processes is nothing compared to the simplicity you otherwise lose. It would be a terrible decision, on so many levels, to have more than one site (or, "logical application unit") served by a single Node process.
If you're asking this question, you may want to explore Node more before you "migrate" to Node. Error handling and separation of concerns are more complicated in Node than in other situations. Specifically, neither the domain nor cluster APIs are mature. But really it's the philosophy of clean and simple application deployment that you'd be violating. I could go on and on.

Are databases attached to dynos in heroku?

I want to try out heroku, but am not quite sure if I understand all terms correctly.
I have an app with node.js and redis & my main focus is scaling and speed.
In a traditional environment I would have two servers in front of a load balancer; both servers are totally independent, share the same code and have an own redis instance. Both servers don't know of each other (the data is synched by a third party server, but that is not of interest for that case).
I would then push a load balancer in front of them. Know I could easily scale, as both instances are not aware of each other and I could just add more instances if I wish.
Can I mirror that environment in a dyno or can't I attach a redis instance to a dyno?
If something is unclear, please ask, as I'm new to paas!
As I understand it: I would have a dyno for my node-app and would just add another instance of it. That's cool, but would they share the same redis or can I make them independent?
You better forget traditional architectures and try to think it this way:
A dyno is a process processing HTTP requests, the absolute minimum of an app instance on heroku.
For one application instance you can have as many dynos you want and
it is totally transparent . No need to think about servers, load
balancing, etc... everything is taken care.
A redis instance is a basically a service used by the application
instance and therefore by one or more dynos. Again, servers, load
balancing, etc all is taken care.
Maybe you want to review the How it works on heroku.com now again.
You can have as many dynos for one URL as you want - you just change the value in the controller. This is actually one of the best features of Heroku - you don't care about servers, you increase the number of dynos and by this increase the number of requests which can be processed simultaneously.
Same thing with redis - it basically doesn't work that you add instances, you just switch to a more performant plan, see https://addons.heroku.com/redistogo. Again, forget about servers.

Why do IIS application pools need to be recycled?

Application pools in IIS are recycled very frequently and I can't figure out why. I remember reading about a possible issue in IIS6 that meant you were forced to recycle but a quick search now turns up empty. On IIS6 or 7 you can turn off the idle time, duration and specific time recycle options so no problems there.
So why does every .net site recycle the application pool? If a site didn't have any memory leaks could you set up a site that never needed to recycle?
Also failing this, what would be the best way to ensure background tasks are called, is there any auto restart modules for IIS or should an external service be used to make those calls?
It sounds like it is possible to do if you really wanted/needed to?
Websites are intended to keep running (albeit in a stateless nature). There are a myriad of reasons why app pool recycling can be beneficial to the hosting platform to ensure both the website and the server run at optimum. These include (but not limited to) dynamically compiled assemblies remaining in the appdomain, use of session caching (with no guarantee of cleanup), other websites running amok and resources getting consumed over time etc. An app pool can typically serve more than one website, so app pool recycling can be beneficial to ensure everything runs smoothly.
Besides the initial boot when the app fires up again, the effect should be minimal. Http.sys holds onto requests while a new worker process is started so no requests should be dropped.
From https://weblogs.asp.net/owscott/why-is-the-iis-default-app-pool-recycle-set-to-1740-minutes
You may ask whether a fixed recycle is even needed. A daily recycle is
just a band-aid to freshen IIS in case there is a slight memory leak
or anything else that slowly creeps into the worker process. In theory
you don’t need a daily recycle unless you have a known problem. I used
to recommend that you turn it off completely if you don’t need it.
However, I’m leaning more today towards setting it to recycle once per
day at an off-peak time as a proactive measure.
My reason is that, first, your site should be able to survive a
recycle without too much impact, so recycling daily shouldn’t be a
concern. Secondly, I’ve found that even well behaving app pools can
eventually have something sneak in over time that impacts the app
pool. I’ve seen issues from traffic patterns that cause excessive
caching or something odd in the application, and I’ve seen the very
rare IIS bug (rare indeed!) that isn’t a problem if recycled daily. Is
it a band-aid? Possibly, but if a daily recycle keeps a non-critical
issue from bubbling to the top then I believe that it’s a good
proactive measure to save a lot of troubleshooting effort on something
that probably isn’t important to troubleshoot. However, if you think
you have a real issue that is being suppressed by recycling then, by
all means, turn off the auto-recycling so that you can track down and
resolve your issue. There’s no black and white answer. Only you can
make the best decision for your environment.
There's a lot more useful/interesting info for someone relatively unlearned in the IIS world (like me), I recommend you read it.

Resources