I have a Rails app which uses PhantomJS to parse pages. I have moved the PhantomJS parser into a NodeJS server. Since PhantomJS is expensive to run, I keep just one PhantomJS instance per CPU (using the cluster module within a single server).
Now I want to be able to add more NodeJS machines, and from Rails just be able to say: hey, process this URL. Right now it works with a single machine, but I don't know how to structure this as multiple NodeJS servers.
So right now, whenever I want to parse a site, I send a request to my NodeJS machine, and once the page has been parsed, NodeJS posts back to Rails. But how do you scale this out in parallel across multiple NodeJS servers, in a way that is clever enough to send each incoming URL to the server with the fewest jobs in progress?
Basically, you'll need to put a proxy server in front of your application servers to handle the routing of traffic.
Not sure which hosting provider you are using, but AWS makes this really easy with OpsWorks or Elastic Beanstalk.
Some options include:
AWS OpsWorks
AWS Elastic Beanstalk
Custom HAProxy Load Balancer
AWS OpsWorks: You can spin up a new NodeJS stack in the AWS OpsWorks service and attach an Elastic Load Balancer (ELB) to its instances. From there you can configure the layer to autoscale, adding instances to handle more traffic as needed based on configurable metrics (CPU load, time, network traffic), while the ELB distributes incoming requests across the instances.
AWS Elastic Beanstalk: See this tutorial on how to build a Node.js app using the cluster module in an autoscaling environment.
HAProxy: If you'd rather run your own load balancer, you could spin up HAProxy (or a similar reverse proxy) yourself to distribute the traffic.
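If you do go the custom route, the "send the URL to the least busy parser" behaviour from the question needs very little code. Below is a minimal sketch using only Node core modules; the worker addresses, port numbers and job-counting heuristic are illustrative assumptions, and an ELB or HAProxy with a least-connections policy gives you roughly the same behaviour without maintaining this yourself.

```js
// Minimal dispatcher sketch: forward each request to the backend with the
// fewest in-flight jobs. Backend addresses and ports are placeholders.
const http = require('http');

const PARSERS = [
  { host: '10.0.0.11', port: 3000, active: 0 },
  { host: '10.0.0.12', port: 3000, active: 0 },
];

http.createServer((req, res) => {
  // Pick the backend with the fewest in-flight jobs.
  const target = PARSERS.reduce((a, b) => (a.active <= b.active ? a : b));
  target.active += 1;

  const upstream = http.request(
    { host: target.host, port: target.port, path: req.url, method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );

  upstream.on('error', () => {
    if (!res.headersSent) res.writeHead(502);
    res.end('parser unavailable');
  });
  upstream.on('close', () => { target.active -= 1; }); // job finished (or failed)

  req.pipe(upstream);
}).listen(8080);
```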
I created a Node server that receives events through webhooks, handles them, and posts their data to one API endpoint. Currently I'm deploying it using AWS Elastic Beanstalk, but I don't know if it's the best option.
I don't need load balancers.
I don't need web servers like Apache/Nginx.
My Node server does not listen on any port for incoming requests, since it's a simple worker that only handles webhook events. So the Elastic Beanstalk environment never gets any request metrics and reports a severe health status, because the app doesn't answer any of the health check requests.
Should I use another type of AWS service? Docker?
In the end I went with the AWS App Runner service for running containers. No load balancers, just elastic scaling. No web servers.
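One detail worth noting either way: Elastic Beanstalk, App Runner and most load-balanced setups decide health by probing an HTTP endpoint, so a worker-style Node process can avoid the "severe health status" simply by answering those probes. A minimal sketch, where the /health path and the port are assumptions (both are configurable in the platform settings):

```js
// Minimal sketch: a worker process that also answers HTTP health checks
// so the hosting platform does not mark the environment as unhealthy.
// The /health path and port 8080 are assumptions.
const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/health') {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok');
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(process.env.PORT || 8080);

// ...the actual webhook/event handling logic runs in the same process.
```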
I have a node.js RESTful API application. There is no web interface (at least as of now) and it is just used as an API endpoint which is called by other services.
I want to host it on Amazon's AWS cloud. I am confused between two options
Use normal EC2 hosting and just provide the hosting URL as the API endpoint
OR
Use Amazon's API Gateway and run my code on AWS Lambda
Or can I just run my code on EC2 and use API Gateway?
I am confused about how EC2 and API Gateway differ when it comes to a Node.js RESTful API application.
Think of API Gateway as an API management service. It doesn't host your application code; instead, it provides a centralized interface for all your APIs and lets you configure things like access restrictions, response caching, rate limiting, and version management.
When you use API Gateway you still have to host your API's back-end application code somewhere, such as Lambda or EC2, so you should compare Lambda and EC2 to determine which best suits your needs.
EC2 provides a virtual Linux or Windows server that you can install anything on, but you pay for every second that the server is running. With EC2 you also have to think about scaling your application across multiple servers and load balancing the requests.
AWS Lambda hosts your functions and executes them on demand, scales out the number of function containers automatically, and you only pay for the number of executions (and it includes a large number of free executions every month). Lambda is going to cost much less unless you have a very large number of API requests every month.
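To make the Lambda option concrete: with API Gateway's Lambda proxy integration, the gateway passes the HTTP request to a handler function and expects a status code and body back, so there is no web server or port to manage. A minimal Node.js sketch (the route and response shape are illustrative):

```js
// Minimal Node.js Lambda handler for an API Gateway proxy integration.
// API Gateway passes the HTTP request in `event` and expects an object
// with statusCode/body back. The /items route and payload are placeholders.
exports.handler = async (event) => {
  if (event.httpMethod === 'GET' && event.path === '/items') {
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify([{ id: 1, name: 'example' }]),
    };
  }
  return { statusCode: 404, body: 'Not found' };
};
```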
I've got a NodeJS application that does some moderately intense logic work when a user requests it. For example, a user on the frontend can click Analyze, and the server will perform the work, which could take 30 seconds to 1 minute (non-blocking).
My app is not aimed at the general public but at an audience of a few thousand, so there is a chance that several people might run an analysis at the same time.
I'm currently planning to deploy the app via Elastic Beanstalk, but I am not sure exactly how it deals with an instance that is busy, or whether I have to implement some kind of custom signal to tell the load balancer to send requests to another instance when the current one is busy performing an analysis.
I understand that Lambdas are often held up as an option in this case, but I would much prefer to keep it simple and keep the code in my Node app.
How should I design this so the app can handle the analysis work while still handling other requests normally?
Elastic Beanstalk uses an Auto Scaling group to launch and maintain the EC2 instances required to run the application. With Auto Scaling groups you can increase or decrease the EC2 instance count dynamically via scaling policies. Out of the box, Elastic Beanstalk supports scaling triggers based on metrics such as CPU utilization, network in, network out, request count, and latency. You can use any of these metrics to scale your infrastructure up and down dynamically.
You can refer to the AWS documentation here for more information.
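Whichever scaling trigger you pick, it also helps if a single instance stays responsive while an analysis is running, so the load balancer keeps seeing fast, healthy responses. One common way to do that in Node is to push the CPU-heavy part onto a worker thread; below is a minimal sketch using the built-in worker_threads module, where the analyze.js worker file and its message format are assumptions for illustration.

```js
// server.js - sketch: offload CPU-heavy analysis to a worker thread so the
// main event loop keeps serving other requests and health checks.
const http = require('http');
const { Worker } = require('worker_threads');

function runAnalysis(payload) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./analyze.js', { workerData: payload });
    worker.once('message', resolve); // worker posts its result back
    worker.once('error', reject);
  });
}

http.createServer(async (req, res) => {
  if (req.url === '/analyze') {
    const result = await runAnalysis({ startedAt: Date.now() });
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result));
  } else {
    res.writeHead(200);
    res.end('ok'); // other routes / health checks stay responsive
  }
}).listen(process.env.PORT || 3000);

// ---- analyze.js (worker) - does the heavy work off the main thread ----
// const { parentPort, workerData } = require('worker_threads');
// ...heavy computation here...
// parentPort.postMessage({ ok: true, input: workerData });
```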
As developers we wrote microservices on Azure Service Fabric, and we can run them in Azure as a kind of PaaS offering for many customers. But some of our customers do not want to run in the cloud, because their databases are on-premises and will not be made accessible from the outside, not even through a DMZ. That's fine; we promised to support it, since Azure Service Fabric can be installed as a cluster on-premises.
We have an API gateway microservice running inside the cluster on every virtual machine; it uses the naming service to resolve addresses, and requests are routed and distributed accordingly. The API that this gateway microservice exposes is the entry point for another piece of client software that our customers use. That software runs outside of the cluster and has to send its requests to the API.
I suggested using a load balancer like HAProxy or Nginx on a separate machine (or machines): the client software would send its requests there, and the reverse proxy would forward them to an available machine inside the cluster.
It seems that is not what our customer wants; another machine acting as a load balancer is not an option. Their suggestion: make the client software smart enough to figure out which host to go to. In other words, we should write our own failover/load-balancing logic inside the client software.
What other options do we have?
Install the Network Load Balancing feature on each of the virtual machines to give the cluster a single IP address. Is this even possible? Something like https://www.poweradmin.com/blog/configuring-network-load-balancing-in-windows-server/
Suggest an API gateway outside the cluster, like Kong (https://getkong.org/)
Something else?
PS: The client applications do not send many requests per second, maybe a few per minute.
We had a very similar problem: many services and a Service Fabric cluster that runs on-premises. When it was time to add a load balancer, we installed IIS on the same machines where the Service Fabric cluster runs. Since IIS is a good load balancer, we use it as a reverse proxy only in front of the API gateway; Kestrel hosting is used for the other services that communicate over HTTP. The API gateway microservice is the single entry point for all clients and always has a static URI inside Service Fabric, and we used that URI to configure IIS.
If you don't have the option of using IIS, then look at Using nginx as an HTTP load balancer.
You don't need another machine just for HTTP forwarding; just run it as a service on the cluster.
Did you consider using the built-in reverse proxy of Service Fabric? It runs on all nodes and will forward HTTP calls to services inside the cluster.
You can also run nginx as a guest executable or inside a container on the cluster.
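To illustrate the built-in reverse proxy option: it listens on every node (port 19081 by default), so the external client software can call any reachable node and let Service Fabric resolve and forward the request internally. A sketch of what such a call looks like; the node address, application name and service name below are placeholders:

```js
// Sketch: calling a Service Fabric service through the built-in reverse proxy.
// The reverse proxy listens on every node (default port 19081); app/service
// names and the node address are placeholders.
const http = require('http');

const node = '10.1.0.4'; // any cluster node reachable by the client
const url = `http://${node}:19081/MyApp/ApiGatewayService/api/orders`;

http.get(url, (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => console.log(res.statusCode, body));
}).on('error', (err) => console.error('node unreachable, try another:', err.message));
```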
We also faced the same situation when we started working with a Service Fabric cluster. We configured Azure Application Gateway as the proxy, but it did not provide features like HTTP-to-HTTPS redirection.
For that reason, we configured Nginx instead of Azure Application Gateway as the proxy in front of the Service Fabric application.
Our system has 3 main components:
A set of microservices running in AWS that together comprise a webapp.
A very large monolithic application that is hosted within our network, comprises several other webapps, and exposes a public API that is consumed by the AWS instances.
A locally hosted (and very large) database.
This all works well in production.
We also have a testing version of the monolith that is inaccessible externally.
I would like to be able to spin up any number of copies of the AWS environment for testing or demo purposes that can access the testing version of the monolith. However, because it's a test system, it needs to remain inaccessible to the public. I know how to achieve this with AWS easily enough (security groups etc.), but how can I secure the monolith so it can be accessed ONLY by any number of dynamically created instances running in AWS (given that the IP addresses are dynamic and can therefore not be whitelisted)?
The only idea I have right now is to use an access token, but I'm not sure how secure that is.
Edit - My microservices are each running on an EC2 instance.
Assuming you are running your microservices on EC2: if you want API calls from your application servers in AWS to come from a known IP (or set of IPs), this can be accomplished by using a NAT instance or a proxy. That way, even though your application servers are dynamic, the apparent source of the requests is not.
For a NAT, you would run your EC2 instances in a private subnet and configure them to send all of their Internet traffic out over the NAT instance, which will have a constant IP. Using a proxy server or a fleet of proxy servers works in much the same way, but requires your microservice applications to be configured to use it, as sketched below.
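For the proxy variant, a plain-HTTP request from a Node microservice can be sent through a forward proxy with a fixed egress IP simply by connecting to the proxy and passing the absolute URL as the request path. The hostnames and port below are assumptions; for HTTPS you would additionally need a CONNECT-capable tunnel or proxy agent.

```js
// Sketch: routing an outbound HTTP call through a forward proxy so the
// monolith only ever sees the proxy's fixed IP. Hostnames/ports are
// placeholders; HTTPS would additionally require a CONNECT tunnel.
const http = require('http');

const req = http.request({
  host: 'proxy.internal.example', // the proxy / fixed-egress host (assumed)
  port: 3128,
  method: 'GET',
  path: 'http://monolith.corp.example/api/v1/status', // absolute URL for the forward proxy
  headers: { Host: 'monolith.corp.example' },
}, (res) => {
  console.log('monolith responded with', res.statusCode);
});

req.on('error', (err) => console.error('proxy request failed:', err.message));
req.end();
```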
The better approach, though, would be to simply not send the traffic between your microservices and the monolith over the public Internet at all.
This can be accomplished by establishing a VPN from your company network to your VPC. Alternatively, you could establish a Direct Connect to bridge the networks.
Side note, if your microservices are actually running in AWS Lambda then this answer does not apply.