How does a microservice architecture with sync communication scale? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Suppose I have two services: a book service and an order service.
The book service gives information about books (name, id, publishing house, summary, author, etc.), basically finding books for the user based on some query; it may also use machine learning. It has one endpoint, /books/search, which takes a query and returns results.
The order service creates an order from a user id and a book id. It also has one endpoint, /order/create.
There is one relational database, which holds books, orders and users. It also has some read replicas to make querying faster for the book service. I have a books.js file for the book service, which I deployed on one EC2 instance, and an order.js file, which I deployed on another EC2 instance.
There is an API gateway which forwards each request to the appropriate EC2 instance.
Suppose the traffic on the book service increases drastically. How do I scale the book service programmatically? Does AWS API Gateway help with that?
If I deployed multiple instances of the book service from the start, let's say 3 instances, can they share the same address and port so that the API gateway can forward user requests easily, or do they need different addresses and ports? Can API Gateway load balance requests across the book service EC2 instances, or will I also need to add a load balancer?
I understand how scaling with async communication is done: you have an SQS queue, and an EC2 instance monitors the queue and spins EC2 instances up or down based on the size of the queue. The status of the EC2 instances is stored somewhere, and the monitor instance uses it to scale the instances down. The created instances take messages from the SQS queue, do the processing, store the result somewhere, and send a notification such as an email to the user.
In this setup, however, the user doesn't get a response immediately. I want the user to get an immediate response.
Kindly answer the question without using a function service like Lambda or a Docker service like ECS, as I don't understand them fully.
Also, can anyone point to some good resources for learning about microservices with Node.js and AWS? When I tried to find such resources, I found them to be very high level and not really step by step.

Suppose the traffic on the book service increases drastically. How do I scale the book service programmatically?
You don't need to scale the service programmatically. The default way is to place the service instances in an Auto Scaling group and put an internal load balancer in front of that group. The remaining task is to find the proper metrics to scale the service on.
https://aws.amazon.com/about-aws/whats-new/2020/03/api-gateway-private-integrations-aws-elb-cloudmap-http-apis-release/
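To make "find the proper metrics" a bit more concrete, here is a minimal sketch of the kind of proportional rule a target-tracking scaling policy applies. The Auto Scaling group makes this decision for you, so this is an illustration only, not AWS's actual algorithm; the function name, the 60% CPU target and the size limits are assumptions.

```python
import math

def desired_capacity(current_instances, metric_value, target_value,
                     min_size=1, max_size=10):
    """Return the instance count that would bring the per-instance metric
    back to the target, clamped to the group's size limits."""
    if current_instances < 1 or target_value <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Proportional rule: if each instance runs at 2x the target, double the fleet.
    raw = math.ceil(current_instances * metric_value / target_value)
    return max(min_size, min(max_size, raw))

# Hypothetical example: 3 book-service instances averaging 90% CPU, 60% target.
print(desired_capacity(3, 90.0, 60.0))
```

Whatever metric you pick (CPU, request count per instance, latency), it should move roughly in proportion to load for a rule like this to behave sensibly.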
Does AWS API Gateway help with that?
It may, depending on your setup. API Gateway could provide additional metrics that you could use to scale the service, but I'm not sure how useful those metrics will be in your case.
https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-metrics.html

Related

Isolate AWS lambda function

In a hobby side project I am creating an online game where users play a card game by implementing strategies. A user can submit code, and it will play against other users' strategies. Once the code has been submitted, it needs to be run on the server side.
I decided that I want to isolate code execution of user submitted code into an AWS lambda function. I want to prevent the code from stealing my AWS credentials, mining cryptocurrency and doing other harmful activity.
My plan is to do following:
Limit code execution time
Prevent any communication to the internet and internal services (except through the return value).
Have a review process in place, which prevents execution of user-submitted code before it is considered harmless.
Now I need your advice on how to achieve best isolation:
How do I configure my function, so that it has no internet access?
How do I configure my function, so that it has no access to my internal services?
Do you see any other possible attack vector?

How do I configure my function, so that it has no internet access?
Launch the function into an isolated private subnet within a VPC.
How do I configure my function, so that it has no access to my internal services?
By launching the function inside the isolated private subnet, you can control which services it can reach through security groups, the route table attached to the subnet, and AWS network ACLs.
Do you see any other possible attack vector?
There could be multiple attack vectors here:
I'll answer from the perspective of security in AWS services. The most important step is to set up AWS billing alerts, so that if something goes wrong you at least get notified and can take action. I am assuming you already have MFA set up for your logins.
Make sure you configure your Lambda with a least-privilege IAM role.
Create a completely separate subnet dedicated to launching the Lambda function.
Create a security group for the Lambda and use it to control the Lambda's access to other services in your solution.
Have a separate route table for the subnet in which you allow only the selected services, or be very specific with the corresponding IP addresses.
Use network ACLs to restrict all outgoing traffic from the subnet as an additional layer.
Enable VPC flow logs, have the necessary Athena queries in place to analyse them, and add alerts using AWS CloudWatch.
The list can get very long when you want to secure this deployment fully in AWS; I have added just a few items.
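As one small piece of that checklist, the security group attached to the function can be audited for wide-open outbound rules. The sketch below assumes the group is given as a dictionary in the shape returned by EC2's DescribeSecurityGroups call (`IpPermissionsEgress`, `IpRanges`, `Ipv6Ranges`); the group itself and the helper name are made up for illustration.

```python
def open_egress_rules(security_group):
    """Return egress rules that allow traffic to any IPv4 or IPv6 address."""
    risky = []
    for rule in security_group.get("IpPermissionsEgress", []):
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if "0.0.0.0/0" in cidrs or "::/0" in cidrs:
            risky.append(rule)
    return risky

# Hypothetical group: all protocols to anywhere, plus HTTPS to one internal range.
sandbox_group = {
    "GroupId": "sg-example",
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.5.0/24"}]},
    ],
}

for rule in open_egress_rules(sandbox_group):
    print("open egress rule, protocol:", rule["IpProtocol"])
```

A check like this only covers security groups; route tables and network ACLs need their own review.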

I'd start by saying this is very risky: allowing people to run their own code in your infrastructure can be very dangerous. That said, here are a few things:
Limiting Code Execution Time
This is already built into Lambda. Functions have a configurable execution time limit, which you can set through IaC, the AWS Console or the CLI.
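For illustration, here is a small sketch that builds the parameters for such a timeout change and validates them first. The function name `strategy-runner` and the helper are hypothetical; the 900-second ceiling is Lambda's documented maximum timeout. The actual update is left as a comment because it needs boto3 and AWS credentials.

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's upper bound (15 minutes)

def timeout_update(function_name, timeout_seconds):
    """Build the arguments for a Lambda configuration update that sets the timeout."""
    if not 1 <= timeout_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS:
        raise ValueError("timeout must be between 1 and 900 seconds")
    return {"FunctionName": function_name, "Timeout": timeout_seconds}

params = timeout_update("strategy-runner", 3)  # hypothetical function name
print(params)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("lambda").update_function_configuration(**params)
```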
Restricting Internet Access
By default, Lambda functions can be thought of as existing outside the constraints of a VPC for most applications, so they have internet access. You could put your Lambda function inside a private subnet in a VPC and then configure the networking to allow outbound connections only to the locations you want.
Restricting Access to Other Services
Assuming that you are referring to AWS services here, Lambda functions are bound by their IAM role in terms of which other AWS services they can access. As long as you don't grant access to something in the function's IAM role, the function won't be able to access that service, unless a potentially malicious user provides credentials by some other means, such as putting them in plain text in code where an AWS SDK implementation could pick them up.
If you are referring to other internal services such as EC2 instances or ECS services then you can restrict access using the correct network configuration and putting your function in a VPC.
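To illustrate the IAM side, here is a sketch of a policy document that lets the function write only to its own log group, plus a crude check for overly broad statements. The region, account id and function name are placeholders, and both helpers are hypothetical names. A function attached to a VPC also needs permissions to manage its network interfaces, which this sketch leaves out.

```python
import json

def logging_only_policy(region, account_id, function_name):
    """Build an IAM policy document that only allows writing the function's own logs."""
    log_group = (f"arn:aws:logs:{region}:{account_id}:"
                 f"log-group:/aws/lambda/{function_name}")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": f"{log_group}:*",  # log streams inside this one group
        }],
    }

def grants_too_much(policy):
    """Flag Allow statements that cover every action of a service or every resource."""
    for stmt in policy["Statement"]:
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        if stmt["Effect"] == "Allow" and (broad_action or stmt["Resource"] == "*"):
            return True
    return False

policy = logging_only_policy("eu-west-1", "123456789012", "strategy-runner")
print(json.dumps(policy, indent=2))
print("too broad:", grants_too_much(policy))
```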
Do you see any other possible attack vector?
It's hard to say for sure. I'd really advise against this completely without taking some professional (and likely paid and insured) advice. There are new attack vectors that can open up or be discovered daily and therefore any advice now may completely change tomorrow if a new vulnerability is discovered.
I think your best bets are:
Restrict the function timeout to be as low as physically possible (allowing for cold starts).
Minimise the IAM policy for the function as far as humanly possible. Be careful with logging: I assume you'll want some logs, but you don't want to allow someone to write gigabytes of data into your CloudWatch logs.
Restrict the language used so you are using one language that you're very confident in and that you can audit easily.
Run the Lambda in a private subnet in a VPC. You'll likely want a separate routing table, and you will need to audit your security groups and network ACLs closely.
Add alerts and VPC logs so you can be sure that a) if something does happen that shouldn't then it's logged and traceable and b) you are able to automatically get alerted on the problem and rectify it as soon as possible.
Consider who will be reviewing the code. Are they experienced and trained to spot attack vectors?
Seek paid, professional advice so you don't end up with security problems or very large bills from AWS.

Using Azure Service Fabric to Manually Control and Spawn Job-Processing Agents

Currently I'm investigating the possibility of using Azure Service Fabric and its Reliable Services to implement the architecture for my problem domain.
Problem domain: I am researching distributed large-scale web crawling architectures involving dozens of parallel agents that crawl web servers and download resources for further indexing.
I've found a useful academic paper which describes an Azure-based distributed web-crawling architecture (Link to .pdf paper), and I'm trying to implement and try out a prototype based on this design.
The basic high-level design looks something like the figure below:
The idea: the Central Web Crawling System Engine (CWCE from here on) runs in an infinite loop until the program is aborted and fetches a Service Bus queue message containing the URL of a page to be crawled. The CWCE then checks the hostname of this URL and consults the Agent Registrar SQL database to see whether an alive agent already exists for that hostname. If not, the CWCE does one of the following:
If the number of alive agents (A_alive) is equal to Max (the upper bound on agents, provided by the application administrator), the CWCE waits until A_alive < Max.
If A_alive < Max, the CWCE tries to create a new agent and assign the hostname to it (the agent is then registered in the SQL Registrar database).
Each agent runs on its own partition (a URL hostname, for example example.com) and recursively crawls only pages of that hostname, while discovering URLs on external hostnames and adding them to the Service Bus queue for other agents to process.
The benefit of this architecture would be horizontal scaling of agents and a near-linear increase in crawling effectiveness as the workload grows.
However, I am very new to Azure Service Fabric and would therefore like to ask whether this PaaS layer is capable of solving this problem. Main questions:
Would it be possible to create new web crawling agent instances manually from code and pass them a hostname parameter using Azure Service Fabric? (Maybe using the FabricClient class for manipulating the cluster and creating service instances?)
Which ASF programming model best fits this scenario of parallel long-running agents: stateless services, stateful services or the Actor model? Each agent might run as a long-running task, since it recursively crawls the URLs of a specific hostname and listens to the queue.
Would it be possible to control and change the upper bound on alive agents (Max) while the application is running?
Would it be possible to have the CWCE component as an infinite-loop stateless service which continuously listens for queue messages in order to spawn new agents?
I am not sure whether ASF is the best PaaS solution for this distributed web-crawling use case, so your insights would be very valuable to me. Any helpful resource links would also be appreciated.

Service Fabric will allow you to implement the architecture that you want.
Would it be possible to create new web crawling agent instances manually from code and pass them a hostname parameter using Azure Service Fabric? (Maybe using the FabricClient class for manipulating the cluster and creating service instances?)
Yes. The service you develop and deploy to Service Fabric will be a ServiceType. Service types don't actually run; instead, from the ServiceType you create the actual services, which are named. A single service (e.g. ServiceA) will have a number of instances, to allow scaling and availability. You can programmatically create and remove services of a given type and pass parameters to them, so every service will know what URL to crawl.
Check an example here.
Which ASF programming model best fits this scenario of parallel long-running agents: stateless services, stateful services or the Actor model? Each agent might run as a long-running task, since it recursively crawls the URLs of a specific hostname and listens to the queue.
I would choose stateless services, because they will be the most efficient in terms of resource utilization and the easiest to manage (no need to store and manage state, partitioning and replicas). The only thing you need to consider is that every service will eventually crash and restart, so you need to store the current crawling location in a permanent store, not in memory.
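As a rough illustration of keeping the crawl position outside memory, the sketch below checkpoints an agent's pending and visited URLs and reloads them after a restart. A local JSON file stands in for the permanent store here, which is an assumption made purely so the example runs anywhere; in a real cluster you would use a database or blob storage instead. All function names are hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(path, hostname, pending_urls, visited_urls):
    """Atomically persist the crawl frontier so a restarted agent can resume."""
    state = {"hostname": hostname,
             "pending": list(pending_urls),
             "visited": sorted(visited_urls)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # swap in the new file in one step

def load_checkpoint(path, hostname):
    """Return (pending, visited); start from the site root if nothing was saved."""
    if not os.path.exists(path):
        return [f"https://{hostname}/"], set()
    with open(path) as handle:
        state = json.load(handle)
    return state["pending"], set(state["visited"])

with tempfile.TemporaryDirectory() as folder:
    checkpoint = os.path.join(folder, "example.com.json")
    save_checkpoint(checkpoint, "example.com",
                    ["https://example.com/a"], {"https://example.com/"})
    pending, visited = load_checkpoint(checkpoint, "example.com")
    print(pending, visited)
```

Writing to a temporary file and renaming it means a crash in the middle of a save leaves the previous checkpoint intact.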
Would it be possible to control and change the upper bound on alive agents (Max) while the application is running?
Yes. Service Fabric services run on nodes (virtual machines), and in Azure these are managed by Virtual Machine Scale Sets. You can easily add and remove nodes from the VMSS, which will allow you to adjust the total compute and memory capacity, and the actual number of services is already controlled by you, as described in point 1.
Would it be possible to have the CWCE component as an infinite-loop stateless service which continuously listens for queue messages in order to spawn new agents?
Absolutely. Message-driven microservices are very common. It's technically not an infinite loop, but a service with a Bus Communication Listener. I found one here as a reference, but I don't know if it's production ready.
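The dispatch logic itself (one agent per hostname, capped at Max, with the cap adjustable at runtime) is independent of Service Fabric. The sketch below simulates it in memory: a dictionary stands in for the Agent Registrar database, a deque for the Service Bus queue, and creating a dictionary entry for creating a service. None of this is Service Fabric API; all names are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

class AgentRegistry:
    """In-memory stand-in for the Agent Registrar database."""

    def __init__(self, max_agents):
        self.max_agents = max_agents  # may be changed while running
        self.agents = {}              # hostname -> URLs handed to that agent

    def dispatch(self, url):
        """Route a URL to its host's agent; return False if the cap blocks a new agent."""
        host = urlparse(url).hostname
        if host not in self.agents:
            if len(self.agents) >= self.max_agents:
                return False
            self.agents[host] = []    # the real system would create a service here
        self.agents[host].append(url)
        return True

def drain(queue, registry):
    """Process queued URLs once; URLs blocked by the cap are put back for later."""
    deferred = deque()
    while queue:
        url = queue.popleft()
        if not registry.dispatch(url):
            deferred.append(url)
    queue.extend(deferred)

queue = deque(["https://a.example/1", "https://b.example/1",
               "https://a.example/2", "https://c.example/1"])
registry = AgentRegistry(max_agents=2)
drain(queue, registry)
print(sorted(registry.agents), list(queue))

registry.max_agents = 3  # raise the cap at runtime
drain(queue, registry)
print(sorted(registry.agents), list(queue))
```

In the first pass the URL for the third hostname stays queued; after the cap is raised, the next pass creates its agent.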

Which Azure services are PaaS [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
I'm trying to compare AWS and Azure for a custom web app that's essentially like any canned content management system. It requires web hosting, database, email, storage, security, and some way to process ASP.NET, all with high availability and load balancing.
The PaaS/IaaS distinction can sometimes be grey (in part because companies tend to use marketing jargon that portrays IaaS-type services as maintenance free). From a small business perspective it's quite clear, though. If a service requires the SMB to spend time maintaining rather than developing, it's in the IaaS camp. Since I'm a single developer with limited time, a PaaS model for all services would be preferable. The ideal would be for all services (web hosting, database, email, etc.) to be offered as zero-maintenance scalable services, rather than my having to spin up and manage individual instances.
I find AWS can do everything, but a drawback is that one still needs to manage instances (i.e. I would need to keep the software on instances updated, track instances, manage network, security, etc.). S3 doesn't process scripts. AWS Elastic Beanstalk and OpsWorks are still essentially helper apps for starting up an IaaS-type environment (whereas, say, DynamoDB would count as a PaaS-type service). Recently Microsoft has dropped prices on Azure, which makes it an attractive alternative.
In short, I am looking for a list of services offered by Azure which are actually no-maintenance services that don't require me to patch software or spin up instances to handle traffic spikes (e.g. web hosting, script processing, database, email, etc.).

web hosting, database, email, storage, security, some way to process ASP.NET but with high availability and load-balanced
All of the above are standard features which any mature cloud provider will have in its toolkit. With regard to Microsoft Azure:
For web hosting, you have PaaS solutions such as App Service plans and App Service Environment. The upkeep of the platform (as the name suggests) is with Azure, but note that any components you deploy as part of the package remain the responsibility of your dev and test teams.
For database and storage, for a complete PaaS solution you have Azure SQL Database and Azure SQL Managed Instance, but as I said earlier, you will still have to own any custom deployment (security policies, VNET injection and IAM) yourself.

Azure service bus: Use functions vs service fabric vs web job? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I am considering three ways to build a service bus topic listener:
Azure functions: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus
Service fabric: https://iamrufio.com/2017/04/21/azure-service-bus-listener-with-azure-service-fabric/
Web job: https://code.msdn.microsoft.com/Processing-Service-Bus-84db27b4
I'm not sure which way to go. I'm leaning towards Azure functions since it has a direct out-of-the-box service bus integration. However since it's fairly new I'm not sure if it's a safe option.
Service Fabric, from what I've read, offers the most resiliency and support.
And a web job would be safest to pick since everything is easily configurable but I'm afraid I'll be reinventing the wheel as no out-of-the-box support is provided.
What direction would be best?

It's a very open-ended question. You should look at your requirements and other constraints such as budget. For example, running a production-grade Service Fabric cluster would require at least 5 nodes, whereas running a WebJob would require a hosting plan with some scale-out (for HA), and running Azure Functions on the consumption plan means you pay per execution only after the free grant of 1 million requests and 400,000 GB-s of resource consumption per month is used up.
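To get a feel for the consumption plan's free grant, here is a small worked calculation using only the two figures quoted above. The workload numbers are hypothetical, no prices are assumed, and the sketch ignores any rounding or minimums the platform applies when metering.

```python
FREE_EXECUTIONS = 1_000_000  # monthly free grant quoted above
FREE_GB_SECONDS = 400_000    # monthly free grant quoted above

def billable_usage(executions, avg_duration_s, memory_gb):
    """Return (billable executions, billable GB-s) after the monthly free grant."""
    gb_seconds = executions * avg_duration_s * memory_gb
    return (max(0, executions - FREE_EXECUTIONS),
            max(0.0, gb_seconds - FREE_GB_SECONDS))

# Hypothetical workload: 3 million messages a month, 0.5 s each at 0.25 GB.
print(billable_usage(3_000_000, 0.5, 0.25))
```

In this example the resource consumption is 375,000 GB-s, which stays under the grant, so only the 2 million executions beyond the first million would be billable.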
I would suggest starting simple, with Azure Functions. Create your prototype and see whether it does what you need and whether you run into issues. With Functions, use of Azure Service Bus can be somewhat limited. For example, you can't dead-letter a message, as you either have to return successfully to complete it or throw an exception to retry. You can't defer a message; instead you would need to send another message. Nor can you use the transactional option via the send-via feature of Azure Service Bus.
If you find yourself requiring those features, a WebJob would be my next candidate. You will have to look at how you'd use it. Most likely you'll need to create your own receiving pump and handle things Functions offered for free, but you'll have the flexibility to create multiple connections, configure clients the way you need, etc.
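To show what "your own receiving pump" involves, here is an in-memory simulation of the complete / retry / dead-letter decisions such a pump has to make. It does not use the Azure Service Bus SDK; the queue, the delivery counter and the function names are all stand-ins for illustration.

```python
from collections import deque

def run_pump(messages, handler, max_deliveries=3):
    """Deliver each message to handler; complete on success, retry on failure,
    and dead-letter the message once max_deliveries attempts have failed."""
    queue = deque((body, 1) for body in messages)
    completed, dead_lettered = [], []
    while queue:
        body, attempt = queue.popleft()
        try:
            handler(body)
            completed.append(body)
        except Exception:
            if attempt >= max_deliveries:
                dead_lettered.append(body)          # give up explicitly
            else:
                queue.append((body, attempt + 1))   # redeliver later
    return completed, dead_lettered

def handle(body):
    """Hypothetical handler that cannot process messages starting with 'bad'."""
    if body.startswith("bad"):
        raise ValueError("cannot process " + body)

print(run_pump(["order-1", "bad-order", "order-2"], handle))
```

With your own pump, the dead-letter decision is yours to make in code, which is the kind of control the Functions binding described above does not give you.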
Only after that, if you see that aside from Service Bus you have requirements like data partitioning, HA, DR, or deploying and scaling out multiple services, would I consider Service Fabric more seriously.
Each of these 3 technologies has its place and use cases.

Waiting for a service to be ready (Service Fabric)

I have four services running on Azure Service Fabric, but two of them depend on another one. Is there a way to make a service's initialization wait until another service announces it is ready?

No. There's no ordering to service creation (services can be created at any time, not just during a deployment from your build machine), and what does it even mean for your service to be ready? From our perspective it means the Failover Manager found nodes that the service is able to run on and the code packages have been activated on those nodes. The platform doesn't know what your service code does though. From your perspective it probably means "when it's responding to my requests" otherwise it's not "ready," which can happen at any time during the service's lifetime for any number of reasons:
Service was just deployed and its communication stack hasn't opened an endpoint yet
Service instance/replica moved and its communication stack is spinning back up on a new node
Service partition is in quorum loss and not accepting write operations
etc.
This is an ongoing thing that your services need to be prepared to handle. If two of your services can't do any work until they are able to talk to another service, then they need to poll the service they depend on until it's available, through an endpoint on that service that you define.
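A minimal sketch of such polling, with exponential backoff and an overall timeout, could look like this. The `is_ready` callable is an assumption: in practice it would call whatever readiness endpoint you define on the dependency. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time

def wait_until_ready(is_ready, timeout_s=60.0, initial_delay_s=0.5,
                     max_delay_s=8.0, sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() with exponential backoff; return True once it succeeds,
    or False if the next wait would go past the timeout."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            if is_ready():
                return True
        except Exception:
            pass  # treat connection errors as "not ready yet"
        if clock() + delay > deadline:
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay_s)

# Hypothetical dependency that becomes ready on the third check.
answers = iter([False, False, True])
print(wait_until_ready(lambda: next(answers), sleep=lambda seconds: None))
```

Because the dependency can become unavailable again at any point in its lifetime, the same check (or a retry around each call) is needed during normal operation, not only at startup.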
