I have a service running on an Azure Kubernetes Service (AKS) cluster behind the AKS load balancer.
I want to forward some HTTP (client) requests to all instances.
Is there any way to configure this behavior using AKS or Kubernetes in general?
Say I have XYZ API running two replicas/instances.
XYZ-1 pod instance
XYZ-2 pod instance
I have some REST API requests to the app at domain.com/testendpoint.
Currently, using the AKS load balancer, requests are sent in a round-robin fashion to XYZ-1 and XYZ-2. I am looking to see if it is possible to forward a request to both instances (XYZ-1 and XYZ-2) when the request endpoint is testendpoint, while all other API requests keep the same round-robin order.
The use case is to refresh a service's in-memory data via a REST call once or twice a day, and the call will be triggered by another service when needed. So I want to make sure all pod instances update/refresh their in-memory data from one HTTP request.
if it is possible to forward a request to both instances (XYZ-1 and XYZ-2) when the request endpoint is testendpoint
This is not a feature of the HTTP protocol, so you need a purpose-built service to handle this.
The use case is to refresh a service's in-memory data via a REST call once or twice a day, and the call will be triggered by another service when needed. So I want to make sure all pod instances update/refresh their in-memory data from one HTTP request.
I suggest that you create a new utility service, "update-service", that you send the call to once a day. This service then makes a request to every instance of XYZ, e.g. XYZ-1 and XYZ-2.
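A minimal sketch of such an update-service in Node.js/Express is below. It assumes a headless Service named xyz-headless (clusterIP: None, in the default namespace) selecting the XYZ pods, so a DNS lookup returns one A record per pod, and it assumes XYZ listens on port 8080 with the refresh endpoint at /testendpoint; adjust the names to your setup.

```javascript
// update-service: fans one refresh call out to every XYZ pod.
// Assumes a headless Service "xyz-headless" (clusterIP: None) so that
// DNS returns one A record per pod, and that XYZ listens on port 8080.
const dns = require('dns').promises;
const http = require('http');
const express = require('express');

const app = express();

function callPod(ip) {
  return new Promise((resolve) => {
    const req = http.request(
      { host: ip, port: 8080, path: '/testendpoint', method: 'POST' },
      (res) => resolve({ ip, status: res.statusCode })
    );
    req.on('error', (err) => resolve({ ip, error: err.message }));
    req.end();
  });
}

// POST /refresh-all -> resolve all XYZ pod IPs and call each one.
app.post('/refresh-all', async (_req, res) => {
  try {
    const ips = await dns.resolve4('xyz-headless.default.svc.cluster.local');
    const results = await Promise.all(ips.map(callPod));
    res.json(results);
  } catch (err) {
    res.status(502).json({ error: err.message });
  }
});

app.listen(3000);
```

The headless Service is the key piece: it bypasses the normal Service VIP round-robin and exposes every pod IP, so the fan-out hits each replica exactly once.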
Related
I'm in the process of creating a kubernetes application. I have one microservice that performs an operation on data that it receives.
My API does some basic authentication and request validation checks; however, here is where I'm confused as to what to do.
The API gateway has two endpoints: one performs an individual operation, forwarding the request to the microservice; the other receives an array of data, which it then sends off as individual requests to the microservice, allowing it to scale independently.
The main thing I'm worried about is scaling. The microservice depends on external APIs and could take a while to respond, or could fail to scale quickly enough. How would this impact the API gateway?
I'm worried that the API gateway could end up being overwhelmed by external requests; what is the best way to implement this?
Should I use some kind of custom metric and somehow tell Kubernetes not to send traffic to API gateway pods that are handling more than X requests? Or should I set a hard cap, using a counter on the API gateway, to limit the number of requests that pod is handling by returning an error or something?
I use node.js for the API gateway code, so aside from memory limits, I'm not sure if there's an upper limit to how many requests the gateway can handle.
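For concreteness, something like this rough sketch is what I have in mind for the hard-cap option (the limit of 100 is just a placeholder):

```javascript
// Naive load-shedding middleware: reject requests once this pod is
// already handling MAX_IN_FLIGHT requests (placeholder value).
const express = require('express');

const app = express();
const MAX_IN_FLIGHT = 100;
let inFlight = 0;

app.use((req, res, next) => {
  if (inFlight >= MAX_IN_FLIGHT) {
    // 503 + Retry-After lets the caller (or an upstream LB) back off.
    res.set('Retry-After', '1');
    return res.status(503).send('Too busy, try again later');
  }
  inFlight++;
  let released = false;
  const release = () => {
    if (!released) { released = true; inFlight--; }
  };
  res.on('finish', release);
  res.on('close', release);
  next();
});
```

Whether this is better than the custom-metric route is exactly what I'm unsure about.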
Is there a way to exclude an AppService instance from the Load Balancer:
Via the portal?
Via the SDK?
Via the SDK would be ideal; then we could set the MakeVisibleToLoadBalance flag (if such a thing existed) once all initialization has completed.
If it's only available via the portal, it would be good to be able to set a delay of n seconds after an instance is loaded before it becomes visible to the load balancer.
Reason:
When we restart an instance (e.g. via advanced restart), the metrics show a significant increase in response times, every time.
I believe the cause is that the load balancer thinks the machine is available, but it really hasn't completed initialization, so requests that the load balancer sends to that instance are significantly delayed.
Another reason: we may observe that an instance is performing poorly, and it would be great if we could exclude that instance until it either recovered or was restarted.
As per the discussion with wallismark in the comments; the helpful comments are copied into this answer.
To address the 'reason'/scenarios you have mentioned above, you could leverage the ApplicationInitialization module. Every time your application starts (whether because a new worker comes online during horizontal scaling, or just a cold start caused by a new deployment, a config change, etc.), ApplicationInitialization is executed to warm up the site before that worker accepts requests.
The Application Initialization module is a handy feature that allows you to warm up your app before it receives requests, which helps avoid the cold-start or slow initial load times you see when the app is restarted. Please check out https://ruslany.net/2015/09/how-to-warm-up-azure-web-app-during-deployment-slots-swap/
It is also invoked for all other operations in which a new worker is provisioned (such as auto scale, manual scale or Azure fabric maintenance). However, you cannot exclude an instance from the load balancer.
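As an illustration only (assuming, for the sake of example, that the app is a Node.js/Express app), the application side of such a warm-up could look like the sketch below; the /warmup path and loadDataIntoMemory are hypothetical names:

```javascript
// Illustrative warm-up endpoint (assuming a Node.js/Express app).
// The platform's warm-up ping hits /warmup before real traffic arrives;
// the handler only answers 200 once initialization is done.
const express = require('express');

const app = express();
let ready = false;

async function loadDataIntoMemory() {
  // Placeholder for whatever slow initialization the app really does:
  // priming caches, opening connection pools, loading reference data, etc.
}

loadDataIntoMemory().then(() => { ready = true; });

app.get('/warmup', (_req, res) => {
  // 503 until initialization completes, 200 afterwards.
  res.status(ready ? 200 : 503).send(ready ? 'warm' : 'warming up');
});

app.listen(process.env.PORT || 8080);
```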
If it fits your requirement, you could leverage ARR affinity: in a multi-instance deployment, it ensures that the client is routed to the same instance for the life of the session. You can set this option to Off for stateless applications.
Typically, scaling out runs multiple copies of your WebApp and handles the load-balancing configuration necessary to distribute incoming requests across all instances. When you have more than one instance, a request made to your WebApp can go to any of them, via a load balancer that decides which instance to route the request to based on how busy each instance is at the time.
To share more information on this feature: once a request from your browser is made to the site, the load balancer adds an ARRAffinity cookie to the response containing the specific instance id, which makes the next request from this browser go to the same instance. You can use this feature to send a request to a specific instance of your site. You can find the setting in the App Service's Application Settings.
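As a rough sketch of that idea (assuming Node.js 18+ for the global fetch; the site URL is a placeholder), you can capture the ARRAffinity cookie from one response and replay it so follow-up requests keep hitting the same instance:

```javascript
// Sketch: pin follow-up requests to the instance that served the first
// one by replaying its ARRAffinity cookie (requires ARR affinity = On).
// Uses the global fetch available in Node 18+; the URL is a placeholder.
async function pinToInstance() {
  const first = await fetch('https://yoursite.azurewebsites.net/');
  // Grab the ARRAffinity cookie from the Set-Cookie header(s).
  const setCookie = first.headers.get('set-cookie') || '';
  const match = setCookie.match(/ARRAffinity=[^;]+/);
  if (!match) throw new Error('No ARRAffinity cookie; is affinity enabled?');

  // Replaying the cookie routes this request to the same instance.
  const second = await fetch('https://yoursite.azurewebsites.net/api/ping', {
    headers: { cookie: match[0] },
  });
  console.log(second.status);
}

pinToInstance().catch(console.error);
```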
When multiple apps are run in the same App Service plan, each scaled-out instance runs all the apps in the plan.
Details
I need to permanently expose an API. I will be doing this using a simple node.js Express app running in a Docker container managed by AWS ECS.
The endpoints exposed by the app's API have to be constantly available, although cold starts won't be a problem, so I am happy to spin up a container from scratch per request.
My Actual Question
In order to keep the costs down, I was wondering if it's possible to keep the API exposed via an AWS ALB and have the requests routed to the Docker container, but have the request wait to be handled until the container has been spun up? So, in essence, the container is sleeping between requests.
Extra Context
I cannot use API Gateway and Lambda functions for this because the payload is too large (over 10 MB) for API Gateway to handle, and I cannot use a pre-signed URL from S3 because the inbound request is going to be a POST request with content-type application/json, which cannot be handled by pre-signed S3 URLs.
I have one Azure App Service in which I have created 5 instances using the App Service Plan
Scale Out option. Now I am not sure how Azure load balances requests between these instances; I am not seeing any load balancer for it.
Also, how can I know which request is being served by which instance?
The load balancer is created automatically and you can't see it.
Basically it sends the requests to instances at random, though it can be made "sticky" with ARR Affinity.
You can find the setting in the App Service's Application Settings.
If it is on, the load balancer will attach a cookie to responses if they don't already have it.
It makes it so that future requests hit the same instance.
Though of course if the instance is no longer there (because of auto-scale for example), then it will again go to a random instance.
The WEBSITE_INSTANCE_ID environment variable can tell you in the back-end which instance is handling the request.
You can find a list of available variables here: https://github.com/projectkudu/kudu/wiki/Azure-runtime-environment
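For example (a minimal Node.js/Express sketch), you can surface the instance id in a response header or a debug endpoint to see which instance served each request:

```javascript
// Sketch: expose which App Service instance handled the request.
// WEBSITE_INSTANCE_ID is set by Azure App Service on each instance.
const express = require('express');

const app = express();
const instanceId = process.env.WEBSITE_INSTANCE_ID || 'local';

// Tag every response so the instance is visible in browser dev tools.
app.use((_req, res, next) => {
  res.set('X-Instance-Id', instanceId);
  next();
});

// Or hit a debug endpoint directly to see it.
app.get('/whoami', (_req, res) => {
  res.json({ instance: instanceId });
});

app.listen(process.env.PORT || 8080);
```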
I was working on a side project and I decided to redesign my skeleton project as microservices; so far I haven't found any open-source project that follows this pattern. After a lot of reading and searching I arrived at this design, but I still have some questions and thoughts.
Here are my questions and thoughts:
How do I make the API gateway smart enough to load balance requests if I have 2 nodes of the same microservice?
If one of the microservices is down, how should the discovery service know?
Is there any similar implementation? Is my design right?
Should I use Eureka or something similar?
Your design seems OK. We are also building our microservice project using the API Gateway approach. All the services, including the gateway service (GW), are containerized (we use Docker) Java applications (Spring Boot or Dropwizard). A similar architecture could be built using Node.js as well. Some topics to mention related to your questions:
Authentication/Authorization: The GW service is the single entry point for the clients. All authentication/authorization operations are handled in the GW using JSON Web Tokens (JWT), which has a Node.js library as well. We keep authorization information, such as the user's roles, in the JWT token. Once the token is generated in the GW and returned to the client, the client sends the token in an HTTP header with each request; we then check whether the client has the required role to call the specific service and whether the token has expired. In this approach, you don't need to keep track of the user's session on the server side. Actually, there is no session; the required information is in the JWT token.
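As a rough Node.js sketch of that flow (using the jsonwebtoken package; the secret, role names and routes are placeholders, not our actual code):

```javascript
// Gateway-side JWT check (sketch): issue a token at login, then verify
// the token and the caller's role on every request.
const express = require('express');
const jwt = require('jsonwebtoken');

const app = express();
const SECRET = process.env.JWT_SECRET || 'change-me';

// Issue a token containing the user's roles (after real authentication).
app.post('/login', (_req, res) => {
  const token = jwt.sign({ sub: 'user-123', roles: ['orders:read'] }, SECRET, {
    expiresIn: '1h',
  });
  res.json({ token });
});

// Middleware: verify the token and required role before proxying onwards.
function requireRole(role) {
  return (req, res, next) => {
    const header = req.get('Authorization') || '';
    const token = header.replace(/^Bearer /, '');
    try {
      const claims = jwt.verify(token, SECRET); // throws if invalid/expired
      if (!claims.roles || !claims.roles.includes(role)) {
        return res.status(403).send('Missing role');
      }
      next();
    } catch {
      res.status(401).send('Invalid or expired token');
    }
  };
}

app.get('/orders', requireRole('orders:read'), (_req, res) => {
  res.send('would proxy to the orders service here');
});
```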
Service Discovery / Load balancing: We use Docker and Docker Swarm, which is a clustering tool bundled in the Docker engine (since Docker 1.12). Our services are Docker containers. The containerized approach makes it easy to deploy, maintain, and scale the services. At the beginning of the project, we used HAProxy, Registrator and Consul together to implement service discovery and load balancing, similar to your drawing. Then we realized we don't need them for service discovery and load balancing as long as we create a Docker network and deploy our services using Docker Swarm. With this approach you can easily create isolated environments for your services, like dev, beta and prod, on one or multiple machines by creating a different network for each environment. Once you create the network and deploy the services, service discovery and load balancing are not your concern. In the same Docker network, each container has the DNS records of the other containers and can communicate with them. With Docker Swarm, you can easily scale services with one command. On each request to a service, Docker distributes (load balances) the request to an instance of that service.
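For example (a small sketch, assuming a service named xyz-service on the same overlay network, listening on port 8080), another container can simply call it by name and let Swarm's built-in DNS/VIP spread the requests across replicas:

```javascript
// Inside the same Docker overlay network, other services are reachable
// by their service name; Swarm's built-in DNS/VIP load-balances across
// replicas. The service name and port below are placeholders.
const http = require('http');

http.get('http://xyz-service:8080/health', (res) => {
  console.log('xyz-service replied with', res.statusCode);
}).on('error', (err) => {
  console.error('call failed:', err.message);
});
```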
Your design is OK.
If your API gateway needs to implement (and that's probably the case) CAS / some kind of auth (via one of the services, i.e. some kind of user service) and should also track all requests and modify the headers to carry the requester metadata (for internal ACL/scoping usage), your API gateway should be done in Node, but it should sit behind HAProxy, which will take care of load balancing/HTTPS.
Discovery is in the correct position; if you're looking for one that fits your design, look no further than Consul.
You can use consul-template, or your own micro-discovery framework for the services and the API gateway, so that they share endpoint data on boot.
ACL/authorization should be implemented per service, and the first request from the API gateway should be subject to all authorization middleware.
It's smart to have the API gateway track requests by assigning a request ID to each one, so its lifecycle can be traced within the "inner" system.
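A small Node.js sketch of that request-ID idea (the X-Request-Id header name is a common convention, not a requirement):

```javascript
// Attach a request ID at the gateway and forward it to inner services
// so one client call can be traced across the whole system.
const crypto = require('crypto');
const express = require('express');

const app = express();

app.use((req, res, next) => {
  // Reuse an incoming ID (e.g. set by HAProxy) or mint a new one.
  const requestId = req.get('X-Request-Id') || crypto.randomUUID();
  req.requestId = requestId;
  res.set('X-Request-Id', requestId); // echo it back to the client
  next();
});

// Whenever the gateway proxies to an inner service, it should pass the
// same header along, e.g. { headers: { 'X-Request-Id': req.requestId } }.
```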
I would add Redis for messaging/workers/queues and fast in-memory stuff like caching/cache invalidation (you can't handle a whole MS architecture without one), or take RabbitMQ if you have many more distributed transactions and a lot of messaging.
Spin all of this up in containers (Docker) so it will be easier to maintain and assemble.
As for BI, why would you need a service for that? You could have an external ELK stack (Elasticsearch, Logstash, Kibana) and get dashboards, log aggregation, and a huge big data warehouse at once.