I need to create a Node.js service that runs a large number (potentially hundreds) of scheduled jobs simultaneously. The service should also expose a REST interface so end users can perform CRUD operations on these jobs.
At first I thought of going for Agenda.js and, since we use k8s, launching a few instances so we could handle this number of jobs.
However, I also had another idea and wanted to see if somebody has already done it: since we use k8s, I thought of harnessing the power of k8s Jobs and creating a service that communicates with the k8s API and manages the jobs.
Is it feasible? What do I have to take into consideration if I go in this direction?
What you want is basically the definition of a Kubernetes operator, and yes, it is possible to do what you want.
In your case, you can use the Kubernetes client for Node.js.
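A minimal sketch of that idea, assuming the official @kubernetes/client-node package (0.x-style API with positional arguments) and Express for the REST layer; the /jobs route, the "default" namespace, and the request body shape are assumptions, not anything from the question:

```typescript
// Sketch only: a REST endpoint that turns a user's request into a Kubernetes CronJob.
import express from "express";
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault(); // picks up in-cluster config or your local kubeconfig
const batchApi = kc.makeApiClient(k8s.BatchV1Api);

const app = express();
app.use(express.json());

// POST /jobs { "name": "report-job", "schedule": "*/5 * * * *", "image": "...", "command": ["..."] }
app.post("/jobs", async (req, res) => {
  const { name, schedule, image, command } = req.body;

  const cronJob: k8s.V1CronJob = {
    apiVersion: "batch/v1",
    kind: "CronJob",
    metadata: { name },
    spec: {
      schedule,
      jobTemplate: {
        spec: {
          backoffLimit: 2,
          template: {
            spec: {
              restartPolicy: "Never",
              containers: [{ name, image, command }],
            },
          },
        },
      },
    },
  };

  try {
    const created = await batchApi.createNamespacedCronJob("default", cronJob);
    res.status(201).json(created.body.metadata);
  } catch (err) {
    res.status(500).json({ error: String(err) });
  }
});

app.listen(3000);
```

The same API group exposes read, replace, and delete counterparts (readNamespacedCronJob, replaceNamespacedCronJob, deleteNamespacedCronJob), which covers the rest of the CRUD surface; note that newer client versions switch to object-style arguments, so check the version you install.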
We have a service running as an Azure Function (Event and Service Bus triggers) that we feel would be better served by a different model, because it takes a few minutes to run and loads a lot of objects into memory. It seems to load them on every invocation instead of keeping them in memory and thus performing better.
What is the best Azure service to move to, with the following goals in mind?
Easy to move and doesn't need too many code changes.
We have long-term goals of being able to run this on-prem (Kubernetes might help us here).
Appreciate your help.
To achieve the first goal:
Move your Azure Function code into a continuous WebJob. It has no maximum execution time and it can run continuously, caching objects in its context.
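For illustration, here is a minimal sketch of what that continuous process could look like in Node, assuming the @azure/service-bus package, a queue named "jobs", and a placeholder loadReferenceData() standing in for whatever the function currently loads on every call (the original may well be .NET, where the WebJobs SDK gives you the same shape):

```typescript
// Sketch of a continuous WebJob: load the heavy objects once, keep them cached
// in the process, and consume Service Bus messages for as long as the job runs.
import { ServiceBusClient, ServiceBusReceivedMessage } from "@azure/service-bus";

let referenceData: Map<string, unknown> | undefined;

async function loadReferenceData(): Promise<Map<string, unknown>> {
  // ...the expensive load the Azure Function was repeating on every call...
  return new Map();
}

async function main() {
  referenceData = await loadReferenceData(); // paid once per process, not per message

  const sb = new ServiceBusClient(process.env.SERVICE_BUS_CONNECTION_STRING!);
  const receiver = sb.createReceiver("jobs");

  receiver.subscribe({
    async processMessage(message: ServiceBusReceivedMessage) {
      // Handle the message using the in-memory cache.
      console.log("processing", message.messageId, "with", referenceData!.size, "cached items");
    },
    async processError(args) {
      console.error("Service Bus error:", args.error);
    },
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```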
To achieve the second goal (on-premises):
You need to explain this better, but a WebJob can be run as a console program on-premises, and you can also wrap it in a Docker container to move it from on-premises to any cloud. However, if you need to consume messages from an Azure Service Bus, you will need a hybrid on-premises/Azure approach, connecting your local server to the cloud with a VPN or ExpressRoute.
Regards.
There are a couple of ways to solve this, each involving a slightly larger amount of change from where you are.
If you are just trying to separate out the heavy initial load, you can do it once, put the result in a Redis Cache instance, and then reference it from there.
If you are concerned about how long your worker can run, then WebJobs (as explained above) can work; however, I'd suggest avoiding them, since they're not where Microsoft is putting its resources. Rather, look at Durable Functions: an orchestrator function can drive a worker function (a minimal sketch follows this list). Even here, be careful: since Durable Functions retain history, after running for a very long time the history tables might get too large, so program in something like restarting the orchestrator after, say, 50,000 runs (obviously the number will vary based on your case). Also see this.
If you want to add the constraint of portability to this, you can run the function in a Docker image on an AKS cluster in Azure. This might not work well for Durable Functions (try it out, who knows :) ), but it will surely work for the worker functions (which would cost you the most compute anyway).
If you want to bring the workloads completely on-prem, then Azure Functions might not be a good choice. You can create an HTTP server using the platform of your choice (Node, Python, C#...) and have that invoke the worker routine. Then you can run this whole setup inside an image on an AKS cluster on-prem, and to the user it looks just like a load-balanced web server :) You can decide whether you want to keep the data on Azure or bring it down on-prem as well, but beware of egress costs if you decide to move it out once you've moved it up.
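Here is the promised sketch of the orchestrator-drives-worker pattern, using the durable-functions npm package (older, function.json-based programming model); the activity name "WorkerActivity" and the { runs } state shape are placeholders:

```typescript
// Sketch only: an orchestrator that drives a worker activity and periodically
// restarts itself with continueAsNew so the history tables stay small.
import * as df from "durable-functions";

const orchestrator = df.orchestrator(function* (context) {
  const state = (context.df.getInput() as { runs: number } | undefined) ?? { runs: 0 };

  // The worker function does the actual heavy processing.
  yield context.df.callActivity("WorkerActivity", state.runs);

  if (state.runs < 50000) {
    // Restart the orchestration with fresh history instead of looping forever.
    context.df.continueAsNew({ runs: state.runs + 1 });
  }
});

// Wired up via function.json in this programming model.
export default orchestrator;
```

The matching activity function is just a regular function registered under the name the orchestrator calls; that is where the multi-minute processing would live.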
It appears that the functions are affected by cold starts:
Serverless cold starts within Azure
Upgrading to the Premium plan would move your functions to pre-warmed instances, which should counter the problem you are experiencing:
Pre-warmed instances for Azure Functions
However, if you potentially want to deploy your function/triggers to on-prem, you should spin them out as microservices and deploy them with containers.
Currently, the fastest way would probably be to deploy the containerized triggers via Azure Container Instances if you don't already have a Kubernetes Cluster running. With some tweaking, you can deploy them on-prem later on.
There are a few options:
Move your function app onto the Premium plan. But it will not help you much under heavy load and scale-out.
Issue: in that case you will still face cold-startup issues, and the problem will persist under heavy load.
Use Redis Cache; it will resolve most of your issues, since the main concern is the heavy initial load (see the caching sketch after this list).
Issue: if your system is a multi-tenant system, your cache can become heavy over time.
Create small Durable Functions microservices. This will not quite be the answer to your question, since you don't want lots of changes, but it will resolve most of your issues.
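To make the Redis option concrete, here is a minimal caching sketch, assuming the node "redis" package (v4), a REDIS_CONNECTION_STRING setting, and a placeholder loadHeavyObjects() standing in for the expensive load; the cache key and TTL are arbitrary:

```typescript
// Sketch only: do the heavy load once, park the result in Redis
// (e.g. Azure Cache for Redis), and have every invocation read it from there.
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_CONNECTION_STRING });

async function loadHeavyObjects(): Promise<unknown[]> {
  // ...the expensive load the function currently repeats on every call...
  return [];
}

export async function getHeavyObjects(): Promise<unknown[]> {
  if (!redis.isOpen) await redis.connect();

  const cached = await redis.get("heavy:objects");
  if (cached) return JSON.parse(cached);

  const objects = await loadHeavyObjects();
  // Expire after an hour so stale data eventually refreshes.
  await redis.set("heavy:objects", JSON.stringify(objects), { EX: 3600 });
  return objects;
}
```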
We have an app that handles requests which can take several minutes to return a response. Does it make sense to put this app in a pod and replicate it many times on the same node, so we can handle each request on a new thread (considering Node.js is single-threaded)?
The use case here is quite unusual. As you say, your application is single-threaded, and you want to add a pod as soon as a new request is fired, but only if the previous pod is busy or holds a lock; in simplest terms, a new pod should come up with each new request if the previous pod is busy.
Kubernetes is an orchestrator for containers, and deploying a monolithic application on Kubernetes not only throws away much of the tremendous value Kubernetes can provide, but also brings a lot of overhead in the form of deployment and automation issues.
Also, the nice thing about breaking away from a monolith (a single thread) into a (micro)service-oriented architecture is that you can have an isolated event loop for each service, because every Node process would be running isolated inside its own container!
I would advise you to reconsider the architectural design of your application, break it up so it can use multiple threads or services, and then revisit Kubernetes.
However, quoting from https://www.dataversity.net/use-kubernetes-deploy-monolithic-apps/#: "A Linux shell is a Linux shell is a Linux shell." You can make it work, and the following can be a way ahead.
Strategic solution: you can declare an HPA (Horizontal Pod Autoscaler) for your deployment with a --max-replicas=xx flag, and then feed it request metrics so that whenever there is a request to the service the Deployment is scaled up automatically, and scaled down likewise as soon as the request ends. You should use the v2beta2 apiVersion of the HPA, as it allows that type of metric.
Also, I think you will have to use the v2beta2 apiVersion of the HPA because you will have to keep the per-pod request count at one, so that requests don't generate 5XX errors; the Kubernetes Service will keep sending requests to the same pod if such a metric isn't set.
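A minimal sketch of such an HPA, expressed through the Node Kubernetes client rather than YAML; it assumes a custom-metrics adapter already exposes a per-pod metric (named http_requests_in_flight here), and the deployment name, namespace, and replica limits are placeholders (newer client versions expose AutoscalingV2Api instead of AutoscalingV2beta2Api):

```typescript
// Sketch only: create an autoscaling/v2beta2 HPA that targets roughly one
// in-flight request per pod, as discussed above.
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const autoscaling = kc.makeApiClient(k8s.AutoscalingV2beta2Api);

const hpa: k8s.V2beta2HorizontalPodAutoscaler = {
  apiVersion: "autoscaling/v2beta2",
  kind: "HorizontalPodAutoscaler",
  metadata: { name: "myapp-hpa" },
  spec: {
    scaleTargetRef: { apiVersion: "apps/v1", kind: "Deployment", name: "myapp" },
    minReplicas: 1,
    maxReplicas: 20, // the --max-replicas=xx value
    metrics: [
      {
        type: "Pods",
        pods: {
          metric: { name: "http_requests_in_flight" },
          // Keep roughly one in-flight request per pod.
          target: { type: "AverageValue", averageValue: "1" },
        },
      },
    ],
  },
};

autoscaling
  .createNamespacedHorizontalPodAutoscaler("default", hpa)
  .then(() => console.log("HPA created"))
  .catch((err) => console.error("failed to create HPA:", err));
```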
I have a Kubernetes cluster created and an app deployed to it.
First I deployed firstapp.yaml, which created a pod and a service to expose the pod externally.
I have two nodes in the cluster, and I then made another deployment with secondapp.yaml.
I noticed that the second deployment went to a different node, although this is the desired behaviour for logical separation.
Is this something that's provided by Kubernetes? How will it manage deployments made using different files? Will they always go on separate nodes (if enough nodes are provisioned)?
If not, what is the practice to follow if I want logical separation between two nodes that I want to behave as two environments, let's say a dev and a QA environment?
No, they will not necessarily go to different nodes. The scheduler determines where to put a pod based on various criteria.
As for your last question: it makes no sense. You can use namespaces and network policies to separate environments; you shouldn't care which node(s) your pods are on. That's the whole point of having a cluster.
You can use placement constraints to achieve what you ask for, but it makes no sense at all.
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
Agreed with #4c74356b41.
As an addition: it does not matter where your pods are. You can have multiple replicas of your application split between, say, 50 nodes; they can still communicate with each other (Services, service discovery, the CNI network) and share resources, etc.
And yes, this is default behavior of Kubernetes, which you can influence with taints and tolerations, resource requests and limits, and node affinity and anti-affinity (you can find a lot of information about each of those in the documentation or simply by googling them). Where the pods are scheduled also depends on node capacity. Your pod was placed on a particular node because the scheduler calculated that it had the best score, taking the mentioned conditions into account first. You can find details about the process here.
Again, as #4c74356b41 mentions, if you want to split your cluster into multiple environments, let's say for different teams or, as you mention, for dev and QA environments, you can use namespaces for that. They basically make smaller clusters inside your cluster (note that this is more of a logical separation and not a separation from a security perspective, until you add other components such as roles).
You can just add a namespace field to your deployment YAML to specify which namespace you want to deploy your pods into; it still does not matter which nodes they are on, depending on your use case.
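As an illustration, a sketch using the Kubernetes client for Node.js (the same thing is normally done with plain YAML and kubectl apply); the namespace names, app name, and image are placeholders:

```typescript
// Sketch only: create dev and qa namespaces and deploy "firstapp" into one of them.
// Older @kubernetes/client-node versions use positional arguments as shown here;
// newer ones take a single object parameter.
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const coreApi = kc.makeApiClient(k8s.CoreV1Api);
const appsApi = kc.makeApiClient(k8s.AppsV1Api);

async function main() {
  // One "smaller cluster" per environment.
  for (const name of ["dev", "qa"]) {
    await coreApi.createNamespace({ metadata: { name } });
  }

  // The namespace field decides which environment the pods land in;
  // the scheduler still decides which nodes they land on.
  const deployment: k8s.V1Deployment = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: "firstapp", namespace: "dev" },
    spec: {
      replicas: 2,
      selector: { matchLabels: { app: "firstapp" } },
      template: {
        metadata: { labels: { app: "firstapp" } },
        spec: { containers: [{ name: "firstapp", image: "firstapp:latest" }] },
      },
    },
  };
  await appsApi.createNamespacedDeployment("dev", deployment);
}

main().catch(console.error);
```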
Please note that what I wrote is oversimplified and I didn't mention many things in between, which you can easily find in most Kubernetes tutorials.
I am building a small utility that packages Locust, a performance testing tool (https://locust.io/), and deploys it on Azure Functions. Just a fun side project to get some hands-on experience with the serverless craze.
Here's the git repo: https://github.com/amanvirmundra/locust-serverless.
Now I am thinking that it would be great to run Locust tests in distributed mode on a serverless architecture (Azure Functions Consumption plan). Locust supports distributed mode, but it needs the slaves to communicate with the master using its IP. That's the problem!!
I can provision multiple functions, but I am not quite sure how I can make them talk to each other on the fly (without manual intervention).
Thinking out loud:
Somehow get the IP of the master function and pass it on to the slave functions. I'm not sure if that's possible in Azure Functions, but some people have figured out a way to get the IP of an Azure Function using .NET libraries. Mine is a Python version, but I am sure that if it can be done using .NET then there would be a Python way as well.
Create some sort of VPN and map a function to a private IP. I'm not sure if this sort of mapping is possible in Azure.
Someone has done this using AWS Lambdas (https://github.com/FutureSharks/invokust). Ask that person or try to understand the code.
I need advice on figuring out what's possible while keeping things serverless. Open to ideas and/or code contributions :)
Update
This is the current setup:
The performance test session is triggered by an HTTP request, which takes in the number of requests to make, the base URL, and the number of concurrent users to simulate.
The Locustfile defines the test setup and orchestration.
Run.py triggers the tests.
What I want to do now is to have a master/slave setup (cluster) for a massive-scale perf test.
I would imagine that the master function is triggered by an HTTP request, with a similar payload.
The master will in turn trigger slaves.
When the slaves join the cluster, the performance session would start.
What you describe doesn't sound like a good use case for Azure Functions.
Functions are supposed to be:
Triggered by an event
Short running (max 10 minutes)
Stateless and ephemeral
That said, Functions are indeed good for load testing, but the setup should be different:
You define a trigger for your Function (e.g. HTTP, or Event Hub)
Each function execution makes a given number of requests, in parallel or sequentially, and then quits
There is an orchestrator somewhere (e.g. just a console app), which sends "commands" (an HTTP call or an Event) to trigger the Function
So, Functions are "multiplying" the load as per the schedule defined by the orchestrator. You rely on Consumption plan scalability to make sure that enough executions are provisioned at any given time.
The biggest difference is that function executions don't talk to each other, so they don't need IPs.
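A minimal sketch of such a worker function, assuming the Azure Functions Node.js v4 programming model (@azure/functions) and Node 18+ for the built-in fetch; the route name, payload shape, and batching logic are illustrative, not part of the original setup:

```typescript
// Sketch only: an HTTP-triggered "load multiplier" that fires its share of
// requests and then quits; the orchestrator just POSTs one command per worker.
import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";

interface LoadCommand {
  targetUrl: string;   // URL to hit
  requests: number;    // how many requests this execution should make
  concurrency: number; // how many to keep in flight at once
}

async function fireBatch(url: string, count: number): Promise<number> {
  const results = await Promise.allSettled(
    Array.from({ length: count }, () => fetch(url))
  );
  return results.filter((r) => r.status === "fulfilled").length;
}

app.http("loadWorker", {
  methods: ["POST"],
  authLevel: "function",
  handler: async (req: HttpRequest, ctx: InvocationContext): Promise<HttpResponseInit> => {
    const cmd = (await req.json()) as LoadCommand;

    let succeeded = 0;
    for (let sent = 0; sent < cmd.requests; sent += cmd.concurrency) {
      const batch = Math.min(cmd.concurrency, cmd.requests - sent);
      succeeded += await fireBatch(cmd.targetUrl, batch);
    }

    ctx.log(`made ${cmd.requests} requests, ${succeeded} succeeded`);
    // Each execution quits after its share of the load; the Consumption plan
    // fans out by running many executions in parallel.
    return { status: 200, jsonBody: { requested: cmd.requests, succeeded } };
  },
});
```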
I think the mentioned AWS Lambda example just invokes Lambdas in the same way; it does not set up master and client Lambdas talking to each other.
I guess my point is that you might not need the Locust framework at all, and can instead leverage the built-in capabilities of autoscaled FaaS.
I'm creating a cloud service where I have a worker role running some heavy processing in the background, for which I would like a Redis instance to be running locally on the worker.
What I want to do is set up the worker role project in such a way that the Redis instance is installed and configured when the worker is deployed.
The Redis database would be cleared on every job startup.
I've looked at the MSOpenTech Redis for Windows with NuGet installation, but I'm unsure how I would get this working on the worker role instance. Is there a smart way to set it up, or would it be done by command-line calls?
Thanks.
I'm not expecting to get this marked as the answer, but I just wanted to add that this is a really bad approach for a real-world deployment.
I can understand why you might want to do this from a learning perspective; however, in a production environment it's a really bad idea, for several reasons:
You cannot guarantee when a Worker Role will be restarted by the Azure Service Fabric (and you're not guaranteed to get the underlying VM back in the same state it was in before it went down): you could potentially be re-populating the cache simply because the role was restarted.
In a real-world implementation of Redis, you would run multiple nodes within a cluster so you benefit from a) the ability to automatically split your dataset among multiple nodes and b) continued operations when a subset of the nodes is experiencing failures; running within a Worker Role doesn't give you any of this. You also run the risk of multiple Redis instances (unaware of each other) every time you scale out your Worker Role.
You will need to manage your Redis installation within the Worker Role, and Worker Roles simply aren't designed for this. PaaS Worker Roles are designed to run the Worker Role package that is deployed and nothing else. If you really want to run Redis yourself, you should probably look at IaaS VMs.
I would recommend that you take a look at the Azure Redis Cache SaaS offering (see http://azure.microsoft.com/en-gb/services/cache/) which offers a fully managed, highly-available, implementation of the Redis Cache. I use it on several projects and can highly recommend it.
To install any software on a worker role instance, you'd need to set this up to happen as a startup task.
You'll reference startup tasks in your ServiceDefinition.csdef file, in the <Startup> element, with a reference to your command file which installs whatever software you want (such as Redis).
I haven't tried installing Redis in a worker role instance, so I can't comment on whether this will succeed. You'll also need to worry about opening the right ports (whether external- or internal-facing) and about scaling (e.g. what happens when you scale to two worker role instances, both running Redis?). My answer is specific to how you install software on a role instance.
More info on startup task setup is here.