Two Azure Functions share an In-Proc collection

Two Azure Functions share an In-Proc collection - azure

I have two Azure Functions. I can think of them as "Producer-Consumer". One is "HttpTrigger" based Function (Producer) which can be fired randomly. It writes the input data in a static "ConcurrentDictionary". The second one is "Timer Trigger" Azure Function(consumer). It reads the data periodically from the same "ConcurrentDictionary" which was being used by the "Producer" function App and then do some processing.
Both the functions are within the same .Net project (but in different classes). The in-memory data sharing through static "ConcurrentDictionary" works perfectly fine when I run the application locally. While running locally, I assume that they are running under the same process. However, when I deploy these Functions in Azure Portal ( They are in the same function App Resource), I found that data sharing through static "ConcurrentDictionary" is not not working.
I am just curious to know, if in Azure Portal, both the Functions have their own process (Probably, that's why they are not able to share in-process static collection). If that is the case, what are my options that these two Functions work as proper "Producer-Consumer"? Will keeping both the Functions in the same class help?
Probably, the scenario is just opposite to what is described in the post - "https://stackoverflow.com/questions/62203987/do-azure-function-from-same-app-service-run-in-same-instance". As against the question in the post, I would like both the Functions to use the same static member of a static class instance.
I am sorry that I cannot experiment too much because the deployment is done through Azure-DevOps pipeline. Too many check-ins in repository is slightly inconvenient. As I mention, it works well locally. So, I don't know how to recreate what's happening in Azure Portal in local environment so that I can try different options? Is there any configurable thing which I am missing to apply?

Don't do that, use an azure queue, event grid, service bus or something else that is reliable but just don't try using a shared object. It will fail as soon as scale out happens or as soon as one of the processes dies. Do think about functions as independent pieces and do not try to go against the framework.
Yes, it might work when you run the functions locally but then you are running on a single machine and the runtime might use the same process but once deployed that ain't true anymore.
If you really really don't want to decouple your logic into a fully seperated producer and consumer then write a single function that uses an in process queue or collection and have that function deal with the processing.

Related

Dealing with Long running Tasks in Azure

We are moving an On-Premise solution into Azure and there are few services as part of the application which schedules to run once everyday.
I did it as a Web API and when ever the HTTP call calls the method fires without any trouble.
But the problem is the the method behind this API is a heavy weight one which takes around 40-50 mins to finish.
But since Azure APIs will expire in 230sec, I am really got stuck.
I am calling the API from Timer Triggered Azure functions. Its working fine.
But the 30-40 mins becoming a real challenge.
So how to handle this such situation in Azure when we have a time consuming method to execute.
(Other than APIs as well)

There can be many issues that causing performance problems in Azure Functions. Try to debug with the help of Azure Service Profiler or any other debugging tools, determine which line of code is executing how long.
Few reasons could be like:
There might be inefficient algorithm written form fetching IDs/ADLS (Azure Data Lake Storage) operations.
If await keyword is used in the Function App Code, then use the .ConfigureAwait(false) functionality also.
Enable Automatic Scaling in the Azure Function App..
It also depends on NuGet Packages that you're using which might be taking long time to create the Azure Functions instance.
ReadIDs and ReadData functions should be asynchronous.
Note: You may get doubt like all the functions are with async, but make sure in return type the Function definition should have Task and async keyword.

Recommended Azure service to replace Azure functions

We have a service running as an Azure function (Event and Service bus triggers) that we feel would be better served by a different model because it takes a few minutes to run and loads a lot of objects in memory and it feels like it loads it every time it gets called instead of keeping in memory and thus performing better.
What is the best Azure service to move to with the following goals in mind.
Easy to move and doesn't need too many code changes.
We have long term goals of being able to run this on-prem (kubernetes might help us here)
Appreciate your help.

To achieve first goal:
Move your Azure function code inside a continuous running Webjob. It has no max execution time and it can run continuously caching objects in its context.
To achieve second goal (On-premise):
You need to explain this better, but a webjob can be run as a console program on-premise, also you can wrap it into a docker container to move it from on-premise to any cloud but if you need to consume messages from an Azure Service Bus you will need an On-Premise-Azure approach connecting your local server to the cloud with a VPN or expressroute.
Regards.

There are a couple of ways to solve the said issue, each with slightly higher amount of change from where you are.
If you are just trying to separate out the heavy initial load, then you can do it once in a Redis Cache instance and then reference it from there.
If you are concerned about how long your worker can run, then Webjobs (as explained above) can work, however, that is something I'd suggest avoiding since its not where Microsoft is putting its resources. Rather look at durable functions. Here an orchestrator function can drive a worker function. (Even here be careful, that since durable functions retain history after running for very very very long times, the history tables might get too large - so probably program in something like, restart the orchestrator after say 50,000 runs (obviously the number will vary based on your case)). Also see this.
If you want to add to this, the constrain of portability then you can run this function in a docker image that can be run in an AKS cluster in Azure. This might not work well for durable functions (try it out, who knows :) ), but will surely work for the worker functions (which would cost you the most compute anyways)
If you want to bring the workloads completely on-prem then Azure functions might not be a good choice. You can create an HTTP server using the platform of your choice (Node, Python, C#...) and have that invoke the worker routine. Then you can run this whole setup inside an image on an AKS cluster on prem and to the user it looks just like a load balanced web-server :) - You can decide if you want to keep the data on Azure or bring it down on prem as well, but beware of egress costs if you decide to move it out once you've moved it up.

It appears that the functions are affected by cold starts:
Serverless cold starts within Azure
Upgrading to the Premium plan would move your functions to pre-warmed instances, which should counter the problem you are experiencing:
Pre-warmed instances for Azure Functions
However, if you potentially want to deploy your function/triggers to on-prem, you should spin them out as microservices and deploy them with containers.
Currently, the fastest way would probably be to deploy the containerized triggers via Azure Container Instances if you don't already have a Kubernetes Cluster running. With some tweaking, you can deploy them on-prem later on.

There are few options:
Move your function app on to premium. But it will not help u a lot at the time of heavy load and scale out.
Issue: In that case u will start facing cold startup issues and problem will be persist in heavy load.
Redis Cache, it will resolve your most of the issues as the main concern is heavy loading.
Issue: If your system is multitenant system then your Cache become heavy during the time.
Create small micro durable functions. It will be not the answer of your Q as u don't want lots of changes but it will resolve your most of the issues.

Azure Functions: Understanding Change Feed in the context of multiple apps

According to the below diagram on https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-processor, at least 4 partition key ranges are distributed between two hosts. What I'm struggling to understand in this diagram is the distinction between a host and a consumer. In the context of Azure Functions, would it be true to say that a host is a Function app whereas a consumer is an active/warm instance?
I'd like to create a setup with N many Function apps each with 0-200 active instances (depending on workload). At the same time, I'd like to read Change Feed. If I use a CosmosDBTrigger with the same connection string and lease container in each app, is this taken care of automatically or do I need a manual implementation?

The documentation you linked is mainly for the Change Feed Processor, but the Azure Functions binding actually runs the Change Feed Processor underneath.
When just using CFP, it's maybe easier to understand because you are mainly in control of the instances and distribution, but I'll try to map it to Functions.
The document mentions a deployment unit concept:
A single change feed processor deployment unit consists of one or more instances with the same processorName and lease container configuration. You can have many deployment units where each one has a different business flow for the changes and each deployment unit consisting of one or more instances.
For example, you might have one deployment unit that triggers an external API anytime there is a change in your container. Another deployment unit might move data, in real time, each time there is a change. When a change happens in your monitored container, all your deployment units will get notified.
The deployment unit in Functions is the Function App. One Function App can span many instances. So each instance/host within that Function App deployment, will act as a available host/consumer.
Further down, the article talks about the dynamic scaling and what it says is basically that, within a Deployment Unit (Function App), the leases will get evenly distributed. So if you have 20 leases and 10 Function App instances, then each instance will own 2 leases and process them independently from the other instances.
One important note on that article is, scaling enables a higher CPU pool, but not a necessarily a higher parallelism.
As the documentation mentions, even on a single instance, CFP will process and read each lease it owns on an independent Task. The problem is, all these parallel processing is sharing the same CPU, so adding more instances will help if you currently see the instance having a CPU thread/bottleneck.
Now, in your example, you want to have N Function Apps, I assume that each one, doing something different. Basically, microservice deployments which would trigger on any change, but do a different task or fire a different business flow.
This other article covers that. Basically you can either, have each Function App use a separate Lease collection (having the monitored collection be the same) or you can share the lease collection but use a different LeaseCollectionPrefix for each Function App deployment. If the number of Function Apps you will be shared the lease collection is high, please check the RU usage on the lease collection as you might need to increase it (there is a note about it on the article).

Azure Logic App that does heavy processing?

I am wanting to create a custom Azure Logic App which does some heavy processing. I am reading as much as I can about this. I want to describe what I wish to do, as I understand it currently, then I am hoping someone can point out where I am incorrect in my understanding, or point out a more ideal way to do this.
What I want to do is take an application that runs a heavy computational process on a 3D mesh and turn it into a node to use in Azure Logic App flows.
What I am thinking so far, in a basic form, is this:
HTTP Trigger App: This logic app receives a reference to a 3D mesh to be processed, it then saves this mesh to Azure Store and passes that that reference to the next logic app.
Mesh Computation Process App: This receives the Azure Storage reference to the 3D mesh. It then launches a high performance server with many CPUs and GPU's, the high performance server downloads the mesh, processes the mesh, then uploads the mesh back to Azure Storage. This app then passes the reference to the processed mesh to the next logic app. Finally this shuts down the high performance server so it doesn't consume resource unnecessarily.
Email notification App: This receives the Azure Storage reference to the processed mesh, then fires off an email with the download link to the user.
Is this possible? So far what I've read this appears possible. I am just wanting someone to verify this in case I've severely misunderstood something.
Also I am hoping a to get a little bit of guidance on the mechanism to launch and shut down a high performance server within the 'Mesh Computation Process App'. The only place the Azure documentation mentions asynchronous, long-term, task processing in Logic Apps is on this page:
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-create-api-app
It talks about it requiring you to launch an API App or a Web App to receive the Azure Logic App request, then ping back status to Azure Logic Apps. I was wondering, is it possible to do this in a serverless manner? So the 'Mesh Computation Process App' would fire off an Azure Function which spins up the higher performance server, then another Azure Function periodically pings that server to report back status until complete, at which point an Azure Function then triggers the higher performance server to shut down, then signals to the 'Mesh Computation Process App' that it is complete and it continues onto the next logic app. Is it possible to do it in that manner?
Any comments or guidance on how to better approach or think about this would be appreciated. This is the first time I've dug into Azure, so I am simultaneously trying to orient myself on proper understanding in Azure and make a system like this.

Should be possible. At the moment I'm not exactly sure if Logic Apps themselves can create all of those things for you, but it definitely can be done with Azure Functions, in a serverless manner.
For your second step, if I understand correctly, you need it to run for long, just so that it can pass something further once the VM is done? You don't really need that. When in serverless, try to not think of long running tasks, and remember that everything is an event.
Putting stuff into Azure Blob storage is an event you can react to, this removes your need for linking.
Your first step, saves stuff to Azure Store, and that's it, doesn't need to do anything else.
Your second app, triggers on inserted stuff to initiate processing.
The VM processes your stuff, and puts it in the store.
The email app triggers on stuff being put into "processed" folder.
Another App triggers on the same file to shut down the VM.
This way you remove the long running state management and chaining the apps directly, and instead each of them does only what it needs to do, and then apps can trigger automatically to the results of the previous flows.
If you do need some kind of state management/orchestration in all of your steps, and you want to still be in serverless, look into durable azure functions. They are serverless, but the actions they take and results they get are stored in table storage, so it can be recreated and restored to a state it was in before. Of course everything is done for you automatically, it just changes a bit on what exactly you can do inside of it to still be durable.
The actual state management you might want to do is maybe something to keep track of all the VM's and try to reuse stuff, instead of spending time spinning them up and killing them. But don't complicate it too much for now.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview
Of course you still need to think about error handling, like what happens if your VM just dies without uploading anything, you don't want to just miss on stuff. So you can trigger special flows to handle repeats/errors, maybe send different emails, etc.

Running Locust in distributed mode on Azure functions

I am building a small utility that packages Locust - performance testing tool (https://locust.io/) and deploys it on azure functions. Just a fun side project to get some hands on with the serverless craze.
Here's the git repo: https://github.com/amanvirmundra/locust-serverless.
Now I am thinking that it would be great to run locust test in distributed mode on serverless architecture (azure functions consumption plan). Locust supports distributed mode but it needs the slaves to communicate with master using it's IP. That's the problem!!
I can provision multiple functions but I am not quite sure how I can make them talk to each other on the fly(without manual intervention).
Thinking out loud:
Somehow get the IP of the master function and pass it on to the slave functions. Not sure if that's possible in Azure functions, but some people have figured a way to get an IP of azure function using .net libraries. Mine is a python version but I am sure if it can be done using .net then there would be a python way as well.
Create some sort of a VPN and map a function to a private IP. Not sure if this sort of mapping is possible in azure.
Some has done this using AWS Lambdas (https://github.com/FutureSharks/invokust). Ask that person or try to understand the code.
Need advice in figuring out what's possible at the same time keeping things serverless. Open to ideas and/or code contributions :)
Update
This is the current setup:
The performance test session is triggered by an http request, which takes in number of requests to make, the base url, and no. of concurrent users to simulate.
Locustfile define the test setup and orchestration.
Run.py triggers the tests.
What I want to do now, is to have master/slave setup (cluster) for a massive scale perf test.
I would imagine that the master function is triggered by an http request, with a similar payload.
The master will in turn trigger slaves.
When the slaves join the cluster, the performance session would start.

What you describe doesn't sounds like a good use-case for Azure Functions.
Functions are supposed to be:
Triggered by an event
Short running (max 10 minutes)
Stateless and ephemeral
But indeed, Functions are good to do load testing, but the setup should be different:
You define a trigger for your Function (e.g. HTTP, or Event Hub)
Each function execution makes a given amount of requests, in parallel or sequentially, and then quits
There is an orchestrator somewhere (e.g. just a console app), who sends "commands" (HTTP call or Event) to trigger the Function
So, Functions are "multiplying" the load as per schedule defined by the orchestrator. You rely on Consumption Plan scalability to make sure that enough executions are provisioned at any given time.
The biggest difference is that function executions don't talk to each other, so they don't need IPs.
I think the mentioned example based on AWS Lambda is just calling Lambdas too, it does not setup master-client lambdas talking to each other.
I guess my point is that you might not need that Locust framework at all, and instead leverage the built-in capabilities of autoscaled FaaS.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string