This is probably a weird situation. All the posts I have found on this topic are the other way around: people want to check "remove additional files", but in my case I want it unchecked, and that is causing problems at later stages. To give some context:
We are building around 15 to 20 Azure Functions as a wrapper API on top of the Dynamics CRM APIs. The two options we evaluated are:
a) Create each function in its own function app. This gives us a maintenance issue: 20 URLs for each of Dev, SIT, UAT, Stage, Prod and Training is a considerable mess to tackle, along with their managed identities, app registrations, etc. Another key reason not to consider this approach is the consumption plan's warm-up issues: it is unlikely that all these functions are heavily used, but some of them are.
b) Keep all functions under one big function app. This is the preferred way for us, as it takes care of most of the above issues. However, the problem we observed is that if we have to deploy one function, we have to wait for all the functions to be tested and approved and then deploy all of them, even if the requirement is to deploy only one. This is a total no-no from an architectural point of view.
So we adopted a hybrid approach: in Visual Studio we still maintain multiple function app projects, but during deployment all these functions are deployed into a single function app using Web Deploy with "Remove additional files in target" unchecked.
The problem now:
This all worked very well for us during our POC. However, now that we have started deploying through pipelines into a staging slot, it is becoming a problem. Say we first deploy function 1 to staging and swap it to production: staging now has 0 functions and production has 1. Then, when we deploy the 2nd Azure Function, staging has only the 2nd function, and if we swap it with production, production gets only the 2nd function and we lose the 1st Azure Function from production entirely.
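The swap behavior described above can be reproduced with a small Python model in which each slot is just a set of function names. deploy_additive, swap and release are hypothetical helpers, not Azure APIs. The resync option models one possible workaround (redeploying the package that was just released to the old production slot right after the swap, so both slots end up holding the same functions); that is an assumption, not something verified against Azure here.

```python
from typing import Iterable, Set, Tuple

Slot = Set[str]  # a slot is modeled as the set of deployed function names


def deploy_additive(slot: Slot, package: Slot) -> Slot:
    """Web Deploy with 'Remove additional files' unchecked: add, never delete."""
    return slot | package


def swap(staging: Slot, production: Slot) -> Tuple[Slot, Slot]:
    """A slot swap exchanges the content of the two slots."""
    return production, staging


def release(releases: Iterable[Slot], resync: bool) -> Slot:
    """Deploy each package to staging, swap, and return production's content."""
    staging: Slot = set()
    production: Slot = set()
    for package in releases:
        staging = deploy_additive(staging, package)
        staging, production = swap(staging, production)
        if resync:
            # Hypothetical workaround: bring the old production slot (now staging)
            # up to date by redeploying the package that was just released.
            staging = deploy_additive(staging, package)
    return production


print(sorted(release([{"function1"}, {"function2"}], resync=False)))
# ['function2']: function1 was lost from production
print(sorted(release([{"function1"}, {"function2"}], resync=True)))
# ['function1', 'function2']
```

The resync idea only holds if nothing else is deployed to the staging slot between a swap and the redeploy that follows it.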
Logically this sounds correct to me, but I am wondering if anyone can suggest a workaround.
Please let me know if any further details are required.
I have two Azure Functions. You can think of them as "Producer-Consumer". One is an "HttpTrigger"-based Function (Producer) which can be fired at random. It writes the input data into a static "ConcurrentDictionary". The second one is a "Timer Trigger" Azure Function (Consumer). It periodically reads the data from the same "ConcurrentDictionary" used by the "Producer" function and then does some processing.
Both functions are in the same .Net project (but in different classes). The in-memory data sharing through the static "ConcurrentDictionary" works perfectly fine when I run the application locally; I assume that locally they run in the same process. However, when I deploy these Functions to Azure Portal (they are in the same Function App resource), I find that data sharing through the static "ConcurrentDictionary" does not work.
I am just curious to know whether, in Azure Portal, each Function has its own process (which would explain why they cannot share an in-process static collection). If that is the case, what are my options for making these two Functions work as a proper "Producer-Consumer"? Would keeping both Functions in the same class help?
Probably, the scenario is just the opposite of what is described in the post "". Unlike the question in that post, I would like both Functions to use the same static member of a static class instance.
I am sorry that I cannot experiment too much, because deployment is done through an Azure DevOps pipeline and too many check-ins to the repository are slightly inconvenient. As I mentioned, it works well locally, so I don't know how to recreate what's happening in Azure Portal in my local environment so that I can try different options. Is there any configuration setting I am missing?
Don't do that. Use an Azure queue, Event Grid, Service Bus or something else that is reliable, but just don't try using a shared object. It will fail as soon as scale-out happens or as soon as one of the processes dies. Do think about functions as independent pieces and do not try to go against the framework.
Yes, it might work when you run the functions locally, but then you are running on a single machine and the runtime might use the same process; once deployed, that ain't true anymore.
If you really, really don't want to decouple your logic into a fully separated producer and consumer, then write a single function that uses an in-process queue or collection and have that function deal with the processing.
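A minimal sketch of that single-function approach, in Python rather than C#: one entry point owns both the producer and the consumer, connected by an in-process queue.Queue. run_single_function and the upper-casing "processing" are hypothetical placeholders. The queue lives only as long as the process, so anything still queued is lost if the host recycles, which is why the answer prefers an external queue for work that must not be dropped.

```python
import queue
import threading
from typing import Iterable, List, Optional


def run_single_function(requests: Iterable[str]) -> List[str]:
    """Hypothetical single function owning both the producer and the consumer."""
    work: "queue.Queue[Optional[str]]" = queue.Queue()
    processed: List[str] = []

    def consumer() -> None:
        while True:
            item = work.get()
            if item is None:  # sentinel: the producer has finished
                break
            processed.append(item.upper())  # stand-in for the real processing

    worker = threading.Thread(target=consumer)
    worker.start()
    for request in requests:  # producer side
        work.put(request)
    work.put(None)
    worker.join()
    return processed
```

With a single consumer thread, items are processed in the order they were produced.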
We have a service running as an Azure Function (Event and Service Bus triggers) that we feel would be better served by a different model: it takes a few minutes to run and loads a lot of objects into memory, and it seems to reload them every time it is called instead of keeping them in memory, which would perform better.
What is the best Azure service to move to, with the following goals in mind?
Easy to move and doesn't need too many code changes.
We have long-term goals of being able to run this on-prem (Kubernetes might help us here)
Appreciate your help.
To achieve first goal:
Move your Azure Function code into a continuously running WebJob. It has no maximum execution time, and it can run continuously, caching objects in its context.
To achieve second goal (On-premise):
You need to explain this better, but a WebJob can run as a console program on-premises, and you can also wrap it in a Docker container to move it from on-premises to any cloud. However, if you need to consume messages from an Azure Service Bus, you will need a hybrid on-premises/Azure approach, connecting your local server to the cloud with a VPN or ExpressRoute.
There are a couple of ways to solve this issue, each requiring slightly more change from where you are now.
If you are just trying to separate out the heavy initial load, then you can do it once in a Redis Cache instance and then reference it from there.
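That is the cache-aside pattern: look the data up in the shared cache first, and only run the heavy load on a miss. A sketch in Python, where InMemoryStore is a hypothetical stand-in for a Redis client exposing only get and set (redis-py's client has methods with those names, but connection setup, expiry and serialization choices are left out here):

```python
import json
from typing import Callable, Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only get/set are modeled."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def get_or_load(store: InMemoryStore, key: str, loader: Callable[[], dict]) -> dict:
    """Cache-aside: read from the shared cache, load and populate on a miss."""
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)
    value = loader()  # the heavy initial load runs only on a cache miss
    store.set(key, json.dumps(value))
    return value
```

Because the cache is shared, every function instance (including newly scaled-out ones) can skip the heavy load once any instance has populated it.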
If you are concerned about how long your worker can run, then WebJobs (as explained above) can work; however, I'd suggest avoiding them, since that's not where Microsoft is putting its resources. Rather, look at Durable Functions. Here an orchestrator function can drive a worker function. (Even here, be careful: since Durable Functions retain history, after running for a very, very long time the history tables might get too large, so program in something like restarting the orchestrator after, say, 50,000 runs; the number will obviously vary based on your case.) Also see this.
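The restart advice can be modeled as plain bookkeeping: each worker call adds history, and once a threshold is reached the orchestration starts over with an empty history. In Durable Functions the usual mechanism for this is ContinueAsNew; the Python sketch below does not use that API and only illustrates why history stays bounded. Orchestration, drive and the threshold values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Orchestration:
    """Toy model of an eternal orchestrator whose history grows with each run."""
    generation: int = 0
    history: List[int] = field(default_factory=list)


def drive(total_runs: int, restart_after: int) -> Tuple[Orchestration, int]:
    """Run the worker repeatedly, restarting with fresh history at the threshold."""
    orch = Orchestration()
    peak = 0
    for run in range(total_runs):
        orch.history.append(run)  # each worker call adds history events
        peak = max(peak, len(orch.history))
        if len(orch.history) >= restart_after:
            # Start a new generation with an empty history.
            orch = Orchestration(generation=orch.generation + 1)
    return orch, peak


final, peak = drive(total_runs=120, restart_after=50)
print(final.generation, len(final.history), peak)  # 2 20 50
```

However many runs happen in total, the history never grows past the restart threshold.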
If you want to add the constraint of portability, then you can run this function in a Docker image on an AKS cluster in Azure. This might not work well for Durable Functions (try it out, who knows :) ), but it will surely work for the worker functions (which would cost you the most compute anyway).
If you want to bring the workloads completely on-prem, then Azure Functions might not be a good choice. You can create an HTTP server using the platform of your choice (Node, Python, C#...) and have it invoke the worker routine. Then you can run this whole setup inside an image on an AKS cluster on-prem, and to the user it looks just like a load-balanced web server :) You can decide whether you want to keep the data on Azure or bring it down on-prem as well, but beware of egress costs if you decide to move it out once you've moved it up.
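A minimal sketch of such an HTTP front end using only Python's standard library (http.server); the POST route, the JSON shape and worker_routine are hypothetical. In a container on a Kubernetes cluster, several replicas of this process would sit behind the cluster's load balancer.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def worker_routine(payload: dict) -> dict:
    """Hypothetical stand-in for the heavy processing the worker function did."""
    return {"processed": len(payload.get("items", []))}


class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(worker_routine(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def start_server(port: int = 8080) -> ThreadingHTTPServer:
    """Serve on all interfaces in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("0.0.0.0", port), WorkerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_worker(port: int, payload: dict) -> dict:
    """Small client helper, e.g. for a smoke test against a local instance."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read())
```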
It appears that the functions are affected by cold starts:
Serverless cold starts within Azure
Upgrading to the Premium plan would move your functions to pre-warmed instances, which should counter the problem you are experiencing:
Pre-warmed instances for Azure Functions
However, if you potentially want to deploy your functions/triggers on-prem, you should spin them out as microservices and deploy them with containers.
Currently, the fastest way would probably be to deploy the containerized triggers via Azure Container Instances if you don't already have a Kubernetes Cluster running. With some tweaking, you can deploy them on-prem later on.
There are a few options:
Move your function app to the Premium plan. But it will not help you much under heavy load and scale-out.
Issue: in that case you will start facing cold start issues, and the problem will persist under heavy load.
Redis Cache: it will resolve most of your issues, as the main concern is heavy loading.
Issue: if your system is multi-tenant, your cache can become heavy over time.
Create small micro Durable Functions. This may not answer your question, since you don't want lots of changes, but it will resolve most of your issues.
I'm deploying updates to my Function app through the VS publish window. I set up a deployment slot with auto swap turned on. My updates through VS are going to the slot. The problem is, right after the publish is successful and when I test my API endpoints, I briefly receive 503 errors. I was under the impression that auto swap was seamless and end-users would not experience such interruptions. Am I missing something? How can I make my deployments unnoticeable to the users?
Switching to something like API Management or Traffic Manager is obviously an option, but slots are designed to do exactly what you want, and they should work the way you expect.
I looked into this a bit. Unfortunately, I can reproduce your issue, which surprised me. A few things feel a bit off when using Azure Functions with slots, so maybe there is some weirdness under the covers.
The official documentation does not mention anything about this, however; quite the opposite:
Traffic redirection is seamless; no requests are dropped because of a swap.
If a function is running during a swap, execution continues and the next triggers are routed to the swapped app instance.
You don't even need to use Auto Swap: just publish to both slots and swap them manually. Observing the responses, the following pattern can be seen:
Responses of old code
Responses of new code
503 errors for ~10 seconds
Request slowdown
Responses of new code
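To watch for this pattern while swapping, a small polling script can help. A Python sketch using only the standard library; the URL you pass in is a placeholder for your own endpoint, and a status of 0 marks a request that failed before any HTTP response arrived. Request timing (the slowdown phase) is not recorded here.

```python
import time
import urllib.error
import urllib.request
from itertools import groupby
from typing import List, Tuple


def probe(url: str, count: int, interval: float = 0.5) -> List[int]:
    """Poll an endpoint and record the HTTP status code of each request."""
    codes: List[int] = []
    for _ in range(count):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                codes.append(response.status)
        except urllib.error.HTTPError as err:
            codes.append(err.code)  # e.g. 503 while the swap is in progress
        except OSError:
            codes.append(0)  # connection failure or timeout
        time.sleep(interval)
    return codes


def summarize(codes: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive identical status codes into (code, run length) pairs."""
    return [(code, len(list(group))) for code, group in groupby(codes)]
```

For example, run `print(summarize(probe("https://<your-app>.azurewebsites.net/api/<your-function>", count=120)))` while triggering the swap from another terminal.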
I tried:
AppService Plan & Consumption Plan
ARR Affinity On/Off
Azure Function V2 and V3 runtime
This seems like a bug to me. I would suggest you create a support case and maybe an issue on GitHub. I might do so myself if I find the time in the next few days. See also this issue:
Edit: the linked GitHub issue and also the Medium article mentioned by Ron point out that you can set WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG to 1, which should help with the 503 errors. This behavior is documented very deep in the App Service docs; why it is not mentioned for Azure Functions eludes me.
Did you see this
Does Azure Functions throw 503 errors while app settings are being updated?
Depending on how you are doing the swap, it could be triggering a restart because the app settings are "changing".
There is also this, which would probably help, but it's only a Premium feature.
I would also check out
I believe the solution would be to add API Management in front of your Azure Functions and implement a retry policy in it. This error seems to be related to the DNS swap between the slots.
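In API Management this would be expressed as a retry policy. The same idea, sketched as client-side Python rather than APIM policy configuration, assuming the caller wraps its HTTP request in a hypothetical send callable that returns (status, body). Only 503 is treated as retryable here, matching the errors in the question.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, body)


def call_with_retry(send: Callable[[], Response], attempts: int = 5,
                    base_delay: float = 0.5) -> Response:
    """Retry 503 responses with exponential backoff plus a little jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        status, body = send()
        if status != 503:
            return status, body  # success or a non-retryable error
        delay = base_delay * (2 ** attempt)
        time.sleep(delay + random.uniform(0, delay / 2))
    return send()  # last attempt: return whatever comes back
```

With the default settings, the waits before retries are roughly 0.5, 1, 2 and 4 seconds plus jitter, which covers the roughly 10-second window of 503s observed in the other answer.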
The general practice many people follow is to maintain two hosts (Azure Functions/App Service) in two different regions behind Azure Traffic Manager, and deployment goes as follows:
disable first region in traffic manager
swap functions in the first region
enable first region in traffic manager
disable second region in traffic manager
swap functions in the second region
enable second region in traffic manager
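The steps above follow one invariant: at least one region is serving traffic at every moment. A Python sketch of the sequence that checks this invariant; the region names and the swap callback are hypothetical, and in practice the enable/disable steps would be the Traffic Manager endpoint status changes and the swap would be the slot swap operation.

```python
from typing import Callable, Dict, List


def rolling_swap(regions: List[str], swap: Callable[[str], None]) -> List[str]:
    """Swap one region at a time while Traffic Manager routes to the others."""
    enabled: Dict[str, bool] = {region: True for region in regions}
    log: List[str] = []

    def set_endpoint(region: str, on: bool) -> None:
        enabled[region] = on
        # Invariant: some region must always be serving traffic.
        if not any(enabled.values()):
            raise RuntimeError(f"disabling {region} leaves no region serving traffic")
        log.append(f"{'enable' if on else 'disable'} {region}")

    for region in regions:
        set_endpoint(region, False)
        swap(region)  # hypothetical: trigger the slot swap for this region
        log.append(f"swap {region}")
        set_endpoint(region, True)
    return log
```

With a single region the invariant fails on the first step, which is why the pattern needs at least two.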
Although it does not solve the issue of Azure Functions returning 503, it does make it unnoticeable to the user, as you always route to a stable endpoint.
Having two regions also helps handle other issues, like Azure outages in specific regions.
I have a set of logic apps that each call a set of function apps, which run in parallel.
Each logic app is triggered to start at a certain time during the night, each staggered an hour apart.
The Azure functions are written using the async pattern and call external APIs.
Sometimes the logic apps will run fine and complete their execution in a normal time period, and can do so for two or three days in a row.
However, sometimes they take hours or days, forcing me to cancel the run.
Can anybody shed any light on why this might be happening?
I'm using the latest NuGet packages of the Durable Functions extension.
When debugging, the functions always complete in a timely fashion.
I have noticed that the functions sometimes get stuck at pending.
It appears you have at least two function apps that are configured with the same storage account and task hub name:
This causes the two function apps to steal messages from each other. If functions in one app do not exist in the other app, then it's very possible for orchestrations to get stuck in a Pending state like this.
The simplest way to mitigate this is to give each function app a unique task hub name. Please see the Task Hubs documentation for more information:
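When there are many function apps, a quick inventory check can catch this before it bites. A Python sketch, assuming you have collected each app's storage account and task hub name (for example, from its storage connection setting and host.json) into a dictionary; the app and hub names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inventory: app name -> (storage account, task hub name).
AppConfig = Dict[str, Tuple[str, str]]


def find_task_hub_conflicts(apps: AppConfig) -> List[List[str]]:
    """Return groups of apps that share the same storage account AND task hub name."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for app, (storage_account, hub_name) in apps.items():
        groups[(storage_account, hub_name)].append(app)
    return [sorted(names) for names in groups.values() if len(names) > 1]


apps = {
    "orders-app": ("sharedstorage", "SharedHub"),
    "billing-app": ("sharedstorage", "SharedHub"),
    "reports-app": ("sharedstorage", "ReportsHub"),
}
print(find_task_hub_conflicts(apps))  # [['billing-app', 'orders-app']]
```

Each group returned is a set of apps that will compete for the same task hub messages; giving each app a unique hub name makes the list empty.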
We have a few different roles in Azure. Currently these are each deployed to separate instances so they can scale separately (and in production this is what we want), but for testing this seems wasteful and we would like to be able to deploy all the roles to a single instance to minimise costs.
Can we do this?
Roles are essentially definitions for what will run inside a set of Windows Azure VM instances. By definition, they have their own instances, so they cannot be targeted toward a single set of instances.
That said: there's nothing stopping you from combining code from different roles into one single role. You'd need to make sure your OnStart() and Run() take care of all needed tasks, as well as combining startup script items.
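The combining idea can be sketched in Python as one entry point that first runs every former role's startup work (the OnStart part) and then runs each role's main loop on its own thread (the Run part). RoleWork and combined_role are hypothetical names, and this is not the Azure role API; it only shows the shape of the merge.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoleWork:
    """Hypothetical description of one former role: its startup and main work."""
    name: str
    on_start: Callable[[], None]
    run: Callable[[], None]


def combined_role(roles: List[RoleWork]) -> None:
    # OnStart equivalent: run every role's startup work first, in order.
    for role in roles:
        role.on_start()
    # Run equivalent: each former role's loop gets its own thread.
    threads = [threading.Thread(target=role.run, name=role.name) for role in roles]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # a real Run() would normally never return
```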
The upside (which you already surmised): cost savings, especially when running at low volume (where the entire app might be able to run in two instances, vs. several more near-idle instances split up by role).
One potential downside: Everything combined into a single role will now scale together. This may or may not be an issue for you.
Also, think about sizing. Let's say your website is perfectly happy in a Small, yet some background task you have requires XL (maybe it's a renderer needing 10GB RAM or something). And let's say you always run 2 instances of your website, for SLA purposes. Now, even at very low volume, your app consists of two XL instances instead of 2 Small (web) and one XL (background). Now, your near-idle system could cost more as one combined role than as separate roles. This might not apply to you - just giving an example where it might not make sense to combine...
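The arithmetic behind that example, assuming (as an approximation) that price scales with core count, with a Small at 1 core and an ExtraLarge at 8 cores in the classic Cloud Services sizes. The units are relative, not actual prices.

```python
from typing import Iterable, Tuple

# Relative cost units, assuming price scales with core count.
CORES = {"Small": 1, "ExtraLarge": 8}


def cost(instances: Iterable[Tuple[str, int]]) -> int:
    """Sum relative cost units over (size, count) pairs."""
    return sum(CORES[size] * count for size, count in instances)


separate = cost([("Small", 2), ("ExtraLarge", 1)])  # 2 web + 1 background
combined = cost([("ExtraLarge", 2)])                # 2 combined instances
print(separate, combined)  # 10 16
```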
Adding on to David's great explanation: adding things together and gluing them via the OnStart or Run overrides will work, but are you really testing things properly? Configuration values merged together, potential issues with memory usage, concurrency, etc. You would not be testing the same product that you deploy to production.
A better way would be to deploy Extra Small instances to your QA environment. They cost a fraction of the price of, say, Medium or Large servers and provide a meaningful testing platform.