"The workflow has been aborted" when calling an InvokeMethod activity - iis-7.5

We have a sequential workflow that calls an InvokeMethod activity. This activity uses a class written in C# and can take a long time to execute (around 50 seconds or more, and it calls other WCF services). It runs in an IIS + AppFabric environment.
About half of the time we get "The workflow has been aborted" with no further data in the exception.
Do you know if there is any "timeout" when executing activities? Or a way to trace the error?
Thanks

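WCF bindings enforce a timeout on every request; the send timeout defaults to one minute, which an operation that runs for 50 seconds or more and calls other WCF services can easily exceed. You will need to update your web configuration to raise the timeouts on your binding (on both the client and server sides) to comfortably more than the longest call you expect.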
A different approach (though much more involved) would be to design your workflow to support suspension and unload itself to a SQL Server database while awaiting a response back from your WCF services. This would yield significant performance & scalability benefits as well.

Related

Execute something which takes 5 seconds (like email send) but return with response immediately?

Context
In an ASP.NET Core application I would like to execute an operation that takes, say, 5 seconds (like sending an email). I know async/await and its purpose in ASP.NET Core; however, I do not want to wait for the operation to finish. Instead, I would like to return to the client immediately.
Issue
So it is a kind of fire-and-forget, either homebrew or via Hangfire's BackgroundJob.Enqueue<IEmailSender>(x => x.Send("hangfire@example.com"));
Suppose I have a more complex method with an injected ILogger and other dependencies, and I would like to fire and forget that method. The method contains error handling and logging. (Note: this is not specific to Hangfire; the issue is agnostic to how the background worker is implemented.) My problem is that the method will run completely out of context, and probably nothing will work inside it: no HttpContext (I mean HttpContextAccessor will give null, etc.), so no User, no Session, etc.
Question
How do I correctly solve, say, this particular email sending problem? No one wants to wait 5 seconds for the response, and at the same time no one wants to fire off an email and not even log it if the send operation returned an error...
"How to correctly solve say this particular email sending problem?"
This is a specific instance of the "run a background job from my web app" problem.
"there is no universal solution"
There is - or at least, a universal pattern; it's just that many developers try to avoid it because it's not easy.
I describe it pretty fully in my blog post series on the basic distributed architecture. I think one important thing to acknowledge is that since your background work (sending an email) is done outside of an HTTP request, it really should be done outside of your web app process. Once you accept that, the rest of the solution falls into place:
You need a durable storage queue for the work. Hangfire uses your database; I tend to prefer cloud queues like Azure Storage Queues.
This means you'll need to copy all the data over that you will need, since it needs to be serialized into that queue. The same restriction applies to Hangfire, it's just not obvious because Hangfire runs in the same web application process.
You need a background process to execute your work queue. I tend to prefer Azure Functions, but another common approach is to run an ASP.NET Core Worker Service as a Win32 service or Linux daemon. Hangfire has its own ad-hoc in-process thread. Running an ASP.NET Core hosted service in-process would also work, though that has some of the same drawbacks as Hangfire since it also runs in the web application process.
Finally, your work queue processor application has its own service injection, and you can code it to create a dependency scope per work queue item if desired.
IMO, this is a normal threshold that's reached as your web application "grows up". It's more complex than a simple web app: now you have a web app, a durable queue, and a background processor. So your deployment becomes more complex, you need to think about things like versioning your worker queue schema so you can upgrade without downtime (something Hangfire can't handle well), etc. And some devs really balk at this because it's more complex when "all" they want to do is send an email without waiting for it, but the fact is that this is the necessary step upwards when a baby web app becomes distributed.
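As a rough illustration of the pattern described above, here is a minimal sketch in Python (chosen only to keep the example short; the same shape applies to an ASP.NET Core app plus a worker service). The web side copies everything the job needs into a serialized message and returns at once; the worker side deserializes it and does its own error handling and logging. The names SendEmailJob, handle_request, and process_next are hypothetical, and the in-memory queue.Queue is only a stand-in for a durable queue; none of this is Hangfire's or Azure's API.

```python
import json
import queue
from dataclasses import asdict, dataclass

# Stand-in for a durable queue (a database table, a cloud queue, ...).
# queue.Queue lives in memory only and is used purely for illustration.
work_queue: "queue.Queue[str]" = queue.Queue()


@dataclass
class SendEmailJob:
    # Everything the worker needs is copied into the message up front,
    # because no HttpContext, user, or session exists on the worker side.
    schema_version: int
    to_address: str
    subject: str
    body: str
    requested_by: str


def handle_request(user_name: str, to_address: str) -> int:
    """Web side: enqueue the job and return immediately."""
    job = SendEmailJob(1, to_address, "Welcome", "Hello!", requested_by=user_name)
    work_queue.put(json.dumps(asdict(job)))
    return 202  # accepted; the email has not been sent yet


def process_next(send, log) -> bool:
    """Worker side: handle one message using the worker's own dependencies."""
    try:
        raw = work_queue.get_nowait()
    except queue.Empty:
        return False  # nothing to do
    job = SendEmailJob(**json.loads(raw))
    try:
        send(job.to_address, job.subject, job.body)
        log(f"sent email to {job.to_address} for {job.requested_by}")
    except Exception as exc:
        # The failure is logged here; a real queue would typically also retry.
        log(f"failed to send to {job.to_address}: {exc}")
    return True
```

The schema_version field hints at the versioning concern mentioned above: the worker can inspect it before deciding how to interpret a message written by an older or newer web app.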

Azure slow communication between APIs

In some 1-5% of our requests, we are seeing slow communication between APIs (REST API requests). Both APIs are developed by us and hosted on Azure, each app service on its own app service plan in the same region, P1v2 tier.
What we are seeing in Application Insights is that POST or GET requests on the origin API can take a few seconds to execute, while the real execution time on the destination API is only a few milliseconds.
Examples (first line POST request on origin, second execution time on destination API): slow req 1, slow req 2
Our best guess is that the time difference is lost in communication between components. We don't have an explanation for it since the payload is really small and in most cases, communication takes less than 5 milliseconds.
We have ruled out a component cold start as the explanation, since the problem occurs under constant load and no horizontal scaling was performed.
Do you have any idea what might cause it or how to do additional analysis in order to discover it?
If you're running multiple sites on the App Service Plan, enable the "Always On" setting for your web app: All Settings > Application Settings > Always On.
See here for details: https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
When Always On is off, the site is shut down after 20 minutes of inactivity to free up resources for any additional websites that might be using the same App Service Plan.
After such a shutdown, the next request has to bring the site back up: the information it needs to collect, process, and present takes some time and involves internal calls as well. That is why, depending on server load and usage, it takes around 6 to 7 seconds and sometimes even more.
To troubleshoot that latency, try these steps provided by Microsoft.

Azure Function with ServiceBusTrigger circuit breaker pattern

I have an Azure function with ServiceBusTrigger which will post the message content to a webservice behind an Azure API Manager. In some cases the load of the (3rd party) webserver backend is too high and it collapses returning error 500.
I'm looking for a proper way to implement circuit breaker here.
I've considered the following:
Disable the azure function, but it might result in data loss due to multiple messages in memory (serviceBus.prefetchCount)
Implement API Manager with rate-limit policy, but this seems counter productive as it runs fine in most cases
Re-architecting the 3rd party webservice is out of scope :)
Set the queue to ReceiveDisabled, this is the preferred solution, but it results in my InputBinding throwing a huge amount of MessagingEntityDisabledExceptions which I'm (so far) unable to catch and handle myself. I've checked the docs for host.json, ServiceBusTrigger and the Run parameters but was unable to find a useful setting there.
Keep some sort of responsecode resultset and increase retry time, not ideal in a serverless scenario with multiple parallel functions.
Let API manager map 500 errors to 429 and reschedule those later, will probably work but since we send a lot of messages it will hammer the service for some time. In addition it's hard to distinguish between a temporary 500 error or a consecutive one.
Note that this question is not about deciding whether or not to trigger the circuitbreaker, merely to handle the appropriate action afterwards.
Additional info
Azure Functions v2, .NET Core 3.1, running on the Consumption plan
API Manager runs Basic SKU
Service Bus runs in premium tier
Messagecount: 300.000

How does an Azure WebJob return results to the web app?

I understand that a WebJob is a back-end job and that the web app can invoke it through an Azure queue.
My question is: when the WebJob completes, how can the web app know that it has finished, and how can it retrieve the results the WebJob generated?
Is there any async method that can work in this scenario?
Other methods are also welcome.
Thanks
Derek
----------------Update ------------------------
Can the "ListQueuesSegmentedAsync" method work? I have no idea how to use it, though.
You already know the answer! A Message Queue!
If you need more than a few KB for a message (maybe you want to pass a JPEG file) drop that into Blob Storage and signal the Web App/WebJob with a queue message indicating the full path to the newly arrived blob.
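A minimal sketch of that "blob plus pointer message" idea, written in Python for brevity. The dictionary and queue.Queue below are in-memory stand-ins for Blob Storage and a storage queue, and publish_result, receive_result, and the results/ path layout are hypothetical names, not Azure SDK calls.

```python
import json
import queue
from typing import Dict

blob_store: Dict[str, bytes] = {}                 # stand-in for Blob Storage
result_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for a storage queue


def publish_result(job_id: str, data: bytes) -> None:
    """WebJob side: store the large payload, then send a small pointer message."""
    path = f"results/{job_id}.jpg"
    blob_store[path] = data
    result_queue.put(json.dumps({"job_id": job_id, "blob_path": path}))


def receive_result() -> bytes:
    """Web app side: read the pointer message, then fetch the blob it names."""
    message = json.loads(result_queue.get_nowait())
    return blob_store[message["blob_path"]]
```

The queue message stays tiny regardless of how large the result is, which is the reason for splitting the payload from the signal.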
For more on implementing a Queue-centric workflow, see my other answer here:
https://stackoverflow.com/a/38036911/4148708
Sometimes, if keeping state isn't your first concern, it may be easier to implement a system where the WebJob calls an authenticated REST endpoint in the WebApp to GET/POST data.
There's no silver bullet. Every scenario tends to be a little different and may benefit from simplicity rather than durability (REST vs durable message queue).
Oh, and since you did specifically ask for async, here's one way to do it for REST (Queues are async by nature):
Call the REST endpoint from your WebJob
Return 202 Accepted (as in I got you, but the TPS report isn't ready yet)
Return Location: https://webapp/{a-chunk-of-sha1-representing-a-unique-id}
(The point of this header is to tell the WebJob Check this URL from time to time to grab the finished TPS report)
Follow the URL to get results: 200 OK means you have them, 417 Expectation Failed means not yet. Actually, HTTP 417 is supposed to be used in response to 100 Continue, but you never get to use that, so I'm slowly campaigning for 417 Expectation Failed to rival buzzwords like elastic and disruptive. But I digress.
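The steps above can be sketched as plain functions, independent of any web framework. This is only an illustration in Python: submit, complete, poll, and the in-memory _results table are hypothetical names, the 12-character id length is arbitrary, and the 417 status simply follows the tongue-in-cheek choice made in this answer.

```python
import hashlib
from typing import Dict, Optional, Tuple

# In-memory job table; a real service would keep this in durable storage.
_results: Dict[str, Optional[str]] = {}


def submit(payload: str) -> Tuple[int, Dict[str, str]]:
    """Accept the work and tell the caller where to look for the result."""
    job_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
    _results.setdefault(job_id, None)  # None means "not ready yet"
    return 202, {"Location": f"https://webapp/{job_id}"}


def complete(job_id: str, result: str) -> None:
    """Called by whatever finishes the work."""
    _results[job_id] = result


def poll(job_id: str) -> Tuple[int, Optional[str]]:
    """What a GET on the Location URL would answer."""
    if job_id not in _results:
        return 404, None
    result = _results[job_id]
    if result is None:
        return 417, None  # not ready yet, per the status code chosen above
    return 200, result
```

A caller would repeat poll at some interval until it sees 200, ideally with a cap on the number of attempts.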

Does a WCF REST service return a response to the client even if a secondary thread is not done running?

I'm trying to implement some analytics logic in my WCF REST web service but I don't want to damage performance while I do so.
I was thinking of starting a new thread that would communicate with the analytics service (Mixpanel) while the main thread does the actual work but I'm not sure this accomplishes what I want to do.
My assumption is that the web service would return a response as soon as its main thread is done while the secondary thread runs on its own and may run longer without the client waiting any extra time.
Is that an accurate assumption?
Some testing showed that my assumption was accurate.
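The behaviour can be reproduced outside WCF with a small, language-neutral experiment. The Python sketch below is an analogy under stated assumptions, not WCF code: handle_request and track_event are hypothetical names, and the 0.2 second sleep stands in for the call to the analytics service.

```python
import threading
import time

analytics_done = threading.Event()


def track_event(name: str) -> None:
    """Secondary thread: simulate a slow call to an analytics service."""
    time.sleep(0.2)
    analytics_done.set()


def handle_request() -> str:
    """Main thread: start the analytics call and return without waiting for it."""
    worker = threading.Thread(target=track_event, args=("page_view",), daemon=True)
    worker.start()
    return "response"
```

Calling handle_request returns the response right away, while analytics_done is only set about 0.2 seconds later, once the secondary thread finishes.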
