MassTransit on Azure: messages stop being picked up

I'm facing the following problem with MassTransit 3. I'm publishing messages from a WebApi to a backend running as a continuous WebJob. When the backend job starts, everything works well and messages are picked up properly. After about 20 minutes, the backend stops picking up any messages published from the WebApi. The messages are published to Azure Service Bus correctly, but they are only picked up after the WebJob process is restarted.
The MT debug log is completely silent and shows no issues, so this question is mainly for the MT authors: can they think of anything that could cause this?
Update 1
The WebJob is continuous and runs in Standard mode, so the 20-minute timeout mentioned in the Azure documentation shouldn't apply.
I've checked the logs and the job is running. The environment doesn't log anything about stopping the job, and Process Explorer shows the job with quite a high thread count (I have just 3 consumers). All threads are in a wait state.

You should be creating a cloud service, not a WebJob. WebJobs are not meant for continuous processes. A worker role is exactly what you need.
From the Azure documentation:
Web apps in Free mode can time out after 20 minutes if there are no requests to the scm (deployment) site and the web app's portal is not open in Azure. Requests to the actual site will not reset this.

Resolved. The MT process got stuck after spawning around 2,000 threads. The issue must have been in the Azure transport, since the same configuration worked well with RabbitMQ.
After updating to a newer MT version (.11 beta), the transport behaved properly.
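The symptom here was a silent process whose thread count kept climbing (to around 2,000), not an error in the log. Watching the thread count yourself can help catch that kind of problem earlier. The following is a hypothetical watchdog, not part of MassTransit. The class name, the threshold and the window size are all assumptions that you would tune for your own service.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical watchdog (not a MassTransit API): records the process's thread
// count over a sliding window and flags steady growth past a threshold.
public sealed class ThreadCountMonitor
{
    private readonly int _threshold;
    private readonly int _window;
    private readonly Queue<int> _samples = new Queue<int>();

    public ThreadCountMonitor(int threshold, int window = 5)
    {
        if (window < 2) throw new ArgumentOutOfRangeException(nameof(window));
        _threshold = threshold;
        _window = window;
    }

    // Reads the live thread count of the current process and records it.
    public int Sample()
    {
        using var self = Process.GetCurrentProcess();
        int count = self.Threads.Count;
        Record(count);
        return count;
    }

    // Keeps only the most recent _window samples (oldest first).
    public void Record(int count)
    {
        _samples.Enqueue(count);
        while (_samples.Count > _window) _samples.Dequeue();
    }

    // True when a full window never dropped and the latest sample exceeds the threshold.
    public bool LooksRunaway()
    {
        if (_samples.Count < _window) return false;
        int[] s = _samples.ToArray();
        bool neverDropped = s.Zip(s.Skip(1), (a, b) => b >= a).All(ok => ok);
        return neverDropped && s[s.Length - 1] > _threshold;
    }
}
```

In the WebJob, you could call Sample() from a timer (for example, once a minute). When LooksRunaway() returns true, log a warning or restart the process. With only three consumers, the thread count should presumably stay fairly stable, so a steady climb across several samples is the signal worth alerting on.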

Related

IIS 10 Application Pool falls asleep

We have an internal ASP.NET Core application that is used during office hours. We also have a batch that should be processed at 3 AM every morning, scheduled by Hangfire like this:
RecurringJob.AddOrUpdate(
    () => MyBatch(),
    "0 3 * * *"); // every day at 03:00 (minute 0, hour 3)
The problem is that the Application Pool goes to sleep, and the batch isn't processed unless the site is started manually (usually by browsing to the website).
I have searched SO and tinkered with the Application Pool settings, but with no success.
Some sources I used to modify the settings:
How to prevent/extend idleTimeout in IIS 7?
https://serverfault.com/questions/443065/how-do-i-prevent-iis-8-from-stopping-idle-asp-net-applications
IIS seems like to sleep until the next request
The Application Pool is shared by 7 applications in total, all of which are inactive at night when the batch should be processed. It uses .NET CLR version 2.0.
I'm using IIS version 10.0.17134.1.
How can I make the Application Pool stay active so the batch can be invoked regularly every morning?
I ran into the same issue: my ASP.NET Core application went idle even with "AlwaysRunning" as the start mode for the app pool, "Preload Enabled" set to true for the site, and the idle timeout set to 0.
I got it to work by installing the Application Initialization module and setting the .NET CLR version to v4.0. Don't use "No Managed Code", because that prevents AlwaysRunning from triggering the app start.
I wrote a blog post on this that explains in more detail the steps I took to get the app to run continuously.
Hangfire has documentation on how to set up the application to run without stopping:
http://docs.hangfire.io/en/latest/deployment-to-production/making-aspnet-app-always-running.html#enabling-service-auto-start
In my experience (with the older IIS versions 7.5 and 8.0), this works, but not across app pool recycles or app domain unloads.
My workaround was to send an initialization request from the Application_End event.
As above, you need to enable Service Autostart. I have also found that Rapid-Fail Protection has shut down application pools in the past when Hangfire hit multiple exceptions. So it's also worth disabling it on the application pool, or raising its limits to reasonable values.
I'd suggest starting your process with a single call to the site's own HTTP address, just like a ping. That is enough to trigger the site startup if it isn't running for some reason.
One other thing: Microsoft's description on MSDN of the "AlwaysRunning" option is:
"Specifies that the Windows Process Activation Service (WAS) will always start the application pool. This behavior allows an application to load the operating environment before any serving any HTTP requests, which reduces the start-up processing for initial HTTP requests for the application."
That may just mean that the page compilation normally done on the first call happens before any request arrives. It may not actually keep the application running at all times.
I am on shared IIS hosting with no access to most settings. What I did was add a recurring job that fires at an interval shorter than the IIS idle timeout.
RecurringJob.AddOrUpdate<IMyKeepAliveService>("KeepHangFireAlive", svc => svc.KeepHangFireAlive(URL_TO_SELF), "*/4 * * * *");
Running every 4 minutes is shorter than the IIS default idle timeout of 20 minutes, and that is enough to keep the app pool from going to sleep.
I use RestSharp to make a tiny GET request ("ping") to the site itself.
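The answer doesn't show the service behind IMyKeepAliveService, so here is a minimal sketch of one, using HttpClient in place of RestSharp. KeepAliveService is a hypothetical name. Returning false instead of throwing is a deliberate design choice: a missed ping then doesn't pile up Hangfire's automatic retries.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Interface name taken from the answer; its shape here is an assumption.
public interface IMyKeepAliveService
{
    Task<bool> KeepHangFireAlive(string url);
}

// Hypothetical implementation using HttpClient instead of RestSharp.
public sealed class KeepAliveService : IMyKeepAliveService
{
    // One shared HttpClient; creating one per call can exhaust sockets.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    // Sends a small GET to the site's own URL so IIS sees a request before the
    // idle timeout expires. Network failures and timeouts are reported as false.
    public async Task<bool> KeepHangFireAlive(string url)
    {
        try
        {
            using HttpResponseMessage response = await Client.GetAsync(url);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            return false; // connection refused, DNS failure, etc.
        }
        catch (TaskCanceledException)
        {
            return false; // request timed out
        }
    }
}
```

Keep in mind that Hangfire schedules the ping inside the same process, so the ping can only stop the pool from idling out. It cannot wake a pool that has already been stopped or recycled. URL_TO_SELF should also point at a cheap page, so that each ping doesn't become real load.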

Azure WebJobs: Does republishing the associated website cause existing jobs to stop running?

I have an Azure Website that I would like to be able to republish without stopping any WebJobs that might be running in the background.
Setting aside the fact that it's bad practice to publish while the site is in use, a large queue might keep the WebJobs firing 24/7 as load on the website increases.
I'm not sure whether publishing the website (and not the WebJobs) causes the WebJobs (scheduled and on-demand) to be cancelled. Does it?
I think it does. If so, is there anything I can do to prevent it? I risk jobs being stopped halfway through because I need to publish, and I don't want to wait for the queue to be empty before publishing. A way to let currently running jobs finish without starting new runs would be fine too.
If the WebJob files (under wwwroot/app_data/jobs/...) are not updated, the jobs will not restart.
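When the job files do change and the job restarts, the question's fallback ("allowing currently running jobs to finish without starting new runs") can be approximated with the WebJobs graceful-shutdown signal. For a continuous WebJob, Azure sets the WEBJOBS_SHUTDOWN_FILE environment variable to a file path and creates that file shortly before stopping the job. The sketch below turns that file into a CancellationToken. WebJobShutdown and the polling approach are hypothetical choices, not an SDK API.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: cancels a token once Azure creates the shutdown file.
public static class WebJobShutdown
{
    public static CancellationToken Watch(TimeSpan pollInterval)
    {
        var path = Environment.GetEnvironmentVariable("WEBJOBS_SHUTDOWN_FILE");
        if (string.IsNullOrEmpty(path))
            return CancellationToken.None; // not running as a WebJob: never cancelled

        // Lives for the rest of the process, so it is not disposed here.
        var cts = new CancellationTokenSource();
        _ = Task.Run(async () =>
        {
            // Polling keeps the sketch simple; a FileSystemWatcher would also work.
            while (!File.Exists(path))
                await Task.Delay(pollInterval);
            cts.Cancel();
        });
        return cts.Token;
    }
}
```

The job loop would check token.IsCancellationRequested between work items, finish the item in hand, and stop picking up new ones. Azure only waits a limited grace period before killing the process (configurable through stopping_wait_time in settings.job). This approach therefore suits short work items better than long ones.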

Sudden dropoff in Azure queue performance

Short version: What reasons could there be for a sudden, dramatic, and seemingly permanent increase in the rate of timing-out Azure queue requests?
It's going to be difficult to provide all of the details that could possibly be relevant here, but here's a start:
This is an Azure application (SDK v2.0) with a WCF service placing work requests on a queue (roughly 100k calls a day) and a couple of worker roles which process the queue. We've got New Relic monitoring with the latest .NET agent (3.3.38).
We've run into an issue with our latest release, deployed a few days ago. It ran normally for about 24 hours. Then we suddenly started seeing a greatly increased rate of timeouts when our worker roles fetch messages from the queue, along with a catastrophic drop in throughput: our application can now barely keep up with its own queue using 40 workers, whereas it usually gets by with just 2. The timeouts have shown no sign of letting up and have stayed at the same rate ever since.
A couple images from New Relic to illustrate:
While this isn't nearly enough information to provide a good answer, I'm just trying to figure out where I might start looking. I've got support tickets open with New Relic and Microsoft, but we're trying to investigate on our own as well. Could this be throttling? Some kind of resource exhaustion in my queue processor worker role? We don't see increased load on the WCF service, and we haven't changed Azure client libraries or changed much of anything in the code that processes the queue.
I suggest enabling Storage Analytics on your storage account to determine whether the bottleneck is server-side or client-side/network-related. Specifically, compare the AverageE2ELatency and AverageServerLatency properties in the Storage Analytics metrics table. A high end-to-end latency combined with a low server latency points to the client or the network, while a high server latency points to the storage service itself.
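The comparison suggested above can be written down as a small triage rule. The sketch relies only on the definitions of the two columns. AverageServerLatency covers processing inside the storage service. AverageE2ELatency also includes reading the request and sending the response. The difference between them therefore approximates client and network time. Two values are assumptions, not figures from the Storage Analytics documentation:

- the 100 ms default for slowThresholdMs;
- the rule that more than half of the end-to-end time spent outside the service points to the client or network.

```csharp
using System;

public enum LatencyBottleneck { None, ServerSide, ClientOrNetwork }

// Hypothetical triage helper for one Storage Analytics metrics row.
public static class LatencyTriage
{
    public static LatencyBottleneck Classify(
        double averageE2ELatencyMs,
        double averageServerLatencyMs,
        double slowThresholdMs = 100) // assumed threshold; tune for your workload
    {
        // E2E latency includes server latency, so it can never be smaller.
        if (averageE2ELatencyMs < averageServerLatencyMs)
            throw new ArgumentException("E2E latency cannot be lower than server latency.");
        if (averageE2ELatencyMs < slowThresholdMs)
            return LatencyBottleneck.None;

        // Time spent outside the storage service: network plus client.
        double gapMs = averageE2ELatencyMs - averageServerLatencyMs;
        return gapMs > averageServerLatencyMs
            ? LatencyBottleneck.ClientOrNetwork
            : LatencyBottleneck.ServerSide;
    }
}
```

For example, Classify(900, 40) returns ClientOrNetwork: the service answered quickly, so the time was lost between the worker role and the storage endpoint.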
You can learn more about Azure Storage Analytics from the links below:
Overview:
http://msdn.microsoft.com/en-us/library/hh343270.aspx
How to enable in portal:
http://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/
Metrics table Schema:
http://msdn.microsoft.com/en-us/library/hh343264.aspx
Blog post:
http://blogs.msdn.com/b/windowsazurestorage/archive/2011/08/03/windows-azure-storage-analytics.aspx

WebJob doesn't trigger

I've created a simple Azure WebJob that uses a QueueInput trigger. It deployed without any problems, and I've scheduled it via the management portal so that it 'Runs continuously'.
Initial testing seemed fine, with the job triggering shortly after placing anything in the queue.
By chance I then left it about a day before placing anything else in the queue. This time the job hadn't triggered within a few minutes so I logged in to the portal to view the invocation logs - which showed that the job had just that moment been triggered.
That seemed too much of a coincidence so I left it another day before placing something in the queue. Again, the job didn't trigger. I left it overnight and by morning it still hadn't triggered.
When I logged in to the management portal this time I noticed that the job was marked as 'Aborted' on the WebJobs page. It was like that only for about 10 seconds before the status changed to 'Running'. And then the job immediately triggered from what was placed in the queue the night before, as expected.
As it's an alpha release I'm expecting glitches. Just wondering whether anyone else has had a similar experience.
With the WebJobs SDK, your job must be running in order to listen for triggers (new queue messages, new blobs, etc.). The Azure Websites Free tier has quotas and will put your job to sleep, which means it's no longer listening for triggers. Using the site may bring it back to life so that it starts listening for triggers again.
The SDK dashboard will show a warning icon next to functions if the hosting job is not running (it detects this via heartbeats).
Make sure that your website is configured with the "Always On" setting Enabled.
If your site contains continuously running jobs they may not perform reliably if this setting is disabled.
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
By default, web sites are unloaded if they have been idle for some period of time. This lets the system conserve resources. You can enable the Always On setting for a site in Standard mode if the site needs to be loaded all the time. Because continuous web jobs may not run reliably if Always On is disabled, you should enable Always On when you have continuous web jobs running on the site.

How to create a job in IIS capable of running an extended process

I have a web service app with one web service call that can take anywhere from 1 to 14 hours, depending on the data to be processed and the time of the month.
Is there any way to create a job in IIS that is capable of running this extended process? I also need job management and reporting, so that I can see whether jobs are running and new jobs aren't started on top of others.
I will be working primarily with IIS 6, and would like to use C#.
Right now I am using a web service call, but I don't like the idea of having web services run for such a long time, and due to the nature of the web service, I can't split the functionality any more.
IIS jobs would be awesome if they are available. Any ideas?
If I were you, I would make a command-line app that is kicked off by the web service. Running a command-line app is pretty straightforward. Basically:
using System.Diagnostics;

// Launch the long-running work as a separate process so it is not tied to the web request
var p = new Process();
p.StartInfo.UseShellExecute = false;
p.StartInfo.FileName = "appname.exe";
p.Start();
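The question also asks for job management, so that a new run isn't started on top of one that is still going. One way to sketch that on top of the Process approach is to keep a handle to the running process and refuse to start another until it has exited. SingleJobLauncher is a hypothetical name. Because the state lives in the web process, an app pool recycle loses track of the running job. A PID file or a database row would survive that, at the cost of more code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: starts the command-line job only if the previous run has exited.
public sealed class SingleJobLauncher
{
    private readonly object _gate = new object();
    private Process _current;

    public bool IsRunning
    {
        get { lock (_gate) return _current != null && !_current.HasExited; }
    }

    // Returns false (and starts nothing) while a previous job is still running.
    public bool TryStart(string fileName, string arguments)
    {
        lock (_gate)
        {
            if (_current != null && !_current.HasExited)
                return false;

            _current?.Dispose();
            var startInfo = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
                CreateNoWindow = true
            };
            _current = Process.Start(startInfo);
            return _current != null;
        }
    }

    // Exit code of the last run, or null if it is still running or never ran.
    public int? LastExitCode
    {
        get
        {
            lock (_gate)
                return _current != null && _current.HasExited ? _current.ExitCode : (int?)null;
        }
    }
}
```

A status page in the web service could then report IsRunning and LastExitCode, which covers the simple "is a job already running?" check.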
There is a limited number of worker processes per machine, and they aren't really meant for long-running jobs.
One possibility, with a bit of setup cost, is to have your processing run as a Windows service that listens to a message queue (MSMQ or similar), and have your web service simply post the request onto the message queue to be handled by the processing service.
Monitoring jobs is more difficult; your web service would need to have a way of querying your processing service to find out its state. This is an IPC (interprocess communication) problem, which has many different solutions with various tradeoffs that depend on your environment and circumstances.
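The queue-plus-service design can be made concrete without MSMQ. Below is an in-process sketch built on BlockingCollection<T>, where each piece stands in for part of the real design:

- Submit plays the web service posting a message.
- The single consumer loop plays the Windows service.
- GetState plays the status query.

JobProcessor and JobState are hypothetical names. This is not the MSMQ API. In the real design, the queue and the status table would live outside the web process, which is exactly the IPC part described above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public enum JobState { Queued, Running, Completed, Failed }

// In-memory stand-in for "web service posts to a queue, a service processes it".
public sealed class JobProcessor : IDisposable
{
    private readonly BlockingCollection<(Guid Id, Action Work)> _queue =
        new BlockingCollection<(Guid Id, Action Work)>();
    private readonly ConcurrentDictionary<Guid, JobState> _status =
        new ConcurrentDictionary<Guid, JobState>();
    private readonly Task _worker;

    public JobProcessor()
    {
        // A single long-running consumer, so jobs never run on top of each other.
        _worker = Task.Factory.StartNew(ProcessLoop, TaskCreationOptions.LongRunning);
    }

    // Called by the web service: returns immediately with an id to query later.
    public Guid Submit(Action work)
    {
        var id = Guid.NewGuid();
        _status[id] = JobState.Queued;
        _queue.Add((id, work));
        return id;
    }

    // Called by a status/monitoring endpoint; null means "unknown job".
    public JobState? GetState(Guid id) =>
        _status.TryGetValue(id, out var state) ? state : (JobState?)null;

    private void ProcessLoop()
    {
        foreach (var (id, work) in _queue.GetConsumingEnumerable())
        {
            _status[id] = JobState.Running;
            try { work(); _status[id] = JobState.Completed; }
            catch (Exception) { _status[id] = JobState.Failed; }
        }
    }

    // Stops accepting work and waits until every queued job has been processed.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Wait();
        _queue.Dispose();
    }
}
```

The single consumer also gives the "no overlapping jobs" guarantee for free: later submissions simply wait in the queue until the current job finishes.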
That said, for simple cases, Matt's solution is probably sufficient.
