Continuous Webjob Restarts automatically - azure-web-app-service

I have a continuously running WebJob that starts a long-running process on a background thread. I have enabled application logging to write all logs to a storage table.
Sometimes the web job automatically restarts for no apparent reason and nothing is logged in the storage table. Below is the log information from the storage table
I assume whenever the web job stops it should write into the logs (first line in the below screen capture).
I looked at the memory and CPU consumption of the web app and it was always below 50% during the entire month.
I am using the "Basic" pricing tier for the web app and have set "Always On" to true.
How can I find out why the WebJob is shutting down? Is there any other place where I can look for more detailed logs?
EDIT: Below are the logs from the SCM site, which still do not say why it stopped :(
EDIT 2: In the log I found some more information.
Could not send heartbeat. Access to the path 'C:\DWASFiles\Sites\MyWebApp\VirtualDirectory0\data\DaaS\Heartbeats\RD000D3A702E55' is denied.
at DaaSRunner.Program.Main(String[] args)
at DaaSRunner.Program.StartSessionRunner()
at DaaS.HeartBeats.HeartBeatController.<GetHeartBeats>d__8.MoveNext()
at DaaS.HeartBeats.HeartBeatController.GetNumberOfLiveInstances()
at DaaSRunner.Program.SendHeartBeat()
at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source, Func`2 predicate)
at System.IO.File.Delete(String path)
at System.IO.File.InternalDelete(String path, Boolean checkHost)
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at DaaS.HeartBeats.HeartBeat.OpenHeartBeat(String heartBeatPath)
Any help?

Look at the WebJob's log file for system logs.
You can get to it by going to the WebJobs dashboard - https://{sitename}.scm.azurewebsites.net/azurejobs

Related

process in GCP VM instance killed automatically

I'm using a GCP VM instance to run my Python script as a background process.
But I found that my script received SIGTERM.
I checked syslog and daemon.log in /var/log
and found that my Python script (2316) was terminated by the system.
Which VM settings do I need to check?
Judging from this log line in your screenshot:
Nov 12 18:23:10 ai-task-1 systemd-logind[1051]: Power key pressed.
I would say that your script's process was SIGTERMed as a result of the hypervisor gracefully shutting down the VM, which would happen when a GCP user or service account with admin access to the project performs a GCE compute.instances.stop request.
You can look for this request's logs, for more details on where it comes from, in the Logs Viewer/Explorer or with gcloud logging read --freshness=30d, using filters like:
resource.type="gce_instance"
"ai-task-1"
timestamp>="2020-11-12T18:22:40Z"
timestamp<="2020-11-12T18:23:40Z"
Though depending on the retention period for your _Default bucket (30 days by default), these logs may have already expired.
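As a rough sketch of how the filter above can be assembled for a given shutdown time, the following Python builds the same four-line filter around a timestamp. The function name build_stop_filter and the 30-second half-window are illustrative assumptions, not part of any Google library; the function only formats text that you can paste into the Logs Explorer or pass to gcloud logging read.

```python
from datetime import datetime, timedelta, timezone


def build_stop_filter(instance_name: str, event_time: datetime, window_seconds: int = 30) -> str:
    """Build a Cloud Logging filter covering a window around event_time.

    Hypothetical helper: it only formats text and calls no Google API.
    event_time should be timezone-aware; it is converted to UTC.
    """
    start = event_time - timedelta(seconds=window_seconds)
    end = event_time + timedelta(seconds=window_seconds)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same timestamp layout as the filter above
    return "\n".join([
        'resource.type="gce_instance"',
        f'"{instance_name}"',
        f'timestamp>="{start.astimezone(timezone.utc).strftime(fmt)}"',
        f'timestamp<="{end.astimezone(timezone.utc).strftime(fmt)}"',
    ])


if __name__ == "__main__":
    # Time of the "Power key pressed" log line, assumed here to be UTC.
    pressed = datetime(2020, 11, 12, 18, 23, 10, tzinfo=timezone.utc)
    print(build_stop_filter("ai-task-1", pressed))
```

Widening window_seconds is the obvious knob if the stop request was logged noticeably earlier than the guest OS saw the power key event.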

How to fetch IIS Start log for a corresponding IIS Stop log in Azure Log Analytics outside of Alert's monitoring time period

I'm working on configuring an Azure Log Analytics alert (using KQL) to capture the IIS Stop and Start events (from the Events table) in my OMS Workspace. If the alert query finds that no corresponding IIS Start event was logged by a PaaS role for a particular IIS Stop event, the user should be notified by an alert so that they can bring IIS back up.
Problem: Let's say I set up my alert to run with a Time Period and Frequency of 15 minutes. If the alert triggers at 10:30 AM, it will scan the IIS logs from 10:15:01 AM to 10:29:59 AM. Now suppose an IIS Stop event is logged at around 10:28 AM. The respective IIS Start log (if any) will then be logged a couple of minutes later, around 10:31 AM or 10:32 AM, and hence fall outside the alert's monitoring time period. This creates a false positive: IIS started again, but my alert didn't capture the Start event log. That might lead to unnecessary IIS Start/Reset operations on my PaaS roles.
Attaching a representative quick sketch to explain it figuratively.
Please let me know if there's any possible approach to achieve this. Any suggestions are welcome. Thanks in advance!
The current implementation is as follows.
Here we can see a false alert generated at 10:30.
Consider the approach below, where we select the last 10 minutes of data (overlapping) every 5 minutes.
For the case below, you can generate the alert.
See if that helps.
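The overlapping-window idea can be sketched outside of KQL. The Python below only illustrates the logic, under assumptions that are not in the original: events are (timestamp, kind) pairs with kind "Stop" or "Start", the query looks back 10 minutes, and a Stop is only judged once it is at least 5 minutes old, so that its Start has had time to arrive. The function name, the grace period, and the sample date are hypothetical.

```python
from datetime import datetime, timedelta


def unmatched_stops(events, now, lookback=timedelta(minutes=10), grace=timedelta(minutes=5)):
    """Return Stop timestamps in [now - lookback, now - grace] with no later Start.

    events: iterable of (timestamp, kind) tuples, kind being "Stop" or "Start".
    Stops newer than now - grace are skipped; the next (overlapping) run judges them.
    """
    window_start = now - lookback
    cutoff = now - grace
    starts = sorted(ts for ts, kind in events if kind == "Start")
    missing = []
    for ts, kind in sorted(events):
        if kind != "Stop" or not (window_start <= ts <= cutoff):
            continue
        # A Stop counts as matched if any Start was logged after it, up to now.
        if not any(ts < s <= now for s in starts):
            missing.append(ts)
    return missing


if __name__ == "__main__":
    day = datetime(2021, 1, 1)  # made-up date for the example
    events = [
        (day.replace(hour=10, minute=28), "Stop"),
        (day.replace(hour=10, minute=31), "Start"),
    ]
    # 10:30 run: the 10:28 Stop is too recent to judge, so nothing is reported.
    print(unmatched_stops(events[:1], day.replace(hour=10, minute=30)))
    # 10:35 run: the Stop is old enough, and the 10:31 Start matches it.
    print(unmatched_stops(events, day.replace(hour=10, minute=35)))
```

With a 5-minute frequency, each Stop falls into the judged part of the window for one run, at the cost of the alert firing up to 5 minutes later than it would without the grace period.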

Azure Web Job logs

I want to check my WebJob app.
I am sending a queue message to 'blobsaddressqueueu' queue.
After a few seconds the message disappears from the queue, which means that it triggered the WebJob app.
Then I see the message in 'blobsaddressqueueu-poison', which means that something went wrong during processing.
When I go to Log Stream (after I turn it on) under ParseMatanDataWebJob20170214032408, I do not see any changes and the only message I get in Log Stream is 'No new trace in the past 1 min(s)' and so on.
What am I doing wrong?
All I want to do is check the CSV file (the queue message directs the WebJob to the blob container with the CSV file) and follow the process when the WebJob reads it, so I can figure out why the message goes to the poison queue.
I do not see any changes and the only message I get in Log Stream is
'No new trace in the past 1 min(s)' and so on.
You could try changing the logging level in the diagnostics logs settings. If the level is right and you still cannot see the logs, go to D:\home\LogFiles\SiteExtensions\Kudu in Kudu to check the detailed log file.
I suggest checking the job's run logs, which you can get in the portal as the picture shows. You can also get the log file in Kudu at data/jobs/continuous/jobName.
You can also add trace logging to the WebJob; for details, refer to this article.
If you still have other questions, please let me know.
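To illustrate the kind of trace logging mentioned above, a minimal sketch follows. It is in Python only for brevity and does not use the WebJobs SDK; process_csv and the two-column layout are hypothetical. It assumes that whatever the job writes to standard output ends up in the WebJob's log, so logging each step shows where a message fails before it is moved to the poison queue.

```python
import csv
import io
import logging
import sys

# Write traces to stdout so they land in the job's console log (assumption).
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("webjob")


def process_csv(blob_name: str, csv_text: str, expected_columns: int = 2) -> int:
    """Parse CSV text and return the number of valid rows, logging each step."""
    log.info("Processing blob %s (%d characters)", blob_name, len(csv_text))
    valid = 0
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != expected_columns:
            # Log the exact line before failing, so the cause is visible in the log.
            log.error("%s line %d: expected %d columns, got %d",
                      blob_name, line_no, expected_columns, len(row))
            raise ValueError(f"bad row at line {line_no}")
        valid += 1
    log.info("Finished blob %s: %d rows", blob_name, valid)
    return valid


if __name__ == "__main__":
    process_csv("sample.csv", "x,1\ny,2\n")
```

The same pattern applies in a C# WebJob: log the blob name on entry, log the offending line on failure, and rethrow so the failure is still recorded by the host.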

Convert azure web application to azure website

Within our company we've got a rather large service application running as an Azure cloud service. The service contains a webrole and a workerrole.
The webrole contains an MVC application and the workerrole runs in the background. The workerrole is used to handle several large processes and a bunch of smaller processes 24/7; this is checked every 5 minutes.
I've created an Azure website for this application and wrote a small wrapper class which checks whether configuration values need to be taken from the web.config file or from the cloud configuration files (.cscfg files). I've added the appropriate transformations to transform some extra settings and published the application to the Azure website.
So far everything works well, but what I half expected has indeed happened: the workerrole isn't working anymore and is throwing errors. The first error I saw was:
Could not load file or assembly 'Microsoft.WindowsAzure.ServiceRuntime, Version=2.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.
So of course I took the quick fix: I went to 'Properties > Copy Local' and set it to true. After publishing this to the Azure website I'm getting the following error:
Could not load file or assembly 'msshrtmi, Version=2.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.
I can find out where this error is coming from, but it feels like this is the second of a whole bunch of errors to come. On several sites I've read that Azure websites just don't support workerroles (obviously).
This gives me a few options:
Find a solution so I can connect the azure website to the workerrole still running in the cloudservice. If this works I can drop the webrole and I'm able to connect multiple instances to one workerrole.
Find a solution to convert the workerrole to something (no idea what this possibly could be) supported by the azure website.
Forget the whole idea and stick to the cloudservice setting with the web- and workerrole.
Fragment from workerrole.cs
The facade makes a database call to check any newly added processes.
public override void Run()
{
    // Only process if the web.config says we're allowed to do so.
    while (true)
    {
        var process = Convert.ToBoolean(WebConfigurationManager.GetSetting("Process"));
        try
        {
            if (process)
            {
                var username = WebConfigurationManager.GetSetting("UsernameWorkerRole");
                if (string.IsNullOrEmpty(username))
                {
                    var version = Assembly.Load("Ecare.Productie.WorkerRole").GetName().Version;
                    var versionString = String.Format("{0}.{1}.{2}.{3}", version.Major, version.Minor, version.Build.ToString("000"), version.Revision.ToString("00000"));
                    username = ApplicatieConstanten.WorkerRoleName + " " + versionString;
                }
                IServiceFacade serviceFacade = new ServiceFacade(username);
                serviceFacade.Start();
            }
        }
        catch (Exception ex)
        {
            AuditingLoggingHelper.GetLoggerInstance(ApplicatieConstanten.WorkerRoleName).Error("Exception while starting service", ex);
        }
        Thread.Sleep(10000);
    }
    // ReSharper disable once FunctionNeverReturns
}
The main reason we're doing this is that we have a VS solution with an MVC application (the web role) and the workerrole. We're currently publishing this to a cloud service in Azure. Because of our development process we're running separate test, acceptance and production environments. Since it's a heavy process we're running quite expensive machines in Azure, but that's mostly only needed for the workerrole. The web part is lightweight. So it's mainly an attempt to reduce costs. The idea is to convert the webrole to an Azure website (this part is working already, with just a small modification to read information from the web.config instead of the cloud configuration). But the workerrole currently isn't working because we haven't changed anything for that yet. A colleague of mine basically said "write a wrapper for the config part, publish the Azure website to 1 or more test environments and point them to the same workerrole". But I have my doubts whether this is even possible.
Has anyone else ever run into this sort of situation and found a solution for it? Any help finding a solution is greatly appreciated!
Find a solution so I can connect the azure website to the workerrole
still running in the cloudservice. If this works I can drop the
webrole and I'm able to connect multiple instances to one workerrole.
I'm guessing that you're using some kind of queue mechanism (Azure Storage Queues or Service Bus Queues) to facilitate communication between Web and Worker Role. If that's the case, then you can continue to use the same. Your website will push messages in a queue and your worker role will poll this queue and fetch messages and work on those.
Find a solution to convert the workerrole to something (no idea what
this possibly could be) supported by the azure website.
Do take a look at Azure WebJobs. In the Web Apps world, they are the counterpart of Worker Roles.
UPDATE
Based on the comments, I think you should be able to port your code to run as Web Jobs. There are two ways by which you can do it:
If you create a Continuous Web Job, then you would have to put this 10 second sleep logic in your code itself. The job will continuously be running but will only wake up every 10 seconds. Similar to your current Worker Role implementation.
You could very well take out this 10 seconds sleep logic from your code by making your Web Job as a Scheduled Web Job where you schedule to run this every 10 seconds. I would recommend going down this route as you have decoupled your scheduling logic (10 second sleep) from your application. So tomorrow if you were to increase the sleep time, you would simply change the schedule in the portal without redeploying your code.
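For the continuous option, the loop might look roughly like the sketch below. It is written in Python purely for illustration (the worker role in the question is C#), and do_work, run_continuous, and max_iterations are hypothetical names. It assumes the WebJobs host signals shutdown by creating the file named in the WEBJOBS_SHUTDOWN_FILE environment variable, which is how Kudu describes graceful shutdown; verify this against your environment.

```python
import os
import time


def run_continuous(do_work, interval_seconds=10.0, shutdown_file=None, max_iterations=None):
    """Call do_work repeatedly, sleeping between runs, until asked to stop.

    shutdown_file: path whose existence means "stop"; defaults to the
    WEBJOBS_SHUTDOWN_FILE environment variable (assumed to be set by the host).
    max_iterations: optional cap, handy for tests (hypothetical parameter).
    Returns the number of completed iterations.
    """
    if shutdown_file is None:
        shutdown_file = os.environ.get("WEBJOBS_SHUTDOWN_FILE")
    done = 0
    while max_iterations is None or done < max_iterations:
        if shutdown_file and os.path.exists(shutdown_file):
            print("Shutdown requested, exiting loop")
            break
        try:
            do_work()
        except Exception as exc:  # keep the job alive, but leave a trace in the log
            print(f"Exception while processing: {exc!r}")
        done += 1
        time.sleep(interval_seconds)
    return done


if __name__ == "__main__":
    # Short interval for the demo; the worker role in the question sleeps 10 seconds.
    run_continuous(lambda: print("checking for new processes"), interval_seconds=0.1, max_iterations=3)
```

Logging a line when the shutdown signal is seen also helps with the "restarted for no apparent reason" situation: the job log then shows whether the host asked the job to stop or the process simply died.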
As Gaurav pointed out, the equivalent of worker roles in the App Service space is Azure WebJobs.
Regarding this problem:
So far everything works well, but what I half expected has indeed happened: the workerrole isn't working anymore and is throwing errors. The first error I saw was:
Could not load file or assembly 'Microsoft.WindowsAzure.ServiceRuntime, Version=2.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.
So of course I took the quick fix: I went to 'Properties > Copy Local' and set it to true. After publishing this to the Azure website I'm getting the following error:
Could not load file or assembly 'msshrtmi, Version=2.5.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.
Microsoft.WindowsAzure.ServiceRuntime is specific to Cloud Services and will not work in Web Apps (that's why you get the msshrtmi error with the web app). If you are still running a worker role, that assembly is in the instance's GAC, and should be in your local machine's as well. That said, Microsoft.WindowsAzure.ServiceRuntime can be referenced in the worker role project, but not in the web app project.
I'm guessing you are using ServiceRuntime to get some configuration setting value using:
var value = RoleEnvironment.GetConfigurationSettingValue(settingName);
You can change it to:
var value = CloudConfigurationManager.GetSetting(settingName);
as this method reads the configuration setting value from the appropriate configuration store (from MSDN).
The best solution here is to convert the Worker Role to a WebJob, as @Gaurav mentioned above.
If you want to connect the Web App to the Worker Role, the way to do it would be to use an Azure Queue or other intermediary storage where operations can be dropped from the Web App and picked up by the Worker Role.

Azure GetMachineGoalState dying from HTTP 410 error

My Azure web role is constantly recycling.
The WaAppAgent.log file on my Azure web role contains a whole stream of these errors:
[00000008] [05/15/2012 00:10:20.90] GetMachineGoalState() failed with exception: Microsoft.ServiceModel.Web.WebProtocolException: Server Error: Gone (Gone) ---> System.Net.WebException: The remote server returned an error: (410) Gone.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
--- End of inner exception stack trace ---
at Microsoft.ServiceModel.Web.WebHttpChannelProxy`1.Invoke(IMessage msg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Microsoft.WindowsAzure.RoleContainer.Protocol.IControlSystem.GetMachineGoalState()
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.ControlSystem.GetGoalState().
[00000008] [05/15/2012 00:10:20.90] Caught exception in pre-initialization heartbeat thread, will continue heartbeats: System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.ContainerStateManager.InitialHeartbeatThread(Object waitEvent)
[00000005] [05/15/2012 00:10:23.24] Agent runtime initialized.
The Azure service status dashboard says everything is green on their side, but it always says that unless the entire platform is down in a smoldering heap so I don't really trust it at all.
Is this my problem, or theirs?
I take it from your statement that you are using a Full IIS based Windows Azure Web Role. I have seen this error with a VM Role, but with a Web Role it seems very strange. I wouldn't put much weight on your logs at this point either, because they may lead you in the wrong direction.
If your Web Role status shows the green (Ready) state but your site is not available, then the issue could be related to the application itself, because the green or Ready state means the role host process (in this case WaIISHost.exe) is healthy. It would be a rare occurrence for the role status to be green while the role host process is unhealthy. Based on what you have provided, there may instead be an issue within the VM startup sequence, which would mean your role has not even started; in that case, however, the portal should not show the Ready state.
Please log into your Azure VM using RDP and:
First check whether the WaIISHost.exe process is running. Keep an eye on this process for 2 minutes to see whether it is crashing and restarting.
Check the application event log; it should show some pattern of exceptions that points to the root cause.
Check the recent Azurebootstrapper and iisconfigurator logs in the C:\logs folder for any specific issue during VM boot and IIS start, respectively.
Finally, back up the logs (C:\logs and C:\Resources) and try rebooting your instance. If you still have the problem, contact the Windows Azure support team:
https://www.windowsazure.com/en-us/support/contact/
