Azure VMs restart unexpectedly

This problem is related to a worker-role-hosted VM. I have a simple worker role that spawns a process inside it. The spawned process is a 32-bit compiled TCP server application. The worker role has an endpoint defined, and the TCP server is bound to that endpoint. So when I connect to my worker role's endpoint and send something, the TCP server receives it, processes it, and returns something back. In other words, the worker role endpoint that is exposed to the outside world internally connects to the TCP server.
string port = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["TCPsocket"].IPEndpoint.Port.ToString();
var myProcess = new Process()
{
    StartInfo = new ProcessStartInfo(Path.Combine(localstorage.RootPath, "TCPServer.exe"))
    {
        CreateNoWindow = true,
        UseShellExecute = true,
        WorkingDirectory = localstorage.RootPath,
        Arguments = port
    }
};
It was working fine, but suddenly the server stopped responding. When I checked in the portal, the VM role was restarting automatically but never succeeded; it kept showing the "Role Initializing..." status. A manual stop and start also didn't work. I redeployed the same package without any change in the code, and this time the deployment itself failed:
Warning: All role instances have stopped
- There was no endpoint listening at https://management.core.windows.net/<SubscriptionID>/services/hostedservices/TCPServer/deploymentslots/Production that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
But when I tried to deploy again after some time, it worked fine.
Can anyone tell me what the problem might be?
Update:
public override void Run()
{
    Trace.WriteLine("RasterWorker entry point called", "Information");
    string configVal = RoleEnvironment.GetConfigurationSettingValue("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString");
    CloudStorageAccount _storageAccount = CloudStorageAccount.Parse(configVal); // parses the storage credentials and creates the storage account
    var localstorage = RoleEnvironment.GetLocalResource("MyLocalStorage");
    CloudBlobClient _blobClient = _storageAccount.CreateCloudBlobClient();
    bool flag = false;
    while (true)
    {
        Thread.Sleep(30000);
        if (!flag)
        {
            if (File.Exists(Path.Combine(localstorage.RootPath, "test.ppm")))
            {
                CloudBlobContainer _blobContainer = _blobClient.GetContainerReference("reports");
                CloudBlob _blob = _blobContainer.GetBlobReference("test.ppm");
                _blob.UploadFile(Path.Combine(localstorage.RootPath, "test.ppm"));
                Trace.WriteLine("Copy to blob done!!!!!!!", "Information");
                flag = true;
            }
            else
            {
                Trace.WriteLine("Copy Failed-> File doesn't exist!!!!!!!", "Information");
            }
        }
        Trace.WriteLine("Working", "Information");
    }
}

To prevent your worker role from being restarted, you need to block in the Run method of your entry point class.
If you do override the Run method, your code should block indefinitely. If the Run method returns, the role is automatically recycled by raising the Stopping event and calling the OnStop method so that your shutdown sequences may be executed before the role is taken offline.
http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.windowsazure.serviceruntime.roleentrypoint.run.aspx
You need to make sure that, whatever happens, you never return from the Run method if you want to keep the role alive.
Now, if you're hosting the TCP server in a console application (I'm assuming you are, since you pasted the Process.Start code), you'll need to block in the Run method after starting the process.
public override void Run()
{
    try
    {
        Trace.WriteLine("WorkerRole entrypoint called", "Information");
        var myProcess = new Process()
        {
            StartInfo = new ProcessStartInfo(Path.Combine(localstorage.RootPath, "TCPServer.exe"))
            {
                CreateNoWindow = true,
                UseShellExecute = true,
                WorkingDirectory = localstorage.RootPath,
                Arguments = port
            }
        };
        myProcess.Start();
        // Block indefinitely so the role is not recycled.
        while (true)
        {
            Thread.Sleep(10000);
            Trace.WriteLine("Working", "Information");
        }
    }
    catch (Exception e)
    {
        Trace.WriteLine("Exception during Run: " + e.ToString());
        // Take other action as needed.
    }
}
PS: This has nothing to do with your deployment issue; I assume that was a coincidence.

Related

Azure Web App Windows Container terminated before graceful shutdown

I've created a .NET Framework console program which starts and runs some code; upon exit it should log out of any external services before exiting (graceful shutdown).
Here is a sample program:
using System;
using System.Runtime.InteropServices;
using System.Threading;

namespace delayed_shutdown
{
    class Program
    {
        public enum CtrlTypes
        {
            CTRL_C_EVENT = 0,
            CTRL_BREAK_EVENT,
            CTRL_CLOSE_EVENT,
            CTRL_LOGOFF_EVENT = 5,
            CTRL_SHUTDOWN_EVENT
        }

        [DllImport("Kernel32")]
        public static extern bool SetConsoleCtrlHandler(HandlerRoutine handler, bool Add);

        public delegate bool HandlerRoutine(CtrlTypes CtrlType);

        public static volatile HandlerRoutine handlerRoutine = new HandlerRoutine(ConsoleCtrlCheck);
        public static volatile ManualResetEvent exitEvent = new ManualResetEvent(false);

        public static bool ConsoleCtrlCheck(CtrlTypes ctrlType)
        {
            switch (ctrlType)
            {
                case CtrlTypes.CTRL_C_EVENT:
                    Console.WriteLine("CTRL_C received");
                    exitEvent.Set();
                    return true;
                case CtrlTypes.CTRL_CLOSE_EVENT:
                    Console.WriteLine("CTRL_CLOSE received");
                    exitEvent.Set();
                    return true;
                case CtrlTypes.CTRL_BREAK_EVENT:
                    Console.WriteLine("CTRL_BREAK received");
                    exitEvent.Set();
                    return true;
                case CtrlTypes.CTRL_LOGOFF_EVENT:
                    Console.WriteLine("CTRL_LOGOFF received");
                    exitEvent.Set();
                    return true;
                case CtrlTypes.CTRL_SHUTDOWN_EVENT:
                    Console.WriteLine("CTRL_SHUTDOWN received");
                    exitEvent.Set();
                    return true;
                default:
                    return false;
            }
        }

        static int Main(string[] args)
        {
            if (!SetConsoleCtrlHandler(handlerRoutine, true))
            {
                Console.WriteLine("Error setting up control handler... :(");
                return -1;
            }
            Console.WriteLine("Waiting for control event...");
            exitEvent.WaitOne();
            var i = 60;
            Console.WriteLine($"Exiting in {i} seconds...");
            while (i > 0)
            {
                Console.WriteLine($"{i}");
                Thread.Sleep(TimeSpan.FromSeconds(1));
                i--;
            }
            Console.WriteLine("Goodbye");
            return 0;
        }
    }
}
I would have expected Windows Containers running on Azure App Service to trigger a "docker stop"-like operation, which would send SIGTERM to my application.
But what happens is that the Azure Web App Windows container is terminated after only 1 second of trying to stop the container. How do I ask Azure Web App to wait X number of seconds before terminating the Windows container?
We are currently working on signaling the process upon stop for Windows Containers on Azure App Service.
In Azure App Service we will default to waiting 5 seconds for a container to exit upon shutdown, but this will be configurable through the app setting WEBSITES_CONTAINER_STOP_TIME_LIMIT, with waits of up to 2 minutes allowed (WEBSITES_CONTAINER_STOP_TIME_LIMIT=00:02:00).
This capability will be deployed in the next update rollout, and we hope it will be available worldwide early next year. Once it is out we will update our docs too, so please stay tuned.
Intercepting the SIGTERM event is something that isn't currently supported. Since App Service is tailored to HTTP workloads, I am curious as to the reasoning for having a console app pick up such an event. If you can elaborate further, there may be an alternative, such as running your console app as a WebJob instead.
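For illustration, here is a minimal sketch of that WebJob alternative, assuming the Microsoft.Azure.WebJobs package: its WebJobsShutdownWatcher exposes a cancellation token that is signaled when the host requests shutdown, which gives the console code a window to log out of external services before the process is killed.

using System;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        using (var watcher = new WebJobsShutdownWatcher())
        {
            Console.WriteLine("Running; waiting for shutdown signal...");
            // Blocks until the WebJob host requests shutdown.
            watcher.Token.WaitHandle.WaitOne();
            // Graceful cleanup goes here (e.g. logging out of external services),
            // within the grace period the host allows.
            Console.WriteLine("Shutdown requested; logging out...");
        }
    }
}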

Azure web jobs - parallel message processing from queues not working properly

I need to provision SharePoint Online team rooms using Azure queues and WebJobs.
I have created a console application and published it as a continuous WebJob with the following settings:
var config = new JobHostConfiguration();
config.Queues.BatchSize = 1;
config.Queues.MaxDequeueCount = 4;
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);
JobHost host = new JobHost();
host.RunAndBlock();
The trigger function looks like this:
public static void TriggerFunction([QueueTrigger("messagequeue")] CloudQueueMessage message)
{
    ProcessQueueMsg(message.AsString);
}
Inside the ProcessQueueMsg function I deserialize the received JSON message into a class and run the following operations:
I create a sub site in an existing site collection;
Using the PnP provisioning engine, I provision content in the sub site (lists, file uploads, permissions, quick launch, etc.).
If there is only one message in the queue to process, everything works correctly.
However, when I send two messages to the queue a few seconds apart, the second message overwrites the class properties while the first is still being processed.
I tried running each message in a separate thread, but the trigger functions are marked as succeeded before the message has finished processing inside my function. This way I have no control over potential exceptions / message dequeueing.
I also tried limiting the number of threads to 1 and using a semaphore, but saw the same behavior:
private const int NrOfThreads = 1;
private static readonly SemaphoreSlim semaphore_ = new SemaphoreSlim(NrOfThreads, NrOfThreads);

// Inside TriggerFunction
try
{
    semaphore_.Wait();
    new Thread(ThreadProc).Start();
}
catch (Exception e)
{
    Console.Error.WriteLine(e);
}

public static void ThreadProc()
{
    try
    {
        DoWork();
    }
    catch (Exception e)
    {
        Console.Error.WriteLine(">>> Error: {0}", e);
    }
    finally
    {
        // Release a slot for another thread.
        semaphore_.Release();
    }
}

public static void DoWork()
{
    Console.WriteLine("This is a web job invocation: Process Id: {0}, Thread Id: {1}.", System.Diagnostics.Process.GetCurrentProcess().Id, Thread.CurrentThread.ManagedThreadId);
    ProcessQueueMsg();
    Console.WriteLine(">> Thread Done. Processing next message.");
}
Is there a way to run my processing function for parallel messages so the sites are provisioned without interfering with each other?
Please let me know if you need more details.
Thank you in advance!
You're not passing the config object to your JobHost on construction; that's why your config settings aren't taking effect. Change your code to:
JobHost host = new JobHost(config);
host.RunAndBlock();
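For completeness, a minimal sketch of the corrected setup (assuming the JobHostConfiguration API from the question's snippet):

var config = new JobHostConfiguration();
config.Queues.BatchSize = 1; // process one message at a time
config.Queues.MaxDequeueCount = 4;
config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);

// The queue settings only take effect when the config object is passed in here.
JobHost host = new JobHost(config);
host.RunAndBlock();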

How to integration test Azure Web Jobs?

I have an ASP.NET Web API application with a supporting Azure WebJob whose functions are triggered by messages added to a storage queue by the API's controllers. Testing the Web API is simple enough using OWIN, but how do I test the WebJob?
Do I run a console app in memory in the test runner? Execute the function directly (that wouldn't be a proper integration test, though)? It is a continuous job, so the app doesn't exit. To make matters worse, Azure WebJob functions are void, so there's no output to assert.
There is no need to run a console app in memory. You can run the JobHost in the memory of your integration test:
var host = new JobHost();
You can use host.Call() or host.RunAndBlock(). You will need to point at a real Azure storage account, as WebJobs are not supported against localhost.
It depends on what your function is doing, but you could manually add a message to a queue, add a blob, or whatever triggers it. You can then assert by querying the storage where your WebJob wrote its result.
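As a rough sketch of the host.Call() route (the Functions class and ProcessQueueMessage method here are hypothetical placeholders for your own job):

var host = new JobHost();
// Invoke one function synchronously; parameters are bound by name.
host.Call(typeof(Functions).GetMethod("ProcessQueueMessage"),
    new { message = "integration test message" });
// Then query the storage account to assert on the function's side effects.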
While @boris-lipschitz is correct, when your job is continuous (as the OP says it is), you can't do anything after calling host.RunAndBlock().
However, if you run the host in a separate thread, you can continue with the test as desired, although you then have to do some kind of polling at the end of the test to know when the job has run.
Example
Function to be tested (a simple copy from one blob to another, triggered by a created blob):
public static void CopyBlob(
    [BlobTrigger("input/{name}")] TextReader input,
    [Blob("output/{name}")] out string output)
{
    output = input.ReadToEnd();
}
Test function:
[Test]
public void CopyBlobTest()
{
    var blobClient = GetBlobClient("UseDevelopmentStorage=true;");

    // Start the host in a separate thread.
    var thread = new Thread(() =>
    {
        Thread.CurrentThread.IsBackground = true;
        var host = new JobHost();
        host.RunAndBlock();
    });
    thread.Start();

    // Trigger the job by writing some content to a blob.
    using (var stream = new MemoryStream())
    using (var stringWriter = new StreamWriter(stream))
    {
        stringWriter.Write("TestContent");
        stringWriter.Flush();
        stream.Seek(0, SeekOrigin.Begin);
        blobClient.UploadStream("input", "blobName", stream);
    }

    // Check every second, for up to 20 seconds, whether the blob has been
    // created in the output container; assert its content once it has.
    var maxTries = 20;
    while (maxTries-- > 0)
    {
        if (!blobClient.Exists("output", "blobName"))
        {
            Thread.Sleep(1000);
            continue;
        }
        using (var stream = blobClient.OpenRead("output", "blobName"))
        using (var streamReader = new StreamReader(stream))
        {
            Assert.AreEqual("TestContent", streamReader.ReadToEnd());
        }
        break;
    }
}
I've been able to simulate this easily by doing the following, and it seems to work fine for me:
private JobHost _webJob;

[OneTimeSetUp]
public void StartupFixture()
{
    _webJob = Program.GetHost();
    _webJob.Start();
}

[OneTimeTearDown]
public void TearDownFixture()
{
    _webJob?.Stop();
}
Where the WebJob code looks like:
public class Program
{
    public static void Main()
    {
        var host = GetHost();
        host.RunAndBlock();
    }

    public static JobHost GetHost()
    {
        ...
    }
}

Update storage tables when webjob is shutting down

My question is similar to the one below:
Notification of when continuous Azure WebJob is stopping for NoAutomaticTrigger type jobs
I used the idea from Amit's blog but then hit a little roadblock.
I have a file watcher set up in the WebJob which gets triggered if the WebJob is shut down from the portal.
I need to update a few flags in my storage tables before the WebJob is terminated.
The problem is that my code seems to stop at the point where I try to retrieve a record from the storage table. I have an exception handler around the code below, and no exception message is written to the console.
Below is my code
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("my storage key");
var tableClient = storageAccount.CreateCloudTableClient();
var table = tableClient.GetTableReference("myTable");

TableOperation operation = TableOperation.Retrieve("partKey", "rowKey");
var result = table.Execute(operation); // gets stuck here

if (result.Result != null)
{
    MyEntity entity = (MyEntity)result.Result;
    if (entity != null)
    {
        entity.IsRunning = false; // reset the flag
        TableOperation update = TableOperation.InsertOrReplace(entity);
        table.Execute(update); // update the record
    }
}
I have increased the stopping_wait_time in settings.job to 300 seconds but still no luck.
You could use Microsoft.Azure.WebJobs.WebJobsShutdownWatcher; it is an implementation of Amit's solution (WebJobs Graceful Shutdown).
I found a solution that works as follows. No modification is needed in Program.cs:
class Program
{
    static void Main()
    {
        var host = new JobHost();
        host.Call(typeof(Startup).GetMethod("Start"));
        host.RunAndBlock();
    }
}
The graceful shutdown logic goes in your function:
public class Startup
{
    [NoAutomaticTrigger]
    public static void Start(TextWriter log)
    {
        var token = new Microsoft.Azure.WebJobs.WebJobsShutdownWatcher().Token;

        // Run until shutdown is requested.
        while (!token.IsCancellationRequested)
        {
            // Do something
        }

        // This code will be executed once the WebJob is about to shut down.
        Console.Out.WriteLine("WebJob is shutting down");
    }
}
After the while loop, you could also stop any tasks you started.
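For example, a sketch of that idea (DoWork here is a hypothetical stand-in for your own long-running work, using System.Threading.Tasks):

var token = new Microsoft.Azure.WebJobs.WebJobsShutdownWatcher().Token;

// DoWork is hypothetical; it should watch the token and return when cancellation is requested.
var worker = Task.Run(() => DoWork(token));

while (!token.IsCancellationRequested)
{
    // Do something
}

// Give the background task a short grace period to finish before exiting.
worker.Wait(TimeSpan.FromSeconds(10));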

WindowsEventLogs not logged on Azure

I have an Azure WebRole with the following code:
public override bool OnStart()
{
    setDiagnostics();
    TestClass test = new TestClass();
    return base.OnStart();
}

private void setDiagnostics()
{
    string wadConnectionString = "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString";
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue(wadConnectionString));
    DeploymentDiagnosticManager deploymentDiagnosticManager = new DeploymentDiagnosticManager(cloudStorageAccount, RoleEnvironment.DeploymentId);
    RoleInstanceDiagnosticManager roleInstanceDiagnosticManager = cloudStorageAccount.CreateRoleInstanceDiagnosticManager(
        RoleEnvironment.DeploymentId,
        RoleEnvironment.CurrentRoleInstance.Role.Name,
        RoleEnvironment.CurrentRoleInstance.Id);

    DiagnosticMonitorConfiguration diagConfig = roleInstanceDiagnosticManager.GetCurrentConfiguration();
    if (diagConfig == null)
        diagConfig = DiagnosticMonitor.GetDefaultInitialConfiguration();

    diagConfig.WindowsEventLog.DataSources.Add("Application!*");
    diagConfig.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1D);
    roleInstanceDiagnosticManager.SetCurrentConfiguration(diagConfig);
    DiagnosticMonitor.Start(wadConnectionString, diagConfig);
}
In the constructor of my TestClass is the following code:
EventLog.WriteEntry("TestClass", "Before the try", EventLogEntryType.Information);
try
{
EventLog.WriteEntry("TestClass", "In the try", EventLogEntryType.Information);
Int32.Parse("abc");
}
catch (Exception ex)
{
EventLog.WriteEntry("TestClass", ex.Message, EventLogEntryType.Error);
}
For some reason this code works well if I run it in debug mode with a breakpoint on the OnStart method and step through the code with F11. However, I do not see any event log entries in my WADWindowsEventLogsTable if I remove all breakpoints and just run it. So this seems like a timing issue to me... Does anyone know why my code behaves this way?
Thanks in advance!
The problem was the EventLog.WriteEntry() method. I used "TestClass" as the EventLog source, but I never created this source with a startup task, and due to insufficient privileges it failed to log my entries.
So the solution: create your own source with a startup task, or use trace messages.
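If you go the startup-task route, the source only needs to be registered once by elevated code. A minimal sketch, reusing the "TestClass" source name from the question (registration requires administrative privileges, so it must run from an elevated startup task or an elevated OnStart):

// Runs under elevated privileges, e.g. in a startup task.
if (!EventLog.SourceExists("TestClass"))
{
    // Registers the "TestClass" source against the Application log.
    EventLog.CreateEventSource("TestClass", "Application");
}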
Try moving your TestClass creation from RoleEntryPoint.OnStart() to RoleEntryPoint.Run().
public override void Run()
{
    TestClass test = new TestClass();
    base.Run();
}
When it is deployed to Azure, does the role actually start up, or does it just cycle between initializing and busy?
If it does start, the problem is one of timing: the items are transferred from the instance to Azure Storage on a timed basis:
diagConfig.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1D);
If the instance restarts before that timer fires, nothing gets transferred to your table.
You'll need to make sure that you keep the application running for more than one minute.
You'll also need to verify that the diagnostics setting discussed in this post is turned off, as it might cause the logs to end up in a different place than you would expect: http://michaelcollier.wordpress.com/2012/04/02/where-is-my-windows-azure-diagnostics-data/
