Azure Role Failure Notifications - azure

I have an Azure Cloud Service Load Balanced over 2 Roles with a recent addition of a custom LoadBalancerProbe.
The issue I am having is that my site has a threaded task that was throwing unhandled exceptions, and five of these unhandled exceptions were enough to cause the role to report itself as unhealthy. I have fixed this by handling the exceptions; however, we have a large team and I am worried that it may happen again as a result of a future release.
I would like to set up an Azure notification (preferably email) to let me know when one of my roles reports itself unhealthy. I have looked around on the web but cannot find any help. Has someone done this before? It's more of a precaution, but I am keen to get this implemented.
Thanks in advance for any help you can give me

There isn't an alert baked into the platform for this that I know of, but you can certainly do some coding/scripting to get this done. You can use any of the SDKs, Microsoft Azure Management Libraries or command line tools that let you get at the Azure Management API, but for my example here I'll use PowerShell. I don't have a script to do this completely, but the main part is getting the instance statuses, such as:
$NonReadyInstances = (Get-AzureDeployment $serviceName -Slot Production -verbose:$false).RoleInstanceList | Where-Object { $_.InstanceStatus -ne "ReadyRole" }
This gets all the instances that aren't in a ready state; however, you'd probably want to filter out several of the other valid states that instances pass through as your system scales, etc. There is a list of valid instance states you can use to decide which ones you want to be notified about. For example, in your case I'd think CyclingRole or RestartingRole would be ones to watch for.
Check the count of how many items are returned in the $NonReadyInstances variable and if it is greater than zero send your email. Put that in a script and run it regularly.
If you aren't familiar with the Azure PowerShell cmdlets, here's a link on getting started and set up: http://msdn.microsoft.com/en-us/library/jj156055.aspx
Also, here is a link to the Azure Automation preview service, if you want to run this without using your own machines: http://azure.microsoft.com/en-us/services/automation/
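The check described above (keep only the statuses you care about, then send mail if anything is left) could be sketched roughly as follows. This is Python rather than PowerShell, and it only covers the decision logic: the sample instance list, the service name and the build_alert helper are hypothetical, and actually fetching the statuses and sending the email are left to whatever tooling you use.

```python
# Statuses treated as alert-worthy. CyclingRole and RestartingRole are the ones
# suggested in the answer; adjust the set to suit your service.
WATCHED_STATUSES = {"CyclingRole", "RestartingRole"}


def instances_to_report(instances, watched=WATCHED_STATUSES):
    """Return the instances whose status is in the watched set."""
    return [i for i in instances if i.get("InstanceStatus") in watched]


def build_alert(service_name, instances):
    """Return (subject, body) for an alert email, or None if nothing to report."""
    flagged = instances_to_report(instances)
    if not flagged:
        return None
    subject = f"[{service_name}] {len(flagged)} role instance(s) unhealthy"
    body = "\n".join(f"{i['InstanceName']}: {i['InstanceStatus']}" for i in flagged)
    return subject, body


# Hypothetical sample data standing in for the deployment's role instance list.
sample = [
    {"InstanceName": "Website_IN_0", "InstanceStatus": "ReadyRole"},
    {"InstanceName": "Website_IN_1", "InstanceStatus": "CyclingRole"},
]
alert = build_alert("myservice", sample)
```

If build_alert returns None there is nothing to send; otherwise the subject and body can be handed to your mail step. Running the check on a schedule is what turns it into a notification.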

Related

About container options of Azure Batch

I am having trouble with the container options of Azure Batch.
To change the hostname of the container being started, I set --hostname="test" in the containerRunOptions of the Task.
However, this results in an error:
ContainerSettings: --hostname="test" Message: create_container() got an unexpected keyword argument 'hostname '
Even -h test will result in a similar error.
Other options (--volume, etc.) work fine.
Pool information:
Publisher:microsoft-azure-batch
OS:centos-container
sku:7-4
image:centos:latest(docker hub)
Is this a bug in Azure Batch?
Or am I specifying the option incorrectly?
Updated answer (2018-08-23):
The fix for this issue has been rolled out.
Previous answer:
This was identified as a service defect and will be addressed in a future version. You can track the Azure Batch Node Agent release notes for when the fix is released.
If you are using Batch to execute tasks without performing deeper integration, e.g., you are using the Azure CLI or similar tooling, you can use Batch Shipyard in "non-native mode" to work around this problem in the meantime. (Disclaimer: I'm a contributor for this code).

Terminate resource group automatically based on a time window

I'm not sure if this is off topic for SO, but I really need help here. In my project we run a load test on a weekly basis, and we use ARM and the Azure CLI to make it a fully automated test framework, from spinning up the VMs to generating the report.
After the test, though, we currently terminate the resource group manually. We have a few thoughts on making this automatic, e.g. by running a cron job. So I'm curious whether there is a better approach to gracefully terminating/destroying (not stopping) the resource group automatically with the Azure CLI, based on a time window.
No, there is no such way, but if everything is automated, you can run az group delete xxx at the end of your script/automation routine.
On top of that, take a look at Event Grid. It's a new service that can trigger actions in response to events.
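For the cron-job route mentioned in the question, a minimal sketch of the time-window check might look like the following (in Python). It assumes your automation records when the resource group was created, for example in a tag or a file; the group name, timestamps and four-hour window are hypothetical. The command it builds is az group delete with --yes (skip the confirmation prompt) and --no-wait (return without waiting for the deletion to finish).

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, max_age, now=None):
    """Return True once the resource group has existed for at least max_age."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= max_age


def delete_command(group_name):
    """Build the Azure CLI call that deletes the group without prompting."""
    return ["az", "group", "delete", "--name", group_name, "--yes", "--no-wait"]


# Hypothetical values: the creation time would come from wherever your
# automation records it.
created = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)

cmd = None
if is_expired(created, timedelta(hours=4), now=check_time):
    cmd = delete_command("loadtest-rg")
```

When cmd is not None, a scheduled job could pass it to subprocess.run(cmd, check=True). Keeping the expiry check separate from the CLI call makes the window easy to test without touching Azure.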

Alternatives for Application Insights

I have an existing on-prem/cloud environment in which I am running my enterprise application, and I would like to implement Application Insights to capture telemetry. However, I have a few issues with it. Are there any alternatives to using Application Insights? I have two concerns here:
1) It might not be possible to install software in the production environment.
2) Restarting the IIS server would pull all the sites down for at least a minute or two.
It would be great if someone could suggest alternative ways of leveraging Application Insights. Thanks in advance :)
There are 2 ways to use Application Insights:
1) Using the SDK, where you add the SDK to your service. At some point you have to deploy the service, so when you deploy, you'd also deploy Application Insights into that service.
2) Using Status Monitor, which does require restarting IIS. Using Status Monitor isn't required, but it does let you collect extra, detailed information that you wouldn't get from the SDK alone.
A lot of people end up doing both, (1) so they can do custom collection of events, traces, etc, and (2) to get detailed dependency calls
But like AlexB suggested, setting up something where you can swap between slots is one of the best ways to set things up, if possible, so you can just swap between the slots without having any downtime at all.

Role Instances are taking longer than expected - Workaround issues

Whenever we get the error "Role Instances are taking longer than expected", the only options seem to be:
Shutdown the emulators and try again.
Restart the machine and see if that helps.
Uninstall the Azure Tools for that version.
Sometimes uninstalling takes a long time, sometimes even days. It appears that some process or service is blocking it. Has anyone faced this before? If so, does anyone know which process would be blocking it?
When an instance starts it will run the OnStart method on the worker/web role (depending on your service type). The more stuff you have in there, the more time it will take to start up the role. Common caveats are the Cache as mentioned and blob/table storage (if you do read/write/create when you start the role).
Try minimizing the OnStart workload and moving any storage work into async tasks.
I have had similar problems in the past:
IISConfigurator could not map the web roles in IIS. In my case it was due to corrupted file system ACLs on the code directory. See logs under C:\Users\YOUR_USER_NAME\AppData\Local\dftmp\IISConfiguratorLogs\
Another cause might be that something else has tied up the port numbers that Azure is trying to bind your web role to, or that the ports the local storage needs for tables/blobs and queues (10000-10002) have been taken by another app. Open a command prompt and run netstat -anb to check.
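As a quick complement to netstat, here is a small Python sketch that tries to bind each of the local storage ports and reports the ones that are already taken. It assumes the 10000-10002 range mentioned above and the loopback address; it only tells you that a port is busy, not which process holds it (netstat -anb does that).

```python
import socket

# Ports the local storage emulator needs, per the answer above.
EMULATOR_PORTS = (10000, 10001, 10002)


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding failed, most likely because another process holds the port.
            return False
    return True


def busy_ports(ports=EMULATOR_PORTS):
    """Return the subset of ports that could not be bound."""
    return [p for p in ports if not port_is_free(p)]


if __name__ == "__main__":
    print("Ports already in use:", busy_ports())
```

An empty list means the storage ports are free, so the cause is more likely elsewhere (for example, the ports of the web role itself).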
Try running Visual Studio using the "Run as Administrator" option.

Is there a force flag in windows azure to change the role size?

I tried to change the role size (upwards) in an Azure role and got the following error after uploading
"The role size specified for role 'Website' in the newly uploaded package differs from the role size for this role in the currently deployed service. Changing the size of the role will cause all local data on the role instance to be lost. Please use the Force flag if you want to allow the loss of local data."
which leads to the question - is there a force flag? Where is it? How do I set it?
It's just appeared!
An update to the Management Portal today (20 Oct 2011) has added an "Allow VM size or role count to be updated" tick box to the Upgrade Deployment dialog. So I guess that's the shiny new Force Flag!
The information I've heard on the Azure forums is that it is not possible to modify the role size without a full redeploy (and this has been my experience as well).
Over time Microsoft has allowed more things to be modified during an upgrade and Mark Russinovich may have suggested in a Mix or TechEd presentation that role size modification would be supported some point in the future.
The error message you are receiving may be an early artefact of the implementation of a size upgrade enhancement. The research I've done (and I have done a LOT of asking around) would indicate that the "Force Flag" mentioned is not actually implemented yet - although I'm more than happy to be proven wrong.
Right now, the vmsize is inside of the package (cspkg) which is sent to AppFabric. It's a decision which is part of the build/packaging. So, to change it, you need to change the VMSize, build the package and then [re]deploy.
If you have a running system, you can avoid downtime by using VIP Swap. It will basically stand up another AT[s] in the staging slot with your new VM size and then swap over to it.
So, there are more moving parts involved, but you can still change it with virtually no impact.

Resources