What happens when the disc holding logs on Azure is full? - azure

Our website is currently deployed to Azure and we are writing trace logs using Azure Diagnostics. We then ship the logs to blob storage periodically and read them using Cerebrata’s Windows Diagnostics Manager software. I would like to know what happens when the disc holding the logs on Azure is full, i.e. before the logs are shipped. When do the logs get purged, and is it any different if the logs are not shipped? My concern is that the site may somehow fall over when exceptions are raised (if at all) while trying to write to a full disc.
Many Thanks

If you are using Windows Azure Diagnostics, then it will age out the logs on disk (deleting the oldest files first). You have a quota that is specified in your wad-control-container in blob storage on a per-instance basis. By default, this is 4GB (you can change it). All of your traces, counters, and event logs need to fit in this 4GB of disk space. You can also set separate quotas per data source here if you like. The Diagnostics Manager takes care of managing the data sources and the quota.
Now, there was a bug in older versions of the SDK where the disk could get full and diagnostics stopped working. You can tell whether you might be impacted by this bug by RDP'ing into an instance and trying to navigate to the C:\Resources\Directory\\Monitor directory. If you are denied access, then you are likely to hit this bug. If you can view this directory as a normal admin on the machine, you should not be impacted. There was a permission issue in an older SDK version where deletes to this directory failed. Unfortunately, the only symptom is that suddenly you won't get data transferred out anymore. There is no overt failure.
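To make the "age out, oldest first" behaviour concrete, here is a minimal Python sketch of a quota-based purge. It is not the Windows Azure Diagnostics implementation; the function name purge_oldest and the use of file modification time as the age are assumptions chosen for illustration.

```python
import os


def purge_oldest(directory: str, quota_bytes: int) -> list[str]:
    """Delete the oldest files in `directory` until its total size fits the quota.

    Hypothetical helper: "oldest" is taken to mean the earliest modification time.
    Returns the paths that were deleted, oldest first.
    """
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            stat = os.stat(path)
            entries.append((stat.st_mtime, path, stat.st_size))

    entries.sort()  # earliest modification time first
    total = sum(size for _, _, size in entries)

    deleted = []
    for _, path, size in entries:
        if total <= quota_bytes:
            break  # the remaining files fit inside the quota
        os.remove(path)
        total -= size
        deleted.append(path)
    return deleted
```

Note that if the delete itself fails (as in the permission bug described above), a loop like this cannot free space, which is consistent with the disk filling up and transfers silently stopping.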

Are you using System.Diagnostics.Trace to "write" your logs, or are you writing to log files?
Either way there is a rollover: if you hit the storage quota before the logs are transferred, the oldest logs are deleted. But you can easily increase your default (4 GB!) log quota.
Please take a look at the following articles and posts, which describe diagnostics in Windows Azure in detail:
http://blogs.msdn.com/b/golive/archive/2012/04/21/windows-azure-diagnostics-from-the-ground-up.aspx
http://www.windowsazure.com/en-us/develop/net/common-tasks/diagnostics/
http://msdn.microsoft.com/en-us/library/windowsazure/hh411544.aspx

Related

Does the Diagnostic Logging setting turn itself off by design?

I have enabled diagnostic logging (Error level only to file system or blob) on my Azure website several times and confirmed that it is working. When I come back and check the next day it is switched off. I can't seem to find any documentation that suggests that this is by design.
If you're logging to File System, then it does disable itself after 12 hours. You can see this if you click the help bubble:
The reason is that it could affect site performance due to excessive writing to the (slow) file system.
However, if you set it up for blob, it should never get turned off until you do it.
If you turn on Application Logging to the File System, then yes, it will turn itself off after 12 hours. You can see this in the portal if you hover over the information icon for Application Logging. This behavior is also documented here for reference.
The reason why this is disabled after 12 hours has to do with the limited set of storage you have on the local file system, which will be 1GB - 250GB depending on your App Service Plan (size).
If you enable application logging to Azure Storage (blob), then you have up to 500TB of potential storage. In this scenario, your logging should not be getting disabled after 12 hours.

Process IIS log files from an Azure storage container - is using a WebJob a good idea?

I have Azure log files of more than 250MB each in one container (6 files per hour). I have a C# program to access and process these log files, but what I am doing now is taking only 100 lines from each log file (created in one hour). If I process the whole files, I need to access almost 1.5GB of data. How can I handle this situation? My plan is to use a WebJob to automatically create smaller files from these log files, store them in a different container, and access those files from my C# program. Do you have any ideas?
To tell the truth, I don't understand the problem. Are you worried about traffic or about processing time?
In any case you can try to reduce the file size by removing some fields in the IIS log setup. Another option is to set Log File Rollover to a smaller size, which can reduce the download time for processing.
What you are suggesting is doable with a WebJob, but it sounds like the size of the file is the issue. Would a different log type be better for your scenario? Possibly using "Failed Request Tracing". You may also be able to change the verbosity level in the diagnostics config. For more info see:
Enable diagnostics logging for web apps in Azure App Service
Azure Web App (Website) Logging - Tips and Tools
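The splitting step from the question can be sketched independently of WebJobs and blob storage. The following Python example is only an illustration on local files: the function name split_log, the chunk naming scheme, and the default of 100,000 lines per chunk are assumptions, and downloading from or uploading to a storage container is deliberately left out.

```python
from pathlib import Path


def split_log(source: Path, out_dir: Path, lines_per_chunk: int = 100_000) -> list[Path]:
    """Stream `source` line by line and write it out as numbered chunk files.

    Only one line is held in memory at a time, so the size of the input
    file does not matter. Returns the chunk paths in order.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)

    chunks: list[Path] = []
    out = None
    try:
        # newline="" keeps the original line endings untouched
        with source.open("r", encoding="utf-8", errors="replace", newline="") as src:
            for index, line in enumerate(src):
                if index % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    chunk = out_dir / f"{source.stem}.part{len(chunks):04d}{source.suffix}"
                    chunks.append(chunk)
                    out = chunk.open("w", encoding="utf-8", newline="")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunks
```

Because the file is streamed, the same loop could also process each line directly instead of writing chunks, which may remove the need for intermediate files altogether.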

Azure Websites automated and manual backups are not created

Whilst accepting that Backups in Windows Azure Websites are a preview feature, I can't seem to get them working at all. My site is approximately 3GB and on the standard tier. The settings are configured to move to a Geo-Redundant storage account with no other containers. There is no database selected, I'm only backing up the files.
In the Admin Portal, if I use the manual Backup Now button, a 0 bytes file is created within the designated storage account, dated 01/01/0001 00:00:00. However even after several days, it is not replaced with the 'actual' file.
If I use the automated backup scheduler, nothing happens at all - no errors, no 0 byte files.
Can anyone shed any light on this please?
The backup/restore feature is still in a preview mode and officially supports only 2 GB of data. From the error message you posted ("backup is currenly in progress") it seems you probably hit a bug which was there and was fixed last week (the result of that bug was that there were some lingering backups which blocked subsequent backups).
Please try it again, you should be able to invoke it now. If you find another error message in operational logs, feel free to post it here (just leave the RequestId in it unscrambled - we can correlate using that) and we can take a look.
However, as I mentioned in the beginning, more than 2 GB is not fully supported yet (e.g. you might not be able to do a round trip with your data: back up and then restore).
Thanks,
Petr

Caching Diagnostics recommends 20GB of local storage(!). Why?

I installed the Azure 1.8 tools/SDK and it upgraded my projects co-located caching from preview to final. However, it also decided to add 20GB to the Role's Local Storage (DiagnosticStore). I manually dialed it down to 500MB but then I get the following message in the Role's Property page (cloud proj => roles => right click role => properties i.e. GUI for ServiceDefinition.csdef):
Caching Diagnostics recommends 20GB of local storage. If you decrease
the size of local storage a full redeployment is required which will
result in a loss of virtual IP addresses for this Cloud Service.
I don't know who signed off on this operating model within MS but it begs a simple Why?. For better understanding, I'm breaking that "Why" into 3 "Why" subquestions for caching in Azure SDK 1.8:
1. Why are the caching diagnostics coupled with the caching itself? We just need caching for performance...
2. Why is the recommendation a whopping 20GB? What happens if I dial it down to 500MB?
3. Slightly off-topic but still related: why does decreasing local storage require a full redeployment? This is especially painful since Azure doesn't provide any strong controls to reserve IP addresses. So if you need to work with 3rd parties that use whitelisted IPs - too bad!?
PS: I did contemplate breaking it into 3 separate questions. But given that they are tightly coupled it seems this would be a more helpful approach for future readers.
The diagnostic store is used for storing cache diagnostic data, which includes server logs, crash dumps, counter data etc. This data can be automatically uploaded to Azure Storage by configuring the cache diagnostics (the CacheDiagnostics.ConfigureDiagnostics call in the OnStart method; without this call, data is generated on the local VM but not uploaded to Azure Storage). The amount of data that is collected is controlled by the diagnostic level (the higher the level, the more data is collected), which can be changed dynamically. More details on cache diagnostics are available at: http://msdn.microsoft.com/en-us/library/windowsazure/hh914135.aspx
Since you enabled caching, it comes with a default diagnostic level that should help in diagnosing cache issues if they happen. This data is stored locally unless you call the ConfigureDiagnostics method in OnStart (which uploads the data to Azure Storage).
If a lower storage value is provided (say 2GB), then higher diagnostic levels cannot be used since they need more space (a crash dump alone can take upwards of 12GB on XL VMs). And if you later want higher levels, you would have to upgrade the deployment with a changed diagnostic store size, which defeats the purpose: changing the diagnostic level without a redeployment/upgrade/update/restart. That is why a limit of 20GB is set, to cater to all diagnostic levels (which can then be changed in a running deployment with a cscfg change).
is answered above.
Hope this helps.
I'll answer question #3 - local storage decreases are one of the only deployment changes that can't be done in-place (increases are fine, as well as VM size changes and several other changes now possible without redeploy). See this post for details around in-place updates.

What happens to Azure diagnostic information when a role stops?

When an Azure worker role stops (either because of an unhandled exception or because Run() finishes), what happens to local diagnostic information that has not yet been transferred? Microsoft documentation says diagnostics are transferred to storage at scheduled intervals or on demand, neither of which can cover an unhandled exception. Does this mean diagnostic information is always lost in this case? This seems particularly odd because crash dumps are part of the diagnostic data (set up by default in DiagnosticMonitorConfiguration.Directories). How then can you ever get a crash dump back (related to this question)?
To me it would be logical if diagnostics were also transferred when a role terminates, but this is not my experience.
It depends on what you mean by 'role stops'. The Diagnostic Monitor in SDK 1.3 and later is implemented as a background task that has no dependency on the RoleEntryPoint. So, if you mean your RoleEntryPoint is reporting itself as unhealthy or something like that, then your DiagnosticMonitor (DM) will still be responsive and will send data according to the configuration you have set up.
However, if you mean that a role stop is a scale-down operation (shutting down the VM), then no, there is no flush of the data on disk. At that point, the VM is shut down and the DM with it. Anything not already flushed (transferred) can be considered lost.
If you are only rebooting the VM, then in theory you will be connected back to the same resource VHDs that hold the buffered diagnostics data, so you would not lose it; it would be transferred on the next request. I am pretty sure that sticky storage is enabled on it, so it won't be cleaned on reboot.
HTH.
The diagnostic data is stored locally before it is transferred to storage. So that information is available to you there; you can review/verify this by using RDP to check it out.
I honestly have not tested to see if it gets transferred after the role stops. However, you can request transfers on demand. So using that approach, you could request the logs/dumps to be transferred one more time after the role has stopped.
I would suggest checking out a tool like Cerebrata Azure Diagnostics Manager to request on demand transfer of your logs, and also analyze the data.
I answered your other question as well. Part of my answer was to add the event that would allow you to change your logging and transfer settings on the fly.
Hope this helps
I think it works like this: local diagnostic data is stored in the local storage named "DiagnosticStore", which I guess has cleanOnRoleRecycle set to false. (I don't know how to verify this last bit - LocalResource has no corresponding attribute.) When the role is recycled that data remains in place and will eventually be uploaded by the new diagnostic monitor (assuming the role doesn't keep crashing before it can finish).
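The recycle behaviour guessed at above (buffered data survives and is picked up by the next monitor) can be illustrated with a small Python sketch. This is not how the Diagnostic Monitor is implemented; transfer_pending is a hypothetical helper, and a local directory stands in for Azure Storage so that the example stays runnable.

```python
import shutil
from pathlib import Path


def transfer_pending(buffer_dir: Path, destination: Path) -> list[str]:
    """Move every file left in `buffer_dir` to `destination`, oldest first.

    Called at startup, this picks up whatever a previous run buffered
    but did not manage to transfer before it stopped.
    """
    destination.mkdir(parents=True, exist_ok=True)
    pending = sorted(
        (p for p in buffer_dir.iterdir() if p.is_file()),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    transferred = []
    for path in pending:
        shutil.move(str(path), str(destination / path.name))
        transferred.append(path.name)
    return transferred
```

The sketch only works if the buffer directory is not wiped between runs, which is exactly the cleanOnRoleRecycle assumption made in the answer.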

Resources