Querying Azure Diagnostics table storage

We store our Windows/Linux VM metrics and logs in an Azure Diagnostics storage account for long-term retention. We keep this data in Log Analytics as well, but being cost conscious we keep only a minimal essential set there, and only for one month. However, there seems to be no way to efficiently query the Table storage data when we need it, e.g. checking historical CPU usage for a particular machine over a specific period in the past, or checking the logs captured during that period. The partition key and row key are highly convoluted, with only very basic help available for the WAD table schema and none at all for the LinuxsyslogVer2v0 table schema. Is anyone else using the diagnostics table storage for querying/reporting? If so, how do you query it for a specific host and time period? I can query on non-key properties, but besides being slow that amounts to a full table scan and will eventually cost a fortune. Any advice is really appreciated.
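For illustration, the kind of time-bounded lookup I'm after would look roughly like the sketch below (Python, azure-data-tables). It assumes the commonly documented WAD convention that PartitionKey is "0" followed by the .NET UTC tick count of the sample time, and that a RoleInstance column holds the VM name; both assumptions would need to be checked against real rows in the table.

```python
# Rough sketch (not production code): time-bounded query against
# WADPerformanceCountersTable, assuming PartitionKey = "0" + .NET UTC ticks
# and a RoleInstance column holding the VM name. Verify both against real rows.
from datetime import datetime, timezone
from azure.data.tables import TableClient

def to_wad_partition_key(dt: datetime) -> str:
    # .NET ticks: 100-nanosecond intervals since 0001-01-01 00:00:00 UTC.
    epoch = datetime(1, 1, 1, tzinfo=timezone.utc)
    return "0" + str(int((dt - epoch).total_seconds() * 10_000_000))

table = TableClient.from_connection_string(
    "<storage-account-connection-string>", table_name="WADPerformanceCountersTable"
)

start = to_wad_partition_key(datetime(2023, 1, 10, tzinfo=timezone.utc))
end = to_wad_partition_key(datetime(2023, 1, 11, tzinfo=timezone.utc))

# A PartitionKey range keeps this a bounded range scan rather than a full table scan.
flt = (
    f"PartitionKey ge '{start}' and PartitionKey lt '{end}' "
    "and RoleInstance eq 'my-vm-name'"  # hypothetical host column/value
)
for entity in table.query_entities(flt, select=["CounterName", "CounterValue"]):
    print(entity["CounterName"], entity["CounterValue"])
```

Even if that convention holds for the WAD tables, it still leaves the LinuxsyslogVer2v0 schema undocumented.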

You should consider using Azure Data Explorer (ADX) for your long-term storage solution. It allows for KQL queries on your long-term data and is the preferred method for keeping log/security data past the default for services like LogA and Sentinel.
The pricing page for ADX can be a bit confusing and there is a website to help you estimate costs here: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
By default, logs ingested into Azure Sentinel are stored in Azure Monitor Log Analytics. The article below explains how to reduce retention costs in Azure Sentinel by sending logs to Azure Data Explorer for long-term retention.
Storing logs in Azure Data Explorer reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.
https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=adx-event-hub
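Once the data is in ADX, querying a historical window for one machine is a straightforward KQL statement. A minimal sketch using the azure-kusto-data Python package is below; the cluster URL, database and the Perf-style table/column names are placeholders for whatever your export/ingestion pipeline actually creates.

```python
# Minimal sketch: run a KQL query against data already ingested into ADX.
# Cluster URL, database, table and column names are placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-cluster>.<region>.kusto.windows.net"  # placeholder
client = KustoClient(KustoConnectionStringBuilder.with_az_cli_authentication(cluster))

query = """
Perf                                   // hypothetical table name
| where Computer == 'my-vm-name'
| where TimeGenerated between (datetime(2023-01-01) .. datetime(2023-01-31))
| where CounterName == '% Processor Time'
| summarize avg(CounterValue) by bin(TimeGenerated, 1h)
"""

response = client.execute("<database-name>", query)
for row in response.primary_results[0]:
    print(row)
```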

Related

Which logs/metrics should be enabled as part of diagnostic settings for Azure Storage Accounts?

As part of a client requirement, I've been asked to set up a central log repository for different Azure workloads, including storage accounts and databases. I see a default diagnostic setting in place, but all of its categories are disabled. To enable them, we need to select certain logs/metrics, which will then be ingested into the workspace. I want to make a cost-effective and accurate selection of the logs/metrics for storage accounts. Can someone with deeper knowledge of this domain enlighten me?
I have to make the same decision for PostgreSQL and Cosmos DB databases. Please help me with this.
Please check the points and references below.
Selection:
You can select logs only for the operations whose details you actually need; the selection depends on your requirements (a sketch of enabling selected categories follows after these points).
A good practice is to go through your agents and monitoring settings and see exactly what you are logging. Capture only the logs that matter for your monitoring purposes.
Choose the cheapest region in which to create and store your Log Analytics workspace.
If you have a very high volume of log ingestion, it is prudent to opt for an Azure commitment tier.
If you need to export Log Analytics data, filter it and send only the relevant log data rather than exporting everything.
These steps can significantly reduce your Azure bill and help you use Azure Monitor effectively. See Understand Azure Monitor and Log Analytics Pricing and Cost Optimization (azurelib.com).
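As mentioned above, here is a minimal sketch of enabling only a trimmed-down set of categories for a storage account's blob service and routing them to a Log Analytics workspace, using the azure-mgmt-monitor Python package. The resource IDs are placeholders, and the category names (StorageRead/StorageWrite/StorageDelete, Transaction) are the ones commonly listed for the blob service; confirm them against what the portal shows for your resource.

```python
# Hedged sketch: enable only selected diagnostic categories for a storage
# account's blob service and route them to a Log Analytics workspace.
# Resource IDs are placeholders; verify category names against your resource.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

blob_service_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default"
)
workspace_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
)

client.diagnostic_settings.create_or_update(
    resource_uri=blob_service_id,
    name="central-log-repo",
    parameters={
        "workspace_id": workspace_id,
        # Keep only what you actually query; every enabled category is billed ingestion.
        "logs": [
            {"category": "StorageWrite", "enabled": True},
            {"category": "StorageDelete", "enabled": True},
            {"category": "StorageRead", "enabled": False},
        ],
        "metrics": [{"category": "Transaction", "enabled": True}],
    },
)
```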
Storing:
Log data accumulates in your account over time, which increases your storage cost.
If you only need log data for a short period, you can reduce costs by shortening the log data retention period.
Use a lifecycle policy to move data between access tiers (a sketch follows below).
Data ingested into a Log Analytics workspace is retained at no additional charge for the first 31 days.
See the design considerations and change the data retention if you don't need more than that; see Monitoring Azure Blob Storage | Microsoft Docs.
Storage Insights is a dashboard on top of Azure Storage metrics and logs. You can use Storage Insights to examine the transaction volume and used capacity of all your accounts, which can help you decide which accounts you might want to retire.
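For the lifecycle-policy point above, a rough sketch with azure-mgmt-storage. The rule name, the insights-logs- prefix and the day thresholds are illustrative assumptions, not a recommendation.

```python
# Hedged sketch: a lifecycle management rule that tiers older log blobs down.
# Rule name, container prefix and day thresholds are illustrative only.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.management_policies.create_or_update(
    "<rg>",
    "<account>",
    "default",  # the lifecycle policy name is expected to be "default"
    {
        "policy": {
            "rules": [
                {
                    "name": "tier-down-old-logs",
                    "enabled": True,
                    "type": "Lifecycle",
                    "definition": {
                        "filters": {
                            "blob_types": ["blockBlob"],
                            "prefix_match": ["insights-logs-"],  # assumed export container prefix
                        },
                        "actions": {
                            "base_blob": {
                                "tier_to_cool": {"days_after_modification_greater_than": 30},
                                "tier_to_archive": {"days_after_modification_greater_than": 180},
                            }
                        },
                    },
                }
            ]
        }
    },
)
```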
Analyze:
Analyze the used capacity and monitor the use of the containers.
You can reduce total cost by exporting logs to a storage account and then using a serverless query solution on top of the log data (a sketch follows below); see blob storage monitoring / optimize cost for infrequent queries.
Organize data into access tiers. Log Analytics also has Commitment Tiers, which can save you as much as 30 percent compared to the Pay-As-You-Go price.
Periodically review this information to determine whether you can reduce your charges by moving to another tier.
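For the "serverless query over exported logs" point above: once diagnostics are exported to a storage account, the blobs are newline-delimited JSON that you can filter without re-ingesting them. A minimal sketch with azure-storage-blob follows; the container name, path layout and field names follow the usual insights-logs-* export convention and should be checked against what your account actually contains.

```python
# Hedged sketch: scan exported diagnostic log blobs for a given day and filter
# records locally. Container name and path layout follow the common
# "insights-logs-<category>" export convention; verify against your account.
import json
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("insights-logs-storagewrite")  # assumed container

# Exported blobs are typically laid out as .../y=YYYY/m=MM/d=DD/h=HH/m=00/PT1H.json
day_prefix_fragment = "y=2023/m=01/d=10"

for blob in container.list_blobs():
    if day_prefix_fragment not in blob.name:
        continue
    data = container.download_blob(blob.name).readall().decode("utf-8")
    for line in data.splitlines():
        record = json.loads(line)
        # Keep only the operations you care about (field names vary by category).
        if record.get("operationName") == "PutBlob":
            print(record.get("time"), record.get("uri", ""))
```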
References:
Plan and manage costs for Azure Blob Storage | Microsoft Docs
Azure Monitor Logs pricing details - Azure Monitor | Microsoft Docs
Azure Monitor Log Analytics too Expensive? Part 2 - Save Some Money | Thomas Stringer (trstringer.com)

Application Insights Analytics over a long time period

I know that Application Insights stores data only for a limited period of time.
What do I need to do if I want to run analysis over a long time period, say a year?
I know about Continuous Export, and as suggested we can use Power BI on the data stored in blobs, but that has a cost associated with it. Another way is to write code that transforms the JSON data in the blobs into some Excel representation.
Is there any other way, apart from these two, to analyze AI data over long time periods? Something that picks up the data stored in blobs and uses it to show analytics?
There are two things being developed: the ability to specify a different retention period (one year will be more expensive) and the ability to run analytics queries on top of blobs. Unfortunately, neither of them is available yet.
Will be providing updates to this answer.
Update: It is possible to specify a retention for Application Insights resources.
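Building on that update: with a workspace-based Application Insights resource and retention configured to cover the window, a year-long query can be issued directly from code. A minimal sketch with the azure-monitor-query Python package is below; the workspace ID and the AppRequests query are placeholders for whatever analysis you actually need.

```python
# Hedged sketch: query a workspace-based Application Insights resource over a
# long timespan. Assumes retention has been configured to cover it.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
AppRequests
| summarize requestCount = count(), avgDurationMs = avg(DurationMs) by bin(TimeGenerated, 7d)
| order by TimeGenerated asc
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-guid>",  # placeholder
    query=query,
    timespan=timedelta(days=365),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```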

Expensive use of storage account from Azure Functions

I'm running a single Azure Function on the consumption plan. I've picked the consumption plan for the serverless feature as well as to minimize cost. The function consumes messages from a service bus topic and writes some output to blob storage.
Keeping the function running for the last 30 days is around $10. That's very acceptable, since the function has a lot of messages to consume. Writing the output to blob storage is around $20. Also acceptable. What I don't understand is, that the charge for the Function's underlying storage account is around $70 for the same period. The consumption is primarily hitting File Write Operation Units and File Protocol Operation Units. The storage account is created as locally redundant general purpose v1.
Can anyone explain what's going on here? When looking at the storage account, there are a few blobs. I believe the problem is with table storage. When inspecting the storage account, there are tables that look like this:
$MetricsCapacityBlob
$MetricsHourPrimaryTransactionBlob
AzureWebJobsHostLogs201804
I've disabled logging in my function by removing the AzureWebJobsDashboard app setting. After doing so, the AzureWebJobsHostLogs* tables no longer seem to receive new rows, but the $Metrics* tables still receive new data. I have no clue whether writes to these tables are causing all of the file write activity I see in the Cost Management view in the Portal, though.
What's going on here? Is maintaining these tables from serverless code really required, and does it sound normal that the price for table access is 7x the price of the function itself?
You should go to Metrics in Azure Portal for this storage account and check the patterns of how the File storage transactions are consumed. If it's consistently high, it's something with your application (e.g. too much logging to file).
In my case, it appears to be a bug in Azure Functions, and I filed a bug here.
The function starts consuming tens of thousands of read and write transactions after any code change, however minor. So basically each code change or deployment costs me perhaps around $0.20, and it could be more in your case.
This is easy to see in the Metrics diagram because it looks like a huge spike in transactions.
So the solution is: don't write logs to the filesystem and don't deploy often.
It is interesting and unusual that your storage cost is so much higher. I think the dashboard logging is a likely culprit, so it would be good to understand whether you see a drop over the next few days with it turned off.
I would spend a bit more time in the cost analysis section of the Azure Portal to see if you can get more details about exactly which aspect of your storage usage is driving the majority of the cost, i.e. is it table operations, blob operations, etc. The Cost History view gives a breakdown per meter; note the tooltip on each meter.
The $Metrics tables are not written by Azure Functions; they are generated by Azure Storage itself. I would be surprised if these metrics were contributing significantly to your overall cost, but if you want to experiment, you can disable them through the storage account's classic metrics settings.
To give you a baseline on what sort of ratio of storage costs to functions execution cost is expected, you might want to take a look at the cost write up I did in this blog post:
https://blogs.msdn.microsoft.com/appserviceteam/2017/09/19/processing-100000-events-per-second-on-azure-functions/
You'll notice that the storage costs were less than functions, and that includes a significant number of storage operations due to event hubs processing requiring checkpoints written to storage. I'll note that these tests were run with dashboard logging off (again making me suspect that as the main cost driver). So no, it is NOT normal for your storage costs to be 7x your functions cost!
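If you want to run that experiment from code instead of the portal, here is a minimal sketch with azure-storage-blob that turns off the classic Storage Analytics metrics feeding the $Metrics* tables. Treat it as a reversible experiment; re-enable the metrics if anything else relies on those tables.

```python
# Hedged sketch: disable the classic Storage Analytics hour/minute metrics that
# populate the $Metrics* tables for the blob service. The same idea applies to
# the file/queue/table services via their respective service clients.
from azure.storage.blob import BlobServiceClient, Metrics, RetentionPolicy

service = BlobServiceClient.from_connection_string("<storage-connection-string>")

disabled = Metrics(enabled=False, retention_policy=RetentionPolicy(enabled=False))

service.set_service_properties(
    hour_metrics=disabled,
    minute_metrics=disabled,
)
```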

Allow customer to only see logging information

We run a software application on Azure for one of our customers. The customer wants to see the performance of the systems. This consists of two parts: one is the metric information of the servers, and the other is some information I want to provide via custom logging.
My plan is to give the customer access to the portal and only allow him access to the metric information and the custom tables.
It seems to me that by assigning a role to the customer I should be able to block all the other possibilities.
Can someone tell me which actions I have to allow/forbid to achieve this? Or where can I find the information for this?
Solution #1
Instead of giving read access to the virtual machine, which may break your security policy, I'd recommend going with an Azure Log Analytics workspace (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-overview). You will need to create a workspace which collects and stores server metrics (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-quick-collect-windows-computer) and other custom metrics.
Your customer will be given access to the workspace only, where he can see all metrics in a dashboard. If there is a need for log filtering, you can use the Log Analytics query language (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-log-search-transition).
Log Analytics is a paid service. You get up to 10 free workspaces per subscription. The workspace is an Azure resource, so the limits follow the subscription limits: you can create up to 800 workspaces per resource group, and a subscription can therefore allow 800 * 800 (for reference, if you would like to do capacity planning for your workspace-based solution). For Log Analytics pricing, read here: https://azure.microsoft.com/en-us/pricing/details/log-analytics/.
Log Analytics is a good choice, as its value proposition is to offer your customer an intuitive dashboard to monitor their virtual machine performance along with near-real-time monitoring, and the solution is cloud native.
There is a management solution which offers a bundle of VM capacity and performance monitoring, which you can try now: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-capacity
Solution #2
Log Analytics might not be your choice because it adds more Azure services and operational cost. If you need something cheaper, you can collect your virtual machine metrics with Performance Counters, a built-in feature of Windows. With Performance Counters you can export to an Excel file, or visualize the data in Power BI or some custom chart.
Other Solutions
You can utilize the Azure Monitor REST API to get the data, for example this API: https://learn.microsoft.com/en-us/rest/api/monitor/metricdefinitions/list (see the sketch after these points). You would certainly need to visualize or format it in some intuitive way to satisfy your customer. That can be a custom front-end web app, Power BI, or even Excel with a chart.
You can also query Azure Blob Storage and use Stream Analytics combined with Power BI to visualize your data (https://thuansoldier.net/?p=7187).
There is no single solution. It really depends on your existing resource capacity, your budget, and so on.
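For the metrics-API option above, a minimal sketch using the azure-mgmt-monitor Python package instead of raw REST calls; the VM resource ID, metric name and time range are placeholders. The output can then feed whatever front end (web app, Power BI, Excel) you expose to the customer.

```python
# Hedged sketch: list metric definitions for a VM and pull one metric's values.
# The resource ID, metric name and timespan are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

vm_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Compute/virtualMachines/<vm-name>"
)

# What metrics exist on this resource?
for definition in client.metric_definitions.list(vm_id):
    print(definition.name.value)

# Hourly average CPU over one day.
metrics = client.metrics.list(
    vm_id,
    timespan="2023-01-10T00:00:00Z/2023-01-11T00:00:00Z",
    interval="PT1H",
    metricnames="Percentage CPU",
    aggregation="Average",
)
for metric in metrics.value:
    for series in metric.timeseries:
        for point in series.data:
            print(point.time_stamp, point.average)
```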

Microsoft Azure DocumentDB vs Azure Table Storage

For several years now, Microsoft has offered a "NoSQL" key/value store called Table Storage (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/).
Table Storage offers high performance, scalability (via partitioning) and relatively low cost. A primary drawback of Tables is that only the partition and row keys are indexed, so queries on other values are very inefficient.
Recently Microsoft announced a new "NoSQL" service called DocumentDB (http://azure.microsoft.com/en-us/documentation/services/documentdb/).
Instead of storing a list of properties (like Tables do), DocumentDB stores JSON objects. The whole object is indexed, so efficient queries can be created on every property and any nested property of stored objects (a nested-property query is sketched below).
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so, why would anyone use Table Storage over DocumentDB? It sounds like DocumentDB provides the same functionality as Tables, but with additional capabilities such as the ability to index anything.
I would be glad if someone could compare DocumentDB and Table Storage, highlighting the pros and cons of each.
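To make the "query on any nested property" point concrete, here is a minimal sketch with today's azure-cosmos Python SDK (DocumentDB has since become Azure Cosmos DB); the account URL, key, database/container names and the address.city property are made up for illustration.

```python
# Hedged sketch: querying a nested property, something Table Storage's
# PartitionKey/RowKey-only indexing cannot do efficiently. All names here
# (database, container, property) are illustrative.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("appdb").get_container_client("customers")

items = container.query_items(
    query="SELECT c.id, c.name FROM c WHERE c.address.city = @city",
    parameters=[{"name": "@city", "value": "Seattle"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)
```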
Both are NoSQL technologies, but they are massively different. Azure Tables is a simple key/value store and does not support complex functionality like complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), custom indexing (indexing is based on PartitionKey and RowKey only; you currently can't index on any other entity property, and searching for anything other than a PartitionKey/RowKey combination will require a partition/table scan), or stored procedures. You also can't batch read requests for multiple entities (though batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
If your data needs (particularly around querying) are simple, like in the example above, then Azure Tables provides what you need, and you might end up using it in favor of DocDB due to pricing, performance and storage capacity. For example, Azure Tables' performance target is 20,000 operations per second. Trying to get that same level of performance on DocDB will come at a significantly higher service cost. Also, Azure Tables is limited by the capacity of your Azure storage account (500 TB), whereas DocDB storage is limited by the capacity units you buy.
Table Services is mainly a key/value type of NoSQL store and DocumentDB is (as the name suggests) a document type of NoSQL store. What you are asking is essentially the difference between these two types of NoSQL approaches. If you shape your research accordingly, you should be able to get a better understanding for sure.
Just to keep things simple, I suggest you consider the differences between how DocumentDB and Table Services are priced. Not only do the costs of these services vary a lot from each other, but the fact that DocumentDB works on a "provision first" model while Table Services are offered on pure consumption-based pricing might give you some clues for your compare/contrast.
Let me ask you this: why would I use DocumentDB if the features in Table Services serve my needs well? ;) I suggest you take a look at how the current Azure Diagnostics tooling uses Azure Storage services, and how Storage Metrics uses Azure Storage itself, to get a sense of how useful Table Services can be and how overkill DocumentDB might be in some situations.
Hope this helps.
I think that the comparison is all about trading price for performance. Table Services are just Storage Services, which seem to cap out at 20,000 ops/second, but paying for that kind of throughput all the time (because Storage gives it to us all the time) is $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
And lastly, Table services are bound by the storage constraint of the Storage account it's on (which could get crazy high given negotiation with Microsoft directly), where DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, someone mentioned that... man, I dunno. Even the existence of persistence in a caching framework (which Redis offers) doesn't turn it into a tech of choice... There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", like a cache would, and a persistent store that guarantees your data to be there.
A real-life example:
I have to store some tokens, retrieve them, and delete them. The only query ever done is based on the user ID.
So I use Table Storage, as it fulfills my requirement perfectly. I save the token against the user ID (sketched below).
DocumentDB seemed to be overkill for this.
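A minimal sketch of that kind of token store on Table Storage with azure-data-tables (table and property names are made up, and the table is assumed to exist already); every operation hits the PartitionKey/RowKey index directly.

```python
# Hedged sketch: a tiny token store keyed on user ID, the lookup pattern
# described above. Table and property names are illustrative; the table
# is assumed to have been created beforehand.
from azure.data.tables import TableClient, UpdateMode

table = TableClient.from_connection_string(
    "<storage-connection-string>", table_name="UserTokens"
)

def save_token(user_id: str, token_id: str, token_value: str) -> None:
    table.upsert_entity(
        {"PartitionKey": user_id, "RowKey": token_id, "Value": token_value},
        mode=UpdateMode.REPLACE,
    )

def get_tokens(user_id: str):
    # Partition lookup on the key the table is designed around.
    return list(table.query_entities(f"PartitionKey eq '{user_id}'"))

def delete_token(user_id: str, token_id: str) -> None:
    table.delete_entity(partition_key=user_id, row_key=token_id)

save_token("user-123", "refresh", "abc123")
print(get_tokens("user-123"))
delete_token("user-123", "refresh")
```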
Here is the answer from Microsoft's official docs.
Common attributes of Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99% availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses compliant
The following table shows the differing attributes of Azure Cosmos DB and Azure Table Storage.
