Azure Table Storage: Capture all table activity for compliance purposes

I need to capture all inserts/updates/deletes in Azure Table Storage for compliance purposes. How is this accomplished? I'm looking for code samples and/or documentation. I know there is Change Feed support for blobs (https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-change-feed?tabs=azure-portal), which is still in preview. Anything similar for tables?

Table Storage does not provide a change feed or anything similar. If you need that, you could switch to "Premium Tables", which is basically the Table API on Cosmos DB and does provide features like a change feed. Of course, this comes at a higher price point.
https://learn.microsoft.com/en-us/azure/cosmos-db/table-introduction

If you're desperate, you can try Azure Storage Analytics logging. Important caveat:
Requests are logged on a best-effort basis. This means that most requests will result in a log record, but the completeness and timeliness of Storage Analytics logs are not guaranteed.
As such, it doesn't solve your compliance problem, but it might help someone else.
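If you do enable it, classic Storage Analytics writes its logs to a hidden $logs blob container in the same account. Below is a minimal sketch of pulling table operations back out with the azure-storage-blob Python SDK; the connection string and date prefix are placeholders, and the field positions assume log format version 1.0.

# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Placeholder connection string for the storage account being audited.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")

# Classic Storage Analytics writes its logs to the hidden "$logs" container;
# table-service entries live under the "table/" prefix, organized by date/hour.
logs = service.get_container_client("$logs")

for blob in logs.list_blobs(name_starts_with="table/2024/01/"):
    text = logs.download_blob(blob.name).readall().decode("utf-8")
    for line in text.splitlines():
        fields = line.split(";")
        # Log format version 1.0 starts with:
        # version;request-start-time;operation-type;request-status;...
        if fields[2] in ("InsertEntity", "UpdateEntity", "MergeEntity", "DeleteEntity"):
            print(fields[1], fields[2])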

Related

Querying Azure Diagnostic table storages

We are storing our Windows/Linux VM metrics and logs in an Azure diagnostics storage account for long-term retention. We keep this data in Log Analytics as well, but being cost conscious we keep only the minimal essential set, and only for 1 month. However, there seems to be no way to efficiently query the Table storage data when we need it, e.g. checking historical CPU usage for a particular machine over a specific period in the past, or checking the logs captured during that period. The partition key and row key are highly convoluted, with only very basic help available for the WAD tables schema and none for the LinuxsyslogVer2v0 table schema. Is anyone else using the diagnostic logs table storage for querying/reporting? If so, how do you query it for a specific host and time period? I can query on non-partition/row-key properties, but besides being time consuming it will eventually cost a lot, considering that it will be a table scan. Really appreciate any advice.
You should consider using Azure Data Explorer (ADX) for your long-term storage solution. It allows KQL queries over your long-term data and is the preferred method for keeping log/security data past the default retention of services like Log Analytics and Sentinel.
The pricing page for ADX can be a bit confusing and there is a website to help you estimate costs here: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
By default, logs ingested into Azure Sentinel are stored in Azure Monitor Log Analytics. This article explains how to reduce retention costs in Azure Sentinel by sending them to Azure Data Explorer for long-term retention.
Storing logs in Azure Data Explorer reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.
https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=adx-event-hub
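For example, assuming your diagnostics data has been ingested into an ADX table named Perf with Log Analytics-style columns (the cluster URI, database, table and column names below are placeholders for whatever your ingestion pipeline produces), querying historical CPU for one machine with the azure-kusto-data Python SDK looks roughly like this:

# pip install azure-kusto-data
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholder cluster URI and database name.
cluster = "https://<your-cluster>.<region>.kusto.windows.net"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# Historical CPU for one VM over a past window, averaged per hour.
query = """
Perf
| where Computer == 'myvm01'
  and CounterName == '% Processor Time'
  and TimeGenerated between (datetime(2023-01-01) .. datetime(2023-02-01))
| summarize avg(CounterValue) by bin(TimeGenerated, 1h)
"""

response = client.execute("LongTermLogs", query)
for row in response.primary_results[0]:
    print(row["TimeGenerated"], row["avg_CounterValue"])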

Expensive use of storage account from Azure Functions

I'm running a single Azure Function on the consumption plan. I've picked the consumption plan for the serverless feature as well as to minimize cost. The function consumes messages from a service bus topic and writes some output to blob storage.
Keeping the function running for the last 30 days is around $10. That's very acceptable, since the function has a lot of messages to consume. Writing the output to blob storage is around $20. Also acceptable. What I don't understand is that the charge for the Function's underlying storage account is around $70 for the same period. The consumption is primarily hitting File Write Operation Units and File Protocol Operation Units. The storage account is created as locally redundant general purpose v1.
Anyone able to explain what's going on here? When looking at the storage account, there are a few blobs. I believe the problem is with table storage. When inspecting the storage account, there are tables looking like this:
$MetricsCapacityBlob
$MetricsHourPrimaryTransactionBlob
AzureWebJobsHostLogs201804
I've disabled logging in my function by removing the AzureWebJobsDashboard app setting. After doing so, the AzureWebJobsHostLogs* tables no longer seem to receive new rows. But the $Metrics* tables still receive new data. I have no clue whether writes to these tables are causing all of the file write activity I see in the Cost Management view in the portal, though.
What's going on here? Is maintaining these tables from serverless code really required, and does it sound normal that the price for table access is 7x the price of the function itself?
You should go to Metrics in Azure Portal for this storage account and check the patterns of how the File storage transactions are consumed. If it's consistently high, it's something with your application (e.g. too much logging to file).
In my case, it appears to be a bug in Azure Functions, and I filed a bug here.
The function starts consuming tens of thousands of read and write transactions after any code change, however minor. So basically each code change or deployment costs me perhaps around $0.20, and it could be more in your case.
This is easy to see in the Metrics diagram because it looks like a huge spike in transactions.
So the solution is: don't write logs to the filesystem and don't deploy often.
It is interesting and unusual that your storage cost is so much higher. I think the dashboard logging is a likely culprit, so it would be good to understand whether you see a drop over the next few days with it turned off.
I would spend a bit more time in the cost analysis section of the Azure Portal to see if you can get more details about exactly which aspect of your storage usage is driving the majority of the cost, i.e. is it table operations, blob operations, etc.? The Cost History view there gives a breakdown per meter.
The $Metrics tables are not written by Azure Functions; they are generated by Azure Storage itself. I would be surprised if these metrics were contributing significantly to your overall cost, but if you want to experiment, I think you can disable them from the portal.
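If you'd rather script that than click through the portal, a rough sketch with the azure-storage-blob Python SDK follows (the connection string is a placeholder; the other storage service clients expose similar set_service_properties calls for their own metrics):

# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient, Metrics, RetentionPolicy

# Placeholder connection string for the Function's storage account.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")

# Turn off classic (Storage Analytics) hour/minute metrics for the blob service,
# which is what populates the $Metrics* tables shown above.
disabled = Metrics(enabled=False, retention_policy=RetentionPolicy(enabled=False))
service.set_service_properties(hour_metrics=disabled, minute_metrics=disabled)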
To give you a baseline for what ratio of storage cost to Functions execution cost to expect, you might want to take a look at the cost write-up I did in this blog post:
https://blogs.msdn.microsoft.com/appserviceteam/2017/09/19/processing-100000-events-per-second-on-azure-functions/
You'll notice that the storage costs were less than the Functions cost, and that includes a significant number of storage operations because Event Hubs processing requires checkpoints to be written to storage. I'll note that these tests were run with dashboard logging off (again making me suspect that as the main cost driver). So no, it is NOT normal for your storage costs to be 7x your Functions cost!

Monitor the amount of blobs entering into an Azure container

Basically I have a storage account with a container that contains blobs of unhandled errors. My task is to somehow generate a metric that shows how many blobs were uploaded to that container every hour. I tried using the Azure built-in metrics, but it seems like that might limit me to the entire storage account and not just one container. I did some research on Power BI and thought that might be a good place to start, but again I came up empty.
If anyone has a good starting place for me, that would be incredible. I'm assuming this will end up being something that requires some SQL queries, or perhaps something I can do programmatically in Visual Studio. Apologies if this was posted in the wrong place, but it seemed like the best fit in my opinion.
Thanks!
You should take a look at Azure Event Grid with its Blob Storage integration. In short, whenever a blob is created, an event is raised by Azure Event Grid. You can consume this event by posting the event data to an HTTP endpoint (or calling an Azure Function), which can save the event data in some persistent storage (Azure Tables, for example). You can then create reports by querying this data.
For more information about this, you may find this link helpful: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-overview.
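A sketch of that pipeline as a Python Azure Function with an Event Grid trigger (v1 programming model; the function.json binding is not shown, and the BlobUploads table name and AuditTableConnection app setting are made-up names for illustration):

# __init__.py of an Event Grid-triggered Azure Function (Python)
import os
import azure.functions as func
from azure.data.tables import TableServiceClient

def main(event: func.EventGridEvent):
    # Only record blob creations; ignore deletes and other storage events.
    if event.event_type != "Microsoft.Storage.BlobCreated":
        return

    data = event.get_json()
    tables = TableServiceClient.from_connection_string(os.environ["AuditTableConnection"])
    table = tables.create_table_if_not_exists("BlobUploads")

    table.upsert_entity({
        # One partition per hour, so hourly counts are a single-partition query.
        "PartitionKey": event.event_time.strftime("%Y-%m-%d-%H"),
        "RowKey": event.id,
        "Url": data.get("url"),
        "Subject": event.subject,
    })

Counting an hour's uploads is then just a query like PartitionKey eq '2024-01-05-13' against that table, which you can chart from Power BI or a small script.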

Azure Storage custom audit and logs

I'm writing a small app that reads and writes from Azure Blob Storage (Images, documents, etc.)
I need to implement some logging that will log activities such as:
File uploaded
File deleted
File updated
etc.
So, basically I need my log to look something like this:
User John Doe created a container "containerName" on 2016-05-05
User Mike Smith removed a blob test.jpg
etc...
User IDs and other additional info will be passed in through the method.
Example: CreateImage(String CreatedBy)
Question:
What is the best way to store and create this type of log? The easiest option is a SQL database with an Audit table and all the necessary columns. But I know that Azure has Azure Diagnostics. Can that be used to store and query logs? For example, I will need to see all file manipulations by user, by date, etc.
I would go using one of these ways:
1) Azure Storage Tables for logs. Here you can store everything you need regarding logs. Then, if you need functionality to get/filter/etc., you can look into LINQ to Azure Tables, or even LINQPad if you want desktop-ready software. However, some design considerations should be taken into account; the design guidance is here. (A minimal sketch of this option follows at the end of this answer.)
2) Application Insights. Using the custom events functionality, you get powerful logging and can then see how things are going on the portal. You can attach metadata to a custom event and then aggregate/filter/inspect it using a convenient web interface. Or connect log4net to Application Insights if you want to stream logs to it. Application Insights can continuously export its data to Azure Storage, so you can take that and dive into it later.
IMHO, I would not say that SQL Database is the appropriate store for logs; it looks like too much (in terms of resources, price, etc.) to keep logs in a full-fledged relational database. Not directly relevant, but there is interesting reading about working with a lot of records.
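As referenced in option 1 above, here is a minimal sketch with the azure-data-tables Python SDK. The table name, property names and the inverted-timestamp RowKey scheme are illustrative choices, not anything mandated by the SDK: one entity per action, partitioned by user, so "everything John Doe did" is a single-partition query returned newest first.

# pip install azure-data-tables
from datetime import datetime, timezone
from azure.data.tables import TableServiceClient

tables = TableServiceClient.from_connection_string("<storage-connection-string>")
audit = tables.create_table_if_not_exists("AuditLog")

def log_action(user: str, action: str, target: str) -> None:
    now = datetime.now(timezone.utc)
    # Inverted-ticks RowKey makes the newest records sort first within a partition.
    inverted = f"{(10**19 - int(now.timestamp() * 10**7)):019d}"
    audit.create_entity({
        "PartitionKey": user,
        "RowKey": inverted,
        "Action": action,       # e.g. "ContainerCreated", "BlobDeleted"
        "Target": target,       # e.g. "containerName" or "test.jpg"
        "OccurredAt": now,
    })

# Usage, mirroring the examples in the question:
log_action("John Doe", "ContainerCreated", "containerName")
log_action("Mike Smith", "BlobDeleted", "test.jpg")

# All actions by one user, newest first:
for e in audit.query_entities("PartitionKey eq 'John Doe'"):
    print(e["OccurredAt"], e["Action"], e["Target"])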

Microsoft Azure DocumentDB vs Azure Table Storage

For several years now, Microsoft has offered a "NoSQL" key/value store called Table Storage (http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-tables/).
Table Storage offers high performance, scalability (via partitioning) and relatively low cost. A primary drawback of Tables is that only the partition and row keys are indexed, so queries on other values are very inefficient.
Recently Microsoft announced a new "NoSQL" service, called "DocumentDB" (http://azure.microsoft.com/en-us/documentation/services/documentdb/)
Instead of storing a list of properties (like Tables do), DocumentDB stores JSON objects. The whole object is indexed, so efficient queries can be created based on every property and any nested property of stored objects.
Microsoft says that DocumentDB provides high performance and scalability as well.
If that's so, why would anyone use Table Storage over DocumentDB? It sounds like DocumentDB provides the same functionality as Tables, but with additional capabilities such as the ability to index anything.
I would be glad if someone could make a comparison between DocumentDB and Table Storage, highlighting the pros and cons of each.
Both are NoSQL technologies, but they are massively different. Azure Tables is a simple key/value store and does not support complex functionality: no complex queries (most of them will require a full partition/table scan anyway, which will kill your performance and your cost savings), no custom indexing (indexing is based on PartitionKey and RowKey only; you currently can't index any other entity property, and searching for anything other than a PartitionKey/RowKey combination will require a partition/table scan), and no stored procedures. You also can't batch read requests for multiple entities (though batch write requests are supported if all the entities belong to the same partition). For a real-life application of Azure Tables, see HERE.
If your data needs (particularly around querying) are simple, like in the example above, then Azure Tables provides what you need, and you might end up using it in favor of DocDB due to pricing, performance and storage capacity. For example, the Azure Tables performance target is 20,000 operations per second. Trying to get that same level of performance on DocDB will have a significantly higher service cost for you. Also, Azure Tables is limited by the capacity of your Azure storage account (500TB), whereas DocDB storage is limited by the capacity units you buy.
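To make the indexing and batching limitations above concrete, here is a rough sketch with the azure-data-tables Python SDK (the connection string, table name and Total property are made up for illustration):

# pip install azure-data-tables
from azure.data.tables import TableServiceClient

# Placeholder connection string and table/property names.
table = TableServiceClient.from_connection_string(
    "<storage-connection-string>"
).create_table_if_not_exists("Orders")

# Batch: an entity group transaction is allowed only within a single partition.
table.submit_transaction([
    ("upsert", {"PartitionKey": "customer-42", "RowKey": f"order-{i:03d}", "Total": 50.0 * i})
    for i in range(1, 4)
])

# Fast: a point lookup on PartitionKey + RowKey, the only indexed properties.
order = table.get_entity(partition_key="customer-42", row_key="order-001")

# Slow: filtering on any other property forces a partition/table scan.
big_orders = list(table.query_entities("Total gt 100.0"))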
Table Services is mainly a key-value type of NoSQL store and DocumentDB is (as the name suggests) a document-type NoSQL store. What you are asking is essentially the difference between these two NoSQL approaches; if you shape your research around that, you should be able to get a better understanding.
Just to keep things simple, I suggest you consider the differences in how DocumentDB and Table Services are priced. Not only do the costs of these services vary a lot, but the fact that DocumentDB works on a "provision first" model while Table Services are offered on purely consumption-based pricing might give you some clues for your compare/contrast.
Let me ask you this: why would I use DocumentDB if the features in Table Services serve my needs well? ;) I suggest you take a look at how the current Azure Diagnostics tooling uses Azure Storage services, and how Storage Metrics uses Azure Storage itself, to get a sense of how useful Table Services can be and how overkill DocumentDB might be in some situations.
Hope this helps.
I think the comparison is all about trading price for performance. Table Services are just Storage services, which seem to cap out at 20,000 ops/second; paying for that kind of throughput all the time on DocumentDB (because Storage gives it to us all the time) is around $1,200/month. Crazy money.
Table services have simple indexes, so queries are very limited. Good for anything that is written and read via IDs. DocumentDB indexes the entire document, so a query can be done on any property.
And lastly, Table services are bound by the storage limit of the Storage account they're on (which could get crazy high if you negotiate with Microsoft directly), whereas DocumentDB storage seems unlimited.
So it's a balance. Do you have a LOT of data (hundreds of gigs, or terabytes) that you need in one place? DocumentDB. Do you need to support complex queries? DocumentDB. Do you have data that needs to come and go fast, but based on a 1-to-2 property lookup? Table services. Would you trade having to code around a simple index in order to avoid paying through the nose for throughput? Table services.
And Redis, someone mentioned that... man, I dunno. Even the existence of persistence in a caching framework (which Redis offers) doesn't turn it into a tech of choice... There is a huge difference between a persistent store that holds data that is "often used, but may be missing or time-retired", like a cache would, and a persistent store that guarantees your data to be there.
A real life example:
I have to store some tokens, retrieve them, delete them. Only query ever done will be based on User ID.
So I use Table Storage, as it fulfills my requirements perfectly. I save the token against the User ID.
DocumentDB seemed to be overkill for this.
Here is the answer from Microsoft's official docs.
Common attributes of Azure Cosmos DB, Azure Table Storage, and Azure SQL Database:
99.99% availability SLA
Fully managed database services
ISO 27001, HIPAA and EU Model Clauses compliant
The docs then show, in a table, the attributes that differ between Azure Cosmos DB and Azure Table Storage.
