How to store audit data in Azure

We're in the design phase for building an audit trail in an existing web application. The application runs on Windows Azure and uses a SQL Azure database.
The audit logs must be filterable by user or by object type (e.g. show all actions of a user, or show all actions that were performed on an object).
We have to choose how to store the data: should we use SQL Azure, or should we use table storage? We prefer table storage (cheaper).
However, the 'problem' with table storage is how to define the partition key. We have several thousand customers (the application users) in our SQL database, each in their own tenant. Using the tenant ID as partition key is not specific enough, so we have to add something to the partition key. So there's the issue: given the requirements for filtering, we can add a user ID to the partition key to make filtering by user easy, or we can add an object ID to make filtering by object easy.
So we see two possible solutions:
- use SQL Azure instead of table storage
- use table storage and use two tables with different partition keys, which means we duplicate all entries
Any ideas what's the best approach for our situation? Are there other, better solutions?

DocumentDB on Azure might be worth considering.
https://azure.microsoft.com/en-us/documentation/articles/documentdb-use-cases/
You can store the audit trail in DocumentDB as JSON documents (with user, activity, and object fields) and index on all of those fields.

Azure Table Storage is appropriate for storing log data; Azure App Service itself uses Table Storage to store its diagnostic logs.
I think you can consider setting the PartitionKey to your user's tenant name and the RowKey to the user's ID. According to the Table Storage data model, we only need to ensure that:
Together the PartitionKey and RowKey uniquely identify every entity within a table
Alternatively, you can clarify your concern about:
Using the tenant ID as partition key is not specific enough, so we have to add something to the partition key
Additionally, you can refer to https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#overview for more information about designing for Azure Table Storage.
If you have any updates, please feel free to let me know.
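As a sketch of this key design (all names here are hypothetical, not from the question): with PartitionKey = tenant, the RowKey can prefix the user ID and append an inverted timestamp, so a prefix range query over one partition returns a single user's actions, newest first.

```python
from datetime import datetime, timezone


def audit_keys(tenant_id: str, user_id: str, when: datetime) -> tuple[str, str]:
    """Build Table Storage keys for one audit entry.

    PartitionKey groups all of a tenant's entries; RowKey prefixes the
    user ID so a range query over 'user42_' returns that user's actions.
    The inverted-tick suffix makes lexicographic order == newest first.
    """
    # 10^17 is comfortably larger than any .NET-style tick count we produce.
    inverted = 10**17 - int(when.timestamp() * 10**7)
    return tenant_id, f"{user_id}_{inverted:017d}"
```

A filter of the form `PartitionKey eq 'tenant1' and RowKey ge 'user42_' and RowKey lt 'user42`'` then stays inside one partition, which is the cheap query path in Table Storage.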

If you're worried about filtering in multiple ways, you could always write the same data to multiple partitions. It works really well. For example, in our app we have Staff and Customers. When there is an interaction we want to track/trace that applies to both of them (perhaps an over-the-phone purchase), we will write the same information (typically as JSON) to our audit tables.
{
PurchaseId: 9485,
CustomerId: 138,
StaffId: 509,
ProductId: 707958,
Quantity: 20,
Price: 31.99,
Date: '2017-08-15 15:48:39'
}
And we will write that same row to the following partitions: Product_707958, Customer_138, Staff_509. The row key is the same across the three rows in each partition: Purchase_9485. Now if I want to go and query everything that has happened for a given staff, customer, or item, I just grab the whole partition. The storage is dirt cheap, so who cares if you write it to multiple places?
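A minimal sketch of that fan-out, using the example event above (the function name is made up for illustration):

```python
import json


def audit_rows(event: dict) -> list[tuple[str, str, str]]:
    """Expand one purchase event into (PartitionKey, RowKey, payload)
    triples -- one per lookup dimension -- so the same record can be
    fetched by product, by customer, or by staff member."""
    payload = json.dumps(event)
    row_key = f"Purchase_{event['PurchaseId']}"
    return [
        (f"Product_{event['ProductId']}", row_key, payload),
        (f"Customer_{event['CustomerId']}", row_key, payload),
        (f"Staff_{event['StaffId']}", row_key, payload),
    ]
```

Each triple would then be written as its own entity; since the payload is identical, reading any one partition gives you the full record.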
Also, an idea for you considering you have multiple tenants - you could make the table name Tenant_[SomeId]. There are some other issues you might have to deal with, but it is in a sense another key to get at schema-less data.

Related

Azure Log Analytics Workspace and GDPR

Is there any way to purge/mask data in a Log Analytics workspace with regular expressions or similar, to be able to remove sensitive data that has been sent to the workspace?
Like social security numbers that are part of a URL?
As per this Microsoft document, Log Analytics is a flexible store which, while prescribing a schema for your data, allows you to override every field with custom values. You can mask data in the Log Analytics workspace, and here are a few strategies for handling personal data:
Where possible, stop the collection of, obfuscate, anonymize, or otherwise adjust the data being collected to exclude it from being considered "private". This is by far the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
Where not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit User ID, create a lookup table that correlates the username and their details to an internal ID that can then be logged elsewhere. That way, should one of your users ask you to delete their personal information, deleting only the row in the lookup table corresponding to the user may be sufficient.
Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user.
Here is a KQL query for locating private data (in this case, IP addresses) across the Log Analytics workspace:
search *
| where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' // RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
| summarize count() by $table

Setting PermissionModes in Azure CosmosDB

Hi, I'm working on a simple application using Azure Cosmos DB. Now I want to use resource tokens to provide specific access to documents and collections in the DB. Among the permission modes there are PermissionMode.Read and PermissionMode.All. So I am assuming that PermissionMode.All allows users to read, write, delete and post. If what I am assuming is correct, I specifically do not want my users to delete or post in a certain collection. How do I achieve this?
For better understanding, my database contains a container called users, which contains user information along with their posts and likes per post and stuff. Now I allow all my users to read (view posts of other users) and write (give a like or increment the like field), but I want to allow Post and Delete to a document to only the user of the document.
The finest granularity for assigning permissions is a partition key value so the only way to grant per document permissions is if your document id is also the partition key. If your partition key is userId and the user profile and posts, etc. all share that same partition key then that should work for you. Here is a sample that creates a permission on a partition key for a user.
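A small model of what the answer describes (the helper is hypothetical, not a Cosmos DB API): a resource token is scoped to a partition key value, so whether a user can write a document depends on the token's mode and on whether the document's partition key matches the token's.

```python
def allowed_operations(mode: str, token_pk: str, doc_pk: str) -> set[str]:
    """Model which operations a resource token permits on a document.

    A permission is scoped to one partition key value: 'Read' grants
    reads only, 'All' grants reads and writes -- but always for the
    whole partition. That is why per-document write control only works
    when the document id is also the partition key.
    """
    if token_pk != doc_pk:
        return set()  # token does not cover this partition at all
    if mode == "Read":
        return {"read"}
    if mode == "All":
        return {"read", "create", "replace", "delete"}
    raise ValueError(f"unknown permission mode: {mode}")
```

So issuing each user an `All` token on their own `userId` partition and `Read` tokens elsewhere gives exactly the "everyone can read, only the owner can post/delete" split.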

Row level access control in snowflake

I have a customer that owns a carpet cleaning business, and we have all of his different franchisees' data in a multi-tenant database model that we would like to move into a data warehouse in Snowflake. I don't want to have to build a separate database for each customer, because then I have to keep each database up to date with the latest data model; I want to use one data model to rule them all. I have a tenant ID that I keep with each record to identify the franchisee's data. I want to give a set of credentials to each franchisee so they can hook up their analytics tool of choice (Tableau, Power BI, etc.) and only get access to the rows that are applicable to them. Is there a way to secure the rows they see in each table based on their user? In other words, some sort of row-level access control, similar to row-level security policies in Postgres. Are there any better methods for handling this type of scenario? Ultimately I want to maintain and manage the least number of ELT jobs and data models.
This is the purpose of either Secure Views or Reader Accounts.
We are using both, and they have about the same technical hassle/setup costs. But we are using an internal tool to build/alter the schemas.
To expand on Simeon's answer:
You could have a single Snowflake account and create a Snowflake role & user for each franchisee. These roles would have access to a Secure View which uses the CURRENT_ROLE / CURRENT_USER context functions as in this example from the Snowflake documentation.
You'd have to have a role → tenant ID "mapping table" which is used in the Secure View to limit the rows down to the correct franchisee.
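To illustrate the mapping-table idea (role names and data are made up), the secure view's predicate boils down to: look up the tenant for CURRENT_ROLE(), then keep only rows with that tenant_id.

```python
# Hypothetical role -> tenant mapping, standing in for the "mapping table".
ROLE_TENANT = {"FRANCHISEE_A": 1, "FRANCHISEE_B": 2}


def secure_view(rows: list[dict], current_role: str) -> list[dict]:
    """Simulate the Secure View filter: only rows whose tenant_id
    matches the tenant mapped to the caller's CURRENT_ROLE() are
    visible; an unmapped role sees nothing."""
    tenant = ROLE_TENANT.get(current_role)
    return [r for r in rows if r["tenant_id"] == tenant]
```

In Snowflake itself this predicate lives inside the secure view definition (joining the mapping table against CURRENT_ROLE()), so every franchisee queries the same view but sees only their own rows.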

How to do a transaction with two collections on Azure CosmosDB

I have a question about transaction on Azure CosmosDB (with SQL API) where I have to edit two items on collection A and create other item on collection B.
I'm developing a project where there is an exchange of points. I have users with the status of their points, and I have to save the transaction for history. On a SQL database, I start a transaction, update user A's balance, update user B's balance, create a new row in tickets with all the details of the operation, and if everything succeeds, commit the transaction. But I have doubts about how to do this on Azure Cosmos DB. I have a collection with users and their balances, and another collection with tickets. With a stored procedure I can update user A and user B, because both are in the same collection, and if either of them fails to update the collection isn't edited (like a SQL database transaction). But how do I create the new ticket? Because if I have understood the documentation correctly, this works within the same collection, but it doesn't work across different collections.
Could you give me some advice about how to perform the three operations in the same transaction?
Cosmos DB only supports transactions within the same partition.
This means that you cannot have cross-container transactions, and you also cannot have a transaction between two documents with different partition key values.
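One common workaround, sketched below with made-up document shapes (not the asker's schema), is to remodel the data so the writes that must be atomic share a partition key: each user's balance change and their own copy of the ticket live in the same `/userId` partition, so each pair can go in one transactional batch or stored procedure.

```python
def exchange_documents(user_a: str, user_b: str, points: int, ticket_id: str):
    """Shape a points transfer as two per-partition batches.

    Cross-partition atomicity isn't available, so instead of one global
    transaction we produce, per user, a balance delta plus that user's
    copy of the ticket -- both carrying the same partition key value
    (userId), so each pair can be committed atomically together.
    """
    return {
        user_a: [
            {"id": f"balance_{user_a}", "userId": user_a, "delta": -points},
            {"id": f"ticket_{ticket_id}_{user_a}", "userId": user_a,
             "type": "ticket", "counterparty": user_b, "points": -points},
        ],
        user_b: [
            {"id": f"balance_{user_b}", "userId": user_b, "delta": +points},
            {"id": f"ticket_{ticket_id}_{user_b}", "userId": user_b,
             "type": "ticket", "counterparty": user_a, "points": +points},
        ],
    }
```

The trade-off is that the two batches are not atomic with respect to each other, so you still need an application-level reconciliation step (or an idempotent retry) if the second batch fails.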

Generate consecutive number in Cosmos DB

I'm using Cosmos DB to store my data. The ID is a GUID. There is a new requirement to display a consecutive number. How do I achieve that in a document-based DB? I want to keep the GUID as "id" and have another unique field "display_id". The application is running in an App Service. I do not want to run a SQL Server with a "serial table".
There's no such facility built-in to provide you with increasing numbers. That will be up to you to manage in your app layer (or potentially in a Cosmos DB stored procedure).
Also: there is nothing specific to document databases about increasing numbers. It's simply something that isn't offered via the database engine.
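One app-layer pattern for this is a single counter document updated with optimistic concurrency: read the counter with its ETag, write back value+1 with an If-Match condition, and retry on conflict. The sketch below models that in memory (the classes are stand-ins, not Cosmos DB SDK types):

```python
import itertools


class ConflictError(Exception):
    """Raised when the stored ETag no longer matches (concurrent writer)."""


class CounterDoc:
    """In-memory stand-in for a Cosmos DB counter document whose
    replace is guarded by an If-Match ETag check."""

    def __init__(self):
        self.value = 0
        self.etag = "0"
        self._etags = itertools.count(1)

    def read(self):
        return self.value, self.etag

    def replace(self, new_value: int, if_match: str) -> None:
        if if_match != self.etag:
            raise ConflictError("etag mismatch; caller should retry")
        self.value = new_value
        self.etag = str(next(self._etags))


def next_display_id(doc: CounterDoc, retries: int = 5) -> int:
    """Allocate the next consecutive number, retrying on write conflicts."""
    for _ in range(retries):
        value, etag = doc.read()
        try:
            doc.replace(value + 1, if_match=etag)
            return value + 1
        except ConflictError:
            continue  # someone else incremented first; re-read and retry
    raise RuntimeError("could not allocate id after retries")
```

Under heavy write contention this counter becomes a hotspot, which is why a stored procedure (which serializes writes within the partition) is the other option the answer mentions.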
