Azure Log Analytics Workspace and GDPR

Is there any way to purge or mask data in a Log Analytics workspace with regular expressions or similar, to remove sensitive data that has been sent to the workspace?
For example, social security numbers that are part of a URL?

As per this Microsoft document, Log Analytics is a flexible store which, while prescribing a schema for your data, allows you to override every field with custom values. You can mask data in the Log Analytics workspace, and here are a few strategies for handling personal data:
Where possible, stop collecting, obfuscate, anonymize, or otherwise adjust the data being collected to exclude it from being considered "private". This is by far the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
Where that is not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit user ID, create a lookup table that correlates the username and their details to an internal ID that can then be logged elsewhere. That way, should one of your users ask you to delete their personal information, deleting only the corresponding row in the lookup table may be sufficient.
Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user (see the purge sketch after the query below).
Here is a KQL query for finding private data (IP addresses, in this example) across tables in a Log Analytics workspace:
search *
| where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' //RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
| summarize count() by $table
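To act on what the query finds, the purge API mentioned above is a management-plane REST call against the workspace. Below is a minimal Python sketch of such a purge request, assuming the Workspace Purge endpoint; the subscription, resource group, workspace, table, column, and value are placeholders, not a working configuration.

# Minimal sketch of a Log Analytics purge request. All identifiers below are
# placeholders; obtain the ARM bearer token however you normally do (e.g. azure-identity).
import requests

token = "<ARM bearer token>"
subscription = "<subscription-id>"
resource_group = "<resource-group>"
workspace = "<workspace-name>"

url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.OperationalInsights/workspaces/{workspace}"
    "/purge?api-version=2020-08-01"
)

# Purge rows from a hypothetical custom table whose (hypothetical) SSN column
# matches the value that must be removed.
body = {
    "table": "AppRequests_CL",
    "filters": [
        {"column": "SSN_s", "operator": "==", "value": "123-45-6789"}
    ],
}

resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
# A 202 response carries an x-ms-status-location header that can be polled to
# track the purge operation, which can take a long time to complete.
print(resp.headers.get("x-ms-status-location"))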

Related

What is the best way to handle sensitive data in PowerApps and Lists?

I have only been using PowerApps for about a year now, and I've received a request to create an application that compares tables, based on an existing Access database. The big concern is that the existing database has sensitive data in its inputs (specifically, credit card numbers). We do not currently have Dataverse, so I've been using SharePoint lists for my tables, but my experience has been that we have to give full read/write access to each table to every app user, so simply not displaying the sensitive data is insufficient for security purposes. How can I protect the data but still allow access to the tables in the application?
Here is a scenario that may serve as a reference:
User A can see an item in the list, but user B cannot see it in the list.
(1) The item has unique access granted to user A.
(2) The current view contains a filter, showing only files created by A.
(3) The file may be in draft status, and B is unable to see drafts.
In list settings -> versioning settings, when you enable 'Require content approval for submitted items?', you can specify which users are able to see draft items (items in Pending status).
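If you need to script point (1) above (unique access for user A on a single item) rather than set it in the UI, the SharePoint REST API exposes breakroleinheritance and addroleassignment on list items. The following is only an illustrative Python sketch: it assumes you already have a valid bearer token, and the site URL, list name, item ID, principal ID, and role definition ID are placeholders.

# Illustrative sketch: give one user unique access to one list item via the
# SharePoint REST API. All identifiers are placeholders.
import requests

site = "https://contoso.sharepoint.com/sites/finance"
headers = {
    "Authorization": "Bearer <access token>",
    "Accept": "application/json;odata=verbose",
}
item_url = f"{site}/_api/web/lists/getbytitle('Cards')/items(1)"

# 1. Stop inheriting permissions from the list, without copying the existing ones.
requests.post(
    f"{item_url}/breakroleinheritance(copyRoleAssignments=false,clearSubscopes=true)",
    headers=headers,
).raise_for_status()

# 2. Grant a specific principal (user A) a role on just this item.
principal_id = 12          # placeholder: user A's ID in the site
role_def_id = 1073741826   # placeholder: a role definition ID such as Read
requests.post(
    f"{item_url}/roleassignments/addroleassignment"
    f"(principalid={principal_id},roledefid={role_def_id})",
    headers=headers,
).raise_for_status()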

How to get all user information from Azure AD Graph based on list of users?

If I have a large list of users, how can I return a list of the ones that exist in Azure AD via the Graph without a huge performance hit?
Let's say the Azure Tenant has 30,000 users
And we want to check a list of 1,000 users to see if they exist
I see two ways to do this:
Iterate over each user and check if that user exists, passing in a filter to the graph on the UPN
Query Azure for all users and intersect on that set. This results in 30,000 users being returned which requires paging (~30 pages) on the Azure side. This significantly reduces performance.
Is there a POST request where you can pass in users to match on? Is there a limit to the amount of data you can put in the filter on the GET request?
I tried to reproduce this and GET only the required subset of users from the bulk of users in Azure AD.
Use endsWith or startsWith as in the query below:
https://graph.microsoft.com/v1.0/users?$count=true&$search="displayName:room"&$filter=endsWith(mail, '@XXXXXXX.onmicrosoft.com')&$orderBy=displayName&$select=id,displayName,mail&$top=2
Make sure the ConsistencyLevel: eventual header is added. This returns only the top 2 matching users, as requested by $top=2.
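On the POST part of the question: Microsoft Graph also supports JSON batching, which packs up to 20 requests into a single POST to the $batch endpoint, so a list of 1,000 users costs roughly 50 round trips instead of 1,000. A rough Python sketch, assuming you already have a Graph access token:

# Sketch: check which UPNs from a local list exist in Azure AD using Microsoft
# Graph JSON batching ($batch accepts up to 20 requests per call).
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
headers = {"Authorization": "Bearer <access token>"}  # token acquisition not shown

def existing_users(upns):
    found = []
    for i in range(0, len(upns), 20):          # 20 requests per batch
        chunk = upns[i:i + 20]
        batch = {
            "requests": [
                {"id": str(n), "method": "GET",
                 "url": f"/users/{upn}?$select=id,userPrincipalName"}
                for n, upn in enumerate(chunk)
            ]
        }
        resp = requests.post(f"{GRAPH}/$batch", json=batch, headers=headers)
        resp.raise_for_status()
        for r in resp.json()["responses"]:
            if r["status"] == 200:             # 404 means that user does not exist
                found.append(chunk[int(r["id"])])
    return found

print(existing_users(["alice@contoso.com", "bob@contoso.com"]))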

where does azure form recognizer store results?

The Azure Cognitive Services Form Recognizer API accepts requests via a POST, and then makes the results available for 48 hours via a GET request to a resource:
POST "https://westus.api.cognitive.microsoft.com/formrecognizer/v2.1-preview.3/custom/models/{modelId}/analyze?includeTextDetails={boolean}
GET "https://westus.api.cognitive.microsoft.com/formrecognizer/v2.1-preview.3/custom/models/{modelId}/analyzeResults/{resultId}]
Presumably Azure stores these results somewhere, but I can't find any documentation regarding it. After 48 hours, is the data deleted or just made unavailable? Where does the data reside? Who owns the data? Does the account owner have access to the underlying storage account or database?
Check here in their official doc. It mentions "The following diagram illustrates how your data is processed," but I cannot see one 🤔
Anyway, answering ahead....
is the data deleted or just made unavailable?
deleted
The input data and results are deleted within 48 hours and not used
for any other purpose.
To learn more about privacy and security commitments, see the
Microsoft Trust Center and cognitive services compliance and privacy.
Where does the data reside? Who owns the data?
Azure internal 😕
The incoming data is processed in the same region where the Cognitive
Services Azure resource was created. When you submit your documents to
a Form Recognizer operation, it starts the process of analyzing the
document to extract all text and identify structure and key values in
a document. Your data and results are then temporarily encrypted and
stored in Azure Storage.
When you create a Form Recognizer resource in the Azure portal, you
specify a region. From then on, your resource and all of its
operations stay associated with that particular Azure server region.
Does the account owner have access to the underlying storage account or database?
Analyze Form Result API
The "Get Analyze Results" operation is authenticated against the same
API key that was used to call the "Analyze" operation to ensure no
other customer can access your data.
Azure temporarily stores the results for customers to retrieve: Analyze and
Get Results are asynchronous calls. In other words, the service
doesn't know when the customers will call the Get Results operation to
fetch the extracted results. To facilitate checking the completion
status and returning the extracted results to the customer upon
completion, the extracted results are stored temporarily in Azure
Storage. This behavior allows customers to poll the asynchronous Get
Results operation for job completion status and fetch the results upon
completion.
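To make that asynchronous Analyze / Get Analyze Results flow concrete, here is a rough Python sketch against the v2.1-preview.3 custom model endpoint shown in the question; the endpoint region, key, model ID, and file are placeholders, and field names may differ slightly between API versions.

# Sketch of the asynchronous Analyze / Get Analyze Results flow. Endpoint, key,
# model ID, and file path are placeholders.
import time
import requests

endpoint = "https://westus.api.cognitive.microsoft.com"
key = "<form recognizer key>"
model_id = "<modelId>"

analyze_url = (
    f"{endpoint}/formrecognizer/v2.1-preview.3/custom/models/{model_id}"
    "/analyze?includeTextDetails=true"
)

with open("invoice.pdf", "rb") as f:
    resp = requests.post(
        analyze_url,
        data=f.read(),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/pdf"},
    )
resp.raise_for_status()

# The service answers 202 Accepted with an Operation-Location header; the result
# ID in that URL is what you GET (within the 48-hour window) to fetch the
# temporarily stored results.
result_url = resp.headers["Operation-Location"]

while True:
    result = requests.get(result_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(2)

print(result["status"])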
There is already a similar ask from a user in the feedback forum; you can vote for that one too to get the product group's attention ✌

Row level access control in snowflake

I have a customer that owns a carpet cleaning business, and we have all of his different franchisees' data in a multi-tenant database model that we would like to move into a data warehouse in Snowflake. I don't want to have to build a separate database for each customer, because then I have to keep each database up to date with the latest data model; I want to use one data model to rule them all. I keep a tenant ID with each record to identify the franchisee's data. I want to give a set of credentials to each franchisee so they can hook up their analytics tool of choice (Tableau, Power BI, etc.) and only get access to the rows that are applicable to them. Is there a way to secure the rows they see in each table based on their user? In other words, some sort of row-level access control, similar to profiles in Postgres. Are there any better methods for handling this type of scenario? Ultimately I want to maintain and manage the least number of ELT jobs and data models.
This is the purpose of either Secure Views or Reader Accounts.
We are using both, and they have about the same technical hassle/setup cost, but we are using an internal tool to build/alter the schemas.
To expand on Simeon's answer:
You could have a single Snowflake account and create a Snowflake role & user for each franchisee. These roles would have access to a Secure View which uses the CURRENT_ROLE / CURRENT_USER context functions as in this example from the Snowflake documentation.
You'd have to have a role -> tenant ID "mapping table" which is used in the Secure View to limit the rows down to the correct franchisee.
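A rough sketch of that pattern, issued here through the Snowflake Python connector; the table, view, and role names are made up for illustration, and the secure view definition is only one example of the CURRENT_ROLE() approach:

# Sketch: row-level filtering via a secure view plus a role -> tenant mapping
# table. Table, view, and role names are illustrative only.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Mapping table: which role may see which tenant's rows.
cur.execute(
    "CREATE TABLE IF NOT EXISTS tenant_access (role_name STRING, tenant_id NUMBER)"
)
cur.execute("INSERT INTO tenant_access VALUES ('FRANCHISEE_42_ROLE', 42)")

# Secure view that only returns rows mapped to the caller's current role.
cur.execute("""
    CREATE OR REPLACE SECURE VIEW jobs_v AS
    SELECT j.*
    FROM jobs j
    JOIN tenant_access a
      ON j.tenant_id = a.tenant_id
     AND a.role_name = CURRENT_ROLE()
""")

# Each franchisee role is granted the view only, never the base table.
cur.execute("GRANT SELECT ON VIEW jobs_v TO ROLE FRANCHISEE_42_ROLE")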

How to store audit data in Azure

We're in the design phase for building an audit trail in an existing web application. The application runs on Windows Azure and uses a SQL Azure database.
The audit logs must be filterable by user or by object type (e.g. show all actions of a user, or show all actions performed on an object).
We have to choose how to store the data: should we use SQL Azure, or should we use table storage? We prefer table storage (cheaper).
However, the 'problem' with table storage is how to define the partition key. We have several thousand customers (the application users) in our SQL database, each in their own tenant. Using the tenant ID as partition key is not specific enough, so we have to add something to the partition key. So there's the issue: given the requirements for filtering, we can add a user ID to the partition key to make filtering by user easy, or we can add an object ID to make filtering by object easy.
So we see two possible solutions:
- use SQL Azure instead of table storage
- use table storage and use two tables with different partition keys, which means we duplicate all entries
Any ideas what's the best approach for our situation? Are there other, better solutions?
DocumentDB on Azure might be worth considering.
https://azure.microsoft.com/en-us/documentation/articles/documentdb-use-cases/
You can store the audit trail in DocDB as JSON documents (user, activity, and object fields) and index on all fields.
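A minimal sketch of what that could look like with the current azure-cosmos Python SDK (DocumentDB has since become Azure Cosmos DB); the account, database, container, and field names are illustrative:

# Sketch: writing one audit entry as a JSON document to Cosmos DB (formerly
# DocumentDB). Account, database, container, and fields are illustrative.
import uuid
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("auditing").get_container_client("trail")

container.create_item({
    "id": str(uuid.uuid4()),   # Cosmos DB requires an 'id' on every document
    "tenantId": "tenant-138",  # a reasonable partition key for a multi-tenant app
    "userId": 509,
    "action": "UpdatedCustomer",
    "objectType": "Customer",
    "objectId": 138,
    "timestamp": "2017-08-15T15:48:39Z",
})
# Properties are indexed by default, so filtering by user or by object type in a
# query needs no extra setup.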
Azure Table Storage is appropriate for storing log data; for example, Azure App Service uses Azure Table Storage to store its diagnostics logs.
I think you can consider setting the PartitionKey to the user's tenant name and the RowKey to the user's ID, as according to the Table Storage Data Model we only need to ensure that:
Together the PartitionKey and RowKey uniquely identify every entity within a table
Alternatively, could you clarify your concern about:
Using the tenant ID as partition key is not specific enough, so we have to add something to the partition key
Additionally, you can refer https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#overview for more info about design Azure Table Storage.
If there is any update, please feel free to let me know.
If you're worried about filtering in multiple ways - you could always write the same data to multiple partitions. It works really well. For example, in our app we have Staff and Customers. When there is an interaction we want to track/trace that applied to both of them (perhaps an over the phone Purchase), we will write the same information (typically as json) to our audit tables.
{
  "PurchaseId": 9485,
  "CustomerId": 138,
  "StaffId": 509,
  "ProductId": 707958,
  "Quantity": 20,
  "Price": 31.99,
  "Date": "2017-08-15 15:48:39"
}
And we will write that same row to the following partitions: Product_707958, Customer_138, Staff_509. The row key is the same across the three rows in each partition: Purchase_9485. Now if I want to go and query everything that has happened for a given staff, customer, or item, I just grab the whole partition. The storage is dirt cheap, so who cares if you write it to multiple places?
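As a sketch of that multi-partition write with the current azure-data-tables Python SDK (the connection string, table name, and purchase fields are just for illustration):

# Sketch: write the same audit row under three partitions (product, customer,
# staff) so each view of the data is a single-partition read.
from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection string>", table_name="Audit")

purchase = {
    "PurchaseId": 9485, "CustomerId": 138, "StaffId": 509,
    "ProductId": 707958, "Quantity": 20, "Price": 31.99,
    "Date": "2017-08-15T15:48:39",
}

for partition in ("Product_707958", "Customer_138", "Staff_509"):
    entity = {"PartitionKey": partition, "RowKey": "Purchase_9485", **purchase}
    table.create_entity(entity)

# Reading everything for one customer is then a single-partition query:
rows = list(table.query_entities("PartitionKey eq 'Customer_138'"))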
Also, an idea for you considering you have multiple tenants - you could make the table name Tenant_[SomeId]. There are some other issues you might have to deal with, but it is in a sense another key to get at schema-less data.
