Data classification in Unity Catalog of Azure Databricks

Question: Does Unity Catalog in Azure Databricks support classifying assets? If so, can someone please provide links to the online documentation for this feature in Unity Catalog? Please see the context below:
Unity Catalog is the Azure Databricks data governance solution for the Lakehouse, while Microsoft Purview is a data governance solution for on-premises, multicloud, and software-as-a-service (SaaS) data.
Data classification in the Microsoft Purview governance portal is a way of categorizing data assets by assigning unique logical tags or classes to them. Classification is based on the business context of the data. For example, you might classify assets by Passport Number, Driver's License Number, Credit Card Number, SWIFT Code, Person's Name, and so on.
When Purview scans your data storage, it identifies and classifies the data assets based on the data that matches a classification (logical tags such as Passport Number, Credit Card Number, etc.).
When you classify data assets, you make them easier to understand, search, and govern. Classifying data assets also helps you understand the risks associated with them.

Unity Catalog right now doesn't provide such classifications. There were roadmap presentations that included the notion of attribute classes that could also be used to classify objects and to assign permissions for accessing data marked with a specific attribute (attribute-based access control). But right now there is no specific timeline for that feature.
It is, however, possible to use SQL to add comments at both table and column level, although that may not be very convenient.
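For illustration, a minimal sketch of that workaround, assuming a Databricks notebook where spark is predefined; the catalog, table, and column names (main.hr.employees, passport_no) are made up:

    # Tag a table with a classification-style comment
    spark.sql("""
        COMMENT ON TABLE main.hr.employees
        IS 'Contains PII: passport numbers, names'
    """)

    # Tag an individual column the same way
    spark.sql("""
        ALTER TABLE main.hr.employees
        ALTER COLUMN passport_no
        COMMENT 'Classification: Passport Number'
    """)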
P.S. Potentially, a Purview <-> Unity Catalog integration will allow you to use Purview's UI to classify data in UC, but I haven't seen such an integration yet.

Related

Azure Data Catalog Backup

Since ADC is provided by MS as SaaS to customers, is MS taking backups of the dataset and business glossary? If yes, how often and how can a customer get access to the backups for recovery purposes?
Unfortunately, there is no explicit backup/restore feature available for catalogs.
I would suggest you vote up an idea submitted by another Azure customer:
https://feedback.azure.com/forums/906052-data-catalog/suggestions/33125845-azure-data-catalog-backup-feature
All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.
The closest way to achieve this with current functionality is to use the Azure Data Catalog REST API to extract all assets and persist them locally (and re-import them manually later).
There is a sample application available that demonstrates this technique: Data Catalog Import/Export sample tool.
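As a rough sketch of that extraction step in Python (the catalog name and token acquisition are assumptions; check the Data Catalog REST API reference for the exact contract):

    # Sketch: export Azure Data Catalog assets for local backup.
    # Assumes an AAD bearer token with access to the catalog;
    # "DefaultCatalog" is a placeholder catalog name.
    import json
    import requests

    CATALOG = "DefaultCatalog"
    TOKEN = "<aad-bearer-token>"   # acquire via AAD, e.g. with MSAL

    # Search with a wildcard to enumerate assets (page via startPage)
    url = f"https://api.azuredatacatalog.com/catalogs/{CATALOG}/search/search"
    params = {"api-version": "2016-03-30", "searchTerms": "*",
              "count": 100, "startPage": 1}
    resp = requests.get(url, params=params,
                        headers={"Authorization": f"Bearer {TOKEN}"})
    resp.raise_for_status()

    # Persist the raw results locally for a later manual re-import
    with open("adc_backup_page1.json", "w") as f:
        json.dump(resp.json(), f, indent=2)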

How to modify Azure Analysis Services roles using a Logic App?

With Azure Data Factory I have built a pipeline to orchestrate the processing of my Azure Analysis Services model through a dedicated Logic App, as explained in this article, and it works properly.
Now, still using Azure Data Factory (through the Logic App), I would also like to update the list of users in a specific role.
In the article mentioned above, to process the Azure Analysis Services models, the Logic App calls a specific API that has the following format:
https://<rollout>.asazure.windows.net/servers/<serverName>/models/<resource>/refreshes
but this API doesn't seem to work for updating the model's roles.
Is there anyone who knows the correct method to be able to update model roles using a specific Logic App?
Thanks for any suggestions
If you don't necessarily need to use the Logic App for this, it might be possible using Azure Automation and the PowerShell cmdlets for managing Azure Analysis Services:
https://learn.microsoft.com/en-us/azure/analysis-services/analysis-services-refresh-azure-automation
https://learn.microsoft.com/en-us/azure/analysis-services/analysis-services-powershell
https://learn.microsoft.com/en-us/powershell/module/sqlserver/Add-RoleMember?view=sqlserver-ps
One alternative approach might be to have fixed AD groups as members of the tabular model roles and add or remove members from those AD groups. That way the tabular model roles would never need to change; it would simply be a matter of adding or removing members from the AD groups as part of your governance process.
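A minimal sketch of that pattern using the Microsoft Graph REST API from Python (the group/user object IDs, token, and required Graph permission are assumptions; a Logic App could do the same with an HTTP action):

    # Sketch: add a user to the fixed AD group backing a tabular model
    # role, instead of editing the role itself. Assumes an AAD token
    # with Group.ReadWrite.All; the IDs below are placeholders.
    import requests

    GROUP_ID = "<aad-group-object-id>"
    USER_ID = "<aad-user-object-id>"
    TOKEN = "<aad-bearer-token>"

    url = f"https://graph.microsoft.com/v1.0/groups/{GROUP_ID}/members/$ref"
    body = {"@odata.id":
            f"https://graph.microsoft.com/v1.0/directoryObjects/{USER_ID}"}
    resp = requests.post(url, json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"})
    resp.raise_for_status()  # Graph returns 204 No Content on success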
A second approach would be to use dynamic row-level security. Adding records to an Azure SQL DB table is perfectly possible with Logic Apps and could be used to drive security, depending on your requirements. You can then refresh your security dimension with the Logic App. See here for more details:
https://learn.microsoft.com/en-us/power-bi/desktop-tutorial-row-level-security-onprem-ssas-tabular
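To illustrate the security-table idea (the table and column names are invented, and the connection string is redacted; a Logic App's SQL connector could issue the same insert):

    # Sketch: drive dynamic row-level security from an Azure SQL table.
    # The tabular model's RLS filter would compare USERNAME() against
    # rows in this table. Requires the pyodbc package and an ODBC driver.
    import pyodbc

    conn = pyodbc.connect("<azure-sql-connection-string>")
    cursor = conn.cursor()

    # Grant user@contoso.com visibility of (hypothetical) region 42
    cursor.execute(
        "INSERT INTO dbo.UserSecurity (UserPrincipalName, RegionKey) "
        "VALUES (?, ?)",
        "user@contoso.com", 42)
    conn.commit()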
To answer your question, however: the Azure Analysis Services REST API is useful but not fully featured, i.e. it does not cover all possible operations for tabular models or the service. Another missing example I found was backups: although it is possible to trigger a pause or resume of the service, it is not possible to trigger a backup of a tabular model via the REST API. I do not believe it is possible to alter role members, or at least the operation is not listed in the REST API, although I'm happy to be corrected if I am wrong. To be more specific, Roles is not mentioned in the list of available objects which can be passed into the Objects array of the POST /refreshes call (e.g. here); table and partition are the only ones I'm aware of.
There are also no examples on the Microsoft GitHub site:
https://github.com/microsoft/Analysis-Services
Finally, consider calling TMSL via PowerShell in an Azure Function, which you can call from Azure Data Factory.
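For reference, the TMSL that overwrites a role's membership looks roughly like the sketch below (the database, role, and member names are placeholders); the Function could build this JSON and execute it, e.g. with Invoke-ASCmd from the SqlServer PowerShell module:

    # Sketch: build a TMSL createOrReplace command that redefines a
    # role's members. All names are placeholders; the JSON string is
    # what you would hand to Invoke-ASCmd (or similar) for execution.
    import json

    tmsl = {
        "createOrReplace": {
            "object": {"database": "SalesModel", "role": "Readers"},
            "role": {
                "name": "Readers",
                "modelPermission": "read",
                "members": [
                    {"memberName": "user1@contoso.com"},
                    {"memberName": "user2@contoso.com"},
                ],
            },
        }
    }
    print(json.dumps(tmsl, indent=2))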
HTH

Does MS use the data on Azure for internal machine learning?

My organization wants to use the Microsoft Text Analytics API for sentiment analysis, but my employer's concern is that Microsoft will be using that data for live training of their sentiment engine. Is this the case?
See the Microsoft Trust Center for your answer.
How we manage your data
With Microsoft, you are the owner of your customer data. Microsoft will use your customer data only to provide the services we have agreed upon, and for purposes that are compatible with providing those services. We do not share your data with our advertiser-supported services, nor do we mine it for marketing or advertising. If you leave the service, we take the necessary steps to ensure the continued ownership of your data.

Azure Application Insights for Service Fabric

I have multiple services running on Service Fabric. I would like to add Application Insights for logging. I'm just wondering whether I have to add an Application Insights resource for each microservice, or whether one common resource is enough for all. What is the best practice?
There is no such thing as the best practice for this. It really depends. Some considerations:
Pricing: depending on the level (basic or enterprise) you will get an amount of data for free / included in the base price; see the docs. So in some cases, depending on the amount of traffic, you can reduce costs by having a dedicated AI resource per service. AI resources for services that send data below the threshold of the AI pricing plan are then (almost) free.
Querying: if you split up services per AI resource, getting an overview of the whole system is difficult, since at the moment you cannot create queries spanning multiple AI resources.
Responsibility: if you have multiple teams working on multiple services, it might be an option to have an AI resource per team, so they have good insight into only the parts they are responsible for.
If you do decide to use a shared AI resource, there are options like custom telemetry initializers to include custom data that further identifies which ASF application or service is sending the data, if it is not included by default.
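To illustrate the idea (shown here with the Python applicationinsights package for brevity; a .NET Service Fabric service would implement ITelemetryInitializer instead, and the key and names below are fake):

    # Sketch: stamp every telemetry item with the originating Service
    # Fabric application/service so a shared AI resource stays queryable.
    from applicationinsights import TelemetryClient

    tc = TelemetryClient("<instrumentation-key>")
    tc.context.properties["sfApplication"] = "fabric:/MyApp"
    tc.context.properties["sfService"] = "fabric:/MyApp/OrderService"

    tc.track_event("OrderReceived")  # carries the custom properties
    tc.flush()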
See also Add Application Insights to an existing Azure Service Fabric cluster for more info about how to integrate AI.
Now, when it comes to bringing data together, you do have some additional options that may or may not need additional services or configuration. For example:
Power BI: you can visualize data from AI resources using dashboards; see https://learn.microsoft.com/en-us/azure/application-insights/app-insights-export-power-bi
OMS: Operations Management Suite; see https://blogs.technet.microsoft.com/msoms/2016/09/26/application-insights-connector-in-oms/. As Jesse mentions, you can link multiple AI resources.
Custom dashboards: using the REST API you can create your own solution that displays data for one or more AI resources.
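As a rough sketch of that last option (the app ID and API key are placeholders; run one query per AI resource and merge the results yourself):

    # Sketch: pull data from an AI resource via the Application Insights
    # REST API to feed a custom dashboard.
    import requests

    APP_ID = "<application-id>"
    API_KEY = "<api-key>"

    url = f"https://api.applicationinsights.io/v1/apps/{APP_ID}/query"
    resp = requests.get(url,
                        params={"query": "requests | summarize count() by name"},
                        headers={"x-api-key": API_KEY})
    resp.raise_for_status()
    print(resp.json())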

Azure Search with multiple indexes

I need to enable full-text and faceted search for a service that stores each customer's data in a separate Azure SQL database. Each database in turn stores that customer's data for multiple projects; a database can contain any number of projects. Each customer's project data is accessed as an isolated data repository, so I need search and facets to be limited to each project's data. Since Azure Search supports a finite number of indexes, I am not sure how to best leverage it in my scenario. Moreover, the searchable data varies across projects, so the columns in the index will differ from project to project in each database.
How do I best address this problem with Azure Search?
Take a look at the Design patterns for multitenant SaaS applications and Azure Search. In particular, in some cases you can share an index across tenants and use filters to isolate data - see this section. The drawback of this approach is that sharing data across tenants can affect search relevance (since term frequency / document frequency are scoped to an index), but in many scenarios this is acceptable.
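A minimal sketch of the shared-index-plus-filter pattern with the azure-search-documents Python SDK (the endpoint, key, index name, and the filterable project_id field are all assumptions):

    # Sketch: scope queries in a shared index to a single project via
    # a filter; facets are computed over the filtered result set too.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="shared-index",
        credential=AzureKeyCredential("<query-key>"),
    )

    results = client.search(
        search_text="invoice",
        filter="project_id eq 'project-42'",  # isolate one project's data
        facets=["category"],
    )
    for doc in results:
        print(doc)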
