I have an Azure Table storage table with thousands of records. I need to mask certain sensitive data, e.g.:
CustomerName: Tim Captain as T***** C****
BSBNumber: 0342 8765 as ***8765
Good day!
Unlike Azure SQL Database/Synapse, which provide Dynamic Data Masking, Azure Table storage has no such functionality.
If you need masking, you would have to apply custom logic to mask the data during ingestion and, based on access needs, apply reverse logic to unmask it while extracting.
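For example, if the records are loaded from a SQL source, the masking shown above could be applied in the extraction query before rows ever reach Table storage. A minimal sketch, assuming hypothetical table and column names (including separate first/last name columns):

    -- hypothetical names; keep only the first letter of each name part
    SELECT
        LEFT(FirstName, 1) + REPLICATE('*', LEN(FirstName) - 1) + ' ' +
        LEFT(LastName, 1) + REPLICATE('*', LEN(LastName) - 1) AS CustomerName,
        '***' + RIGHT(REPLACE(BSBNumber, ' ', ''), 4) AS BSBNumber  -- 0342 8765 -> ***8765
    FROM dbo.Customers;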
Related
We are using Azure Data Factory to move data from sources such as Azure SQL and Azure PostgreSQL to a destination in Azure Data Lake. There is some sensitive data which needs to be masked.
Is it possible to do data masking in Azure Data Factory during the transformation phase only?
Thanks in advance!
You can leverage the cryptographic functions of the source DB in the SELECT statement to land encrypted data in the data lake. If you use a reversible function, you can decrypt later on.
You can also mask the data using a SQL function (e.g. selecting only a substring of a sensitive column), but then it won't be reversible (the same applies if you use Dynamic Data Masking on the Azure SQL DB).
Here are the cryptographic functions for Azure SQL DB.
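For instance, ENCRYPTBYPASSPHRASE/DECRYPTBYPASSPHRASE form such a reversible pair. A minimal sketch, assuming hypothetical table, column, and passphrase values:

    -- in the copy activity's source query: land an encrypted copy in the lake
    SELECT CustomerId,
           ENCRYPTBYPASSPHRASE('MySecretPassphrase', CustomerName) AS CustomerNameEnc
    FROM dbo.Customers;

    -- later, decrypt wherever clear text is permitted
    SELECT CustomerId,
           CONVERT(nvarchar(200),
                   DECRYPTBYPASSPHRASE('MySecretPassphrase', CustomerNameEnc)) AS CustomerName
    FROM dbo.CustomersEncrypted;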
Hi, there is an option for dynamic data masking in the portal where you have deployed the database. You can go there and select the table and column to mask your data:
https://learn.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview?view=azuresql
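For reference, the same masks can also be defined in T-SQL rather than through the portal. A minimal sketch with hypothetical table and column names:

    -- expose only the first character of the value
    ALTER TABLE dbo.Customers
        ALTER COLUMN CustomerName ADD MASKED WITH (FUNCTION = 'partial(1, "*****", 0)');

    -- expose only the last four characters
    ALTER TABLE dbo.Customers
        ALTER COLUMN BSBNumber ADD MASKED WITH (FUNCTION = 'partial(0, "***", 4)');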
I have a dataset in Data Factory, and I would like to know whether it is possible to update row values using only Data Factory activities, without data flows, stored procedures, queries...
There is a way to run an UPDATE (and probably any other SQL statement) from Data Factory, though it's a bit hacky.
The Lookup activity can execute a set of statements in Query mode, e.g. (a sketch with hypothetical table and column names):
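    -- hypothetical names; the UPDATE does the real work
    UPDATE dbo.MyTable
    SET Status = 'Processed'
    WHERE Id = 1;

    -- dummy result set so the Lookup activity has something to return
    SELECT 1 AS Done;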
The only condition is to end with a SELECT, otherwise the Lookup activity throws an error.
This works for Azure SQL, PostgreSQL, and most likely for any other DB Data Factory can connect to.
Concepts:
Datasets:
Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
Now, a dataset is a named view of data that simply points or references the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
Currently, in my experience, it's impossible to update row values using only Data Factory activities; Azure Data Factory doesn't support this today.
For more details, please see:
Datasets
Datasets and linked services in Azure Data Factory.
For example, when I use the Copy activity, Data Factory doesn't give me any way to update the rows.
Hope this helps.
This is now possible in Azure Data Factory: your data flow should have an Alter Row stage, and the sink has a drop-down where you can select the key column for doing updates.
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
As mentioned in the comment above regarding ADF data flows: a data flow does not support an on-premises sink or source; the sink and source should reside in Azure SQL, Azure Data Lake, or other Azure data services.
I'm trying to find the best solution for storing dynamic spatial data. I wonder if any of Microsoft's Azure solutions could work. Azure Table Storage would let me create a lot of custom and dynamic structures stored on fast SSD disks.
Because of the data's dynamic nature, conventional indexing seems useless. I would also like to create a lot of table-like structures, so the whole architecture cannot be static. Using Azure Table Storage I would dynamically create a table based on country, city, etc., sorted by latitude or longitude.
I would appreciate any pointers.
Azure Table Storage has mostly been replaced by Azure Cosmos DB.
At the time of writing, the Table Storage page even says:
The content in this article applies to the original basic Azure Table storage. However, there is now a premium offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure Cosmos DB: Table API.
You can use Cosmos DB via the Table API, but you'll probably find the Document DB API to be more powerful.
Documents are "schema-free". You can just throw your documents into a collection, and then you can query against them.
You can create documents which have geo-spatial properties which are indexed automatically.
Then you can perform geo-spatial queries against those properties.
For example you might give each of your documents a point, and then create a query to select all documents that are inside of a polygon.
Or maybe you want to find out how far away each document is from a given point.
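For example, using the built-in spatial functions of the SQL query dialect; a minimal sketch, assuming each document carries a hypothetical GeoJSON location property:

    SELECT c.id
    FROM c
    WHERE ST_WITHIN(c.location, {
        "type": "Polygon",
        "coordinates": [[[31.8, -5.0], [32.0, -5.0], [32.0, -4.7], [31.8, -4.7], [31.8, -5.0]]]
    })

ST_DISTANCE(c.location, {"type": "Point", "coordinates": [31.9, -4.8]}) likewise returns the distance in meters from a given point, so it can drive a "within radius" filter.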
I have two custom-code DLLs for images from IP cams.
DLL one: extracts images from IP cams and stores them in Azure Data Lake Store.
Like:
/adls/clinic1/patientimages
/adls/clinic2/patientimages
DLL two: uses those images, extracts information from them, and loads the data into RDBMS tables.
So, for instance, in the RDBMS, say there are entities dimpatient, dimclinic and factpatientVisit.
To start, a one-time export of the data can be made to defined locations in Azure Data Lake Store.
Like:
/adls/dimpatient
/adls/dimclinic
/adls/factpatientVisit
Question:
How do we push incremental data into the same files, or how can we handle this incremental load in Azure Data Lake Analytics?
This is like implementing a warehouse in Azure Data Lake Analytics.
Note: Azure SQL DB or any other storage offered by Azure is not wanted.
I mean, why spend on other Azure services if one type of storage is capable of holding all types of data?
adls is the name of my ADLS store.
I am not sure I completely understand your question, but you can organize your data files in Azure Data Lake Store, or your rows in partitioned U-SQL tables, along a time dimension, so you can add a new partition/file for each increment. In general, though, we recommend that such increments be of substantial size to preserve the ability to scale.
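A minimal U-SQL sketch of the table-based option; every name, path, and the partition scheme here are hypothetical:

    // a U-SQL table partitioned along a time dimension
    CREATE TABLE IF NOT EXISTS dbo.FactPatientVisit
    (
        VisitDate DateTime,
        PatientId int,
        ClinicId int,
        INDEX idx_visit CLUSTERED(PatientId)
        PARTITIONED BY (VisitDate)
        DISTRIBUTED BY HASH (PatientId)
    );

    // each incremental load: read the new rows and land them in a new partition
    @newRows =
        EXTRACT VisitDate DateTime,
                PatientId int,
                ClinicId int
        FROM "/adls/staging/factpatientvisit_increment.csv"
        USING Extractors.Csv();

    // rows must match an existing partition value (here: VisitDate == 2018-06-01)
    ALTER TABLE dbo.FactPatientVisit ADD PARTITION (new DateTime(2018, 6, 1));

    INSERT INTO dbo.FactPatientVisit
    SELECT VisitDate, PatientId, ClinicId
    FROM @newRows;

The file-based alternative is simply to write each increment to a new dated file or folder under the same root, e.g. /adls/factpatientVisit/2018/06/01/, rather than appending to a single file.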
I have an application that looks up data for a page. The data is looked up by partition key and row key in Table storage.
I am considering SQL Azure storage. Is there some advantage in moving to this kind of storage, given that the lookup will always be very direct? Note that I do NOT need any reporting. ALL I want is single-row lookup.
Assuming that your requirements are fully stated (you will only ever need single-row access), and assuming that you only want to know about advantages and not disadvantages, the only advantages I can think of are that SQL Azure offers:
time-based subscription pricing instead of pricing per transaction
options for backup (in CTP)
options for replication/synchronisation
more client library options (e.g. Entity Framework, Linq2SQL, etc)
more data types supported
more options for moving your app outside of Azure if you ever want to
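If you do go the SQL Azure route, a Table-storage-style point lookup maps naturally onto a composite clustered primary key. A minimal sketch with hypothetical names:

    -- hypothetical schema mirroring Table storage's PartitionKey/RowKey addressing
    CREATE TABLE dbo.Pages
    (
        PartitionKey nvarchar(64) NOT NULL,
        RowKey nvarchar(64) NOT NULL,
        Payload nvarchar(max),
        CONSTRAINT PK_Pages PRIMARY KEY CLUSTERED (PartitionKey, RowKey)
    );

    -- a single-row lookup is then a direct clustered-index seek
    SELECT Payload
    FROM dbo.Pages
    WHERE PartitionKey = N'page-group-1' AND RowKey = N'page-42';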
Use Table Storage if you don't need relational database functionality.