Mapping a JSON object to an ADX dynamic column in Azure Data Factory

I am trying to ingest a JSON blob file into an Azure Data Explorer table using the Copy activity in Azure Data Factory, so I can transform and process the data afterward.
Is there a way to map the 'Properties' object to a dynamic column in ADX? I have tried creating an ingestion mapping in ADX and referencing it in the sink settings of the Copy activity, but I got an error saying the mapping reference was not found.
If anyone has achieved this before, advice would be welcome. Thanks

You can ingest JSON data into an ADX table using an ingestion mapping, as described here:
https://learn.microsoft.com/en-us/azure/data-explorer/ingest-json-formats?tabs=kusto-query-language#ingest-mapped-json-records
You don't need to create a Copy pipeline; instead, use the Azure Data Explorer Command activity from ADF and run the following as a command:
.ingest into table Events ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') with '{"format":"json", "ingestionMappingReference":"FlatEventMapping"}'
I have ingested JSON stored in a blob container into a table in ADX using this approach.
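As a sketch of what that approach looks like end to end (the table, column, and mapping names below are illustrative, not from the original post): a nested object such as 'Properties' can be landed in a dynamic column by pointing its mapping path at the object's root, so the whole subtree is stored as-is. The mapping must exist on the target table before anything references it by name, which is one common cause of a "mapping reference not found" error.

```kusto
// Hypothetical target table with a dynamic column for the nested object.
.create table Events (Timestamp: datetime, DeviceId: string, Properties: dynamic)

// JSON ingestion mapping: the "$.properties" path maps the whole nested
// object into the dynamic column unchanged.
.create table Events ingestion json mapping 'EventMapping' '[{"column":"Timestamp","path":"$.timestamp"},{"column":"DeviceId","path":"$.deviceId"},{"column":"Properties","path":"$.properties"}]'

// Ingest the blob, referencing the mapping by name.
.ingest into table Events ('https://<storageaccount>.blob.core.windows.net/<container>/events.json')
  with '{"format":"json", "ingestionMappingReference":"EventMapping"}'
```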

Related

File is not read completely by Copy Data in Azure Data Factory

I'm developing a pipeline that inserts data from a .txt file located in Blob Storage into a table in a SQL database.
Problem: Somehow the activity configuration is not working properly, because it is not reading all the records in the file and consequently is not loading all the data into the database. (I noticed this issue when I opened the file and compared the number of records in the .txt file against the SQL table; also, when I searched the table for records from the last month, I didn't find them.)
Note: I checked the character size limit of the columns in the SQL table, and that isn't the problem.
I'd like to share with you the Data Copy activity and Source Data Set configuration as well:
Sink Dataset:
Do you guys know what I'm doing wrong here? Hope you can help me, best regards.
P.S. Here's the Source Dataset
As discussed in the comments, when using the Copy activity you have to make sure to set the schema before running the activity. By design, the schema mapping is left empty and has to be configured by the user, either manually or by asking ADF to import the schema from the dataset.
Note: When the Auto create table option is used in the sink, it automatically creates the sink table (if it doesn't exist) based on the source schema, but this isn't supported when a stored procedure is specified on the sink side or when staging is enabled.
When using the COPY statement to load data into Azure Synapse Analytics as the sink, the connector supports automatically creating the destination table with DISTRIBUTION = ROUND_ROBIN (if it doesn't exist) based on the source schema.
Refer to the official doc: Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines
So Azure Synapse will be used as the sink. Additionally, an Azure Synapse table has to be created that matches the column names, column order, and column data types of the source.
For dynamic mapping:
If you view the pipeline code, you can see in the translator section the JSON equivalent of the mapping configured in the UI.
You can reuse this as a base for dynamic mapping, so similar files can be copied without having to manually configure the schema each time.
Copy the JSON under mappings in the translator section.
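As a rough illustration (the column names here are hypothetical, not from the original post), the translator JSON you copy looks something like the sketch below, and can then be supplied to the Copy activity's mapping property as dynamic content:

```json
{
  "type": "TabularTranslator",
  "mappings": [
    {
      "source": { "path": "$['id']" },
      "sink": { "name": "Id", "type": "Int32" }
    },
    {
      "source": { "path": "$['name']" },
      "sink": { "name": "Name", "type": "String" }
    }
  ]
}
```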

How to incrementally load data from Azure Blob storage to Azure SQL Database using Data Factory?

I have a json file stored in Azure Blob Storage and I have loaded it into Azure SQL DB using Data Factory.
Now I would like to find a way to load only new records from the file into my database (as the file is updated every week or so). Is there a way to do it?
Thanks!
You can use the upsert (slowly changing dimension type 1) pattern that is already implemented in Azure Data Factory.
It will add new records and update old records that have changed.
Here's a quick tutorial:
https://www.youtube.com/watch?v=MzHWZ5_KMYo
I would suggest using the Data Flow activity.
In the Data Flow activity, you have the Alter Row transformation, as shown in the image below.
In Alter Row you can use the 'Upsert if' condition.
Here, set the condition to 1 == 1 so that every row is upserted.
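For reference, the underlying data flow script for such a flow looks roughly like the sketch below (the stream names and key column are made up for illustration; the upsertIf(1==1) condition marks every incoming row for upsert):

```
source(allowSchemaDrift: true) ~> Source1
Source1 alterRow(upsertIf(1==1)) ~> AlterRow1
AlterRow1 sink(deletable: false,
    insertable: false,
    updateable: false,
    upsertable: true,
    keys: ['Id']) ~> Sink1
```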

Azure Data Factory: read from csv and copy row by row to a cosmos db

I'm new to Azure Data Factory. I'm trying to solve the following problem:
Read csv file from Azure Blob
Parse it row by row and dump each row into an existing cosmos db
I am currently looking into a solution that does:
Copy data from source (csv) to sink (Azure Storage Table)
ForEach activity that parses the table and copies the rows into the db
Is this a correct approach, and if it is, how should I set up the dynamic content of the ForEach activity?
Note:
I've tried this solution (link) but I get an error message saying
Reading or replacing offers is not supported for serverless accounts
which means that
CosmosDB Serverless is not currently supported as Sink for the Data flow in Azure Data Factory.
If you use Lookup + ForEach activities, the ForEach Items should be:
@activity('Lookup1').output.value
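In pipeline JSON, that pairing looks roughly like the sketch below (activity names are illustrative); note that the Lookup activity needs firstRowOnly set to false so that output.value contains the full array of rows:

```json
{
  "name": "ForEach1",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('Lookup1').output.value",
      "type": "Expression"
    },
    "activities": []
  }
}
```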
Your solution may be hard to achieve.
Since you have found that Data Flow doesn't support Cosmos DB Serverless, you may refer to this tutorial: Copy Data From Blob Storage To Cosmos DB Using Azure Data Factory
It uses the Copy activity to copy data from a csv file in Blob Storage to Azure Cosmos DB directly.

Is it possible to update row values in tables from Azure Data Factory?

I have a dataset in Data Factory, and I would like to know if it is possible to update row values using only Data Factory activities, without data flows, stored procedures, queries...
There is a way to do an update (and probably any other SQL statement) from Data Factory, though it's a bit hacky.
The Lookup activity can execute a set of statements in Query mode, i.e.:
The only condition is to end it with a SELECT, otherwise the Lookup activity throws an error.
This works for Azure SQL, PostgreSQL, and most likely for any other database Data Factory can connect to.
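A minimal sketch of such a query (the table and column names are made up for illustration): the UPDATE does the real work, and the trailing SELECT just gives the Lookup activity a result set to return so it doesn't fail:

```sql
-- Runs inside a Lookup activity in Query mode.
UPDATE dbo.Orders
SET    Status = 'Processed'
WHERE  OrderDate < '2021-01-01';

-- The Lookup activity requires a result set, so end with a SELECT.
SELECT 1 AS Done;
```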
Concepts:
Datasets:
Datasets represent data structures within data stores; a dataset is a named view that simply points to or references the data you want to use in your activities as inputs or outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
Currently, in my experience, it's impossible to update row values using only Data Factory activities. Azure Data Factory doesn't support this now.
For more details, please refer to:
Datasets
Datasets and linked services in Azure Data Factory.
For example, when I use the Copy activity, Data Factory doesn't provide me any way to update the rows:
Hope this helps.
This is now possible in Azure Data Factory: your data flow should have an Alter Row stage, and the sink has a drop-down where you can select the key column for doing updates.
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
As mentioned in the above comment regarding ADF data flows, a data flow does not support an on-premises sink or source; the sink and source should reside in Azure SQL Database, Azure Data Lake, or other Azure data services.

Error trying to copy data from Azure SQL database to Azure Blob Storage

I have created a pipeline in Azure Data Factory (V1). I have a copy pipeline that has an AzureSqlTable dataset as input and an AzureBlob dataset as output. The AzureSqlTable dataset that I use as input is created as the output of another pipeline. In this pipeline I launch a procedure that copies one table entry to a blob csv file.
I get the following error when launching pipeline:
Copy activity encountered a user error: ErrorCode=UserErrorTabularCopyBehaviorNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=CopyBehavior property is not supported if the source is tabular data source.,Source=Microsoft.DataTransfer.ClientLibrary,'.
How can I solve this?
According to the error information, this action is not supported by Azure Data Factory; however, using an Azure SQL table as input and Azure Blob data as output should be supported.
I also ran a demo test with the Azure portal. You can follow these detailed steps:
1. Click Copy Data from the Azure portal.
2. Set the copy properties.
3. Select the source.
4. Select the destination data store.
5. Complete the deployment.
6. Check the result in Azure Storage.
Update:
If we want to use an existing dataset, we can choose [From Existing Connections]; for more information please refer to the screenshot.
Update2:
For Data Factory (V1) copy activity settings, only existing Azure Blob Storage / Azure Data Lake Store datasets are supported. For more detailed information please refer to this link.
If using Data Factory (V2) is acceptable, we can use an existing Azure SQL dataset.
So, actually, if we don't use this awful "Copy data (PREVIEW)" wizard and instead add a copy activity to an existing pipeline rather than creating a new pipeline, everything works. So the solution is to add the copy activity manually into an existing pipeline.
