File is not read completely by Copy Data in Azure Data Factory - azure

I'm developing a pipeline that should be able to insert data from a .txt file located in Blob Storage into a table in a SQL database.
Problem: Somehow the activity configuration is not working properly, because it is not reading all the records in the file and consequently is not loading all the data into the database. (I realized this when I opened the file and compared the number of records in the .txt file against the SQL table. Also, when I searched for records from the last month in the SQL table, I didn't find them.)
Note: I checked the character size limits of the columns in the SQL table, and that isn't the problem.
I'd like to share the Copy Data activity and Source Dataset configuration with you as well:
Sink Dataset:
Do you know what I'm doing wrong here, guys? Hope you can help me. Best regards.
P.S. Here's the Source Dataset

As discussed in the comments, when using the Copy activity you have to make sure the schema mapping is set before running the activity. By design the schema mapping is left empty and has to be configured by the user, either manually or by asking ADF to import the schema from the dataset.
Note: The Auto create table option in the sink automatically creates the sink table (if it doesn't exist) based on the source schema, but it isn't supported when a stored procedure is specified on the sink side or when staging is enabled.
When using the COPY statement to load data into Azure Synapse Analytics as the sink, the connector supports automatically creating the destination table with DISTRIBUTION = ROUND_ROBIN, if it doesn't exist, based on the source schema.
Refer to the official doc: Copy and transform data in Azure Synapse Analytics by using Azure Data Factory or Synapse pipelines
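For context, a minimal sketch of what this could look like in the copy activity's sink JSON, assuming an Azure Synapse sink with the COPY command and auto-create enabled (an illustration, not your exact configuration):

```json
"sink": {
    "type": "SqlDWSink",
    "allowCopyCommand": true,
    "tableOption": "autoCreate"
}
```

With tableOption set to autoCreate, the destination table is created from the source schema if it doesn't already exist.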
[Source and sink configuration screenshots]
So Azure Synapse will be used as the sink. Additionally, an Azure Synapse table has to be created which matches the column names, column order, and column data types of the source.
For dynamic mapping:
If you view the pipeline code, you can see in the translator section the JSON equivalent of the mapping section from the UI.
You can reuse this as a base for dynamic mapping, so that similar files can be copied later without having to manually configure the schema.
Copy the JSON under mappings in the translator.
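For illustration, the mappings under the translator typically look something like the sketch below (the column names are placeholders, not your actual columns):

```json
"translator": {
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": { "name": "Prop_0", "type": "String" },
            "sink": { "name": "CustomerId", "type": "String" }
        },
        {
            "source": { "name": "Prop_1", "type": "String" },
            "sink": { "name": "OrderDate", "type": "DateTime" }
        }
    ]
}
```

You can paste this JSON (or a parameterized version of it) into the dynamic content of the mapping to reuse it for similar files.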

Related

How to incrementally load data from Azure Blob storage to Azure SQL Database using Data Factory?

I have a JSON file stored in Azure Blob Storage and I have loaded it into Azure SQL DB using Data Factory.
Now I would like to find a way to load only the new records from the file into my database (the file is updated every week or so). Is there a way to do it?
Thanks!
You can use the upsert (slowly changing dimension type 1) pattern that is already implemented in Azure Data Factory.
It will add new records and update old records that have changed.
Here's a quick tutorial:
https://www.youtube.com/watch?v=MzHWZ5_KMYo
I would suggest you use the Data Flow activity.
In the Data Flow activity, you have the Alter Row transformation, as shown in the image below.
In Alter Row you can use the 'Upsert if' condition.
Here, set the condition to 1 == 1.
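As a side note, upsert can also be configured directly on a Copy activity's Azure SQL sink instead of going through a data flow. A minimal sketch of that sink JSON, assuming a key column named Id (a placeholder):

```json
"sink": {
    "type": "AzureSqlSink",
    "writeBehavior": "upsert",
    "upsertSettings": {
        "useTempDB": true,
        "keys": [ "Id" ]
    }
}
```

Rows whose key already exists in the target table are updated; rows with new keys are inserted.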

Mapping JSON Object to ADX dynamic column in Azure Data Factory

I am trying to ingest a JSON file blob into an Azure Data Explorer table using the Copy activity in Azure Data Factory, so that I can transform and process the data afterward.
Is there a way to map the 'Properties' object to a dynamic column in ADX? I have tried creating an ingestion mapping in ADX and referencing that in the sink settings of the copy. I got an error that the mapping reference was not found.
If anyone has achieved this before, advice would be welcome. Thanks
You can ingest JSON data into an ADX table by using this:
https://learn.microsoft.com/en-us/azure/data-explorer/ingest-json-formats?tabs=kusto-query-language#ingest-mapped-json-records
You don't need to create a copy pipeline; instead, use the Azure Data Explorer Command activity from ADF and run the following as a command:
.ingest into table Events ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') with '{"format":"json", "ingestionMappingReference":"FlatEventMapping"}'
I have ingested JSON which is stored in a blob container into a table in ADX using this approach.
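For reference, the ingestionMappingReference points at a JSON ingestion mapping defined on the ADX table. A minimal sketch of such a mapping, with a Properties object mapped to a dynamic column (the column names and JSON paths are placeholders):

```json
[
    { "column": "Timestamp",  "path": "$.timestamp",  "datatype": "datetime" },
    { "column": "DeviceId",   "path": "$.deviceId",   "datatype": "string" },
    { "column": "Properties", "path": "$.properties", "datatype": "dynamic" }
]
```

The mapping is created on the table with a .create table ... ingestion json mapping command, and its name is then passed as the ingestionMappingReference.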

Mapping tab in ADF "Copy data" action missing new table storage property/column name

We have an existing Azure Data Factory pipeline that takes data from an Azure Table Storage table and copies the data to an Azure SQL table, which is working without issue.
The problem arose when we added a new data element to the table storage (since it is NoSQL). When I go into the ADF source of the pipeline and refresh the table storage, the new data is not available to map properly. Is there something I am missing to get this new data element (column) to show up? I know the data is there, since I can see this column in Azure Storage Explorer.
Congratulations on finding the answer yourself:
"I located the answer with additional research. See article: https://stackoverflow.com/questions/44123539/azure-data-factory-changing-azure-table-schema"
This can be beneficial to other community members.

Azure data factory copy data

I want to get the same schema as my source dataset without creating it in the Azure database.
Can I get the same schema as the source dataset without creating a table in an Azure database while copying the data?
The Copy activity needs real, existing input and output data sources. If you want to get the schema, you have to point at a real, existing dataset. This is not limited to an Azure database; it could be a REST API or an on-premises file system that provides the data schema.
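For illustration, a dataset in ADF is just a JSON definition that points at a real store and optionally carries a schema. A minimal sketch of a delimited-text dataset on Blob Storage (all names and settings here are placeholders):

```json
{
    "name": "SourceTxtDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "BlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": "data.txt"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": [
            { "name": "Id", "type": "String" },
            { "name": "Amount", "type": "String" }
        ]
    }
}
```

The copy activity reads the schema from datasets like this; there is no way around pointing it at a concrete source and sink.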

Can I populate different SQL tables at once inside Azure Data Factory when the source dataset is Blob storage?

I want to copy data from Azure Blob Storage to an Azure SQL database. The destination data is divided among different tables.
So is there any way to send the blob data directly to different SQL tables using a single pipeline with one copy activity?
Since this is a trigger-based pipeline it is a continuous process: I created a trigger for every hour, but right now I can only send the blob data to one table and then divide it into different tables by invoking another pipeline where both the source and sink datasets are the SQL database.
I'm still looking for a solution to this.
You could use a stored procedure in your database as a sink in the copy activity. This way, you can define the logic in the stored procedure to write the data to your destination tables. You can find the description of the stored procedure sink here.
You'll have to use a user-defined table type for this solution; maintaining them can be difficult, so if you run into issues, you can have a look at my and BioEcoSS' answers in this thread.
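For reference, a minimal sketch of how a stored procedure sink is wired up in the copy activity JSON (the procedure name, table type, and parameter name below are placeholders):

```json
"sink": {
    "type": "AzureSqlSink",
    "sqlWriterStoredProcedureName": "spDistributeBlobData",
    "sqlWriterTableType": "BlobDataType",
    "storedProcedureTableTypeParameterName": "inputData"
}
```

Inside the stored procedure you can then insert from the table-valued parameter into each of your destination tables.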
In my experience, and according to the Azure Data Factory documentation, you cannot directly send the blob data to different SQL tables using a single pipeline with one copy activity,
because in the table mapping settings one Copy Data activity only allows you to select a single corresponding table in the destination data store, or to specify the stored procedure to run at the destination.
You don't need to create a new pipeline, though; just add a new Copy Data activity, with each copy activity calling a different stored procedure.
Hope this helps.
