Azure Data Factory - Data flow activity changing file names - azure

I am running a data flow activity using Azure Data Factory.
Source data store - Azure Blob Storage
Destination data store - Azure Data Lake Storage Gen2
For example, I have a file named "test_123.csv" in Azure Blob Storage. When I create a data flow activity to filter some data and copy it to the Data Lake, the file name is changed to "part-00.csv" in the Data Lake.
How can I keep my original file name?

Yes, you can do that. In the sink transformation of your data flow, change the "File name option" from the default (which produces the part-00... names) to "Output to single file" and specify the file name you want, or use "Name file as column data" to take the name from a column. Please do let me know how it goes.
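For reference, a rough sketch of how the "Output to single file" setting shows up in the underlying data flow script (the filter transformation and sink names are placeholders, and the full set of sink properties depends on your dataset):

FilterRows sink(allowSchemaDrift: true,
    validateSchema: false,
    partitionFileNames:['test_123.csv'],
    partitionBy('hash', 1)) ~> DataLakeSink

Using a single hash partition together with partitionFileNames is what makes the data flow write one file with the name you choose instead of the auto-generated part-* files.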

Related

How can I create the current Year/Month/Day folder dynamically in the Azure Data Factory pipeline?

I'm using a copy activity to send data to Azure Data Lake Gen2. I need to create a Year/Month/Day folder dynamically.
file_1.csv
file_2.csv
file_3.csv
...
file_9.csv
My question: how can I create a Year/Month/Day folder dynamically while transferring files from one container to another?
You can use the following procedure to create the Year/Month/Day folder dynamically.
In my storage account, these are the files.
Create a copy activity with a wildcard path.
Then go to the sink, open the sink dataset, and create a dataset parameter named folder.
Go to the connection tab and add this dynamic content for the folder path: @dataset().folder
Now, in the copy activity sink, add this dynamic content as the value of the dataset's folder parameter:
@concat(formatDateTime(utcnow(), 'yyyy'), '/', formatDateTime(utcnow(), 'MM'), '/', formatDateTime(utcnow(), 'dd'), '/')
The pipeline executed successfully and the Year/Month/Day folder structure was created in the sink container.
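For reference, here is a rough sketch of the sink dataset JSON with the folder parameter wired into the folder path (the dataset, linked service, and container names are placeholders):

{
    "name": "SinkDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureDataLakeGen2LinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "folder": { "type": "String" }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "target",
                "folderPath": {
                    "value": "@dataset().folder",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}

In the copy activity sink, the @concat(...) expression above is then passed as the value of the folder parameter.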

Azure Data Factory Exception while reading table from Synapse and using staging for Polybase

I'm using Data Flow in Data Factory and I need to join a table from Synapse with my flow of data.
When I added the new source in Azure Data Flow I had to add a Staging linked service (as the label said: "For SQL DW, please specify a staging location for PolyBase.")
So I specified a path in Azure Data Lake Gen2 in which PolyBase can create its temp dir.
Nevertheless I'm getting this error:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Source 'keyMapCliente': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://MyContainerName#mystorgaename.dfs.core.windows.net/Raw/Tmp/e3e71c102e0a46cea0b286f17cc5b945/' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.","Details":"shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://MyContainerName#mystorgaename.dfs.core.windows.net/Raw/Tmp/e3e71c102e0a46cea0b286f17cc5b945/' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1632)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:872)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:767)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jd"}
The following are the Azure Data Flow Settings:
This is the added source inside the data flow:
Any help is appreciated.
I reproduced this and was able to enable a staging location on an Azure Data Lake Gen2 storage account for PolyBase and connect to the Synapse table data successfully.
Create your database scoped credential with the Azure storage account key as the secret.
Create an external data source and an external table using the scoped credential you created.
In Azure Data Factory:
Enable staging and connect to the Azure Data Lake Gen2 storage account with the account key authentication type.
In the data flow, connect your source to the Synapse table and enable the staging property in the source options.
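For reference, the Synapse-side objects from the first two steps look roughly like this (credential, data source, table, and column names are placeholders; with account-key authentication the IDENTITY string is not used for authentication):

-- Run in the Synapse dedicated SQL pool.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword!1>';   -- only if the database has no master key yet

CREATE DATABASE SCOPED CREDENTIAL AdlsGen2Credential
WITH IDENTITY = 'user',                                            -- any string when authenticating with an account key
     SECRET   = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE AdlsGen2Source
WITH ( TYPE = HADOOP,
       LOCATION = 'abfss://<container>@<storageaccount>.dfs.core.windows.net',
       CREDENTIAL = AdlsGen2Credential );

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS ( FIELD_TERMINATOR = ',' ) );

-- Hypothetical column list; replace with your own schema.
CREATE EXTERNAL TABLE dbo.KeyMapClienteExternal
( Id   INT,
  Name NVARCHAR(100) )
WITH ( LOCATION = '/Raw/KeyMapCliente/',
       DATA_SOURCE = AdlsGen2Source,
       FILE_FORMAT = CsvFormat );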

Unable to copy multiple BLOB files to Synapse using data factory

I want to copy bulk data from BLOB storage to Azure Synapse with the following structure:
BLOB STORAGE:
devpresented (storage account)
  processedout (container)
    Merge_user (folder)
      part-00000-tid-89051a4e7ca02.csv
    Sales_data (folder)
      part-00000-tid-5579282100a02.csv
SYNAPSE SQL DW:
SCHEMA: PIPELINEDEMO
TABLES: Merge_user, Sales_data
Using data factory I want to copy BLOB data to Synapse database as below:
BLOB >> SQLDW
Merge_user >> PIPELINEDEMO.Merge_user
Sales_data >> PIPELINEDEMO.Sales_data
The following doc is mentioned for SQL DB to SQL DW:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
However, I didn't find anything for a BLOB source in data factory.
Can anyone please suggest how I can move multiple BLOB files to different tables?
If you need to copy the contents of different csv files to different tables in Azure Synapse, you can use multiple copy activities within a pipeline.
I created a .csv file with this content:
111,222,333,
444,555,666,
777,888,999,
I uploaded this to my storage account, then set this .csv file as the source of the copy activity.
After that, I created a table in Azure Synapse and set it as the sink of the copy activity:
create table testbowman4(Prop_0 int, Prop_1 int, Prop_2 int)
Finally, trigger the pipeline and you will find the data in the table:
You can create multiple similar copy activities, and each copy activity performs a copy from Blob Storage to Azure Synapse.
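As a rough sketch, a pipeline with one copy activity per folder/table pair could look like this in JSON (the dataset names are placeholders, the four datasets still need to be defined, and PolyBase staging settings are omitted):

{
  "name": "CopyBlobFoldersToSynapse",
  "properties": {
    "activities": [
      {
        "name": "Copy_Merge_user",
        "type": "Copy",
        "inputs":  [ { "referenceName": "MergeUserCsv", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SynapseMergeUser", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "SqlDWSink", "allowPolyBase": true }
        }
      },
      {
        "name": "Copy_Sales_data",
        "type": "Copy",
        "inputs":  [ { "referenceName": "SalesDataCsv", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SynapseSalesData", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "SqlDWSink", "allowPolyBase": true }
        }
      }
    ]
  }
}

Each source dataset would point at one folder (Merge_user or Sales_data) in the processedout container, and each sink dataset at the matching PIPELINEDEMO table.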

Azure Data Factory - Azure Data Lake Gen1 access

A file is added by a Logic App, and Data Factory V2 then processes it.
I have a Data Factory that accesses Data Lake Gen1 to process the file. I receive the following error when I try to debug the Data Factory after the file is added.
"ErrorCode=FileForbidden,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to read a 'AzureDataLakeStore' file. File path: 'Stem/Benchmark/DB_0_Measures_1_05052020 - Copy - Copy - rounded, date changed - Copy (3).csv'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (403) Forbidden.,Source=System,'",
When I "Apply to children" after next load permission, error is gone.
Tried so far:
- Assigned permission in Data Lake for the Data Factory and it`s children.
Assigned permission in Data Lake Folder for the Data Factory and it's children.
Added data factory as a contributor to data lake.
Added data factory as an owner to data lake.
Allowed "all Azure services to access this Data Lake Storage Gen1 account".
After all tries, still need manually to "apply permission to children" for each file added.
Is there anyway to fix this?
I can reproduce your error:
This is how I resolved it:
My account is the owner of the Data Lake Gen1, and the Data Factory is a contributor on the Data Lake Gen1.
You need to give Read + Execute access on the parent folders and then do what @Bowman Zhu mentioned above.
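If you would rather script the parent-folder permissions than click through the portal, a rough sketch with the Azure CLI could look like this (the account name, folder path, and the Data Factory's service principal object ID are placeholders; the "default" entry is inherited by files newly created in the folder, which avoids re-applying permissions to children by hand):

az dls fs access set-entry --account mydatalakegen1 --path /Stem --acl-spec user:<adf-object-id>:r-x
az dls fs access set-entry --account mydatalakegen1 --path /Stem/Benchmark --acl-spec user:<adf-object-id>:r-x
az dls fs access set-entry --account mydatalakegen1 --path /Stem/Benchmark --acl-spec default:user:<adf-object-id>:r-x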

How to access files stored in Azure Data Lake and use them as input to AzureBatchStep in azureml.pipeline.steps?

I registered an Azure data lake datastore as in the documentation in order to access the files stored in it.
I used
DataReference(datastore, data_reference_name=None, path_on_datastore=None, mode='mount', path_on_compute=None, overwrite=False)
and used it as an input to the AzureBatchStep in an Azure ML pipeline.
But I got an issue: the datastore name could not be fetched from the input.
Is Azure Data Lake not accessible in Azure ML, or am I getting something wrong?
Azure Data Lake is not supported as an input in AzureBatchStep. You should probably use a DataTransferStep to copy data from ADL to Blob and then use the output of the DataTransferStep as an input to AzureBatchStep.
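A minimal sketch of that approach with the v1 azureml-sdk (the datastore, compute, pool, and path names are placeholders):

from azureml.core import Workspace, Datastore
from azureml.core.compute import BatchCompute, DataFactoryCompute
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import AzureBatchStep, DataTransferStep

ws = Workspace.from_config()

adls_store = Datastore.get(ws, "my_adls_datastore")    # registered Azure Data Lake datastore
blob_store = Datastore.get(ws, "workspaceblobstore")   # default blob datastore

# Data Factory compute moves the data; Batch compute runs the batch job.
adf_compute = DataFactoryCompute(ws, "my-adf-compute")
batch_compute = BatchCompute(ws, "my-batch-compute")

adls_input = DataReference(datastore=adls_store,
                           data_reference_name="adls_input",
                           path_on_datastore="input/myfile.csv")

blob_staged = DataReference(datastore=blob_store,
                            data_reference_name="blob_staged",
                            path_on_datastore="staged/myfile.csv")

# Step 1: copy the file from Data Lake to Blob storage.
transfer_step = DataTransferStep(name="adls_to_blob",
                                 source_data_reference=adls_input,
                                 destination_data_reference=blob_staged,
                                 compute_target=adf_compute)

# Step 2: feed the transferred data (now on Blob) into the Batch step.
# get_output() returns the transfer destination as the step output, so the
# dependency between the two steps is tracked by the pipeline.
batch_step = AzureBatchStep(name="process_on_batch",
                            pool_id="my-batch-pool",
                            inputs=[transfer_step.get_output()],
                            executable="process.exe",
                            source_directory="batch_src",
                            compute_target=batch_compute)

pipeline = Pipeline(workspace=ws, steps=[transfer_step, batch_step])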
