Unable to copy multiple BLOB files to Synapse using Data Factory - Azure

I want to copy bulk data from BLOB storage to Azure Synapse with the following structure:
BLOB STORAGE:
devpresented (Storage account)
  processedout (Container)
    Merge_user (folder)
      part-00000-tid-89051a4e7ca02.csv
    Sales_data (folder)
      part-00000-tid-5579282100a02.csv
SYNAPSE SQLDW:
SCHEMA: PIPELINEDEMO
TABLES: Merge_user, Sales_data
Using data factory I want to copy BLOB data to Synapse database as below:
BLOB >> SQLDW
Merge_user >> PIPELINEDEMO.Merge_user
Sales_data >> PIPELINEDEMO.Sales_data
The following doc is mentioned for SQL DB to SQL DW:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal
However, I didn't find anything for a BLOB source in Data Factory.
Can anyone please suggest how I can move multiple BLOB files to different tables?

If you need to copy the contents of different .csv files to different tables in Azure Synapse, you can add multiple copy activities within a pipeline.
I created a .csv file with the following content:
111,222,333,
444,555,666,
777,888,999,
I uploaded this to my storage account and set this .csv file as the source of the copy activity.
After that, I created a table in Azure Synapse and set it as the sink of the copy activity:
create table testbowman4(Prop_0 int, Prop_1 int, Prop_2 int)
Finally, trigger the pipeline and you will find the data in the table.
You can create multiple similar copy activities, and each copy activity performs one copy from Blob to Azure Synapse.
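If you prefer to define the pipeline in code rather than in the portal, the sketch below shows the same idea with the Python SDK (azure-mgmt-datafactory). It is only a rough outline and assumes the linked services and four datasets (delimited-text datasets over the Merge_user and Sales_data folders, plus two Synapse table datasets) already exist; all resource and dataset names here are placeholders.

# Rough sketch: one pipeline with two copy activities, one per blob folder/table.
# Assumes the source/sink datasets named below already exist in the factory.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    CopyActivity, DatasetReference, DelimitedTextSource, SqlDWSink, PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

def copy_to_synapse(name, source_dataset, sink_dataset):
    # One copy activity: delimited-text blob source -> Synapse (SQL DW) sink.
    return CopyActivity(
        name=name,
        inputs=[DatasetReference(reference_name=source_dataset)],
        outputs=[DatasetReference(reference_name=sink_dataset)],
        source=DelimitedTextSource(),
        sink=SqlDWSink(),
    )

pipeline = PipelineResource(activities=[
    copy_to_synapse("CopyMergeUser", "MergeUserCsv", "SynapseMergeUser"),
    copy_to_synapse("CopySalesData", "SalesDataCsv", "SynapseSalesData"),
])
client.pipelines.create_or_update("<resource-group>", "<data-factory-name>",
                                  "CopyBlobToSynapse", pipeline)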

Related

Extracting Files from Onprem server to Azure Blob storage while filtering files with no data

I am trying to transfer on-premises files to Azure Blob Storage. However, out of the 5 files that I have, 1 has "no data", so I can't map the schema. Is there a way I can filter out this file while importing to Azure? Or would I have to import them into Azure Blob Storage as is and then filter them into another blob storage? If so, how would I do this?
DataPath
  CompleteFiles
  Nodata
If your on-prem source is your local file system, first copy the files with their folder structure to a temporary blob container using azcopy with a SAS key. Please refer to this thread to learn more about it.
Then use an ADF pipeline to filter out the empty files and store the rest in the final blob container.
These are my files in the blob container, and sample2.csv is an empty file.
First, use a Get Metadata activity to get the list of files in that container.
It will list all the files; pass that array to the ForEach as @activity('Get Metadata1').output.childItems
Inside the ForEach, use a Lookup to get the row count of every file, and if the count != 0, use a copy activity to copy the file.
Use a dataset parameter to pass the file name.
Inside the If Condition, use the expression below.
@not(equals(activity('Lookup1').output.count,0))
Inside the True activities, use a copy activity.
Copy sink set to another blob container:
Execute this pipeline and you can see the empty file is filtered out.
If your on-prem source is SQL, use a Lookup to get the list of tables and then use a ForEach. Inside the ForEach, follow the same procedure for the individual tables.
If your on-prem source is something other than the above, first copy all files to blob storage and then follow the same procedure.
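If you would rather do the same filtering in code outside ADF, the rough Python sketch below (azure-storage-blob) mirrors the Get Metadata → ForEach → Lookup → If → Copy logic: it lists the blobs in the temporary container, counts data rows in each CSV (assuming a header row), and copies only the non-empty ones to the final container. The connection string and container names are placeholders.

# Rough equivalent of the ADF pipeline above: copy only CSV files that contain
# at least one data row (beyond the header) to the final container.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
source = service.get_container_client("temporarycontainer")
target = service.get_container_client("finalcontainer")

for blob in source.list_blobs():
    content = source.get_blob_client(blob.name).download_blob().readall()
    rows = [line for line in content.decode("utf-8").splitlines() if line.strip()]
    if len(rows) > 1:        # header plus at least one data row
        target.upload_blob(blob.name, content, overwrite=True)
    else:
        print(f"Skipping empty file: {blob.name}")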

Azure Data Factory Exception while reading table from Synapse and using staging for Polybase

I'm using Data Flow in Data Factory and I need to join a table from Synapse with my flow of data.
When I added the new source in Azure Data Flow I had to add a Staging linked service (as the label said: "For SQL DW, please specify a staging location for PolyBase.")
So I specified a path in Azure Data Lake Gen2 in which PolyBase can create its temp dir.
Nevertheless I'm getting this error:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Source 'keyMapCliente': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://MyContainerName#mystorgaename.dfs.core.windows.net/Raw/Tmp/e3e71c102e0a46cea0b286f17cc5b945/' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.","Details":"shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://MyContainerName#mystorgaename.dfs.core.windows.net/Raw/Tmp/e3e71c102e0a46cea0b286f17cc5b945/' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1632)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:872)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:767)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jd"}
The following are the Azure Data Flow Settings:
This is the added source inside the data flow:
Any help is appreciated
I have reproduced this and was able to enable a staging location on an Azure Data Lake Gen2 storage account for PolyBase and connect to the Synapse table data successfully.
Create your database scoped credential with the Azure storage account key as the secret.
Create an external data source and an external table with the scoped credential you created (see the sketch after these steps).
In Azure Data Factory:
Enable staging and connect to the Azure Data Lake Gen2 storage account with the Account key authentication type.
In the data flow, connect your source to the Synapse table and enable the staging property in the source options.
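For reference, this is roughly the T-SQL behind the first two steps, executed here from Python with pyodbc. The credential and data source names, the container/account, the key and the connection string are all placeholders, and a database master key must already exist in the dedicated SQL pool.

# Rough sketch of the T-SQL behind the scoped credential / external data source steps,
# run against the Synapse dedicated SQL pool with pyodbc. All names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<pool>;UID=<user>;PWD=<password>"
)
conn.autocommit = True
cursor = conn.cursor()

# Database scoped credential whose secret is the storage account key.
cursor.execute("""
CREATE DATABASE SCOPED CREDENTIAL StagingCredential
WITH IDENTITY = 'StorageAccountKey',
     SECRET = '<storage-account-key>';
""")

# External data source pointing at the ADLS Gen2 staging container.
cursor.execute("""
CREATE EXTERNAL DATA SOURCE StagingStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net',
    CREDENTIAL = StagingCredential
);
""")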

How to load only newly added file in Azure SQL DB

I am trying to load data from CSV files into Azure SQL DB using a copy activity. First I loaded three files from blob storage to Azure SQL DB. Then three new files were uploaded to blob storage, and now I want to load only the newly added files to Azure SQL DB. File names are in this format: "student_index_date", where index goes from 1-6, and I have to make use of this index.
You can use a Get Metadata activity to get the list of child items, then filter on the last modified date to find the latest file and use it as the source in a copy activity.
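To illustrate the "latest modified date" idea outside ADF, here is a short Python sketch with azure-storage-blob that lists the blobs and picks the most recently modified CSV; the connection string and container name are placeholders.

# Sketch: find the most recently modified CSV in the container, i.e. the file the
# Get Metadata + filter approach above would hand to the copy activity.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("studentfiles")   # placeholder container name

csv_blobs = [b for b in container.list_blobs() if b.name.endswith(".csv")]
latest = max(csv_blobs, key=lambda b: b.last_modified)
print(f"Latest file to load: {latest.name} (modified {latest.last_modified})")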

Azure Data Factory - Data flow activity changing file names

I am running a data flow activity using Azure Data Factory.
Source data source - Azure Blob
Destination data source - Azure Data Lake Gen 2
For example, I have a file named "test_123.csv" in Azure Blob. When I create a data flow activity to filter some data and copy it to Data Lake, it changes the file name to "part-00.csv" in Data Lake.
How can I keep my original filename?
Yes, you can do that; please look at the screenshot below. Please do let me know how it goes.
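The screenshot is not reproduced here, but the setting being shown is presumably the data flow sink's file name option (for example, writing the output to a single, explicitly named file). As an alternative workaround, a small post-processing step can rename the generated part file back to the original name; in the rough sketch below the connection string, container, folder and file names are placeholders.

# Alternative workaround: after the data flow runs, copy the generated part file to
# the original file name and remove the part file. All names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("datalake-filesystem")

src = container.get_blob_client("output/part-00.csv")
dst = container.get_blob_client("output/test_123.csv")

# Fine for modest file sizes; for very large files a server-side copy is preferable.
dst.upload_blob(src.download_blob().readall(), overwrite=True)
src.delete_blob()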

Get list of all files in a azure data lake directory to a look up activity in ADFV2

I have a number of files in Azure Data Lake Storage. I am creating a pipeline in ADFv2 to get the list of all the files in a folder in ADLS. How do I do this?
You should use the Get Metadata activity.
Check this
You could follow the steps below to list files in ADLS.
1. Use the ADLS SDK to get the list of file names in a specific directory and output the results, for example with the Java SDK shown here. Of course, you could use .NET or Python (see the Python sketch after these steps).
// list directory contents
List<DirectoryEntry> list = client.enumerateDirectory("/a/b", 2000);
System.out.println("Directory listing for directory /a/b:");
for (DirectoryEntry entry : list) {
    printDirectoryInfo(entry);
}
System.out.println("Directory contents listed.");
2. Compile the program so that it can be executed, and store it in Azure Blob Storage.
3. Use a Custom activity in Azure Data Factory to configure the blob storage path and execute the program. For more details, please follow this document.
You could use a Custom activity in Azure Data Factory.
https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-java-sdk#list-directory-contents
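Since step 1 notes that Python works as well, here is a rough Python equivalent of the Java snippet above using the azure-datalake-store (ADLS Gen1) package; the tenant ID, client ID, secret and store name are placeholders.

# Python equivalent of the Java directory listing in step 1 (ADLS Gen1 SDK).
from azure.datalake.store import core, lib

token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<client-id>",
                 client_secret="<client-secret>")
adls = core.AzureDLFileSystem(token, store_name="<adls-account-name>")

print("Directory listing for directory /a/b:")
for entry in adls.ls("/a/b", detail=True):
    print(entry["name"], entry["type"])
print("Directory contents listed.")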
