Import Schemas in Azure Data Factory with Parameters - azure

I am trying to develop a simple ADF pipeline that copies data from a delimited file to a MySQL database whenever such a file is uploaded to a Blob Storage Account. I am using parameters to define the name of the Storage Account, the Container that houses the files, and the file name (inputStorageAccount, inputContainer, inputFile). The name of the Storage Account is a global parameter, and the other two are meant to be provided by the trigger. The Linked Service has also been parameterized.
However, I want to define the mappings for this operation. So I am trying to 'import schemas' by providing values for these parameters (I have stored a sample file in the Storage Account). But I keep getting an error when I try to do so.
What am I doing wrong? How can I get this to work?
I would also like to know why I am not asked to provide a value for the inputContainer parameter when I try to use 'import schema' at the dataset level.

This is where you have to add the values, using Add dynamic content [Alt+P]:
As shown in the snip below, go to the + symbol, which opens a window where you need to fill in the parameter name, type, and value:
There you can directly select the parameter from the available options:
Here is another detailed scenario which might help: Using Azure Data Factory Parameterized Linked Service | Docs. Once the parameters resolve to your sample file, you can re-import (reset) the schema.
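For reference, a parameterized linked service and delimited-text dataset for this setup serialize to JSON roughly as in the sketch below. This is a minimal sketch only: the names LS_Blob and DS_InputDelimited are illustrative, and managed-identity authentication via serviceEndpoint is assumed rather than taken from the question.
{
    "name": "LS_Blob",
    "properties": {
        "type": "AzureBlobStorage",
        "description": "Storage account name is supplied by the caller",
        "parameters": {
            "storageAccountName": { "type": "String" }
        },
        "typeProperties": {
            "serviceEndpoint": "@{concat('https://', linkedService().storageAccountName, '.blob.core.windows.net')}"
        }
    }
}
{
    "name": "DS_InputDelimited",
    "properties": {
        "type": "DelimitedText",
        "description": "Dataset parameters feed both the linked service and the blob location",
        "linkedServiceName": {
            "referenceName": "LS_Blob",
            "type": "LinkedServiceReference",
            "parameters": { "storageAccountName": "@dataset().inputStorageAccount" }
        },
        "parameters": {
            "inputStorageAccount": { "type": "String" },
            "inputContainer": { "type": "String" },
            "inputFile": { "type": "String" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": { "value": "@dataset().inputContainer", "type": "Expression" },
                "fileName": { "value": "@dataset().inputFile", "type": "Expression" }
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
Import schema needs concrete values for every parameter in this chain before it can reach the sample file and read its columns.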

Related

Azure storage table copy

I have a problem using Azure AzCopy. Here is my scenario.
I have 2 storage accounts, which I'll call storage1 and storage2.
storage1 contains some important data in multiple tables. What I want to do is copy all the tables in storage1 to storage2 (to have a backup).
I tried 2 different approaches:
AzCopy
Azure Data Factory
With Azure Data Factory I didn't have any particular problem making it work: I was able to move all the blobs from storage1 using Data Factory, but I couldn't move the tables, and I have no clue whether this is possible to do with Python.
With AzCopy I had zero luck. I gave myself the Storage Blob Data Contributor role in IAM, and this is what happens when I run this command from the terminal:
azcopy cp 'https://storage1.table.core.windows.net/Table1' 'https://storage2[...]-Key'
I got a permission error.
In this specific scenario I would love to be able to use AzCopy, as it is way simpler than Data Factory; all I need is to move those tables from one storage account to the other.
Can anyone help me understand what I am doing wrong with AzCopy?
EDIT:
This is the error I get when I try to copy the table using azcopy:
INFO: The parameters you supplied were Source: 'https://storage1.table.core.windows.net/[SAS]' of type Local, and Destination: 'https://storage2.table.core.windows.net/[SAS]' of type Local
INFO: Based on the parameters supplied, a valid source-destination combination could not automatically be found. Please check the parameters you supplied. If they are correct, please specify an exact source and destination type using the --from-to switch. Valid values are two-word phases of the form BlobLocal, LocalBlob etc. Use the word 'Blob' for Blob Storage, 'Local' for the local file system, 'File' for Azure Files, and 'BlobFS' for ADLS Gen2. If you need a combination that is not supported yet, please log an issue on the AzCopy GitHub issues list.
failed to parse user input due to error: the inferred source/destination combination could not be identified, or is currently not supported
If you want to copy all the tables present in the abc container to the xyz container, use a simple Copy activity, and while creating the dataset just give the folder path; that copies all the content, i.e. all the tables, to your xyz container.
I would suggest watching the video below from the 30-minute mark. It will help in your scenario.
https://youtu.be/m6wyB-Hm3j0
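For reference, copying one table between the two accounts with a Copy activity serializes to JSON roughly as in the sketch below; the dataset names, the table name and the linked-service wiring are illustrative assumptions, and each table needs its own dataset value (or a dataset parameter driven by a loop) to cover all of them.
{
    "name": "CopyTable1",
    "type": "Copy",
    "description": "Reads Table1 from storage1 and writes it to storage2",
    "inputs": [ { "referenceName": "Storage1TableDS", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "Storage2TableDS", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "AzureTableSource" },
        "sink": { "type": "AzureTableSink" }
    }
}
Here Storage1TableDS and Storage2TableDS are datasets of type "AzureTable" (with "tableName": "Table1" in their typeProperties) pointing at AzureTableStorage linked services for the respective accounts.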

How to use dynamic content in relative URL in Azure Data Factory

I have a REST data source where I need pass in multiple parameters to build out a dataset in Azure Data Factory V2.
I have about 500 parameters that I need to pass in, so I don't want to pass them individually. I can manually put these in a list (I don't have to link to another data source to source them). The parameters would be something like [a123, d345, e678]
I'm working in the UI. I cannot figure out how to pass these into the relative URL (where it says Parameter) to then form the dataset. I could do this in Power BI using functions and parameters but can't figure it out in Azure Data Factory as I'm completely new to it. I'm using the Copy Data functionality in ADF to do this.
The sink would be a json file in an Azure blob that I can then access via Power BI. I'm fine with this part.
(Screenshots: relative URL with parameter requirement; how to add dynamic content)
I'm afraid your requirement can't be implemented directly. As you know, an ADF REST dataset retrieves data from a REST endpoint using GET or POST HTTP requests. There is no way to configure a list of parameters in the relativeUrl property that ADF would loop through automatically for you.
Two ways to reach your goal:
1. Loop over your parameter array and pass a single item into relativeUrl to execute the copy activity individually; for this you can use a ForEach activity in ADF (see the sketch below).
2. Write an overall API that accepts the list parameter in the request body and executes your business logic, with the loop, inside the API.
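A rough sketch of option 1, assuming the REST dataset exposes a relativeUrl parameter, the 500 values live in an array pipeline parameter called paramList, and the URL pattern and dataset names (RestDS, BlobJsonDS) are illustrative:
{
    "name": "ForEachParameter",
    "type": "ForEach",
    "description": "Runs one copy per value in the parameter list",
    "typeProperties": {
        "items": { "value": "@pipeline().parameters.paramList", "type": "Expression" },
        "activities": [
            {
                "name": "CopyOneParameter",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "RestDS",
                        "type": "DatasetReference",
                        "parameters": { "relativeUrl": "@concat('api/data/', item())" }
                    }
                ],
                "outputs": [ { "referenceName": "BlobJsonDS", "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": { "type": "RestSource" },
                    "sink": { "type": "JsonSink" }
                }
            }
        ]
    }
}
In the REST dataset itself, set the Relative URL field to @dataset().relativeUrl via Add dynamic content so each iteration calls a different endpoint.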

Copying files in fileshare with Azure Data Factory configuration problem

I am trying to learn using the Azure Data Factory to copy data (a collection of csv files in a folder structure) from an Azure File Share to a Cosmos DB instance.
In Azure Data Factory I'm creating a "copy data" activity and trying to set my file share as the source using the following host:
mystorageaccount.file.core.windows.net\\mystoragefilesharename
When trying to test the connection, I get the following error:
[{"code":9059,"message":"File path 'E:\\approot\\mscissstorage.file.core.windows.net\\mystoragefilesharename' is not supported. Check the configuration to make sure the path is valid."}]
Should I move the data to another storage type like a blob, or am I not entering the correct host URL?
You'll need to specify the host as "\\myserver\share" if you create the pipeline with JSON directly, or set the host URL as "\myserver\share" if you're using the UI to set up the pipeline.
Here is more info:
https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system#sample-linked-service-and-dataset-definitions
I believe that when you created the file linked service, you might have chosen the public (Azure) IR. If you choose the public IR, a local path (e.g. C:\xxx, D:\xxx) is not allowed, because the machine that runs your job is managed by us and does not contain any customer data. Please use a self-hosted IR to copy your local files.
Based on the link posted by Nicolas Zhang (https://learn.microsoft.com/en-us/azure/data-factory/connector-file-system#sample-linked-service-and-dataset-definitions) and the examples provided therein, I was able to solve it and successfully create the copy activity. I had two errors (I'm configuring via the Data Factory UI and not directly in JSON):
In the host path, the correct value should be: \\mystorageaccount.file.core.windows.net\mystoragefilesharename\myfolderpath
The username and password must be the ones corresponding to the storage account, not to the actual user's account, which I was erroneously using.
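For reference, the linked service behind that UI configuration serializes to JSON roughly like the documented file-system sample; in the sketch below the placeholder values are illustrative, and note that in raw JSON every backslash in the UNC path has to be escaped.
{
    "name": "FileShareLinkedService",
    "properties": {
        "type": "FileServer",
        "description": "Azure file share accessed through the file system connector",
        "typeProperties": {
            "host": "\\\\mystorageaccount.file.core.windows.net\\mystoragefilesharename",
            "userId": "mystorageaccount",
            "password": {
                "type": "SecureString",
                "value": "<storage account key>"
            }
        },
        "connectVia": {
            "referenceName": "<integration runtime name>",
            "type": "IntegrationRuntimeReference"
        }
    }
}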

Azure Data Factory Copy Data dynamically get last blob

I have an Azure Data Factory Pipeline that runs on a Blob Created Trigger, I want it to grab the last Blob added and copy that to the desired location.
How do I dynamically generate the file path for this outcome?
Relevant documentation: System Variables, Expressions and Functions.
"#triggerBody().folderPath" and "#triggerBody().fileName" captures the last created blob file path in event trigger. You need to map your pipeline parameter to these two trigger properties. Please follow this link to do the parameter passing and reference. Thanks.

how to pass a folder from a data lake store as a parameter to a pipeline?

In Data Factory, I know you can pass a parameter at the beginning of a pipeline and then access it later using @pipeline(). If I have a folder in a Data Lake Store, how can I pass that as a parameter and have access to it later (let's say I want to loop a ForEach over each file inside it)? Do I pass the path to the folder? Am I passing it as an object?
Here are the steps that you can use (see the sketch after this list):
Pass the folder path as a parameter (string) to the pipeline.
Use the path in a "Get Metadata" activity with the "Child Items" field selected. This will return the list of files in JSON format.
(Screenshot: Get Metadata field selection)
Loop through them with a "ForEach" activity and perform any action.
Use the output of the metadata activity as the Items of the ForEach activity (example below):
@activity('Get List of Files').output.childItems
Hope this helps
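Putting those steps together, the pipeline JSON looks roughly like the sketch below; the activity and dataset names (Get List of Files, DataLakeFolderDS) and the placeholder Wait inside the loop are illustrative assumptions:
{
    "name": "ProcessFolder",
    "properties": {
        "parameters": {
            "folderPath": { "type": "String" }
        },
        "activities": [
            {
                "name": "Get List of Files",
                "type": "GetMetadata",
                "description": "Lists the children of the folder passed in as a parameter",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "DataLakeFolderDS",
                        "type": "DatasetReference",
                        "parameters": { "folderPath": "@pipeline().parameters.folderPath" }
                    },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEachFile",
                "type": "ForEach",
                "description": "Iterates over the returned files; @item().name gives each file name",
                "dependsOn": [
                    { "activity": "Get List of Files", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get List of Files').output.childItems",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "PlaceholderAction",
                            "type": "Wait",
                            "typeProperties": { "waitTimeInSeconds": 1 }
                        }
                    ]
                }
            }
        ]
    }
}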
First, you need to create a Data Lake Store linked service. It will contain the path of the Azure Data Lake Store. You can use the Azure Data Factory UI to create the linked service.
Then create a Data Lake Store dataset that references the linked service from step 1.
Then create a Get Metadata activity that references the dataset from step 2.
Then follow the steps provided by Summit.
All of this can be done in the UI: https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal#create-a-pipeline
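For reference, the Data Lake Store linked service and a parameterized folder dataset created in the UI serialize to JSON roughly as below; the names, the service-principal authentication and the Binary dataset type are illustrative assumptions (DataLakeFolderDS matches the name used in the pipeline sketch above):
{
    "name": "AzureDataLakeStoreLS",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<account name>.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<application id>",
            "servicePrincipalKey": { "type": "SecureString", "value": "<application key>" },
            "tenant": "<tenant id>"
        }
    }
}
{
    "name": "DataLakeFolderDS",
    "properties": {
        "type": "Binary",
        "description": "Folder-level dataset used by the Get Metadata activity",
        "linkedServiceName": { "referenceName": "AzureDataLakeStoreLS", "type": "LinkedServiceReference" },
        "parameters": { "folderPath": { "type": "String" } },
        "typeProperties": {
            "location": {
                "type": "AzureDataLakeStoreLocation",
                "folderPath": { "value": "@dataset().folderPath", "type": "Expression" }
            }
        }
    }
}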
