Data Factory v1: mask some HTTP headers as credentials

I have a Data Factory (v1) which downloads some files from an HTTP server.
Within the dataset pointing to the file location on this server, we add an API key as an additional header to the HTTP request. We don't want this key to be visible in the portal; ideally it would be masked the way Linked Services mask credentials once deployed.
The following JSON files define the source linked service, the source dataset and the copy activity.
HTTP_source_linkedservice.json
{
"name": "HTTPSourceLinkedService",
"properties": {
"hubName": "this_is_a_hubname",
"type": "Http",
"typeProperties": {
"url": "https://website.com",
"authenticationType": "Anonymous"
}
}
}
HTTP_source_dataset
{
"name": "HTTPSourceDataset",
"properties": {
"published": false,
"type": "Http",
"linkedServiceName": "HTTPSourceLinkedService",
"typeProperties": {
"relativeUrl": "/main_file_to_download",
"additionalHeaders": "X-api-key: API_KEY_HERE\n"
},
"availability": {
"frequency": "Day",
"interval": 1
},
"external": true,
"policy": {}
}
}
Copy Activity
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "HttpSource"
},
"sink": {
"type": "BlobSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "HTTPSourceDataset"
}
],
"outputs": [
{
"name": "HTTPSinkDataset"
}
],
"scheduler": {
"frequency": "Day",
"interval": 1
},
"name": "CopyFileFromServer"
}
I know we could use a Custom Activity to make the request itself and fetch the API key from a Key Vault, but I really want to use the standard Copy Activity.
Is there a way to achieve this?

Unfortunately, I think this is not possible. Header fields are defined as plain strings, and v1 does not even have the secure string type that v2 introduced to mark a field as a credential.
I don't think this can be achieved in v2 either, because the model type of this property is fixed.

Related

How to grab a column value from an ADLS Gen2 CSV file and use it in the body of an email, and also send the blob data as an attachment to Outlook mail

Here is my scenario.
A CSV file is dropped into blob storage every day; my data flow in ADF processes it and generates a CSV in an output folder.
Now, using Logic Apps, I need to send that CSV file (less than 10 MB) as an attachment to a customer via the Outlook connector.
In addition, the body of the email must contain a dynamic value taken from that CSV blob.
For example, 'AppWorks' is the value in the column 'Works/not'; sometimes it may be 'AppNotWorks'. How can I handle this scenario in Azure Logic Apps?
You can combine Data Factory and Logic Apps to do this. Use a Lookup activity to get the first row of the file (since the column value is the same in every row, one row is enough to get the required value).
Now use a Web activity to trigger the logic app. Pass the logic app's HTTP request URL to the Web activity. In the body, pass the following dynamic content:
@activity('Lookup1').output.firstRow
When you debug the pipeline, the logic app is triggered. I supplied a Request Body JSON schema so that the values can be read individually. For the sample I used, it looks like this:
{
"properties": {
"customer": {
"type": "string"
},
"id": {
"type": "string"
}
},
"type": "object"
}
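To make the shape of that request body concrete, here is a minimal Python sketch of what the Lookup activity's firstRow amounts to: the first data row of a headered CSV, serialized as a JSON object. The function name first_row_payload and the sample rows are hypothetical; the column names customer and id come from the schema above. It only mimics the payload locally and does not call Data Factory or the logic app.

```python
import csv
import io
import json


def first_row_payload(csv_text: str) -> str:
    """Return the first data row of a headered CSV as a JSON object string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    first_row = next(reader)  # raises StopIteration if the file has no data rows
    return json.dumps(first_row)


# Hypothetical sample file: a header line followed by two data rows.
sample = "customer,id\nAna,1\nBen,2\n"
payload = first_row_payload(sample)
print(payload)  # {"customer": "Ana", "id": "1"}
```

All values arrive as strings, which matches the "type": "string" entries in the schema.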
Create a connection to the storage account to link the required file.
Now, using the Outlook connector, send the email.
The following is the entire logic app JSON:
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Get_blob_content_(V2)": {
"inputs": {
"host": {
"connection": {
"name": "@parameters('$connections')['azureblob']['connectionId']"
}
},
"method": "get",
"path": "/v2/datasets/@{encodeURIComponent(encodeURIComponent('AccountNameFromSettings'))}/files/@{encodeURIComponent(encodeURIComponent('JTJmZGF0YSUyZnNhbXBsZTEuY3N2'))}/content",
"queries": {
"inferContentType": true
}
},
"metadata": {
"JTJmZGF0YSUyZnNhbXBsZTEuY3N2": "/data/sample1.csv"
},
"runAfter": {},
"type": "ApiConnection"
},
"Send_an_email_(V2)": {
"inputs": {
"body": {
"Attachments": [
{
"ContentBytes": "@{base64(body('Get_blob_content_(V2)'))}",
"Name": "sample1.csv"
}
],
"Body": "<p>Hi @{triggerBody()?['customer']},<br>\n<br>\nRandom description</p>",
"Importance": "Normal",
"Subject": "sample data",
"To": "<to_email>"
},
"host": {
"connection": {
"name": "@parameters('$connections')['office365']['connectionId']"
}
},
"method": "post",
"path": "/v2/Mail"
},
"runAfter": {
"Get_blob_content_(V2)": [
"Succeeded"
]
},
"type": "ApiConnection"
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {
"$connections": {
"defaultValue": {},
"type": "Object"
}
},
"triggers": {
"manual": {
"inputs": {
"schema": {
"properties": {
"customer": {
"type": "string"
},
"id": {
"type": "string"
}
},
"type": "object"
}
},
"kind": "Http",
"type": "Request"
}
}
},
"parameters": {
"$connections": {
"value": {
"azureblob": {
"connectionId": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Web/connections/azureblob",
"connectionName": "azureblob",
"id": "/subscriptions/xxx/providers/Microsoft.Web/locations/westus2/managedApis/azureblob"
},
"office365": {
"connectionId": "/subscriptions/xxx/resourceGroups/v-sarikontha-Mindtree/providers/Microsoft.Web/connections/office365",
"connectionName": "office365",
"id": "/subscriptions/xxx/providers/Microsoft.Web/locations/westus2/managedApis/office365"
}
}
}
}
}
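Two details of this definition can be checked locally with a short Python sketch. First, judging from the metadata entry, the file ID in the path looks like the Base64 encoding of the URL-encoded blob path; that is an inference from this single sample, not documented connector behavior. Second, the base64() expression used for ContentBytes corresponds to plain Base64 encoding of the blob content. The helper names and the sample CSV bytes are hypothetical.

```python
import base64
from urllib.parse import unquote


def decode_file_id(file_id: str) -> str:
    """Decode a file ID, ASSUMING it is Base64 of a URL-encoded path."""
    return unquote(base64.b64decode(file_id).decode("utf-8"))


def to_content_bytes(data: bytes) -> str:
    """Base64-encode attachment bytes for the ContentBytes field."""
    return base64.b64encode(data).decode("ascii")


# The ID below is the one used in the logic app definition.
path = decode_file_id("JTJmZGF0YSUyZnNhbXBsZTEuY3N2")
print(path)  # /data/sample1.csv

# Hypothetical attachment content.
attachment = to_content_bytes(b"customer,id\nAna,1\n")
```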
The following is the resulting Mail image for reference:

Azure data-factory can't load data successfully through PolyBase if the source data in the last column of the first row is null

I am trying to use Azure Data Factory to load data from Azure Blob Storage into Azure Data Warehouse.
The relevant data is shown below:
source csv:
1,james,
2,john,usa
sink table:
CREATE TABLE test_null (
id int NOT NULL,
name nvarchar(128) NULL,
address nvarchar(128) NULL
)
source dataset:
{
"name": "test_null_input",
"properties": {
"linkedServiceName": {
"referenceName": "StagingBlobStorage",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": "1.csv",
"folderPath": "test_null",
"container": "adf"
},
"columnDelimiter": ",",
"escapeChar": "",
"firstRowAsHeader": false,
"quoteChar": ""
},
"schema": []
}
}
sink dataset:
{
"name": "test_null_output",
"properties": {
"linkedServiceName": {
"referenceName": "StagingAzureSqlDW",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "AzureSqlDWTable",
"schema": [
{
"name": "id",
"type": "int",
"precision": 10
},
{
"name": "name",
"type": "nvarchar"
},
{
"name": "address",
"type": "nvarchar"
}
],
"typeProperties": {
"schema": "dbo",
"table": "test_null"
}
}
}
pipeline
{
"name": "test_input",
"properties": {
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "SqlDWSink",
"allowPolyBase": true,
"polyBaseSettings": {
"rejectValue": 0,
"rejectType": "value",
"useTypeDefault": false,
"treatEmptyAsNull": true
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"ordinal": 1
},
"sink": {
"name": "id"
}
},
{
"source": {
"ordinal": 2
},
"sink": {
"name": "name"
}
},
{
"source": {
"ordinal": 3
},
"sink": {
"name": "address"
}
}
]
}
},
"inputs": [
{
"referenceName": "test_null_input",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "test_null_output",
"type": "DatasetReference"
}
]
}
],
"annotations": []
}
}
The last column of the first row is null, so when I run the pipeline it fails with the error below:
ErrorCode=UserErrorInvalidColumnMappingColumnNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Invalid column mapping provided to copy activity: '{"Prop_0":"id","Prop_1":"name","Prop_2":"address"}', Detailed message: Column 'Prop_2' defined in column mapping cannot be found in Source structure.. Check column mapping in table definition.,Source=Microsoft.DataTransfer.Common,'
I tried setting treatEmptyAsNull to true and got the same error. Setting skipLineCount to 1 works, so it seems the null last column in the first row affects the loading of the entire file. Stranger still, it also works if I enable staging, even without setting treatEmptyAsNull or skipLineCount. In my scenario staging should be unnecessary, since the copy already goes from blob to the data warehouse. Going from blob to blob and then from blob to the data warehouse seems unreasonable, and it incurs additional data movement charges. Why does setting treatEmptyAsNull not work, and why does enabling staging work?
I reproduced this with your pipeline JSON and got the same error.
The error occurs because of the copy data mapping between source and sink defined in your JSON.
That mapping expects the source to have the columns Prop_0, Prop_1 and Prop_2.
Because First row as header is not checked for your source file, the source columns are taken as Prop_0 and Prop_1. Since the last value in your first row is null, no Prop_2 column is created, and that is why the error is raised for that column.
To resolve it, add a proper header row to your CSV file.
Then check First row as header in the source dataset.
When you import the schema, the mapping then uses the header names.
The pipeline now executes successfully, as it did for me.
Result:
The empty value is loaded as NULL in the target table.
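A minimal Python sketch of the first step (adding a header row to the file before it is picked up), assuming the column names id, name and address from the sink table in the question. The function name add_header is hypothetical, and this is only one way to prepare the file; it is not part of the pipeline itself.

```python
import csv
import io


def add_header(csv_text: str, columns: list[str]) -> str:
    """Prepend a header row, checking that every row has one value per column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for number, row in enumerate(rows, start=1):
        if len(row) != len(columns):
            raise ValueError(f"row {number} has {len(row)} fields, expected {len(columns)}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()


# The source file from the question: the last value of the first row is empty.
source = "1,james,\n2,john,usa\n"
with_header = add_header(source, ["id", "name", "address"])
print(with_header, end="")
```

The trailing empty field of the first data row is preserved, so the file still has three columns on every line.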

How to get Azure Data Factory to Loop Through Files in a Folder

I am looking at the link below.
https://azure.microsoft.com/en-us/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/
We are supposed to have the ability to use wildcard characters in folder paths and file names. If we click on the 'Activity' and click 'Source', we see this view.
I would like to loop through months and days, so it should be something like this view.
Of course that doesn't actually work. I'm getting errors that read: ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.'. How can I get the tool to recursively iterate through all files in all folders, given a specific pattern of strings in a file path and file name? Thanks.
I would like to loop through months and days
To do this, pass two parameters from your pipeline to the activity so that the path can be built dynamically from them. ADF V2 supports parameters.
Let's go through the process step by step:
1. Create a pipeline and define two parameters on it, for your month and day.
Note: These parameters can also be passed from the output of other activities if needed. Reference: Parameters in ADF
2. Create two datasets.
2.1 Sink Dataset - Blob Storage here. Link it to your Linked Service and provide the container name (make sure it exists). If needed, it can also be passed as a parameter.
2.2 Source Dataset - Blob Storage here again, or whatever your scenario needs. Link it to your Linked Service and provide the container name (make sure it exists). If needed, it can also be passed as a parameter.
Note:
1. The folder path decides where the data is copied. If the container does not exist, the activity creates it for you; if the file already exists, it is overwritten by default.
2. Pass the parameters to the dataset if you want to build the output path dynamically. Here I have created two dataset parameters named monthcopy and datacopy.
3. Create a Copy Activity in the pipeline.
Wildcard Folder Path:
@{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}
The path becomes: current-yyyy/month-passed/day-passed/* (the * matches any folder one level down)
Test Result:
JSON Template for the pipeline:
{
"name": "pipeline2",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFolderPath": {
"value": "@{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}",
"type": "Expression"
},
"wildcardFileName": "*.csv",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobStorageWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".csv"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "DelimitedText1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "DelimitedText2",
"type": "DatasetReference",
"parameters": {
"monthcopy": {
"value": "@pipeline().parameters.month",
"type": "Expression"
},
"datacopy": {
"value": "@pipeline().parameters.day",
"type": "Expression"
}
}
}
]
}
],
"parameters": {
"month": {
"type": "string"
},
"day": {
"type": "string"
}
},
"annotations": []
}
}
JSON Template for the source dataset (DelimitedText1, referenced in the pipeline's inputs):
{
"name": "DelimitedText1",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage1",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"container": "corpdata"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"quoteChar": "\""
},
"schema": []
}
}
JSON Template for the sink dataset (DelimitedText2, referenced in the pipeline's outputs):
{
"name": "DelimitedText2",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage1",
"type": "LinkedServiceReference"
},
"parameters": {
"monthcopy": {
"type": "string"
},
"datacopy": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"folderPath": {
"value": "@concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),dataset().monthcopy,'/',dataset().datacopy)",
"type": "Expression"
},
"container": "copycorpdata"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"quoteChar": "\""
},
"schema": []
}
}

Using ADF REST connector to read and transform FHIR data

I am trying to use Azure Data Factory to read data from a FHIR server and transform the results into newline delimited JSON (ndjson) files in Azure Blob storage. Specifically, if you query a FHIR server, you might get something like:
{
"resourceType": "Bundle",
"id": "som-id",
"type": "searchset",
"link": [
{
"relation": "next",
"url": "https://fhirserver/?ct=token"
},
{
"relation": "self",
"url": "https://fhirserver/"
}
],
"entry": [
{
"fullUrl": "https://fhirserver/Organization/1234",
"resource": {
"resourceType": "Organization",
"id": "1234",
// More fields
}
},
{
"fullUrl": "https://fhirserver/Organization/456",
"resource": {
"resourceType": "Organization",
"id": "456",
// More fields
}
},
// More resources
]
}
Basically a bundle of resources. I would like to transform that into a newline delimited (aka ndjson) file where each line is just the json for a resource:
{"resourceType": "Organization", "id": "1234", // More fields }
{"resourceType": "Organization", "id": "456", // More fields }
// More lines with resources
I am able to get the REST connector set up and it can query the FHIR server (including pagination), but no matter what I try I cannot seem to generate the output I want. I set up an Azure Blob storage dataset:
{
"name": "AzureBlob1",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage1",
"type": "LinkedServiceReference"
},
"type": "AzureBlob",
"typeProperties": {
"format": {
"type": "JsonFormat",
"filePattern": "setOfObjects"
},
"fileName": "myout.json",
"folderPath": "outfhirfromadf"
}
},
"type": "Microsoft.DataFactory/factories/datasets"
}
And configure a copy activity:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "RestSource",
"httpRequestTimeout": "00:01:40",
"requestInterval": "00.00:00:00.010"
},
"sink": {
"type": "BlobSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"resource": "resource"
},
"collectionReference": "$.entry"
}
},
"inputs": [
{
"referenceName": "FHIRSource",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureBlob1",
"type": "DatasetReference"
}
]
}
]
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
But in the end (in spite of configuring the schema mapping), the result in the blob is always just the original bundle returned from the server. If I configure the output blob as comma-delimited text, I can extract fields and create a flattened tabular view, but that is not really what I want.
Any suggestions would be much appreciated.
So I sort of found a solution. If I first do the original conversion, where the bundles are simply dumped into the JSON file, and then do another conversion from that JSON file into another blob that I pretend is a text file, I can get the ndjson file created.
Basically, define another blob dataset:
{
"name": "AzureBlob2",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage1",
"type": "LinkedServiceReference"
},
"type": "AzureBlob",
"structure": [
{
"name": "Prop_0",
"type": "String"
}
],
"typeProperties": {
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"rowDelimiter": "",
"quoteChar": "",
"nullValue": "\\N",
"encodingName": null,
"treatEmptyAsNull": true,
"skipLineCount": 0,
"firstRowAsHeader": false
},
"fileName": "myout.json",
"folderPath": "adfjsonout2"
}
},
"type": "Microsoft.DataFactory/factories/datasets"
}
Note that this one uses TextFormat, and also note that the quoteChar is blank. If I then add another Copy Activity:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "RestSource",
"httpRequestTimeout": "00:01:40",
"requestInterval": "00.00:00:00.010"
},
"sink": {
"type": "BlobSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"['resource']": "resource"
},
"collectionReference": "$.entry"
}
},
"inputs": [
{
"referenceName": "FHIRSource",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureBlob1",
"type": "DatasetReference"
}
]
},
{
"name": "Copy Data2",
"type": "Copy",
"dependsOn": [
{
"activity": "Copy Data1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "BlobSource",
"recursive": true
},
"sink": {
"type": "BlobSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"columnMappings": {
"resource": "Prop_0"
}
}
},
"inputs": [
{
"referenceName": "AzureBlob1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureBlob2",
"type": "DatasetReference"
}
]
}
]
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Then it all works out. It is not ideal in that I now have two copies of the data in blobs, but one can easily be deleted, I suppose.
I would still love to hear about it if somebody has a one-step solution.
As briefly discussed in the comment, the Copy Activity does not provide much functionality aside from mapping data. As stated in the documentation, the Copy activity performs the following operations:
Reads data from a source data store.
Performs serialization/deserialization, compression/decompression, column mapping, etc. It does these operations based on the configurations of the input dataset, output dataset, and Copy Activity.
Writes data to the sink/destination data store.
It does not look like the Copy Activity does anything else aside from efficiently copying stuff around.
What I found to work was to use Databricks.
Here are the steps:
Add a Databricks account to your subscription;
Go to the Databricks page by clicking the authoring button;
Create a notebook;
Write the script (Scala, Python or .Net was recently announced).
The script would do the following:
Read the data from the Blob storage;
Filter out & transform the data as needed;
Write the data back to a Blob storage;
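The transformation step of such a script can be sketched in plain Python, without any Databricks or Spark APIs: take a parsed Bundle and emit one line of JSON per entry resource. The function name bundle_to_ndjson and the sample bundle are hypothetical, and reading from and writing to Blob storage is left out.

```python
import json


def bundle_to_ndjson(bundle: dict) -> str:
    """Return one compact JSON document per line for each entry's resource."""
    lines = [
        json.dumps(entry["resource"], separators=(",", ":"))
        for entry in bundle.get("entry", [])
        if "resource" in entry  # skip entries that carry no resource
    ]
    return "".join(line + "\n" for line in lines)


# Hypothetical bundle shaped like the search result in the question.
bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"fullUrl": "https://fhirserver/Organization/1234",
         "resource": {"resourceType": "Organization", "id": "1234"}},
        {"fullUrl": "https://fhirserver/Organization/456",
         "resource": {"resourceType": "Organization", "id": "456"}},
    ],
}
ndjson = bundle_to_ndjson(bundle)
print(ndjson, end="")
```

Each output line parses on its own as JSON, which is what makes the file newline delimited.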
You can test your script from there and, once ready, you can go back to your pipeline and create a Notebook activity that will point to your notebook containing the script.
I struggled coding in Scala but it was worth it :)
For anyone finding this post in the future: you can just use the $export API call to accomplish this. Note that you must have a storage account linked to your FHIR server.
https://build.fhir.org/ig/HL7/bulk-data/export.html#endpoint---system-level-export

Azure Data Factory Pipeline + ML

I am trying to build a pipeline in Azure Data Factory V1 that runs an Azure ML Batch Execution on a file. I implemented it using blob storage as input and output, and it worked. However, I am now trying to change the input and output to a folder in my Data Lake Store. When I try to deploy it, I get the following error:
Entity provisioning failed: AzureML Activity 'MLActivity' specifies 'DatalakeInput' in a property that requires an Azure Blob Dataset reference.
How can I use a Data Lake Store instead of a blob as the input and output?
Pipeline:
{
"name": "MLPipeline",
"properties": {
"description": "use AzureML model",
"activities": [
{
"type": "AzureMLBatchExecution",
"typeProperties": {
"webServiceInput": "DatalakeInput",
"webServiceOutputs": {
"output1": "DatalakeOutput"
},
"webServiceInputs": {},
"globalParameters": {}
},
"inputs": [
{
"name": "DatalakeInput"
}
],
"outputs": [
{
"name": "DatalakeOutput"
}
],
"policy": {
"timeout": "02:00:00",
"concurrency": 3,
"executionPriorityOrder": "NewestFirst",
"retry": 1
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "MLActivity",
"description": "description",
"linkedServiceName": "MyAzureMLLinkedService"
}
],
"start": "2016-02-08T00:00:00Z",
"end": "2016-02-08T00:00:00Z",
"isPaused": false,
"hubName": "hubname",
"pipelineMode": "Scheduled"
}
}
Output dataset:
{
"name": "DatalakeOutput",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "AzureDataLakeStoreLinkedService",
"typeProperties": {
"folderPath": "/DATA_MANAGEMENT/"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Input dataset:
{
"name": "DatalakeInput",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "AzureDataLakeStoreLinkedService",
"typeProperties": {
"fileName": "data.csv",
"folderPath": "/RAW/",
"format": {
"type": "TextFormat",
"columnDelimiter": ","
}
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
AzureDatalakeStoreLinkedService:
{
"name": "AzureDataLakeStoreLinkedService",
"properties": {
"description": "",
"hubName": "xyzdatafactoryv1_hub",
"type": "AzureDataLakeStore",
"typeProperties": {
"dataLakeStoreUri": "https://xyzdatastore.azuredatalakestore.net/webhdfs/v1",
"authorization": "**********",
"sessionId": "**********",
"subscriptionId": "*****",
"resourceGroupName": "xyzresourcegroup"
}
}
}
The linked service was done following this tutorial based on data factory V1.
I assume there is some issue with AzureDataLakeStoreLinkedService. Please verify.
Depending on the authentication used to access the data store, your AzureDataLakeStoreLinkedService JSON must look like one of the following:
Using service principal authentication
{
"name": "AzureDataLakeStoreLinkedService",
"properties": {
"type": "AzureDataLakeStore",
"typeProperties": {
"dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
"servicePrincipalId": "<service principal id>",
"servicePrincipalKey": {
"type": "SecureString",
"value": "<service principal key>"
},
"tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
"subscriptionId": "<subscription of ADLS>",
"resourceGroupName": "<resource group of ADLS>"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
Using managed service identity authentication
{
"name": "AzureDataLakeStoreLinkedService",
"properties": {
"type": "AzureDataLakeStore",
"typeProperties": {
"dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
"tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
"subscriptionId": "<subscription of ADLS>",
"resourceGroupName": "<resource group of ADLS>"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
For reference, see the Microsoft documentation: Copy data to or from Azure Data Lake Store by using Azure Data Factory
