We have multiple pipelines in Data Factory V1, one for each brand in our organization, and a common self-hosted gateway named “SQLServerGateway” for the on-premises SQL Server used by all of these pipelines, which run well on a schedule.
Now we are trying to create a single test pipeline in Data Factory V2 that does the same job as the V1 pipelines. We are therefore creating linked services in Data Factory V2 and trying to reference the existing gateway “SQLServerGateway”, but that gateway does not appear in the dropdown while creating a new linked service for the on-premises SQL Server.
Because the gateway is not populated in the dropdown, we wrote the JSON below in the advanced (code) editor, but we still receive an error while testing the connection.
We would like to know how to reference the existing gateway from a Data Factory V2 linked service.
{
    "name": "SQLConn_RgTest",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Data Source=XXXX;Initial Catalog=XXXX;Integrated Security=False;user id=XXXX;password=XXXX;"
            }
        },
        "connectVia": {
            "referenceName": "SQLServerGateway",
            "type": "SelfHostedIntegrationRuntime"
        }
    }
}
Currently you cannot reuse the V1 gateways in Data Factory V2.
For your scenario, you need to create a new self-hosted IR in Data Factory V2 and install it on another machine to run the test pipeline. You can follow this tutorial to set up the pipeline. We will support sharing the self-hosted IR across V2 data factories soon.
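As a rough sketch, registering a new self-hosted integration runtime in the V2 factory with Azure PowerShell (Az.DataFactory) could look like the following; the resource group, factory, and runtime names are placeholders, and the returned key is what you enter when installing the integration runtime on the new machine:

# Create a self-hosted integration runtime in the V2 data factory
Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName "MyResourceGroup" `
    -DataFactoryName "MyDataFactoryV2" `
    -Name "SQLServerIR" `
    -Type SelfHosted `
    -Description "Self-hosted IR for on-premises SQL Server"

# Retrieve the authentication key used to register the IR node
# on the machine where the integration runtime software is installed
Get-AzDataFactoryV2IntegrationRuntimeKey `
    -ResourceGroupName "MyResourceGroup" `
    -DataFactoryName "MyDataFactoryV2" `
    -Name "SQLServerIR"

The new runtime's name is then what you reference in the linked service's connectVia section, with "type": "IntegrationRuntimeReference".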
Related
I need to use multiple Blob Storage accounts in ADF. I am trying to create a single linked service for all the storage accounts using parameters, but I am unable to parameterize the managed private endpoint. When I hardcode the storage name, the managed private endpoint (which has been created in ADF) gets selected automatically. Is there a way to parameterize it through Advanced -> JSON, or in any other way?
I am unable to parameterize the managed private endpoint and did not find any Microsoft documentation on this.
I created an Azure Data Factory and a storage account in the Azure portal and set up the integration runtime.
I created a managed private endpoint in Azure Data Factory under Manage -> Security -> Managed private endpoints.
After creating the managed private endpoint, it needs to be approved in the storage account settings. To do that, click Manage approvals in Azure portal, which takes you to the storage account's private endpoint connections page; select the private endpoint and click Approve.
The managed private endpoint is created and approved successfully.
We can parameterize the managed private endpoint in the ADF Blob storage linked service using the JSON script below:
{
    "name": "DataLakeBlob",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "parameters": {
            "StorageAccountEndpoint": {
                "type": "String",
                "defaultValue": "https://<storage AccountName>.blob.core.windows.net"
            }
        },
        "type": "AzureBlobStorage",
        "typeProperties": {
            "sasUri": "#{linkedService().StorageAccountEndpoint}"
        },
        "description": "Test Description"
    }
}
Check the "Specify dynamic contents in JSON format" option; when I tested the connection, it connected successfully.
This works on my end; please check from your end.
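If you prefer to deploy the definition from code instead of the portal editor, a minimal sketch with the Az.DataFactory PowerShell module could look like this; the file path, resource group, and factory names are placeholders:

# Save the JSON definition above to a local file, then create or update the linked service from it
Set-AzDataFactoryV2LinkedService `
    -ResourceGroupName "MyResourceGroup" `
    -DataFactoryName "MyDataFactory" `
    -Name "DataLakeBlob" `
    -DefinitionFile "C:\adf\DataLakeBlob.json"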
I have a requirement to update an ADF linked service configuration by API (or any other way through code, except the UI). I need to add 'init scripts' to the job cluster configuration of a linked service.
I got some Microsoft documentation on this, but it is only for creating a linked service, not for editing it.
Please let me know if you have any leads on this.
You can update the ADF linked service configuration by API.
Sample Request
PUT https://management.azure.com/subscriptions/12345678-1234-1234-1234-12345678abc/resourceGroups/exampleResourceGroup/providers/Microsoft.DataFactory/factories/exampleFactoryName/linkedservices/exampleLinkedService?api-version=2018-06-01
Request body
{
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "DefaultEndpointsProtocol=https;AccountName=examplestorageaccount;AccountKey=<storage key>"
            }
        },
        "description": "Example description"
    }
}
The sample request and request body are given in this link. For example, if you want to update an AzureBlobStorage linked service, you can update the configurations given here.
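As a minimal sketch, the same PUT call can be issued from PowerShell; the subscription, resource group, factory, and linked service names are placeholders, linkedService.json is assumed to hold the full definition (including the changed typeProperties, e.g. the added init scripts), and $token is assumed to contain a valid bearer token:

# Placeholder values for the REST call
$sub = "<subscriptionId>"
$rg  = "<resourceGroupName>"
$df  = "<dataFactoryName>"
$ls  = "<linkedServiceName>"

$uri = "https://management.azure.com/subscriptions/$sub/resourceGroups/$rg" +
       "/providers/Microsoft.DataFactory/factories/$df/linkedservices/$ls" +
       "?api-version=2018-06-01"

# The full linked service definition to apply (create-or-update replaces the existing definition)
$body = Get-Content -Path ".\linkedService.json" -Raw

Invoke-RestMethod -Method Put -Uri $uri `
    -Headers @{ Authorization = "Bearer $token" } `
    -ContentType "application/json" `
    -Body $body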
We use the PowerShell module azure.datafactory.tools for deployments of ADF components.
It can replace a Linked Service with a new definition. Furthermore, you can test the deployed Linked Service with the module.
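As a rough sketch, publishing a folder of ADF JSON definitions (including a replaced linked service) with that module could look like the following; the folder path and resource names are placeholders:

# Install the community module (one time)
Install-Module -Name azure.datafactory.tools -Scope CurrentUser

# Publish the ADF JSON definitions from a local folder; a linked service whose
# definition has changed (e.g. with added init scripts) is replaced in the factory
Publish-AdfV2FromJson `
    -RootFolder "C:\adf\MyDataFactory" `
    -ResourceGroupName "MyResourceGroup" `
    -DataFactoryName "MyDataFactory" `
    -Location "West Europe"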
Assume we have a Checkpoint firewall template created in the Azure portal. Is there a way to test the template within Azure? Also, if the template is modified, is there a way to test the modified template within Azure?
You can test an ARM Template by using it in a deployment. You can also use the what-if setting to produce hypothetical output without actually deploying anything.
Microsoft Azure Docs for What-If
To create a What-If deployment you can proceed in a number of ways: Azure CLI, PowerShell, REST, etc. Here is an example using REST (Postman).
Use the endpoint
POST https://management.azure.com/subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Resources/deployments/{deploymentName}/whatIf?api-version=2020-06-01
Provide a body payload:
{
    "location": "westus2",
    "properties": {
        "mode": "Incremental",
        "parameters": {},
        "template": {}
    }
}
Add your template and parameters. Supply a bearer token for authentication and deploy.
You can check the Azure What-If REST API docs here.
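If you would rather not craft the REST call yourself, a minimal equivalent sketch with Azure PowerShell (the template and parameter file paths and the resource group name are placeholders) could be:

# Preview the changes the template would make, without deploying anything
New-AzResourceGroupDeployment `
    -ResourceGroupName "MyResourceGroup" `
    -TemplateFile ".\azuredeploy.json" `
    -TemplateParameterFile ".\azuredeploy.parameters.json" `
    -WhatIf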
I have generated a template from an existing Azure API Management resource, modified it a bit, and tried to deploy it using the Azure CLI, but I'm getting the following error:
Deployment failed. Correlation ID: 7561a68f-54d1-4370-bf6a-175fd93a4b99. {
    "error": {
        "code": "MethodNotAllowed",
        "message": "System group membership cannot be changed",
        "details": null
    }
}
However, all the APIs are getting created and working fine. Can anyone help me resolve the error? This is the command I used to deploy from my Ubuntu machine:
az group deployment create -g XXXX --template-file azuredeploy.json --parameters @param.json
Service Group Template:
{
    "type": "Microsoft.ApiManagement/service/groups",
    "apiVersion": "2018-06-01-preview",
    "name": "[concat(parameters('service_name'), '/administrators')]",
    "dependsOn": [
        "[resourceId('Microsoft.ApiManagement/service', parameters('service_name'))]"
    ],
    "properties": {
        "displayName": "Administrators",
        "description": "Administrators is a built-in group. Its membership is managed by the system. Microsoft Azure subscription administrators fall into this group.",
        "type": "system"
    }
}
You have several options if you want to copy an API Management instance to a new instance; deploying the exported template is not one of them.
Use the backup and restore function in API Management (see the PowerShell sketch after this list). For more information, see How to implement disaster recovery by using service backup and restore in Azure API Management.
Create your own backup and restore feature by using the API Management REST API. Use the REST API to save and restore the entities from the service instance that you want.
Download the service configuration by using Git, and then upload it to a new instance. For more information, see How to save and configure your API Management service configuration by using Git.
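For the backup and restore option, a minimal sketch with Azure PowerShell could look like this; the storage account, container, blob, and service names are placeholders:

# Storage context for the account that will hold the backup blob
$storageContext = New-AzStorageContext `
    -StorageAccountName "mybackupstorage" `
    -StorageAccountKey "<storage key>"

# Back up the API Management service to a blob
Backup-AzApiManagement `
    -ResourceGroupName "MyResourceGroup" `
    -Name "my-apim-service" `
    -StorageContext $storageContext `
    -TargetContainerName "apim-backups" `
    -TargetBlobName "apim-backup.bak"

# Restore-AzApiManagement can later restore this blob into a new service instance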
Update:
I have confirmed with a Microsoft engineer that the ARM template deployment failure for APIM is a known issue, and they are planning to fix it. (5/7/2019)
How can you fetch data from an HTTP REST endpoint as an input for a Data Factory?
My use case is to fetch new data hourly from a REST HTTP GET and update/insert it into a DocumentDB in Azure.
Can you just create a linked service like the one below and put in the REST endpoint?
{
    "name": "OnPremisesFileServerLinkedService",
    "properties": {
        "type": "OnPremisesFileServer",
        "description": "",
        "typeProperties": {
            "host": "<host name which can be either UNC name e.g. \\\\server or localhost for the same machine hosting the gateway>",
            "gatewayName": "<name of the gateway that will be used to connect to the shared folder or localhost>",
            "userId": "<domain user name e.g. domain\\user>",
            "password": "<domain password>"
        }
    }
}
And what kind of component do I add to create the data transformation job? I see there are a bunch of things like HDInsight, Data Lake, and Batch, but I'm not sure what the differences are or which service would be appropriate to simply upsert the new set into the Azure DocumentDB.
I think the simplest way will be to use Azure Logic Apps.
You can make a call to any RESTful service using the HTTP connector in the Azure Logic Apps connectors.
So you can do GET and POST/PUT, etc., in a flow based on a schedule or based on some other GET listener:
Here is the documentation for it:
https://azure.microsoft.com/en-us/documentation/articles/app-service-logic-connector-http/
To do this with Azure Data Factory you will need to utilize Custom Activities.
Similar question here:
Using Azure Data Factory to get data from a REST API
If Azure Data Factory is not an absolute requirement, Aram's suggestion of Logic Apps might serve you better.
Hope that helps.
This can be achieved with Data Factory. This is especially good if you want to run batches on a schedule and have a single place for monitoring and management. There is sample code in our GitHub repo for an HTTP loader to blob here: https://github.com/Azure/Azure-DataFactory. Then, the act of moving data from the blob to DocumentDB will do the insert for you using our DocDB connector. There is a sample on how to use this connector here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-documentdb-connector/. Here are the brief steps you will take to fulfill your scenario:
Create a custom .NET activity to get your data to blob.
Create a linked service of type DocumentDb.
Create linked service of type AzureStorage.
Use input dataset of type AzureBlob.
Use output dataset of type DocumentDbCollection.
Create and schedule a pipeline that includes your custom activity and a Copy Activity that uses BlobSource and DocumentDbCollectionSink; schedule the activities to the required frequency and availability of the datasets.
Aside from that, choosing where to run your transforms (HDInsight, Data Lake, Batch) will depend on your I/O and performance requirements. In this case you can choose to run your custom activity on Azure Batch or HDInsight.
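As a rough sketch of deploying the pieces from those steps with the (now legacy) Data Factory V1 PowerShell cmdlets, assuming each definition has been saved to a local JSON file with placeholder names and paths:

# Deploy the linked services, datasets, and pipeline definitions into an existing V1 data factory
# (resource group, factory name, and file paths are placeholders)
New-AzureRmDataFactoryLinkedService -ResourceGroupName "MyRG" -DataFactoryName "MyADF" -File ".\AzureStorageLinkedService.json"
New-AzureRmDataFactoryLinkedService -ResourceGroupName "MyRG" -DataFactoryName "MyADF" -File ".\DocumentDbLinkedService.json"
New-AzureRmDataFactoryDataset -ResourceGroupName "MyRG" -DataFactoryName "MyADF" -File ".\AzureBlobDataset.json"
New-AzureRmDataFactoryDataset -ResourceGroupName "MyRG" -DataFactoryName "MyADF" -File ".\DocumentDbCollectionDataset.json"
New-AzureRmDataFactoryPipeline -ResourceGroupName "MyRG" -DataFactoryName "MyADF" -File ".\HttpToDocDbPipeline.json"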