I have an Azure Data Factory V2 with an Integration Runtime installed on the our internal cloud server and connected to our Java web platform API. This passes data one way into ADF on a scheduled trigger via a request to the IR API.
The Java web platform also has a DR solution at another site, which is a mirror build of the same servers and platforms. If I was to install another IR on this DR platform and link to ADF as a secondary IR. Is there a way for ADF to detect if the primary is down and auto failover to the secondary IR?
Thanks
For you question "Is there a way for ADF to detect if the primary is down and auto failover to the secondary IR?", the answer is no, Data Factory doesn't have the failover feature. The shared integration runtime nodes don't affect each other.
For another question in the comment, the IR can't be stop/pause automatically, we must set it manually on the machine:
Related
I'm trying to copy data, using the copy activity in a synapse-pipeline, from a self hosted integration runtime rest api call to a azure data lake gen2. Using preview I can see the data from the rest api call but when I try to do the copy activity it is queued endlessly. Any idea why this happens? The Source is working with a self hosted integration Runtime and the Sink with azure integration runtime. Could this be the problem? Otherwise both connections are tested and working...
Edit: When trying the the web call, it tells me it's processing for a long time but I know I can connect to the rest api source since when using the preview feature in the copy activity it shows me the response....
Running the diagnostic tool, I receive the following error:
It seems to be a problem with the certificate. Any ideas?
Integration runtime
If you use a Self-hosted Integration Runtime (IR) and copy activity waits long in the queue until the IR has available resource to execute, suggest scaling out/up your IR.
If you use an Azure Integration Runtime that is in a not optimal region resulting in slow read/write, suggest configuring to use an IR in another region.
You can also try performance tuning tips
how do i scale up the IR?
Scale Considerations
is possible to configure a linked service between 2 or more datafactory?
I red documentation but i didn't found it
Thanks
Per my experience, we can't do that and never heard such configuration.
Just as I know, we only could share the Integration runtime between 2 or more Data Factory.
But we still need to create the linked service to connect to the same on-premise data source through shared shared self-hosted integration runtime
In one word, it's impossible to configure a linked service between 2 or more Data Factory.
I need to limit a download speed of one self hosted IR in Azure to an on premise server to prevent the network to get clogged up.
What are my options here? Is it possible to set this is ADF directly or in the IR or do I have to set this in the network?
According to this MSDN thread it's not possible to throttle bandwidth natively in Azure Data Factory.
However, if you are using an Azure Data Factory Self-Hosted Integration Runtime you could probably throttle the bandwidth at the VM level. For example, for Windows VMs you could try a QoS Group Policy to throttle the network for all applications or just the Integration Runtime executable.
Was there any loss of packets or impact on data - we recently just deployed ADF IR on-prem we have to set QOS to 50Mpbs across a 200Mpbs network.
I'm wondering if anyone has any experience in calling datasets dynamically in Azure Data Factory. The situation we have is that we dynamically sweep all tables in from IaaS (on-premise SQL Server installations on an Azure VM) application systems to a data lake. We want to have one pipeline that can pass server name, database name, user name and password to the pipeline's activities. The pipelines will then sweep whatever source they've been told to read from the parameters. The source systems are currently within a separate subscription and domain within our Enterprise Agreement.
We have looked into using the AutoResolveIntegrationRuntime on a generic SQL Server dataset but, as it is Azure and the runtimes on the VMs are self-hosted, it can't resolve and we get 'cannot connect' errors. So,
i) I don't know if this problem goes away if they are in the same subscription and domain?
That leaves whether anyone can assist with:
ii) A way of getting a dynamic runtime to resolve which SQL Server runtime it should use (we have one per VM for resilience purposes, but they can all see each other's instances). We don't want to parameterise a linked service on a particular VM as it places reliance for other VMs on that single VM.
iii) Ability to parameterise a dataset to call a runtime (doesn't look possible in the UI).
iv) Ability to parameterise the source and sink connections with pipeline activities to call a dataset parameter.
Servers, database, tableNames are possible to be dynamic by using parameters. The key problem here is that all the reference in ADF can’t be parameterized, like linked services reference in dataset, integrationRuntime reference in linked service. If you don’t have too many selfhosted integrationRuntime, maybe you can try setup different pipelines for different network?
I am looking at achieving Master Data deduplication based on match percentages in AzureDB...was looking at something equivalent to Master Data Services/ DQS (Data Quality Services) in SQL Server2012
https://channel9.msdn.com/posts/SQL11UPD05-REC-06
Broadly looking for controls on match rules (exact, close match etc), handle dependencies and audit trail(undo capability etc)
I reckon this must be available in Azure cloud, if this is made available in SQL Server. Could you pls point me to how I get this done on AzureDB
Please note- I am NOT looking for data Sources like MelissaDAta, D&B that are listed on the Azure marketplace
Master Data Services is not just a database process: it also centrally involves a website component, which still (as of 2021) requires some Windows server running IIS.
This can be an Azure Virtual Machine (link to documentation) but there is no serverless offering for this at this time.
The database itself can be hosted on an Azure SQL Managed Instance (link to documentation) but not on a standalone Azure SQL DB, as far as I can tell. This is presumably because some of the essential components of MDS sit outside the database, much like other services like SSIS are more than just a database.
Data Quality Services is a similar story: it uses three databases (link to documentation) and seemingly some components outside the databases, so wouldn't be possible to deploy in standalone Azure SQL DBs. It may be possible to run on a Managed Instance, I couldn't find a clear answer to that. And again, there is no fully-serverless offering at this time.
Of course, all of this can easily be run via IaaS (Infrastructure as a Service) using an Azure virtual machine running SQL Server.