Does azure data factory support real time copy activity? - azure

I want to copy data from an on-premise oracle db to sql server on a real time basis.

Data Factory has many strengths, but frequency is not one of them. Have you considered a different approach?
For real time integrations I would recommend Functional App, Logic App and/or Service Bus. I would call this App whenever there is a relevant change in the oracle DB. Alternatively you could have an API on top of this on-premise oracle DB that you call from a scheduled APP.
If you expect heavy traffic you might want to consider using a Service Bus. The illustration below shows how Azure Service Bus sends data from a Publisher(on-premise) to a Subscriber(Azure sql DB) using Topic Message.
Illustration of large scale service bus dataflow
Ref. Azure Service Bus

Welcome to Stack Overflow!
You can copy data from an Oracle database to any supported sink data store. For a list of data stores that are supported as sources or sinks by the copy activity, see the Supported data stores table.
To copy data from and to an Oracle database that isn't publicly accessible, you need to set up a Self-hosted Integration Runtime. The integration runtime provides a built-in Oracle driver. Therefore, you don't need to manually install a driver when you copy data from and to Oracle.
For more details and step by step procedure, refer Copy data from and to Oracle by using Azure Data Factory.
You can also use custom activity as per your needs. Refer custom activities.
Hope this helps.

You could copy data incrementally. But there is frequency limitation. Please reference this post. https://social.msdn.microsoft.com/Forums/en-US/54380f98-716b-4a95-88af-cad2ab7e47b5/what-type-of-data-ingestion-does-azure-data-factory-use?forum=AzureDataFactory

Related

Azure Architecture Implementation Ideas

We've designed a Data Architecture for our client on Azure wherein, We ingest the sources into a Raw Layer consisting of a Azure SQL Database. This Azure SQL Database acts as a Source Mirror and Has Near Real time sync enabled.
We also have an ODS Layer which is populated from the Previously mentioned Azure SQL Database (Source Mirror) as per the given Data Model. This Layer should ideally take anywhere between 30mins and 1 Hour to Load.
May I Know How Do I Handle the Concurrent Writes and Reads from the Raw Layer (Azure SQL Database, Source Mirror) ? It Syncs every 5 mins with the Sources but also read every 30mins - 1 Hour for ODS Layer.
I've to Use Azure Data Factory to Implement my Data Loads
Yes, Azure Data Factory platform is best fit for such scenarios. Its a cloud-based ETL and data integration tool that lets you build data-driven processes for managing data transportation and data transformation at scale. You can use Azure Data Factory to design and plan data-driven processes (also known as pipelines) that may consume data from a variety of sources. With data flows or computing services like Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database, you can design sophisticated ETL processes that convert data graphically.
When using control flow, you can use a GetMetadata activity to get a
list of files in a storage account, then pass that list to a for each
activity with the Sequential flag set to false to process all files
concurrently (in parallel) up to the maximum batch size according to
the activities defined in the for each loop.
Here, is the Microsoft Official Document for the Azure Datafactory Connectors Overview | Docs

How can I use Azure Stream Analytics to use an on-premise SQL Server as an Output?

I'm following the instructions to set up App Insights to spool to SQL using Azure Stream Analytics, but I'm trying to deviate slightly to use an on-premise SQL server (that the web application already uses) over VPN.
At the point of adding the output, this is failing with:
Is it the case that IP addresses are not supported, or is it something more fundamental than that?
You are probably looking for answers directly to your question, which Jean-Sébastien answers succinctly. But an alternative architecture, if you haven't considered it already...
You could stream to a transient Azure SQL Database or Blob storage (likely cheaper depending on your workload), and then use Azure Data Factory tunnelled via a Self-Hosted Data Factory Integration Runtime to "send" the data back to on-premise SQL.
Data Factory V2 also has blob triggers, so rather than needing a schedule it could pickup any new blobs in micro batches.
I say "send" in quotation marks as the Integration Runtime actually creates an outgoing connection to from on-premise to Azure, yet gives the capability for push-like data transfer.
If data factory proves useful, here is a guide creating copy pipelines: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-hybrid-copy-portal
Albeit this guide is for on-prem sql to blob, but it gives you a stronger starting point.
At this time only Azure SQL Databases are supported in Azure Stream Analytics.
Sorry for the inconvenience.
Thanks,
JS (Azure Stream Analytics)

Which Azure products are needed for a staging database?

I have several external data APIs that I access using some Python scripts. My scripts run from an on-premises server, transform the data, and store it in a SQL Server database on the same server. I suppose it's a rudimentary ETL system run with Python and T-SQL.
The system is about to grow quite a bit with new APIs and will require more complex data pipelines (for example, some of the API data will be spun off to more than one table). I think this would be a good time to move the system onto Azure (we are heavily integrated with Microsoft so it will have to be Azure!).
I have spent a few days researching the Azure products that would let me run Python scripts to access data from web APIs and store the processed data in a cloud database. I'm looking for advice on what sort of Azure products other people have used for similar jobs. At the moment it seems I will need:
Azure SQL Database to hold the processed data that can be accessed by various colleagues.
Azure Data Factory to manage, log, and schedule the pipeline jobs and to run my custom Python scripts (is this even possible?).
Azure Batch to run the aforementioned Python scripts but I'm not sure about this.
I want to put together a proposal basically and start thinking about costs but it would be good to hear from someone who has done something similar - am I on the right track or completely off? Should I just stay on-premises? Thank you in advance.
Azure SQL Database, Azure SQL Data Warehouse are good for relational data. And if you want to use NoSQL, you could go with Azure Cosmos DB. If you want to use Files to store data, you could use Azure Data Lake.
For python scripts, you could use custom activity or Data bricks for Azure Data Factory.
Azure SQL Warehouse should be used if the amount of data you want to load is in petabytes. Also, Azure Data warehouse is not meant for complex transformations. I would recommend it for plain data load with PolyBase.

Azure IOT Suite how to create a dashboard to display sensor data

I am new to Azure. I have sensors and would like to send data from sensors to the Azure backend, preferably to a database. After collecting those sensor data I would like to display them on a dashboard. I wonder if there is a sample tutorial or source code to implement such a solution. Hope you can help me.
Thanks in advance & Best Regards.
The Azure IoT Suite is an accelerator that configures a solution using standard Azure services and each one comes with a dashboard. The source code is available on GitHub: Remote Monitoring and Predictive Maintenance
There are multiple ways of achieving this as Azure is not a final product but consists of different "modules" if you will.
If the idea is to create a dashboard that shows the sensor data, you don't necessarily need to store them into a database. You can create live streams and display them. If the storage of data is also of concern, then in parallel you can do that as well.
The redirection logic of the data here would be Stream Analytics, it works with the concepts of an input sink, a query and an output sink. Now in your case you might want to create an Event Hub/ IoT Hub, then use it as an input sink to the Stream Analytics, and use PowerBI as the output sink. This will get the data and display it in PBi. If you also want to store the data, you can add another output sink for different options like , blob storage, document db, azure sql database etc.
Also there is an Azure IoT suite remote monitoring solution, that automates most of those tasks with some extra modules, you can use that to create a solution and use it as a boilerplate.
Below are step-by-step tutorials for;
Event Hubs
Stream analytics
PowerBI
Hope this helps!
Mert

Is Azure Data Factory suitable for downloading data from non-Azure REST APIs?

Consider a data processing pipeline as follows:
Fetch a large amount of data from a REST API that's hosted somewhere on the internet and persist it to a data store.
Perform some complex data transformations on the persisted data.
Persist the results of the data transformations on a data store.
Aiming to implement such a pipeline in Azure, steps 2 and 3 seem like a good fit for implementation as Azure Data Factory activities.
My questions is - Does it make sense to implement step 1 in an Azure Data Factory activity as well?
Technically it might be possible to code a .Net activity that perform the data download and persistence.
No - do not implement step 1 in an Azure Data Factory activity.
Technically it is possible to run the entire process from ADF but I would argue that the choice is more costly (relatively) than other options available to you because you will pay for each activity in Azure Data Factory.
For instance, what if the rest api does not have any new data to offer when you initiate the (scheduled) activity? You'll pay for that.
You might consider the following as an easy to implement alternative:
1 - Create a .NET console app, publish as a WebJob, schedule to run daily.
2 - The long-running console app can query the rest api, persist data into azure storage / documentdb, push a message into queue which triggers ADF steps 2/3 to run against the saved data.
I have done exactly that using .Net Activity. I had a need to fetch data from Salesforce api. This has been working well for my needs. Here is a post I wrote up about creating a .net activity and storing the data in azure data lake.
As in Newport99's answer yes you will incur costs for that activity but I am not sure how cost effect it would be to be running a separate web app to host a web job and also run the Azure Data Factory pipeline. When I was originally designing a solution the WebJob was my first choice but in the end I prefer to have the whole solution utilizing one azure service instead of multiple.
Hope that helps.
There have been a lot of improvements to ADF in the years since this question was posted, including a REST connector.
Here's the approach recommended by ADF at this time...
Copy data from a REST endpoint by using Azure Data Factory

Resources