Populate Azure Data warehouse from Oracle - azure

I have an Oracle database that I want to make available for BI (Power BI).
I want to set up a data warehouse in Microsoft SQL, preferably in Azure, and I'm wondering what approaches and tooling are recommended.
I'm a bit old school, so in the past I would have set up some SSIS ETL to get data from Oracle into a Microsoft database. I would then run the DTSX packages nightly to keep things up to date.
With Azure and data lakes/PaaS, I'm expecting there are much easier ways?

Related

Near real-time ETL of Oracle data to Azure SQL

I have an Oracle DB with data that I need to load and transform into an Azure SQL Database. I have no control over either the DB or the application that updates its data.
I'm looking at Azure Data Factory, but I really need data changes in Oracle to be reflected as near to real-time as possible.
I would appreciate any suggestions / insights.
Is ADF the correct tool for the job? If so, what is a good approach to use? If not suitable, what should I consider using instead?
For real-time you don't really want an ELT/ETL tool like ADF. Consider a replication agent like Attunity or (gulp at the licensing costs) GoldenGate.
I don't think Data Factory is a good fit for you. Yes, you can copy data from Oracle to Azure SQL Database with it. But as @Thiago Custodio said, you need to do it for each table you have. That's too complicated.
For reference: Copy data from and to Oracle by using Azure Data Factory.
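To illustrate the per-table concern: the copy definitions do not have to be written by hand, since they can be generated from a list of table names. The following is a minimal sketch, not a tested ADF deployment. The table list, pipeline name, and dataset names are hypothetical, and the JSON layout is only modelled on ADF's Copy activity (OracleSource, AzureSqlSink), so check it against the connector documentation before relying on it.

```python
import json


def build_copy_pipeline(tables, source_dataset, sink_dataset):
    """Return a pipeline-like dict with one Copy activity per (schema, table)."""
    activities = []
    for schema, table in tables:
        activities.append({
            "name": f"Copy_{schema}_{table}",
            "type": "Copy",
            # Dataset names are placeholders; real datasets would usually be
            # parameterized so each activity targets its own table.
            "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
            "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
            "typeProperties": {
                "source": {
                    "type": "OracleSource",
                    "oracleReaderQuery": f'SELECT * FROM "{schema}"."{table}"',
                },
                "sink": {"type": "AzureSqlSink"},
            },
        })
    return {"name": "CopyOracleTables", "properties": {"activities": activities}}


# Hypothetical table list; in practice this could come from a metadata query.
TABLES = [("SALES", "ORDERS"), ("SALES", "CUSTOMERS"), ("HR", "EMPLOYEES")]

pipeline = build_copy_pipeline(TABLES, "OracleSourceDataset", "AzureSqlSinkDataset")
print(json.dumps(pipeline, indent=2))
```

Generating the definitions removes the repetitive authoring, but it still runs as a batch copy, so it does not address the near real-time requirement.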
As you said, you really need data changes in Oracle to be reflected as near to real-time as possible.
The copy time must be very short, so that the data in Oracle and in Azure SQL Database is the same before the Oracle data changes again. I searched a lot and didn't find any real-time copy tools. I think what you actually want is something like 'data sync'.
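One common way to approximate 'data sync' without a replication product is to poll the source for rows changed since the last run (a watermark). The sketch below only illustrates that idea. It uses two in-memory SQLite databases as stand-ins for Oracle and Azure SQL, and it assumes a hypothetical customers table with a modified_at column. If the source has no such column (and the question says the DB cannot be changed), this approach does not apply, and it also does not detect deleted rows.

```python
import sqlite3


def sync_changes(source, target, last_seen):
    """Copy rows modified after last_seen into target; return the new watermark."""
    rows = source.execute(
        "SELECT id, name, modified_at FROM customers "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_seen,),
    ).fetchall()
    for row_id, name, modified_at in rows:
        # INSERT OR REPLACE acts as an upsert keyed on the primary key.
        target.execute(
            "INSERT OR REPLACE INTO customers (id, name, modified_at) VALUES (?, ?, ?)",
            (row_id, name, modified_at),
        )
        last_seen = max(last_seen, modified_at)
    target.commit()
    return last_seen


# Stand-in databases; table and column names are hypothetical.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT)"
    )

source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2024-01-01T10:00:00"), (2, "Grace", "2024-01-01T10:05:00")],
)
watermark = sync_changes(source, target, "")  # first run copies everything

# A later change in the source is picked up by the next poll.
source.execute(
    "UPDATE customers SET name = 'Ada L.', modified_at = '2024-01-01T10:10:00' WHERE id = 1"
)
watermark = sync_changes(source, target, watermark)

synced = target.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(synced, watermark)
```

How close this gets to real time depends on how often the poll runs and how expensive the source query is, which is why log-based replication tools are usually suggested for this requirement.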
I found this link, Sync Oracle Database with SQL Azure; I hope it gives you some good ideas.
For data migration or copy, you can use the following tools:
SQL Server Migration Assistant for Oracle (OracleToSQL)
Azure Database Migration Service (DMS)
Reference tutorial:
Migrating Oracle Databases to SQL Server (OracleToSQL): SQL Server Migration Assistant (SSMA) for Oracle is a comprehensive environment that helps you quickly migrate Oracle databases to Azure SQL database.
How to migrate Oracle to Azure SQL Database with minimum downtime:
Hope this helps.
For the record, we went with a product named Qlik Replicate (aka Attunity) and it is working very well!

Azure Data Factory architecture with Azure SQL database to Power BI

I'm no MS expert - recently hopped onto the Azure train and apologies in advance if I get some information wrong.
I basically need some input on an Azure architecture that uses Azure Data Factory (as the ETL/ELT tool) and Azure SQL Database (as the storage) to feed a BI output, Power BI. My situation is this:
I have on-premise data sources such as Oracle DB, Oracle Cloud SSAS, MS SQL server db
I'd like to have a MS cloud infrastructure solution for reporting purposes.
No data migration needed - merely pumping on-prem data onto cloud and producing a BI reporting solution
Based on my limited knowledge and Google research, Azure Data Factory caters for all my on-prem sources, as well as the future cloud Azure SQL database. If further analysis is needed in the future, Azure Storage and Azure Databricks can be added to this architecture. I have sketched out the architecture of my proposed solution.
Just confirming my understanding:
Without Azure Storage & Databricks (the 2 pink boxes), the 2 Azure components (Data Factory & SQL database) are sufficient to take data from on-premise sources, process it in the cloud and output it to Power BI.
With Azure Storage & Databricks (the 2 pink boxes), processing will be more efficient, as their function, in summary, is to store training data and models and to act as an analytics processing engine.
Azure SQL database is more suitable than Azure SQL Data Warehouse, as my data sources do not exceed 1 TB, it is cheaper, AND one of my data sources contains data from call centers, hence OLTP is more suitable. Plus, I have Azure Databricks to support the analytical part that SQL Data Warehouse covers (OLAP).
Any other comments to help me understand this whole architecture will be great!
I am a new learner of Azure. I was wondering if there is a @Query(value="...") annotation or any equivalent for DocumentDB (Cosmos DB), because DocumentDB does not accept @Query. I am looking to convert a SQL query (from JPA to Cosmos DB).
Taking data from on-prem or IaaS sources like SQL on a VM, Oracle etc, requires a Self-Hosted Integration Runtime (SHIR).
Please review the Modern Data Warehouse pattern which sounds similar to what you are proposing.

How to decide between Azure Data Lake vs Azure SQL vs Azure Data Lake Analytics vs Azure SQL VM?

I am new to Azure and hence trying to understand which services to use, when, and how.
At the moment, I have one Excel file with a couple of tabs that require some transformation to create one more tab inside the source file itself (say tab "x"). The final tab "x" is then used to create one final Excel file that is shared with various teams.
At present, everything is done manually.
This needs to change, and the creation of the Excel file shared with the teams has to be automated. The source is the Excel file with its various tabs (excluding tab "x"), and the reporting tool will be SSRS, with the Excel data stored in the cloud.
Keeping this scenario in mind, what is the best way to store Excel data in the cloud? The Excel data will be stored in the cloud on a monthly basis. I am confused as to whether to store the data in Azure SQL, Azure Data Lake Gen 2, Azure Data Lake Analytics or Azure SQL VM.
Every month, the data can be fetched from the Excel file and loaded into Azure using Azure Data Factory. But I am not sure what the best way to store the data in the cloud is, considering that some ETL process is needed to generate data in a format similar to tab "x".
I think you can consider using Azure SQL Database.
Azure SQL Database and SQL Server support importing data from Excel (or CSV) files. For more details and limits, please see: Import data from Excel to SQL Server or Azure SQL Database.
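As a rough illustration of the CSV route, the monthly load amounts to reading the exported rows and inserting them into a table. This is only a sketch: SQLite stands in for Azure SQL Database, and the monthly_report table, its columns, and the sample rows are made up for the example. Against a real Azure SQL database the same pattern would go through a SQL Server driver instead of sqlite3.

```python
import csv
import io
import sqlite3

# Hypothetical export of one worksheet; a real run would open the CSV file instead.
CSV_TEXT = "month,team,amount\n2024-01,North,120.5\n2024-01,South,98.0\n"


def load_csv(conn, csv_file):
    """Insert every CSV row into monthly_report and return the row count."""
    reader = csv.DictReader(csv_file)
    rows = [(r["month"], r["team"], float(r["amount"])) for r in reader]
    # Parameterized inserts keep the values out of the SQL text.
    conn.executemany(
        "INSERT INTO monthly_report (month, team, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


report_db = sqlite3.connect(":memory:")
report_db.execute("CREATE TABLE monthly_report (month TEXT, team TEXT, amount REAL)")

loaded = load_csv(report_db, io.StringIO(CSV_TEXT))
total = report_db.execute("SELECT SUM(amount) FROM monthly_report").fetchone()[0]
print(loaded, total)
```

Once the rows are in a table, the transformation that currently produces tab "x" could be expressed as a query or view, which SSRS can then report on.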
If your data is stored in Azure SQL Database, you can also use Excel to get the data from it:
Connect Excel to a single database in Azure SQL Database and import data and create tables and charts based on values in the database. In this tutorial you will set up the connection between Excel and a database table, save the file that stores data and the connection information for Excel, and then create a pivot chart from the database values.
Reference: Import data from Excel to SQL Server or Azure SQL Database.
I don't think you need to store these Excel files in Azure Data Lake. Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob storage. It's still just storage.
The more Azure resources you use, the more you pay.
If your Excel file is stored on your local computer, you can use Azure Data Factory with a self-hosted integration runtime to access these local files.
Please reference: Copy data to or from a file system by using Azure Data Factory.
Hope this helps.
Your storage requirements are very minimal, so I would select Data Lake to store your documents. The alternative is Blob Storage, but I always prefer Data Lake because it works with Azure Active Directory.
In your scenario, drop it in the ADL, and use the ADL as the source in Azure Data Factory.
Edit:
Honestly, your original post is a little confusing. You have a RAW Excel document, and you do some transformations on the RAW document to generate an Excel source document. This source document holds the final dataset that the dev team will use to build out SSRS reports. You need to make this dataset available to the teams so that they can connect to it to build the reports? My suggestion is to keep it simple: drop the final source dataset, in Excel format, into blob or data lake storage and then ask the dev guys to pick it up from that location. If you go the route of designing and maintaining a data pipeline (Blob > Data Factory > SQL, or CSV, TSV), then you are introducing unnecessary complications.

How can I connect Excel to my Azure SQL database so I can download and update data?

I read here that I can download data from Azure to Excel:
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-connect-excel
But is there a way that I could then update the data in a row and then have that update go back and change the Azure data or can I only do a dump of the data from Azure to Excel one way?
I found a tutorial on how to update an SQL table from Excel.
It says you can use SQL Spreads with Excel.
SQL Spreads solves some common data management problems for Microsoft SQL Server. It makes it fast and simple to update an SQL table from an Excel spreadsheet. And it gives you the control you need to manage data entered by various users on a collaborative team.
For more details, please reference How to Update an SQL Table from Excel.
I haven't tried it, but I think it could be useful for you.
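Whatever tool is used, writing edits back comes down to running an UPDATE per changed row, keyed on something that identifies the row. The sketch below shows that general idea only; it is not how SQL Spreads works internally. SQLite stands in for Azure SQL Database, and the contacts table, its columns, and the edits dictionary (imagined as changes collected from the spreadsheet) are hypothetical.

```python
import sqlite3

# Columns that may be edited; checking against this set keeps arbitrary
# text out of the generated SQL.
EDITABLE_COLUMNS = {"name", "city"}


def apply_edits(conn, edits):
    """Apply {primary_key: {column: new_value}} edits; return rows updated."""
    updated = 0
    for key, changes in edits.items():
        for column, value in changes.items():
            if column not in EDITABLE_COLUMNS:
                raise ValueError(f"column not editable: {column}")
            cur = conn.execute(
                f"UPDATE contacts SET {column} = ? WHERE id = ?", (value, key)
            )
            updated += cur.rowcount  # 0 when no row has that id
    conn.commit()
    return updated


contacts_db = sqlite3.connect(":memory:")
contacts_db.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
contacts_db.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)

# Row 3 does not exist, so only the edit to row 1 changes anything.
changed = apply_edits(contacts_db, {1: {"city": "Leeds"}, 3: {"name": "Nobody"}})
city = contacts_db.execute("SELECT city FROM contacts WHERE id = 1").fetchone()[0]
print(changed, city)
```

A real write-back would also need to handle conflicting edits from several users, which is part of what a dedicated tool is meant to take care of.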
Hope this helps.

Trigger Azure ML experiment from Power BI

I have created an Azure ML experiment which fetches data from an API and writes it to an Azure SQL database. My Power BI report picks up data from this database and displays it. The data at the source changes frequently, so I need something like a checkbox in Power BI which, when checked, will trigger the Azure ML experiment and update the database with the latest data.
I know that we can schedule it to run in an Rstudio pipeline, but we are not considering this approach as it is not financially viable.
Thanks in Advance.
You could use a DirectQuery connection from Power BI to your Azure SQL instance. Then the reports in Power BI will always be up to date with the latest data you have. The only remaining question is when to trigger the ML experiment. If this really needs to be on demand (rather than on a schedule), you could do that with a button in your own app. You could embed the report in your app so that you get an end-to-end update.
You could have a look at Azure Data Factory (ADF), which will help you build data pipelines in the cloud.
You can use ADF to read the data from the API (refreshing your data), score it in batches in Azure Machine Learning, and push it directly to your Azure SQL database, so that Power BI always sees the latest scored data.
Take a look at the following blog post, where they take data through this kind of pipeline. The only change is that your data comes from your API rather than from Stream Analytics.
http://blogs.msdn.com/b/data_insights_global_practice/archive/2015/09/16/event-hubs-stream-analytics-azureml-powerbi-end-to-end-demo-part-i-data-ingestion-and-preparation.aspx
