How to decide between Azure Data Lake vs Azure SQL vs Azure Data Lake Analytics vs Azure SQL VM? - excel

I am new to Azure and am trying to understand which services to use when, and how.
At the moment, I have one Excel file with a couple of tabs that require some transformation to create one additional tab inside the source file itself (say tab "x"). Tab "x" is then used to create one final Excel file that is shared with various teams.
At present, everything is done manually.
This needs to change: producing the Excel file shared with the teams has to be automated. The source is the Excel file with its various tabs (excluding tab "x"), the reporting tool will be SSRS, and the Excel data will be stored in the cloud.
Keeping this scenario in mind, what is the best way to store the Excel data in the cloud? The data will be loaded on a monthly basis. I am confused as to whether to store the data in Azure SQL, Azure Data Lake Storage Gen2, Azure Data Lake Analytics, or an Azure SQL VM.
Every month the data can be fetched from the Excel file and loaded into Azure using Azure Data Factory. But I am not sure what the best way to store the data in the cloud is, considering that some ETL process is needed to generate the data in a format similar to tab "x".

I think you could consider using Azure SQL Database.
Azure SQL Database and SQL Server both support importing data from Excel (or CSV) files. For more details and limits, please see: Import data from Excel to SQL Server or Azure SQL Database.
Once your data is stored in Azure SQL Database, you can also use Excel to get the data back out of it:
Connect Excel to a single database in Azure SQL Database and import data and create tables and charts based on values in the database. In this tutorial you will set up the connection between Excel and a database table, save the file that stores data and the connection information for Excel, and then create a pivot chart from the database values.
Reference: Import data from Excel to SQL Server or Azure SQL Database.
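If you go this route, the monthly load can also be scripted rather than done through the import wizard in the linked article. A minimal sketch of that idea using pandas and pyodbc, assuming a target table dbo.MonthlyData already exists and that the server, database, credentials, and file names below are placeholders:

```python
# Minimal sketch: load the monthly Excel worksheet into an Azure SQL table.
# Server, database, credentials, table, and file names are placeholders.
import pandas as pd
import pyodbc

# Read the source worksheet (pandas needs openpyxl installed for .xlsx files).
df = pd.read_excel("monthly_report.xlsx", sheet_name="x")

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # batch the inserts instead of one round trip per row

# Assumes the worksheet's column order matches dbo.MonthlyData's column order.
placeholders = ",".join("?" * len(df.columns))
cursor.executemany(
    f"INSERT INTO dbo.MonthlyData VALUES ({placeholders})",
    list(df.itertuples(index=False, name=None)),
)
conn.commit()
conn.close()
```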
I don't think you need to store these Excel files in Azure Data Lake. Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob storage. It is still just storage.
The more Azure resources you use, the more you have to pay.
If your Excel file is stored on your local computer, you can use Azure Data Factory with a self-hosted integration runtime to access these local files.
Please reference: Copy data to or from a file system by using Azure Data Factory.
Hope this helps.

Your storage requirements are very minimal, so I would select Data Lake to store your documents. The alternative is Blob Storage, but I always prefer Data Lake because it works with Azure Active Directory.
In your scenario, drop it in the ADL, and use the ADL as the source in Azure Data Factory.
Edit:
Honestly, your original post is a little confusing. You have a RAW Excel document, and you do some transformations on the RAW document to generate a source Excel document. This source document holds the final dataset that the dev team will use to build out SSRS reports, and you need to make this dataset available to the teams so that they can connect to it and build the reports. My suggestion is to keep it simple: drop the final source dataset, in Excel format, into blob or data lake storage and then ask the dev guys to pick it up from that location. If you go the route of designing and maintaining a data pipeline (Blob > Data Factory > SQL, or CSV/TSV), you are introducing unnecessary complications.
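If you take this route, dropping the monthly file into the data lake can be a one-step script against the ADLS Gen2 SDK. A minimal sketch, assuming placeholder account, filesystem, and path names, and using DefaultAzureCredential for the Azure Active Directory integration mentioned above:

```python
# Minimal sketch: upload the final Excel file to Azure Data Lake Storage Gen2.
# Account, filesystem (container), and path names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()  # Azure AD auth, per the preference above
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=credential,
)

file_system = service.get_file_system_client("reports")
file_client = file_system.get_file_client("monthly/final.xlsx")

# Overwrite the previous month's drop; ADF can then pick this path up as a source.
with open("final.xlsx", "rb") as data:
    file_client.upload_data(data, overwrite=True)
```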

Related

Ingest Data From On-Premise SFTP Folder To Azure SQL Database (Azure Data Factory)

Use case: I have data files of varying size copied to a specific SFTP folder periodically (daily/weekly). All these files need to be validated and processed, and then written to the related tables in Azure SQL. The files are in CSV format and are actually flat text files, each of which corresponds directly to a specific table in Azure SQL.
Implementation:
Planning to use Azure Data Factory. So far, from my reading, I can see that I can have a copy pipeline to copy the data from the on-premises SFTP server to Azure Blob storage, and that an SSIS pipeline can copy data from an on-premises SQL Server to Azure SQL.
But I don't see an existing solution that achieves what I am looking for. Can someone provide some insight on how I can achieve this?
I would try to use Data Factory with a Data Flow to validate/process the files (if that is possible for your case). If the validation is too complex or depends on other components, then I would use Azure Functions and put the resulting files in blob storage (a minimal sketch of such a validation check follows the steps below). The copy activity is also able to import the resulting CSV files into SQL Server.
You can create a pipeline that does the following:
Copy data - copy the files from SFTP to Blob Storage
Do the data processing/validation via a Data Flow
Sink the results directly to a SQL table (via a Data Flow sink)
Of course, you need an integration runtime that can access the on-premises server, either by using VNet integration or by using the self-hosted IR (if the server is not publicly accessible).
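The validation itself can stay very small, whether it runs in a Data Flow, a Function, or a plain script. A minimal sketch of the idea, checking a file's header against the expected columns of its target table; the file names and expected-columns mapping are placeholders, and routing of failed files (e.g. to a quarantine folder) is left out:

```python
# Minimal sketch: validate a CSV header against the expected columns of its
# target Azure SQL table before the copy activity loads it.
# The expected-columns mapping and file paths are placeholders.
import csv

EXPECTED_COLUMNS = {
    "customers.csv": ["CustomerId", "Name", "Country"],
    "orders.csv": ["OrderId", "CustomerId", "Amount", "OrderDate"],
}

def validate_csv(path: str, file_name: str) -> bool:
    """Return True if the file's header row matches the expected column list."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return header == EXPECTED_COLUMNS.get(file_name)

# Example: only hand valid files on to the load step.
if validate_csv("customers.csv", "customers.csv"):
    print("valid - pass the file on to the copy activity")
else:
    print("invalid - move the file to a quarantine folder instead")
```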

Azure Data Lake Excel Export To CSV as Same Folder / Path

Have you ever converted an Excel file stored in Azure Data Lake to a CSV file?
First, I tried using SSIS with an Azure Data Lake source, but mapping was not possible; the only choice was to add the content as text.
Second, I tried using Azure Logic Apps with the Create CSV table action, but the CSV that comes out only contains the structure of that folder.
Thank you in advance.
There is no built-in way to extract data from an Excel file in Azure Data Lake. I would suggest you try one of the approaches below (a minimal sketch of the conversion logic itself follows the list):
Write a custom .NET library for converting Excel to CSV and deploy it to Azure Data Lake Analytics. Azure Data Lake Analytics Programming Guide
Write a custom .NET activity in Azure Data Factory to do this. Custom Activities in Azure Data Factory
Use Azure Functions and Open XML to do this, as detailed in the Stack Overflow post
Use an SSIS package to do the conversion. You can have SSIS runtimes in Azure Data Factory. SSIS packages running in Azure Data Factory
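The options above are .NET/SSIS based, but the conversion logic itself is small wherever it runs. A minimal Python sketch of the same idea, with pandas standing in for the Open XML / custom-activity code; the file and directory names are placeholders:

```python
# Minimal sketch: convert every worksheet of an Excel file into its own CSV file.
# Input and output paths are placeholders; pandas needs openpyxl for .xlsx files.
from pathlib import Path

import pandas as pd

def excel_to_csv(excel_path: str, output_dir: str) -> None:
    """Write one CSV per worksheet, named <workbook>_<sheet>.csv."""
    sheets = pd.read_excel(excel_path, sheet_name=None)  # dict: sheet name -> DataFrame
    stem = Path(excel_path).stem
    for sheet_name, df in sheets.items():
        df.to_csv(Path(output_dir) / f"{stem}_{sheet_name}.csv", index=False)

excel_to_csv("report.xlsx", ".")
```

The same function could be dropped into an Azure Function or a Data Factory custom activity; only the code that reads the input from and writes the output back to the data lake would differ.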
As far as I know about Azure, there isn't any way to convert the Excel file to CSV directly.
You could follow these steps:
Download the Excel file to your computer.
Import the Excel file into your SQL database.
Then export the table data as a CSV file to your Blob Storage.
You could reference these documents:
Import data from Excel to SQL Server or Azure SQL Database
Connect to Azure Blob Storage (SQL Server Import and Export Wizard)
Hope this helps.

How to truncate Dynamics 365 entities with Data Factory (and copy to Azure data lake)?

I am currently using a Data Factory to copy entities from Dynamics 365 in bulk to an Azure Data Lake. The entities are saved as CSV files in the Data Lake every 24 hours.
Instead of bulk copying, I would like to truncate entities to new data and append to the files that already exist in the data lake.
I think this is a common operation for SQL databases, but can this be done between Dynamics 365 and a Data Lake?
You could add a filter to your queries to get those records that have been modified within the last 24 hours.
Additionally, you can set up Dynamics to replicate its data to an external SQL database.
Replicate data to Azure SQL Database
Azure Data Lake Storage Gen2 as a source type only supports three copy behaviors.
I tried all three copy behaviors, and none of them can append to files that already exist in the data lake. If you choose an existing file, it will be overwritten when the copy activity completes.
For more details, you can reference: Azure Data Lake Storage Gen2 as a source type.
It cannot be done between Dynamics 365 and a Data Lake with Azure Data Factory.
Thanks to James Wood for providing a good solution for us. Combining my answer and his, the problem will be solved.
Hope this helps.

Best way to extract data from Azure Data Lake to SQL Server

I am looking for the best programmatic way to extract data from Azure Data Lake to an MSSQL database, which is installed on a VM within Azure.
Currently I am considering the following options:
Azure Data Factory
SSIS (Using Azure Data Lake Store Connection Manager)
User-Defined Outputter Example1, Example2
Custom C# code that reads Azure Data Lake data and inserts it into SQL Server DB
Any other good ways I am missing?
Data Factory v2 (currently in public preview) also supports hosting SSIS, giving you a Data Factory AND SSIS option.
Not necessarily a good idea for many scenarios, but Azure Logic Apps has both a Data Lake Store connector and a SQL Server connector, which could be useful in scenarios such as writing lots of small files on a schedule or trigger.
You also may not need to go full-on C# and could instead use PowerShell; there are PowerShell modules for both Data Lake Store and SQL Server.
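The answer mentions the PowerShell modules; a Python sketch of the same extract-and-load idea (a language swap, not what the answer itself uses) is also possible with the azure-datalake-store package plus pyodbc. The store, tenant, table, and path names below are placeholders, and dbo.Sales is assumed to already match the CSV layout:

```python
# Minimal sketch: pull a CSV out of Azure Data Lake Store (Gen1) and bulk load
# it into SQL Server running on the Azure VM. All names are placeholders.
import pyodbc
from azure.datalake.store import core, lib

# Authenticate with a service principal (placeholder credentials).
token = lib.auth(tenant_id="my-tenant-id",
                 client_id="my-client-id",
                 client_secret="my-client-secret")
adls = core.AzureDLFileSystem(token, store_name="mydatalakestore")

# Download the extract to a path on the VM that SQL Server can read.
# (Large files should be streamed in chunks rather than read in one go.)
local_path = r"C:\loads\sales.csv"
with adls.open("/curated/sales.csv", "rb") as remote, open(local_path, "wb") as local:
    local.write(remote.read())

# Bulk load it; assumes the dbo.Sales table already matches the CSV layout.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes")
conn.execute(
    f"BULK INSERT dbo.Sales FROM '{local_path}' "
    "WITH (FIELDTERMINATOR=',', ROWTERMINATOR='\\n', FIRSTROW=2)"
)
conn.commit()
conn.close()
```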

Bulk upload Excel to SQL Azure daily

I have a requirement to bulk upload data from an Excel file to an Azure SQL table on a daily basis. I did some research and found that we could create a VM, install full SQL Server, and use an SSIS package to do this.
Is there any other reliable way to go about this? The Excel file may contain up to 10,000 rows.
I have also read that we could upload the file to blob storage and read it from there, but I found that it's not a very robust approach.
Can anyone suggest whether this is a feasible approach:
Place the Excel file in an Azure Website, accessed via FTP
An Azure timer job using SQL bulk copy code to update the SQL table
Any help would be highly appreciated!
You could use Azure Data Factory - check out the documentation here. Place your files in Azure Data Lake and ADF will process them.
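If you go with Data Factory, the daily run can be driven by an ADF schedule trigger, or kicked off from a job you already own. A minimal sketch of starting an existing pipeline from Python; the subscription, resource group, factory, and pipeline names are placeholders for ones you would have created already:

```python
# Minimal sketch: start an existing Data Factory pipeline run from a daily job.
# Subscription, resource group, factory, and pipeline names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "my-subscription-id")

run = adf_client.pipelines.create_run(
    resource_group_name="my-resource-group",
    factory_name="my-data-factory",
    pipeline_name="LoadExcelToSql",
    parameters={},  # pass pipeline parameters here if the pipeline defines any
)
print(f"Started pipeline run: {run.run_id}")
```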
