Copy files from on-prem to Azure

I'm new to the Azure ecosystem. I'm doing some research on copying data from on-prem to Azure. I found the following options:
AzCopy
Azure Data Factory (Copy Data Tool)
Data Management Gateway
Ours is a Microsoft shop, so I'm looking for tools that gel with the MS platform. Also, down the line, we want to automate the entire thing as much as we can, so I think Azure Storage Explorer is out of the question. Is there a preference among the above three, or are there better tools?

I think you are mixing things up: the Copy Data Tool is just an Azure Data Factory wizard for setting up simple data movement between resources, and Azure Data Factory uses the Data Management Gateway to reach on-premises resources such as files and databases.
What you want to do can be done with Azure Data Factory. I recommend using version 2 (even though it is still in preview) because its authoring experience is easier to understand if you are new to the tool. You can graphically configure linked services, datasets and pipelines from there.
I hope this helped, if you need further help just ask away!

If you're already familiar with SSIS, there's also the option to run SSIS in ADF, which enables on-prem data access via a VNet.

Related

Filesystem SDK vs Azure Data Factory

I'm very new to Azure Data Lake Storage and currently training on Data Factory. I have a developer background, so right away I'm not a fan of the 'tools' approach to development. I really don't like how there are all these settings to set and objects you have to create everywhere. I much prefer a code approach, which lets us detach the logic from the service (I don't like the publishing step just to save), see everything by scrolling or navigating to different objects in a project, see differences more easily in source control, and so on. So I found Microsoft's Filesystem SDK, which seems to be an alternative to Data Factory:
https://azure.microsoft.com/en-us/blog/filesystem-sdks-for-azure-data-lake-storage-gen2-now-generally-available/
What has been your experience with this approach? Is this a good alternative? Is there a way to run SDK code in Data Factory, so we can leverage scheduling and triggers? I guess I'm looking for pros/cons.
Thank you.
Well, the docs refer to several SDKs, one of them being the .NET SDK, and the title is:
Use .NET (or Python or Java etc.) to manage directories, files, and ACLs in Azure Data Lake Storage Gen2
So, the SDK lets you manage the filesystem only. There is no support for triggers, pipelines, data flows and the like; you will have to stick with Azure Data Factory for that.
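To make that concrete, here is a minimal sketch of what "managing the filesystem" looks like with the Python flavour of that SDK (azure-storage-file-datalake); the account, key, filesystem and paths are placeholders:

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account URL and key; the filesystem (container) is assumed to exist.
service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential="<account-key>")
fs = service.get_file_system_client("my-filesystem")

# Create a directory and upload a small file into it.
dir_client = fs.create_directory("raw")
file_client = dir_client.create_file("sample.csv")
data = b"id,name\n1,contoso\n"
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))

# Read and update the directory's ACL/permissions.
print(dir_client.get_access_control())
dir_client.set_access_control(permissions="rwxr-x---")
```

Notice there is nothing about triggers, schedules or pipelines in there; it is purely file and ACL management.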
Regarding this:
I'm not a fan of the 'tools' approach for development
I hate to tell you, but the world is moving that way whether you like it or not. Take Logic Apps, for example. Azure Data Factory isn't aimed at the hardcore developer, but it fulfils a need for people working with large sets of data, such as data engineers. I am glad it integrates with Git very well. Yes, there is some overhead in defining sinks and sources, but they are reusable across pipelines.
If you really want to use code, try Azure Databricks. Take a look at this Q&A as well.
TL;DR:
The FileSystem SDK is not an alternative.
The code-centric alternative to Azure Data Factory for building and managing your Azure Data Lake is Spark. Typically either Azure Databricks or Azure Synapse Spark.
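For a flavour of that code-centric route, here is a minimal PySpark sketch as you might run it in a Databricks or Synapse notebook; the abfss:// paths and column names are made up, and the cluster is assumed to already have access to the storage account:

```python
# `spark` is provided by the notebook session in Databricks/Synapse.
# Read raw CSVs from the lake, clean them up, and write a curated copy back.
df = (spark.read
      .option("header", "true")
      .csv("abfss://raw@<account>.dfs.core.windows.net/sales/2023/*.csv"))

cleaned = (df.dropDuplicates(["order_id"])
             .filter("amount IS NOT NULL"))

(cleaned.write
        .mode("overwrite")
        .parquet("abfss://curated@<account>.dfs.core.windows.net/sales/2023/"))
```

The trade-off is that scheduling, monitoring and credentials then live in the Spark platform (jobs, notebooks) rather than in Data Factory pipelines.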

What is the best way to automate creating SQL .bak or .bacpac backup files and saving them to an Azure cloud storage container?

Currently I am tasked with researching a solution for easily copying data from one environment to another (QA to DEV, for example), as well as having the flexibility to go back to different points in time to compare our data. It is an easy task to do locally with SSMS, and I am looking for the best ways to do it using Azure and its tools.
These are the options that I found so far:
Backup Service and Backup Vault (the MS solution I am not asking about; they don't generate .bak files)
Azure Function to generate and transfer SQL backups (flexible, but the code and authentication need to be maintained)
PowerShell process with Azure Automation (also flexible, but needs to be maintained)
Data Factory/SSIS (still learning and researching)
Has anyone got any tools/methods that are worth looking into before I dive deeper into a solution?
For Azure SQL Database, SQL Data Sync is one feature for syncing data between Azure SQL and SQL Server (on-premises). Some limits are that the Azure SQL database must be the hub and every table must have a primary key. That may not suit you.
In my experience, Data Factory is the best option for you. You can copy the data between different environments, and in the sink settings you can use the upsert (insert or update) operation to sync the data.
If you only want to schedule the SQL backup automatically, a third-party tool could also meet your needs: SQL Backup and FTP.
Since you have searched a lot and found almost all the options in Azure, any of these ways can achieve it. You need to pin down your real requirement: data sync, or automatically creating a .bacpac backup in storage. Without that, it's hard to say which way is best; the one that fits your requirement and that you're comfortable maintaining is the best.
I went with writing an Azure Automation PowerShell script. Including cmdlets like New-AzureRmSqlDatabaseExport and passing in the parameters was tricky, but it finally did the job.
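For what it's worth, here is a rough Python sketch of the same "script it yourself" idea, swapping the AzureRM cmdlet for the SqlPackage CLI plus the blob storage SDK; the server, credentials and container names are placeholders, and SqlPackage is assumed to be installed on the machine running the script:

```python
import subprocess
from datetime import datetime, timezone

from azure.storage.blob import BlobClient

# Export the database to a local .bacpac with SqlPackage.
bacpac = f"mydb-{datetime.now(timezone.utc):%Y%m%dT%H%M%S}.bacpac"
subprocess.run([
    "sqlpackage", "/Action:Export",
    "/SourceServerName:myserver.database.windows.net",
    "/SourceDatabaseName:mydb",
    "/SourceUser:sqladmin",
    "/SourcePassword:<password>",
    f"/TargetFile:{bacpac}",
], check=True)

# Upload the .bacpac to a storage container.
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="sql-backups",
    blob_name=bacpac)
with open(bacpac, "rb") as fh:
    blob.upload_blob(fh, overwrite=True)
```

Run on a schedule (Azure Automation, a Function, or a plain cron job), this gives the same timestamped .bacpac-in-a-container result; the maintenance trade-offs are the same as with the PowerShell approach.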

What would be the best technology for extracting and parsing a file

I'm pretty new to Azure and wanted some direction regarding my needs. I have a flat file from a provider, hosted on their FTP server. I need to retrieve it, extract data from the file, store the results, etc.
What feature on Azure would you recommend?
It depends, but I recommend Azure Data Factory, as it's the default ETL solution available on Azure.
Other alternatives:
Azure Logic Apps
Azure Functions (with a timer trigger; see the sketch below)
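If you go the Functions route, here is a minimal sketch of a timer-triggered Python function that pulls the file over FTP and parses it; the host, credentials, path and the CSV assumption are placeholders, and the schedule itself lives in the function's trigger configuration:

```python
import csv
import ftplib
import io
import logging

import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    # Download the provider's flat file from their FTP server.
    buffer = io.BytesIO()
    with ftplib.FTP("ftp.example.com", "ftpuser", "<password>") as ftp:
        ftp.retrbinary("RETR /exports/daily.csv", buffer.write)

    # Parse it (assumed here to be CSV) and do something with the rows.
    buffer.seek(0)
    rows = list(csv.reader(io.TextIOWrapper(buffer, encoding="utf-8")))
    logging.info("Parsed %d rows from the provider file", len(rows))
    # ...store the results, e.g. in Blob Storage or a database.
```

Data Factory or Logic Apps would do the retrieval with built-in FTP connectors instead; Functions give you the most control over the parsing at the cost of owning the code.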

Connecting Qlikview to Azure Data Lake Store Gen 1

I am working on a new requirement and need to connect Qlik Sense and QlikView to Azure Data Lake Store Gen 1. I searched a lot but didn't find any useful information confirming connectivity between Qlik and Microsoft Azure Data Lake.
We are deploying QlikView to Azure and received the following assistance from Microsoft, so hopefully it leads you in the right direction:
https://help.qlik.com/en-US/sense/September2018/Subsystems/PlanningQlikSenseDeployments/Content/Sense_Deployment/Azure-architecture.htm
Azure deployment
In a Microsoft Azure deployment, you install Qlik Sense Enterprise on an Azure cloud infrastructure that is flexible, high performance, and quick to set up.
Deploying Qlik Sense Enterprise on Azure will enable you to quickly add new applications in a simple and scalable manner. You can do this with a basic knowledge of Azure security and scalability options, but without the need to follow complex on-premises installation and configuration procedures. Using Azure will enable you to get your Qlik Sense infrastructure up and running in a fraction of the time required for an on-premises deployment, and will enable you to scale your deployment quickly and easily, regardless of unexpected changes in demand.
You can deploy Qlik Sense to Azure manually, or you can use a Virtual Hard Disk (VHD) available in the Azure Marketplace that includes Qlik Sense preinstalled. However, predefined images do not include a file share, so they can only support single-node Qlik Sense deployments.

Azure Data Factory - moving data from On-Premise SQL to Azure SQL

A simple question: can this be achieved directly, i.e. without Azure Blob storage in between (as shown in all the examples)? Can someone provide a code example, please?
Yes, you can do this directly. In fact, you can do direct copies between any of our supported sources/sinks; you don't have to pass through blob storage. To go from on-prem SQL Server to Azure SQL, you will need to set up a Data Management Gateway connector on your on-prem server. Then you use a linked service of type AzureSqlDatabase and an output dataset of type AzureSqlTable, instead of the AzureBlob types shown in the example. The exact steps to set up the DMG and the JSON code for the linked services, datasets, and pipelines can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy, code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/
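That answer predates the current tooling (the Data Management Gateway has since become the self-hosted integration runtime in ADF v2). Purely as an illustration of what those linked services, datasets and pipeline look like in code, here is a rough sketch using the azure-mgmt-datafactory Python package; all resource names, connection strings and the integration runtime name are placeholders, and model signatures can differ slightly between SDK versions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService, AzureSqlTableDataset, CopyActivity,
    DatasetReference, DatasetResource, IntegrationRuntimeReference,
    LinkedServiceReference, LinkedServiceResource, PipelineResource,
    SqlServerLinkedService, SqlServerTableDataset, SqlSink, SqlSource)

rg, factory = "my-rg", "my-adf"
adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# On-prem SQL Server, reached through a self-hosted integration runtime
# that is already registered with the factory.
onprem_ls = LinkedServiceResource(properties=SqlServerLinkedService(
    connection_string="Server=MyOnPremServer;Database=Source;Integrated Security=True;",
    connect_via=IntegrationRuntimeReference(
        type="IntegrationRuntimeReference", reference_name="MySelfHostedIR")))
adf.linked_services.create_or_update(rg, factory, "OnPremSql", onprem_ls)

# Azure SQL Database sink.
azure_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string="Server=tcp:myserver.database.windows.net;Database=Target;"
                      "User ID=<user>;Password=<password>;"))
adf.linked_services.create_or_update(rg, factory, "AzureSql", azure_ls)

# Datasets pointing at the source and target tables.
src = DatasetResource(properties=SqlServerTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="OnPremSql"),
    table_name="dbo.Orders"))
dst = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureSql"),
    table_name="dbo.Orders"))
adf.datasets.create_or_update(rg, factory, "OrdersOnPrem", src)
adf.datasets.create_or_update(rg, factory, "OrdersAzure", dst)

# A pipeline with a single copy activity: on-prem SQL -> Azure SQL, no blob in between.
copy = CopyActivity(
    name="CopyOrders",
    inputs=[DatasetReference(type="DatasetReference", reference_name="OrdersOnPrem")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OrdersAzure")],
    source=SqlSource(),
    sink=SqlSink())
adf.pipelines.create_or_update(
    rg, factory, "CopyOnPremToAzureSql", PipelineResource(activities=[copy]))
adf.pipelines.create_run(rg, factory, "CopyOnPremToAzureSql", parameters={})
```

The same objects can be created in the ADF authoring UI; the code version simply makes the source/sink wiring explicit.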
