Azure Databricks: how to automatically download CSVs to a local network drive?

My job currently uses Azure Databricks. Is it possible to have my dataframes automatically downloaded as CSVs to a local network drive path on a recurring basis?
For example, our company has recurring reports, and I was hoping I could automate them by creating the dataframe in Databricks and somehow having Azure download the CSV into a specific path in our company network folder. Would this be possible?
FYI, I understand I could save the CSV file to the FileStore (DBFS), but the main problem is: how can I, or Azure, have the CSV AUTO-populated into our company network on a recurring basis?

Write the file to blob storage or a data lake rather than DBFS.
Use Azure Data Factory to run the notebook and then copy the output file to your on-prem network.
You will need a self-hosted integration runtime installed somewhere on your network for the file copy to reach it.
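A minimal sketch of the Databricks half of that setup, run from the notebook that Data Factory triggers, might look like the following; the storage account, container, secret scope and the df report dataframe are all placeholder names, not details from the question:

```python
# Minimal sketch, assuming an ADLS Gen2 account "mystorageacct" with a
# container "reports" and a Databricks secret scope "kv-scope" holding the
# account key -- all placeholder names.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    dbutils.secrets.get(scope="kv-scope", key="storage-account-key"),
)

output_path = "abfss://reports@mystorageacct.dfs.core.windows.net/daily_report"

# coalesce(1) produces a single CSV part file, which is easier for the ADF
# Copy activity to pick up and push to the on-prem share afterwards.
(df.coalesce(1)
   .write
   .mode("overwrite")
   .option("header", "true")
   .csv(output_path))
```

Data Factory would then run this notebook on a schedule, and a Copy activity using the self-hosted IR would move the resulting file to the network share.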

Related

azureml register datastore file share or blob storage

I have a folder called data with a bunch of CSVs (about 80); the file sizes are fairly small. This data is clean and has already been preprocessed. I want to upload this data folder and register it as a datastore in Azure ML. Which would be best for this scenario: a datastore created with a file share, or a datastore created with blob storage?
AFAIK, based on your scenario you can make use of Azure File Share to create the datastore.
Please note that Azure Blob storage is suitable for uploading large amounts of unstructured data, whereas Azure File Share is suited to uploading and processing structured files in chunks (more interaction with an app to share files).
I have a folder called data with a bunch of csvs (about 80), file sizes are fairly small. This data is clean and has already been preprocessed.
As you mentioned, the CSV data is clean and preprocessed, so it counts as structured data. You can therefore make use of Azure File Share to create the datastore.
To register a datastore with Azure File Share you can make use of this MsDoc.
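For reference, a rough sketch of that registration with the v1 azureml-core SDK might look like this; the datastore name, share name, account name and key are placeholders:

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()  # assumes a config.json for your workspace

# Register an Azure File Share as a datastore; the share, account name and
# key below are placeholders for your own values.
datastore = Datastore.register_azure_file_share(
    workspace=ws,
    datastore_name="csv_datastore",
    file_share_name="data",
    account_name="mystorageaccount",
    account_key="<storage-account-key>",
)

# Upload the local "data" folder of CSVs into the registered file share.
datastore.upload(src_dir="./data", target_path="data", overwrite=True)
```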
To know more about Azure File Share and Azure Blob storage, please see the links below:
Azure Blob Storage or Azure File Storage by Mike
azureml.data.azure_storage_datastore.AzureFileDatastore class - Azure Machine Learning Python | Microsoft Docs

Ingest Data From On-Premise SFTP Folder To Azure SQL Database (Azure Data Factory)

Use case: I have data files of varying sizes copied to a specific SFTP folder periodically (daily/weekly). All these files need to be validated and processed, then written to the related tables in Azure SQL. The files are in CSV format; each is a flat text file that corresponds directly to a specific table in Azure SQL.
Implementation:
Planning to use Azure Data Factory. So far, from my reading, I can see that I can have a Copy pipeline to copy the data from the on-prem SFTP folder to Azure Blob storage, and that an SSIS pipeline can copy data from an on-premise SQL Server to Azure SQL.
But I don't see an existing solution that achieves what I am looking for. Can someone provide some insight on how I can achieve this?
I would try to use Data Factory with a Data Flow to validate and process the files (if that is possible for your case). If the validation is too complex or depends on other components, I would use Functions and write the resulting files to blob storage (a rough sketch follows after the steps below). A Copy activity is also able to import the resulting CSV files into SQL Server.
You can create a pipeline that does the following:
Copy data - copy the files from SFTP to Blob Storage
Do the data processing/validation via a Data Flow
Sink the results directly into the SQL table (via a Data Flow sink)
Of course, you need an integration runtime that can access the on-prem server, either via VNet integration or a self-hosted IR (if the server is not publicly accessible).
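If the validation does end up in a Function or a small script rather than a Data Flow, a rough sketch with the azure-storage-blob SDK could look like the following; the connection string, container names and expected column list are all hypothetical:

```python
import io
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Placeholder connection string, containers and column list -- adjust to your setup.
CONN_STR = "<storage-connection-string>"
EXPECTED_COLUMNS = ["id", "name", "amount", "created_at"]

service = BlobServiceClient.from_connection_string(CONN_STR)
landing = service.get_container_client("sftp-landing")
validated = service.get_container_client("validated")

for blob in landing.list_blobs():
    raw = landing.download_blob(blob.name).readall()
    df = pd.read_csv(io.BytesIO(raw))

    # Simple structural check: the flat file must match the target table's columns.
    if list(df.columns) != EXPECTED_COLUMNS:
        print(f"Skipping {blob.name}: unexpected columns {list(df.columns)}")
        continue

    # Re-upload the validated file; a Copy activity (or Data Flow sink) can then
    # load this container into the Azure SQL table.
    validated.upload_blob(name=blob.name, data=raw, overwrite=True)
```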

Moving data from Teradata to Snowflake

Trying to move data from Teradata to Snowflake. I have created a process that runs TPT scripts to generate export files for each table.
The files are also split to achieve concurrency when running COPY INTO in Snowflake.
I need to understand the best way to move those files from an on-prem Linux machine to Azure ADLS, considering the files are terabytes in size.
Does Azure provide any mechanism to move these files, or can we create the files on ADLS directly from Teradata?
The best approach is to load the data into Snowflake via an external table or stage if you have Azure Blob Storage or ADLS Gen2: land the files in blob storage, create the external table/stage over them, and then load the data into Snowflake (e.g., with COPY INTO).
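Once the files have landed in blob storage/ADLS, a rough sketch of the Snowflake side using the Python connector might look like this; the account, warehouse, stage, container and table names are all placeholders:

```python
import snowflake.connector

# Connection details and the stage/table names below are placeholders.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="LOAD_WH",
    database="MIGRATION",
    schema="STAGING",
)
cur = conn.cursor()

# External stage over the Azure container where the TPT-exported files land.
cur.execute("""
    CREATE STAGE IF NOT EXISTS teradata_stage
    URL = 'azure://mystorageacct.blob.core.windows.net/teradata-export/'
    CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
""")

# COPY INTO picks up all the split files under the table's folder, so the
# per-table file splits load in parallel.
cur.execute("COPY INTO customer FROM @teradata_stage/customer/")

cur.close()
conn.close()
```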

Upload to Azure Storage container from FileServer using Azure databricks

I want to upload binary files from a Windows file system to Azure blob storage. I achieved this with Azure Data Factory using the steps below:
Installed the integration runtime on the file system host
Created a linked service of type FileSystem in ADF
Created a binary dataset on top of that linked service
Used a Copy Data activity in an ADF pipeline, with the binary dataset as the source and Azure Blob as the sink
Post upload, I am performing some ETL activities. So my ADF pipeline has two components:
Copy Data
Databricks Notebook
I am wondering if I could move the Copy Data fragment to Databricks?
Can we upload binary files from Windows FileSystem to Azure blob using Azure Databricks?
I think it is possible, but you may have to make network changes first:
https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/on-prem-network
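Once that network connectivity is in place, a notebook could in principle do the copy itself with the storage SDK. A minimal sketch, assuming the share is reachable at a mount path on the driver (the path, connection string and container name are hypothetical):

```python
import os
from azure.storage.blob import BlobServiceClient

# Assumes the cluster can already reach the file server (see the link above);
# every name below is a placeholder.
SOURCE_DIR = "/mnt/fileserver/exports"
CONN_STR = "<storage-connection-string>"

container = BlobServiceClient.from_connection_string(CONN_STR).get_container_client("raw-binaries")

for fname in os.listdir(SOURCE_DIR):
    path = os.path.join(SOURCE_DIR, fname)
    if not os.path.isfile(path):
        continue
    # Stream each binary file into the blob container.
    with open(path, "rb") as fh:
        container.upload_blob(name=fname, data=fh, overwrite=True)
```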

SSIS Connector for Azure File Storage

I have a directory on a local machine that holds various source files containing data that I need to load into an Azure SQL Server instance. The source files are in a variety of formats, including xlsx, xls, csv, txt and dat. I built a solution a while back that transforms and loads these files into a local SQL Server instance with SSIS (Developer edition).
Now that development has concluded, I would like to deploy the database and packages to Azure. With an Azure account I created SQL Server and SSIS instances, then created a file storage account in Azure and copied the source file directory into the file store. My intention was to simply take the old solution, change the sources from local files to an Azure Data Lake Store source, and change the destinations from the local db to the Azure SQL instance.
However, I am having a lot of complications with Active Directory authentication, and it also appears that the Data Lake and blob source tools in SSIS only work for text and Avro files. Is there not a way for SSIS to easily access files in an Azure file store?
