Tableau + Azure Data Lake Gen2: multiple files or a folder

I am able to see my containers in an Azure Data Lake Storage Gen2 endpoint after signing in with Azure AD.
I can browse the files and select a single file, but the question is: is there a way to select a folder (or the whole container) and bring in every file from it to build my dataset, given that they all have the same structure?
Or do I need something like an external table in Azure Data Explorer instead?

Just drag the first file from your collection into the data pane,
then right-click it and use the Edit Union option.

Related

Ingest multiple JSON files from ADLS Gen2 to ADX through ADF

I'm new to both ADF (Azure Data Factory) and ADX (Azure Data Explorer).
I have multiple JSON files in ADLS at different folder levels, and I need to ingest all of them into ADX. For example:
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/01/28/03/demo-02-2021-01-28-03-30.json
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/01/28/04/demo-02-2021-01-28-03-30.json
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/01/29/03/demo-02-2021-01-28-03-30.json
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/02/23/03/demo-02-2021-01-28-03-30.json
I'm just wondering whether I need to create as many tables in ADX as there are JSON files in ADLS. If I have 1000 JSON files in ADLS, should I create 1000 tables in ADX to copy the data from ADLS to ADX?
And how can I copy the data from ADLS to ADX in ADF?
Appreciate your help in advance.
To copy from multiple folders, you can use the additional settings of the Copy activity source; see the official documentation for more information. You may need to use a wildcard to pick up multiple files. You do not need one ADX table per file: a single copy activity can land all the matched files in the same table.
Additional settings:
recursive: Indicates whether the data is read recursively from the subfolders or only from the specified folder. Note that when recursive is set to true and the sink is a file-based store, an empty folder or subfolder isn't copied or created at the sink.
Allowed values are true (default) and false.
This property doesn't apply when you configure fileListPath.
Also refer to Azure Data Explorer as a sink:
Azure Data Explorer is supported as a source, where data is copied from Azure Data Explorer to any supported data store, and a sink, where data is copied from any supported data store to Azure Data Explorer. Integrate Azure Data Explorer with Azure Data Factory
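As a rough illustration of what the recursive setting plus a wildcard end up selecting, here is a minimal Python sketch using the azure-storage-file-datalake SDK; the account name is a placeholder, and it assumes UserData is the container (file system), with the folder prefix taken from the example paths in the question.

from fnmatch import fnmatch
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account; assumes "UserData" is the ADLS Gen2 container (file system).
service = DataLakeServiceClient(
    "https://<account>.dfs.core.windows.net", credential=DefaultAzureCredential()
)
fs = service.get_file_system_client("UserData")
prefix = "Overground/UsersFolder/project1/main/data/json/demo-02"

# recursive=True walks every date/hour subfolder, like the copy activity's recursive setting;
# the fnmatch pattern plays the role of the wildcard file name.
json_files = [
    p.name
    for p in fs.get_paths(path=prefix, recursive=True)
    if not p.is_directory and fnmatch(p.name, "*.json")
]
print(f"{len(json_files)} JSON files would feed a single ADX table")

The copy activity applies the same kind of recursive/wildcard selection at run time, so you never have to enumerate the files yourself.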

Ingest Data From On-Premise SFTP Folder To Azure SQL Database (Azure Data Factory)

Use case: I have data files of varying size copied to a specific SFTP folder periodically (daily/weekly). All these files need to be validated and processed, then written to the related tables in Azure SQL. The files are in CSV format: flat text files that each correspond directly to a specific table in Azure SQL.
Implementation:
I am planning to use Azure Data Factory. So far, from my reading, I can see that I can have a Copy pipeline to copy the data from on-premises SFTP to Azure Blob Storage, and an SSIS pipeline to copy data from an on-premises SQL Server to Azure SQL.
But I don't see an existing solution that achieves what I'm looking for. Can someone provide some insight on how I can achieve this?
I would try to use Data Factory with a Data Flow to validate/process the files (if possible for your case). If the validation is too complex or depends on other components, I would use Functions instead and write the resulting files to Blob Storage. The Copy activity is also able to import the resulting CSV files into SQL Server.
You can create a pipeline that does the following:
Copy data: copy the files from SFTP to Blob Storage
Process/validate the data via a Data Flow
and sink the results directly into the SQL table (via the Data Flow sink)
Of course, if the server is not publicly accessible, you need an integration runtime that can reach the on-premises server, either via VNet integration or a self-hosted IR.
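If the validation turns out to be too complex for a Data Flow and ends up in an Azure Function or a custom script instead, the step between Blob Storage and Azure SQL could look roughly like this Python sketch. The container, file, column, and table names are hypothetical, and it assumes pandas, SQLAlchemy, pyodbc, and azure-storage-blob are available.

import pandas as pd
from sqlalchemy import create_engine
from azure.storage.blob import BlobClient

# Download the CSV that the copy activity staged in Blob Storage (hypothetical URL and SAS).
blob = BlobClient.from_blob_url(
    "https://<account>.blob.core.windows.net/staging/daily/customers.csv?<sas-token>"
)
with open("customers.csv", "wb") as f:
    f.write(blob.download_blob().readall())

df = pd.read_csv("customers.csv")

# Minimal validation: required columns exist and the key column has no nulls (hypothetical schema).
required = {"CustomerId", "Name", "Email"}
if not required.issubset(df.columns) or df["CustomerId"].isnull().any():
    raise ValueError("Validation failed; file not loaded")

# Append the rows to the matching Azure SQL table (assumes ODBC Driver 18 is installed).
engine = create_engine(
    "mssql+pyodbc://<user>:<password>@<server>.database.windows.net/<database>"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
df.to_sql("Customers", engine, if_exists="append", index=False)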

How to decide between Azure Data Lake vs Azure SQL vs Azure Data Lake Analytics vs Azure SQL VM?

I am new to Azure and hence trying to understand what services to use when and how.
At the moment, I have one Excel file with a couple of tabs that require some transformation to create another tab inside the same source file (say tab "x"). The final tab "x" is then used to create one final Excel file that is shared with various teams.
At present, everything is done manually.
This needs to change: producing the Excel file shared with the teams has to be automated. The source is the Excel file with its various tabs (excluding tab "x"), the reporting tool will be SSRS, and the Excel data will be stored in the cloud.
Keeping this scenario in mind, what is the best way to store the Excel data in the cloud? The data will be loaded to the cloud on a monthly basis. I am confused as to whether to store it in Azure SQL, Azure Data Lake Gen2, Azure Data Lake Analytics, or an Azure SQL VM.
Every month the data can be fetched from the Excel file and loaded into Azure using Azure Data Factory. But I am not sure what the best way to store the data in the cloud is, considering that some ETL process is needed to generate data in a format similar to tab "x".
I think you can consider using Azure SQL Database.
Azure SQL Database and SQL Server let you import data from Excel (or CSV) files. For more details and limitations, see Import data from Excel to SQL Server or Azure SQL Database.
Once your data is stored in Azure SQL Database, you can also use Excel to get the data back out of it:
Connect Excel to a single database in Azure SQL Database and import data and create tables and charts based on values in the database. In this tutorial you will set up the connection between Excel and a database table, save the file that stores data and the connection information for Excel, and then create a pivot chart from the database values.
Reference: Import data from Excel to SQL Server or Azure SQL Database.
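As a small sketch of that monthly import, assuming the workbook sits somewhere a script can read it, pandas plus SQLAlchemy can load tab "x" into a staging table; the file, table, and connection names are hypothetical, and pandas needs openpyxl for .xlsx files.

import pandas as pd
from sqlalchemy import create_engine

# Read the tab that feeds the report; the sheet name "x" comes from the question,
# the file name is hypothetical.
df = pd.read_excel("monthly_source.xlsx", sheet_name="x")

# Append this month's rows to a staging table in Azure SQL Database.
engine = create_engine(
    "mssql+pyodbc://<user>:<password>@<server>.database.windows.net/<database>"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
df.to_sql("TabX_Staging", engine, if_exists="append", index=False)

SSRS can then report directly from the Azure SQL table.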
I don't think you need to store these Excel files in Azure Data Lake. Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big-data analytics, built on Azure Blob Storage, and it is still just storage.
The more Azure resources you use, the more you pay.
If your Excel file is stored on your local computer, you can use Azure Data Factory with a self-hosted integration runtime to access these local files.
Please reference: Copy data to or from a file system by using Azure Data Factory.
Hope this helps.
Your storage requirements are very minimal, so I would select Data Lake to store your documents. The alternative is Blob Storage, but I always prefer Data Lake because it works with Azure Active Directory.
In your scenario, drop it in the ADL, and use the ADL as the source in Azure Data Factory.
Edit:
Honestly, your original post is a little confusing. You have a raw Excel document, you do some transformations on it to generate an Excel source document, and this source document holds the final dataset that the dev team will use to build out SSRS reports. You need to make this dataset available to the teams so that they can connect to it to build the reports? My suggestion is to keep it simple: drop the final source dataset, in Excel format, into Blob or Data Lake storage and ask the dev team to pick it up from that location. If you go the route of designing and maintaining a data pipeline (Blob > Data Factory > SQL, or CSV/TSV), you are introducing unnecessary complications.
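If you go the Data Lake route, dropping the monthly file into ADLS Gen2 can also be scripted with the azure-storage-file-datalake SDK; a minimal sketch follows, with hypothetical account, container, and path names.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical account and container; authenticates with Azure AD via DefaultAzureCredential.
service = DataLakeServiceClient(
    "https://<account>.dfs.core.windows.net", credential=DefaultAzureCredential()
)
fs = service.get_file_system_client("reports")

# Land the file under a year/month folder so ADF (or the dev team) can pick it up from there.
file_client = fs.get_file_client("monthly/2021/03/final-dataset.xlsx")
with open("final-dataset.xlsx", "rb") as local_file:
    file_client.upload_data(local_file, overwrite=True)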

Azure Lake to Lake transfer of files

My company has two Azure environments. The first one was a temporary environment and is being re-purposed / decommissioned / I'm not sure. All I know is that I need to get files from a Data Lake in one environment to a Data Lake in the other. I've looked at AdlCopy and AzCopy, and neither seems like it will do what I need. Has anyone encountered this before, and if so, what did you use to solve it?
Maybe you can consider Azure Data Factory; it can help you transfer files or data from one Azure Data Lake to another.
You can reference Copy data to or from Azure Data Lake Storage Gen2 using Azure Data Factory.
This article outlines how to use Copy Activity in Azure Data Factory to copy data to and from Data Lake Storage Gen2. It builds on the Copy Activity overview article that presents a general overview of Copy Activity.
For example, you can learn from this tutorial: Quickstart: Use the Copy Data tool to copy data.
In this quickstart, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from a folder in Azure Blob storage to another folder.
Hope this helps.
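If both lakes are ADLS Gen2 accounts reachable over the Blob endpoint and a one-off scripted copy is enough, a server-side copy with the azure-storage-blob SDK is a possible alternative to an ADF pipeline; this is only a sketch, with placeholder account, container, and folder names, and it assumes a SAS token granting read access on the source blobs.

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder accounts and containers; the source SAS token must allow reading the blobs.
src = BlobServiceClient("https://<src-account>.blob.core.windows.net", credential=DefaultAzureCredential())
dst = BlobServiceClient("https://<dst-account>.blob.core.windows.net", credential=DefaultAzureCredential())
src_container = src.get_container_client("data")
dst_container = dst.get_container_client("data")
src_sas = "<sas-token-with-read-access>"

# Kick off an asynchronous server-side copy for every blob under the folder prefix.
for blob in src_container.list_blobs(name_starts_with="landing/2021/"):
    source_url = f"{src_container.url}/{blob.name}?{src_sas}"
    dst_container.get_blob_client(blob.name).start_copy_from_url(source_url)

For Data Lake Storage Gen1, or for large recurring transfers, the ADF copy activity described above remains the better fit.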

Copy File/Folders in Azure Data Lake Gen1

In Azure Data Lake Storage Gen1 I can see the folder structure, browse folders and files, etc.
I can perform actions on the files, like renaming and deleting them, and more.
One operation that is missing in the Azure portal (and elsewhere) is the option to create a copy of a folder or a file.
I have tried to do it using PowerShell and using the portal itself,
and it seems that this option is not available.
Is there a reason for that?
Are there any other options to copy a folder in Data-lake?
The Data Lake storage is used as part of an HDInsight cluster.
You can use Azure Storage Explorer to copy files and folders.
Open Storage Explorer.
In the left pane, expand Local and Attached.
Right-click Data Lake Store, and - from the context menu - select Connect to Data Lake Store....
Enter the URI; the tool then navigates to the location you entered.
Select the file or folder you want to copy and choose Copy.
Navigate to your desired destination.
Click Paste.
Other options for copying files and folders in a data lake include:
Azure Data Factory
AdlCopy (command line tool)
My suggestion is to use Azure Data Factory (ADF). It is the fastest way if you want to copy large files or folders.
In my experience, a 10 GB file is copied in approximately 1 minute 20 seconds.
You just need to create a simple pipeline with one data store, which is used as both the source and the destination data store.
Using Azure Storage Explorer (ASE) to copy large files is too slow: more than 10 minutes for 1 GB.
Copying files with ASE is the operation most similar to a typical file explorer (copy/paste), unlike ADF copying, which requires creating a pipeline.
I think creating a simple pipeline is worth the effort, especially because the pipeline can be reused for copying other files or folders with minimal editing.
I agree with the above comment: you can use ADF to copy the file. Just keep an eye on it so it doesn't add to your costs. Microsoft Azure Storage Explorer (MASE) is also a good option for copying blobs.
If you have very big files, then the option below is faster:
AzCopy:
Download a single file from blob to local directory:
AzCopy /Source:https://<StorageAccountName>.blob.core.windows.net/<BlobFolderName(if any)> /Dest:C:\ABC /SourceKey:<BlobAccessKey> /Pattern:"<fileName>"
If you are using Azure Data Lake Store with HDInsight, another very performant option is to use the native Hadoop file system commands, like hdfs dfs -cp, or, if you want to copy a large number of files, distcp. For example:
hadoop distcp adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/sourcefolder adl://<data_lake_storage_gen1_account>.azuredatalakestore.net:443/targetfolder
This is also a good option if you are using multiple storage accounts. See also the documentation.
