What are the sources supported in Azure Data Share? - azure

Which sources/datasets are supported in Azure Data Share?
Does it support the following?
Blob storage
Azure Files
Queues Storage
Table storage
Disk storage

Looking at step 7 here and at the REST API documentation, I believe the following sources are currently supported:
Azure Blob Storage
Azure Data Lake Gen 1
Azure Data Lake Gen 2
I would not be surprised if more data sources are added down the road, considering the service is currently in preview.

Related

Copy Data from Azure Data Lake to SnowFlake without stage using Azure Data Factory

All the Azure Data Factory examples of copying data from Azure Data Lake Gen 2 to Snowflake use a storage account as a stage. If the stage is not configured (as shown in the picture), I get this error in Data Factory even when my source is a CSV file in Azure Data Lake: "Direct copying data to Snowflake is only supported when source dataset is DelimitedText, Parquet, JSON with Azure Blob Storage or Amazon S3 linked service, for other dataset or linked service, please enable staging".
At the same time, the Snowflake documentation says that the external stage is optional. How can I copy data from Azure Data Lake to Snowflake using Data Factory's Copy Data activity without having an external storage account as a stage?
If staging storage is needed to make it work, we shouldn't say that copying data from Data Lake to Snowflake is supported. It works only when the Data Lake data is first copied to blob storage and then to Snowflake.
Although Snowflake supports Blob storage, Data Lake Storage Gen2, and general-purpose v1 and v2 storage accounts, loading data into Snowflake from Data Factory is supported through Blob storage only.
The source linked service is Azure Blob storage with shared access signature authentication. If you want to directly copy data from Azure Data Lake Storage Gen2 in the following supported format, you can create an Azure Blob linked service with SAS authentication against your ADLS Gen2 account, to avoid using staged copy to Snowflake.
Select Azure Blob Storage as the linked service type and provide the SAS URI of the Azure Data Lake Gen2 source file.
Blob storage linked service with data lake gen2 file:
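As a rough illustration of the workaround described above, the linked service can be thought of as an Azure Blob Storage definition whose only credential is a SAS URI pointing at the ADLS Gen2 account. The sketch below just builds that JSON in Python; the linked service name and the URI are placeholders (assumptions), and the helper function is hypothetical, not part of any SDK.

```python
import json


def blob_linked_service_for_adls(name: str, sas_uri: str) -> dict:
    """Build a Blob Storage linked service definition that authenticates with a SAS URI."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                # The SAS URI targets the blob endpoint of the ADLS Gen2 account.
                "sasUri": {"type": "SecureString", "value": sas_uri},
            },
        },
    }


# Placeholder values; replace with your own account, container and SAS token.
definition = blob_linked_service_for_adls(
    "AdlsGen2ViaBlobSas",
    "https://<account>.blob.core.windows.net/<container>?<sas-token>",
)
print(json.dumps(definition, indent=2))
```

The resulting JSON can be pasted into the linked service code view in Data Factory, assuming the SAS token grants read and list access on the source container.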
You'll have to configure Blob storage and use it as staging. As an alternative, you can use an external stage. You'll have to create a FILE FORMAT and a STORAGE INTEGRATION (plus a NOTIFICATION INTEGRATION if you want automated loads), then access ADLS and load the data into Snowflake using the COPY command. Let me know if you need more help on this.
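To make the external stage route a bit more concrete, here is a minimal sketch that only assembles the Snowflake SQL statements as strings, in the order they would be run. It is a simplification: it puts a SAS token in the stage credentials instead of creating an integration object, and the stage, table, URL and token values are all placeholders (assumptions). The helper function is hypothetical.

```python
from typing import List


def external_stage_statements(stage: str, table: str, container_url: str, sas_token: str) -> List[str]:
    """Return the SQL statements, in order, for a one-off load through an external stage."""
    file_format = f"{stage}_csv_format"
    return [
        # 1. Describe how the CSV files are laid out.
        f"CREATE OR REPLACE FILE FORMAT {file_format} TYPE = CSV SKIP_HEADER = 1;",
        # 2. Point an external stage at the storage container.
        f"CREATE OR REPLACE STAGE {stage} URL = '{container_url}' "
        f"CREDENTIALS = (AZURE_SAS_TOKEN = '{sas_token}') "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}');",
        # 3. Load the staged files into the target table.
        f"COPY INTO {table} FROM @{stage};",
    ]


statements = external_stage_statements(
    stage="adls_stage",
    table="sales",
    container_url="azure://<account>.blob.core.windows.net/<container>/<path>/",
    sas_token="<sas-token>",
)
for statement in statements:
    print(statement)
```

These statements would be run in a Snowflake worksheet or through a client; nothing here talks to Snowflake itself.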

Uploading Data(csv file) using Azure Functions(Nodejs) To Azure DataLakeGen2

I am currently trying to send a CSV file to Azure Data Lake Gen2 using an Azure Function with Node.js, but I have been unable to do so. Any suggestions would be really helpful.
Thanks.
I have tried to use the credentials of the blob storage in ADLS Gen2 with the Blob storage APIs, but I am getting an error.
For now this cannot be implemented with the SDK. Please check this known issue:
Blob storage APIs are disabled to prevent feature operability issues that could arise because Blob Storage APIs aren't yet interoperable with Azure Data Lake Gen2 APIs.
And in the table of features, you can find this information about APIs for Data Lake Storage Gen2 storage accounts:
Multi-protocol access on Data Lake Storage is currently in public preview. This preview enables you to use Blob APIs in the .NET, Java, Python SDKs with accounts that have a hierarchical namespace. The SDKs don't yet contain APIs that enable you to interact with directories or set access control lists (ACLs). To perform those functions, you can use Data Lake Storage Gen2 REST APIs.
So if you want to implement it, you have to use the REST API: Azure Data Lake Store REST API.
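For orientation, uploading a file through the Data Lake Storage Gen2 REST API is a three-call sequence against the `dfs` endpoint: create the file, append the bytes, then flush at the final length. The sketch below only plans those requests and sends nothing; it is written in Python rather than Node.js, the account, filesystem and path values are made up, and the `PlannedRequest` type and `plan_csv_upload` helper are hypothetical. Authentication headers (for example a bearer token or shared key) and the `x-ms-version` header are left out and would be needed in a real call.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import quote, urlencode


@dataclass
class PlannedRequest:
    method: str
    url: str
    body: bytes = b""


def plan_csv_upload(account: str, filesystem: str, path: str, data: bytes) -> List[PlannedRequest]:
    """Plan the create, append and flush calls needed to upload one file."""
    base = f"https://{account}.dfs.core.windows.net/{quote(filesystem)}/{quote(path)}"
    return [
        # 1. Create an empty file at the target path.
        PlannedRequest("PUT", f"{base}?{urlencode({'resource': 'file'})}"),
        # 2. Append the content, starting at byte offset 0.
        PlannedRequest("PATCH", f"{base}?{urlencode({'action': 'append', 'position': 0})}", data),
        # 3. Flush, passing the total length so the appended bytes are committed.
        PlannedRequest("PATCH", f"{base}?{urlencode({'action': 'flush', 'position': len(data)})}"),
    ]


csv_bytes = b"id,name\n1,Ada\n"
plan = plan_csv_upload("myaccount", "raw", "incoming/report.csv", csv_bytes)
for request in plan:
    print(request.method, request.url, len(request.body))
```

The same three calls can be issued from an Azure Function in Node.js with any HTTP client; only the request construction differs.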

Will Azure Data Lake Analytics support ADLS Gen2?

We have a number of projects based on ADLA+ADLS Gen1 and we recently noticed that prices for Gen1 are not available here any more. Also ADLA isn't listed in the Gen1->Gen2 migration guide.
Googling brought no relief, so I am seeking advice and insights regarding:
Will ADLA support ADLS Gen2?
Will ADLS Gen1 be discontinued?
Will ADLA be discontinued?
1. As of now, ADLA does not support ADLS Gen2. There is already a user voice request here; you can upvote it.
2. As for whether ADLS Gen1 will be discontinued, there is a link about this, which says the following:
As of now there are no plans to retire ADLS gen 1. However it's recommended to migrate to Gen2 as most of the latest features and improvements will be rolled out to Gen2.
3. Based on point 1, if Microsoft adds ADLS Gen2 support as requested in the user voice item, then ADLA will not be retired. But I am not very sure.
Hope it helps.
Azure Data Lake Storage Gen1 will be retired on Feb 29, 2024. Microsoft is not investing in ADLA.
Customers are advised to migrate from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2, and from ADLA to an alternative framework such as Azure HDInsight, Azure Synapse Analytics or Azure Databricks.

Read from ADLS gen 2 with SSIS

Does anyone know which connection and Data Flow Component to use for ADLS (Azure Data Lake Store) gen2?
I've managed to use the blob connector in the connection manager and successfully connect to ADLS Gen2, but when I try to use the blob source component I get a 400 bad request. Works fine if it's just a blob storage without HNS.
The ADLS components state that they are just for ADLS Gen 1.
So how do I read from and write to ADLS Gen 2?
A current version of the SSIS Azure Feature Pack supports ADLS Gen2. It can be used as a data source or destination in a data flow:
The screenshot shows it as a destination, but ADLS Gen2 also works well as a source, via the corresponding "Flexible File Destination" and "Flexible File Source" components.
First of all, based on the great link provided by #rickvdbosch, it looks like there are many temporary limitations in Azure Data Lake Storage Gen2 concerning the Blob Storage API. This means that it is not a component limitation, and maybe you should wait until it is integrated with SSIS.
Microsoft SQL SERVER Feature pack for Azure
If you meant these components when you mentioned that:
The ADLS components state that they are just for ADLS Gen 1.
Then ignore this part.
I am not quite sure whether it supports Gen2, but I think you can use the Azure Data Lake Store components which are part of the Microsoft SQL Server Feature Pack for Azure. For more information you can refer to:
Azure Data Lake Store in SSIS
Azure Data Lake Store Source
Azure Data Lake Store Destination
Download Link
Azure Feature Pack for Integration Services (SSIS)
Other methods
If the suggestion above doesn't work, then you should use Azure Data Factory or the command line, by installing the Azure CLI or using AzCopy v10.
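As a sketch of the AzCopy v10 option, the snippet below assembles an `azcopy copy` command line that downloads a folder from an ADLS Gen2 account to a local directory. It only builds and prints the command; the account, filesystem, folder, SAS token and local path are placeholders (assumptions), and the helper function is hypothetical.

```python
import shlex
from typing import List


def azcopy_download_command(account: str, filesystem: str, folder: str, sas_token: str, local_dir: str) -> List[str]:
    """Build the argument list for copying an ADLS Gen2 folder to a local directory."""
    source = f"https://{account}.dfs.core.windows.net/{filesystem}/{folder}?{sas_token}"
    # --recursive copies everything underneath the source folder.
    return ["azcopy", "copy", source, local_dir, "--recursive"]


command = azcopy_download_command("myaccount", "raw", "incoming", "<sas-token>", "./downloads")
print(shlex.join(command))
```

The list form can be passed to `subprocess.run` as-is, or the printed line can be pasted into a shell or an SSIS Execute Process Task, assuming AzCopy is installed on the machine.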
I got the following info:
"At the moment Gen 2 don’t support BLOB API (but it will in a short time) and hence, SSIS is not able to connect."
So for SSIS it's currently either ADLS Gen 1, or blob store
I used the Script Task to write files or System.Objects (converted to CSV in memory) to Azure Storage Gen 2 (hierarchical namespace enabled) using the REST API. I did this as a demo until the SSIS components are released.
You can't write to ADLS Gen2 using the old components from the Azure Feature Pack, but you can connect to the blob Gen2 (non-hierarchical) using the Azure Blob Destination Component.

Difference between Azure Data Lake Storage, Azure Blob Storage and Azure File Storage

I have a question about the use cases of the different Azure storage services:
Azure Data Lake Storage.
Azure Blob Storage.
Azure File Storage.
What is the difference between these services, and when should each be used, since they all provide the same functionality (storage) on Azure's cloud platform?
You can take a look at this article: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
I'd say the main differences between Data Lake and Azure Storage Blob are scale and the permissions model.
It really makes no sense to paste the whole article here. But you might want to look at Data Lake v2, which (as MS claims) is a mesh of Data Lake v1 and Azure Storage Blob: https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
As for Azure File storage, it's just an SMB share over HTTPS (and it's not really fast, due to being 1 "stream" only).
