Uploading data (CSV file) using Azure Functions (Node.js) to Azure Data Lake Gen2 - node.js

I am trying to send a CSV file from an Azure Function written in Node.js to Azure Data Lake Storage Gen2, but I have not been able to get it working. Any suggestions would be really helpful.
Thanks.
I have tried using the credentials of the Blob storage in ADLS Gen2 with the Blob Storage APIs, but I am getting an error.

For now, this cannot be implemented with the SDK. Please check this known issue:
Blob storage APIs are disabled to prevent feature operability issues that could arise because Blob Storage APIs aren't yet interoperable with Azure Data Lake Gen2 APIs.
In the table of features, you can also find this information about APIs for Data Lake Storage Gen2 storage accounts:
multi-protocol access on Data Lake Storage is currently in public preview. This preview enables you to use Blob APIs in the .NET, Java, Python SDKs with accounts that have a hierarchical namespace. The SDKs don't yet contain APIs that enable you to interact with directories or set access control lists (ACLs). To perform those functions, you can use Data Lake Storage Gen2 REST APIs.
So if you want to implement it, you have to use the REST API: Azure Data Lake Store REST API.
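As a rough illustration of the REST route, the sketch below builds the three Path calls (create, append, flush) that upload one file to an account with a hierarchical namespace. The account, file system, path and token values are placeholders, the function names are hypothetical, the x-ms-version value is an assumption, and the bearer token is assumed to come from Azure AD by some other means. Sending relies on the global fetch available in Node.js 18 or later.

```javascript
// Sketch only: names below are illustrative, not part of any SDK.
// Assumption: this API version supports the Gen2 Path operations.
const API_VERSION = '2018-11-09';

// Returns the three requests needed to upload `content` as one file.
function buildUploadRequests(account, fileSystem, filePath, content, token) {
  const body = Buffer.from(content, 'utf8');
  const base = `https://${account}.dfs.core.windows.net/${fileSystem}/${filePath}`;
  const headers = { Authorization: `Bearer ${token}`, 'x-ms-version': API_VERSION };
  return [
    // 1. Create an empty file at the target path.
    { method: 'PUT', url: `${base}?resource=file`, headers },
    // 2. Append the bytes starting at offset 0.
    {
      method: 'PATCH',
      url: `${base}?action=append&position=0`,
      headers: { ...headers, 'Content-Type': 'application/octet-stream' },
      body,
    },
    // 3. Flush: position is the total length of the data written so far.
    { method: 'PATCH', url: `${base}?action=flush&position=${body.length}`, headers },
  ];
}

// Sends the requests in order and fails on the first non-2xx response.
async function uploadCsv(account, fileSystem, filePath, content, token) {
  for (const req of buildUploadRequests(account, fileSystem, filePath, content, token)) {
    const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
    if (!res.ok) {
      throw new Error(`${req.method} ${req.url} failed with status ${res.status}`);
    }
  }
}
```

Inside an Azure Function, uploadCsv could be awaited from the handler once the CSV text and a valid token are available.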

Related

Copy files to ADLS gen2 from mobile phone

I have a few files that are regularly created on my mobile phone. How can I upload these files to my ADLS Gen2 storage account? I generally use AzCopy to copy files, but how can this be done on Android phones?
Is there an upload-file REST API for ADLS Gen2, or any other SDK?
Yes, as @GeorgeChen's comment said. As far as I know, there is not yet any SDK for Azure Data Lake Storage Gen2, so the only solution is to use its REST APIs.
There is a very similar SO thread, "Upload data to the Azure ADLS Gen2 from on-premise using Python or Java", which you can refer to. My answer there posts a Python script that defines seven functions to help with using the REST APIs: auth, mkfs, mkdir, touch_file, append_file, flush_file and mkfile.
For Java, you can refer to my Python code to write your Java code with OkHttp.
Update: I reviewed the official Azure documents and searched the official GitHub repos for ADLS Gen2. There is a public preview version of an ADLS Gen2 SDK named Azure File Data Lake client library for Java. By default it uses the Netty HTTP client, but as the README says, you can use OkHttp as the alternate HTTP client, so I think you can try it with OkHttp on Android.
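To give a feel for what helpers such as auth, mkfs and mkdir might boil down to, here is a minimal JavaScript sketch that only builds the request descriptions. It is not the Python script from the linked answer. The tenant, client and account values are placeholders, and the choice of the Azure AD v2.0 client-credentials flow with the Azure Storage scope is an assumption.

```javascript
// Sketch only: helper names mirror the list above but are hypothetical here.

// Token request for a service principal (client-credentials grant).
function authRequest(tenantId, clientId, clientSecret) {
  return {
    method: 'POST',
    url: `https://login.microsoftonline.com/${tenantId}/oauth2/v2.0/token`,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'client_credentials',
      client_id: clientId,
      client_secret: clientSecret,
      scope: 'https://storage.azure.com/.default', // assumption: Azure Storage scope
    }).toString(),
  };
}

// Create a file system (the Gen2 counterpart of a container).
function mkfs(account, fileSystem) {
  return {
    method: 'PUT',
    url: `https://${account}.dfs.core.windows.net/${fileSystem}?resource=filesystem`,
  };
}

// Create a directory inside an existing file system.
function mkdir(account, fileSystem, dir) {
  return {
    method: 'PUT',
    url: `https://${account}.dfs.core.windows.net/${fileSystem}/${dir}?resource=directory`,
  };
}
```

Each returned object can be passed to any HTTP client; the mkfs and mkdir calls would also need the Authorization and x-ms-version headers before they are sent.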

Using service principal to access blob storage from Databricks

I followed "Access an Azure Data Lake Storage Gen2 account directly with OAuth 2.0 using the Service Principal" and want to achieve the same with a general-purpose v2 Blob storage account (with the hierarchical namespace disabled). Is it possible to get this working, or is authenticating with an access key or SAS the only way?
No, that is not possible as of now. OAuth bearer tokens are supported for Azure Data Lake Storage Gen2 (with the hierarchical namespace enabled when creating the storage account). To access Azure Data Lake Storage Gen2, the ABFS driver is used:
abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/
To access Blob Storage you use WASB:
wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net
WASB supports only access-key or SAS-token-based access.
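The two URI shapes differ only in scheme and endpoint suffix, which a small sketch can make explicit. The function names and argument values are hypothetical; the file system (or container) name goes before the "@" and the storage account name after it.

```javascript
// Sketch only: builds the driver URIs; it does not validate the names.

// ABFS (secure) URI for an account with a hierarchical namespace.
function abfssUri(fileSystem, account, path = '') {
  return `abfss://${fileSystem}@${account}.dfs.core.windows.net/${path}`;
}

// WASB (secure) URI for a plain Blob storage container.
function wasbsUri(container, account, path = '') {
  return `wasbs://${container}@${account}.blob.core.windows.net/${path}`;
}
```

For example, abfssUri('raw', 'mylake', 'sales/2020.csv') yields abfss://raw@mylake.dfs.core.windows.net/sales/2020.csv.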

When Will Azure ADLS Gen 2 SDK Be Released?

It seems like the SDKs for Data Lake Storage Gen2 are not available now. Are there other ways / workarounds?
This seems like a question many others also have: https://github.com/MicrosoftDocs/azure-docs/issues/22913
Any news about an SDK for gen2 datalake?
According to the known issues about ADLS GEN2:
You can use Data Lake Storage Gen2 REST APIs, but APIs in other Blob
SDKs such as the .NET, Java, Python SDKs are not yet available.
So you could use the REST API instead. Here are some threads for your reference:
1. https://social.msdn.microsoft.com/Forums/en-US/45be0931-379d-4252-9d20-164261cc64c5/error-while-calling-adls-gen-2-rest-api-to-create-file?forum=AzureDataLake
2. https://social.msdn.microsoft.com/Forums/azure/en-US/dc102604-bdb7-47be-8de4-dc47a42e31a4/azure-data-lake-gen2-rest-api?forum=AzureDataLake
To push the progress of the SDK, you could submit your feedback here so that the Azure team will post the latest updates.

Read from ADLS gen 2 with SSIS

Does anyone know which connection and Data Flow Component to use for ADLS (Azure Data Lake Store) gen2?
I've managed to use the blob connector in the connection manager and successfully connect to ADLS Gen2, but when I try to use the blob source component I get a 400 bad request. Works fine if it's just a blob storage without HNS.
The ADLS components state it's just for ADLS gen 1.
So how to read and write to/from ADLS Gen 2?
The current version of the SSIS Azure Feature Pack supports ADLS Gen2. It can be used as a data source or destination in a data flow.
The screenshot shows it as a destination, but ADLS Gen2 also works well as a source, via the corresponding "Flexible File Destination" and "Flexible File Source" components.
First of all, based on the great link provided by @rickvdbosch, it looks like there are many temporary limitations with Azure Data Lake Storage Gen2 concerning the Blob Storage API. This means that it is not a component limitation, and maybe you should wait until it is integrated with SSIS.
Microsoft SQL SERVER Feature pack for Azure
If you meant these components when you mentioned that:
The ADLS components state it's just for ADLS gen 1.
Then ignore this part.
I am not sure whether it supports Gen2, but I think you can use the Azure Data Lake Store components, which are part of the Microsoft SQL Server Feature Pack for Azure. For more information you can refer to:
Azure Data Lake Store in SSIS
Azure Data Lake Store Source
Azure Data Lake Store Destination
Download Link
Azure Feature Pack for Integration Services (SSIS)
Other methods
If the suggestion above doesn't work, you should use Azure Data Factory or a command-line tool such as AzCopy v10.
I got the following info:
"At the moment Gen 2 don’t support BLOB API (but it will in a short time) and hence, SSIS is not able to connect."
So for SSIS it's currently either ADLS Gen 1, or blob store
I used the Script Task to write files or System.Objects (converted to CSV in memory) to Azure Storage Gen2 (hierarchical namespace enabled) using the REST API. I did this as a demo until the SSIS components are released.
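The in-memory CSV conversion step mentioned above can be sketched as follows. This is JavaScript for illustration, not the Script Task code from the answer; the toCsv name, the column list and the row shape are all hypothetical. Fields that contain a comma, a double quote or a line break are quoted, with embedded quotes doubled.

```javascript
// Sketch only: serialises an array of plain objects into CSV text in memory.
function toCsv(rows, columns) {
  // Quote a field only when it contains a delimiter, a quote, or a line break.
  const escapeField = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  // Header row first, then one line per object, in the given column order.
  const lines = [columns.map(escapeField).join(',')];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeField(row[column])).join(','));
  }
  return lines.join('\r\n') + '\r\n';
}
```

The resulting string can then be sent as the body of the REST append call, without writing a temporary file.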
You can't write to ADLS Gen2 using the old components from the Azure Feature Pack, but you can connect to the blob Gen2 (non-hierarchical) using the Azure Blob Destination Component.

Difference between Azure Data Lake Storage, Azure Blob Storage and Azure File Storage

I have a question about the use cases of the different Azure storage services:
Azure Data Lake Storage.
Azure Blob Storage.
Azure File Storage.
What is the difference between these services, and when should each be used, given that they all provide the same functionality (storage) on Azure's cloud platform?
You can take a look at this article: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
I'd say the main differences between Data Lake and Azure Storage Blob are scale and the permissions model.
It really makes no sense to paste the whole article here. But you might want to look at Data Lake v2, which (as MS claims) is a mesh of Data Lake v1 and Azure Storage Blob: https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
As for Azure File storage, it's just an SMB share over HTTPS (and it is not really fast, because it uses only one "stream").
