Difference between Azure Data Lake Storage, Azure Blob Storage and Azure File Storage

I have a question about the use cases of the different Azure storage services:
Azure Data Lake Storage.
Azure Blob Storage.
Azure File Storage.
What is the difference between these services, and when should each one be used, given that they all provide the same functionality (storage) on Azure's cloud platform?

You can take a look at this article: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
I'd say the main differences between Data Lake and Azure Blob Storage are scale and the permissions model.
It makes no sense to paste the whole article here, but you might want to look at Data Lake v2, which (as MS claims) is a mix of Data Lake v1 and Azure Blob Storage: https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-introduction
As for Azure File Storage, it's essentially a managed SMB file share (also reachable through a REST API over HTTPS), and it is not really fast because it uses only one "stream".
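One practical way to see the split is that each service is reached through its own endpoint on the storage account. The sketch below only builds those endpoint strings; the account name and share name are hypothetical placeholders, and the helper `endpoints_for` is an illustration rather than part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageEndpoints:
    blob: str     # Blob Storage REST endpoint
    dfs: str      # Data Lake Storage Gen2 (hierarchical namespace) endpoint
    file: str     # Azure Files REST endpoint
    smb_unc: str  # UNC path used when mounting an Azure Files share over SMB


def endpoints_for(account: str, share: str = "myshare") -> StorageEndpoints:
    """Build the per-service endpoints for one storage account (hypothetical helper)."""
    suffix = "core.windows.net"
    return StorageEndpoints(
        blob=f"https://{account}.blob.{suffix}",
        dfs=f"https://{account}.dfs.{suffix}",
        file=f"https://{account}.file.{suffix}",
        smb_unc=f"\\\\{account}.file.{suffix}\\{share}",
    )


if __name__ == "__main__":
    # "examplestorageacct" and "reports" are made-up names for illustration.
    eps = endpoints_for("examplestorageacct", share="reports")
    print(eps.blob)
    print(eps.dfs)
    print(eps.file)
    print(eps.smb_unc)
```

The blob and dfs endpoints address the same underlying data in a Gen2 account, while the file endpoint and the UNC path belong to Azure Files only.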


Are you able to apply Azure Information Protection on Azure Data Lake or Blob Storage?

I'd like to know whether you can apply Azure Information Protection to files that are in Azure Data Lake or Blob storage.
I can't seem to find any documentation that confirms this.
Cheers
Tim

Azure Data Lake Gen2 vs Storage account

I have a requirement to process some big data and am planning to deploy a Databricks cluster and a storage technology. I am currently evaluating Data Lake Gen2, which supports both object and file storage. The storage account (blob, file, table, queue) has similar capabilities and can also handle both file-based and object-based storage requirements. I am a bit puzzled about which option to choose because of these similarities. Can someone clarify the following questions, please?
Apart from HDFS support, what other significant features would justify choosing Data Lake Gen2 over a Storage Account?
Storage Account v2 with Hierarchical namespace enabled == Data Lake Gen2. If so, can I use File System to create file shares and mount them in my VM, as with the Storage account's File service?
For accessing data from Databricks, which of these two is better for big data workloads? I can see that a Storage account can also be mounted as DBFS, which can still leverage distributed processing.
Question: Apart from HDFS support, what other significant features would justify choosing Data Lake Gen2 over a Storage Account?
Answer: There are other benefits as well. In short, they are performance, management, security, and cost. For more details, you can refer to this official article.
Question: Storage Account v2 with Hierarchical namespace enabled == Data Lake Gen2. If so, can I use File System to create file shares and mount them in my VM, as with the Storage account's File service?
Answer: Of course, ADLS Gen2 supports mounting file shares, as Blob storage does.
Question: For accessing data from Databricks, which of these two is better for big data workloads? I can see that a Storage account can also be mounted as DBFS, which can still leverage distributed processing.
Answer: ADLS Gen2 can also be mounted as DBFS. Given the benefits listed in Answer 1, ADLS Gen2 should be the better choice.
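From the Databricks side, the two options differ mainly in the URI scheme and endpoint used to address the data. A minimal sketch, assuming the usual `abfss://` form for ADLS Gen2 and `wasbs://` form for plain Blob storage; the helper names and the account, container, and path values are hypothetical.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """URI for an ADLS Gen2 file system (hierarchical namespace, dfs endpoint)."""
    clean = path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean}"


def wasbs_uri(account: str, container: str, path: str = "") -> str:
    """URI for a plain Blob storage container (blob endpoint)."""
    clean = path.lstrip("/")
    return f"wasbs://{container}@{account}.blob.core.windows.net/{clean}"


if __name__ == "__main__":
    # Made-up names for illustration only.
    print(abfss_uri("examplestorageacct", "raw", "/sales/2020/data.csv"))
    print(wasbs_uri("examplestorageacct", "raw", "/sales/2020/data.csv"))
    # In a Databricks notebook, with credentials for the account already
    # configured on the cluster, either URI could then be passed to a reader,
    # for example: spark.read.csv(uri, header=True)
```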

Uploading data (CSV file) using Azure Functions (Node.js) to Azure Data Lake Gen2

I am currently trying to send a CSV file to Azure Data Lake Gen2 using an Azure Function written in Node.js, but I have not been able to get it working. Any suggestions would be really helpful.
Thanks.
I have tried using the credentials of the Blob storage in ADLS Gen2 with the Blob storage APIs, but I am getting an error.
For now, this cannot be done with the SDK. Please check this known issue:
Blob storage APIs are disabled to prevent feature operability issues that could arise because Blob Storage APIs aren't yet interoperable with Azure Data Lake Gen2 APIs.
In the table of features, you can also find this information about APIs for Data Lake Storage Gen2 storage accounts:
Multi-protocol access on Data Lake Storage is currently in public preview. This preview enables you to use Blob APIs in the .NET, Java, Python SDKs with accounts that have a hierarchical namespace. The SDKs don't yet contain APIs that enable you to interact with directories or set access control lists (ACLs). To perform those functions, you can use Data Lake Storage Gen2 REST APIs.
So if you want to implement it, you have to use the REST API: Azure Data Lake Store REST API.
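As a rough illustration of the REST route, uploading a file through the Data Lake Storage Gen2 REST API is usually a three-call sequence: create the file, append the bytes, then flush. The sketch below is in Python only to show that sequence, and the same calls could be issued from a Node.js function with any HTTP client. It builds the requests without sending them; `plan_csv_upload` and `PlannedRequest` are hypothetical names, and real calls would additionally need an Authorization header and an x-ms-version header, which are left out here.

```python
from typing import List, NamedTuple
from urllib.parse import quote, urlencode


class PlannedRequest(NamedTuple):
    method: str
    url: str
    body: bytes


def plan_csv_upload(account: str, filesystem: str, path: str,
                    data: bytes) -> List[PlannedRequest]:
    """Plan the create / append / flush calls for one file (nothing is sent)."""
    base = (f"https://{account}.dfs.core.windows.net/"
            f"{quote(filesystem)}/{quote(path)}")
    create_qs = urlencode({"resource": "file"})
    append_qs = urlencode({"action": "append", "position": 0})
    # The flush position is the total number of bytes appended so far.
    flush_qs = urlencode({"action": "flush", "position": len(data)})
    return [
        PlannedRequest("PUT", f"{base}?{create_qs}", b""),     # create empty file
        PlannedRequest("PATCH", f"{base}?{append_qs}", data),  # upload the bytes
        PlannedRequest("PATCH", f"{base}?{flush_qs}", b""),    # commit the upload
    ]


if __name__ == "__main__":
    # Made-up account, file system, and path for illustration.
    csv_bytes = b"id,name\n1,alice\n"
    for req in plan_csv_upload("examplestorageacct", "myfs",
                               "incoming/users.csv", csv_bytes):
        print(req.method, req.url, len(req.body))
```

Large files would be appended in several chunks at increasing positions before a single flush; the sketch uses one append for brevity.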

What are the sources supported in Azure Data Share?

Which sources/datasets are supported in Azure Data Share?
Does it support the following?
Blob storage
Azure Files
Queue storage
Table storage
Disk storage
Looking at step 7 here and at the REST API documentation, I believe the following sources are currently supported:
Azure Blob Storage
Azure Data Lake Gen 1
Azure Data Lake Gen 2
I would not be surprised if more data sources are supported down the road, considering that the service is currently in preview.

Azure storage underlying technology

What is Azure Storage made of? That is, what underlying storage technology supports the Azure Storage we access in the Azure portal?
Is it object-based storage or block storage (persistent/ephemeral), similar to the categorization in Ceph?
If it is a mix of block-based and object-based storage, which one is used for each of the exposed Azure Storage services: block blob, append blob, page blob, storage tables, storage queues, Azure Files?
A few years ago, the Azure Storage team gave a presentation about the internals of Azure Storage at the 23rd ACM Symposium on Operating Systems Principles (SOSP).
You can read more about this presentation here: https://azure.microsoft.com/en-in/blog/sosp-paper-windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency/.
Direct links:
Paper: http://sigops.org/sosp/sosp11/current/2011-Cascais/printable/11-calder.pdf
Video of presentation: https://www.youtube.com/watch?v=QnYdbQO0yj4
Powerpoint: http://sigops.org/sosp/sosp11/current/2011-Cascais/11-calder.pptx
Please go through this material. Hopefully it will give you an idea of how Azure Storage is designed.
