PDFs in Azure Blob Storage; better block or page blobs?

I have read this article, but I am still not sure whether I should store PDFs as page or block blobs in Azure Blob Storage.
The documents are just corporate documents for archiving, i.e. they will never be modified but need to be accessed via web and downloaded. The size of each document varies between 50 kB and 5 MB.
Any insights would be greatly appreciated.

You should use block blobs since you don't need random read or write operations.
If you really only need to archive files, consider using Azure Archive storage, which is the lowest-priced storage offer in Azure.

@Meneghino A block blob is the best fit for objects such as PDFs. Page blobs are suited to VHDs: by default, when you create a VM, its VHDs are stored as page blobs because page blobs are optimized for random read and write operations.
Page blobs: a collection of 512-byte pages optimized for random read and write operations. To create a page blob, you initialize it and specify the maximum size it will grow to. To add or update the contents of a page blob, you write a page or pages by specifying an offset and a range that align to 512-byte page boundaries. A write to a page blob can overwrite just one page, some pages, or up to 4 MB of the page blob. Writes to page blobs happen in place and are immediately committed to the blob. The maximum size for a page blob is 8 TB.
Block blobs: let you upload large blobs efficiently. Block blobs are made up of blocks, each of which is identified by a block ID. You create or modify a block blob by writing a set of blocks and committing them by their block IDs. Each block can be a different size, up to a maximum of 100 MB (4 MB for requests using REST versions before 2016-05-31), and a block blob can include up to 50,000 blocks. The maximum size of a block blob is therefore slightly more than 4.75 TB (100 MB x 50,000 blocks). For REST versions before 2016-05-31, the maximum size of a block blob is a little more than 195 GB (4 MB x 50,000 blocks). If you are writing a block blob that is no more than 256 MB in size (64 MB for requests using REST versions before 2016-05-31), you can upload it in its entirety with a single write operation.
More information can be found here: https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs
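As a quick check of the size limits quoted above, here is a minimal arithmetic sketch. It assumes "MB" means binary megabytes (MiB); the constant and function names are made up for illustration and are not part of any Azure SDK.

```python
# Binary units; treating the quoted "MB" limits as MiB is an assumption.
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # blocks per block blob, as quoted above


def max_block_blob_bytes(max_block_bytes: int, max_blocks: int = MAX_BLOCKS) -> int:
    """Largest block blob that can be built from max_blocks blocks of max_block_bytes."""
    return max_block_bytes * max_blocks


current = max_block_blob_bytes(100 * MIB)  # REST version 2016-05-31 and later
legacy = max_block_blob_bytes(4 * MIB)     # REST versions before 2016-05-31

print(f"{current / TIB:.2f} TiB")  # 4.77 TiB
print(f"{legacy / GIB:.2f} GiB")   # 195.31 GiB
```

For the PDFs in the question (50 kB to 5 MB each), none of these limits come into play; each file fits comfortably in a single write.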

Related

Azure Blob Storage - What exactly does "Write Operations" mean?

I'm using the Azure Pricing Calculator for estimating storage costs for files (more specifically, SQL backups).
I'm currently selecting Block Blob Storage with Blob Storage account type.
There's a section in the pricing calculator that shows the cost of Write Operations and describes which API calls are Write Ops:
The following API calls are considered Write Operations: PutBlob, PutBlock, PutBlockList, AppendBlock, SnapshotBlob, CopyBlob and SetBlobTier (when it moves a Blob from Hot to Cool, Cool to Archive or Hot to Archive).
I looked at the docs for PutBlob and PutBlock, but neither really seems to mention "file" anywhere (except for PutBlob, which mentions a filename).
The PutBlob description says:
The Put Blob operation creates a new block, page, or append blob, or updates the content of an existing block blob.
The PutBlock description says:
The Put Block operation creates a new block to be committed as part of a blob.
Is it 1 block per file or is a file multiple blocks?
Are those 2 Put commands used for uploading files?
Does a write operation effectively mean 1 operation per 1 file?
For example, if i have 100 files is that 100 write operations?
Or can 1 write operation write multiple files in a single op?

Let me try to explain it with a couple of scenarios. Considering you are using block blobs, I will explain using them only.
Uploading a 1 MB File: Assuming you have a 1 MB local file that you wish to save as block blob. Considering the file size is relatively small, you can upload this file in blob storage using Put Blob operation. Since you're calling this operation only once, you will be performing one write operation.
Uploading a 1 GB File: Now let's assume that you have a 1 GB local file that you wish to save as block blob. Considering the file size is big, you decide to logically split the file in 1 MB chunks (i.e. you logically split your 1 GB local file in 1024 chunks). BTW, these chunks are also known as blocks. Now you upload each of these blocks using Put Block operation and then finally stitch these blocks together using Put Block List operation to create your blob. Since you're calling 1024 put block operations (one for each block) and then 1 put block list operation, you will be performing 1025 write operations (1024 + 1).
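The two scenarios above can be sketched as a small counting function. This is only an illustration of the arithmetic, not a call into any storage API; the function name and parameters are hypothetical.

```python
from typing import Optional

MIB = 1024 ** 2
GIB = 1024 ** 3


def write_operations(file_bytes: int, block_bytes: Optional[int] = None) -> int:
    """Count billable write operations for uploading one file as a block blob.

    block_bytes=None models a single Put Blob call.
    Otherwise the file is split into blocks: one Put Block per block,
    plus one Put Block List to commit them.
    """
    if block_bytes is None:
        return 1
    blocks = -(-file_bytes // block_bytes)  # ceiling division
    return blocks + 1


print(write_operations(1 * MIB))           # scenario 1: 1
print(write_operations(1 * GIB, 1 * MIB))  # scenario 2: 1025
```

Choosing a larger block size reduces the count: the same 1 GB file in 4 MB blocks would need 256 Put Block calls plus one Put Block List.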
Now to answer your specific questions:
Is it 1 block per file or is a file multiple blocks?
It depends on whether you used the Put Blob operation or the Put Block operation to upload the file. In scenario 1 above, it is just 1 block per file (or blob) because you used Put Blob. In scenario 2, it is 1024 blocks per file (or blob) because you used Put Block.
Are those 2 Put commands used for uploading files?
Yes. Depending on the file size, you may decide to use either Put Blob or Put Block/Put Block List to upload files. The maximum size of a file that can be uploaded with a single Put Blob operation is 256 MB (64 MB for requests using REST versions before 2016-05-31). That means that if the file is larger than this limit, you must use Put Block/Put Block List to upload it. If the file is smaller than the limit, you can use either Put Blob or Put Block/Put Block List.
Does a write operation effectively mean 1 operation per 1 file? For example, if i have 100 files is that 100 write operations?
At the minimum, yes. If each of the 100 files is uploaded using put blob operation, then it would amount to 100 write operations.
Or can 1 write operation write multiple files in a single op?
No, that's not possible.

Operations are counted at the REST level. So, for a given blob being written, you may see more than one operation, especially if its total size exceeds the maximum payload of a single Put Block/Put Page operation (either 4 MB or 100 MB for a block, 4 MB for a page).
For a block blob, there's a follow-on Put Block List call, after all of the Put Block calls, resulting in yet another metered operation.
There are similar considerations for Append blobs.

Azure SQL Database storage error, it shows a lot of unallocated error

The current specification of Azure SQL Managed Instance is 8vcore with Storage 1280 GB.
The issue is that when I look at the size, it shows 380 GB of unallocated storage out of 1280 GB. How can I allocate this storage?

Azure SQL database_size is always larger than the sum of reserved + unallocated space because it includes the size of log files, but reserved and unallocated_space consider only data pages.
FYI, the “database size” property is the size of the current database; it includes both data and log files. “Space Available” is space in the database that has not been reserved for database objects. In our situation, if database objects take up all the unallocated space, data and log files will grow by the “autogrowth” value (this property is shown under database, properties, Files tab) until database size reaches 10 GB.
For more detailed information, please refer to the following links:
sp_spaceused (Transact-SQL):
http://msdn.microsoft.com/en-us/library/ms188776.aspx
Database Properties (Files Page):
http://msdn.microsoft.com/en-us/library/ms180254.aspx

Understanding Azure Block Blob Storage Limitations

I am a bit confused about how block blob storage works, so I am also confused about how the limitations work (from what I have read, most people will not get even close to the limits, but I would still like to understand how they are applied). I have been reading this post.
The limitations seem to be like this
Block blobs store text and binary data, up to about 4.7 TB. Block blobs are made up of blocks of data that can be managed individually.
Azure Blob storage limits
Resource | Target
Max size of single blob container | Same as max storage account capacity
Max number of blocks in a block blob or append blob | 50,000 blocks
Max size of a block in a block blob
I understand the limits above, but I don't understand what the "block blob" actually is.
If I store, say, all my pictures in 1 container, will I reach the limit?
Say I had something super crazy, like 10 million photos of 100 MB each in this picture container. Would I have gone over the limit?
Or does block blob mean that if I had img001 and it was 1 GB, it would get separated into blocks, and the limit would be 50,000 blocks for that image?
Then img002 would have its own 50,000 block limit and so forth (i.e. the limits apply to each image, not to the total container size).

The 50,000 block limit applies to each object (blob) in the container, not to the container as a whole. You can have multiple objects, each up to 50,000 x 100 MB in size.
Normally you don't hit these limits. At least I have never been able to hit them. :-)
You can find more information at https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs.

How to choose blob block size in Azure

I want to use Append blobs in Azure storage.
When I'm uploading a blob, I have to choose the block size.
What should I consider when choosing the block size?
I see no difference when I upload a file that is bigger than the block size.
How to choose the right block size?

Based on your description, I did some research that may help you understand the blocks of an append blob:
I checked CloudAppendBlob.AppendText and CloudAppendBlob.AppendFromFile. If the file size or text content size is less than 4 MB, it is uploaded as a new individual block. Here I used CloudAppendBlob.AppendText to append text content (byte size less than 4 MB) three times; you could refer to the network traces as follows:
For content larger than 4 MB, the client SDK divides the content into smaller pieces (4 MB) and uploads each piece as a block. Here I uploaded a file of about 48.8 MB; you could refer to the network traces as follows:
As Gaurav Mantri mentioned, you could choose a small block size for a slow network. Moreover, small block writes give you better performance for write requests, but when you read the data it spans multiple separate blocks, which slows down read requests. It depends on the write/read ratio your application expects. For optimal reads, I recommend batching writes to be as near 4 MB as possible, which gives you slower write requests but much faster reads.

A few things to consider when deciding on the block size:
In case of an Append Blob, maximum size of a block can be 4 MB so you can't go beyond that number.
Again, a maximum of 50000 blocks can be uploaded, so you need to divide the blob size by 50000 to find the minimum block size. For example, if you're uploading a 100 MB file and choose a 100 byte block, you would end up with 1048576 (100x1024x1024/100) blocks, which is more than the allowed limit of 50000, so it is not allowed.
Most importantly, I believe it depends on your Internet speed. If you have a really good Internet connection, you can go up to 4MB block size. For not so good Internet connections, you can reduce the limit. For example, I always try to use 256-512KB block size as the Internet connection I have is not good.
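The first two constraints above can be checked with a few lines of arithmetic. The sketch below uses the limits quoted in this answer (4 MB per append block, 50000 blocks) and treats MB as MiB; the helper names are hypothetical and nothing here calls the storage service.

```python
MIB = 1024 ** 2
KIB = 1024

MAX_APPEND_BLOCK_BYTES = 4 * MIB  # per-block limit for append blobs, as quoted above
MAX_BLOCKS = 50_000               # blocks per blob, as quoted above


def block_count(blob_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to hold blob_bytes (ceiling division)."""
    return -(-blob_bytes // block_bytes)


def min_block_size(blob_bytes: int) -> int:
    """Smallest block size that keeps the blob within MAX_BLOCKS blocks."""
    return max(1, -(-blob_bytes // MAX_BLOCKS))


def is_valid_block_size(blob_bytes: int, block_bytes: int) -> bool:
    """True if block_bytes respects both the per-block and the block-count limit."""
    return (0 < block_bytes <= MAX_APPEND_BLOCK_BYTES
            and block_count(blob_bytes, block_bytes) <= MAX_BLOCKS)


blob = 100 * MIB
print(block_count(blob, 100))                # 1048576 blocks: over the limit
print(is_valid_block_size(blob, 100))        # False
print(min_block_size(blob))                  # 2098 bytes
print(is_valid_block_size(blob, 256 * KIB))  # True (400 blocks)
```

Any size between the minimum and 4 MB is allowed, so within that window the choice comes down to network quality and the write/read trade-off.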

Is it possible to read text files from Azure Blob storage from the end?

I have rather large blob files that I need to read and ingest only latest few rows of information from. Is there an API (C#) that would read the files from the end until I want to stop, so that my app ingests the minimum information possible?

You should already know that BlockBlobs are designed for sequential access, while Page Blobs are designed for random access. And AppendBlobs for Append operations, which in your case is not what we are looking for.
I believe your solution would be to save your blobs as PageBlob as opposed to the default BlockBlob. Once you have a Page Blob, you have nice methods like GetPageRangesAsync, which returns an IEnumerable of PageRange. The latter overrides the ToString() method to give you the string content of the page.

Respectfully, I disagree with the answer. While it is true that Page Blobs are designed for random access, they are meant for a different purpose altogether.
I also agree that Block Blobs are designed for sequential access, however nothing is preventing you from reading a block blob's content from the middle. With the support for range reads in block blob, it is entirely possible for you to read partial contents of a block blob.
To give you an example, let's assume you have a 10 MB blob (blob size = 10485760 bytes). Now you want to read the blob from the bottom. Assuming you want to read 1MB chunk at a time, you would call DownloadRangeToByteArray or DownloadRangeToStream (or their Async variants) and specify 9437184 (9MB marker) as starting range and 10485759 (10MB marker) as ending range. Read the contents and see if you find what you're looking for. If not, you can read blob's contents from 8MB to 9MB and continue with the process.
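The backward walk described above can be sketched as follows. The question asks about C#, so treat this Python version as an illustration of the range arithmetic only: `read_range` is a hypothetical callable standing in for a range download such as DownloadRangeToByteArray, and it is simulated here with an in-memory byte string.

```python
from typing import Callable, Iterator, List, Tuple

MIB = 1024 ** 2


def ranges_from_end(blob_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte ranges, last chunk first."""
    end = blob_size - 1
    while end >= 0:
        start = max(0, end - chunk_size + 1)
        yield start, end
        end = start - 1


def last_lines(read_range: Callable[[int, int], bytes], blob_size: int,
               wanted: int, chunk_size: int = MIB) -> List[bytes]:
    """Return the last `wanted` lines, fetching as few chunks as possible."""
    if wanted <= 0:
        return []
    buffer = b""
    for start, end in ranges_from_end(blob_size, chunk_size):
        buffer = read_range(start, end) + buffer
        # More newlines than wanted lines means the oldest kept line is complete.
        if buffer.count(b"\n") > wanted:
            break
    return buffer.splitlines()[-wanted:]


# The first two ranges for the 10 MB example above.
first_two = list(ranges_from_end(10 * MIB, MIB))[:2]
print(first_two)  # [(9437184, 10485759), (8388608, 9437183)]

# Simulated blob: read_range slices an in-memory byte string.
data = b"".join(f"row {i}\n".encode() for i in range(1000))
tail = last_lines(lambda s, e: data[s:e + 1], len(data), wanted=3, chunk_size=16)
print(tail)  # [b'row 997', b'row 998', b'row 999']
```

The loop stops as soon as enough complete lines are buffered, so only the tail of the blob is transferred.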
