Blob file limit for query with BlobClientBase.queryWithResponse() - azure

I am using the code sample from the BlobClientBase.queryWithResponse(BlobQueryOptions queryOptions, Duration timeout, Context context) method documentation to try to query data from a blob file on Azure Blob Storage. I was able to successfully retrieve data from smaller files, such as a 1 KB file, but when I try to query larger files, such as a 1 MiB file, I get an exception stating the record is larger than supported. Is there a size limit on the blob file I can query using this method?

When I try to query larger files, such as a 1 MiB file, I get an exception stating the record is larger than supported. Is there a size limit to the blob file I can query using this method?
Yes, according to the documentation:
queryWithResponse(BlobQueryOptions queryOptions, Duration timeout, Context context): this method queries an entire blob into an output stream.
The maximum size of the query expression is 256 KiB.
References: BlobClientBase Class - queryWithResponse(), BlobClientBase.queryWithResponse, and BlobClientBase.java
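For reference, here is a minimal sketch of how this method is typically called, along the lines of the documented sample. The connection string, container, blob name, and CSV serialization settings are assumptions for illustration, not taken from the original question:

import com.azure.core.util.Context;
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobClientBuilder;
import com.azure.storage.blob.models.BlobQueryDelimitedSerialization;
import com.azure.storage.blob.options.BlobQueryOptions;
import java.io.ByteArrayOutputStream;
import java.time.Duration;

public class BlobQueryExample {
    public static void main(String[] args) {
        // Hypothetical connection details and blob name.
        BlobClient blobClient = new BlobClientBuilder()
            .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
            .containerName("data")
            .blobName("records.csv")
            .buildClient();

        ByteArrayOutputStream queryOutput = new ByteArrayOutputStream();
        // The 256 KiB limit mentioned above applies to this query expression.
        String expression = "SELECT * from BlobStorage";

        // Assumed CSV layout for both input and output.
        BlobQueryDelimitedSerialization csv = new BlobQueryDelimitedSerialization()
            .setColumnSeparator(',')
            .setRecordSeparator('\n')
            .setEscapeChar('\0')
            .setFieldQuote('"')
            .setHeadersPresent(true);

        BlobQueryOptions options = new BlobQueryOptions(expression, queryOutput)
            .setInputSerialization(csv)
            .setOutputSerialization(csv);

        // queryWithResponse streams the entire matching result into queryOutput.
        int statusCode = blobClient
            .queryWithResponse(options, Duration.ofSeconds(30), Context.NONE)
            .getStatusCode();
        System.out.printf("Query completed with status %d, %d bytes returned%n",
            statusCode, queryOutput.size());
    }
}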

Related

RequestEntityTooLarge in cosmos stored procedure

I am trying to pass more than 3 MB of JSON data as an input parameter to a Cosmos DB stored procedure, but I get an error:
RequestEntityTooLarge
Is there any limitation, or is there another way to do this?
Below is a screenshot where I simply return a constant, to check whether the data parameter consumes any RUs or not.
Note: the container's throughput is set to 10k RU/s.
Hi, the maximum size for an item is 2 MB.
Per-item limits
Depending on which API you use, an Azure Cosmos DB item can represent either a document in a collection, a row in a table, or a node or edge in a graph. The following is the per-item limit in Cosmos DB:
Maximum size of an item: 2 MB (UTF-8 length of JSON representation)
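As a quick illustration of that limit, an item must fit within 2 MB when serialized as UTF-8 JSON, so one way to catch oversized payloads before sending them is to measure the serialized size up front. The helper below is purely illustrative and is not part of any Cosmos SDK:

import java.nio.charset.StandardCharsets;

public class ItemSizeCheck {
    // Cosmos DB limits each item to 2 MB of UTF-8 encoded JSON.
    private static final int MAX_ITEM_BYTES = 2 * 1024 * 1024;

    static boolean fitsInOneItem(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length <= MAX_ITEM_BYTES;
    }
}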

Azure data factory - Copy Data activity Sink - Max rows per file property

I would like to split my big file into smaller chunks inside blob storage via the ADF Copy data activity. I am trying to do so using the Max rows per file property in the Copy activity sink, but my file is not getting split into smaller files; instead I get the same big file as a result. Can anyone share any valuable info here?
I tested it, and it works fine. The configuration steps were as follows (screenshots omitted):
1. Source dataset
2. Source setting
3. Sink dataset
4. Sink setting
Result: the file was split into multiple smaller files.

Azure Blob Storage - What exactly does "Write Operations" mean?

I'm using the Azure Pricing Calculator for estimating storage costs for files (more specifically, SQL backups).
I'm currently selecting Block Blob Storage with Blob Storage account type.
There's a section in the pricing calculator that shows the cost of Write Operations and describes which API calls are Write Ops:
The following API calls are considered Write Operations: PutBlob, PutBlock, PutBlockList, AppendBlock, SnapshotBlob, CopyBlob and SetBlobTier (when it moves a Blob from Hot to Cool, Cool to Archive or Hot to Archive).
I looked at the docs for PutBlob and PutBlock, but neither really seems to mention "file" anywhere (except for PutBlob, which mentions a filename).
The PutBlob description says:
The Put Blob operation creates a new block, page, or append blob, or updates the content of an existing block blob.
The PutBlock description says:
The Put Block operation creates a new block to be committed as part of a blob.
Is it 1 block per file or is a file multiple blocks?
Are those 2 Put commands used for uploading files?
Does a write operation effectively mean 1 operation per 1 file?
For example, if I have 100 files, is that 100 write operations?
Or can 1 write operation write multiple files in a single op?
Let me try to explain it with a couple of scenarios. Since you are using block blobs, I will explain using them only.
Uploading a 1 MB File: Assuming you have a 1 MB local file that you wish to save as block blob. Considering the file size is relatively small, you can upload this file in blob storage using Put Blob operation. Since you're calling this operation only once, you will be performing one write operation.
Uploading a 1 GB File: Now let's assume that you have a 1 GB local file that you wish to save as block blob. Considering the file size is big, you decide to logically split the file in 1 MB chunks (i.e. you logically split your 1 GB local file in 1024 chunks). BTW, these chunks are also known as blocks. Now you upload each of these blocks using Put Block operation and then finally stitch these blocks together using Put Block List operation to create your blob. Since you're calling 1024 put block operations (one for each block) and then 1 put block list operation, you will be performing 1025 write operations (1024 + 1).
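To make the second scenario concrete, here is a rough sketch of a chunked upload with the Azure Storage Java SDK: each stageBlock call corresponds to one Put Block request, and the final commitBlockList corresponds to the Put Block List request. The connection string, container name, blob name, and file path are assumptions for illustration:

import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.specialized.BlockBlobClient;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.UUID;

public class BlockUploadExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string, container, and blob name.
        BlockBlobClient blockClient = new BlobServiceClientBuilder()
            .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
            .buildClient()
            .getBlobContainerClient("backups")
            .getBlobClient("large-file.bak")
            .getBlockBlobClient();

        byte[] data = Files.readAllBytes(Path.of("large-file.bak"));
        int blockSize = 1024 * 1024; // 1 MB blocks, as in the 1 GB scenario above
        List<String> blockIds = new ArrayList<>();

        for (int offset = 0; offset < data.length; offset += blockSize) {
            int length = Math.min(blockSize, data.length - offset);
            String blockId = Base64.getEncoder()
                .encodeToString(UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
            // Each stageBlock call is one Put Block request, i.e. one write operation.
            try (InputStream chunk = new ByteArrayInputStream(data, offset, length)) {
                blockClient.stageBlock(blockId, chunk, length);
            }
            blockIds.add(blockId);
        }
        // commitBlockList issues one Put Block List request, i.e. one more write operation.
        blockClient.commitBlockList(blockIds);
    }
}

Reading the whole file into memory here is only to keep the sketch short; in practice you would stream each block from disk.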
Now to answer your specific questions:
Is it 1 block per file or is a file multiple blocks?
It depends on whether you used Put Blob operation or Put Block operation to upload the file. In scenario 1 above, it is just 1 block per file (or blob) because you used put blob operation however in scenario 2, it is 1024 blocks per file (or blob) because you used put block operation.
Are those 2 Put commands used for uploading files?
Yes. Again, depending on the file size you may decide to use either the put blob or the put block/put block list operation to upload files. The maximum size of a file that can be uploaded with a put blob operation is 100 MB. That means if the file size is greater than 100 MB, you must use put block/put block list to upload it; if the file size is less than 100 MB, you can use either put blob or put block/put block list.
Does a write operation effectively mean 1 operation per 1 file? For example, if I have 100 files, is that 100 write operations?
At the minimum, yes. If each of the 100 files is uploaded using put blob operation, then it would amount to 100 write operations.
Or can 1 write operation write multiple files in a single op?
No, that's not possible.
Operations are at the REST level. So, for a given blob being written, you may see more than one operation, especially if its total size exceeds the maximum payload of a single Put Block/Page operation (either 4 MB or 100 MB for a block, 4 MB for a page).
For a block blob, there's a follow-on Put Block List call, after all of the Put Block calls, resulting in yet another metered operation.
There are similar considerations for Append blobs.

Is there a way to increase the request size limit when inserting data into cosmosdb?

I have a requirement to read multiple files (105 files) from ADLS (Azure Data Lake Storage), parse them, and then add the parsed data directly to multiple collections in Azure Cosmos DB for MongoDB API. All of this needs to be done in one request. The average file size is 120 KB.
The issue is that after multiple documents are added, an error is raised: "request size limit too large".
Please let me know if someone has any inputs on this.
It's unclear how you're performing multi-document inserts but... You can't increase maximum request size. You'll need to perform individual inserts, or insert in smaller batches.
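If you are inserting through the MongoDB driver, one simple way to stay under the request size limit is to break a large insertMany into smaller batches. A rough sketch, where the connection string, database, collection, and batch size are assumptions to illustrate the idea:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import java.util.List;
import org.bson.Document;

public class BatchedInsert {
    public static void main(String[] args) {
        // Hypothetical connection string, database, and collection names.
        try (MongoClient client = MongoClients.create(System.getenv("COSMOS_MONGO_URI"))) {
            MongoCollection<Document> collection =
                client.getDatabase("mydb").getCollection("parsedFiles");

            // Documents parsed from the ADLS files (left empty in this sketch).
            List<Document> parsedDocs = List.of();

            // Tune the batch size so each request stays under the service's size limit.
            int batchSize = 50;
            for (int i = 0; i < parsedDocs.size(); i += batchSize) {
                List<Document> batch =
                    parsedDocs.subList(i, Math.min(i + batchSize, parsedDocs.size()));
                collection.insertMany(batch);
            }
        }
    }
}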

Cassandra: Issue with blob creation for large file

We are trying to load a file into a blob column in Cassandra. When we load files of 1-2 MB, it goes through fine. While loading a large file, say around 50 MB, we get the following error:
Cassandra failure during write query at consistency LOCAL_QUORUM (1 responses were required but only 0 replica responded, 1 failed)
It is a single-node development DB. Any hints or support will be appreciated.
50 MB is pretty big for a cell. Although a little out of date, this is still accurate: http://cassandra.apache.org/doc/4.0/faq/#can-large-blob
There is no mechanism for streaming out of cells in Cassandra, so a cell's content needs to be serialized as a single response, in memory. You're probably hitting a limit or bug somewhere that's throwing an exception and causing the failed query (check Cassandra's system.log; there may be an exception in there that describes what's occurring better).
If you have a CQL collection or logged batch there are additional lower limits.
http://docs.datastax.com/en/cql/3.3/cql/cql_reference/refLimits.html
You can try chunking your blobs into parts. I'd actually recommend something like 64 KB and, on the client side, iterating through them as a stream (which also prevents loading the whole file into memory on your side).
CREATE TABLE exampleblob (
    blobid text,
    chunkid int,
    data blob,
    PRIMARY KEY (blobid, chunkid)
);
Then just SELECT * FROM exampleblob WHERE blobid = 'myblob'; and iterate through the results. Inserting gets more complex, though, since you need logic to split up your file; this can also be done in a streaming fashion and kept memory-efficient on your app side (see the sketch below).
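A rough sketch of the streaming insert side using the DataStax Java driver, assuming the table above; the keyspace, file path, and blob id are made up for illustration and error handling is omitted:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ChunkedBlobInsert {
    private static final int CHUNK_SIZE = 64 * 1024; // 64 KB chunks, as suggested above

    public static void main(String[] args) throws Exception {
        try (CqlSession session = CqlSession.builder().withKeyspace("mykeyspace").build();
             InputStream in = Files.newInputStream(Path.of("/path/to/large-file.bin"))) {
            PreparedStatement ps = session.prepare(
                "INSERT INTO exampleblob (blobid, chunkid, data) VALUES (?, ?, ?)");

            byte[] buffer = new byte[CHUNK_SIZE];
            int chunkId = 0;
            int read;
            // Stream the file chunk by chunk so it is never loaded into memory all at once.
            while ((read = in.read(buffer)) != -1) {
                ByteBuffer chunk = ByteBuffer.wrap(Arrays.copyOf(buffer, read));
                session.execute(ps.bind("myblob", chunkId++, chunk));
            }
        }
    }
}

Reading the blob back is then just a matter of selecting by blobid and concatenating the chunks in chunkid order.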
Another alternative is to just upload the blob to S3 or some other distributed file store, using a hash of the file as the bucket/filename, and in Cassandra store only the filename as a reference to it.
