How to convert from Azure Append Blob to Azure Block Blob

Is there any way to convert from an Append Blob to a Block Blob?

For a blob conversion, I am using the --blob-type=BlockBlob option at the end of my azcopy.exe statement. So far it works well.
Good luck!

Is there any way to convert from an Append Blob to a Block Blob?
Once the blob has been created, its type cannot be changed, and it can be updated only by using operations appropriate for that blob type, i.e., writing a block or list of blocks to a block blob, appending blocks to an append blob, and writing pages to a page blob.
For more information, please refer to this link: Understanding Block Blobs, Append Blobs, and Page Blobs.

Is there any way to convert from an Append Blob to a Block Blob?
Automatic conversion between blob types is not allowed. What you would need to do is download the blob and re-upload it as a Block Blob.
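For example, here is a minimal sketch of that download-and-reupload approach using the Azure.Storage.Blobs v12 SDK (the connection string, container, and blob names are placeholders):

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

// Placeholders: swap in your own connection string and names.
var container = new BlobContainerClient("<connection-string>", "<container>");

// The source is the existing append blob; BlobClient.Upload creates a block blob.
var appendBlob = container.GetAppendBlobClient("source-append-blob");
var blockBlob = container.GetBlobClient("target-block-blob");

// Stream the append blob's content down and re-upload it as a block blob.
using (var stream = appendBlob.OpenRead())
{
    blockBlob.Upload(stream, overwrite: true);
}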

Given: I have a source blob which is an append blob
And: I have to copy the source to a new blob container as a block blob
When: I use the CopyBlobToBlockBlobContainer function
Then: the destination container will have the same blob as the source, but as a block blob.
public void CopyBlobToBlockBlobContainer(string sourceBlobName)
{
    var sourceContainerClient = new BlobContainerClient(sourceConnectionString, BlobContainerName);
    var destinationContainerClient = new BlobContainerClient(destinationConnectionString, OutBlobContainerName);
    destinationContainerClient.CreateIfNotExists();

    // Generate a read SAS for the source blob so the destination can pull from it.
    var sourceBlobClient = sourceContainerClient.GetBlockBlobClient(sourceBlobName);
    var sourceUri = sourceBlobClient.GenerateSasUri(BlobSasPermissions.Read, ExpiryOffset);

    // Uploading from the URI creates the destination as a block blob,
    // regardless of the source blob's type.
    var destBlobClient = destinationContainerClient.GetBlockBlobClient(sourceBlobName);
    var result = destBlobClient.SyncUploadFromUri(sourceUri, overwrite: true);

    var response = result.GetRawResponse();
    if (response.Status != 201) throw new BlobCopyException(response.ReasonPhrase);
}

Use the command below with AzCopy:
azcopy copy 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<append-or-page-blob-name>' 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>/<name-of-new-block-blob>' --blob-type BlockBlob --block-blob-tier <destination-tier>
The --block-blob-tier parameter is optional. If you omit that parameter, then the destination blob infers its tier from the default account access tier setting. To change the tier after you've created a block blob, see Change a blob's tier.

Related

add header at top in csv file of append blob

I am creating a pipeline in Azure Data Factory where I am using a Function App as one of the activities to transform data and store it in an append blob container in CSV format. As I have taken 50 batches in a for loop, my Function App processes data 50 times, once for each order. I am appending the header to the CSV file with the logic below.
// First, I am creating the file as per business logic.
// csveventcontent is my source data.
var dateAndTime = DateTime.Now.AddDays(-1);
string FileDate = dateAndTime.ToString("ddMMyyyy");
string FileName = _config.ContainerName + FileDate + ".csv";
StringBuilder csveventcontent = new StringBuilder();
OrderEventService obj = new OrderEventService();

// Now I am checking if today's file exists, and if it doesn't, we create it.
if (await appBlob.ExistsAsync() == false)
{
    await appBlob.CreateOrReplaceAsync();
    // Append header
    csveventcontent.AppendLine(obj.GetHeader());
}
Now the problem is that the header is appended many times to the CSV file, and sometimes it is not at the top, probably because the function app runs 50 times in parallel.
How can I make the header appear only once, at the top?
I have tried with Data Flow and Logic Apps as well, but was unable to do it. If it can be handled through code, that would be easier, I guess.
I think you are right there. It's the concurrency of the function app that is causing the problem. The best approach would be to use a queue and process messages one by one. Or you could use a distributed lock to ensure only one function writes to the file at a time; you can use blob leases for this.
The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
Refer: Lease Blob Request Headers
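For example, a minimal sketch of serializing writers with a blob lease, using the v12 Azure.Storage.Blobs SDK (the question's code uses the older WindowsAzure.Storage client, but the lease semantics are the same; the connection string and names are placeholders):

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

// Placeholders: swap in your own connection string and names.
var blob = new BlobClient("<connection-string>", "<container>", "orders.csv");
var lease = blob.GetBlobLeaseClient();

// A lease can be held for 15-60 seconds, or infinitely (TimeSpan.FromSeconds(-1)).
lease.Acquire(TimeSpan.FromSeconds(15));
try
{
    // Only the instance holding the lease gets past Acquire, so the
    // header check-and-write cannot interleave with other instances.
    // ... write the header if the blob is empty, then append the batch ...
}
finally
{
    lease.Release();
}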

azure container_client delete_blobs causes oom

I am using the Azure Blob SDK for Python. Here is how I delete blobs:
blobs = container_client.list_blobs(name_starts_with="myprefix")
container_client.delete_blobs(*blobs)
If blobs here returns a large number of blob objects, the above code crashes.
What is the standard practice here? Are there other ways to do batch deletion?
Update:
Reply to @Ivan Yang:
This is slightly different from your solution. I ran it, but got this error:
There is a partial failure in the batch operation.
blobs = container_client.list_blobs(name_starts_with="myprefix")
blobs_list = list(blobs)
for i in range(0, len(blobs_list), 10):
    container_client.delete_blobs(*blobs_list[i: i+10])
You'd better specify how many blobs you are trying to delete with each delete_blobs call; that also makes debugging easier.
As a workaround, you can fetch a certain number of blobs (like 10) each time, then delete until the continuation token is null.
Here is the sample code:
# Define a continuation token.
continuation_token = None
while True:
    # Fetch 10 blobs per page.
    blob_list = container_client.list_blobs(name_starts_with="xxx", results_per_page=10).by_page(continuation_token=continuation_token)
    list_segment = [blob.name for blob in list(next(blob_list))]
    container_client.delete_blobs(*list_segment)
    continuation_token = blob_list.continuation_token
    if not continuation_token:
        break

What is the difference between BlobAttribute vs BlobTriggerAttribute?

Can anyone elaborate on the difference between BlobAttribute and BlobTriggerAttribute?
[FunctionName(nameof(Run))]
public async Task Run(
    [BlobTrigger("container/{name}")] byte[] data,
    [Blob("container/{name}", FileAccess.Read)] byte[] data2,
    string name)
{
}
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob?tabs=csharp#trigger
It seems BlobTrigger has all the functionality.
From the doc you can see the main difference: with BlobTrigger, the blob contents are provided as an input, which means it can only read the blob, not write it.
The BlobAttribute, on the other hand, supports binding to single blobs, blob containers, or collections of blobs, and supports both read and write.
Also, the BlobTrigger can only be used to read a blob when a new or updated blob is detected, while the Blob binding can be used in any kind of function.
For more information about these two bindings, you can check the binding code: BlobAttribute and BlobTriggerAttribute.
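For example, a minimal sketch (the container names are hypothetical) that pairs the trigger with a writable Blob binding, which is something the trigger alone cannot do:

using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CopyBlob
{
    [FunctionName("CopyBlob")]
    public static async Task Run(
        [BlobTrigger("input/{name}")] Stream input,              // read-only: fires on new/updated blobs
        [Blob("output/{name}", FileAccess.Write)] Stream output, // Blob binding opened for writing
        string name,
        ILogger log)
    {
        log.LogInformation($"Copying blob {name} to the output container.");
        await input.CopyToAsync(output);
    }
}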

GCP - get full information about bucket

I need to get the information for files stored in a Google bucket: file size, storage class, last modified, and type. I searched the Google docs, but they only show the curl or console methods. I need to get that information from the Python API, just like downloading a blob or uploading a blob to the bucket. Sample code or any help is appreciated!
To get the object metadata you can use the following code:
from google.cloud import storage

def object_metadata(bucket_name, blob_name):
    """Prints out a blob's metadata."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    print('Blob: {}'.format(blob.name))
    print('Bucket: {}'.format(blob.bucket.name))
    print('Storage class: {}'.format(blob.storage_class))
    print('ID: {}'.format(blob.id))
    print('Size: {} bytes'.format(blob.size))
    print('Updated: {}'.format(blob.updated))
    print('Generation: {}'.format(blob.generation))
    print('Metageneration: {}'.format(blob.metageneration))
    print('Etag: {}'.format(blob.etag))
    print('Owner: {}'.format(blob.owner))
    print('Component count: {}'.format(blob.component_count))
    print('Crc32c: {}'.format(blob.crc32c))
    print('md5_hash: {}'.format(blob.md5_hash))
    print('Cache-control: {}'.format(blob.cache_control))
    print('Content-type: {}'.format(blob.content_type))
    print('Content-disposition: {}'.format(blob.content_disposition))
    print('Content-encoding: {}'.format(blob.content_encoding))
    print('Content-language: {}'.format(blob.content_language))
    print('Metadata: {}'.format(blob.metadata))

object_metadata('bucketName', 'objectName')
Using the Cloud Storage client library, and checking the docs for buckets, you can do this to get the storage class:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('YOUR_BUCKET')
print(bucket.storage_class)
As for the size and last-modified time (at least that's what I understood from your question), those belong to the files themselves. You could iterate over the list of blobs in your bucket and check them:
for blob in bucket.list_blobs():
    print(blob.size)
    print(blob.updated)

Blob.getCopyState() returning null

Is this function not implemented in the Java SDK? It appears to always return null. I am copying one page blob to another and want to track the status of the copy.
CloudPageBlob srcBlob = container.getPageBlobReference("source.vhd");
String newname = "dst.vhd";
CloudPageBlob dstBlob = container.getPageBlobReference(newname);
dstBlob.startCopyFromBlob(srcBlob);

// Get the blob again for updated state
dstBlob = container.getPageBlobReference(newname);
CopyState state = dstBlob.getCopyState();
Is there any other way to get the status? I am using azure-storage-1.2.0.jar.
getPageBlobReference() is purely a local operation; it does not communicate with the Azure Storage service. You need to call dstBlob.downloadAttributes() between calling getPageBlobReference() and getCopyState(). This makes the service call that populates the blob's properties, including the copy state.
