Understanding AzCopy resumption when downloading blob objects from Azure Storage

I have a scenario where I need to download millions of blobs from Azure storage. I'm using azcopy.
More blobs are constantly being added to this storage account (roughly 10K per day).
Imagine my download was disrupted. I try again 30 mins later. By then, ~200 more blobs have already been added to the storage. On my command line, I see:
Incomplete operation with same command line detected at the journal directory "/home/myuser/Microsoft/Azure/AzCopy", do you want to resume the operation?
Which of the following 2 scenarios will happen if I enter "Yes"?
1) It will download the remaining blob files, including the 200 new ones that were added
2) It will download the remaining blob files, excluding the new ones.
Please confirm.

It depends on the names of your new blobs. AzCopy records the last listed blob name into its journal file as a checkpoint, and after resuming it will continue listing blobs page by page from that checkpoint. If the names of your new blobs sort before the checkpoint, they won't be included in the resumed transfer job; otherwise, they will be.
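To make that concrete with made-up names: blobs are listed in lexicographic order, so if the checkpoint recorded in the journal is "reports/2020-05-14_0812.json", a new blob named "reports/2020-05-20_0001.json" sorts after the checkpoint and will still be picked up by the resumed job, while a new blob named "reports/2020-05-01_0930.json" sorts before it and will be skipped.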

Related

Is there a possibility to synchronize data between two Azure blob storage accounts

I have a task to copy blob storage to another location and keep the two locations synchronized. Unfortunately, I didn't find a solution for it. Is there a simpler way to do this?
You can use the AzCopy utility to synchronize files, or replicate a source location to a destination location.
The azcopy sync command identifies all files at the destination, and then compares file names and last modified timestamps before starting the sync operation. If you set the --delete-destination flag to true, AzCopy deletes files without providing a prompt. If you want a prompt to appear before AzCopy deletes a file, set the --delete-destination flag to prompt.
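As an illustration, a one-way blob-to-blob sync could look like the following (the account names, container name, and SAS tokens are placeholders, and blob-to-blob sync needs a reasonably recent AzCopy v10, so check the flags against your version):
azcopy sync "https://<sourceaccount>.blob.core.windows.net/<container>?<sas>" "https://<destaccount>.blob.core.windows.net/<container>?<sas>" --recursive --delete-destination=true
With --delete-destination=true, blobs that exist only at the destination are removed so that the destination ends up mirroring the source.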
Also check the az storage blob sync command from the Azure CLI.
Also consider the newer Object Replication for block blobs feature:
Object replication asynchronously copies block blobs between a source storage account and a destination account.

Azure blob storage overwriting duplicate files

I am using Azure Blob Storage to upload/download files. The problem is, if I upload a new file to Azure Blob Storage that has the same name as an already uploaded file, it automatically overwrites the content of the previously uploaded file.
For example
These are the files uploaded on azure blob storage -
file1.docx
file2.png
file1.png
So if I upload a new file named "file1.docx" that has different content, Blob Storage replaces the previously uploaded file1.docx, and I lose the previously uploaded file.
Is there any way that Blob Storage can automatically detect the duplicate so it can append _1 or (1) at the end, or any other way to solve this problem?
Is there any way that Blob Storage can automatically detect the duplicate so it can append _1 or (1) at the end, or any other way to solve this problem?
Out of the box this feature is not available, and you will have to handle it in your application. If your upload operation fails with a Conflict (HTTP status code 409) error, that means a blob with the name of the uploaded file already exists. You would then retry the operation with _1 or (1) appended to the name, and keep increasing the counter until the upload no longer fails with the conflict status code.
You can also append a GUID to your file name, which will make the name unique.
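A minimal sketch of that retry-on-conflict approach, assuming the Azure.Storage.Blobs SDK (the method name and the "_1" suffix format are just illustrations):

using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;

// Try the original name first; on 409 Conflict, append _1, _2, ... and retry.
async Task<string> UploadWithoutOverwritingAsync(BlobContainerClient container, string fileName, Stream content)
{
    string candidate = fileName;
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            content.Position = 0; // rewind in case a previous attempt consumed the stream
            await container.GetBlobClient(candidate).UploadAsync(content, overwrite: false);
            return candidate; // this name was free
        }
        catch (RequestFailedException ex) when (ex.Status == 409) // a blob with this name already exists
        {
            candidate = Path.GetFileNameWithoutExtension(fileName) + "_" + attempt + Path.GetExtension(fileName);
        }
    }
}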

Re-play/Repeat/Re-Fire Azure BlobStorage Function Triggers for existing files

I've just uploaded several tens of GB of files to Azure CloudStorage.
Each file should get picked up and processed by a FunctionApp, in response to a BlobTrigger:
[FunctionName(nameof(ImportDataFile))]
public async Task ImportDataFile(
    // Raw JSON Text file containing data updates in expected schema
    [BlobTrigger("%AzureStorage:DataFileBlobContainer%/{fileName}", Connection = "AzureStorage:ConnectionString")]
    Stream blobStream,
    string fileName)
{
    //...
}
This works in general, but foolishly, I did not do a final test of that Function prior to uploading all the files to our UAT system ... and there was a problem with the uploads :(
The upload took a few days (running over my Domestic internet uplink due to CoViD-19) so I really don't want to have to re-do that.
Is there some way to "replay" the blob upload triggers, so that the function triggers again as if I'd just re-uploaded the files ... without having to transfer any data again?
As per this link
Azure Functions stores blob receipts in a container named azure-webjobs-hosts in the Azure storage account for your function app (defined by the app setting AzureWebJobsStorage).
To force reprocessing of a blob, delete the blob receipt for that blob from the azure-webjobs-hosts container manually. While reprocessing might not occur immediately, it's guaranteed to occur at a later point in time. To reprocess immediately, the scaninfo blob in azure-webjobs-hosts/blobscaninfo can be updated. Any blobs with a last modified timestamp after the LatestScan property will be scanned again.
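If there are too many receipts to delete by hand, a minimal sketch of removing them with the Azure.Storage.Blobs SDK (the "blobreceipts/" prefix and the name filter are assumptions; check how the receipts are actually laid out in your azure-webjobs-hosts container first):

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Connection string of the function app's AzureWebJobsStorage account.
string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
var hosts = new BlobContainerClient(connectionString, "azure-webjobs-hosts");

string watchedContainer = "data-files"; // hypothetical: the container the BlobTrigger watches

// Receipts appear to live under a "blobreceipts/" prefix; delete every receipt
// that references the watched container so its blobs are reprocessed.
await foreach (BlobItem receipt in hosts.GetBlobsAsync(prefix: "blobreceipts/"))
{
    if (receipt.Name.Contains("/" + watchedContainer + "/"))
    {
        await hosts.DeleteBlobAsync(receipt.Name);
    }
}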
I found a hacky-AF workaround that re-processes an existing file:
If you add Metadata to a blob, that appears to re-trigger the BlobStorage Function Trigger.
This is accessed in Azure Storage Explorer by right-clicking on a blob > Properties > Add Metadata.
I was setting Key: "ForceRefresh", Value: "test".
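If many blobs need to be re-triggered, the same metadata trick can be scripted; a minimal sketch assuming the Azure.Storage.Blobs SDK and a hypothetical container name:

using System.Collections.Generic;
using Azure.Storage.Blobs;

string connectionString = "<storage connection string>"; // the account that holds the triggering container
var container = new BlobContainerClient(connectionString, "data-files"); // hypothetical container name

await foreach (var blob in container.GetBlobsAsync())
{
    // Writing any metadata updates the blob, which appears to make the
    // BlobTrigger pick it up again (same effect as the Storage Explorer steps above).
    await container.GetBlobClient(blob.Name)
                   .SetMetadataAsync(new Dictionary<string, string> { ["ForceRefresh"] = "test" });
}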
I had a problem with the processing of blobs in my code which meant that there were a bunch of messages in the webjobs-blobtrigger-poison queue. I had to move them back to azure-webjobs-blobtrigger-name-of-function-app. Removing the blob receipts and adjusting the scaninfo blob did not work without the above step.
Fortunately, Azure Storage Explorer has a menu option to move the messages from one queue to another.
I found a workaround, if you aren't invested in the file name:
Azure Storage Explorer has a "Clone with new name" button in the top bar, which will add a new file (and trigger the Function) without transferring the data via your local machine.
Note that "Copy" followed by "Paste" also re-triggers the blob, but appears to transfer the data down to your machine and then back up again ... incredibly slowly!

AzCopy uploading local files to Azure Storage as files, not Blobs

I'm attempting to upload 550K files from my local hard drive to Azure Blob Storage using the following command (AzCopy 5.1.1) -
AzCopy /Source:d:\processed /Dest:https://ContainerX.file.core.windows.net/fec-data/Reports/ /DestKey:SomethingSomething== /S
It starts churning right away.
But it's actually creating a new Azure File Storage folder called fec-data/reports rather than creating new blobs in the Azure Blob folder fec-data/reports I've already created.
What am I missing?
Also, is there any way to keep the date created (or similar) values of the old files?
Thanks,
But it's actually creating a new Azure File Storage folder called fec-data/reports rather than creating new blobs in the Azure Blob folder fec-data/reports I've already created.
What am I missing?
The reason you're seeing this behavior is that you're uploading to File storage instead of Blob storage. To upload the files to Blob storage, you need to specify the blob service endpoint (blob.core.windows.net). So your command would be:
AzCopy /Source:d:\processed /Dest:https://ContainerX.blob.core.windows.net/fec-data/Reports/ /DestKey:SomethingSomething== /S
Also, is there any way to keep the date created (or similar) values of the old files?
Assuming you want the blob's creation date to be the same as that of the file on disk, it is not possible. A blob's Last Modified date/time is a system property that gets assigned when the blob is created and is updated every time the blob is changed. You could, however, make use of blob metadata and store the file's creation date/time there.
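For instance, a minimal sketch of storing the original creation time in metadata when uploading with the Azure.Storage.Blobs SDK (the connection string, container, and file paths are placeholders):

using System.Collections.Generic;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

string connectionString = "<storage connection string>";
var fileInfo = new FileInfo(@"d:\processed\report1.csv");                       // placeholder local file
var blob = new BlobClient(connectionString, "fec-data", "Reports/report1.csv"); // placeholder blob

var options = new BlobUploadOptions
{
    // The blob's own timestamps can't be set, so carry the original value in metadata.
    Metadata = new Dictionary<string, string>
    {
        ["OriginalCreationTimeUtc"] = fileInfo.CreationTimeUtc.ToString("o")
    }
};

using FileStream stream = fileInfo.OpenRead();
await blob.UploadAsync(stream, options);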
I think you have to get the instance of the blob where you want to deploy the file, like:
AzCopy /Source:d:\processed /Dest:https://ContainerX.blob.core.windows.net/fec-data/Reports/ /DestKey:SomethingSomething== /S
Blob: Upload
Upload single file
AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /Pattern:"abc.txt"
If the specified destination container does not exist, AzCopy will create it and upload the file into it.
Upload single file to virtual directory
AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer/vd /DestKey:key /Pattern:abc.txt
If the specified virtual directory does not exist, AzCopy will upload the file with the virtual directory included in its name (e.g., vd/abc.txt in the example above).
Please refer to this link: https://learn.microsoft.com/en-us/azure/storage/storage-use-azcopy

Could not verify the copy source within the specified time. RequestId: (blank)

I am trying to copy some blob files from one storage account to another one. I am using AzCopy in order to fulfill this goal.
The process works for copying files between containers within the same storage account, but not between different storage accounts.
The command I am issuing is:
AzCopy /Source:https://<storage_account1>.blob.core.windows.net/<container_name1>/<path_to_desired_blobs> /Dest:https://<storage_account2>.blob.core.windows.net/<container_name2>/<path_to_store>/ /SourceKey:<source_key> /DestKey:<dest_key> /Pattern:<some_pattern> /S
The error I am getting is the following:
The remote server returned an error: (400) Bad Request.
Could not verify the copy source within the specified time.
RequestId:
Time:2016-04-01T19:33:01.0527460Z
The only difference between the two storage accounts is that one is Standard, whereas the other one is Premium.
Any help will be appreciated!
From your description, you're trying to copy a block blob from the source account to a page blob in the destination account, which is not supported by the Azure Storage service or AzCopy.
To work around it, you can first use AzCopy to download the block blobs from the source account to the local file system, and then upload them from the local file system to the destination account with the option /BlobType:Page (this option is only valid when uploading from local to blob).
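A sketch of that two-step workaround, reusing the placeholders from the question (note that page blobs must be a multiple of 512 bytes in size, so this only works for suitably sized files):
AzCopy /Source:https://<storage_account1>.blob.core.windows.net/<container_name1>/<path_to_desired_blobs> /Dest:D:\temp\blobs /SourceKey:<source_key> /Pattern:<some_pattern> /S
AzCopy /Source:D:\temp\blobs /Dest:https://<storage_account2>.blob.core.windows.net/<container_name2>/<path_to_store>/ /DestKey:<dest_key> /BlobType:Page /S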
Premium Storage only supports page blobs. Please confirm whether you are copying page blobs from the standard to the premium storage account. Also, set the /BlobType parameter to "page" in order to copy the data as page blobs into the destination premium storage account.
From the description, I am assuming your source blob is a block blob. Azure's "Async Copy Blob" process (which AzCopy uses as the default method) preserves the blob type; that is, you cannot convert a blob from block to page through an async copy.
Instead, can you try AzCopy again with the "/SyncCopy" option along with the "/BlobType:page" parameter? That might help change the destination blob type to page.
(If that doesn't work, only other solution would be to first download the blob, and then upload it with "/BlobType:page")
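As a sketch, that /SyncCopy attempt with the question's placeholders would look like this (with /SyncCopy, AzCopy stages the data through the machine running the command instead of asking the destination service to pull from the source, so the copy-source verification no longer applies, at the cost of extra bandwidth):
AzCopy /Source:https://<storage_account1>.blob.core.windows.net/<container_name1>/<path_to_desired_blobs> /Dest:https://<storage_account2>.blob.core.windows.net/<container_name2>/<path_to_store>/ /SourceKey:<source_key> /DestKey:<dest_key> /Pattern:<some_pattern> /SyncCopy /BlobType:Page /S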
