Azure Storage copy an image from blob to blob - node.js

I am using Azure Storage for Node.js, and what I need to do is copy an image from one blob to another.
First I tried getBlobToFile to download the image to a temp location on disk and then createBlockBlobFromFile to upload it from that temp location. That approach did the job, but for some reason the copy was incomplete in about 10% of cases.
Then I tried getBlobToText and passed the result into createBlockBlobFromText, also passing options to set the content type so the blob would be treated as an image. That approach failed completely; the image wouldn't even open after the copy.
Perhaps there is a way to copy a blob and paste it into another blob, but I didn't find such a method.
What else can I do?

I'm not sure what your particular copy error is, but... with getBlobToLocalFile(), you're physically moving blob content from blob storage to your VM (or local machine), and then with createBlockBlobFromLocalFile() you're pushing the entire content back to blob storage, which results in two physical network moves.
The Azure Storage service supports blob copy as a first-class operation. While it's available via a REST API call, it's also wrapped in the same SDK you're using, in the method BlobService.startCopyBlob() (source code here). This instructs the storage service to initiate an asynchronous copy operation entirely within the storage system (meaning no download + upload on your side). You'll be able to set source and destination, set timeouts, etc. (all parameters are fully documented in the source code).
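A minimal sketch of what that looks like with the azure-storage package (the container and blob names here are placeholders, not from the question):

const azure = require('azure-storage');

// Picks up AZURE_STORAGE_CONNECTION_STRING (or account/key) from the environment
const blobService = azure.createBlobService();

// URL of the source blob; for a cross-account copy you would append a SAS token
const sourceUri = blobService.getUrl('source-container', 'image.jpg');

// Ask the storage service to copy server-side; no bytes pass through this machine
blobService.startCopyBlob(sourceUri, 'destination-container', 'image-copy.jpg', (err, result) => {
  if (err) { return console.error(err); }
  // The copy runs asynchronously inside the service; result carries the copy id/status
  console.log('Copy started for blob:', result.name);
});

Because the copy completes asynchronously on the service side, for large blobs you can poll the destination blob's properties until its copy status reports success.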

The link in the accepted answer is broken, although the method is correct: the method startCopyBlob is documented here
(Updated: Jan 3, 2020) https://learn.microsoft.com/en-us/javascript/api/azure-storage/BlobService?view=azure-node-latest#azure_storage_BlobService_startCopyBlob

Related

Azure block blob uncommitted update behavior

I am writing a project that needs concurrent updates to a block blob. From the Microsoft documentation:
Uncommitted Block List: The list of blocks that have been uploaded for a blob using Put Block, but that have not yet been committed. These blocks are stored in Azure in association with a blob, but do not yet form part of the blob.
I could not find any documentation on:
Whether an update on an uncommitted blob could/should be performed
What happens when you write to an uncommitted blob
How long a block blob takes to go from uncommitted -> committed, based on the consistency policy you choose
Could someone provide more context on the concurrent update behavior of uncommitted block blobs?
I tested this with Fiddler and the latest blob storage NuGet package, Microsoft.Azure.Storage.Blob, version 11.1.0.
When you use the UploadFromByteArray method to upload to Azure blob storage, there are two scenarios:
1. The files (or byte arrays) are not big, say 10 MB or 100 MB: then there is no uncommitted status for the blob. In this case, the default "last writer wins" concurrency policy applies, so you don't need to worry about uncommitted blocks.
2. The files (or byte arrays) are big, say 200 MB: UploadFromByteArray splits the data into many blocks, each with a unique block ID.
In this case, while the blob is uncommitted (before the Put Block List API is called), you cannot perform another write operation on the blob. A second write operation fails with the error message "The specified blob or block content is invalid." I verified this in my test.
Regarding your 3rd question: in my test, the transition from uncommitted (while Put Block is being called) -> committed (when Put Block List is called) was very short; measured with Fiddler it was less than 1 second.
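For reference, the same Put Block / Put Block List sequence from the azure-storage Node.js SDK looks roughly like this (a sketch; container and blob names are made up). Until commitBlocks runs, the uploaded block sits in the uncommitted block list:

const azure = require('azure-storage');

const blobService = azure.createBlobService();

// Block IDs must be base64-encoded and all of the same length
const blockId = Buffer.from('block-000001').toString('base64');

// Put Block: the block is stored, but the blob itself is still uncommitted
blobService.createBlockFromText(blockId, 'mycontainer', 'bigfile.bin', 'chunk of data', (err) => {
  if (err) { return console.error(err); }

  // Put Block List: committing the block list is what makes the new blob content visible
  blobService.commitBlocks('mycontainer', 'bigfile.bin', { LatestBlocks: [blockId] }, (err2) => {
    if (err2) { return console.error(err2); }
    console.log('Blob committed');
  });
});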
Hope it helps.

Azure blob upload rename if blob name exists

In Azure blob upload, a file is overwritten if you upload a new file with the same file name (in the same container).
I would like to rename the new file before saving it, to avoid overwriting any files - Is this possible?
Scenario:
Upload file "Image.jpg" to container "mycontainer"
Upload file "Image.jpg" to container "mycontainer" (with different content)
Rename the second "Image.jpg" to "Image_{guid}.jpg" before saving it to "mycontainer".
You cannot rename a blob (there's no API for it). Your options:
check whether the blob name exists prior to uploading, and choose a different name for your about-to-be-uploaded blob if the name is already in use
simulate a rename by copying the existing blob to a new blob with a different name, then deleting the original blob
As @juunas pointed out in the comments: you'd have to manage your workflow to avoid potential race conditions around checking for existence, renaming, etc.
I recommend using an "If-None-Match: *" conditional header (sometimes known as "If-Not-Exists" in the client libraries). If you include this header on your Put Blob or Put Block List operation and the blob already exists, the call will fail and the data will not be overwritten. You can catch this failure client-side and retry the upload operation (with a different blob name).
This has two advantages over checking to see if the blob exists before uploading. First, you no longer have the potential race condition. Second, calling Exists() adds a lot of additional overhead - an additional HTTP call for every upload, which is significant unless your blobs are quite large or latency doesn't matter. With the access condition, you only need multiple calls when the name collides, which should hopefully be a rare case.
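In the azure-storage Node.js SDK the same condition can be attached to an upload via an access condition; a minimal sketch (the names are placeholders):

const azure = require('azure-storage');

const blobService = azure.createBlobService();

const options = {
  // Sends "If-None-Match: *", so the upload is rejected if the blob already exists
  accessConditions: azure.AccessCondition.generateIfNotExistsCondition()
};

blobService.createBlockBlobFromLocalFile('mycontainer', 'Image.jpg', './Image.jpg', options, (err) => {
  if (err) {
    // A name collision surfaces as a 409/412 error - generate a new name and retry
    return console.log('Name already in use, retrying with a different blob name');
  }
  console.log('Uploaded');
});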
Of course, it might be easier / cleaner to just always use the GUID, then you don't have to worry about it.
Needing to rename may be indicative of an anti-pattern. If your ultimate goal is to change the name of the file when downloaded, you can do so and keep the blob name abstract and unique.
You can set the HTTP download filename by assigning the ContentDisposition property a value of
attachment;filename="yourfile.txt"
This will ensure that the header is set when the blob is accessed either as public or a SAS url.
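With the azure-storage Node.js SDK you can set this at upload time via content settings; a rough sketch (the GUID-style blob name is just an example):

const azure = require('azure-storage');

const blobService = azure.createBlobService();

blobService.createBlockBlobFromLocalFile('mycontainer', 'Image_3fa85f64.jpg', './Image.jpg', {
  // Serve the blob under a friendly download name, independent of the unique blob name
  contentSettings: { contentDisposition: 'attachment; filename="Image.jpg"' }
}, (err) => {
  if (err) { return console.error(err); }
  console.log('Uploaded with Content-Disposition set');
});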

Azure function resize image in place

I'm trying to resize an image from blob storage using an Azure Function - an easy task, lots of samples, works great, but... it only works when the resized image is saved to a different file. My problem is that I would like to replace the original image with the resized one - same location and name.
When I set the output blob to be the same as the input blob, the function is triggered over and over again and never finishes.
Is there any way I could change a blob using an Azure Function and store the result in the same file?
The easiest option is to accept two invocations for the same file, but add a check of the size of the incoming file. If the size is already OK, do nothing and quit without changing the file again. This should break you out of the loop.
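A minimal sketch of that guard in a Node.js blob-triggered function (the threshold, the binding names and the resizeImage helper are made up for illustration):

// index.js of a blob-triggered function; "inputBlob"/"outputBlob" are hypothetical binding names
module.exports = async function (context, inputBlob) {
  const MAX_BYTES = 500 * 1024; // assume anything at or below this size has already been resized

  // Second invocation (triggered by our own write): the blob is already small, so stop here
  if (inputBlob.length <= MAX_BYTES) {
    context.log('Blob already resized, skipping to break the trigger loop');
    return;
  }

  const resized = await resizeImage(inputBlob); // placeholder for your actual resize logic
  context.bindings.outputBlob = resized;        // output binding configured to point at the same blob path
};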
The blob trigger uses storage logs to watch for new or changed blobs. It then compares the changed blob against the blob receipts in a container named azure-webjobs-hosts in the Azure storage account. Each receipt has an ETag associated with it, so when you change a blob, the ETag changes and the blob is submitted to the function again.
Unless you want to get fancy and update the ETags in the receipts from within the function (not sure if that's feasible), your changed files will go in for re-processing.

Writing and Reading to Local Storage in Azure WebJobs

I need to use local storage in an Azure WebJob (continuous if it matters). What is the recommended path for this? I want this to be as long-lasting as possible, so I am not wanting a Temp directory. I am well aware local storage in azure will always need to be backed by Blob storage or otherwise, which I already will be handling.
(To preempt question on that last part: This is a not frequently changing but large file (changes maybe once per week) that I want to cache in local storage for much faster times on startup. When not there or if out of date (which I will handle checking), it will download from the source blob and so forth.)
Related questions like Accessing Local Storage in azure don't specifically apply to a WebJob. However, this question is vitally connected, but 1) the answer relies on using Server.MapPath, which is a System.Web-dependent solution I think, and 2) I don't find that answer to have any research or definitive basis (though it is probably a good guess for the best solution). It would be nice if the Azure team gave more direction on this important issue; we're talking about nothing less than usage of the local hard drive.
Here are some Environment variables worth considering, though I don't know which to use:
Environment.CurrentDirectory: D:\local\Temp\jobs\continuous\webjobname123\idididid.id0
[PUBLIC, D:\Users\Public]
[ALLUSERSPROFILE, D:\local\ProgramData]
[LOCALAPPDATA, D:\local\LocalAppData]
[ProgramData, D:\local\ProgramData]
[WEBJOBS_PATH, D:\local\Temp\jobs\continuous\webjobname123\idididid.id0]
[SystemDrive, D:]
[LOCAL_EXPANDED, C:\DWASFiles\Sites\#1appservicename123]
[WEBSITE_SITE_NAME, webjobname123]
[USERPROFILE, D:\local\UserProfile]
[USERNAME, RD00333D444333$]
[WEBSITE_OWNER_NAME, asdf1234-asdf-1234-asdf-1234asdf1234+eastuswebspace]
[APP_POOL_CONFIG, C:\DWASFiles\Sites\#1appservicename123\Config\applicationhost.config]
[WEBJOBS_NAME, webjobname123]
[APPSETTING_WEBSITE_SITE_NAME, webjobname123]
[WEBROOT_PATH, D:\home\site\wwwroot]
[TMP, D:\local\Temp]
[COMPUTERNAME, RD00333D444333]
[HOME_EXPANDED, C:\DWASFiles\Sites\#1appservicename123\VirtualDirectory0]
[APPDATA, D:\local\AppData]
[WEBSITE_INSTANCE_ID, asdf1234asdf134asdf1234asdf1234asdf1234asdf1234asdf12345asdf12342]
[HOMEPATH, \home]
[WEBJOBS_SHUTDOWN_FILE, D:\local\Temp\JobsShutdown\continuous\webjobname123\asdf1234.pfs]
[WEBJOBS_DATA_PATH, D:\home\data\jobs\continuous\webjobname123]
[HOME, D:\home]
[TEMP, D:\local\Temp]
Using the %HOME% environment variable as a base path works nicely for me. I use a subfolder to store job-specific data, but other folder structures on top of this base path are also valid. For more details take a look at https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system and https://github.com/projectkudu/kudu/wiki/File-structure-on-azure
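For example, in a Node.js WebJob something like the following resolves to persistent (non-temp) storage; the subfolder and file names are just an example:

const path = require('path');
const fs = require('fs');

// %HOME% maps to D:\home on App Service, which is durable and shared between instances,
// unlike %TEMP% under D:\local, which is per-instance and wiped on restarts
const cacheDir = path.join(process.env.HOME || '.', 'data', 'myjob-cache');
fs.mkdirSync(cacheDir, { recursive: true });

const cachedFile = path.join(cacheDir, 'large-reference-file.bin');
console.log('Cache location:', cachedFile);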

Avoid over-writing blobs AZURE

If I upload a file to an Azure blob container where a file with the same name already exists, it overwrites that file. How do I avoid overwriting it? Below I describe the scenario...
step 1 - upload file "abc.jpg" to Azure in a container called, say, "filecontainer"
step 2 - once it is uploaded, try uploading a different file with the same name to the same container
Output - it overwrites the existing file with the latest upload
My requirement - I want to avoid this overwrite, as different people may upload files having the same name to my container.
Please help
P.S.
- I do not want to create different containers for different users
- I am using the REST API with Java
Windows Azure Blob Storage supports conditional headers, which you can use to prevent overwriting of blobs. You can read more about conditional headers here: http://msdn.microsoft.com/en-us/library/windowsazure/dd179371.aspx.
Since you want a blob not to be overwritten, you would specify the If-None-Match conditional header and set its value to *. If the blob already exists, this causes the upload operation to fail with a Precondition Failed (412) error.
Another idea would be to check for the blob's existence just before uploading (by fetching its properties); however, I would not recommend this approach, as it may lead to concurrency issues.
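Since you're calling the REST API directly, the condition is just one extra header on the Put Blob request; roughly (the account name, auth signature and lengths below are placeholders):

PUT https://myaccount.blob.core.windows.net/filecontainer/abc.jpg HTTP/1.1
x-ms-version: 2019-02-02
x-ms-blob-type: BlockBlob
If-None-Match: *
Authorization: SharedKey myaccount:<signature>
Content-Length: <length>

<file bytes>

If a blob named abc.jpg already exists, the service rejects this request instead of overwriting it.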
You have no control over the names your users upload their files with. You do, however, have control over the names you store those files under. The standard way is to generate a GUID and name each file accordingly. The chance of a conflict is practically zero.
Simple pseudocode looks like this:
//generate a GUID and rename the file the user uploaded with the generated GUID
//store the original file name together with the GUID in a database or what-have-you
//upload the file to blob storage using the name you generated above
Hope that helps.
Let me put it that way:
step one - user X uploads file "abc1.jpg" and you save it to a local folder XYZ
step two - user Y uploads another file with the same name "abc1.jpg", and now you save it again in the local folder XYZ
What do you do now?
With this I am illustrating that your question does not really relate to Azure at all!
Just do not rely on original file names when saving files, wherever you are saving them. Generate random names (GUIDs, for example) and "attach" the original name as metadata.
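A sketch of that approach with the azure-storage Node.js SDK (the uuid package and the names below are my own choices for illustration):

const azure = require('azure-storage');
const path = require('path');
const { v4: uuidv4 } = require('uuid');

const blobService = azure.createBlobService();

function uploadWithoutOverwrite(localFilePath, originalName, callback) {
  // A random blob name keeps uploads from colliding, whatever the users named their files
  const blobName = uuidv4() + path.extname(originalName);

  blobService.createBlockBlobFromLocalFile('filecontainer', blobName, localFilePath, {
    // Keep the original file name as blob metadata so it can be recovered later
    metadata: { originalname: originalName }
  }, (err) => callback(err, blobName));
}

uploadWithoutOverwrite('./abc1.jpg', 'abc1.jpg', (err, name) => {
  if (err) { return console.error(err); }
  console.log('Stored as', name);
});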
