Blob Store to Blob Store - Azure

I'm currently working on a project for one of our managed services clients.
We're looking to take data out of their blob store (a) and move it into another blob store (b) using AzCopy.
My question is: will blob store (b) update incrementally from blob store (a) when new data arrives, or will we have to do a full copy each time we want to move new data across?
It seems like a silly question, but I couldn't find an answer to it online.
Thanks in advance!

AzCopy is just a command-line tool that will copy blob x or container y from storage account A to storage account B; it isn't doing anything special. If a blob already exists, it will give you the option to skip it or overwrite it, like any normal copy operation.
What it copies comes down to the script you are running that triggers AzCopy: what are you telling it to copy?
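If you only want to bring across blobs that are new or changed since the last run, one option worth checking is the azcopy sync command (rather than azcopy copy), which compares the source and destination and transfers only the differences. A rough sketch, with placeholder account, container and SAS values:
azcopy sync 'https://<source account>.blob.core.windows.net/<container>?SAS' 'https://<destination account>.blob.core.windows.net/<container>?SAS' --recursive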
You also might want to look at Azure Data Factory for doing blob to blob copies.

Related

Copying new Azure blobs to different container

We have 5 vendors that are SFTPing files to Blob Storage. When the files come in, I need to copy them to another container and create a folder in that container named with the date to put the files in. From the second container, I need to copy the files to a file share on an Azure server. What is the best way to go about this?
I'm very new to Azure and unsure what the best way is to accomplish what I am being asked to do. Any help would be greatly appreciated.
I'd recommend using Azure Synapse for this task. It will let you move data to and from different storage securely and with little-to-no code.
Specifically, I'd put a blob storage trigger on the SFTP blob container so that the Synapse pipeline that moves the data runs automatically when your vendors drop their files.
Note that when you look for documentation on how to do things in Synapse, most of the time the Azure Data Factory documentation will also be applicable, since most of Data Factory's functionality is now in Synapse.
The ADF and Synapse YouTube channels are excellent resources, as well as the Microsoft Learn courses on Data Engineering.
I need to copy them to another container and create a folder in that container named with the date to put the files in.
You can use AzCopy to copy files to another container by using a SAS token.
Command:
azcopy copy 'https://<storage account>.blob.core.windows.net/test/files?SAS' 'https://<storage account>.blob.core.windows.net/mycontainer/12-01-2023?SAS' --recursive
I need to copy the files to a file share on an Azure server
You can also copy files from a container to a file share by using AzCopy.
Command:
azcopy copy 'https://<storage account>.blob.core.windows.net/test?SAS' 'https://<storage account>.file.core.windows.net/fileshare/12-01-2023?SAS' --recursive
You can get the SAS token through the portal:
Go to the portal -> your storage account -> Shared access signature -> check the resource types -> click Generate SAS and connection string.
AzCopy is probably a good way to move all or some of the blobs from one container to another, but I would suggest automating it with Azure Functions. It can be automated by triggering an Azure Function every time a blob or set of blobs (Azure can process a batch of blobs) is uploaded to the source container.
A note on Azure Functions: depending on the quantity of blobs to be moved and the time it could take, Durable Functions may be the better solution to avoid timeout exceptions. A durable function returns an immediate response but keeps running in the background.
See this article for a fuller approach to this solution:
https://build5nines.com/azure-functions-copy-blob-between-azure-storage-accounts-in-c/
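As a rough illustration of that pattern (not the code from the linked article), here is a minimal sketch of a blob-triggered function that streams each newly uploaded blob into a destination container; the container names and the DestinationStorage connection setting are placeholders for your own values:
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CopyOnUpload
{
    [FunctionName("CopyOnUpload")]
    public static void Run(
        [BlobTrigger("source-container/{name}")] Stream source,
        [Blob("dest-container/{name}", FileAccess.Write, Connection = "DestinationStorage")] Stream destination,
        string name,
        ILogger log)
    {
        // Stream the newly uploaded blob straight into the destination container.
        source.CopyTo(destination);
        log.LogInformation($"Copied blob {name} to the destination container.");
    }
}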

How to archive Azure blob storage content?

I need to store some temporary files for maybe 1 to 3 months. I only need to keep the last three months' files; older files need to be deleted. How can I do this in Azure Blob storage? Is there any other option in this case besides blob storage?
IMHO the best option to store files in Azure is either Blob Storage or File Storage; however, neither of them supports automatic expiration of content (based on age or other criteria).
This feature was requested for Blob Storage a long time ago, but unfortunately no progress has been made so far (https://feedback.azure.com/forums/217298-storage/suggestions/7010724-support-expiration-auto-deletion-of-blobs).
You could, however, write something of your own to achieve this. It's quite simple: periodically (say once a day) your program fetches the list of blobs and compares each blob's last modified date with the current date. If the last modified date is older than the desired retention period (1 or 3 months, as you mentioned), you simply delete the blob.
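A minimal sketch of that clean-up logic, assuming the classic Microsoft.WindowsAzure.Storage client library (the connection string, container name and three-month cutoff are placeholders for your own values):
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var account = CloudStorageAccount.Parse("<storage-connection-string>");
var container = account.CreateCloudBlobClient().GetContainerReference("temp-files");
var cutoff = DateTimeOffset.UtcNow.AddMonths(-3);

// Flat-list every blob and delete the ones last modified before the retention cutoff.
foreach (var blob in container.ListBlobs(null, true).OfType<CloudBlob>())
{
    if (blob.Properties.LastModified < cutoff)
    {
        blob.DeleteIfExists();
    }
}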
You can use WebJobs, Azure Functions or Azure Automation to schedule your code to run on a periodic basis. In fact, there's readymade code available to you if you want to use Azure Automation Service: https://gallery.technet.microsoft.com/scriptcenter/Remove-Storage-Blobs-that-aae4b761.
As far as I know, Azure Blob storage is an appropriate approach for storing temporary files. For your scenario, there is no built-in option to delete old files, so you need to delete your temporary files programmatically or manually.
As a simple approach, you could upload your blobs (files) with a date-based naming format (e.g. https://<your-storagename>.blob.core.windows.net/containerName/2016-11/fileName or https://<your-storagename>.blob.core.windows.net/2016-11/fileName), then manually manage your files via Microsoft Azure Storage Explorer.
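For illustration, a small sketch of uploading with such a month-based prefix, assuming the classic Microsoft.WindowsAzure.Storage client library (the connection string, container name and local path are placeholders):
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var container = CloudStorageAccount.Parse("<storage-connection-string>")
    .CreateCloudBlobClient()
    .GetContainerReference("containerName");

// Prefix the blob name with the upload month so a whole month of files is easy to find and delete later.
string blobName = DateTime.UtcNow.ToString("yyyy-MM") + "/tempfile.txt";
CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
using (var stream = File.OpenRead(@"C:\temp\tempfile.txt"))
{
    blob.UploadFromStream(stream);
}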
Also, you could check your files and delete the old ones before uploading a new temporary file. For more details, you could follow storage-blob-dotnet-store-temp-files and override the method CleanStorageIfReachLimit to implement your logic for deleting blobs (files).
Additionally, you could leverage a scheduled Azure WebJob to clean up your blobs (files).
You can use Azure Cool Blob Storage.
It is cheaper than the Hot tier of Blob storage and is more suitable for archives.
You can store your less frequently accessed data in the Cool access tier at a low storage cost (as low as $0.01 per GB in some regions), and your more frequently accessed data in the Hot access tier at a lower access cost.
Here is a document that explains its features:
https://azure.microsoft.com/en-us/blog/introducing-azure-cool-storage/

Azure Data Factory Only Retrieve New Blob files from Blob Storage

I am currently copying blob files from Azure Blob storage to an Azure SQL Database. The copy is scheduled to run every 15 minutes, but each time it runs it re-imports all blob files. I would rather configure it so that it only imports files that have newly arrived in Blob storage. One thing to note is that the files do not have a date-time stamp. All files are present in a single blob container, and new files are added to the same container. Do you know how to configure this?
I'd preface this answer by saying that a change in your approach may be warranted...
Given what you've described, you're fairly limited on options. One approach is to have your scheduled job maintain knowledge of what it has already stored in the SQL database. You loop over all the items within the container and check whether each one has been processed yet.
The container has a ListBlobs method that would work for this. Reference: https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/
// Flat-list every blob in the container ('container' is a CloudBlobContainer).
foreach (var item in container.ListBlobs(null, true))
{
    // Check whether this blob has already been processed (e.g. look its name
    // up in a tracking table) before importing it into SQL.
}
Note that the number of blobs in the container may be an issue with this approach. If it is too large, consider creating a new container per hour/day/week/etc. to hold the blobs, assuming you can control this.
Please use CloudBlobContainer.ListBlobs(null, true, BlobListingDetails.Metadata) and check CloudBlob.Properties.LastModified for each listed blob.
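For example, a hedged sketch of that check, building on the container variable from the snippet above; lastRunTime stands in for however your job records the time of its previous run, and ImportBlobToSql is a hypothetical placeholder for your existing import logic:
// Only pick up blobs modified since the previous run of the job.
DateTimeOffset lastRunTime = GetLastRunTime(); // hypothetical helper returning the previous run's timestamp
foreach (var blob in container.ListBlobs(null, true, BlobListingDetails.Metadata).OfType<CloudBlob>())
{
    if (blob.Properties.LastModified > lastRunTime)
    {
        ImportBlobToSql(blob); // hypothetical: your existing blob-to-SQL copy logic
    }
}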
Instead of a copy activity, I would use a custom DotNet activity within Azure Data Factory and use the Blob Storage API (some of the answers here have described the use of this API) and Azure SQL API to perform your copy of only the new files.
However, over time your blob location will accumulate a lot of files, so expect your job to take longer and longer (eventually longer than 15 minutes) as it iterates through every file on each run.
Can you explain your scenario further? Is there a reason you want to add data to the SQL tables every 15 minutes? Can you increase that to copy data every hour? Also, how is this data getting into Blob Storage? Is another Azure service putting it there or is it an external application? If it is another service, consider moving it straight into Azure SQL and cut out the Blob Storage.
Another suggestion would be to create folders for the 15-minute intervals, named hhmm. For example, a sample folder would be called '0515'. You could even have parent folders for the year, month and day. This way you can write the data into these folders in Blob Storage, and Data Factory is capable of reading date and time folders and identifying new files that arrive in them.
I hope this helps! If you can provide some more information about your problem, I'd be happy to help you further.

Is copying blob within the same Azure storage account instant?

I'm using StartCopyFromBlob to copy a 2 GB blob from container A to container B within the same storage account. I noticed that it's an instant operation, as the CopyState status is Success right away. This is very good for us, so I want to confirm that we can actually rely on this.
I can't find any MSDN documentation about this copy optimization when copying within the same storage account. Is there a document on this copy behavior within the same account? I just want to make sure it is officially supported.
Only storage accounts created on or after June 7th, 2012 allow the Copy Blob operation to copy from another storage account. http://msdn.microsoft.com/en-us/library/windowsazure/dd894037.aspx
You might find this post interesting: Introducing Asynchronous Cross-Account Copy Blob http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-asynchronous-cross-account-copy-blob.aspx
I hope this helps; let me know if you need anything else.

How to convert an existing Block Blob to a Page Blob

I used CloudBerry Explorer to copy the VM (IaaS) disk file to another storage account.
But when the copy finished, I found that the newly created blob is a block blob, not a page blob.
The tool didn't preserve the source blob type, which is page blob.
Is there any way to convert a block blob to a page blob? Thanks
No. Once a blob is created/uploaded you can't change the blob type; unfortunately, you would need to recreate/re-upload the blob. However, I'm somewhat surprised. You mentioned that you copied the blob from one storage account to another, and the Copy Blob operation within Windows Azure (i.e. from one storage account to another) preserves the source blob type, so this may be a bug in CloudBerry Explorer. I wrote a blog post some days ago about moving virtual machines from one subscription to another (http://gauravmantri.com/2012/07/04/how-to-move-windows-azure-virtual-machines-from-one-subscription-to-another/) and it has some sample code and other useful information for copying blobs across storage accounts. You may want to take a look at that. HTH.
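As a rough sketch of that type-preserving copy (not the code from the blog post; sourceBlobSasUrl and destinationContainer are placeholder names, and this uses the older StartCopyFromBlob API, which newer versions of the client library call StartCopy):
// A server-side copy preserves the source blob type, so a page blob stays a page blob.
CloudPageBlob destinationDisk = destinationContainer.GetPageBlobReference("my-vm-disk.vhd");
// For a private source blob in another account, pass a SAS-authenticated URI of the source page blob.
destinationDisk.StartCopyFromBlob(new Uri(sourceBlobSasUrl));
// The copy runs asynchronously; poll destinationDisk.CopyState until its Status is Success.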
It has been a while since the original question was asked, but it seems that the solution I used is not well known, or at least not widely used.
In Azure Storage you cannot change the blob type of an existing file. Some people recommend downloading the files and uploading them again, but you can also use azcopy from the Cloud Shell in the Azure portal. The azcopy utility is available at least in the PowerShell shell; I haven't tried it in bash.
You need two SAS URLs with adequate permissions to read from the original container and to write to the destination. You also need the List permission. With those in hand, open the Cloud Shell and run the command.
azcopy copy 'https://<source-storage-account-name>.blob.core.windows.net/<source-container-name>?<SAS-token>' 'https://<dest-storage-account-name>.blob.core.windows.net/<dest-container-name>?<SAS-token>' --recursive --blob-type=BlockBlob
After copying, just delete the old page blobs.
More options for the azcopy copy command can be found in the documentation.
