We have 5 vendors that are SFTPing files to Blob Storage. When the files come in, I need to copy them to another container and create a folder in that container named with the date to put the files in. From the second container, I need to copy the files to a file share on an Azure server. What is the best way to go about this?
I'm very new to Azure and unsure what the best way is to accomplish what I am being asked to do. Any help would be greatly appreciated.
I'd recommend using Azure Synapse for this task. It will let you move data to and from different storage securely and with little-to-no code.
Specifically, I'd put a blob storage trigger on the SFTP blob container so that the Synapse pipeline that moves the data runs automatically when your vendors drop their files.
Note that when you look for documentation on how to do things in Synapse, most of the time the Azure Data Factory documentation will also be applicable, since most of Data Factory's functionality is now in Synapse.
The ADF and Synapse YouTube channels are excellent resources, as well as the Microsoft Learn courses on Data Engineering.
I need to copy them to another container and create a folder in that container named with the date to put the files in.
You can use AzCopy to copy files to another container by using a SAS token.
Command:
azcopy copy 'https://<storage account>.blob.core.windows.net/test/files?SAS' 'https://<storage account>.blob.core.windows.net/mycontainer/12-01-2023?SAS' --recursive
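Blob containers are flat, so the date "folder" is just a prefix in the destination path and appears as soon as something is copied into it. To avoid typing the date by hand, the destination can be computed at run time. Here is a minimal Python sketch, not a definitive implementation: the helper names `dated_destination`, `build_azcopy_command` and `run_copy` are hypothetical, and the day-month-year format is only an assumption based on the `12-01-2023` folder in the command above.

```python
import subprocess
from datetime import date


def dated_destination(account, container, sas, day=None):
    """Return a destination URL whose virtual folder is named after the date."""
    day = day or date.today()
    folder = day.strftime("%d-%m-%Y")  # assumed format, matching 12-01-2023
    return f"https://{account}.blob.core.windows.net/{container}/{folder}?{sas}"


def build_azcopy_command(source_url, destination_url):
    """Build the argument list for the same 'azcopy copy ... --recursive' call."""
    return ["azcopy", "copy", source_url, destination_url, "--recursive"]


def run_copy(source_url, destination_url):
    """Run AzCopy; requires the azcopy executable to be on the PATH."""
    subprocess.run(build_azcopy_command(source_url, destination_url), check=True)
```

Calling `run_copy(source_url, dated_destination(account, "mycontainer", sas))` would then copy into a folder named after today's date.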
I need to copy the files to a file share on an Azure server
You can also copy the files from the container to a file share by using AzCopy.
Command:
azcopy copy 'https://<storage account>.blob.core.windows.net/test?SAS' 'https://<storage account>.file.core.windows.net/fileshare/12-01-2023?SAS' --recursive
You can get the SAS token through the portal:
Go to the portal -> your storage account -> Shared access signature -> check the resource types -> click Generate SAS and connection string.
AzCopy is probably a good way to move all or part of the blobs from one container to another, but I would suggest automating it with Azure Functions. I think it can be automated by triggering an Azure Function every time a blob or set of blobs (Azure could process a batch of blobs) is uploaded to the source container.
A note on Azure Functions: depending on the number of blobs to be moved and the time that could take, Durable Functions may be a better solution for avoiding timeout exceptions. A durable function returns an immediate response but keeps running in the background.
Consider this article to have a better approach to this solution:
https://build5nines.com/azure-functions-copy-blob-between-azure-storage-accounts-in-c/
In my azure subscription I have a storage account with a lot of tables that contains important data.
As far as I know azure offers a backup point-in-time for the storages and blobs, and geo redundancy in event of a failover. But I couldn't find anything regarding the backup of table storages.
The only way to do so seems to be AzCopy, which is fine and logical, but I couldn't make it work because I had some permission issues even after I assigned the Azure Blob Data Contributor role on my container.
So as an option, I was wondering whether there is a way to implement this in Python code that loops through all the tables in a specific container and makes a copy into another container.
Can anyone enlighten me on this matter please?
Did you set the Azure Storage firewall to allow access from all networks?
Python code is one way, but we can't design the code for you, and there isn't a ready-made example. That kind of request doesn't meet Stack Overflow's guidelines.
If you still can't get it working with AzCopy, I would suggest using Data Factory to schedule a backup of the data from Table storage to another container.
1) Create a pipeline with a Copy activity to copy the data from Table storage. See this tutorial: Copy data to and from Azure Table storage by using Azure Data Factory.
2) Create a schedule trigger for the pipeline to make the jobs automatic.
If the Table storage has many tables, the easiest way is using Copy Data Tool.
Update:
Copy data tool source settings:
Sink settings: auto create the table in sink table storage
HTH.
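For readers who still want the Python route the question asks about, here is a minimal sketch of the loop rather than a tested backup tool. Tables belong to the storage account's Table service, not to a blob container, so the sketch copies from one Table service to another. The function name `copy_all_tables` is hypothetical. It assumes both arguments behave like `TableServiceClient` from the `azure-data-tables` package and uses only `list_tables`, `get_table_client`, `create_table_if_not_exists`, `list_entities` and `upsert_entity`.

```python
def copy_all_tables(source_service, dest_service):
    """Copy every table and its entities from one Table service to another.

    Both arguments are assumed to behave like
    azure.data.tables.TableServiceClient. Returns a dict that maps each
    table name to the number of entities copied.
    """
    copied = {}
    for table in source_service.list_tables():
        source_table = source_service.get_table_client(table.name)
        # Create the backup table on first run, reuse it afterwards.
        dest_table = dest_service.create_table_if_not_exists(table.name)
        count = 0
        for entity in source_table.list_entities():
            dest_table.upsert_entity(entity)  # insert or update by keys
            count += 1
        copied[table.name] = count
    return copied
```

With the package installed, the two clients would typically be created with `TableServiceClient.from_connection_string(...)`, one per storage account. Very large tables would probably want batching, which this sketch leaves out.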
I am updating a system that had all of its files stored inside SQL Server.
It's going from an on-prem server to an Azure web app.
My questions are:
I think I should be using a storage blob for these files. Is that correct or is there a better option inside of Azure that I should be using?
Is there a quick way to migrate files from sql to that blob?
For storage purposes, do I write the file to the blob and then store the hyperlink to that file?
The staging environment gets updated with the latest data from production when they do a release, is there a way to migrate storage blob to a different resource group for when they do this?
Yes, I would use blob.
The quickest way would be a short PowerShell or CLI script, or a console app, that pulls the files from the database and uploads them to blob storage.
I don't store the entire hyperlink to the file in the database, just the path. That way the storage account and container can be environment configurations.
I would recommend against doing this... we've found since we started doing automated continuous deployment, we haven't had a reason to move backwards, which has eliminated a lot of effort. That being said, AzCopy is a utility that allows you to do server-side copy of blobs between storage accounts (along with many other types of source and destination if needed). That should do what you need.
To answer your questions:
I think I should be using a storage blob for these files. Is that
correct or is there a better option inside of Azure that I should be
using?
That's correct. Blob storage is meant for exactly this purpose.
Is there a quick way to migrate files from sql to that blob?
I'm not aware of any automated way to do that. What you would need to do is read the binary data from SQL Database and then create a stream out of it and upload that stream. You can use Azure Storage SDK for uploading purpose.
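The read-then-upload loop described above could look roughly like the following Python sketch. It is written under assumptions: the table `Files` and its columns `FileName` and `Content` are hypothetical, `cursor` is any DB-API cursor (for example one from `pyodbc`), and `container_client` is assumed to behave like `ContainerClient` from the `azure-storage-blob` package, of which only `upload_blob` is used.

```python
import io

# Hypothetical schema: one row per file, binary content in a single column.
QUERY = "SELECT FileName, Content FROM Files"


def migrate_files(cursor, container_client):
    """Upload each (name, content) row as a blob; return the uploaded names."""
    cursor.execute(QUERY)
    uploaded = []
    while True:
        row = cursor.fetchone()  # one row at a time keeps memory use low
        if row is None:
            break
        file_name, content = row
        stream = io.BytesIO(bytes(content))  # wrap the binary data in a stream
        container_client.upload_blob(name=file_name, data=stream, overwrite=True)
        uploaded.append(file_name)
    return uploaded
```

The returned names are what you would then store back in the database in place of the binary column.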
For storage purposes, do I write the file to the blob and then store
the hyperlink to that file?
Under normal circumstances that is the recommended approach. However, considering that you need to create a staging environment that is a copy of the production environment (including the database, I am assuming), I would recommend you store two things in your database: the blob container name and the blob name (or you could store a relative URL, e.g. <container-name>/<blob-name>). Assuming you keep the storage account name somewhere in the configuration file, you can create the URL dynamically using the https://<account-name>.blob.core.windows.net/<container-name>/<blob-name> pattern.
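As a small illustration of that pattern, the following Python sketch joins a configured account name with the relative path stored in the database. The function name `blob_url` and the sample values are hypothetical.

```python
from urllib.parse import quote


def blob_url(account_name, relative_path):
    """Join the configured account with a stored '<container-name>/<blob-name>'."""
    # quote() keeps '/' as-is and percent-encodes characters such as spaces.
    return f"https://{account_name}.blob.core.windows.net/{quote(relative_path)}"


print(blob_url("prodaccount", "uploads/2023/report.pdf"))
print(blob_url("stagingaccount", "uploads/2023/report.pdf"))
```

Because only the account name changes between environments, the same database rows resolve correctly in both production and staging.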
The staging environment gets updated with the latest data from
production when they do a release, is there a way to migrate storage
blob to a different resource group for when they do this?
Azure Storage provides Copy Blob functionality, which you can use to copy blobs from one blob container to another in the same or a different storage account. You can use that to copy data from the production environment to the staging environment.
I'm currently working on a project for one of our managed services clients.
We're looking to take data out of their blob store (a) and move it into another blob store (b) using AzCopy.
My question is: will blob store (b) update from blob store (a) when new data arrives, or will we have to do a full copy each time we want to move new data across?
It seems like a silly question, but I couldn't find the answer online.
Thanks in advance!
AzCopy is just a command-line tool that lets you copy blob x or container y from storage account A to storage account B; it's not doing anything special. If a blob already exists, it will give you the option to skip it or to overwrite it, like any normal copy operation.
What it copies comes down to the script that triggers AzCopy: what are you telling it to copy?
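If the script should move only new data, the decision it has to make can be sketched as below. This is an illustration of the comparison only, with a hypothetical function name and plain dictionaries standing in for the two container listings. AzCopy also has a `sync` subcommand and an `--overwrite=ifSourceNewer` option on `copy` that are aimed at this case, so check those before writing your own logic.

```python
def blobs_to_copy(source, destination):
    """Return names that are missing from destination or newer in source.

    Both arguments map blob name -> last-modified timestamp.
    """
    return sorted(
        name
        for name, modified in source.items()
        if name not in destination or modified > destination[name]
    )
```

Anything not returned by `blobs_to_copy` is already up to date in store (b) and can be skipped.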
You also might want to look at Azure Data Factory for doing blob to blob copies.
I want to write the output of pipeline to an FTP folder. ADF seems to support on-premises file but not FTP folder.
How can I write the output in text format to an FTP folder?
Unfortunately FTP Servers are not a supported data store for ADF as of right now. Therefore there is no OOTB way to interact with an FTP Server for either reading or writing.
However, you can use a custom activity to make it possible, but it will require some custom development to make this happen. A fellow Cloud Solution Architect within MS put together a blog post that talks about how he did it for one of his customers. Please take a look at the following:
https://blogs.msdn.microsoft.com/cloud_solution_architect/2016/07/02/creating-ftp-data-movement-activity-for-azure-data-factory-pipeline/
I hope that this helps.
Upon thinking about it, you might be able to achieve what you want in a mildly convoluted way by writing the output to an Azure Blob storage account and then either
1) manually: downloading and pushing the file to the "FTP" site from the Blob storage account or
2) automatically: using Azure CLI to pull the file locally and then push it to the "FTP" site with a batch or shell script as appropriate
As a lighter-weight alternative to custom activities (which are certainly the better option for heavy work), you may wish to consider using Azure Functions to write to FTP. Note that there is a timeout on the Consumption plan, not on other plans, so it will depend on how big the files are.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function
You could instruct Data Factory to write to an intermediary blob storage.
And use blob storage triggers in Azure Functions to upload the files as soon as they appear in blob storage.
Or alternatively, write to blob storage and then use a timer in Logic Apps to upload from blob storage to FTP. Logic Apps hide a tremendous amount of power behind their friendly exterior.
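Whichever trigger you pick, the upload step itself is small. Here is a minimal Python sketch using the standard library's `ftplib`. The helper names `open_ftp` and `push_to_ftp` are hypothetical, and the host, credentials and blob content are assumed to come from your function's configuration and trigger.

```python
import io
from ftplib import FTP


def open_ftp(host, user, password):
    """Open and authenticate an FTP connection."""
    ftp = FTP(host)  # connects immediately when a host is given
    ftp.login(user, password)
    return ftp


def push_to_ftp(ftp, remote_name, payload):
    """Upload bytes to an open ftplib.FTP-style connection as remote_name."""
    ftp.storbinary(f"STOR {remote_name}", io.BytesIO(payload))
```

In a blob-triggered function, `payload` would be the content of the blob that fired the trigger and `remote_name` its file name.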
You can write a Logic app that will pick your file up from Azure storage and send it to an FTP site. Then call the Logic App using a Data Factory Web Activity.
Make sure you do some error handling in your Logic app to return 400 if the ftp fails.
I used CloudBerry Explorer to copy a VM (IaaS) disk file to another storage account.
But when the copy finished, I found that the newly created blob was a Block Blob, not a Page Blob.
The tool didn't preserve the source blob type, which is Page Blob.
Is there any way to convert a Block Blob to a Page Blob? Thanks
No. Once a blob is created/uploaded you can't change the blob type. Unfortunately you would need to recreate/re-upload the blob. However, I'm somewhat surprised. You mentioned that you copied the blob from one storage account to another. The Copy Blob operation within Windows Azure (i.e. from one storage account to another) preserves the source blob type. It may be a bug in CloudBerry Explorer. I wrote a blog post some days ago about moving virtual machines from one subscription to another (http://gauravmantri.com/2012/07/04/how-to-move-windows-azure-virtual-machines-from-one-subscription-to-another/) and it has some sample code and other useful information for copying blobs across storage accounts. You may want to take a look at that. HTH.
It has been a while since the original question, but it seems that the solution I used is not well known, or at least is not being used.
In Azure Storage you cannot change the blob type of an existing file. Some people recommend downloading the files and uploading them again. But you can also use azcopy from the Cloud Shell in the Azure portal. At least in PowerShell the azcopy utility is available. I haven't tried it in bash.
You need two SAS URLs with adequate permissions to read from the original container and to write to the destination. You also need the LIST permission. With those in hand, open the Cloud Shell and run the command.
azcopy copy 'https://<source-storage-account-name>.blob.core.windows.net/<source-container-name>?<SAS-token>' 'https://<dest-storage-account-name>.blob.core.windows.net/<dest-container-name>?<SAS-token>' --recursive --blob-type=BlockBlob
After copying, just delete the old page blobs.
More options for azcopy copy command can be found in the documentation.