I am trying to copy data from one Azure Blob location to another using the Hadoop distcp command (running this in Spark Scala). Users will query the data from the destination location, so if they query it while the copy is in progress they may get duplicate data. Can I acquire a lock on the Azure destination location? Is there a fast/best way to copy the data transactionally?
I recommend you use AzCopy.
1. Copy a single blob from one container to another within the same storage account
AzCopy /Source:https://myaccount.blob.core.windows.net/mycontainer1 /Dest:https://myaccount.blob.core.windows.net/mycontainer2 /SourceKey:key /DestKey:key /Pattern:abc.txt
2. Copy a single blob from one storage account to another
AzCopy /Source:https://sourceaccount.blob.core.windows.net/mycontainer1 /Dest:https://destaccount.blob.core.windows.net/mycontainer2 /SourceKey:key1 /DestKey:key2 /Pattern:abc.txt
3. Copy all blobs in a container to another storage account
AzCopy /Source:https://sourceaccount.blob.core.windows.net/mycontainer1 /Dest:https://destaccount.blob.core.windows.net/mycontainer2 /SourceKey:key1 /DestKey:key2 /S
For more details, you could refer to this article.
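These examples use the older AzCopy v8 syntax. If you are on the current AzCopy v10, a rough equivalent of the container-to-container copy is the following (the SAS tokens appended to both URLs are placeholders you would generate yourself):
azcopy copy "https://sourceaccount.blob.core.windows.net/mycontainer1?<source-sas>" "https://destaccount.blob.core.windows.net/mycontainer2?<dest-sas>" --recursive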
I have a non-empty directory in Azure Blob Storage, but the Azure CLI says that it does not exist.
az storage blob directory exists -c jenkinsworkspaces -d "uild-pr-new_ecom-lora-ng_PR-5593" --connection-string="XXX" -o json
This command is implicitly deprecated because command group 'storage blob directory'
is deprecated and will be removed in a future release. Use 'az storage fs directory'
instead.
{
"exists": false
}
I don't understand why. Note that the result is the same with az storage fs directory exists.
I believe you are getting false back because your storage account is a regular storage account and does not have hierarchical namespace enabled, i.e. your storage account is not a Data Lake Gen2 account.
In regular storage accounts, folders are virtual; they are only real folders in Data Lake Gen2 accounts. All the directory-related commands work only with Data Lake Gen2 accounts. The documentation unfortunately does not mention this.
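One way to check which case you are in is to query the isHnsEnabled property of the account with the Azure CLI (the account name below is a placeholder); it returns true only for Data Lake Gen2 accounts:
az storage account show --name mystorageaccount --query isHnsEnabled -o tsv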
I am writing an AzCopy script that captures the system log of a Linux (Ubuntu 18.04) machine and stores it in a storage account container, and I am doing all of these steps with Terraform automation: I have written the code that creates the machine, and I integrate the shell script file through the Terraform extension.
The issue is that when AzCopy copies the file from the system and pushes it to the storage account, it needs azcopy login to authenticate, but that interactive step can't be performed through automation.
I am using the following AzCopy command (version v10); please help me with this.
AzCopy /Source:/var/log/syslog /Dest:https://testingwt.blob.core.windows.net/insights-operational-logs/ /SourceKey:y/bUACOu/wogikUT1EG0XeaPC4Y6spHcZly2d26QeENKwMiRpjFu5PwmXrThRbNGS3PiPfqEX8WsYC3dg== /S
Updated error from AzCopy on the Linux machine in Azure:
To upload files to Blob Storage from a shell script without interactive authentication, you can use a SAS token for the storage account, or use azcopy login with a service principal or the VM managed identity.
For the SAS token:
azcopy copy "/path/to/file" "https://account.blob.core.windows.net/mycontainer1/?sv=2018-03-28&ss=bjqt&srt=sco&sp=rwddgcup&se=2019-05-01T05:01:17Z&st=2019-04-30T21:01:17Z&spr=https&sig=MGCXiyEzbtttkr3ewJIh2AR8KrghSy1DGM9ovN734bQF4%3D" --recursive=true
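If you don't already have a SAS token, one way to generate a short-lived one for the target container is with the Azure CLI (the account name, key, container and expiry below are placeholders); append the returned token to the container URL after a ?:
az storage container generate-sas --account-name account --account-key <account-key> --name mycontainer1 --permissions acw --expiry 2019-05-01T00:00Z -o tsv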
For the service principal, you need to set the environment variable AZCOPY_SPA_CLIENT_SECRET to the secret of the service principal, and assign the service principal the Storage Blob Data Contributor or Storage Blob Data Owner role on the storage account:
azcopy login --service-principal --application-id <application-id> --tenant-id=<tenant-id>
azcopy copy "/path/to/file" "https://account.blob.core.windows.net/mycontainer1/" --recursive=true
For the VM managed identity, you also need to assign the VM's managed identity the Storage Blob Data Contributor or Storage Blob Data Owner role on the storage account:
azcopy login --identity
azcopy copy "/path/to/file" "https://account.blob.core.windows.net/mycontainer1/" --recursive=true
But when you use the VM managed identity, you have to execute the shell script inside the Azure VM, which means your Terraform deployment has to run the script on that VM. So the best way is to use a service principal: you can then execute the shell script on any other Linux machine, for example your local one. The SAS token is also a good option and does not require assigning a role. For more details, see the documentation on using AzCopy with Azure Blob storage.
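Putting the service principal option together, a minimal sketch of a non-interactive script that Terraform could push to the VM and run (the application ID, tenant ID and client secret are placeholders) looks like this:
#!/bin/bash
# Authenticate AzCopy with a service principal; the secret is read from this environment variable.
export AZCOPY_SPA_CLIENT_SECRET="<client-secret>"
azcopy login --service-principal --application-id <application-id> --tenant-id=<tenant-id>
# Upload the system log to the target container (the service principal needs Storage Blob Data Contributor/Owner).
azcopy copy "/var/log/syslog" "https://testingwt.blob.core.windows.net/insights-operational-logs/"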
I'm new to this and am trying to do something which I think is relatively simple.
I download a file from a URL to my Azure VM using wget (it's a large file and I don't want to store it locally). I now want to copy this file to an existing container in blob storage. This is completely defeating me.
It's a single-line command in the AWS universe:
aws s3 sync <file_name> s3://<bucket name>
Is there an equivalent in Azure?
There are a bunch of ways you can accomplish this, and you don't even have to download this large file to your local computer first and then upload it to blob storage.
For example, you can use the az storage blob copy command, which is part of the Azure CLI tools. Here's a sample command:
az storage blob copy start --account-key <your-azure-storage-account-key> --account-name <your-azure-storage-account-name> --destination-blob <name-of-the-blob> --destination-container <name-of-the-container> --source-uri <uri-of-the-file>
You can also accomplish the same using the azcopy utility or the Azure PowerShell storage cmdlets; the cmdlet you would want to use is Start-AzStorageBlobCopy.
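If you have already downloaded the file onto the VM with wget, a direct upload with the Azure CLI is another option (account, key, container, blob name and file path below are placeholders):
az storage blob upload --account-name <your-azure-storage-account-name> --account-key <your-azure-storage-account-key> --container-name <name-of-the-container> --name <name-of-the-blob> --file /path/to/downloaded/file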
I am trying to copy all files from one container to another. I am using AzCopy to accomplish this task.
My AzCopy command is as below:
azcopy copy "https://xxxxxxx.blob.core.windows.net/customers" "https://xxxxxxx.blob.core.windows.net/archive" --recursive
Error:
Alternatively, is it possible to move files between containers?
Please follow this doc to grant your user account the Storage Blob Data Contributor RBAC role on your storage account or your containers.
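If you prefer to do this from the command line, the role assignment can also be made with the Azure CLI; a sketch (the object ID, subscription, resource group and account name are placeholders):
az role assignment create --assignee <your-user-object-id> --role "Storage Blob Data Contributor" --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account-name>
After the role is granted, run azcopy login before the copy command so AzCopy can use your Azure AD identity.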
Besides, there isn't a "move" operation for Azure Blob Storage; you need to delete the original container (or blobs) after copying them.
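A move can therefore be emulated as a copy followed by a delete; a rough sketch with AzCopy v10 (the SAS tokens are placeholders):
azcopy copy "https://xxxxxxx.blob.core.windows.net/customers?<sas>" "https://xxxxxxx.blob.core.windows.net/archive?<sas>" --recursive
azcopy remove "https://xxxxxxx.blob.core.windows.net/customers?<sas>" --recursive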
Suppose I do this operation between storage accounts:
AzCopy /Source:https://sourceaccount.blob.core.windows.net/mycontainer1 /Dest:https://destaccount.blob.core.windows.net/mycontainer2 /SourceKey:key1 /DestKey:key2 /Pattern:abc.txt
In mycontainer1 I have the permission "Blob ...", but in mycontainer2 the permission becomes "Private ...". Is there a way to prevent this from happening / force the same permission on the "new" container?
AzCopy doesn't support this. It's designed for transferring blobs/files; you need to reconfigure the permission of your destination container yourself.
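If you want the destination container to have the same "Blob" public access level as the source, one way is to set it afterwards with the Azure CLI (the account name and key reuse the placeholders from the question):
az storage container set-permission --name mycontainer2 --public-access blob --account-name destaccount --account-key key2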