Azure blob storage CLI: list directories

I'm writing a PowerShell script that uses the Azure Storage CLI. I'm looking for a way to list only the directories under a certain path. I can easily list blobs using az storage blob list but I'm hoping I can just get the directories without post-processing the result. Given that Azure blob storage has a flat file structure, I'm not really sure this is possible but thought someone might know something I don't.
Thanks.

You can use the --delimiter option to get closer to what you want -- top level folders (and blobs), but not nested blobs:
# suppress nested blobs
$ az storage blob list -c foo --delimiter '/' --output table
Name       Blob Type    Content Type
---------  -----------  -------------------------
dir1/
dir2/
dir3/
dir4/
file.txt   BlockBlob    text/plain; charset=utf-8
This suppresses all the blobs at deeper levels. From there, you can filter down to just the "folders", because the folder entries have no metadata while the blobs do. The folder entries also end in a /. The az CLI can filter that for you, too, if you provide the --query option:
az storage blob list -c foo --delimiter '/' --query '[].name | [?ends_with(@, `/`)]'
[
  "dir1/",
  "dir2/",
  "dir3/",
  "dir4/"
]
Edit:
Another flag, --auth-mode login, might be needed if you run into authentication issues when fetching resources:
az storage blob list -c foo --auth-mode login --delimiter '/' --query '[].name | [?ends_with(@, `/`)]'

As mentioned, Azure Blob Storage is flat. For a hierarchical filesystem, check out Azure File Storage.
Check out the quickstart for the CLI: https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-cli
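If you go the file-share route, a quick way to see the hierarchy from the CLI is az storage file list, which returns both files and directories under a path (the account, share, and path names below are placeholders; add your usual auth arguments):
# List files and directories under a given path in an Azure file share
az storage file list --account-name mystorageaccount -s myshare -p some/dir --output table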

I ended up deciding that this wasn't really an option since the file system is flat and the "paths" are just virtual. I decided to go a different direction with my script that didn't require batching based on these virtual directories.

Related

Is there a way to filter by tier in azure blob storage

I would like to list all the files stored in a particular tier. This is what I tried:
az storage fs file list \
--file-system 'cold-backup' \
--query "[?contains(properties.blobTier, 'Cold')==\`true\`].properties.blobTier"
But it doesn't work. I also tried with "blobTier" only. No luck.
This is the error I get:
Invalid jmespath query supplied for '--query': In function contains(), invalid type for value: None, expected one of: ['array', 'string'], received: "null"
The command az storage fs file list is for ADLS Gen2 file systems; there is no blobTier property in its output, so you cannot query on it. Also, the tier name should be Cool rather than Cold.
If you want to list files filtered by blobTier, you can use az storage blob list. It is intended for blob storage, but it also works against ADLS Gen2 file systems.
Sample:
az storage blob list --account-name '<storage-account-name>' --account-key 'xxxxxx' --container-name 'cold-backup' --query "[?properties.blobTier=='Cool']"
If you want to output the blobTier, use --query "[?properties.blobTier=='Cool'].properties.blobTier" instead in the command.
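For example, the same command with a different JMESPath projection shows both the blob name and its tier in one table:
az storage blob list --account-name '<storage-account-name>' --account-key 'xxxxxx' --container-name 'cold-backup' --query "[?properties.blobTier=='Cool'].{name:name, tier:properties.blobTier}" --output table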
The accepted answer works perfectly fine. However, if you have a lot of files, the results will be paginated. The CLI returns a NextMarker, which has to be passed to the subsequent call via the --marker parameter. With a huge number of files this has to be scripted, for example in PowerShell or bash; a rough sketch follows below. Also, az storage blob list makes --container-name mandatory, which means only one container can be queried at a time.
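A rough bash sketch of that paging loop, assuming a reasonably recent azure-cli where --num-results, --marker and --show-next-marker are available (check az storage blob list -h for your version) and jq is installed:
marker=""
while : ; do
  args=(-c 'cold-backup' --account-name '<storage-account-name>' --num-results 5000 --show-next-marker -o json)
  [ -n "$marker" ] && args+=(--marker "$marker")
  page=$(az storage blob list "${args[@]}")
  # Print names of blobs in the Cool tier from this page
  echo "$page" | jq -r '.[] | select(.properties.blobTier == "Cool") | .name'
  # Assumption: --show-next-marker appends a {"nextMarker": "..."} entry when more pages remain
  marker=$(echo "$page" | jq -r '.[] | .nextMarker? // empty' | tail -n 1)
  [ -z "$marker" ] && break
done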
Blob Inventory
I have a ton of files and many containers. I found an alternate method that worked best for me. Under Data management there is an option called Blob Inventory.
This will basically generate a report of all the blobs across all the containers in a storage account. The report can be customized to include the fields of your choice, for example: Name, Access Tier, Blob Type etc. There are also options to filter certain blobs (include and exclude filters).
The report will be generated in CSV or Parquet format and stored in the container of your choice at a daily or weekly frequency. The only downside is that the report can't be generated on-demand (only scheduled).
Further, if you wish to run SQL on the inventory report (the CSV/Parquet file), you can simply use DBeaver for that.

Azure container copy only changes

I would like to update static website assets from GitHub repos. The documentation suggests using an action based on
az storage blob upload-batch --account-name <STORAGE_ACCOUNT_NAME> -d '$web' -s .
If I see this correctly, this copies all files regardless of changes, even if only one file was altered. Is it possible to transfer only the files that have changed, like rsync does?
Otherwise I would try to determine the changed files from the git history and transfer only those. Please also answer if you know of an existing solution in this direction.
You can use azcopy sync to achieve that. That is a different tool, though.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-synchronize?toc=/azure/storage/blobs/toc.json
https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy-sync
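For the static-website case above, a minimal invocation might look like this (the ./dist path and the SAS token are placeholders; azcopy needs either a SAS or azcopy login for authorization):
# Incrementally upload the local build output to the $web container; --delete-destination
# also removes blobs that no longer exist locally (optional, rsync --delete style).
azcopy sync './dist' 'https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/$web?<SAS>' --recursive --delete-destination=true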
Based on the suggestion by 4c74356b41, I discovered that the mentioned tool was recently integrated into the az tool.
It can be used the same way as az storage blob upload-batch. The base command is:
az storage blob sync
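For example, roughly the sync equivalent of the upload-batch command from the question (authentication arguments as in your existing workflow):
az storage blob sync --account-name <STORAGE_ACCOUNT_NAME> -c '$web' -s .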

Azure Premium Storage - azure storage blob upload both block and page blobs fail

I'm attempting the daunting task of using Azure as a hosting platform (I do not recommend it; use AWS for production).
Since the platform has NO BACKUP SOLUTION for premium-storage-based VMs, I am attempting to back up the files in a more manual way by copying them to blob storage.
azure storage blob upload (key/file/container blah blah)
error: Block blobs are not supported.
Oh.. OK, hey, if premium storage requires page blobs I'll just use them (oh yeah, btw, we won't tell you at all how to do any of this)!
azure storage blob upload --blobtype page (key/file/container blah blah)
error: Page blob length must be multiple of 512.
What!? You mean I need to know the exact length of every file I upload? Wow Azure you sure are great!
I realize I can just use standard storage but I find this very annoying.
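If you do need page blobs, one workaround is to pad each file up to the next 512-byte boundary before uploading. A rough sketch for GNU/Linux (the file path is a placeholder, and padding modifies the file, so only do this on a copy):
f=/data/directoryiwanttobackup/somefile.bak   # placeholder path
size=$(stat -c%s "$f")                        # GNU stat: file size in bytes
pad=$(( (512 - size % 512) % 512 ))
[ "$pad" -gt 0 ] && truncate -s +"$pad" "$f"  # GNU truncate: grow file with zero bytes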
Oh yeah, and for anyone out there in need, this is one solution I used that works if you are offloading to standard storage:
The yes n | automatically answers "no" to every prompt to overwrite an existing blob with the same name; if your files change, remove the n so it answers "yes" and overwrites each time. You will need to install the Azure CLI (Linux) and set up credentials for it.
#!/bin/bash
# Answer "n" to every overwrite prompt; remove the n to overwrite changed files instead.
yes n | for i in $(find /data/directoryiwanttobackup/ -type f); do
  echo "$i"
  azure storage blob upload -a storageaccount -k "storageaccountkey" "$i" --container containername "$i"
done >> /var/log/mybackuplog$(date +"%m%d%Y").log

How to sync between two Azure storage (blobs) hosted on two different data centers

We are planning to deploy our Azure web application to two separate data centers (one located in West Europe and the other in Southeast Asia) purely for performance reasons. We allow users to upload files, which means we need to keep the blob storage of the two data centers in sync. I know Azure provides support for synchronizing structured data, but there seems to be no such support for blob synchronization. My question is:
Is there a service that provides blob synchronization between different data centers? If not, how can I implement one? I see many samples on the web for syncing between Azure blob storage and a local file system and vice versa, but not between data centers.
Is there a service that provides blob synchronization between different data centers?
No. Currently no such service exists out of the box that would synchronize content between two data centers.
if not, how can I implement one?
Although all the necessary infrastructure is available for you to implement this, the actual implementation would be tricky.
First you would need to decide whether you want real-time synchronization or whether batched synchronization would do.
For real-time synchronization you could rely on Async Copy Blob. Using async copy blob, you can instruct the storage service to copy a blob from one storage account to another instead of manually downloading the blob from the source and uploading it to the target. Assuming all uploads happen from your application, as soon as a blob is uploaded you know which data center it was uploaded to. What you could do is create a SAS URL for this blob and initiate an async copy to the other data center.
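As a rough CLI illustration of that flow (the account, container and blob names are placeholders; az storage blob copy start is the CLI front end for the async copy operation the answer refers to):
# Generate a short-lived read SAS on the source blob, then ask the destination
# account's blob service to pull it asynchronously.
end=$(date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ')
sas=$(az storage blob generate-sas --account-name westeuacct --account-key '<src-key>' \
      -c uploads -n photo.jpg --permissions r --expiry "$end" -o tsv)
az storage blob copy start --account-name seasiaacct --account-key '<dst-key>' \
      --destination-container uploads --destination-blob photo.jpg \
      --source-uri "https://westeuacct.blob.core.windows.net/uploads/photo.jpg?$sas"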
For batched synchronization, you would need to query both storage accounts and list the blobs in each blob container. If a blob is available in just one storage account and not the other, you can simply create the blob in the destination storage account by initiating an async copy blob. Things become trickier if a blob (by the same name) is present in both storage accounts. In this case you would need to define some rules (like comparing the last-modified date) to decide whether the blob should be copied from the source to the destination storage account.
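A quick sketch of that listing/diff step with the CLI (placeholder names, auth arguments omitted, and note that very large containers will need the paging described elsewhere on this page):
az storage blob list --account-name westeuacct -c uploads --query '[].name' -o tsv | sort > src.txt
az storage blob list --account-name seasiaacct -c uploads --query '[].name' -o tsv | sort > dst.txt
comm -23 src.txt dst.txt    # blob names present only in the source; candidates for async copy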
For scheduling the batch synchronization, you could make use of the Windows Azure Scheduler Service. Even with this service, you would need to write the synchronization logic yourself; the Scheduler Service only takes care of the scheduling part, not the actual synchronization.
I would recommend making use of a worker role to implement the synchronization logic. Another alternative is WebJobs, which were announced recently, though I don't know much about them.
If your goals are just about performance and the content is public, use Azure CDN for this. Point it at your primary blob storage container and it will copy the files around the world for the best performance.
I know this is an old question and much has changed since it was asked. I ended up on this page while searching for a similar task, so I thought I would add the latest from AzCopy v10, which has a sync option:
Synchronizes file systems to Azure Blob storage or vice versa.
Use azcopy sync. It is ideal for incremental copy scenarios.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10
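For the blob-to-blob case in this question, the sync can go straight between the two accounts (the account/container URLs and SAS tokens are placeholders; container-to-container sync requires a reasonably recent AzCopy v10):
azcopy sync 'https://westeuacct.blob.core.windows.net/uploads?<SAS>' \
            'https://seasiaacct.blob.core.windows.net/uploads?<SAS>' --recursive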
You can automate this task with PowerShell:
Download all Blobs (with Snapshots) from One Windows Azure Storage Account
http://gallery.technet.microsoft.com/scriptcenter/all-Blobs-with-Snapshots-3b184a79
Using PowerShell to move files to Azure Storage
http://www.scarydba.com/2013/06/03/using-powershell-to-move-files-to-azure-storage/
Copy all VHDs in Blob Storage from One Windows Azure Subscription to Another
http://gallery.technet.microsoft.com/scriptcenter/Copy-all-VHDs-in-Blog-829f316e
Old question I know, but the Azure Storage Data Movement library (Microsoft.Azure.Storage.DataMovement) is good for this.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-data-movement-library
Using Bash with the Azure CLI and AzCopy. The code is on GitHub, with an associated video on YouTube to get it working.
https://github.com/J0hnniemac/yt-blobsync
#!/bin/bash
cd /home
app_id=""
tenant=""
sourceurl="https://<>.blob.core.windows.net"
destinationurl="https://<>.blob.core.windows.net"
pemfile="/home/service-principal.pem"
sourceaccount=$(echo $sourceurl | awk -F/ '{print $3}' | awk -F. '{print $1}')
destinationaccount=$(echo $destinationurl | awk -F/ '{print $3}' | awk -F. '{print $1}')
echo $app_id
echo $tenant
echo $sourceurl
echo $destinationurl
echo $sourceaccount
echo $destinationaccount
az login --service-principal --password $pemfile --username $app_id --tenant $tenant
# list storage containers
az storage container list --auth-mode login --account-name $sourceaccount -o=table | awk 'NR>1 {print $1}' | grep networking-guru > src.txt
az storage container list --auth-mode login --account-name $destinationaccount -o=table | awk 'NR>1 {print $1}' | grep networking-guru > dst.txt
grep -vf dst.txt src.txt > diff.txt
for blob_container in $(cat diff.txt);
do
echo $blob_container;
newcmd="az storage container create --auth-mode login --account-name $destinationaccount -n $blob_container --fail-on-exist"
echo "---------------------------------"
echo $newcmd
eval $newcmd
done
echo "performing AZCOPY login"
azcopy login --service-principal --certificate-path $pemfile --application-id $app_id --tenant-id $tenant
echo "performing AZCOPY sync for each container"
for blob_container in $(cat src.txt);
do
# Create timestamp + 30 minutes for SAS token
end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
sourcesas=`az storage container generate-sas --account-name $sourceaccount --as-user --auth-mode login --name $blob_container --expiry $end --permissions acdlrw`
echo $sourcesas
# remove leading and trailing quotes from SAS Token
sourcesas=$(eval echo $sourcesas)
echo $sourcesas
src="$sourceurl/$blob_container?$sourcesas"
dst="$destinationurl/$blob_container"
echo $src
echo $dst
synccmd="azcopy sync \"$src\" \"$dst\" --recursive --delete-destination=true"
echo $synccmd
eval $synccmd
done

How to clean an Azure storage Blob container?

I just want to clean (dump, zap, del .) an Azure Blob container. How can I do that?
Note: the container is used for IIS logs from a running web role (wad-iis-logfiles).
A one liner using the Azure CLI 2.0:
az storage blob delete-batch --account-name <storage_account_name> --source <container_name>
Substitute <storage_account_name> and <container_name> by the appropriate values in your case.
You can see the help of the command by running:
az storage blob delete-batch -h
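If you only want to remove a subset of the blobs, delete-batch also accepts a glob via --pattern, and --dryrun lets you preview what would be deleted (the pattern here is just an example):
az storage blob delete-batch --account-name <storage_account_name> --source <container_name> --pattern '*.log' --dryrun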
There is only one way to bulk delete blobs and that is by deleting the entire container. As you've said there is a delay between deleting the container and when you can use that container name again.
Your only other choice is to delete them one at a time. If you can do the deleting from the same data centre where the blobs are stored, it will be faster than running the delete locally. This probably means writing code (or you could RDP into one of your instances and install a cloud explorer). If you're writing code then you can speed up the overall process by deleting the items in parallel. Something similar to this would work:
Parallel.ForEach(myCloudBlobClient.GetContainerReference(myContainerName).ListBlobs(), x => ((CloudBlob) x).Delete());
Update: Easier way to do it now (in 2018) is to use the Azure CLI. Check joanlofe's answer :)
Easiest way to do it in 2016 is using Microsoft Azure Storage Explorer IMO.
Download Azure Storage Explorer and install it
Sign in with the appropriate Microsoft Account
Browse to the container you want to empty
Click on the Select All button
Click on the Delete button
Try using the CloudBerry Explorer product for Windows Azure.
This is the link: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx
You can search the blob container for a specific extension, select multiple blobs, and delete them.
If you mean you want to delete a container, I would suggest checking http://msdn.microsoft.com/en-us/library/windowsazure/dd179408.aspx to see if the Delete Container operation (the container and any blobs contained within it are later deleted during garbage collection) could fulfill the requirement.
If you are interested in a CLI way, then the following piece of code will help you out:
for i in $(az storage blob list -c "Container-name" --account-name "Storage-account-name" --account-key "Storage-account-access-key" --query '[].name' -o tsv); do az storage blob delete --name "$i" -c "Container-name" --account-name "Storage-account-name" --account-key "Storage-account-access-key"; done
It first fetches the list of blob names in the container (--query '[].name' -o tsv avoids having to strip the table header with sed/awk) and then deletes them one by one.
If you are using a Spark (HDInsight) cluster that has access to that storage account, then you can use HDFS commands on the command line:
hdfs dfs -rm -r wasbs://container_name@account_name.blob.core.windows.net/path_goes_here
The real benefit is that the cluster is unlikely to go down, and if you have screen running on it, then you won't lose your session whilst you delete away.
For this case, the better option is to list the items found in the container and then delete each item individually. If you delete the container itself, you may hit a runtime error the next time you try to recreate it, since the deletion is not immediate.
You can use Cloud Combine to delete all the blobs in your Azure container.
