Download a list of specific files from Azure Blob

I'm having trouble downloading files from an Azure Blob container. It is less trivial than it may seem: I have seen many examples of downloading a single file, but none that cover downloading many files at once.
Issue definition:
I have an Azure Blob container that has ~30k files in it
At the same time, I've got a list of exact file names locally (around 300 files) that I want to download from that Azure Blob container (i.e., I need to download a whole bunch of files selectively, by their names)
I know that all these files exist in the given container. So, I need a way to iterate over the list of file names and download each one from the container.
What I tried:
I tried the 'azcopy copy' command. It works fine for copying one or a few files from the container to a local disk, but I found no way to pass a long list of file names as a parameter.
I searched for PowerShell examples that do the same, but had no luck.
Please advise.

Please try something like the following. It uses the Get-AzStorageBlobContent cmdlet.
The idea is to keep the names of the blobs you wish to download in an array, loop over that array, and call the cmdlet for each item.
$accountName = "account-name"
$accountKey = "account-key"
$containerName = "container-name"
$context = New-AzStorageContext -StorageAccountName $accountName -StorageAccountKey $accountKey
$destination = "C:\temp"
# Names of the blobs to download
$blobNames = @("blob1.txt", "blob2.txt", "blob3.txt", "blob4.txt")
For ($i=0; $i -lt $blobNames.Length; $i++) {
    $blob = $blobNames[$i]
    Write-Host "Downloading $blob. Please wait."
    Get-AzStorageBlobContent -Blob $blob -Container $containerName -Destination $destination -Context $context -Verbose
}

Have you tried Azure Storage Explorer?
I was able to download a whole folder from blob storage with it.
If the data in the blob container is in a folder, just right-click the folder > Download.
If the files are directly at the root of the container (not stored in a subfolder), you'll have to select all files with the "Select All > Select all files in cache" option and then click "Download".

$accountName = "account-name"
$accountKey = "account-key"
$containerName = "container-name"
$context = New-AzStorageContext -StorageAccountName $accountName -StorageAccountKey $accountKey
$destination = "C:\temp"
# @(...) keeps the result an array even when list.txt contains a single line
$blobNames = @(Get-Content "$destination\list.txt")
For ($i=0; $i -lt $blobNames.Length; $i++) {
    $blob = $blobNames[$i]
    Write-Host "Downloading $blob. Please wait."
    Get-AzStorageBlobContent -Blob $blob -Container $containerName -Destination $destination -Context $context -Verbose
}
It works well, provided that the text file listing the names of the blobs you need to download is located in the '$destination' directory (it could be any directory on your PC, though).
P.S. The file names just need to be listed in a single column, one name per line.
Thanks @Gaurav Mantri for the solution.
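A list file edited by hand often contains blank lines, stray whitespace, or duplicate names, and a single missing blob should not hide the other failures. The following is a minimal sketch of the same loop with those cases handled. The function names Get-CleanBlobNames and Save-BlobList are hypothetical helpers introduced here for illustration, not Az cmdlets; the only Az cmdlet used is Get-AzStorageBlobContent, as in the answers above.

```powershell
# Hypothetical helper: turn the raw lines of a list file into clean blob names.
function Get-CleanBlobNames {
    param([string[]]$Lines)
    # Trim whitespace (including stray carriage returns), drop blank lines, remove duplicates.
    @($Lines | ForEach-Object { $_.Trim() } | Where-Object { $_ -ne '' } | Select-Object -Unique)
}

# Hypothetical helper: download each named blob and return the names that failed.
function Save-BlobList {
    param(
        [string[]]$BlobNames,
        [string]$ContainerName,
        [string]$Destination,
        $Context
    )
    $failed = @()
    foreach ($blob in $BlobNames) {
        try {
            # -ErrorAction Stop turns a failed download into an exception we can catch.
            Get-AzStorageBlobContent -Blob $blob -Container $ContainerName `
                -Destination $Destination -Context $Context -Force -ErrorAction Stop | Out-Null
        }
        catch {
            Write-Warning "Could not download '$blob': $($_.Exception.Message)"
            $failed += $blob
        }
    }
    return $failed
}

# Usage sketch (assumes $context, $containerName and $destination are set as above):
# $names  = Get-CleanBlobNames -Lines (Get-Content "$destination\list.txt")
# $failed = @(Save-BlobList -BlobNames $names -ContainerName $containerName -Destination $destination -Context $context)
```

Collecting the failed names instead of stopping at the first error makes it easy to retry only those blobs afterwards.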

Related

How to read content from a CSV or text from ADLS Gen2 location using powershell without downloading that file?

I have a use case where I need to read some files from Blob storage, but I am only getting the metadata. How can I read the content without downloading the file and importing it?
Get-AzDataLakeGen2ItemContent
I agree with the discussion in the comments: there is no way to read the blob directly from blob storage. You need to download the blob locally and then read its contents using Import-CSV. Below is the complete code.
$context = New-AzStorageContext -StorageAccountName "<STORAGE_ACCOUNT_NAME>" -StorageAccountKey "<STORAGE_ACCOUNT_KEY>"
$blobName = "input.csv"
# Download the blob into the current directory under its blob name
Get-AzStorageBlobContent -Context $context -Container "container" -Blob $blobName -Force | Out-Null
Import-CSV -Path $blobName

Using Set-AzStorageBlobContent to upload only new content without prompts

I'm enumerating a local folder and uploading to Azure storage. I want to only upload new content to my Azure storage. If I use Set-AzStorageBlobContent with -Force, it'll overwrite everything. If I use it without -Force, it'll prompt on items that already exist. I can use Get-AzStorageBlob to check if the item already exists, but it prints red errors if the item does not exist. I can't find a combination of these items that gracefully uploads only new content without printing any errors or prompting. Am I using the wrong approach?
FINAL EDIT: adding working solution based on suggestions from Ivan Yang. Now only new files are uploaded, without any error messages. The key was to use -ErrorAction Stop to convert the error message into an exception, and then catch the exception.
# In my code this is part of a Test-Blob function that returns $blobFound
$blobFound = $false
try
{
    $blobInfo = Get-AzStorageBlob `
        -Container $containerName `
        -Context $storageContext `
        -Blob $blobPath `
        -ErrorAction Stop
    $blobFound = ($null -ne $blobInfo)
}
catch [Microsoft.WindowsAzure.Commands.Storage.Common.ResourceNotFoundException]
{
    # Eat the error that'd otherwise be printed
}
# Note in my code this is actually a call to my Test-Blob function
if ($false -eq $blobFound)
{
    Set-AzStorageBlobContent `
        -Container $containerName `
        -Context $storageContext `
        -File $sourcePath `
        -Blob $blobPath `
        -Force # -Force is unnecessary but just being paranoid to avoid prompts
}
I see you mentioned trying Get-AzStorageBlob, so why not keep using it?
The trick is to use try-catch-finally, which can properly handle the error raised when the blob does not exist in Azure.
The sample code below works on my side for uploading a single file, and you can modify it to upload multiple files:
$account_name = "xxx"
$account_key = "xxx"
$context = New-AzStorageContext -StorageAccountName $account_name -StorageAccountKey $account_key
# Use this flag to track whether the blob exists in Azure. Assume it exists at first.
$is_exist = $true
try
{
    Get-AzStorageBlob -Container test3 -Blob a.txt -Context $context -ErrorAction Stop
}
catch [Microsoft.WindowsAzure.Commands.Storage.Common.ResourceNotFoundException]
{
    # If the blob does not exist in Azure, do the following
    $is_exist = $false
    Write-Output "the blob DOES NOT exist."
}
finally
{
    # Only upload when the blob does not exist in Azure Blob storage.
    if(!$is_exist)
    {
        Set-AzStorageBlobContent -Container test3 -File "d:\myfolder\a.txt" -Blob a.txt -Context $context
        Write-Output "uploaded!"
    }
}
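For a whole folder, one existence check per file becomes slow. A possible alternative is to list the container once and upload only the names that are missing. This is a rough sketch under assumptions: Get-MissingBlobNames and Publish-NewFiles are hypothetical helper names, blob names are taken to be the file paths relative to the source folder with forward slashes, and the comparison is by name only (changed content is not detected).

```powershell
# Hypothetical helper: return the local names that are not present in the container.
function Get-MissingBlobNames {
    param([string[]]$LocalNames, [string[]]$ExistingNames)
    # HashSet gives case-sensitive lookups, matching how blob names are compared.
    $existing = New-Object 'System.Collections.Generic.HashSet[string]'
    foreach ($name in $ExistingNames) { [void]$existing.Add($name) }
    @($LocalNames | Where-Object { -not $existing.Contains($_) })
}

# Hypothetical helper: upload only the files whose blob name does not exist yet.
function Publish-NewFiles {
    param([string]$SourceFolder, [string]$ContainerName, $Context)
    $root = (Resolve-Path $SourceFolder).Path.TrimEnd('\', '/')
    # Blob name = path relative to the source folder, with forward slashes.
    $local = @(Get-ChildItem -Path $root -File -Recurse | ForEach-Object {
        $_.FullName.Substring($root.Length + 1).Replace('\', '/')
    })
    # One listing call instead of one existence check per file.
    $existing = @(Get-AzStorageBlob -Container $ContainerName -Context $Context |
        ForEach-Object { $_.Name })
    $missing = @(Get-MissingBlobNames -LocalNames $local -ExistingNames $existing)
    foreach ($name in $missing) {
        Set-AzStorageBlobContent -Container $ContainerName -Context $Context `
            -File (Join-Path $root $name) -Blob $name | Out-Null
    }
}

# Usage sketch (assumes $context was created with New-AzStorageContext):
# Publish-NewFiles -SourceFolder "d:\myfolder" -ContainerName "test3" -Context $context
```

Because nothing is requested for blobs that do not exist, no "not found" errors are printed, and no prompt appears as long as the container does not change between the listing and the upload.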
Not a PowerShell solution, but I would suggest that you take a look at AzCopy. It's like RoboCopy but for Azure storage: a command-line tool that lets you sync, copy, move, and more. It's free, works on macOS, Linux, and Windows, and it is fast!
I use AzCopy from PowerShell scripts and it makes life a lot easier (I'm managing millions of files, and the stability and speed of AzCopy really help).
This command is not smart enough to detect which files are new. You need to keep in the folder just the files you want to upload.
Simply use Set-AzStorageBlobContent -Force all the time.
The alternative is to check for existing file, download the file content, compare the files, and upload if different. The amount of processing/IO will only increase this way.

When opening my static website on Azure Storage I get a download screen

I'm hosting a ReactJS app on an Azure Storage account using the static website feature. Using this script:
$container = "`$web"
$context = New-AzureStorageContext -StorageAccountName $env:prSourceBranchName -StorageAccountKey "e4Nt0********dbFlEG2LN9g2i5/yQ=="
Get-ChildItem -Path $env:System_DefaultWorkingDirectory/_ClientWeb-Build-CI/ShellArtifact/out/build -File -Recurse | Set-AzureStorageBlobContent -Confirm:$false -Force -Container $container -Context $context
I'm uploading the files from my build to the $web container of the Azure storage account.
When I navigate to the URL of the static page I get a download screen:
When I remove all the files and upload a simple index.html file, it does load the index.html file:
https://gist.github.com/chrisvfritz/bc010e6ed25b802da7eb
EDIT
In Edge I can open the page but Chrome and Firefox load the download screen. But Firefox does show some more information:
So it looks like the content-type is a bit weird.
The root cause is most likely an incorrect content type.
As you mentioned, when you upload an .html file manually, its content type is "text/html". When you use Set-AzureStorageBlobContent, it becomes "application/octet-stream".
The solution is to specify the -Properties parameter when you use the PowerShell cmdlet Set-AzureStorageBlobContent, like this: -Properties @{"ContentType" = "text/html"}.
Sample code below (it works on my side):
$context = New-AzureStorageContext -StorageAccountName "xxx" -StorageAccountKey "xxxxx"
# Note that the trailing "\" is necessary
$path = "D:\temp\1\"
Get-ChildItem -Path $path -File -Recurse | `
    %{ if($_.extension -eq ".html") {Set-AzureStorageBlobContent -File $_.FullName -Blob $_.FullName.Replace($path,'') -Container "test-1" -Context $context -Properties @{"ContentType" = "text/html"}} `
       else {Set-AzureStorageBlobContent -File $_.FullName -Blob $_.FullName.Replace($path,'') -Container "test-1" -Context $context}}
The code above only changes the content type of files whose extension is .html to "text/html". Feel free to adapt the code to your needs.
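A React build also contains scripts, stylesheets, and images, which may need their own content types. One way to extend the idea is a small lookup table keyed by file extension. This is only a sketch: the table contents and the function name Get-ContentType are illustrative assumptions, and the list should be extended to cover whatever file types your build produces.

```powershell
# Illustrative extension-to-content-type table; extend it for your own build output.
$contentTypes = @{
    '.html' = 'text/html'
    '.css'  = 'text/css'
    '.js'   = 'application/javascript'
    '.json' = 'application/json'
    '.svg'  = 'image/svg+xml'
    '.png'  = 'image/png'
    '.ico'  = 'image/x-icon'
}

# Hypothetical helper: look up the content type, falling back to a generic binary type.
function Get-ContentType {
    param([string]$Extension)
    $key = $Extension.ToLowerInvariant()
    if ($contentTypes.ContainsKey($key)) { return $contentTypes[$key] }
    return 'application/octet-stream'
}

# Usage sketch (assumes $path and $context are defined as in the sample above):
# Get-ChildItem -Path $path -File -Recurse | ForEach-Object {
#     Set-AzureStorageBlobContent -File $_.FullName -Blob $_.FullName.Replace($path, '') `
#         -Container "test-1" -Context $context -Force `
#         -Properties @{ "ContentType" = (Get-ContentType $_.Extension) }
# }
```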

Azure Runbook Powershell move zip file from Blob Container to FileShare and Unzip the file once in destination

I am trying to move zip files from an Azure Blob container to an Azure file share. I want to add the option to unzip the file on the destination side. I am using PowerShell in an automation runbook. The following is a snippet from my function. I have not been able to figure out how to use the Expand-Archive cmdlet when all activity takes place in storage accounts (no local path such as c:\, etc.).
if ($blob.name.StartsWith($blobMatch)) {
    $destFile
    $blobName = $blob.name
    #Get-AzureStorageFileContent
    #Expand-Archive -Path $blob -DestinationPath $destFile
    Start-AzureStorageFileCopy `
        -SrcBlobName $blob.name `
        -SrcContainerName $sourceContainer `
        -DestShareName $destinationShare `
        -DestFilePath $destFile `
        -Context $ctx `
        -DestContext $ctx `
        -Force
}
I don't think you can unzip the file inside the storage file share: you can store zipped content there, but you can't make the storage service unzip it. The workaround is to download the file first and then unzip it.
Here is a similar issue, you could refer to it.
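The download-then-unzip workaround might look roughly like the sketch below. Several things here are assumptions rather than facts from the question: the function name Copy-ZipBlobToShare is hypothetical, the runbook sandbox is assumed to provide a writable temp folder through $env:TEMP, and only the top-level files of the archive are uploaded to the root of the share (nested folders would first have to be created in the share, for example with New-AzureStorageDirectory).

```powershell
# Hypothetical helper: download a zip blob to temp storage, expand it locally,
# and upload the extracted top-level files to the root of a file share.
function Copy-ZipBlobToShare {
    param(
        [string]$BlobName,
        [string]$SourceContainer,
        [string]$DestinationShare,
        $Context
    )
    # Assumption: $env:TEMP points at a writable folder in the runbook sandbox.
    $workDir    = Join-Path $env:TEMP ([System.Guid]::NewGuid().ToString())
    $zipPath    = Join-Path $workDir 'archive.zip'
    $extractDir = Join-Path $workDir 'extracted'
    New-Item -ItemType Directory -Path $extractDir -Force | Out-Null
    try {
        # Step 1: download the zip blob to the local working folder.
        Get-AzureStorageBlobContent -Container $SourceContainer -Blob $BlobName `
            -Destination $zipPath -Context $Context -Force | Out-Null
        # Step 2: unzip locally, since the storage service cannot do it.
        Expand-Archive -Path $zipPath -DestinationPath $extractDir -Force
        # Step 3: upload the extracted top-level files to the share root.
        Get-ChildItem -Path $extractDir -File | ForEach-Object {
            Set-AzureStorageFileContent -ShareName $DestinationShare -Source $_.FullName `
                -Path $_.Name -Context $Context -Force
        }
    }
    finally {
        # Always clean up the temporary working folder.
        Remove-Item -Path $workDir -Recurse -Force -ErrorAction SilentlyContinue
    }
}

# Usage sketch (assumes $ctx, $sourceContainer and $destinationShare from the question):
# Copy-ZipBlobToShare -BlobName $blob.name -SourceContainer $sourceContainer -DestinationShare $destinationShare -Context $ctx
```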

Using AZCopy to download blobs from all containers at one time

I need to copy all containers and blobs from an Azure storage account. I've figured out how to download one container at a time, but this is quite tedious. I would like to download all of them at one time.
Anyone out there have information on this?
Thanks,
James
As far as I know, AzCopy cannot copy the files of all containers to a local disk in one go.
As you say, it can download one container at a time.
Here is a workaround:
First list all the containers with a command-line tool (e.g. PowerShell, using Get-AzureStorageContainer), then use foreach to download the files of each container.
The PowerShell script looks like this:
$SourceStorageAccountName = "<SourceStorageAccountName>"
$SourceStorageKey = Get-AzureStorageKey -StorageAccountName $SourceStorageAccountName
$StorageContext = New-AzureStorageContext -StorageAccountName $SourceStorageKey.StorageAccountName -StorageAccountKey $SourceStorageKey.Primary
$containers = Get-AzureStorageContainer -Context $StorageContext
foreach ($c in $containers) {
    "Transfer container " + $c.Name
    # Build the AzCopy command line for this container; replace "your file path" with the local target folder
    $cmd = "C:\'Program Files (x86)'\'Microsoft SDKs'\Azure\AzCopy\AzCopy.exe /Source:" + $c.CloudBlobContainer.Uri.AbsoluteUri + " /Dest:" + "your file path"
    Invoke-Expression $cmd
}
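If AzCopy is not a hard requirement, the same loop can be written with the storage cmdlets alone, which avoids building a command string. The following is a sketch under assumptions: Save-AllContainers is a hypothetical helper name, and each container is written to its own subfolder of a local root folder that you choose.

```powershell
# Hypothetical helper: download every blob of every container into
# one local subfolder per container.
function Save-AllContainers {
    param([string]$DestinationRoot, $Context)
    foreach ($container in @(Get-AzureStorageContainer -Context $Context)) {
        $target = Join-Path $DestinationRoot $container.Name
        New-Item -ItemType Directory -Path $target -Force | Out-Null
        Write-Host "Downloading container $($container.Name) to $target"
        # Pipe every blob of the container straight into the download cmdlet.
        Get-AzureStorageBlob -Container $container.Name -Context $Context |
            Get-AzureStorageBlobContent -Destination $target -Force | Out-Null
    }
}

# Usage sketch (assumes $StorageContext was created as in the script above):
# Save-AllContainers -DestinationRoot "C:\backup" -Context $StorageContext
```

For very large accounts AzCopy is likely to be faster, so this variant mainly suits small to medium amounts of data.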
