Azure Batch - OutputFiles not working - Files not uploaded to Blob

I'm trying to follow this article to persist task data to Azure Storage.
Here is my code to set the OutputFiles:
function Get-TaskOutputFiles
{
    param(
        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [System.Object] $LongLivingWriteSasFullUrl,
        [Parameter(Mandatory)]
        [ValidateNotNullOrEmpty()]
        [string] $TaskName
    )
    #public PSOutputFileBlobContainerDestination(string containerUrl, string path = null)
    $blobContainerDestination = [PSOutputFileBlobContainerDestination]::New(
        $LongLivingWriteSasFullUrl,
        $TaskName
    )
    $completion_fileUploadOption = [PSOutputFileUploadOptions]::New("TaskCompletion")
    $outputFileList = [Collections.Generic.List[PSOutputFile]]::New()
    #public PSOutputFile(string filePattern, PSOutputFileDestination destination, PSOutputFileUploadOptions uploadOptions)
    $outputFileList.Add([PSOutputFile]::New("../*.txt", $blobContainerDestination, $completion_fileUploadOption))
    $outputFileList
}
When the task completes, I can see the file fileuploadout.txt with this content:
2022-07-20 10:55:49,418 System default encoding=utf-8, filesystem encoding=utf-8, default locale encoding=cp1252
2022-07-20 10:55:49,418 Resolved mappings: [[]]
2022-07-20 10:55:49,418 Done uploading, exiting...
However, files are not uploaded to the storage account...
Variable values:
destination url = https://STORAGE_ACCOUNT_NAME.blob.core.windows.net/batch?###sp=rwdl&sig=###
Output files file pattern = ..\*.txt
I'd really appreciate any help with this issue, thanks!

Related

Azure Datalake Storage list first level of directories in container

I have a container named foo and a couple of directories in it with the following hierarchy:
foo\dir1
foo\dir2
...
How can I retrieve only the dir1 & dir2 directories? Currently I'm using the Azure.Storage.Blobs (12.9.1) library.
What I've tried:
var blobContainerClient = blobServiceClient.GetBlobContainerClient("foo");
var resultSegment = blobContainerClient.GetBlobs().AsPages();

IList<string> blobs = new List<string>();
foreach (Azure.Page<BlobItem> blobPage in resultSegment)
{
    foreach (BlobItem blobItem in blobPage.Values)
    {
        blobs.Add(blobItem.Name);
    }
}

return blobs;
This recursively returns all the files in the foo container. I should mention that this is a hierarchical namespace storage account, and I've tried this solution, but it doesn't work because, I think, each directory is itself considered a blob.
I found a solution, but I don't know if it is the best one. If a better solution appears, I will delete this.
So, if I understood correctly, in a hierarchical namespace storage account (Data Lake) even a directory is considered a blob. In that case, I observed that the directories have ContentLength = 0 and a ContentHash that is an empty byte array. With these assumptions in mind I managed to do the following:
var blobContainerClient = blobServiceClient.GetBlobContainerClient("foo");
return blobContainerClient.GetBlobs()
    .Where(b => b.Properties.ContentLength == 0 && b.Properties.ContentHash.Length == 0)
    .Select(b => b.Name)
    .ToList();
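For what it's worth, the same SDK also offers a hierarchical listing that avoids relying on the ContentLength/ContentHash assumption: GetBlobsByHierarchy with a "/" delimiter should return the top-level virtual directories as prefixes. A rough sketch, reusing the blobServiceClient from above:
var blobContainerClient = blobServiceClient.GetBlobContainerClient("foo");

// With a delimiter, items directly under the container root come back either as
// blobs or as prefixes (the virtual directories, e.g. "dir1/" and "dir2/").
var directories = new List<string>();
foreach (BlobHierarchyItem item in blobContainerClient.GetBlobsByHierarchy(delimiter: "/"))
{
    if (item.IsPrefix)
    {
        directories.Add(item.Prefix.TrimEnd('/'));
    }
}
return directories;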

Terraform resource S3 upload file is not updated

I am using Terraform to upload a file with contents to S3. However, when the content changes, I need to update the S3 object as well. But since the state file records that the S3 upload was completed, Terraform doesn't upload a new file.
resource "local_file" "timestamp" {
  filename = "timestamp"
  content  = "${timestamp()}"
}

resource "aws_s3_bucket_object" "upload" {
  bucket = "bucket"
  key    = "date"
  source = "timestamp"
}
expected:
aws_s3_bucket_object change detected
aws_s3_bucket_object.timestamp Creating...
result:
aws_s3_bucket_object Refreshing state...
When you give Terraform the path to a file rather than the direct content to upload, it is the name of the file that decides whether the resource needs to be updated, rather than the file's contents.
For a short piece of data as shown in your example, the easiest solution is to specify the data directly in the resource configuration:
resource "aws_s3_bucket_object" "upload" {
  bucket  = "bucket"
  key     = "date"
  content = "${timestamp()}"
}
If your file is actually too large to reasonably load into a string variable, or if it contains raw binary data that cannot be loaded into a string, you can set the etag of the object to an MD5 hash of the content so that the provider can see when the content has changed:
resource "aws_s3_bucket_object" "upload" {
  bucket = "bucket"
  key    = "date"
  source = "${path.module}/timestamp"
  etag   = "${filemd5("${path.module}/timestamp")}"
}
By setting the etag, any change to the content of the file will cause this hash result to change and thus allow the provider to detect that the object needs to be updated.

Why doesn't the Azure API list blobs that are named /folder/folder/file?

I wanted to create folders and sub-folders, and I found this workaround.
But when I listed them using this code (source):
foreach (IListBlobItem item in Container.ListBlobs(null, false))
{
    if (item.GetType() == typeof(CloudBlockBlob))
    {
        CloudBlockBlob blob = (CloudBlockBlob)item;
        Console.WriteLine("Block blob of length {0}: {1}", blob.Properties.Length, blob.Uri);
    }
    else if (item.GetType() == typeof(CloudPageBlob))
    {
        CloudPageBlob pageBlob = (CloudPageBlob)item;
        Console.WriteLine("Page blob of length {0}: {1}", pageBlob.Properties.Length, pageBlob.Uri);
    }
    else if (item.GetType() == typeof(CloudBlobDirectory))
    {
        CloudBlobDirectory directory = (CloudBlobDirectory)item;
        Console.WriteLine("Directory: {0}", directory.Uri);
    }
}
It only shows the parent folders and the blobs in the root of the container.
I was expecting to get them all back as blobs, since these are virtual directories, not real ones.
For example, I have this file:
https://account.blob.core.windows.net/container/Accounts/Images/1/acc.jpg
but it doesn't show up; it just shows:
https://account.blob.core.windows.net/container/Accounts
and
https://account.blob.core.windows.net/container/anyfile
Do I have to request the sub-folders inside the parent folders to reach the file?
Please try specifying the 2nd parameter as true in the ListBlobs method. This parameter indicates whether you want a flat blob listing (true) or a hierarchical blob listing (false).
From the documentation link:
useFlatBlobListing
Type: System.Boolean
A boolean value that specifies whether to list blobs in a flat listing, or whether to list blobs hierarchically, by virtual directory.
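Applied to the code in the question, the call would look something like this; with a flat listing every returned item is an actual blob, so the CloudBlobDirectory branch is no longer needed:
foreach (IListBlobItem item in Container.ListBlobs(null, true))
{
    // useFlatBlobListing = true: nested blobs such as
    // Accounts/Images/1/acc.jpg are returned directly.
    if (item.GetType() == typeof(CloudBlockBlob))
    {
        CloudBlockBlob blob = (CloudBlockBlob)item;
        Console.WriteLine("Block blob of length {0}: {1}", blob.Properties.Length, blob.Uri);
    }
    else if (item.GetType() == typeof(CloudPageBlob))
    {
        CloudPageBlob pageBlob = (CloudPageBlob)item;
        Console.WriteLine("Page blob of length {0}: {1}", pageBlob.Properties.Length, pageBlob.Uri);
    }
}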
If someone is looking for such a flat file listing with the latest Azure.Storage.Blobs SDK, this is how you do it:
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
BlobContainerClient blobContainerClient = blobServiceClient.GetBlobContainerClient(resource.Name);
var blobItemList = blobContainerClient.GetBlobs(); // This will get blobs without any hierarchy

Azure storage account backup (tables and blobs)

I need to periodically backup all blobs and tables in an Azure storage account so that we can restore all that data at a later time if we for any reason corrupt our data.
While I trust that data that we store in Azure is durable and recoverable in case of data center failures, we still need data in our storage accounts to be backed up to prevent from accidental overwrites and deletions (the human error factor).
We have implemented a solution for this that periodically lists all blobs and copies them over to a backup storage account. When a blob has been modified or deleted we simply create a snapshot of the old version in the backup account.
This approach has worked OK for us. But it only handles blobs, not table entities. We now need to support backing up table entities too.
Faced with this task now, I'm thinking that someone else has probably had this requirement before and come up with a smart solution. Or maybe there are commercial products that will do this?
It is not a requirement that the backup target is another Azure storage account. All we need is a way to recover all blobs and tables as they were at the time we ran the backup.
Any help is appreciated!
There are a variety of ways this can be handled.
If you want to do this on your own, you can use the storage libraries and write code to just run through the tables and pull down the data.
There are also a few services that can do this for you (full disclosure: I work for a company that provides this as a service). Here is an article by Troy Hunt talking about our option: http://www.troyhunt.com/2014/01/azure-will-save-you-from-unexpected_28.html. We also have PowerShell cmdlets that can pull table data down for you (cerebrata.com). To be fair, we are not the only players in this space and there are others who have similar services.
Finally, at Tech Ed they announced that the AzCopy tool will be updated later this year so that it can pull down entire tables, which essentially just automates reading through the tables and pulling them down. There is currently no way to "snapshot" a table, so all of the methods above produce a copy made while the data is being read; the source table might have changed by the time the copy is completed.
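If you do decide to write the table part yourself, a minimal sketch with the classic Microsoft.WindowsAzure.Storage SDK might look like the following (connectionString is a placeholder, and where the entities get written is up to you):
var account = CloudStorageAccount.Parse(connectionString);
var tableClient = account.CreateCloudTableClient();

foreach (CloudTable table in tableClient.ListTables())
{
    TableContinuationToken token = null;
    do
    {
        // Reads the table in segments (up to 1,000 entities per segment).
        TableQuerySegment<DynamicTableEntity> segment =
            table.ExecuteQuerySegmented(new TableQuery(), token);

        foreach (DynamicTableEntity entity in segment.Results)
        {
            // Persist entity.PartitionKey, entity.RowKey and entity.Properties
            // to your backup target of choice (files, another storage account, ...).
        }

        token = segment.ContinuationToken;
    } while (token != null);
}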
I've recently put together a simple solution to back up table storage. It uses the AzCopy tool and the Storage REST API to pull down a list of all the tables and back each one up to JSON.
Hope it's useful!
param(
    [parameter(Mandatory=$true)]
    [string]$Account,
    [parameter(Mandatory=$true)]
    [string]$SASToken,
    [parameter(Mandatory=$true)]
    [string]$OutputDir
)

$ErrorActionPreference = "Stop"

##Example Usage
#.\Backup-TableStorage.ps1 -OutputDir "d:\tablebackup" -Account "examplestorageaccount" -SASToken "?sv=2015-04-05&ss=t&srt=sco&sp=rl&st=2016-04-08T07%3A44%3A00Z&se=2016-04-09T07%3A55%3A00Z&sig=CNotAREALSIGNITUREBUTYOURESWOUDLGOHERE3D"

if (-not (Test-Path "${env:ProgramFiles(x86)}\Microsoft SDKs\Azure\AzCopy\AzCopy.exe"))
{
    throw "Azcopy not installed - get it from here: https://azure.microsoft.com/en-gb/documentation/articles/storage-use-azcopy/"
}

Write-Host ""
Write-Host "Starting backup for account" -ForegroundColor Yellow
Write-Host "--------------------------" -ForegroundColor Yellow
Write-Host " -Account: $Account"
Write-Host " -Token: $SASToken"

$response = Invoke-WebRequest -Uri "https://$Account.table.core.windows.net/Tables/$SASToken"
[xml]$tables = $response.Content
$tableNames = $tables.feed.entry.content.properties.TableName

Write-Host ""
Write-Host "Found Tables to backup" -ForegroundColor Yellow
Write-Host "--------------------------" -ForegroundColor Yellow
foreach ($tableName in $tableNames)
{
    Write-Host " -Table: $tableName"
}

foreach ($tableName in $tableNames)
{
    $url = "https://$Account.table.core.windows.net/$tableName"

    Write-Host ""
    Write-Host "Backing up Table: $url" -ForegroundColor Yellow
    Write-Host "--------------------------" -ForegroundColor Yellow
    Write-Host ""

    & "${env:ProgramFiles(x86)}\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" /Source:$url /Dest:$OutputDir\$account\ /SourceSAS:$SASToken /Z:"$env:temp\$([guid]::NewGuid()).azcopyJournal"

    Write-Host ""
    Write-Host "Backup completed" -ForegroundColor Green
    Write-Host ""
    Write-Host ""
}
For more details on usage have a look here:
https://gripdev.wordpress.com/2016/04/08/backup-azure-table-storage-quick-powershell-script/
You can back up any Azure Table Storage table (not blobs, though) with free software like Slazure Light. The following C# code backs up all your Azure tables to JSON files:
Download the NuGet package first:
Install-Package Azure.Storage.Slazure.Light
Create a console application in Visual Studio and add the following code:
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json;
using SysSurge.Slazure.AzureTableStorage;

namespace BackupAzureTableStore
{
    class Program
    {
        /// <summary>
        /// Usage: BackupAzureTableStore.exe "UseDevelopmentStorage=true"
        /// </summary>
        /// <param name="args"></param>
        static void Main(string[] args)
        {
            var storage = new DynStorage(args.Length == 0 ? "UseDevelopmentStorage=true" : args[0]);

            foreach (var cloudTable in storage.Tables)
            {
                var tableName = cloudTable.Name;
                var fileName = $"{tableName}.json";

                using (var file = new System.IO.StreamWriter(fileName))
                {
                    var dynTable = new DynTable(storage.StorageAccount, tableName);
                    TableContinuationToken token = null; // Continuation token required if > 1,000 rows per table
                    do
                    {
                        var queryResult =
                            dynTable.TableClient.GetTableReference(tableName)
                                .ExecuteQuerySegmented(new TableQuery(), token);

                        file.WriteLine("{{{0} : [", JsonConvert.SerializeObject(tableName));
                        var cntr = 0;
                        foreach (var entity in queryResult.Results)
                        {
                            var dynEntity = dynTable.Entity(entity.PartitionKey, entity.RowKey);
                            dynEntity.LoadAll().ToList(); // Force pre-downloading of all properties
                            file.WriteLine("{0}{1}", cntr++ > 0 ? "," : string.Empty,
                                JsonConvert.SerializeObject(dynEntity));
                        }
                        file.WriteLine("]}");

                        token = queryResult.ContinuationToken;
                    } while (token != null);
                }
            }

            Console.WriteLine("Done. Press a key...");
            Console.ReadKey();
        }
    }
}

Size of a SharePoint web application

How do you figure out the current size of a SharePoint web application? Better yet, the size of a site collection or a subsite.
I am planning to move a site collection from one farm to another. I need to plan the storage capacity first.
All content for SharePoint is stored in Content Database (unless you are using some sort of 3rd party external BLOB provider).
A site collection (aka top level site) is stored in a single content database but each content database can have multiple site collections.
You can work out the size of the content databases using SQL Management Studio or stored procedures (though beware that these can include overhead like log files or allocated but unused space).
You can use the open source SPUsedSpaceInfo utility
You can use free tools like BLOBulator.
Programmatically, you can loop through the folders and subwebs of an SPWeb and add up the size of all the contents.
These are going to give slightly different results - e.g. one looks at the size of the documents stored, the other at the size of the content database storing those documents. None of these will include the files in C:\Inetpub\wwwroot\wss\VirtualDirectories\80 or C:\Program Files\Common Files\Microsoft Shared\web server extensions\12, but those are nearly always insignificant compared to the size of the documents stored in SharePoint.
You can see the size (in bytes) by opening the SharePoint 2010 Management Shell (run it as Administrator) and executing:
> Start-SPAssignment -Global
> (Get-SPSiteAdministration -Identity http://YourSharePointURL/urlToYourSite/).DiskUsed
Also, if you would like to know the size of each subsite, run the following script in the SharePoint Management Shell:
function GetWebSizes ($StartWeb)
{
    $web = Get-SPWeb $StartWeb
    [long]$total = 0
    $total += GetWebSize -Web $web
    $total += GetSubWebSizes -Web $web
    $totalInMb = ($total/1024)/1024
    $totalInMb = "{0:N2}" -f $totalInMb
    $totalInGb = (($total/1024)/1024)/1024
    $totalInGb = "{0:N2}" -f $totalInGb
    write-host "Total size of all sites below" $StartWeb "is" $total "Bytes,"
    write-host "which is" $totalInMb "MB or" $totalInGb "GB"
    $web.Dispose()
}

function GetWebSize ($Web)
{
    [long]$subtotal = 0
    foreach ($folder in $Web.Folders)
    {
        $subtotal += GetFolderSize -Folder $folder
    }
    write-host "Site" $Web.Title "is" $subtotal "Bytes"
    return $subtotal
}

function GetSubWebSizes ($Web)
{
    [long]$subtotal = 0
    foreach ($subweb in $Web.GetSubwebsForCurrentUser())
    {
        [long]$webtotal = 0
        foreach ($folder in $subweb.Folders)
        {
            $webtotal += GetFolderSize -Folder $folder
        }
        write-host "Site" $subweb.Title "is" $webtotal "Bytes"
        $subtotal += $webtotal
        $subtotal += GetSubWebSizes -Web $subweb
    }
    return $subtotal
}

function GetFolderSize ($Folder)
{
    [long]$folderSize = 0
    foreach ($file in $Folder.Files)
    {
        $folderSize += $file.Length
    }
    foreach ($fd in $Folder.SubFolders)
    {
        $folderSize += GetFolderSize -Folder $fd
    }
    return $folderSize
}
Then:
GetWebSizes -StartWeb <startURL>
I hope this will help you... :)
Source: http://get-spscripts.com/2010/08/check-size-of-sharepoint-2010-sites.html