What happens to files downloaded in a WebJob - Azure

I am working with some sensitive files (mostly images) in my WebJob. My WebJob downloads the files from Azure Blob (container 1), does some processing and uploads to Azure Blob (container 2).
Because these files are sensitive in nature, I want to be 100% sure that the WebJob deletes them once the job has finished running.
Can someone tell me what happens to files downloaded in WebJob?
My download code looks like this ...
// Download the object into an in-memory stream (nothing is written to disk here).
var stream = new MemoryStream();
using (StorageService storage = CreateStorageClient())
{
    var bucketname = "container1";
    var objectToDownload = storage.Objects.Get(bucketname, "files/img1.jpg").Execute();
    var downloader = new MediaDownloader(storage);
    downloader.Download(objectToDownload.MediaLink, stream);
}
Here CreateStorageClient() is my utility method which creates a StorageService object.

Solved using @lopezbertoni's comment.
Also found a related question that helped - Azure Webjob - accessing local file system
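For anyone landing here later: the snippet above downloads into a MemoryStream, so nothing is written to the WebJob's disk in the first place. If the job did write temp files locally, a minimal cleanup sketch (my own, assuming the Azure.Storage.Blobs v12 API and a placeholder connection string) might look like this:
using System.IO;
using Azure.Storage.Blobs;

// Sketch only: download a sensitive blob to a temp file, process it, and always delete it afterwards.
string connectionString = "<storage-connection-string>";   // placeholder
string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
var blobClient = new BlobClient(connectionString, "container1", "files/img1.jpg");

try
{
    blobClient.DownloadTo(tempPath);    // write the blob to the temp file
    // ... process the image and upload the result to container2 ...
}
finally
{
    if (File.Exists(tempPath))
    {
        File.Delete(tempPath);          // make sure the sensitive file is removed even if processing throws
    }
}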

Related

C# Azure Storage download issue

I'm working on moving our files out of the database and into Azure Storage, while keeping the folder structure and file information in our SQL DB.
The control we are using is an ASPxFileManager from DevExpress, which does not support async functionality because it isn't built in.
The code I'm using to pull the blob from Azure and return a stream is below:
using (var ms = new MemoryStream())
{
    var result = blobItem.DownloadStreaming().GetRawResponse();
    result.ContentStream.CopyTo(ms);
    return ms.ToArray();
}
It appears that for large files it downloads the content to the server to get the stream AND then sends that stream to the client, so I think it is being processed twice.
BlobContainerClient container = GetBlobContainer(containerInfo);
BlobClient blobItem = container.GetBlobClient(fileSystemFileDataId);
using (var ms = new MemoryStream())
{
    blobItem.DownloadTo(ms);
    return ms.ToArray();
}
I looked at using their CloudFileSystemProviderBase to use GetDownloadUrl, but it only allows you to download single files. This works fine for single files, as we can return a URL with a SAS token, etc.
We still need to support downloading multiple files though.
Is there a way in Azure to NOT download to the file system and just obtain ONE stream to send back for the DevExpress ASPxFileManager to process?
I liked the GetDownloadUrl call from their CloudFileSystemProviderBase, because it doesn't block the main UI thread and allows the user to continue to work in the app while large files are downloading.
Main question: Is there a way to return a stream from Azure where it does not have to download to the server first?
(Note: I've already been talking to DevExpress about this issue)
UPDATE 2:
The code below obtains a stream, but does it download it on the server and then send it to the client, or does it obtain the stream only once? I think the code above that I was using does it twice.
Also, this code uses the WindowsAzure.Storage package, which is deprecated.
So what would be the correct NuGet package and C# code nowadays?
if (!String.IsNullOrWhiteSpace(contentDisposition.FileName))
{
    string connectionString = ConfigurationManager.AppSettings["my-connection-string"];
    string containerName = ConfigurationManager.AppSettings["my-container"];
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer blobContainer = blobClient.GetContainerReference(containerName);
    CloudBlockBlob blob = blobContainer.GetBlockBlobReference(contentDisposition.FileName);
    stream = blob.OpenWrite();
}

How to upload a large file in chunks with parallelism in Azure SDK v12?

In Azure SDK v11, we had the option to specify the ParallelOperationThreadCount through the BlobRequestOptions. In Azure SDK v12, I see that BlobClientOptions does not have this, and on the BlockBlobClient (previously CloudBlockBlob in Azure SDK v11) parallelism is only mentioned for the download methods.
We have three files: one 200MB, one 150MB, and one 20MB. For each file, we want the file to be split into blocks and have those uploaded in parallel. Is this automatically done by the BlockBlobClient? If possible, we would like to do these operations for the 3 files in parallel as well.
You can also make use of StorageTransferOptions in v12.
Sample code:
BlobServiceClient blobServiceClient = new BlobServiceClient(conn_str);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("xxx");
BlobClient blobClient = containerClient.GetBlobClient("xxx");

// Set the transfer options here.
StorageTransferOptions transferOptions = new StorageTransferOptions();
// e.g. transferOptions.MaximumConcurrency or other settings.

blobClient.Upload("xxx", transferOptions: transferOptions);
By the way, for uploading large files, you can also use Microsoft Azure Storage Data Movement Library for better performance.
Using Fiddler, I verified that BlockBlobClient does indeed upload the files in chunks without needing to do any extra work. For doing each of the major files in parallel, I simply had a task for each one, added it to a list of tasks, and used await Task.WhenAll(tasks).
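To make that concrete, a rough sketch of uploading several files in parallel while letting each upload chunk its blob via StorageTransferOptions (the container, file names, and concurrency values are placeholders I'm assuming):
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage;
using Azure.Storage.Blobs;

public static class ParallelUploader
{
    public static async Task UploadFilesInParallelAsync(BlobContainerClient container, IEnumerable<string> filePaths)
    {
        // Chunking behaviour for each individual blob upload (placeholder values).
        var transferOptions = new StorageTransferOptions
        {
            MaximumConcurrency = 8,                 // parallel block uploads per blob
            InitialTransferSize = 8 * 1024 * 1024,  // single-shot threshold
            MaximumTransferSize = 8 * 1024 * 1024   // block size for chunked uploads
        };

        var tasks = new List<Task>();
        foreach (var path in filePaths)
        {
            BlobClient blob = container.GetBlobClient(Path.GetFileName(path));
            // Each UploadAsync splits its file into blocks and uploads them concurrently;
            // Task.WhenAll runs the files side by side.
            tasks.Add(blob.UploadAsync(path, transferOptions: transferOptions));
        }
        await Task.WhenAll(tasks);
    }
}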

How to get lease on Azure storage file?

I have a multi-instance web application in the Azure cloud. I want to make sure that only one instance reads and processes a file from Azure File storage. I thought I could make that happen using the lease option.
However, I am unable to find the proper methods for this operation. I found this REST API option, but I am looking for something like this.
That example is for blob storage. What are the file storage options? I am using the latest file storage NuGet package.
EDIT:
This is how I am getting data from the storage file and processing it.
var storageAccount = CloudStorageAccount.Parse("FileStorageConnectionString");
var fileClient = storageAccount.CreateCloudFileClient();
var share = fileClient.GetShareReference("StorageFileShareName");
var file = share.GetRootDirectoryReference().GetFileReference("file.txt");
// --- process the file and rename it after processing ---
How do I implement a lease on this file so that no other cloud instance gets access to it? After processing the file I will also rename it.
Please try something like below:
string connectionString = "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net;";
var shareClient = new ShareClient(connectionString, "test");
shareClient.CreateIfNotExists();

var fileClient = shareClient.GetRootDirectoryClient().GetFileClient("test.txt");
var bytes = Encoding.UTF8.GetBytes("This is a test");
fileClient.Create(bytes.Length);
fileClient.Upload(new MemoryStream(bytes));

var leaseClient = fileClient.GetShareLeaseClient();
Console.WriteLine("Acquiring lease...");
var lease = leaseClient.Acquire();
Console.WriteLine("Lease Id: " + lease.Value.LeaseId);
Console.WriteLine("Breaking lease...");
leaseClient.Break();
Console.WriteLine("Lease broken...");
It makes use of the Azure.Storage.Files.Shares NuGet package.
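A follow-up sketch on the multi-instance part of the question (my own addition, not from the answer above): while one instance holds the lease, other instances calling Acquire() on the same file get a 409 (LeaseAlreadyPresent) error, and write operations can pass the lease id via ShareFileRequestConditions so that only the lease holder can modify the file.
// Sketch: each instance tries to take the lease and only processes the file if it wins.
var leaseClient = fileClient.GetShareLeaseClient();
try
{
    var lease = leaseClient.Acquire(); // throws RequestFailedException (409) if another instance holds the lease

    var conditions = new ShareFileRequestConditions { LeaseId = lease.Value.LeaseId };
    // ... download and process the file, passing 'conditions' to any write operations
    //     so they are only accepted from the lease holder ...

    leaseClient.Release();
}
catch (Azure.RequestFailedException ex) when (ex.Status == 409)
{
    // Another instance already owns the lease; skip this file.
}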

Automating App Deployment in Azure with LocalResource

I'm currently attempting to automate the deployment of an application to an Azure Worker Role by pulling a file into the role from blob storage and working with it via a batch script, also located in blob storage. I'm using OnStart to accomplish this. Here's a reduced version of my OnStart method:
Getting ready to pull the files down:
public override bool OnStart()
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
    container.CreateIfNotExist();
    CloudBlob file = container.GetBlobReference("file.bat");
Actually getting the files into the role:
    LocalResource localResource = RoleEnvironment.GetLocalResource("localStore");
    string filePath = System.IO.Path.Combine(localResource.RootPath, "file.bat");
    using (var fileStream = System.IO.File.OpenWrite(filePath))
    {
        file.DownloadToStream(fileStream);
    }
This is how I get the batch file and the dependencies into the role. My problem now is - originally, I built the batch file with the assumption that the other files would be dropped right on C:\. For example - C:\installer.exe, C:\archive.zip, etc. But now the files are in local storage.
I'm thinking I can either A) somehow tell the batch file where the local storage is by dynamically writing the script in OnStart, or B) change the local storage to use C:\.
I'm not sure how to do either, or what the best thing to do here would be. Thoughts?
I would not change the LocalStorage to use C: (how would you do this anyways?). Take a look at Steve's blogpost: Using a Local Storage Resource From a Startup Task. He explains how you can get a LocalResource using powershell (and even call that script from a batch file).
And why not use the Windows Azure Bootstrapper? This is a little tool that can help you with the configuration of your role without having to write any code, you simply call it from a startup task and it can download files (also from blob storage like you're doing), work with local resources, ...
bootstrapper.exe -get http://download.microsoft.com/download/F/3/1/F31EF055-3C46-4E35-AB7B-3261A303A3B6/AspNetMVC3ToolsUpdateSetup.exe -lr $lr(temp) -run $lr(temp)\AspNetMVC3ToolsUpdateSetup.exe -args /q
Note: Instead of using absolute references in your batch file, make it use relative paths using %~dp0
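If you go with option A, one low-tech variant (my own sketch built on the OnStart code above, not something from the blog post) is to skip rewriting the script and instead pass the local resource path to the batch file as an argument, then reference it as %1 inside the .bat:
// Launch the downloaded batch file, handing it the local storage path as its first argument (%1 in the .bat).
LocalResource localResource = RoleEnvironment.GetLocalResource("localStore");
string batchPath = System.IO.Path.Combine(localResource.RootPath, "file.bat");

var startInfo = new System.Diagnostics.ProcessStartInfo(batchPath, "\"" + localResource.RootPath + "\"")
{
    UseShellExecute = false,
    WorkingDirectory = localResource.RootPath
};
using (var proc = System.Diagnostics.Process.Start(startInfo))
{
    proc.WaitForExit();
}
Inside file.bat you would then refer to %1\installer.exe, %1\archive.zip, and so on, or stick with the %~dp0 relative-path approach from the note above.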

Running native code on Azure

I am trying to run a C executable on Azure. I have many worker roles and they continuously check a job queue. If there is a job in the queue, a worker role runs an instance of the C executable as a process according to the command line arguments stored in a job class. The C executable normally creates some log files. I do not know how to access those created files. What is the logic behind it? Where are the created files stored? Can anyone explain this to me? I am new to Azure and C#.
One other problem is that all of the working instances of the C executable need to read a data file. How can I distribute that required file?
First, realize that in Windows Azure, your worker role is simply running inside a Windows 2008 Server environment (either SP2 or R2). When you deploy your app, you would deploy your C executable as well (or grab it from blob storage, but that's a bit more advanced). To find out where your app lives on disk, call Environment.GetEnvironmentVariable("RoleRoot") - that returns a path. You'd typically have your app sitting in a folder called AppRoot under the role root. You'd find your C executable there.
Next, you'll want your app to write its files to an output directory you specify on the command line. You can set up storage in your local VM with your role's properties. Look at the Local Storage tab, and configure a named local storage area:
Now you can get the path to that storage area, in code, and pass it as a command line argument:
var outputStorage = RoleEnvironment.GetLocalResource("MyLocalStorage");
var outputFile = Path.Combine(outputStorage.RootPath, "myoutput.txt");
var cmdline = String.Format("--output {0}", outputFile);
Here's an example of launching your myapp.exe process, with command line arguments:
var appRoot = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot")
    + @"\", @"approot");
var myProcess = new Process()
{
    StartInfo = new ProcessStartInfo(Path.Combine(appRoot, @"myapp.exe"), cmdline)
    {
        CreateNoWindow = false,
        UseShellExecute = false,
        WorkingDirectory = appRoot
    }
};
myProcess.Start();
myProcess.WaitForExit();
Normally you'd set CreateNoWindow to true, but it's easier to debug if you can see the command shell window.
Last thing: Once your app is done creating the file, you'll want to either:
Process it and delete it (it's not in a durable place so eventually it'll disappear)
Change your storage to use a Cloud Drive (durable storage)
Copy your file to a blob (durable storage)
In production, you'll want to add exception-handling, and you can re-route stdout and stderr to be captured. But this sample code should be enough to get you started.
OOPS - one more 'one more thing': When adding your 'myapp.exe' to your project, be SURE to go to its Properties, and set 'Copy to Output Directory' to 'Copy Always' - otherwise your myapp.exe file won't end up in Windows Azure and you'll wonder why things don't work.
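On the re-routing of stdout and stderr mentioned above, a small sketch of what that capture could look like (my addition, not part of the original answer):
// Capture the native process's output instead of showing a console window.
var psi = new ProcessStartInfo(Path.Combine(appRoot, @"myapp.exe"), cmdline)
{
    CreateNoWindow = true,
    UseShellExecute = false,
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    WorkingDirectory = appRoot
};

using (var process = Process.Start(psi))
{
    // For very chatty processes you'd read these asynchronously to avoid blocking on full pipes.
    string stdout = process.StandardOutput.ReadToEnd();
    string stderr = process.StandardError.ReadToEnd();
    process.WaitForExit();
    // Log stdout/stderr, or write them next to the output file in local storage / a blob.
}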
EDIT: Pushing results to a blob - a quick example
First, set up a storage account and add it to your role's Settings. Say you called it 'AzureStorage' - now set it up in code, get a reference to a blob container, get a reference to a blob within that container, and then upload the file to the blob:
CloudStorageAccount storageAccount = CloudStorageAccount.FromConfigurationSetting("AzureStorage");
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer outputfiles = blobClient.GetContainerReference("outputfiles");
outputfiles.CreateIfNotExist();
var blobname = "myoutput.txt";
var blob = outputfiles.GetBlobReference(blobname);
blob.UploadFile(outputFile);
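For the second part of the question (distributing the data file every instance needs), the same idea works in reverse - a sketch using the same older StorageClient API as above, with hypothetical container and blob names: upload the data file to a blob once, and have each instance pull it into its local storage on startup.
// Download the shared input data file from blob storage into this instance's local storage.
CloudBlobContainer inputFiles = blobClient.GetContainerReference("inputfiles");  // hypothetical container
var dataBlob = inputFiles.GetBlobReference("data.bin");                          // hypothetical blob name

LocalResource localStorage = RoleEnvironment.GetLocalResource("MyLocalStorage");
string localDataPath = Path.Combine(localStorage.RootPath, "data.bin");
using (var fileStream = System.IO.File.OpenWrite(localDataPath))
{
    dataBlob.DownloadToStream(fileStream);
}
// Pass localDataPath to myapp.exe on its command line so every instance reads the same data.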
In Azure land you shouldn't write to the file system. You should write to SQL Azure, Table storage or, most likely in this case, Blob storage (basically, I think you should think of blob storage as the old file system).
This is because:
You could have multiple instances running and you will end up having different files on different instances (which are just virtual machines)
Your instance could potentially be moved at any moment and you would lose the info on the file system as it's not part of your deployment package.
Using one of the three storage options will provide a central repository for all of your instances to access and it will be persisted over a redeployment.
