How to unit test an Azure Functions blob storage trigger? - azure

I have an Azure Function that is triggered when a zip file is uploaded to an Azure Blob Storage container. I unzip the file in memory, process the contents, and add/update the result in a database. For the database part I can use an in-memory database, but I'm not sure how to simulate the blob trigger in order to unit test this Azure Function.
All the official samples and blogs mostly cover HTTP triggers (mocking HttpRequest) and queue triggers (using IAsyncCollector).
[FunctionName("AzureBlobTrigger")]
public void Run([BlobTrigger("logprocessing/{name}", Connection = "AzureWebJobsStorage")]Stream blobStream, string name, ILogger log)
{
log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {blobStream.Length} Bytes");
//processing logic
}

There is a project on GitHub with unit tests and integration tests for Azure Functions, including a blob trigger; give it a try on your side. Note that the unit test code is in the FunctionApp.Tests folder.
A code snippet for the blob trigger from that GitHub project:
Unit test code from BlobFunction.cs:
namespace FunctionApp.Tests
{
    public class BlobFunction : FunctionTest
    {
        [Fact]
        public async Task BlobFunction_ValidStreamAndName()
        {
            Stream s = new MemoryStream();
            using (StreamWriter sw = new StreamWriter(s))
            {
                await sw.WriteLineAsync("This is a test");
                await sw.FlushAsync(); // flush so the content actually reaches the stream
                s.Position = 0;        // rewind so the function reads from the start
                BlobTrigger.Run(s, "testBlob", log);
            }
        }
    }
}
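Since the function in the question receives a zipped blob, a sketch like the following may be closer to your case: build the archive in memory with System.IO.Compression and hand the stream to the function directly. The class name LogProcessingFunction and the use of NullLogger.Instance are assumptions; substitute your own function class and logger.
using System.IO;
using System.IO.Compression;
using Microsoft.Extensions.Logging.Abstractions;

// Build an in-memory zip that stands in for the uploaded blob.
var zipStream = new MemoryStream();
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
{
    var entry = archive.CreateEntry("log.txt");
    using (var writer = new StreamWriter(entry.Open()))
    {
        writer.WriteLine("sample log line");
    }
}
zipStream.Position = 0;

// Invoke the function exactly as the runtime would, minus the binding layer.
var function = new LogProcessingFunction();
function.Run(zipStream, "sample.zip", NullLogger.Instance);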

Related

How to trigger Synapse Analytics Workspace pipeline using C# or any other language?

Is there any way we can trigger an Azure Synapse Analytics pipeline (the built-in Azure Data Factory) using C# or any other language?
Using the following URL and code I am able to trigger a regular (non-Synapse) Azure Data Factory pipeline successfully. But when I call the same method, I am not sure what to pass as the data factory name (property: dataFactoryName). I tried the workspace name, but it does not work.
The built-in ADF can be triggered using a blob trigger, but the thing is that I have many parameters, and those cannot be passed through files stored in a blob.
URL: https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
Code: client.Pipelines.CreateRunWithHttpMessagesAsync(resourceGroup, dataFactoryName, pipelineName, parameters)
According to the ADF team, there is a different SDK for Synapse Analytics. I'm in the same position, but haven't had a chance to generate a code sample yet.
It looks like you'll need the PipelineClient class to create a run, and the PipelineRunClient class to monitor it.
If you get this working, please post the sample code for future searchers.
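For future searchers, a rough sketch of the monitoring side using PipelineRunClient from the Azure.Analytics.Synapse.Artifacts package might look like this; the endpoint is a placeholder, runId comes from the create-run response, and the status strings should be verified against the SDK version you install.
using Azure.Analytics.Synapse.Artifacts;
using Azure.Analytics.Synapse.Artifacts.Models;
using Azure.Identity;

var runClient = new PipelineRunClient(
    new Uri("https://myworkspace.dev.azuresynapse.net"), new DefaultAzureCredential());

// runId comes from the CreatePipelineRunAsync response.
PipelineRun run = await runClient.GetPipelineRunAsync(runId);
while (run.Status == "Queued" || run.Status == "InProgress")
{
    await Task.Delay(TimeSpan.FromSeconds(30));
    run = await runClient.GetPipelineRunAsync(runId);
}
Console.WriteLine($"Run {runId} finished with status {run.Status}");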
I took guidance from the following URL:
https://techcommunity.microsoft.com/t5/azure-synapse-analytics/how-to-start-synapse-pipeline-from-rest-api/ba-p/1684836
and created a C# version of it here:
https://github.com/pankajsingh23/TriggerSynapsePipeline
public async Task<string> ExecuteSynapsePipeline(IDictionary<string, object?> parameters, string? synapseEndPoint = null, string? pipeline = null)
{
    synapseEndPoint ??= _Settings.SynapseExtractEndpoint;
    pipeline ??= _Settings.SynapseExtractPipeline;
    _Logger.LogInformation($"!!Execute synapse pipeline called with parameters: {parameters}");
    Response<CreateRunResponse?>? result = null;
    try
    {
        PipelineClient client = new(new Uri(synapseEndPoint), new DefaultAzureCredential());
        _Logger.LogInformation($"!!Pipeline {client} client resolved");
        result = await client.CreatePipelineRunAsync(
            pipeline,
            parameters: parameters);
        _Logger.LogInformation($"!!{pipeline} {result?.Value?.RunId ?? "Failed to get RunId"} called");
    }
    catch (Exception e)
    {
        string errorToLog = $"{e.Message} \n";
        if (e.InnerException is not null && !string.IsNullOrEmpty(e.InnerException.Message))
        {
            errorToLog += e.InnerException.Message;
        }
        _Errors.LogError(options.OrchestratorId, errorToLog);
        _Logger.LogError(errorToLog);
    }
    return result?.Value?.RunId ?? "";
}
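For what it's worth, a hypothetical call site for the method above might look like the following; the endpoint, pipeline name, and parameter key are all placeholders that depend on your workspace and pipeline definition.
var runId = await ExecuteSynapsePipeline(
    new Dictionary<string, object?> { ["sourcePath"] = "raw/2023/10" },
    synapseEndPoint: "https://myworkspace.dev.azuresynapse.net",
    pipeline: "ExtractPipeline");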

Azure BlobTrigger is firing > 5 times for each new blob

The following trigger removes EXIF data from blobs (which are images) after they are uploaded to Azure Storage. The problem is that the blob trigger fires at least five times for each blob.
In the trigger, the blob is updated by writing a new stream of data to it. I had assumed that blob receipts would prevent the trigger from firing again for this blob.
[FunctionName("ExifDataPurge")]
public async System.Threading.Tasks.Task RunAsync(
[BlobTrigger("container/{name}.{extension}", Connection = "")]CloudBlockBlob image,
string name,
string extension,
string blobTrigger,
ILogger log)
{
log.LogInformation($"C# Blob trigger function Processed blob\n Name:{name}");
try
{
var memoryStream = new MemoryStream();
await image.DownloadToStreamAsync(memoryStream);
memoryStream.Position = 0;
using (Image largeImage = Image.Load(memoryStream))
{
if (largeImage.Metadata.ExifProfile != null)
{
//strip the exif data from the image.
for (int i = 0; i < largeImage.Metadata.ExifProfile.Values.Count; i++)
{
largeImage.Metadata.ExifProfile.RemoveValue(largeImage.Metadata.ExifProfile.Values[i].Tag);
}
var exifStrippedImage = new MemoryStream();
largeImage.Save(exifStrippedImage, new SixLabors.ImageSharp.Formats.Jpeg.JpegEncoder());
exifStrippedImage.Position = 0;
await image.UploadFromStreamAsync(exifStrippedImage);
}
}
}
catch (UnknownImageFormatException unknownImageFormatException)
{
log.LogInformation($"Blob is not a valid Image : {name}.{extension}");
}
}
Triggers are handled in such a way that the runtime tracks which blobs have been processed by storing receipts in the azure-webjobs-hosts container. Any blob without a receipt, or with an old receipt (based on the blob's ETag), will be processed (or reprocessed).
Since you are calling await image.UploadFromStreamAsync(exifStrippedImage);, the upload changes the ETag and the trigger fires again (assuming the new version hasn't been processed).
When you call await image.UploadFromStreamAsync(exifStrippedImage);, it updates the blob, so the blob function will trigger again.
To break the loop, you can check the blob's existing CacheControl property and skip the update if it has already been set.
// Set the CacheControl property to expire in 1 hour (3600 seconds)
blob.Properties.CacheControl = "max-age=3600";
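A minimal sketch of that guard, assuming the CloudBlockBlob binding from the question (the max-age value is arbitrary):
// Skip blobs already stamped by a previous invocation, to break the loop.
if (image.Properties.CacheControl == "max-age=3600")
{
    log.LogInformation($"blob: {name} has already been processed");
    return;
}
// ...strip the EXIF data, then stamp the blob before re-uploading...
image.Properties.CacheControl = "max-age=3600";
await image.UploadFromStreamAsync(exifStrippedImage);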
So I've addressed this by storing a Status in metadata against the blob as it's processed.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-container-properties-metadata
The trigger then contains a guard to check for the metadata.
if (image.Metadata.ContainsKey("Status") && image.Metadata["Status"] == "Processed")
{
    // Any subsequent trigger for this blob enters this block.
    log.LogInformation($"blob: {name} has already been processed");
}
else
{
    // First time the trigger fires for this blob.
    image.Metadata.Add("Status", "Processed");
    await image.SetMetadataAsync();
}
The other answers pointed me in the right direction, but I think it is more correct to use metadata. Storing an ETag elsewhere seems redundant when we can store metadata, and using CacheControl feels like too much of a hack: other developers might be confused about what I did and why.

Delete files older than X number of days from Azure Blob Storage using Azure function

I want to create an Azure Function that deletes files from Azure Blob Storage when their last modified date is older than 30 days.
Can anyone help, or does anyone have documentation on how to do that?
Assuming your storage account's type is either General Purpose v2 (GPv2) or Blob Storage, you actually don't have to do anything yourself; Azure Storage can do this for you.
You'll use Blob Lifecycle Management and define a policy there to delete blobs if they are older than 30 days, and Azure Storage will take care of the deletion for you.
You can learn more about it here: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts.
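For illustration, a lifecycle management policy for this scenario might look roughly like the following JSON; the rule name and blob types are placeholders, and the exact schema should be checked against the linked docs.
{
  "rules": [
    {
      "name": "delete-after-30-days",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 30 }
          }
        }
      }
    }
  ]
}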
You can create a Timer Trigger function, fetch the list of items from the blob container, and delete the files that don't meet your last-modified criteria; a sketch of the entry point follows this list.
Create a Timer Trigger function.
Fetch the list of blobs using CloudBlobContainer.
Cast the blob items to the proper type and check the LastModified property.
Delete the blobs that don't meet the criteria.
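As a rough sketch, the entry point could look like this; the CRON schedule, container name, and connection setting name are assumptions, and the body would reuse the listing and deletion logic from the HTTP-triggered example below.
[FunctionName("DeleteOldBlobs")]
public static async Task Run(
    [TimerTrigger("0 0 1 * * *")] TimerInfo timer, // runs daily at 01:00
    [Blob("sandbox", Connection = "StorageConnectionString")] CloudBlobContainer container,
    ILogger log)
{
    // Same listing, LastModified check, and DeleteAsync loop as in the
    // HTTP-triggered example below.
}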
I hope that answers the question.
I have used HTTP as the trigger since you didn't specify one and it's easier to test, but the logic would be the same for a Timer trigger etc. I also assumed C#:
[FunctionName("HttpTriggeredFunction")]
public static async Task<IActionResult> Run(
[HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
[Blob("sandbox", Connection = "StorageConnectionString")] CloudBlobContainer container,
ILogger log)
{
log.LogInformation("C# HTTP trigger function processed a request.");
// Get a list of all blobs in your container
BlobResultSegment result = await container.ListBlobsSegmentedAsync(null);
// Iterate each blob
foreach (IListBlobItem item in result.Results)
{
// cast item to CloudBlockBlob to enable access to .Properties
CloudBlockBlob blob = (CloudBlockBlob)item;
// Calculate when LastModified is compared to today
TimeSpan? diff = DateTime.Today - blob.Properties.LastModified;
if (diff?.Days > 30)
{
// Delete as necessary
await blob.DeleteAsync();
}
}
return new OkObjectResult(null);
}
Edit - how to download a JSON file and deserialize it to an object using Newtonsoft.Json:
public class MyClass
{
    public string Name { get; set; }
}

var json = await blob.DownloadTextAsync();
var myClass = JsonConvert.DeserializeObject<MyClass>(json);

How to dispose connections to services such as Azure Storage

My function stores data in Azure Data Lake Storage Gen1.
But I got the error "An error occurred while sending the request."
When I investigated, I found that the connections from my Azure Function exceed 8k, and then they break.
Here is my code (appending to a file in Azure Data Lake Storage Gen1):
// This is for authorizing against Azure Data Lake Storage Gen1
await InitADLInfo(adlsAccountName);
DataLakeStoreFileSystemManagementClient _adlsFileSystemClient; // assumed to be initialized inside InitADLInfo
// Append to the file in Data Lake Storage Gen1
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(buffer)))
{
    await _adlsFileSystemClient.FileSystem.AppendAsync(_adlsAccountName, path, stream);
}
How do I dispose of the connection when each append ends?
I tried to dispose of it with
_adlsFileSystemClient.Dispose();
but it didn't dispose anything; the connection count keeps going up.
I read this:
https://www.troyhunt.com/breaking-azure-functions-with-too-many-connections/1
and got the connection count down. The key: do NOT create a new client with every function invocation.
Example code:
// Create a single, static HttpClient
private static HttpClient httpClient = new HttpClient();

public static async Task Run(string input)
{
    var response = await httpClient.GetAsync("http://example.com");
    // Rest of function
}
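Applying the same idea to the client in the question: create the DataLakeStoreFileSystemManagementClient once and reuse it across invocations. This is only a sketch; CreateAdlCredentials() is a hypothetical stand-in for the authentication logic inside InitADLInfo.
// Create the client once and reuse it across invocations.
// CreateAdlCredentials() is a hypothetical stand-in for the authentication
// logic inside InitADLInfo.
private static readonly DataLakeStoreFileSystemManagementClient adlsClient =
    new DataLakeStoreFileSystemManagementClient(CreateAdlCredentials());

public static async Task AppendAsync(string accountName, string path, string buffer)
{
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(buffer)))
    {
        await adlsClient.FileSystem.AppendAsync(accountName, path, stream);
    }
}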
