Azure Durable orchestrator pass multiple zip files - azure

I am accepting multiple zip file which I want to process in orchestrator. My durable orchestrator is httptriggered.
I am able to access the file in http trigger as a multipartmemorystream but when I pass the same to durable orchestrator , orchestrator triggers but unable to get files for further processing.
Below is my http trigger function code to read the multiple files and pass to orchestrator
var data = req.Content.ReadAsMultipartAsync().Result;
string instanceId = await starter.StartNewAsync("ParentOrchestrator", data);
Orchestrator Trigger code:
public static async Task<List<string>> RunOrchestrator(
[OrchestrationTrigger] IDurableOrchestrationContext context
)
{
var files = context.GetInput<System.Net.Http.MultipartMemoryStreamProvider>();
To read the input I also tried to created class and pass the stream to the property so data can be serialized as JSON but did not work.
anything I am missing in code?
issue is how to get the zip files for processing.
I checked raw input under the orchestrator context , There I can see file name and other details

Passing files as input seems like a bad idea to me.
Those inputs will be loaded by the orchestrator from Table Storage/Blob Storage each time it replays.
Instead I would recommend that you upload the Zip files to Blob Storage and pass the blob URLs as input to the orchestrator.
Then you use the URLs as inputs to activities where the files are actually processed.

Orchestrator accept only the data which can be serialised. As memory stream is not serialisable it was not able to retrieve the data using GetInput<provider>().
I convert the memory stream to byte array as byte array can be serialised.
I read multiple article which was claiming that ,if we convert the file to byte array we loss the file metadata. Actually if you read file as stream and then to byte array then file data along with meta data get converted to byte array.
Here ,
read the httprequest message as multipartread this gives the object as multipartmemoryatreamprovider.
convert data to byte array
pass to orchestrator
4)receive the files as byte array by using GetInput<byte[]>()
In orchestrator convert byte array to stream MemoryStream ms = new MemoryStream(<input byte array>)

Related

add header at top in csv file of append blob

I am creating a pipeline in Azure data factory where I am using Function app as one of activity to transform data and store in append blob container as csv format .As I have taken 50 batches in for loop so 50 times my function app is to process data for each order.I am appending header in csv file with below logic.
//First I am creating file as per business logic //
//csveventcontent is my source data //
var dateAndTime = DateTime.Now.AddDays(-1);
string FileDate = dateAndTime.ToString("ddMMyyyy");
string FileName = _config.ContainerName + FileDate + ".csv";
StringBuilder csveventcontent = new StringBuilder();
OrderEventService obj = new OrderEventService();
//Now I am checking if todays file exists and if it doesn't we create it.//
if (await appBlob.ExistsAsync() == false)
{
await appBlob.CreateOrReplaceAsync(); //CreateOrReplace();
//Append Header
csveventcontent.AppendLine(obj.GetHeader());
}
Now the problem is header is appending so many times in csv file .Sometimes it is not appending at top.Probably due to parralel function app is running 50 times.
How I can fixed header at top only at one time.
I have tried with data flow and logic app also but unable to do it.If it can be handled through code that would be easier I guess.
I think you are right there. Its the concurrency of the function app that is causing the problem. The best approach would be to use a queue and process messages one by one. Or you could use a distributed lock to ensure only one function writes to the file at a time. You can use blob leases for this.
The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
Refer: Lease Blob Request Headers

What is the difference between BlobAttribute vs BlobTriggerAttribute?

Can anyone elaborate on the difference between BlobAttribute vs BlobTriggerAttribute?
[FunctionName(nameof(Run))]
public async Task Run(
[BlobTrigger("container/{name}")]
byte[] data,
[Blob("container/{name}", FileAccess.Read)]
byte[] data2,
string name)
{
}
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob?tabs=csharp#trigger
It seems BlobTrigger has all the functionality.
From the doc you could find the main difference is the blob contents are provided as input with BlobTrigger. It means it could only read the blob, not able to write the blob.
And the the BlobAttribute supports binding to single blobs, blob containers, or collections of blobs and supports read and write.
Also the BlobTrigger could only be used to read the blob when a new or updated blob is detected. And the Blob binding could be used in every kind function.
Further more information about these two binding you could check the binding code: BlobAttribute and BlobTriggerAttribute.

Optionally generate output with an Azure Function

I currently have a Timer triggered Azure Function that checks a data endpoint to determine if any new data has been added. If new data has been added, then I generate an output blob (which I return).
However, returning output appears to be mandatory. Whereas I'd only like to generate an output blob under specific conditions, I must do it all of the time, clogging up my storage.
Is there any way to generate output only under specified conditions?
If you have the blob output binding set to your return value, but you do not want to generate a blob, simply return null to ensure the blob is not created.
You're free to execute whatever logic you want in your functions. You may need to remove the output binding from your function (this is what is making the output required) and construct the connection to blob storage in your function instead. Then you can conditionally create and save the blob.

Is it possible to generate a unique BlobOutput name from an Azure WebJobs QueueInput item?

I have a continuous Azure WebJob that is running off of a QueueInput, generating a report, and outputting a file to a BlobOutput. This job will run for differing sets of data, each requiring a unique output file. (The number of inputs is guaranteed to scale significantly over time, so I cannot write a single job per input.) I would like to be able to run this off of a QueueInput, but I cannot find a way to set the output based on the QueueInput value, or any value except for a blob input name.
As an example, this is basically what I want to do, though it is invalid code and will fail.
public static void Job([QueueInput("inputqueue")] InputItem input, [BlobOutput("fileoutput/{input.Name}")] Stream output)
{
//job work here
}
I know I could do something similar if I used BlobInput instead of QueueInput, but I would prefer to use a queue for this job. Am I missing something or is generating a unique output from a QueueInput just not possible?
There are two alternatives:
Use IBInder to generate the blob name. Like shown in these samples
Have an autogenerated in the queue message object and bind the blob name to that property. See here (the BlobNameFromQueueMessage method) how to bind a queue message property to a blob name
Found the solution at Advanced bindings with the Windows Azure Web Jobs SDK via Curah's Complete List of Web Jobs Tutorials and Videos.
Quote for posterity:
One approach is to use the IBinder interface to bind the output blob and specify the name that equals the order id. The better and simpler approach (SimpleBatch) is to bind the blob name placeholder to the queue message properties:
public static void ProcessOrder(
[QueueInput("orders")] Order newOrder,
[BlobOutput("invoices/{OrderId}")] TextWriter invoice)
{
// Code that creates the invoice
}
The {OrderId} placeholder from the blob name gets its value from the OrderId property of the newOrder object. For example, newOrder is (JSON): {"CustomerName":"Victor","OrderId":"abc42"} then the output blob name is “invoices/abc42″. The placeholder is case-sensitive.
So, you can reference individual properties from the QueueInput object in the BlobOutput string and they will be populated correctly.

Azure Table Storage access time - inserting/reading from

I'm making a program that stores and reads from Azure tables some that are stored in CSV files. What I got are CSV files that that can have various number of columns, and between 3k and 50k rows. What I need to do is upload that data in Azure table. So far I managed to both upload data and retrieve it.
I'm using REST API, and for uploading I'm creating XML batch request, with 100 rows per request. Now that works fine, except it takes a bit too long to upload, ex. for 3k rows it takes around 30seconds. Is there any way to speed that up? I noticed that it takes most time when proccessing response ( for ReadToEnd() command ). I read somewhere that setting proxy to null could help, but it doesn't do much in my case.
I also found somewhere that it is possible to upload whole XML request to blob and then execute it from there, but I couldn't find any example for doing that.
using (Stream requestStream = request.GetRequestStream())
{
requestStream.Write(content, 0, content.Length);
}
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
Stream dataStream = response.GetResponseStream();
using (var reader = new StreamReader(dataStream))
{
String responseFromServer = reader.ReadToEnd();
}
}
As for retrieving data from azure tables, I managed to get 1000 entities per request. As for that, it takes me around 9s for CS with 3k rows. It also takes most time when reading from stream. When I'm calling this part of the code (again ReadToEnd() ):
response = request.GetResponse() as HttpWebResponse;
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
string result = reader.ReadToEnd();
}
Any tips?
As you mentioned you are using REST API you may have to write extra code and depend on your own methods to implement performance improvement differently then using client library. In your case using Storage client library would be best as you can use already build features to expedite insert, upsert etc as described here.
However if you were using Storage Client Library and ADO.NET you can use the article below which is written by Windows Azure Table team as supported way to improve Azure Access Performance:
.NET and ADO.NET Data Service Performance Tips for Windows Azure Tables

Resources