I want to run the same functionality (with a few changes based on the message data) from two different Event Hubs.
Is it possible to attach two consumer groups to a single function?
It did not work even though I added it to function.json.
The short answer is no. You cannot bind multiple input triggers to the same function:
https://github.com/Azure/azure-webjobs-sdk-script/wiki/function.json
A function can only have a single trigger binding, and can have multiple input/output bindings.
However, you can call the same "shared" code from multiple functions by either wrapping the shared code in a helper method, or using Precompiled Functions.
The recommended practice here is to share business logic between functions, using the fact that a single function app can be composed of multiple functions.
MyFunctionApp
| host.json
|____ business
| |____ logic.js
|____ function1
| |____ index.js
| |____ function.json
|____ function2
      |____ index.js
      |____ function.json
In "function1/index.js" and "function2/index.js"
var logic = require("../business/logic");
module.exports = logic;
The function.json of function1 and function2 can be configured to different triggers.
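For the Event Hub scenario in the question, a hedged sketch of what function1/function.json could look like (the hub name, consumer group, and connection setting below are placeholders, and the exact property names can vary a little between Functions runtime versions):

{
    "bindings": [
        {
            "type": "eventHubTrigger",
            "name": "eventHubMessages",
            "direction": "in",
            "eventHubName": "eh-1",
            "consumerGroup": "$Default",
            "connection": "EH1_CONN_STR"
        }
    ],
    "disabled": false
}

function2/function.json would point at the other event hub (or another consumer group) through its own trigger binding, while both index.js files delegate to the shared logic.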
In "business/logic.js
module.exports = function (context, req) {
    // This is where shared code goes. As an example, for an HTTP trigger:
    context.res = {
        body: "<b>Hello World</b>",
        status: 201,
        headers: {
            'content-type': "text/html"
        }
    };
    context.done();
};
Is it possible to attach two consumer groups to a single function?
I'm assuming you're looking for a trigger and don't want to do your own polling with EventProcessorClient inside your function. (You could schedule a function to periodically fetch messages from multiple event hubs via the API and process them, but then you would have to implement all the logic you get for free with triggers: polling, handling multiple partitions, checkpointing, scaling, and so on.)
A couple of workarounds:
Capture: if the event hubs are in the same namespace, you can enable Capture on all of them, then create an Event Grid trigger for your function. You'll get a message with the path of the capture file, e.g.:
{
    "topic": "/subscriptions/9fac-4e71-9e6b-c0fa7b159e78/resourcegroups/kash-test-01/providers/Microsoft.EventHub/namespaces/eh-ns",
    "subject": "eh-1",
    "eventType": "Microsoft.EventHub.CaptureFileCreated",
    "id": "b5aa3f62-15a1-497a-b97b-e688d4368db8",
    "data": {
        "fileUrl": "https://xxx.blob.core.windows.net/capture-fs/eh-ns/eh-1/0/2020/10/28/21/39/01.avro",
        "fileType": "AzureBlockBlob",
        "partitionId": "0",
        "sizeInBytes": 8011,
        "eventCount": 5,
        "firstSequenceNumber": 5,
        "lastSequenceNumber": 9,
        "firstEnqueueTime": "2020-10-28T21:40:28.83Z",
        "lastEnqueueTime": "2020-10-28T21:40:28.908Z"
    },
    "dataVersion": "1",
    "metadataVersion": "1",
    "eventTime": "2020-10-28T21:41:02.2472744Z"
}
Obviously this is not real-time: the minimum capture window you can set is 1 minute, and there might be a small delay between the time the captured Avro file is written and the time your function is invoked.
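On the receiving side, a rough sketch of the Event Grid-triggered function, assuming the in-process C# model with the Microsoft.Azure.WebJobs.Extensions.EventGrid and Azure.Messaging.EventGrid packages (the function name and the Avro-handling step are placeholders):

using System.Text.Json;
using Azure.Messaging.EventGrid;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class CaptureFileConsumer
{
    [FunctionName("CaptureFileConsumer")]
    public static void Run([EventGridTrigger] EventGridEvent eventGridEvent, ILogger log)
    {
        // The CaptureFileCreated payload carries the blob URL of the Avro file.
        using JsonDocument data = JsonDocument.Parse(eventGridEvent.Data.ToString());
        string fileUrl = data.RootElement.GetProperty("fileUrl").GetString();
        log.LogInformation($"Capture file created: {fileUrl}");

        // Download and parse the Avro file here (e.g. with Azure.Storage.Blobs),
        // then run the shared processing logic over the events it contains.
    }
}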
Multiple functions: at least in Java there is no restriction that you must have a separate class for each function, so you can do this:
public class EhConsumerFunctions {

    private void processEvent(String event) {
        // process...
    }

    @FunctionName("eh1_consumer")
    public void eh1_consumer(
            @EventHubTrigger(name = "event", eventHubName = "eh-ns", connection = "EH1_CONN_STR") String event,
            final ExecutionContext context) {
        processEvent(event);
    }

    @FunctionName("eh2_consumer")
    public void eh2_consumer(
            @EventHubTrigger(name = "event", eventHubName = "eh-ns", connection = "EH2_CONN_STR") String event,
            final ExecutionContext context) {
        processEvent(event);
    }
}
and define EH1_CONN_STR and EH2_CONN_STR in your app settings.
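The same pattern works in C# too. A hedged sketch of the equivalent in the in-process model (hub names and connection setting names are placeholders, not taken from the question):

public static class EhConsumerFunctions
{
    private static void ProcessEvent(string eventBody, ILogger log)
    {
        // shared processing logic, with any per-hub differences driven by the message data
        log.LogInformation(eventBody);
    }

    [FunctionName("eh1_consumer")]
    public static void Eh1Consumer(
        [EventHubTrigger("eh-1", Connection = "EH1_CONN_STR")] string eventBody,
        ILogger log) => ProcessEvent(eventBody, log);

    [FunctionName("eh2_consumer")]
    public static void Eh2Consumer(
        [EventHubTrigger("eh-2", Connection = "EH2_CONN_STR")] string eventBody,
        ILogger log) => ProcessEvent(eventBody, log);
}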
I am creating an architecture to process orders from an ecommerce website that receives 10,000 or more orders every hour. We use an external third-party order fulfillment service, and they have about 5 steps/APIs that we have to run, each dependent on the previous one.
I was thinking of using a fan-out/fan-in approach with durable functions.
My plan
Once the order is created on our end, we store it in a table with an "order completed" flag.
Run a timer-triggered Azure Function that starts the durable function orchestrator, which calls the activity functions for each step.
If a run fails, the timer will pick up the order again until it is completed. But my question is: should we put the order on a Service Bus queue and pick it up from there instead of using the timer trigger?
There can be more than 10,000 records each hour, so with the timer trigger we would have to query for orders that are not completed and start the durable orchestrator 10,000 times in a loop. My first question: can I run the durable function in parallel for 10,000 records?
If I use a Service Bus trigger to start the durable orchestrator, it will automatically run the Azure Function and the orchestration 10,000 times in parallel, right? But in that case I will have to build a dead-letter queue function/process so that if something fails, we can move it back to the active topic.
Questions:
Is a durable function the correct approach, or is there a better and easier approach?
If yes, is a timer trigger or a Service Bus trigger better for starting the orchestrator function?
Can I run the durable function orchestrator in parallel through a timer-triggered Azure Function? I am not talking about calling the activity functions in parallel; those cannot run in parallel because we need the output of one to be the input of the next.
This use case fits the function chaining pattern. This can be done as follows:
Have the ordering system put a message on a queue (Storage or Service Bus).
Create an Azure Function with a storage queue trigger or Service Bus trigger. This would also be the client function that starts the orchestration function.
Create an orchestration function that invokes the 5 step APIs, with one activity function for each (similar to the function chaining example).
Create five activity functions, one for each API.
Ordering system
// Requires the Azure.Messaging.ServiceBus and Azure.Identity packages.
var clientOptions = new ServiceBusClientOptions
{
    TransportType = ServiceBusTransportType.AmqpWebSockets
};

// TODO: Replace the "<NAMESPACE-NAME>" and "<QUEUE-NAME>" placeholders.
await using var client = new ServiceBusClient(
    "<NAMESPACE-NAME>.servicebus.windows.net",
    new DefaultAzureCredential(),
    clientOptions);
ServiceBusSender sender = client.CreateSender("<QUEUE-NAME>");

var message = new ServiceBusMessage($"{orderId}");
await sender.SendMessageAsync(message);
Client function
public static class OrderFulfilment
{
    [FunctionName("OrderFulfilment")]
    public static async Task Run(
        [ServiceBusTrigger("<QUEUE-NAME>", Connection = "ServiceBusConnection")] string orderId,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        log.LogInformation(orderId);
        string instanceId = await starter.StartNewAsync<string>("ChainedApiCalls", orderId);
        log.LogInformation($"Started order fulfillment orchestration with ID = '{instanceId}'.");
    }
}
Orchestration function
[FunctionName("ChainedApiCalls")]
public static async Task<object> Run([OrchestrationTrigger] IDurableOrchestrationContext fulfillmentContext)
{
try
{
// .... get order with orderId
var a = await context.CallActivityAsync<object>("ApiCaller1", null);
var b = await context.CallActivityAsync<object>("ApiCaller2", a);
var c = await context.CallActivityAsync<object>("ApiCaller3", b);
var d = await context.CallActivityAsync<object>("ApiCaller4", c);
return await context.CallActivityAsync<object>("ApiCaller5", d);
}
catch (Exception)
{
// Error handling or compensation goes here.
}
}
Activity functions
[FunctionName("ApiCaller1")]
public static string ApiCaller1([ActivityTrigger] IDurableActivityContext fulfillmentApiContext)
{
string input = fulfillmentApiContext.GetInput<string>();
return $"API1 result";
}
[FunctionName("ApiCaller2")]
public static string ApiCaller2([ActivityTrigger] IDurableActivityContext fulfillmentApiContext)
{
string input = fulfillmentApiContext.GetInput<string>();
return $"API2 result";
}
// Repeat 3 more times...
Can I use an output binding argument in a foreach, multiple times?
[FunctionName("OnClientConnectedDisconnected")]
public async Task Run(
[EventGridTrigger] EventGridEvent eventGridEvent,
[SignalR(HubName = "Lobby")] IAsyncCollector<SignalRMessage> signalRMessage,
[SignalR(HubName = "Lobby")] IAsyncCollector<SignalRGroupAction> signalRGroupMessage,
ILogger log)
{
...
...
foreach (var player in onlineFriends)
{
await signalRGroupMessage.AddAsync(
new SignalRGroupAction
{
GroupName = $"Group_{player}",
Action = GroupAction.Add,
UserId = eventGridData.UserId
}
);
}
}
Whether you can call an output binding multiple times in a function depends on the binding type and how it is written. For example, the table storage output binding will add multiple entities to a table (once for each call), but the queue binding will only post one message. I'm not seeing that the SignalR output binding allows multiple calls.
Remember that bindings are optional--an alternative strategy would be to write your own code to create your group and exit the function after the loop.
Yes, you can.
In C#, you can use the SignalR output binding multiple times; the invocation happens where you call await signalRGroupMessage.AddAsync().
I am trying to track down some occasional Non-Deterministic workflow detected: TaskScheduledEvent: 0 TaskScheduled ... errors in a durable function project of ours. It is infrequent (3 times in 10,000 or so instances).
When comparing the orchestrator code to the constraints documented here, there is one pattern we use that I am not clear on. In an effort to make the orchestrator code cleaner and more readable, we use some private async helper functions to make the actual CallActivityWithRetryAsync call, sometimes wrapped in an exception handler for logging; the main orchestrator function then awaits this helper function.
Something like this simplified sample:
[FunctionName(Name)]
public static async Task RunPipelineAsync(
    [OrchestrationTrigger] DurableOrchestrationContextBase context,
    ILogger log)
{
    // other steps

    await WriteStatusAsync(context, "Started", log);

    // other steps

    await WriteStatusAsync(context, "Completed", log);
}

private static async Task WriteStatusAsync(
    DurableOrchestrationContextBase context,
    string status,
    ILogger log)
{
    log.LogInformationOnce(context, "log message...");
    try
    {
        var request = new WriteAppDocumentStatusRequest
        {
            //...
        };

        await context.CallActivityWithRetryAsync(
            "WriteAppStatus",
            RetryPolicy,
            request);
    }
    catch (Exception e)
    {
        // "optional step" will log errors but not take down the orchestrator
        // log here
    }
}
In reality these tasks are combined and used with Task.WhenAll. Is it valid to be calling these async functions despite the fact that they are not directly on the context?
Yes, what you're doing is perfectly safe because it still results in deterministic behavior. As long as you aren't doing any custom thread scheduling or calling non-durable APIs that have their own separate async callbacks (for example, network APIs typically have callbacks running on a separate thread), you are fine.
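For example, the Task.WhenAll combination you mention stays deterministic as long as every helper only awaits durable APIs on the context. A rough sketch (the function name and status values here are made up for illustration):

[FunctionName("PipelineWithParallelStatusWrites")]
public static async Task RunParallelAsync(
    [OrchestrationTrigger] DurableOrchestrationContextBase context,
    ILogger log)
{
    // Each helper only awaits CallActivityWithRetryAsync on the same context,
    // so replay remains deterministic even when the tasks are combined.
    await Task.WhenAll(
        WriteStatusAsync(context, "StepA", log),
        WriteStatusAsync(context, "StepB", log));
}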
If you are ever unsure, I highly recommend you use our Durable Functions C# analyzer to analyze your code for coding errors. This will help flag any coding mistakes that could result in non-deterministic workflow errors.
UPDATE
Note: the current version of the analyzer will require you to add a [Deterministic] attribute to your private async function, like this:
[Deterministic]
private static async Task WriteStatusAsync(
    DurableOrchestrationContextBase context,
    string status,
    ILogger log)
{
    // ...
}
This lets it know that the private async method is being used by your orchestrator function and that it also needs to be analyzed. If you're using Durable Functions 1.8.3 or below, the [Deterministic] attribute will not exist. However, you can create your own custom attribute with the same name and the analyzer will honor it. For example:
[Deterministic]
private static async Task WriteStatusAsync(
    DurableOrchestrationContextBase context,
    string status,
    ILogger log)
{
    // ...
}

// Needed for the Durable Functions analyzer
class Deterministic : Attribute { }
Note, however, that we are planning on removing the need for the [Deterministic] attribute in a future release, as we're finding it may not actually be necessary.
We have a simple ETL process to extract data from an API to a Document DB which we would like to implement using functions. In brief, the process is to take a ~16,500 line file, extract an ID from each line (Function 1), build a URL for each ID (Function 2), hit an API using the URL (Function 3), store the response in a document DB (Function 4). We are using queues for inter-function communication and are seeing problems with timeouts in the first function while doing this.
Function 1 (index.js)
module.exports = function (context, odsDataFile) {
    context.log('JavaScript blob trigger function processed blob \n Name:', context.bindingData.odaDataFile, '\n Blob Size:', odsDataFile.length, 'Bytes');
    const odsCodes = [];
    odsDataFile.split('\n').map((line) => {
        const columns = line.split(',');
        if (columns[12] === 'A') {
            odsCodes.push({
                'odsCode': columns[0],
                'orgType': 'pharmacy',
            });
        }
    });
    context.bindings.odsCodes = odsCodes;
    context.log(`A total of: ${odsCodes.length} ods codes have been sent to the queue.`);
    context.done();
};
function.json
{
    "bindings": [
        {
            "type": "blobTrigger",
            "name": "odaDataFile",
            "path": "input-ods-data",
            "connection": "connecting-to-services_STORAGE",
            "direction": "in"
        },
        {
            "type": "queue",
            "name": "odsCodes",
            "queueName": "ods-org-codes",
            "connection": "connecting-to-services_STORAGE",
            "direction": "out"
        }
    ],
    "disabled": false
}
This function works fine when the number of IDs is in the hundreds, but it times out when it is in the tens of thousands. The building of the ID array happens in milliseconds and the function completes, but adding the items to the queue seems to take many minutes and eventually causes a timeout at the default of 5 minutes.
I am surprised that the simple act of populating the queue takes such a long time, and that the timeout for a function seems to include the time for tasks external to the function (i.e. queue population). Is this to be expected? Are there more performant ways of doing this?
We are running under the Consumption (Dynamic) Plan.
I did some testing of this from my local machine and found that it takes ~200ms to insert a message into the queue, which is expected. So if you have 17k messages to insert and are doing it sequentially, the time will take:
17,000 messages * 200ms = 3,400,000ms or ~56 minutes
The latency may be a bit quicker when running from the cloud, but you can see how this would jump over 5 minutes pretty quickly when you are inserting that many messages.
If message ordering isn't crucial, you could insert the messages in parallel. Some caveats, though:
You can't do this with Node -- it'd have to be C#. Node doesn't expose the IAsyncCollector interface to you, so it does it all behind the scenes.
You can't insert everything in parallel because the Consumption plan has a limit of 250 network connections at a time.
Here's an example of batching up the inserts 200 at a time -- with 17k messages, this took under a minute in my quick test.
public static async Task Run(string myBlob, IAsyncCollector<string> odsCodes, TraceWriter log)
{
    string[] lines = myBlob.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    int skip = 0;
    int take = 200;
    IEnumerable<string> batch = lines.Skip(skip).Take(take);

    while (batch.Count() > 0)
    {
        await AddBatch(batch, odsCodes);
        skip += take;
        batch = lines.Skip(skip).Take(take);
    }
}

public static async Task AddBatch(IEnumerable<string> lines, IAsyncCollector<string> odsCodes)
{
    List<Task> tasks = new List<Task>();
    foreach (string line in lines)
    {
        tasks.Add(odsCodes.AddAsync(line));
    }
    await Task.WhenAll(tasks);
}
As other answers have pointed out, because Azure Queues does not have a batch API, you should consider an alternative such as Service Bus queues. But if you are sticking with Azure Queues, you need to avoid outputting the queue items sequentially, i.e. some form of constrained parallelism is necessary. One way to achieve this is to use the TPL Dataflow library.
One advantage Dataflow has over using batches of tasks and doing a WhenAll(...) is that you will never have a scenario where a batch is almost done and you are waiting for one slow execution to complete before starting the next batch.
I compared inserting 10,000 items with task batches of size 32 and dataflow with parallelism set to 32. The batch approach completed in 60 seconds, while dataflow completed in almost half that (32 seconds).
The code would look something like this:
using System.Threading.Tasks.Dataflow;
...

var addMessageBlock = new ActionBlock<string>(async message =>
{
    await odsCodes.AddAsync(message);
}, new ExecutionDataflowBlockOptions { SingleProducerConstrained = true, MaxDegreeOfParallelism = 32 });

var bufferBlock = new BufferBlock<string>();
bufferBlock.LinkTo(addMessageBlock, new DataflowLinkOptions { PropagateCompletion = true });

foreach (string line in lines)
    bufferBlock.Post(line);

bufferBlock.Complete();
await addMessageBlock.Completion;
I have a Node.js timerTrigger Azure function that processes a collection and queues the processing results for further processing (by a Node.js queueTrigger function).
The code is something like the following:
module.exports = function (context, myTimer) {
    collection.forEach(function (item) {
        var items = [];

        // do some work and fill 'items'

        var toBeQueued = { items: items };
        context.bindings.myQueue = toBeQueued;
    });

    context.done();
};
This code will only queue the last toBeQueued and not each one I'm trying to queue.
Is there any way to queue more than one item?
Update
To be clear, I'm talking about queueing a toBeQueued in each iteration of forEach, not just queueing an array. Yes, there is an issue with Azure Functions because of which I cannot queue an array, but I have a workaround for it; i.e., { items: items }.
Not yet, but we'll address that within the week, stay tuned :) You'll be able to pass an array to the binding as you're trying to do above.
We have an issue tracking this in our public repo here. Thanks for reporting.
Mathewc's answer is the correct one wrt Node.
For C#, you can do this today by specifying ICollector<T> as the type of your output queue parameter.
Below is an example I have of two output queues, one of which I add via a for loop.
public static void Run(Item inbound, DateTimeOffset InsertionTime, ICollector<Item> outbound, ICollector<LogItem> telemetry, TraceWriter log)
{
    log.Verbose($"C# Queue trigger function processed: {inbound}");
    telemetry.Add(new LogItem(inbound, InsertionTime));

    if (inbound.current_generation < inbound.max_generation)
    {
        for (int i = 0; i < inbound.multiplier; i++)
        {
            outbound.Add(Item.nextGen(inbound));
        }
    }
}