I have a requirement to do file merging based on an event-driven architecture. I have two blob containers, and I need to merge files as soon as they are available in their respective containers. Correlation happens based on the file name.
That means: given two containers, container A and container B, when a file arrives in container A, the process should wait for the matching file to arrive in container B, and only then should an event be triggered that ADF or a Logic App can subscribe to for further processing. Please suggest a way to achieve this.
The Event Grid Microsoft.Storage.BlobCreated event is raised per container; it will not wait for another container to raise an event.
One option I can think of is to handle your events using a Durable Function, where you use the correlation value (the file name) as the Durable Function instance ID to identify an existing instance or start a new one. If a function instance with a given ID is already running, you'd be able to either perform the merge or raise a new custom event and handle it separately.
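A minimal sketch of that approach, assuming Durable Functions 2.x in C# (the function, event, and activity names here are hypothetical):

    using System.Threading.Tasks;
    using Azure.Messaging.EventGrid;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.DurableTask;
    using Microsoft.Azure.WebJobs.Extensions.EventGrid;

    public static class FileMerge
    {
        [FunctionName("OnBlobCreated")]
        public static async Task OnBlobCreated(
            [EventGridTrigger] EventGridEvent ev,
            [DurableClient] IDurableOrchestrationClient client)
        {
            // Correlation value: the file name at the end of the event subject, e.g.
            // "/blobServices/default/containers/a/blobs/file1.csv" -> "file1.csv".
            string fileName = ev.Subject.Substring(ev.Subject.LastIndexOf('/') + 1);

            // Use the correlation value as the instance ID: start the orchestration
            // on the first event, then just raise events to it.
            var status = await client.GetStatusAsync(fileName);
            if (status == null)
            {
                await client.StartNewAsync("MergeOrchestrator", fileName);
            }
            await client.RaiseEventAsync(fileName, "FileArrived", ev.Subject);
        }

        [FunctionName("MergeOrchestrator")]
        public static async Task MergeOrchestrator(
            [OrchestrationTrigger] IDurableOrchestrationContext context)
        {
            // Wait for the blob to show up in both containers, in either order.
            string first = await context.WaitForExternalEvent<string>("FileArrived");
            string second = await context.WaitForExternalEvent<string>("FileArrived");

            // Both files are present, so hand off to the merge step.
            await context.CallActivityAsync("MergeFiles", new[] { first, second });
        }
    }

This is simplified: production code should guard the start/raise race between two near-simultaneous events, ignore duplicate BlobCreated deliveries from the same container, and add a Durable timer so an orphaned instance times out if the partner file never arrives.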
Another option is to create a simple Distributed Event Aggregator.
The concept is based on a Lease Blob that stores the state of the EventAggregator, including all received event messages. An HttpTrigger function is responsible for handling and managing received event messages, creating and updating the state, and handling retried deliveries so the state is updated reliably. In this case, dead-lettering is used as a watchdog timer.
Creating or updating the Lease Blob generates an event for the subscriber, whose logic can inspect the state of the EventAggregator with its array of received event messages.
If eventing is not used for the Lease Blob (e.g., it lives in a separate storage account), the final event message with the EventAggregator state can instead be sent to a Storage queue by the HttpTrigger function (the EventAggregator).
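The heart of that aggregator, updating the state blob under an exclusive lease so concurrent deliveries don't overwrite each other, can be sketched as follows (Azure.Storage.Blobs SDK; the blob layout and state shape are my assumptions, not a reference implementation):

    using System;
    using System.Collections.Generic;
    using System.Text.Json;
    using System.Threading.Tasks;
    using Azure;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;
    using Azure.Storage.Blobs.Specialized;

    public static class EventAggregatorState
    {
        // Appends one received event message to the per-correlation state blob.
        public static async Task AppendEventAsync(
            BlobContainerClient container, string correlationId, string eventJson)
        {
            BlobClient blob = container.GetBlobClient($"state/{correlationId}.json");

            // First event for this correlation: create an empty state blob.
            if (!await blob.ExistsAsync())
            {
                try { await blob.UploadAsync(BinaryData.FromString("[]"), overwrite: false); }
                catch (RequestFailedException ex) when (ex.Status == 409) { /* created concurrently */ }
            }

            // Take a short exclusive lease while reading and rewriting the state.
            BlobLeaseClient leaseClient = blob.GetBlobLeaseClient();
            BlobLease lease = await leaseClient.AcquireAsync(TimeSpan.FromSeconds(15));
            try
            {
                BlobDownloadResult current = await blob.DownloadContentAsync();
                var state = JsonSerializer.Deserialize<List<JsonElement>>(current.Content.ToString());
                state.Add(JsonSerializer.Deserialize<JsonElement>(eventJson));

                // Rewriting the blob raises a new blob event; the subscriber can
                // then inspect the full array of event messages received so far.
                await blob.UploadAsync(
                    BinaryData.FromString(JsonSerializer.Serialize(state)),
                    new BlobUploadOptions
                    {
                        Conditions = new BlobRequestConditions { LeaseId = lease.LeaseId }
                    });
            }
            finally
            {
                await leaseClient.ReleaseAsync();
            }
        }
    }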
I know that there is EventGrid trigger for Azure Functions and I successfully used it in the past.
What I need now is to trigger another Azure Function only after TWO different events occurred (the order of events might be different).
In other words what I need is:
Event A occurred -> After some time Event B occurred -> Immediately execute Azure Function X
or
Event B occurred -> After some time Event A occurred -> Immediately execute Azure Function X
Is there an Azure service that allows merging or combining events from two different sources into one event? Or should I somehow persist the information about which events have occurred? Any suggestions?
This can be achieved via various services, for example:
Azure Logic Apps
You can execute your workflow based on conditions.
Using Service Bus
Consider a message as an event; based on a property inside the message, you can determine what to do with your event, as sketched below.
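A minimal sketch of the Service Bus variant (the queue, table, and property names are made up for illustration): persist each event as it arrives, keyed by a correlation ID, and kick off the follow-up work once both kinds have been seen.

    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using Azure.Data.Tables;
    using Azure.Messaging.ServiceBus;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public static class EventCorrelator
    {
        [FunctionName("EventCorrelator")]
        public static async Task Run(
            [ServiceBusTrigger("events", Connection = "ServiceBusConnection")]
            ServiceBusReceivedMessage message,
            ILogger log)
        {
            // "EventType" is an application property stamped by the publishers ("A" or "B").
            string eventType = message.ApplicationProperties["EventType"].ToString();
            string correlationId = message.CorrelationId;

            var table = new TableClient(
                Environment.GetEnvironmentVariable("StorageConnection"), "seenevents");
            await table.CreateIfNotExistsAsync();

            // Persist the fact that this event occurred for this correlation ID.
            await table.UpsertEntityAsync(new TableEntity(correlationId, eventType));

            // Once both A and B are recorded, trigger Function X (e.g. by queueing a
            // message it listens on). There is a small race window if A and B land at
            // the same instant; use Durable Functions if that matters.
            int seen = table.Query<TableEntity>(e => e.PartitionKey == correlationId).Count();
            if (seen == 2)
            {
                log.LogInformation("Both events arrived for {Id}; invoking Function X.", correlationId);
            }
        }
    }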
Another way to implement an aggregation pattern over events, with a generic and flexible solution (more events, etc.), is to ingest the AEG events into a stream pipeline (an Event Hub event handler) and use an Azure Stream Analytics job to generate the event of interest, outputting it to an Azure Function. This gives a fully declarative solution.
I have configured an Event Grid subscription to initiate a webhook call for events in a resource group when a resource is created.
The web hook call is successfully handled, and I return a 200 OK. To maintain idempotency, I store all events that have occurred in a webhook_events table with the id of the event. Any new events are checked to see if they exist in that table by their id.
Azure EventGrid attempts to remove the event from the retry queue after I return a 200 OK, but no matter how quickly I respond, EventGrid reliably retries sending.
I am receiving the same event multiple times (as I said, EventGrid always retries, as it cannot remove the event from the retry queue fast enough). That, however, is not the focus of my question; the issue is that each of these retries presents me with a different id for the event. This means that I cannot logically determine the uniqueness of an event, and my application code is not executed in an idempotent fashion.
How can I maintain idempotency between my application and Azure despite there being no unique identifier between event retries?
That's the way EventGrid is implemented. From the documentation:

If the endpoint responds within 3 minutes, Event Grid will attempt to remove the event from the retry queue on a best effort basis but duplicates may still be received.

You can use back-end code to clean up logs and stored data, using event and message IDs to identify duplicates.
The id field is in fact unique per event and kept identical between retries, and it can therefore be used for dedupe.
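So a conditional insert keyed on that id is enough for idempotency. A sketch along the lines of the question's webhook_events table (the table and connection names are made up; the Event Grid subscription validation handshake is omitted):

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Azure;
    using Azure.Data.Tables;
    using Azure.Messaging.EventGrid;
    using Microsoft.AspNetCore.Http;
    using Microsoft.AspNetCore.Mvc;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.Http;

    public static class WebhookHandler
    {
        [FunctionName("WebhookHandler")]
        public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
        {
            string body = await new StreamReader(req.Body).ReadToEndAsync();
            EventGridEvent[] events = EventGridEvent.ParseMany(BinaryData.FromString(body));

            var table = new TableClient(
                Environment.GetEnvironmentVariable("StorageConnection"), "webhookevents");
            await table.CreateIfNotExistsAsync();

            foreach (EventGridEvent ev in events)
            {
                try
                {
                    // ev.Id is stable across redeliveries, so a conditional insert
                    // doubles as the idempotency check.
                    await table.AddEntityAsync(new TableEntity("events", ev.Id));
                    // ...process the event exactly once here...
                }
                catch (RequestFailedException ex) when (ex.Status == 409)
                {
                    // Duplicate delivery of an already-processed event: skip it.
                }
            }
            return new OkResult();
        }
    }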
What you're running into is a specific issue with some events generated by Azure Resource Manager (ARM). Specifically, the two events you are seeing are in fact distinct events, not duplicates, generated by ARM at different stages of the creation flow for some resource types.
ARM acts as the API front door to the various Azure services and emits a set of events that are generalized; often, to get the details of what has occurred, you need to look in the data payload. For example, ARM will emit a success event for each 2xx status code it receives from an Azure service, so a 202 Accepted and a 201 Created can result in two events being emitted, and the only way to see the difference is in the data payload.
This is a known pain point, and we are working to emit more high-fidelity events that will be clearer and easier to react to in these scenarios. The ideal state will be a change-feed of sorts for the Azure control plane.
For Event Hubs: if we face a fault and the consumer crashes, then when it next comes up, how does it query which checkpoint it was on for the partition it takes over from storage, so that it can compare the reference sequence ID of that message with incoming messages and process only the ones that come after that sequence ID?
To save the checkpoint there is an API, but how to retrieve it?
As you know, Event Hubs checkpointing is purely client-side; i.e., you store the current offset in the storage account linked with your event hub by calling
await context.CheckpointAsync();
in your client code. This is translated into a storage account call; it does not involve any Event Hubs service call.
Whenever there is a failure, you can read the latest (updated) offset from the storage account to avoid duplicate processing of events. This must be handled by you in your client-side code; the event hub will not handle it on its own.
If a reader disconnects from a partition, when it reconnects it begins reading at the checkpoint that was previously submitted by the last reader of that partition in that consumer group. When the reader connects, it passes the offset to the event hub to specify the location at which to start reading. In this way, you can use checkpointing to both mark events as "complete" by downstream applications, and to provide resiliency if a failover between readers running on different machines occurs. It is possible to return to older data by specifying a lower offset from this checkpointing process. Through this mechanism, checkpointing enables both failover resiliency and event stream replay.
Moreover, failure in an event hub is rare and duplicate events are less frequent. For more details on building a workflow with no duplicate events, refer to this Stack Overflow answer.
The details of the checkpoint are saved in the storage account linked to the event hub in the format given below. They can be read using the WindowsAzure.Storage client to do custom validation of the sequence number of the last event received.
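A sketch of reading it back (the lease container name, the "{consumerGroup}/{partitionId}" blob path, and the exact JSON shape are assumptions based on the usual EventProcessorHost layout; verify against your own storage account):

    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;
    using Newtonsoft.Json.Linq;

    public static class CheckpointReader
    {
        // The blob content looks roughly like:
        //   { "PartitionId": "0", "Owner": "...", "Token": "...",
        //     "Epoch": 4, "Offset": "400", "SequenceNumber": 15 }
        public static async Task<(string Offset, long SequenceNumber)> ReadAsync(
            string storageConnectionString, string leaseContainerName,
            string consumerGroup, string partitionId)
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
            CloudBlobClient client = account.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference(leaseContainerName);

            // One lease/checkpoint blob per partition, under the consumer group name.
            CloudBlockBlob blob = container.GetBlockBlobReference($"{consumerGroup}/{partitionId}");
            JObject checkpoint = JObject.Parse(await blob.DownloadTextAsync());

            // Skip incoming events at or below this sequence number to avoid
            // reprocessing after a crash.
            return ((string)checkpoint["Offset"], (long)checkpoint["SequenceNumber"]);
        }
    }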
Is there a way to make an Azure Function triggerable by multiple Service Bus queues? For example, if there is a function whose logic is valid for multiple cases (event start, event end, each inserted into a different Service Bus queue) and I want to reuse it for these events, can I subscribe to both of them in Service Bus from the same function?
I was looking for an answer to this question, but so far everywhere I checked it seems to be impossible.
Azure Functions can be triggered by a single source queue or subscription.
If you'd like to consolidate multiple sources to serve as the trigger for a single function, you could forward messages to a single entity (let's assume a queue) and configure the Function to be triggered by messages in that queue. Azure Service Bus supports auto-forwarding natively.
Note that there cannot be more than 3 hops, and you cannot necessarily know what the source was if a message was forwarded from a queue. For subscriptions, there's a possible workaround: stamp the messages.
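Setting up the forwarding is one property per source entity. A sketch using Azure.Messaging.ServiceBus.Administration (the entity names are made up; the destination must exist before forwarding is configured):

    using System.Threading.Tasks;
    using Azure.Messaging.ServiceBus.Administration;

    public static class ForwardingSetup
    {
        public static async Task ConfigureAsync(string connectionString)
        {
            var admin = new ServiceBusAdministrationClient(connectionString);

            // The consolidated queue that the single Function listens on.
            await admin.CreateQueueAsync("all-events");

            // Both source queues auto-forward into it.
            await admin.CreateQueueAsync(new CreateQueueOptions("event-start")
            {
                ForwardTo = "all-events"
            });
            await admin.CreateQueueAsync(new CreateQueueOptions("event-end")
            {
                ForwardTo = "all-events"
            });
        }
    }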
If your goal is simply to reuse code, consider refactoring that Function's logic into a class that is then used by multiple functions.
If your goal is to implement event aggregation, you could probably create an Azure Durable Functions workflow that does a fan-in on multiple events.
Excerpt from https://github.com/Azure/azure-functions-durable-extension/issues/166:
Processing Azure blobs in hourly batches.
New blob notifications are sent to a trigger function using Event Grid trigger.
The event grid trigger uses the singleton pattern to create a single orchestration instance of a well-known name and raises an event to the instance containing the blob payload(s).
To protect against race conditions in instance creation, the event grid trigger is configured as a singleton using SingletonAttribute.
Blob payloads are aggregated into a List and sent to another function for processing - in this case, aggregated into a single per-batch output blob.
A Durable Timer is used to determine the one-hour time boundaries.
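The orchestrator side of that pattern looks roughly like this (a sketch; the function, event, and activity names are hypothetical):

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.DurableTask;

    public static class HourlyBlobBatcher
    {
        [FunctionName("HourlyBlobBatcher")]
        public static async Task Run(
            [OrchestrationTrigger] IDurableOrchestrationContext context)
        {
            var batch = new List<string>();

            // A single durable timer marks the end of the one-hour batch window.
            DateTime deadline = context.CurrentUtcDateTime.AddHours(1);
            Task window = context.CreateTimer(deadline, CancellationToken.None);

            while (true)
            {
                // Fan-in: collect blob payloads raised by the Event Grid trigger
                // until the window closes.
                Task<string> blobAdded = context.WaitForExternalEvent<string>("BlobAdded");
                Task winner = await Task.WhenAny(blobAdded, window);
                if (winner == window) break;
                batch.Add(blobAdded.Result);
            }

            // Hand the aggregated batch to an activity, then restart for the next
            // hour, carrying over any events raised while the batch was processing.
            await context.CallActivityAsync("ProcessBatch", batch);
            context.ContinueAsNew(null, preserveUnprocessedEvents: true);
        }
    }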
You might want to consider switching the pattern around by using multiple Topics/Subscriptions for the clients instead of separate queues.
Then the Function in question can be triggered by the Start-End Topic.
Some other Function can be triggered by the Working Topic, etc.
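With that layout, each function simply binds to the relevant subscription (a sketch; the topic and subscription names are made up):

    using Azure.Messaging.ServiceBus;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public static class StartEndHandler
    {
        // One topic carries the events; this function only sees messages that the
        // "start-end" subscription's filter lets through.
        [FunctionName("StartEndHandler")]
        public static void Run(
            [ServiceBusTrigger("events", "start-end", Connection = "ServiceBusConnection")]
            ServiceBusReceivedMessage message,
            ILogger log)
        {
            log.LogInformation("Handling {Subject} for {Id}",
                message.Subject, message.CorrelationId);
        }
    }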
I can get messages from all the partitions of an event hub in an Azure Function, but I want to get messages from one particular Event Hub partition. Is there a way to do that? The other thing I want to do is scale out the number of Azure Function instances to process messages when there is a large backlog of messages to process. How can I do that? Is there a formula to solve my second problem?
In the Azure Functions Consumption plan, scale out is handled automatically for you. If we see that your function is not keeping up with the event stream, we'll add new instances. Those instances will cooperate to process the event stream in parallel.
For reading the event stream, we rely on the Event Hubs EventProcessorHost, as described in their documentation here. This host manages coordination of partition leases with other instances when the Function App starts; this isn't something you can (or should want to) control.