Which Azure service to use for processing data from Event Hub?

I would appreciate some help picking the best-suited Azure services for my scenario; I am just beginning with Azure and my knowledge is pretty limited.
I have data from multiple sources, and of different shapes, coming into an Event Hub. I need to subscribe to the events from the Event Hub and, based on their format, process them and ultimately save them into an SQL Database. All components - the event consumers, the SQL Database - need to be hosted in the cloud.
How would I implement this in an "Azure-oriented architecture"?
In an off-cloud application, I would have competing consumers subscribing to the Event Hub. They would be console applications or Windows services, each processing the events asynchronously (this is further simplified by the event processing being idempotent).
Ideally, the Azure equivalent of the above consumers would scale up and down automatically, so I would prefer not to use VMs hosting console applications (where I would need to keep an eye on the VMs' resources myself). Scaling- and deployment-wise they would have to behave like App Services; however, I'm under the impression that those are just for web applications. I've also briefly looked at WebJobs, but those seem to poll data at various intervals, whereas I need a proper event subscriber that the Event Hub pushes data into.
Any help will be greatly appreciated!
Thank you.
Later Edit:
I've looked into WebJobs and they do allow continuous processing, so it looks like they can be used as automatically scaling subscribers.
Ideally I would like to write the code for the subscribers in F#; C# is the other option if that is not available.

You can see my post regarding IoT Hub; it's basically the same for Event Hub (each of the examples in the post can be used with Event Hubs):
https://stackoverflow.com/a/38682324/6659347
In addition, for Event Hub you can also use an Azure Function with an Event Hub trigger: a function that runs whenever the Event Hub receives a new event. That will also answer your requirement of scaling.
Make sure that if you are working with multiple consumers you use Event Hub consumer groups, so that each consumer can read the stream independently.
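To make that concrete, here is a minimal sketch of such a function using the in-process C# model; the hub name, connection setting name and consumer group below are placeholders, not values from your setup:

```csharp
// A minimal sketch (not production code) of an Azure Function with an
// Event Hubs trigger. Hub name, connection setting name and consumer
// group are placeholders.
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class EventHubSubscriber
{
    [FunctionName("ProcessEvents")]
    public static void Run(
        [EventHubTrigger("myhub",
            Connection = "EventHubConnection",
            ConsumerGroup = "$Default")] string[] events,
        ILogger log)
    {
        foreach (var body in events)
        {
            // Inspect the payload format here, route it to the matching
            // handler, and persist the result to the SQL Database.
            log.LogInformation("Received event: {Body}", body);
        }
    }
}
```

The platform scales the number of function instances with the event load, which matches the "automatically scaling subscriber" requirement.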

I'd say use a WebJob in combination with an EventProcessor. I wrote some demo code that can easily be transferred to a WebJob: https://github.com/DeHeerSoftware/SemanticLogging.EventHub/tree/master/SemanticLogging.EventHub.Processor
See https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/#receive-messages-with-eventprocessorhost for official documentation.
I've created a WebJob myself using this approach. Works like a charm.
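As a rough sketch of that EventProcessor approach (using the Microsoft.Azure.EventHubs / Microsoft.Azure.EventHubs.Processor packages; the hub name, connection strings and container name are placeholders, and the linked repo and docs have complete samples):

```csharp
// A sketch of an IEventProcessor plus host registration. The processor
// receives batches of events per partition and checkpoints its progress.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

public class SimpleEventProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context)
    {
        Console.WriteLine($"Opened processor for partition {context.PartitionId}");
        return Task.CompletedTask;
    }

    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        Console.WriteLine($"Closed processor for partition {context.PartitionId}: {reason}");
        return Task.CompletedTask;
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error)
    {
        Console.WriteLine($"Error on partition {context.PartitionId}: {error.Message}");
        return Task.CompletedTask;
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var eventData in messages)
        {
            var payload = Encoding.UTF8.GetString(
                eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
            // Idempotent processing goes here, e.g. writing to the SQL Database.
            Console.WriteLine($"Partition {context.PartitionId}: {payload}");
        }
        // Checkpoint so a restarted host resumes from here.
        await context.CheckpointAsync();
    }
}

public static class Program
{
    public static async Task Main()
    {
        var host = new EventProcessorHost(
            "myhub",                                   // Event Hub name (placeholder)
            PartitionReceiver.DefaultConsumerGroupName,
            "<event-hub-connection-string>",
            "<storage-connection-string>",             // used for lease/checkpoint storage
            "lease-container");

        await host.RegisterEventProcessorAsync<SimpleEventProcessor>();
        Console.ReadLine();                            // keep the (Web)Job alive
        await host.UnregisterEventProcessorAsync();
    }
}
```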

Related

Capture SSE (Server-Sent Events) in Azure

I am having trouble identifying the best "tool" to solve the problem. I am using a Python library which publishes its data via Server-Sent Events (SSE) (see https://github.com/wattsight/wapi-python/blob/development/wapi/events.py).
I would like to constantly listen for new events. However, I am not sure which tool in Azure is appropriate. An Azure Function would have to run continuously, which seems like a misuse; SignalR requires control over the "sender" of the events; and I don't know if Event Hub would be able to manage that job.
Thank you for letting me learn from your experience.
Azure Event Hubs is the right service to receive new events. Besides that, it also provides other benefits like scalability, event storage, etc.
You can also consider using an Azure Function with an Event Hub trigger, but note that it has a limitation on the maximum size of incoming messages.
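If it helps to make the ingestion side concrete, here is a hedged sketch (in C# rather than the Python library mentioned; the SSE endpoint and connection string are placeholders) of a long-running worker that listens to an SSE stream and forwards each event to Event Hubs:

```csharp
// A sketch: read a Server-Sent Events stream with HttpClient and forward
// each "data:" line to Event Hubs via the Microsoft.Azure.EventHubs client.
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

public static class SseToEventHub
{
    public static async Task RunAsync()
    {
        var client = EventHubClient.CreateFromConnectionString(
            "<event-hub-connection-string>");

        // Infinite timeout: an SSE response is a never-ending stream.
        using var http = new HttpClient { Timeout = Timeout.InfiniteTimeSpan };
        using var stream = await http.GetStreamAsync("https://example.com/events"); // placeholder SSE endpoint
        using var reader = new StreamReader(stream);

        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // SSE payload lines start with "data:"; forward each one as an event.
            if (line.StartsWith("data:"))
            {
                var payload = line.Substring(5).Trim();
                await client.SendAsync(new EventData(Encoding.UTF8.GetBytes(payload)));
            }
        }
    }
}
```

Such a worker could be hosted, for instance, as a continuous WebJob so it keeps running.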

EventGrid vs EventHub

I am working on a Service Fabric application and want to publish a few events from this application and subscribe to or process those published events in another application.
I have tried the Event Grid approach and observed that there is a delay between publishing and processing the events. So now I am looking at other alternatives like Event Hub or Queues, etc.
If anyone has already used Event Grid, Event Hub, Queues, etc., please suggest which one will give more performance when dealing with a large number of events.
Design Approach
We have migrated the tables from SQL Server to Service Fabric. There is a view in SQL Server, and we are planning to implement that as a service in Service Fabric.
The implementation logic is as follows:
Table 1 is implemented as a service, and we publish an event for each CRUD operation to Event Grid / Event Hub.
Table 2 is implemented as a service, and we publish an event for each CRUD operation to Event Grid / Event Hub.
We have created a view service that listens for these events; whenever an event arrives on Event Grid / Event Hub, it performs the required calculations and stores the result in the view service (it is a background job).
We are looking for the messaging service that gives the best performance.
Have you seen this comparison and this one?
Anyway, can you clarify your requirements in terms of throughput and performance? It depends on a lot of factors including, but not limited to, the message size and the number of messages.
Having used both Event Grid and Event Hub, I'd say Event Hub works very well for many messages per second, say data streams from IoT devices, but the performance of the downstream processing can be a bottleneck: you have to process events very fast in order to keep up with the incoming ones. Partitions and consumer groups can help balance the load and give different processors their own view of the same data stream (for example, a fast processor for live display of sensor data and a slower one that stores the data for later analysis).
If you're talking about a few events generated by an application that trigger other apps to start doing some work based on those events, Event Grid is a good fit. I haven't experienced much delay in receiving those events.
But bottom line, I think all services (Event Grid, Event Hub, Service Bus etc) support different use cases and that should be your first decision point.
Can you describe your publisher, subscriber, etc., and show your metrics for the Azure Event Grid usage?
You can use the portal screen snippets for the topic (publisher) and the subscription (subscriber).
The following screen snippets are from my tester after manually firing a few events.
[Screenshots: publisher side, subscriber side, and metrics on the portal]
As you can see, the delivery destination processing time is ~1 ms, and the latency on the publisher side (custom topic) is between 2-4 ms.
Note that AEG (Azure Event Grid) is a loosely decoupled Pub/Sub model with PUSH->PUSH-ACK or PUSH->PULL-ACK eventing, in contrast to the Event Hub model, which is based on a PUSH->PULL mechanism; in other words, with Event Hub you need to host a listener and receiver that pull events from the partition.
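For context, a tester like the one behind these numbers can be sketched with the Microsoft.Azure.EventGrid SDK; this is a hedged example, and the topic endpoint, key, event type and payload are all placeholders:

```csharp
// A minimal sketch of a custom-topic publisher for Azure Event Grid.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.EventGrid;
using Microsoft.Azure.EventGrid.Models;

public static class GridTester
{
    public static async Task FireTestEventAsync()
    {
        // The SDK wants only the hostname of the topic endpoint.
        var topicHostname = new Uri("https://<topic-name>.<region>.eventgrid.azure.net/api/events").Host;
        var client = new EventGridClient(new TopicCredentials("<topic-key>"));

        var events = new List<EventGridEvent>
        {
            new EventGridEvent
            {
                Id = Guid.NewGuid().ToString(),
                EventType = "MyApp.Table.RowChanged",   // hypothetical event type
                Subject = "table1/crud",
                Data = new { operation = "update", rowId = 42 },
                EventTime = DateTime.UtcNow,
                DataVersion = "1.0"
            }
        };

        await client.PublishEventsAsync(topicHostname, events);
    }
}
```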

How to consume events delivered by Azure Event Grid to GCP

Basically, what I understood from a few Azure topics is as below:
Azure Event Hub - where data is received initially and converted into events
Service Bus - acting as a queue
Azure Event Grid - where the events created in the hub are transferred
so the connection is like below:
Hub -> Service Bus -> Event Grid -> Pub Sub -> Storage
I understood this concept. My problem is that I want data to be pushed from Event Grid to GCP (subscriptions/topics). My questions are:
How can I establish this using a PUSH method?
What exactly do I need to develop?
How can I push things from Event Grid to Pub/Sub topics/subscriptions?
I found this link where data is being published into Event Grid, but I want to push data from Event Grid to GCP. Can anybody explain to me where I am going wrong, or what exactly I should start with? I am new to this and it is very confusing, so I just need a little bit of guidance here.
I have the following doubts:
Is there any direct subscriber option available with the Event Grid listener? I mean, can I link my Google storage account directly with this listener, so that whenever an event is triggered it is pushed straight to my GCP account? (I don't have an Azure account right now, since an access request is still in progress, so I can't check this myself; that's why I am asking here.)
Suppose I have 20 columns in my data but I want only 16 of them pushed to GCP; is there any customization possible while sending data from Event Grid / Event Hub to Pub/Sub?
If I write custom connector code as per the links provided in the answers below, how can I run it? That is, where can I deploy those scripts in the cloud so that they are triggered automatically whenever an event fires?
Can I implement webhooks in this scenario (as an alternative to connectors)? If yes, how can I do that, and on which side do I need to create them?
Also, I read some articles and learned that some people experienced data loss in this entire process. How likely is that here, and how can it be avoided?
Can anybody explain to me where I am going wrong, or what exactly I should start with?
It's right here:
so the connection is like below:
Hub -> Service Bus -> Event Grid -> Pub Sub -> Storage
Although this might be the case, it sounds very much as if you're looking at one (very) specific scenario where data flows in this exact way.
Azure Event Hub, Azure Service Bus and Azure Event Grid can work together, but they can also be used completely independently of each other.
Event Grid
The purpose of Event Grid is to enable Reactive programming. Use this when you want to react to (status) changes.
Event Hubs
Event Hubs facilitate a big data pipeline. Use this when you need telemetry and distributed data streaming.
Service Bus
The purpose of Service Bus is to enable high-value enterprise messaging. Use this when you want to do something like order processing and financial transactions.
In some cases, you use the services side by side to fulfill distinct roles. For example, an ecommerce site can use Service Bus to process the order, Event Hubs to capture site telemetry, and Event Grid to respond to events like an item was shipped.
In other cases, you link them together to form an event and data pipeline. You use Event Grid to respond to events in the other services. For an example of using Event Grid with Event Hubs to migrate data to a data warehouse, see Stream big data into a data warehouse.
Taken from the very interesting and important documentation article Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus
EDIT
My problem is I want data to be pushed from Event Grid to GCP (subscriptions/topics). So how can I establish this using a PUSH method?
Possibly the simplest solution is to have an Event Grid Event trigger a webhook (which might run an Azure Function or a Google Cloud Function) which in turn puts the event/message on the GCP Topic.
Publishing messages is quite well documented. There are examples of how to do so with a REST call, the command line, C#, Go, Java, Node.js, PHP, Python and Ruby.
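A hedged sketch of that webhook idea, written as an Azure Function with an Event Grid trigger that forwards each event to a Pub/Sub topic through the Pub/Sub REST API; the project, topic and bearer-token acquisition below are placeholders, not working values:

```csharp
// A sketch of an Event Grid-triggered function that republishes the event
// payload to a Google Cloud Pub/Sub topic via its REST publish endpoint.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventGrid.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Newtonsoft.Json;

public static class GridToPubSub
{
    private static readonly HttpClient Http = new HttpClient();

    [FunctionName("ForwardToPubSub")]
    public static async Task Run([EventGridTrigger] EventGridEvent gridEvent)
    {
        // Pub/Sub expects the message data base64-encoded.
        var data = Convert.ToBase64String(
            Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(gridEvent.Data)));
        var body = JsonConvert.SerializeObject(new { messages = new[] { new { data } } });

        using var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://pubsub.googleapis.com/v1/projects/<project-id>/topics/<topic-id>:publish")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        // Placeholder: a real implementation must obtain an OAuth 2.0 access
        // token for a GCP service account with Pub/Sub publish permissions.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", "<access-token>");

        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```

Note that this also answers the "which side" webhook question: the webhook/function lives on the Azure side, as the Event Grid subscription's endpoint.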
EDIT 2
What you need to do is create an Event Grid Subscription to listen to and handle Event Grid Events.
Here's an example screenshot on how to listen for events for a specific Storage Account and call a WebHook whenever such an event occurs:
Pay attention to the "Endpoint Details": that's where you can specify to, for instance, call a webhook every time an event is triggered.
The easiest way to transfer the Event Hub-generated events would probably be to create an Event Hub event receiver in Node.js (which you mentioned in your comments), as described here, which receives events and publishes them directly to Cloud Pub/Sub, as described in the Cloud Pub/Sub publisher documentation for Node.js.

How to forward a message from Azure estate to NodeJS app?

I'm creating an IoT solution. On my web app I want a table on the dashboard that updates alert events in real time.
I currently have an API, written in Node.js, that receives the JSON and invokes Socket.IO to update the table. This solution seems a bit clunky.
I'm wondering whether there is a more seamless way to do this, similar to how the different components within Azure link together.
I've looked into Azure Queues and having Node.js subscribe and consume from the queue, but as far as I could find there is no way to persist a queue connection from Node.js, so I would have had to poll continually.
I've looked into Power BI as well but I need to be able to completely change and modify every aspect of the design too.
The following is what I have so far:
Devices send data to IoT Hub. This is then processed by an Azure Stream Analytics job and, if a certain criterion is met, the message is sent to DocumentDB for storage and also to a Service Bus queue. I have a Logic App which is triggered by a message arriving on the Service Bus queue and then POSTs the data to my API.

Subscribe and process events from RabbitMQ in Azure

For my new project, every component is going to be deployed in Azure. I have a third-party application that processes events using RabbitMQ, and I want to subscribe to these events and process them so that the data they carry is stored in my own database.
What would be the best way to go? Using WebJobs and writing my own custom trigger/binder for RabbitMQ?
Thanks in advance for any advice.
Based on your requirements, I assume an Azure WebJob is an ideal approach for your purpose. You could use a WebJob as a consumer client to subscribe to the events and process the data. Try creating a WebJob, following the link provided by Mitra, to subscribe to the events and implement your processing logic.
Note that WebJobs run as background processes in the context of an Azure Web App. In order to keep your WebJob running continuously, the app needs to run in Standard mode or higher with the "Always On" setting enabled.
As for scaling, you can use the Azure Websites scale feature to run extra WebJob instances; see this tutorial for details.
For subscription-based routing, you can use topics in RabbitMQ. Using topics, you can push events to specific queues, and the consumers on those queues can then process the events and write the data to the database. The only thing to take care of is having a correct routing key for each queue.
That way you get a subscription-based mechanism. The one caveat of this approach is that there will be one queue per event type.
The benefit of having one queue per event type is that it is easy to keep track of events, which makes debugging easier.
If the number of event types is very large, you can instead have a single queue, but then after consuming a message you have to dispatch the appropriate event yourself.
Here is a tutorial for reference:
https://www.rabbitmq.com/tutorials/tutorial-five-python.html
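Building on that tutorial, here is a minimal C# sketch of such a topic-exchange subscriber (using the RabbitMQ.Client library; the host, exchange name and routing key are placeholders):

```csharp
// A sketch of a RabbitMQ topic-exchange subscriber: declares an exclusive
// queue, binds it to the routing keys it cares about, and processes messages.
using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public static class TopicSubscriber
{
    public static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" }; // placeholder host
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();

        channel.ExchangeDeclare(exchange: "events", type: ExchangeType.Topic);
        var queueName = channel.QueueDeclare().QueueName;

        // Bind the queue to the routing keys this consumer cares about.
        channel.QueueBind(queue: queueName, exchange: "events", routingKey: "orders.#");

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (model, ea) =>
        {
            var message = Encoding.UTF8.GetString(ea.Body.ToArray());
            // Process the event and write its data to the database here.
            Console.WriteLine($"[{ea.RoutingKey}] {message}");
        };
        channel.BasicConsume(queue: queueName, autoAck: true, consumer: consumer);

        Console.WriteLine("Waiting for events. Press [enter] to exit.");
        Console.ReadLine();
    }
}
```

Run as a continuous WebJob (with "Always On", as noted above), this gives you the subscriber without needing a custom trigger/binder.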
