I'm evaluating the use of Azure Event Hubs vs. Kafka as a service broker. I was hoping to create two local apps side by side, one consuming messages using Kafka and the other using Azure Event Hubs. I've got a Docker container set up running a Kafka instance, and I'm in the process of setting up Azure Event Hubs using my Azure account (as far as I know, there's no other way to create a local/development instance of Azure Event Hubs).
Does anyone have any information regarding the two that might be useful when comparing their features?
I can't add a comment directly, but the currently top-rated answer has the line
"Kafka can have multiple topics each Azure Event Hub is a single topic."
This is misleading as it makes it sound like you can't have multiple topics, which you can.
As per https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview#kafka-and-event-hub-conceptual-mapping an "Event Hub" is a topic while an "Event Hub Namespace" is the Kafka cluster.
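This mapping also means a stock Kafka client can talk to Event Hubs directly, which makes the side-by-side evaluation from the question straightforward. Here is a minimal sketch using the confluent-kafka Python client, assuming a hypothetical namespace "mynamespace" and event hub (topic) "mytopic", with the connection string truncated:

```python
from confluent_kafka import Producer

conf = {
    # The namespace FQDN is the only "broker" the client ever sees.
    "bootstrap.servers": "mynamespace.servicebus.windows.net:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    # Event Hubs accepts the literal username "$ConnectionString" with
    # the namespace connection string as the password.
    "sasl.username": "$ConnectionString",
    "sasl.password": "Endpoint=sb://mynamespace.servicebus.windows.net/;...",
}

producer = Producer(conf)
producer.produce("mytopic", key=b"device-1", value=b'{"reading": 42}')
producer.flush()
```

The same app can then be pointed at the local Kafka container just by swapping bootstrap.servers and dropping the SASL settings.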
This decision is usually driven by a broader architectural choice: if you are choosing Azure as your IaaS and PaaS solution, then Event Hubs provides great integration within the Azure ecosystem, but if you want to avoid vendor lock-in, Kafka is the better option.
Operationally, if you want a fully managed service, Event Hubs gives you that out of the box; with Kafka you can also get it through the Confluent platform.
Maturity-wise, Kafka is older and has a large community, so you have broader support.
Feature-wise, the Azure ecosystem has equivalents for what the Kafka ecosystem provides, but Event Hubs on its own lacks a few features compared to Kafka.
I think this link can help you extend your understanding: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage and monitor and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace. For more information about Event Hubs and namespaces, see Event Hubs features. As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster. Scale in Event Hubs is controlled by how many throughput units you purchase, with each throughput unit entitling you to 1 Megabyte per second, or 1000 events per second of ingress and twice that volume in egress. Event Hubs can automatically scale up throughput units when you reach the throughput limit if you use the Auto-Inflate feature; this feature also works with the Apache Kafka protocol support.
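To make the "single endpoint, no brokers" point concrete, here is a minimal sketch using the azure-eventhub Python SDK; the connection string and hub name are hypothetical placeholders:

```python
from azure.eventhub import EventData, EventHubProducerClient

# The client is constructed from the namespace endpoint alone; there is
# no broker list to maintain.
producer = EventHubProducerClient.from_connection_string(
    "Endpoint=sb://mynamespace.servicebus.windows.net/;...",
    eventhub_name="mytopic",
)

with producer:
    batch = producer.create_batch()          # enforces the batch size limit
    batch.add(EventData('{"reading": 42}'))  # payloads are opaque bytes/str
    producer.send_batch(batch)
```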
You can find more on feature comparison here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
Kafka can have multiple topics; each Azure Event Hub is a single topic. Kafka running inside a container means you have to manage it. Azure Event Hubs is a PaaS, which means they manage the platform side. If you don't know how to make Kafka redundant, reliable, and scalable, you may want to go with Azure Event Hubs or any PaaS that offers a similar pub/sub model. The Event Hubs platform is already scalable, reliable, and redundant.
You should compare:
the administration capabilities / effort (as previously said)
the functional capabilities, such as competing consumers and pub/sub patterns
the performance: you should consider Kafka if you plan to exceed the Event Hubs quotas
Right now, on IoT Hub there is a stated limit of 8,000 messages per day. I would like to ask you about the patterns that are used in Azure for this.
I am curious whether I can send data to Azure through some service outside of IoT Hub messages, in order to prevent IoT Hub from being overloaded by a big amount of data, or to preserve some confidentiality for this service.
For example, I would like to send the non-confidential data from a given service as IoT Hub messages, and the other data by using a WebSocket or some REST protocol. I think there are patterns that serve these scenarios.
Does anyone have experience with that kind of situation?
Not everything needs to go through IoT Hub. IoT Hub is great for two-way communication to/from IoT devices. You could also look at Event Hubs for ingestion from devices that don't need two-way comms. We have a write-up on the differences here: Connecting IoT Devices to Azure: IoT Hub and Event Hubs.
I am working on a Service Fabric application and want to publish a few events from this application and subscribe to or process those published events in another application.
I have tried Event Grid and observed that there is a delay between publishing and processing the events. So now I am looking for other alternatives, like Event Hubs or queues.
If anyone has already used Event Grid, Event Hubs, queues, etc., please suggest which one will give more performance when we deal with a large number of events.
Design Approach
We have migrated the tables from SQL Server to Service Fabric. There is a view in SQL Server, and we are planning to implement that as a service in Service Fabric.
The implementation logic is as follows.
Table 1 is implemented as a service, and we publish an event for each CRUD operation to Event Grid / Event Hubs.
Table 2 is implemented as a service, and we publish an event for each CRUD operation to Event Grid / Event Hubs.
We have created a view service that listens for events; when any event is sent to Event Grid / Event Hubs, it performs the required calculations and stores the result in the view service (as a background job).
We are looking for the messaging service that gives the best performance.
Have you seen this comparison and this one?
Anyway, can you clarify your requirements in terms of throughput and performance? It depends on a lot of factors, including, but not limited to, the message size and the number of messages.
Having used both Event Grid and Event Hubs, I'd say Event Hubs works very well for many messages per second, say data streams from IoT devices, but the performance of the downstream processing can be a bottleneck. You have to process events very fast in order to receive new ones. Then there are partitions and consumer groups that can help balance the load and give you different processors for the same data, each with a different view of the data stream (a fast processor for live display of sensor data and a slower one for storing the data for later analysis).
If you're talking about a few events generated by an application that triggers other apps to start doing some work based on those events Event Grid is a good fit. I haven't experienced much delay in receiving those events.
But bottom line, I think all these services (Event Grid, Event Hubs, Service Bus, etc.) support different use cases, and that should be your first decision point.
Can you describe your publisher, subscriber, etc. and show your metrics for your Azure Event Grid usage?
You can use the portal screen snippets on the topic (publisher) and subscription (subscriber).
The following screen snippets are from my tester after manually firing a few events.
Publisher side:
Subscriber side:
Metrics on the portal:
As you can see, the delivery destination processing time is ~1 ms. The latency on the publisher side (custom topic) is between 2 and 4 ms.
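For reference, a tester like this can be as small as the following sketch using the azure-eventgrid Python SDK; the topic endpoint and access key are hypothetical placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.eventgrid import EventGridEvent, EventGridPublisherClient

client = EventGridPublisherClient(
    "https://mytopic.westeurope-1.eventgrid.azure.net/api/events",
    AzureKeyCredential("<topic-access-key>"),
)

# Fire a few events manually, as in the screen snippets above.
client.send([
    EventGridEvent(
        subject="tester/manual",
        event_type="Tester.Fired",
        data={"n": i},
        data_version="1.0",
    )
    for i in range(3)
])
```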
Note that AEG (Azure Event Grid) is a PUSH->PUSH-ACK or PUSH->PULL-ACK, loosely decoupled Pub/Sub eventing model, in contrast to the Event Hubs model, which is based on a PUSH->PULL mechanism; in other words, with Event Hubs you need to host a listener/receiver that pulls events from the partition.
I am evaluating message streaming services on Azure. I want the most reliable real-time message processing service, where messages carry a high degree of importance and data must not be lost. Basically, I want to make real-time data transmitted from some third-party cloud available to the API I have hosted on Azure (I have exposed the API to the third party so that they can send data).
Following are the options I have worked through.
Event Hubs and IoT Hub are used mostly for telemetry data/events, so I am excluding those; in my use case each message carries great value.
I am thinking of using Service Bus or Kafka on HDInsight.
Now, Service Bus offers more features compared to Kafka and also provides very good documentation on how to use it.
But in the documentation I couldn't find anywhere that Service Bus is used for real-time processing, whereas documentation is available stating that Kafka should be used for real-time processing:
https://learn.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/real-time-ingestion
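For what it's worth, the property that matters for "data must not be lost" is Service Bus's peek-lock receive mode: a message is only removed after the receiver explicitly completes it, so a crash mid-processing leads to redelivery rather than loss. A minimal sketch with the azure-servicebus Python SDK (the connection string, queue name, and handler are hypothetical placeholders):

```python
from azure.servicebus import ServiceBusClient

def process(msg):
    # Hypothetical business logic; raising here means the message is NOT
    # completed, so its lock expires and it is redelivered.
    print(str(msg))

client = ServiceBusClient.from_connection_string(
    "Endpoint=sb://mynamespace.servicebus.windows.net/;..."
)

with client:
    receiver = client.get_queue_receiver(queue_name="important-data")
    with receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            process(msg)
            receiver.complete_message(msg)  # delete only after success
```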
Which would be the best service among the above for my use case? Is there any other better option that I have not thought of?
Let's say I've got an Azure Service Bus in a microservice scenario.
One microservice pushes master data changes to the other services via a subscription.
Now let's say a new service is introduced and subscribes to the master data service. How can I make sure that the new service receives all necessary data?
Do I have to resend all master data from the master data service, or does Azure Service Bus (or an alternative) provide some feature for that?
As far as I know, there is no way to achieve what you want within the capabilities of Azure Service Bus. Also, I don't think this is what Service Bus is there for.
Of course, there is a configurable "time to live" value for messages within queues and topics, which could probably be set to some really high value, but this would still not make your master data infinitely available for future services. And, though this is just my opinion and I'm far from being an expert, I wouldn't want to load up my service bus with potentially thousands or even millions of messages (depending on what you're doing) without them being processed quickly.
For your specific concern I'd rather implement something like a "master data import service" without any service bus integration. Details of this, however, depend on your environment and specific requirements.
A couple of points:
1) This is not possible with Azure Service Bus. Even if you set the TTL at the topic level, the messages will only be delivered to the subscriptions available at that point in time; you can't read messages directly from a topic.
2) You can consider the Event Hubs option, where you can create a new consumer group with an offset from which you want to start reading messages, but Event Hubs has a maximum retention period of 7 days. If you need message retention beyond 7 days, enabling Event Hubs Capture on your event hub pulls the data from your event hub into a Storage account. But in this case you would need additional logic to read from that storage account to replay the messages.
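A minimal sketch of point 2 with the azure-eventhub Python SDK: a brand-new consumer group reading from the earliest retained event (the names and connection string are hypothetical placeholders):

```python
from azure.eventhub import EventHubConsumerClient

consumer = EventHubConsumerClient.from_connection_string(
    "Endpoint=sb://mynamespace.servicebus.windows.net/;...",
    consumer_group="new-service",   # created for the newly introduced service
    eventhub_name="master-data",
)

def on_event(partition_context, event):
    print(partition_context.partition_id, event.body_as_str())

with consumer:
    # starting_position="-1" means "from the earliest retained event".
    consumer.receive(on_event=on_event, starting_position="-1")
```

Note that this only replays what is still within the retention window, which is exactly the 7-day limitation mentioned above.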
I'm currently building a hybrid-cloud solution that needs to write messages to a queue for later processing. It is absolutely imperative that the queue is highly available (99.999+% uptime).
My options are to read/write messages to a local ZeroMQ high availability pair, or an Azure Service Bus. I would prefer to go the Azure Service Bus route, but can't find any documentation regarding high availability configuration for Azure Service Bus.
Has anyone had success setting up Azure Service Bus for high availability? I understand that the SLA for a single instance of any Azure service cannot be changed. I'm thinking more along the lines of the failover capabilities of Azure Web Apps.
The main thing you can do to consume a service at higher availability than its SLA is to ensure you are handling retry logic. The key here is the temporal nature of any outage and tuning a retry backoff to handle edge cases. Some use linear or exponential backoff to wait progressively longer for the service to come back up.
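A generic sketch of that retry-with-backoff idea in Python; the send callable and the limits here are hypothetical, and most Azure SDKs ship built-in retry policies that should be preferred where available:

```python
import random
import time

def send_with_retry(send, message, max_attempts=6, base_delay=0.5):
    for attempt in range(max_attempts):
        try:
            return send(message)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff with jitter to avoid synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```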
Also, you can have more than one Service Bus namespace in a different region for geo-redundancy, either load-balancing messages across the two or using one as a hot backup. This can get you around any regional outages and keep your service up when one data center is not meeting its local SLA.
You can find the SLA for Azure Service Bus here: legal/sla/service-bus/v1_0/
For Service Bus Relays, we guarantee that at least 99.9% of the time, properly configured applications will be able to establish a connection to a deployed Relay. For Service Bus Queues and Topics, we guarantee that at least 99.9% of the time, properly configured applications will be able to send or receive messages or perform other operations on a deployed Queue or Topic. For Service Bus Basic and Standard Notification Hub tiers, we guarantee that at least 99.9% of the time, properly configured applications will be able to send notifications or perform registration management operations with respect to a Notification Hub. For Event Hubs Basic and Standard tiers, we guarantee that at least 99.9% of the time, properly configured applications will be able to send or receive messages or perform other operations on the Event Hub.
We've had Service Bus Relay up and running for 5+ years and have had one outage. It was an outage at the specific data center the relay was provisioned in, and it touched many services. After that, we implemented redundancy by provisioning a secondary Service Bus Relay namespace in a different data center location. The code was reconfigured to check connectivity on every connection and switch the primary and secondary connections if needed. We treated them as equals, so once we "failed over", that namespace would become the primary.
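A sketch of that "treat both namespaces as equals" swap pattern; the connect callable and the connection strings are hypothetical placeholders:

```python
CONNECTIONS = [
    "Endpoint=sb://primary-ns.servicebus.windows.net/;...",
    "Endpoint=sb://secondary-ns.servicebus.windows.net/;...",
]

def get_connection(connect):
    """Try the current primary first; on failure, promote the one that works."""
    for i, conn_str in enumerate(CONNECTIONS):
        try:
            client = connect(conn_str)
            if i != 0:
                # The working namespace becomes primary for the next call.
                CONNECTIONS[0], CONNECTIONS[i] = CONNECTIONS[i], CONNECTIONS[0]
            return client
        except Exception:
            continue
    raise ConnectionError("both namespaces are unreachable")
```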
Service Bus now supports Geo-disaster recovery and Geo-replication at the namespace level.
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-geo-dr
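With geo-DR, the client side stays simple: you connect through the alias connection string, and after a failover the alias resolves to whichever namespace is currently the primary. A minimal sketch with the azure-servicebus Python SDK (the alias connection string and queue name are hypothetical placeholders):

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

# The alias connection string survives failover: clients keep using it
# without any code changes.
client = ServiceBusClient.from_connection_string(
    "Endpoint=sb://my-geo-alias.servicebus.windows.net/;..."
)

with client:
    sender = client.get_queue_sender(queue_name="orders")
    with sender:
        sender.send_messages(ServiceBusMessage("hello after failover"))
```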