Upon reading on AZURE Event Hub,
I note that we can send data via
http(s)
AMQP
KAFKA
As I am not an integration (messaging) expert, the following then:
Can I use both AMQP and http(s) to write to the same Event Hub Topic
and subsequently can a single AZURE Function read from that same single Event Hub Topic regardless of how written to?
For KAFKA, this will need to be always a separate Event Hub (Topic) is my understanding.
The AZURE EVENT HUB KAFKA look-like API means that, if you, say, all send a JSON format using all 3 protocols, they can be mapped to the same Event Hub (= Topic), and one can read the Event Hub in KAFKA mode, say.
This is a good read https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-exchange-events-different-protocols but I checked with a more experienced person to confirm.
I'm evaluating the use of Azure Event Hub vs Kafka as a Service Broker. I was hoping I would be able to create two local apps side by side, one that consumes messages using Kafka with the other one using Azure Event Hub. I've got a docker container set up which is a Kafka instance and I'm in the process of setting up Azure Event hub using my Azure account (as far as I know there's no other way to create a local/development instance for Azure Event Hub).
Does anyone have any information regarding the two that might be useful when comparing their features?
Can't add a comment directly, but the currently top rate answer has the line
Kafka can have multiple topics each Azure Event Hub is a single topic.
This is misleading as it makes it sound like you can't have multiple topics, which you can.
As per https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview#kafka-and-event-hub-conceptual-mapping an "Event Hub" is a topic while an "Event Hub Namespace" is the Kafka cluster.
This decision usually is driven by a broader architectural choice if you are choosing azure as your iaas and paas solution then event hub provides a great integration within the azure ecosystem but if you don't want a vendor lock in kafka is better option.
Operationally also if you want fully managed service then with event hub it's out of the box but with kafka you also get this with confluent platform.
Maturity wise kafka is older and with large community you have a larger support.
Feature wise what kafka ecosystem provides azure ecosystem has those things but if you talk about only event hub then it lacks few features compared to kafka
I think this link can help you extend your understanding https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage and monitor and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace. For more information about Event Hubs and namespaces, see Event Hubs features. As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster. Scale in Event Hubs is controlled by how many throughput units you purchase, with each throughput unit entitling you to 1 Megabyte per second, or 1000 events per second of ingress and twice that volume in egress. Event Hubs can automatically scale up throughput units when you reach the throughput limit if you use the Auto-Inflate feature; this feature work also works with the Apache Kafka protocol support.
You can find more on feature comparison here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
Kafka can have multiple topics each Azure Event Hub is a single topic. Kafka running inside a container means you have to manage it. Azure Event Hub is a PaaS which means they managed the platform side. If you don't know how to make Kafka redundant, reliable, and scalable you may want to go with Azure Event Hubs or any PaaS that offers a similar pub/sub model. Event Hub platform is already scalable, reliable, and redundant.
You should compare
the administration capabilites / effort (as previously said)
the functional capabilities such as competing customer and pub/sub patterns
the performance : you should consider kafka if you plan to exceed the event hub quotas
How to setup Spark structured streaming session for Azure service bus?
I'm currently using azure databricks as consumer for one of the subscription to Service Bus Topic.
I have looked into couple of things though but seeing issues around it -
https://github.com/elastacloud/servicebusreceiver
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/service-bus-messaging/service-bus-python-how-to-use-topics-subscriptions.md
I am evaluating message streaming services on Azure. I want real time message processing service (Most reliable) where message carrying high degree of importance and data must not be lost. Basically I want to make available real time data transmitted from some third party cloud to the API I have hosted on Azure (I have exposed API to the third party so that they can send data).
Following are the options I worked up on.
Event Hub and IOT Hub are used mostly for telemetry data/events. So I am excluding those. Here message is carrying great value in my use case.
Service Bus or Kafka on HDInsight I am thinking to use.
Now, service bus is offering more features as compared to Kafka and also providing very good documentation about how to use it.
But on the documentation I couldn't find anywhere that service bus is used for real time processing. Where as documentation is available stating use kafka for real time processing.
https://learn.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/real-time-ingestion
Which should be the best service among above for my use case? Any other better option which I have not thought of?
We are developing an application where IoT devices will be publishing events to azure IoT hub using MQTT protocol (by using one topic to push message). We want to consume these message using Stream Analytic service. And to scale Stream analytic services, it is recommended to use partitionBy clause.
Since, we are not using Azure Event hub SDK, can we somehow attached partitionId with events?
Thanks In Advance
As Rita mentioned in the comments, Event Hub will automatically associate each device to a particular partition.
Then, when you can use PARTITION BY PartitionId for steps closer to the input to efficiently parallelize processing of the input and reduce/aggregate the data.
Then, you can have another non-partitioned step to output to SQL sending some aggregate data.
Doing that you will be able to assign more thank 6 SUs, even with an output to SQL.
We will update our documentation to give more info about scaling ASA jobs and describe the different possible scenarios.
Thanks,
JS - Azure Stream Analytics