How to set up a Spark Structured Streaming session for Azure Service Bus?
I'm currently using Azure Databricks as a consumer for one of the subscriptions to a Service Bus topic.
I have looked into a couple of things but am seeing issues with them:
https://github.com/elastacloud/servicebusreceiver
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/service-bus-messaging/service-bus-python-how-to-use-topics-subscriptions.md
Related
I have a use case in IoT streaming. In our current architecture, data from IoT Hub is consumed by our Stream Analytics jobs for real-time reporting on Power BI dashboards. I now want to expand this to additional tenants. From what I have gathered, this seems to be possible with dedicated Azure Stream Analytics clusters, but I don't understand how ingestion into the clusters would occur. Would I need a load balancer between my IoT Hub and the Stream Analytics jobs, or is there a better way to achieve this?
Could you please help me by providing some suggestions on consuming Azure Service Bus streaming messages using Python?
As far as I can tell, there is no Spark Structured Streaming source for Azure Service Bus. In that case, can I read the Azure Service Bus messages using the provided Python client, write each message into a Kafka topic, and then apply Spark Structured Streaming to that Kafka topic?
My use case is to consume the Azure Service Bus streaming messages, transform and write each message into a time-series database such as InfluxDB or Prometheus, and show real-time dashboards of business metrics in Grafana.
I am thinking of writing a Python program that acts like a Kafka producer: it reads the Azure Service Bus streaming messages and writes them into a Kafka topic, which I then consume with Spark Structured Streaming.
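A minimal sketch of that bridge, assuming the v7 azure-servicebus SDK and kafka-python; the topic, subscription, connection string, and broker address are placeholders:

```python
# Forward Service Bus topic messages into a Kafka topic (sketch).
from azure.servicebus import ServiceBusClient
from kafka import KafkaProducer

SERVICEBUS_CONN_STR = "<service-bus-connection-string>"   # placeholder
SB_TOPIC = "telemetry"                                    # placeholder Service Bus topic
SB_SUBSCRIPTION = "spark-bridge"                          # placeholder subscription
KAFKA_TOPIC = "servicebus-bridge"                         # Kafka topic the Spark job reads

producer = KafkaProducer(bootstrap_servers="localhost:9092")

with ServiceBusClient.from_connection_string(SERVICEBUS_CONN_STR) as sb_client:
    receiver = sb_client.get_subscription_receiver(
        topic_name=SB_TOPIC, subscription_name=SB_SUBSCRIPTION
    )
    with receiver:
        for msg in receiver:                    # blocks and yields messages as they arrive
            payload = b"".join(msg.body)        # raw body bytes
            producer.send(KAFKA_TOPIC, value=payload)
            receiver.complete_message(msg)      # settle only after handing off to Kafka

producer.flush()
```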
Please suggest whether I am going in the right direction; any suggestions will be appreciated.
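On the Spark side, the bridged topic would then be an ordinary Kafka source, roughly as in this sketch (the broker address, topic name, and JSON schema are assumptions, and the spark-sql-kafka package must be on the classpath):

```python
# Consume the bridged Kafka topic with Spark Structured Streaming (sketch).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

spark = SparkSession.builder.appName("servicebus-kafka-bridge").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "servicebus-bridge")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers the value as binary; cast it and parse the assumed JSON payload.
schema = (
    StructType()
    .add("deviceId", StringType())
    .add("metric", DoubleType())
    .add("eventTime", TimestampType())
)
events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

# Console sink for illustration; a real job would push to InfluxDB via foreachBatch.
query = events.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```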
Looks like there is no readily available connector since Service Bus is not designed with this in mind, unlike Event Hubs (which provides the Kafka Protocol). But it should be possible to write your own receiver (like this one).
Another alternative would be to immediately forward messages from Service Bus to a compatible source like Event Hubs (or Kafka) using something simple like Azure Functions.
With Azure Functions and its bindings for both Service Bus and Event Hubs / Kafka, you could implement this forwarding service with almost no code. But if you prefer, using the Python SDKs for both in your own client (which could itself be an Azure Function) will do the trick as well.
-- From my original answer on Microsoft Q&A
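To make the Azure Functions approach concrete, here is a rough sketch using the Python v2 programming model; the topic, subscription, Event Hub, and app-setting names are all placeholders:

```python
# Sketch: forward each Service Bus topic message to an Event Hub with almost no code.
import azure.functions as func

app = func.FunctionApp()

@app.service_bus_topic_trigger(
    arg_name="msg",
    topic_name="telemetry",                  # placeholder Service Bus topic
    subscription_name="eventhub-forwarder",  # placeholder subscription
    connection="ServiceBusConnection",       # app setting holding the connection string
)
@app.event_hub_output(
    arg_name="out",
    event_hub_name="telemetry-hub",          # placeholder Event Hub
    connection="EventHubConnection",         # app setting holding the connection string
)
def forward(msg: func.ServiceBusMessage, out: func.Out[str]) -> None:
    # Pass the body through unchanged; any transformation could happen here instead.
    out.set(msg.get_body().decode("utf-8"))
```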
Apart from the fact that Azure Service Bus uses topics and Azure Event Hubs is based on events, is there any fundamental difference between Azure Event Hubs and Azure Service Bus?
To me, there is no real difference between events and messages, as both are just different kinds of JSON.
Even though you are dealing with JSON in both services, there is a fundamental difference between the two.
Azure Event Hubs focuses on event streaming, whereas Azure Service Bus is focused on high-value enterprise messaging; in other words, Service Bus deals in messages rather than events.
With an Azure Service Bus topic, every subscription receives a copy of each message, but within a subscription each message is picked up by only one consumer (competing consumers). With Event Hubs, you can have multiple consumer groups, each reading the full stream independently.
You can read more on the docs page here.
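To make that difference concrete, here is a small illustrative sketch with the Python SDKs (azure-servicebus and azure-eventhub); the names and connection strings are placeholders:

```python
from azure.servicebus import ServiceBusClient
from azure.eventhub import EventHubConsumerClient

# Service Bus: each subscription gets its own copy of every topic message,
# but receivers attached to the SAME subscription compete for those messages.
with ServiceBusClient.from_connection_string("<sb-connection-string>") as sb:
    with sb.get_subscription_receiver(topic_name="orders", subscription_name="billing") as rx:
        for msg in rx:
            print(str(msg))
            rx.complete_message(msg)  # once settled, no other "billing" receiver sees it

# Event Hubs: every consumer group reads the full stream independently, so a
# "dashboard" group and an "archiver" group would each see all events.
def on_event(partition_context, event):
    print(event.body_as_str())

with EventHubConsumerClient.from_connection_string(
    "<eh-connection-string>", consumer_group="dashboard", eventhub_name="telemetry"
) as consumer:
    consumer.receive(on_event=on_event, starting_position="-1")  # "-1" = from the start
```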
I'm evaluating the use of Azure Event Hubs vs Kafka as a service broker. I was hoping to create two local apps side by side, one consuming messages using Kafka and the other using Azure Event Hubs. I've got a Docker container set up running a Kafka instance, and I'm in the process of setting up Azure Event Hubs with my Azure account (as far as I know, there's no other way to create a local/development instance of Azure Event Hubs).
Does anyone have any information regarding the two that might be useful when comparing their features?
I can't add a comment directly, but the currently top-rated answer has the line
Kafka can have multiple topics each Azure Event Hub is a single topic.
This is misleading as it makes it sound like you can't have multiple topics, which you can.
As per https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview#kafka-and-event-hub-conceptual-mapping an "Event Hub" is a topic while an "Event Hub Namespace" is the Kafka cluster.
This decision is usually driven by a broader architectural choice: if you are choosing Azure as your IaaS and PaaS platform, then Event Hubs provides great integration within the Azure ecosystem, but if you want to avoid vendor lock-in, Kafka is the better option.
Operationally, if you want a fully managed service, Event Hubs gives you that out of the box; with Kafka you can also get this through the Confluent platform.
Maturity-wise, Kafka is older and, with its large community, you have broader support.
Feature-wise, the Azure ecosystem offers most of what the Kafka ecosystem provides, but if you look at Event Hubs alone, it lacks a few features compared to Kafka.
I think this link can help you extend your understanding: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage and monitor and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace. For more information about Event Hubs and namespaces, see Event Hubs features.

As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster.

Scale in Event Hubs is controlled by how many throughput units you purchase, with each throughput unit entitling you to 1 megabyte per second, or 1000 events per second, of ingress and twice that volume in egress. Event Hubs can automatically scale up throughput units when you reach the throughput limit if you use the Auto-Inflate feature; this feature also works with the Apache Kafka protocol support.
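In practice, the Kafka protocol support means an existing Kafka client can be pointed at an Event Hubs namespace purely through configuration. A minimal sketch with kafka-python, assuming SASL/PLAIN authentication with the namespace connection string (the namespace and hub names are placeholders):

```python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="mynamespace.servicebus.windows.net:9093",  # Kafka endpoint of the namespace
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="$ConnectionString",
    sasl_plain_password="<namespace-connection-string>",          # placeholder
)

# "telemetry" is an Event Hub, which maps to a Kafka topic.
producer.send("telemetry", value=b'{"deviceId": "dev-1", "metric": 42.0}')
producer.flush()
```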
You can find more on feature comparison here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
Kafka can have multiple topics; each Azure Event Hub is a single topic. Kafka running inside a container means you have to manage it. Azure Event Hubs is a PaaS, which means the platform side is managed for you. If you don't know how to make Kafka redundant, reliable, and scalable, you may want to go with Azure Event Hubs or any PaaS that offers a similar pub/sub model. The Event Hubs platform is already scalable, reliable, and redundant.
You should compare
the administration capabilities / effort (as previously said)
the functional capabilities, such as competing-consumer and pub/sub patterns
the performance: you should consider Kafka if you plan to exceed the Event Hubs quotas
I'm working on an enterprise IoT application where we have created all of our resources in South Central US. Recently (9/4/18) I noticed South Central US was down for long business hours.
Now I'm trying to find the best possible solution for high availability when a complete region goes down.
We are using the following Azure resources:
Event Hubs (telematics data ingestion)
Azure Functions (Event Hubs, Cosmos DB, Service Bus triggers)
Web App & WebJob (scheduled and continuous)
Service Bus (queues & topics)
Application Insights (application logs)
Storage Account (Event Hubs checkpointing and other data)
Cosmos DB
VSTS (CI/CD)
For Cosmos DB I know the solution; what should I do for the other resources?
I don't see any way to create a multi-region cluster for Event Hubs or Service Bus.
There's no cluster arrangement for Service Bus & Event Hubs, but you can set up a fail-over flow for both.
Please refer to these articles on MS Docs:
Azure Event Hubs Geo-disaster recovery
Best practices for insulating applications against Service Bus outages and disasters
Let me know if that helps!
Azure provides Availability Zones and Geo Disaster Recovery support for both Service Bus and Event Hubs.
Here is the link for Availability Zones for Service Bus and Event Hubs
For Geo Disaster Recovery, look into Service Bus DR, Event Hubs DR
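If you want to script the Geo-DR pairing rather than configure it in the portal, a rough sketch with the azure-mgmt-eventhub management SDK follows; the resource names and subscription ID are placeholders, and the exact method signatures may differ between SDK versions, so treat this as an outline rather than a recipe:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient

client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Pair the primary namespace with a secondary namespace in another region under an alias.
client.disaster_recovery_configs.create_or_update(
    resource_group_name="iot-rg",
    namespace_name="primary-ns",
    alias="telemetry-dr",
    parameters={
        "partner_namespace": "/subscriptions/<subscription-id>/resourceGroups/iot-rg"
                             "/providers/Microsoft.EventHub/namespaces/secondary-ns"
    },
)

# During a regional outage, initiate the failover from the secondary side.
client.disaster_recovery_configs.fail_over("iot-rg", "secondary-ns", "telemetry-dr")
```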