Choosing a real-time Azure service: Kafka on HDInsight or Service Bus? - azure

I am evaluating message streaming services on Azure. I want the most reliable real-time message processing service, because each message carries a high degree of importance and data must not be lost. Basically, I want to make real-time data transmitted from a third-party cloud available to the API I host on Azure (I have exposed the API to the third party so that they can send data).
These are the options I have considered.
Event Hubs and IoT Hub are used mostly for telemetry data and events, so I am excluding those. In my use case, each message carries great value.
I am thinking of using either Service Bus or Kafka on HDInsight.
Service Bus offers more features than Kafka and also has very good documentation on how to use it.
However, I couldn't find anywhere in the documentation that Service Bus is used for real-time processing, whereas there is documentation recommending Kafka for real-time processing:
https://learn.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/real-time-ingestion
Which of these services is best for my use case? Is there a better option that I have not thought of?

Related

How to read from 2 Azure Service Buses in different regions and treat them as one

I have nearly identical Service Bus namespaces in 2 separate regions. I am trying to make them more region-agnostic for consuming applications.
While looking into things like Azure Service Bus geo-disaster recovery, message replication, and cross-region federation, and seeing how complicated they are, I was thinking that I could instead create a Service Bus client that reads from the same topic/subscription name in separate regions and treats the messages as if they came from the same region.
While I'm sure this can be implemented, I was wondering whether this functionality exists in any current Microsoft libraries. Basically, if message A gets published to the East US topic/subscription and message B gets published to the Central US topic/subscription, then the client would receive both A and B. The order is not important.
Thanks!
Something similar existed in the track 0 Azure Service Bus SDK, but for failover rather than concurrent use. As it was a client-side feature, it did not get much traction and was very confusing and complicated.
NServiceBus had a legacy Azure Service Bus transport that supported using more than one namespace concurrently. The feature was deprecated as it was also more trouble than it was worth. Not to mention that Service Bus has since introduced the Premium tier, which handles availability better than multiple Standard namespaces together. On top of that, add availability zones and it's hands down a better option than the complexity of setting up multiple receivers.
If your namespaces are identical, I would suggest consolidating them. One strategy would be to "forward" messages from one namespace to the other using a custom processor, as there's no built-in cross-namespace forwarding.
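To make the "forward with a processor" idea concrete, here is a minimal sketch in plain Python. It does not use the real Service Bus SDK: `Message`, `Receiver`, `Sender`, `InMemoryQueue`, and `forward_batch` are all hypothetical names made up for illustration, and the in-memory queue only imitates peek-lock behaviour. The point it tries to show is the ordering: send to the target first, and settle the source message only after that send has succeeded.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Message:
    message_id: str
    body: str


class Receiver(Protocol):
    def receive(self, max_count: int) -> List[Message]: ...
    def complete(self, message: Message) -> None: ...


class Sender(Protocol):
    def send(self, message: Message) -> None: ...


@dataclass
class InMemoryQueue:
    """Stand-in for a queue or subscription; NOT a real Service Bus client."""
    pending: List[Message] = field(default_factory=list)

    def receive(self, max_count: int) -> List[Message]:
        # Peek-lock style: messages stay pending until complete() is called.
        return list(self.pending[:max_count])

    def complete(self, message: Message) -> None:
        self.pending.remove(message)

    def send(self, message: Message) -> None:
        self.pending.append(message)


def forward_batch(source: Receiver, target: Sender, max_count: int = 10) -> int:
    """Copy up to max_count messages from source to target; return how many moved."""
    forwarded = 0
    for message in source.receive(max_count):
        target.send(message)      # if this raises, the message stays in the source
        source.complete(message)  # settle only after a successful send
        forwarded += 1
    return forwarded


east = InMemoryQueue([Message("A", "published in East US")])
central = InMemoryQueue([Message("B", "published in Central US")])
forward_batch(east, central)
print([m.message_id for m in central.pending])
```

With this ordering, a crash between the send and the settle means the message is forwarded again later, so the consumer on the consolidated namespace should tolerate duplicates (for example by checking the message ID).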

Azure Event Hub vs Kafka as a Service Broker

I'm evaluating the use of Azure Event Hubs vs Kafka as a service broker. I was hoping to create two local apps side by side, one that consumes messages using Kafka and the other using Azure Event Hubs. I've got a Docker container set up running a Kafka instance, and I'm in the process of setting up Azure Event Hubs using my Azure account (as far as I know, there's no other way to create a local/development instance of Azure Event Hubs).
Does anyone have any information regarding the two that might be useful when comparing their features?
I can't add a comment directly, but the currently top-rated answer contains the line:
"Kafka can have multiple topics each Azure Event Hub is a single topic."
This is misleading, as it makes it sound like you can't have multiple topics in Event Hubs, which you can.
As per https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview#kafka-and-event-hub-conceptual-mapping, an "Event Hub" corresponds to a Kafka topic, while an "Event Hub Namespace" corresponds to the Kafka cluster.
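As a rough illustration of that mapping, this is how a Kafka client is typically pointed at an Event Hubs namespace: the namespace hostname plays the role of the bootstrap server, and each event hub inside it is addressed as a topic. The helper name `kafka_config_for_event_hubs` and the placeholder values are made up for this sketch; the property keys are the librdkafka-style ones, so check them against whichever Kafka client you actually use.

```python
from typing import Dict


def kafka_config_for_event_hubs(namespace: str, connection_string: str) -> Dict[str, str]:
    """Build Kafka client settings that target an Event Hubs namespace.

    The namespace stands in for the Kafka cluster; individual event hubs
    are then used as topic names by the producer or consumer.
    """
    return {
        # Event Hubs exposes its Kafka endpoint on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString".
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Placeholder values for illustration only.
config = kafka_config_for_event_hubs("example-namespace", "<your connection string>")
print(config["bootstrap.servers"])
```

A producer built from this config would then publish to a topic whose name is the name of an event hub in that namespace.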
This decision is usually driven by a broader architectural choice. If you are choosing Azure as your IaaS and PaaS solution, Event Hubs integrates well with the rest of the Azure ecosystem; if you want to avoid vendor lock-in, Kafka is the better option.
Operationally, if you want a fully managed service, Event Hubs gives you that out of the box, but you can also get it for Kafka with the Confluent platform.
In terms of maturity, Kafka is older and has a large community, so more support is available.
In terms of features, the Azure ecosystem as a whole offers what the Kafka ecosystem provides, but Event Hubs on its own lacks a few features compared to Kafka.
I think this link can help you extend your understanding: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage and monitor and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace. For more information about Event Hubs and namespaces, see Event Hubs features. As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster. Scale in Event Hubs is controlled by how many throughput units you purchase, with each throughput unit entitling you to 1 megabyte per second or 1000 events per second of ingress, and twice that volume in egress. Event Hubs can automatically scale up throughput units when you reach the throughput limit if you use the Auto-Inflate feature; this feature also works with the Apache Kafka protocol support.
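The throughput-unit arithmetic in that passage can be worked through with a small helper. This is only a sketch using the figures quoted above (1 MB/s or 1000 events/s of ingress per unit, twice the ingress volume for egress); the function name `throughput_units_needed` is made up here, and current quotas should be checked against the Event Hubs documentation before relying on the numbers.

```python
import math

# Per-throughput-unit figures as quoted in the passage above.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000.0
EGRESS_MB_PER_TU = 2.0 * INGRESS_MB_PER_TU  # "twice that volume in egress"


def throughput_units_needed(
    ingress_mb_per_s: float,
    ingress_events_per_s: float,
    egress_mb_per_s: float = 0.0,
) -> int:
    """Return the smallest whole number of throughput units covering every limit."""
    by_ingress_bytes = ingress_mb_per_s / INGRESS_MB_PER_TU
    by_ingress_events = ingress_events_per_s / INGRESS_EVENTS_PER_TU
    by_egress_bytes = egress_mb_per_s / EGRESS_MB_PER_TU
    # Whichever limit is hit first determines how many units are required.
    required = max(by_ingress_bytes, by_ingress_events, by_egress_bytes)
    return max(1, math.ceil(required))


# 2.5 MB/s in 1200 events/s: the byte limit dominates, so 3 units are needed.
print(throughput_units_needed(2.5, 1200))
# Many small events: the event-count limit dominates instead.
print(throughput_units_needed(0.2, 4500))
```

The second call shows why both limits matter: 4500 small events per second need 5 units even though the byte volume would fit in one.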
You can find more on feature comparison here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
Kafka can have multiple topics each Azure Event Hub is a single topic. Kafka running inside a container means you have to manage it. Azure Event Hubs is a PaaS, which means Microsoft manages the platform side. If you don't know how to make Kafka redundant, reliable, and scalable, you may want to go with Azure Event Hubs or any PaaS that offers a similar pub/sub model. The Event Hubs platform is already scalable, reliable, and redundant.
You should compare:
the administration capabilities and effort (as previously said)
the functional capabilities, such as the competing consumers and pub/sub patterns
the performance: you should consider Kafka if you plan to exceed the Event Hubs quotas

Are there any specific drawbacks or issues when using Azure Service Bus on a Kubernetes cluster?

I'm trying to create a comparison table of different message queues, from open source to proprietary, and I'm trying to identify the issues and disadvantages of Azure Service Bus without using the Standard or Premium tier. I would like to ask this question of those who have experience implementing it in their own applications.
I tried researching related topics but I can't find a reliable resource.
I'm expecting a list of possible issues and disadvantages in general in any of these areas: features, limitations, experience, maturity, community, and performance.
Service Bus is just a medium for delivering messages from the sender to the receiver. It doesn't matter where they both are, as long as they can talk to Service Bus. Your application can talk to Service Bus from inside a container just fine.

Monitoring Azure Event Hub

I have been researching Microsoft Azure Event Hubs. My goal is to figure out a way to provide automatic scalability. This is experimental work and I am really only trying to find out what I can do with Azure Event Hubs. I do not have access to the Azure platform to test anything :( .
So far, I have found that through the REST API and Service Bus PowerShell I can add Throughput Units (to increase performance; I am relying on this: Scale Azure Service Bus through Powershell or API) and increase or decrease the event expiration time (which might influence capacity: https://msdn.microsoft.com/en-us/library/azure/dn790675.aspx).
The problem is that, presuming the previous techniques work and I am able to scale an event hub's performance automatically, I still need a way to know when to trigger the scaling mechanisms. To know when and how to trigger scaling, I need functions that rely on the event hub's metrics (or some way of monitoring it), but I can't really find any metrics. The only things I have found are this: https://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-monitor/, which does not solve my problem because, although it may present some interesting metrics, it does not serve the purposes of my "application" (which will come if I can prove that I can successfully scale Azure automatically); and this: Azure service bus statistics/Monitoring, whose links are not working.
Surely I can find more information about Service Bus Explorer, and it may provide some interesting insights into the event hub metrics. I am just wondering whether there is something like this: https://github.com/HBOCodeLabs/incubator-storm/blob/master/STORM-UI-REST-API.md that would allow me to access some kind of metrics, rather than creating my own.
Thanks in advance
Best regards
You can retrieve metrics about Event Hubs (an Event Hub is a Service Bus entity) using the Service Bus Entity Metrics REST APIs (https://msdn.microsoft.com/library/azure/dn163589.aspx). With these you can retrieve the same metrics displayed in the portal, such as:
Number of incoming messages
Incoming throughput
Outgoing throughput
These should help you determine when you need to scale your application up or down.
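As a sketch of how those three metrics could drive a scaling trigger, the snippet below turns them into a utilisation figure and a scale-up or scale-down decision. Everything here is an assumption for illustration: the names `HubMetrics`, `utilization`, and `next_throughput_units` are invented, the per-unit quotas (1000 messages/s and 1 MB/s in, 2 MB/s out) and the 20-unit ceiling should be checked against the current Event Hubs limits, and the 80% / 30% thresholds are arbitrary. Fetching the metrics from the REST API is left out.

```python
from dataclasses import dataclass


@dataclass
class HubMetrics:
    incoming_messages_per_s: float
    incoming_mb_per_s: float
    outgoing_mb_per_s: float


def utilization(metrics: HubMetrics, throughput_units: int) -> float:
    """Fraction of the purchased capacity in use, taking the tightest limit."""
    return max(
        metrics.incoming_messages_per_s / (1000.0 * throughput_units),  # assumed quota
        metrics.incoming_mb_per_s / (1.0 * throughput_units),           # assumed quota
        metrics.outgoing_mb_per_s / (2.0 * throughput_units),           # assumed quota
    )


def next_throughput_units(
    metrics: HubMetrics,
    throughput_units: int,
    scale_up_at: float = 0.8,    # arbitrary threshold
    scale_down_at: float = 0.3,  # arbitrary threshold
    min_units: int = 1,
    max_units: int = 20,         # assumed ceiling
) -> int:
    """Return the throughput-unit count to request for the next interval."""
    used = utilization(metrics, throughput_units)
    if used >= scale_up_at and throughput_units < max_units:
        return throughput_units + 1
    if used <= scale_down_at and throughput_units > min_units:
        return throughput_units - 1
    return throughput_units


# 900 messages/s on a single unit is 90% of the assumed quota, so scale up.
print(next_throughput_units(HubMetrics(900, 0.5, 1.0), throughput_units=1))
```

Keeping the two thresholds well apart avoids flapping between unit counts when the load sits near a single boundary.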
This video is useful for getting started https://channel9.msdn.com/Blogs/Subscribe/Service-Bus-Namespace-Management-and-Analytics
If 3rd party services are an option, look into CloudMonix (http://cloudmonix.com).
It can monitor Event Hubs (among a gazillion other Azure-related things) and execute Azure Automation runbooks (among a gazillion other actions) in reaction to load conditions or the throughput of a whole hub or of individual partitions, optionally based on any other metrics in your environment.
Your Azure Automation runbooks could contain the logic to increase your event hub's throughput, etc.
Disclaimer: I'm affiliated with the product.
HTH
Service Bus Explorer is great. I actually use this.
ServiceBus Explorer

Background Worker or Worker with Service Bus for SQL Database access?

I'm building a game for Windows Phone 8 and would like to use Windows Azure SQL Database for storing my users' data (mostly scores and rankings).
I have been reading Azure's documentation on SQL Database and found this link which describes just the scenario I'm looking for (it's Scenario B in the picture): I want my clients (the game running on a user's Windows Phone) to get data from a SQL Server through a middle application also hosted on Windows Azure.
Reading further in the documentation (personally I think it's really messy and hard to find what you're looking for in there), I've learned that I could use Cloud Services for this middle application. However, I'm not sure whether I should use a background worker which provides an HTTP API or a worker with a Service Bus Relay (I discovered that I can use Service Bus in WP8 in this link).
I've got a few questions that I couldn't find an answer to:
1) What would be the "standard" way to go in this case?
2) If both ways are acceptable, are there other advantages to using a Service Bus other than an easier way to connect and send messages to my middle application? What are the disadvantages?
3) Is a cloud service really what I'm looking for (and not just a VM with the middle application code running in it)?
It's difficult to answer this sort of question, as there are lots of considerations. I don't believe there is necessarily a 'standard way'.
The purpose of the Service Bus relay service is to help traverse firewalls and NATs, which I suspect does not directly relate to your scenario.
The Service Bus, though, also includes a messaging capability which provides queues, topics and subscriptions to use to exchange messages between clients or client/server.
You could use the phone client to write and read messages to/from queues. You would then have a worker role hosting your application logic and accessing the database as needed.
Some of the advantages of using messaging are load levelling, which helps handle peaks in traffic (at the expense of latency); separation of concerns; and the ability to accept requests from clients while the backend is down, which helps with resiliency.
In theory they can also help you deliver messages to the client in the same fashion, by using a queue or subscription per client, but for a large number of clients this may become a management issue.
On the downside, you would have to work with a proprietary protocol, and you would need to understand the characteristics and limitations of Service Bus. You would need to manage the queues and topics over time. There would also be some increased latency, although that is typically not an issue. Finally, you would have to implement asynchronous messaging on the client side, which has advantages but is also harder to implement.
I would imagine that many architectures follow the Web API route, using a web role cloud service to expose the API. The web role can then perform any business logic and connect to the database in the background.
A third option, which you didn't mention, is to use Windows Azure Mobile Services and implement your business logic as a service API there.
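The load-levelling advantage mentioned above (absorbing traffic peaks at the expense of latency) can be seen in a tiny simulation. This is a toy model in plain Python rather than Service Bus code: `simulate_load_levelling` is a made-up name, time is counted in abstract ticks, and the worker is assumed to process a fixed number of messages per tick.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_load_levelling(arrivals_per_tick: List[int], worker_capacity: int) -> List[int]:
    """Return the queue depth at the end of each tick.

    Clients enqueue as many messages as they like each tick; the worker
    drains at most worker_capacity messages per tick from the front.
    """
    queue: Deque[Tuple[int, int]] = deque()
    depths: List[int] = []
    for tick, arrivals in enumerate(arrivals_per_tick):
        for n in range(arrivals):
            queue.append((tick, n))  # (arrival tick, sequence number)
        for _ in range(min(worker_capacity, len(queue))):
            queue.popleft()          # the backend works at its own steady pace
        depths.append(len(queue))
    return depths


# A burst of 10 score submissions hits a backend that handles 3 per tick.
print(simulate_load_levelling([10, 0, 0, 2, 0], worker_capacity=3))
# prints [7, 4, 1, 0, 0]
```

The backend never sees more than 3 messages per tick, but the last messages of the burst wait several ticks, which is the latency cost described above.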
