Azure Event Hub to Stream Analytics with Partitions

Azure documentation states that:
Partitions are a data organization mechanism and are more related to the degree of downstream parallelism required in consuming applications than to Event Hubs throughput.
Assuming that the only consumer of the EventHubClient is Azure Stream Analytics, is it relevant to configure a series of Partitions as input to the Stream Analytics job?
For example, if the Stream Analytics job is configured to scale to 6 Streaming Units, will configuring the EventHubClient that loads the events to use 6 partitions result in 6 parallel streams of input?
Or, are Partitions even relevant when the only consuming client is a Stream Analytics job?

The 6 Streaming Units have nothing to do with the EventHubClient itself; what matters is the number of partitions configured on the Event Hub that the ASA job reads as input. ASA can consume each partition in parallel, so partitions remain relevant even when ASA is the only consumer.
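To illustrate, here is a minimal sketch in Python (using the azure-eventhub package) of a producer that spreads events across the hub's partitions via partition keys; the connection string, hub name, and device IDs are placeholders, not values from the question:

# pip install azure-eventhub
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection details -- substitute your own.
CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name="telemetry")

with producer:
    for device_id in ("dev-1", "dev-2", "dev-3"):
        # Events with the same partition key land on the same partition,
        # so per-device ordering is preserved while distinct keys spread
        # the load across partitions for ASA to read in parallel.
        batch = producer.create_batch(partition_key=device_id)
        batch.add(EventData('{"deviceId": "%s", "temp": 21.5}' % device_id))
        producer.send_batch(batch)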

Related

Multi-tenant Azure Stream Analytics

I have a use case in IoT streaming: in our current architecture, data from the IoT Hub is consumed by our Stream Analytics jobs for real-time reporting on Power BI dashboards. I now want to expand this to additional tenants. From what I have gathered, this seems to be possible with dedicated Azure Stream Analytics clusters, but I don't understand how ingestion into the clusters would occur. Would I need a load balancer between my IoT Hub and the Stream Analytics jobs? Or is there a better way to achieve this?

Azure Event Hub vs Kafka as a Service Broker

I'm evaluating the use of Azure Event Hub vs Kafka as a service broker. I was hoping to create two local apps side by side, one that consumes messages using Kafka and the other using Azure Event Hub. I have a Docker container running a Kafka instance, and I'm in the process of setting up Azure Event Hub using my Azure account (as far as I know, there's no other way to create a local/development instance of Azure Event Hub).
Does anyone have any information regarding the two that might be useful when comparing their features?
I can't add a comment directly, but the currently top-rated answer has the line:
Kafka can have multiple topics each Azure Event Hub is a single topic.
This is misleading as it makes it sound like you can't have multiple topics, which you can.
As per https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview#kafka-and-event-hub-conceptual-mapping an "Event Hub" is a topic while an "Event Hub Namespace" is the Kafka cluster.
This decision is usually driven by a broader architectural choice: if you are choosing Azure as your IaaS and PaaS platform, then Event Hubs provides great integration within the Azure ecosystem, but if you want to avoid vendor lock-in, Kafka is the better option.
Operationally, if you want a fully managed service, Event Hubs gives you that out of the box; with Kafka you can also get it via the Confluent platform.
Maturity-wise, Kafka is older and has a large community, so you have broader support.
Feature-wise, what the Kafka ecosystem provides, the Azure ecosystem also has; but if you look at Event Hubs alone, it lacks a few features compared to Kafka.
I think this link can help you extend your understanding: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
While Apache Kafka is software you typically need to install and operate, Event Hubs is a fully managed, cloud-native service. There are no servers, disks, or networks to manage and monitor, and no brokers to consider or configure, ever. You create a namespace, which is an endpoint with a fully qualified domain name, and then you create Event Hubs (topics) within that namespace. For more information about Event Hubs and namespaces, see Event Hubs features.
As a cloud service, Event Hubs uses a single stable virtual IP address as the endpoint, so clients don't need to know about the brokers or machines within a cluster. Even though Event Hubs implements the same protocol, this difference means that all Kafka traffic for all partitions is predictably routed through this one endpoint rather than requiring firewall access for all brokers of a cluster.
Scale in Event Hubs is controlled by how many throughput units you purchase, with each throughput unit entitling you to 1 Megabyte per second, or 1000 events per second, of ingress and twice that volume in egress. Event Hubs can automatically scale up throughput units when you reach the throughput limit if you use the Auto-Inflate feature; this feature also works with the Apache Kafka protocol support.
You can find more on feature comparison here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
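As a concrete illustration of that protocol compatibility, a stock Kafka client can talk to an Event Hubs namespace by changing only its connection settings. A minimal sketch using the kafka-python package; the namespace, topic name, and connection string are placeholders:

# pip install kafka-python
from kafka import KafkaProducer

# Placeholder values -- substitute your namespace and connection string.
NAMESPACE = "mynamespace"
CONN_STR = "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

# Event Hubs exposes its Kafka endpoint on port 9093; authentication is
# SASL PLAIN with the literal username "$ConnectionString".
producer = KafkaProducer(
    bootstrap_servers=f"{NAMESPACE}.servicebus.windows.net:9093",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="$ConnectionString",
    sasl_plain_password=CONN_STR,
)

# "mytopic" is the name of an Event Hub inside the namespace.
producer.send("mytopic", b"hello from a Kafka client")
producer.flush()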
Kafka can have multiple topics; each Azure Event Hub is a single topic. Kafka running inside a container means you have to manage it. Azure Event Hub is a PaaS, which means Microsoft manages the platform side. If you don't know how to make Kafka redundant, reliable, and scalable, you may want to go with Azure Event Hubs or any PaaS that offers a similar pub/sub model. The Event Hub platform is already scalable, reliable, and redundant.
You should compare:
the administration capabilities / effort (as previously said)
the functional capabilities, such as competing consumers and pub/sub patterns
the performance: you should consider Kafka if you plan to exceed the Event Hubs quotas

Does stopping Azure Stream Analytics stop the billing?

I would like to know: does stopping the Azure Stream Analytics service stop the billing?
As per the answer from MSFT: For Azure Stream Analytics, there is no charge when the job is stopped.
But for Azure Stream Analytics on IoT Edge: Billing starts when an ASA job is deployed to devices, no matter what the job status is (running/failed/stopped).
Welcome to Stack Overflow!
Note: There is no charge for stopped jobs. Billing is based on streaming units in the cloud and on jobs/devices on the edge.
Detailed explanation:
As a cloud service, Stream Analytics is optimized for cost. There are no upfront costs involved - you only pay for the streaming units you consume, and the amount of data processed. There is no commitment or cluster provisioning required, and you can scale the job up or down based on your business needs.
If you create a Stream Analytics job with 1 streaming unit, it will be billed at $0.11/hour.
Pricing:
Azure Stream Analytics on Cloud: if you create a Stream Analytics job with N streaming units, it will be billed at $0.11 * N per hour.
Azure Stream Analytics on Edge: Azure Stream Analytics on IoT Edge is priced by the number of jobs that have been deployed on a device. For instance, if you have two devices and the first device has one job whereas the second device has two jobs your monthly charge will be (1 job)(1 device)($1/job/device)+(2 jobs)(1 device)($1/job/device) = $1+$2 = $3 per month.
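Both formulas are simple enough to sketch in a few lines of Python. The rates below are the ones quoted above; actual prices vary by region and over time, so treat this as illustrative arithmetic only:

# Illustrative only: rates as quoted above; check current regional pricing.
CLOUD_RATE_PER_SU_HOUR = 0.11    # $ per streaming unit per hour
EDGE_RATE_PER_JOB_DEVICE = 1.0   # $ per job per device per month

def cloud_monthly_cost(streaming_units: int, hours: float = 730.0) -> float:
    """Cloud job: billed per streaming unit per hour while the job runs."""
    return CLOUD_RATE_PER_SU_HOUR * streaming_units * hours

def edge_monthly_cost(jobs_per_device: list) -> float:
    """Edge: billed per deployed job per device, regardless of job status."""
    return EDGE_RATE_PER_JOB_DEVICE * sum(jobs_per_device)

print(cloud_monthly_cost(6))      # e.g. 6 SUs running a full month
print(edge_monthly_cost([1, 2]))  # the two-device example above -> 3.0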
Hope this helps. If you have any further query do let us know.

Azure Stream Analytics job expensive for small data?

In order to write sensor data from an IoT device to a SQL database in the cloud, I use an Azure Stream Analytics job. The SA job has an IoT Hub input and a SQL database output. The query is trivial (it just sends all data through).
According to the MS price calculator, the cheapest way of accomplishing this (in western Europe) is around 75 euros per month.
Actually, only 1 message per minute is sent through the hub, and the price is fixed per month (regardless of the number of messages). I am surprised by the price for such a trivial task on small data. Would there be a cheaper alternative for such low-capacity needs? Perhaps an Azure Function?
If you are not processing the data in real time, then SA is not needed; you could just use an Event Hub to ingest your sensor data and forward it on. There are several options to move data from the Event Hub to SQL. As you mentioned in your question, you could use an Azure Function (see the sketch after the links below), or if you want a no-code solution, you could use a Logic App.
https://learn.microsoft.com/en-us/azure/connectors/connectors-create-api-azure-event-hubs
https://learn.microsoft.com/en-us/azure/connectors/connectors-create-api-sqlazure
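For the Azure Function route, a minimal sketch using the Python v2 programming model with an Event Hub trigger and pyodbc might look like the following. The hub name, app-setting names, table, and payload fields are all assumptions for illustration, not values from the question:

# pip install azure-functions pyodbc
import json
import os
import pyodbc
import azure.functions as func

app = func.FunctionApp()

@app.event_hub_message_trigger(
    arg_name="event",
    event_hub_name="messages/events",      # assumed Event Hub-compatible name
    connection="EventHubConnectionString"  # app setting with the connection string
)
def forward_to_sql(event: func.EventHubEvent) -> None:
    payload = json.loads(event.get_body().decode("utf-8"))
    # "SqlConnectionString" is an assumed app setting holding an ODBC string.
    with pyodbc.connect(os.environ["SqlConnectionString"]) as conn:
        conn.execute(
            "INSERT INTO SensorData (DeviceId, Temperature) VALUES (?, ?)",
            payload["deviceId"], payload["temperature"],
        )
        conn.commit()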
In addition to Ken's answer, the "cold path" can be your solution: telemetry data is stored in blob storage by Azure IoT Hub every 720 seconds (the maximum batch frequency).
Using Azure Event Grid on the blob storage, an EventGridTrigger subscriber is fired for each new blob, where you can start a streaming process for that batch (or for a group of batches within one hour). Once the batch has been processed, the ASA job can be stopped.
Note that the ASA job is billed based on active processing time (the time between start and stop), so the cost of using an ASA job can drop significantly.
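If you want to script that start/stop cycle, something along these lines should work with the azure-mgmt-streamanalytics management SDK. This is a sketch only: the resource names are placeholders, and the operation names are worth verifying against the SDK version you install:

# pip install azure-identity azure-mgmt-streamanalytics
from azure.identity import DefaultAzureCredential
from azure.mgmt.streamanalytics import StreamAnalyticsManagementClient

client = StreamAnalyticsManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Start the job when a new batch lands (e.g. from an Event Grid handler)...
client.streaming_jobs.begin_start("my-resource-group", "my-asa-job").result()

# ...and stop it once the batch has been processed, so billing stops too.
client.streaming_jobs.begin_stop("my-resource-group", "my-asa-job").result()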

How many Stream Analytics jobs can you have for one IoT Hub?

I have created two Stream Analytics jobs for one IoT Hub that has multiple devices.
But the data is being received only by the first-created Stream Analytics job. Even if I stop that one, no data is sent to the second Stream Analytics job.
Is that a bug, or am I missing something? Or is it simply that one IoT Hub can have only one Stream Analytics job?
It looks like both of your Stream Analytics jobs are using the same consumer group (such as $Default) on the same IoT Hub.
So create two consumer groups on the IoT Hub, one dedicated to each ASA job; in other words, each ASA job gets its own consumer group.
In the ASA job's input configuration for the IoT Hub, the consumer group can be selected specifically for each job.
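The same principle can be seen with plain SDK consumers: readers on different consumer groups each receive the full stream independently, rather than contending for a shared one. A sketch with the azure-eventhub package; the connection string, hub name, and group names are placeholders:

# pip install azure-eventhub
from azure.eventhub import EventHubConsumerClient

CONN_STR = "Endpoint=sb://<iothub-ns>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

def on_event(partition_context, event):
    # Shows which consumer group and partition this event arrived on.
    print(partition_context.consumer_group,
          partition_context.partition_id,
          event.body_as_str())

# Each reader gets its own consumer group, so both see every event.
# (Run each client in its own process or thread; receive() blocks.)
reader = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group="asa-job-1", eventhub_name="my-hub")
with reader:
    reader.receive(on_event=on_event, starting_position="-1")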
