In Azure Stream Analytics, trying to pull data from Event Hub - azure

In Azure Stream Analytics, while trying to pull data from Event Hub, I am receiving the following error:
Diagnostics: Source '<unknown_location>' had 1 occurrences of kind
'InputDeserializerError.InvalidData' between processing times
'2020-11-19T04:08:35.3436931Z' and '2020-11-19T04:08:35.3686240Z'.
Unable to create records from the given Avro record schema
I want to know what could be the reason.
Is there a way to find out what kind of data is streaming into the Event Hub?

Check the event serialization format of the input (your Event Hub). What you set there and what you are actually sending may not match: you could be sending JSON but have specified Avro as the event serialization format, or vice versa.
You can download Service Bus Explorer, connect it to your Event Hub, and inspect what you are actually receiving. I advise adding an additional consumer group to your Event Hub, just to avoid competing consumers, and connecting Service Bus Explorer to that particular consumer group.
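If you prefer to inspect the raw payloads programmatically instead of (or in addition to) Service Bus Explorer, a short consumer script works too. A minimal sketch using the azure-eventhub Python SDK, with placeholder connection string, hub name, and a dedicated "inspection" consumer group (all assumptions, not from the original question):

```python
# pip install azure-eventhub
# Peek at raw event payloads to see whether they look like JSON text or binary (e.g. Avro).
from azure.eventhub import EventHubConsumerClient

CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>"  # placeholder
EVENTHUB_NAME = "<your-event-hub>"   # placeholder
CONSUMER_GROUP = "inspection"        # a dedicated group, to avoid competing with your real consumers

def on_event(partition_context, event):
    try:
        text = event.body_as_str()   # decodes as UTF-8
        print(f"partition {partition_context.partition_id}: text payload -> {text[:200]}")
    except UnicodeDecodeError:
        print(f"partition {partition_context.partition_id}: binary payload (possibly Avro)")

client = EventHubConsumerClient.from_connection_string(
    CONN_STR, consumer_group=CONSUMER_GROUP, eventhub_name=EVENTHUB_NAME
)
with client:
    # starting_position="-1" reads each partition from the beginning; stop with Ctrl+C.
    client.receive(on_event=on_event, starting_position="-1")
```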

Related

How does Azure Stream Analytics find the schema of data coming from Event Hub?

I am following this tutorial to understand how Stream Analytics work on Azure. https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-real-time-fraud-detection?toc=%2Fazure%2Fsynapse-analytics%2Fsql-data-warehouse%2Ftoc.json&bc=%2Fazure%2Fsynapse-analytics%2Fsql-data-warehouse%2Fbreadcrumb%2Ftoc.json
A command-line event generator application is sending data to an Event Hub to which a Stream Analytics job is connected. I don't understand two things:
I have not specified a data schema anywhere, yet I can query the data. How?
The tutorial recommends that I create a consumer group in the Event Hub. A consumer group MyConsumerGroup is created in the tutorial but never used. What is the purpose of the consumer group?

Azure - ingesting data into IoT Hub and sending notifications via slack/email for certain messages

I've got data coming into IoT Hub and want to filter it.
Relevant data I want to forward to Slack as a notification.
I've got the IoT Hub and a Slack subscription in place and am having trouble connecting the two.
In order to do a rather complex time-based query, I figure I'll use Stream Analytics and configure the IoT Hub as input. From research I found that Logic Apps can send messages to Slack over a webhook. Using a Service Bus Queue as output for Stream Analytics, I can get the data into Logic Apps.
So it's:
IoT Hub (ingest all data) => Stream Analytics (filter) => Service Bus Queue (queue up the data) => Logic Apps (send to Slack)
Looks a bit bulky, but that seems to be one way of doing it (is there a better one?).
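(For reference, the Slack end of that chain is just an HTTP POST to an incoming webhook, which is what the Logic App performs for you. A minimal Python sketch, with a hypothetical webhook URL and message text that are not part of the original setup:)

```python
# pip install requests
# Post a notification to a Slack incoming webhook.
import requests

WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder: create one under "Incoming Webhooks"

def notify_slack(text: str) -> None:
    resp = requests.post(WEBHOOK_URL, json={"text": text}, timeout=10)
    resp.raise_for_status()  # Slack answers 200 with body "ok" on success

notify_slack("Example alert: device reading above threshold")
```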
Doing this I ran into issues. I selected my IoT Hub as input for Stream Analytics, and the simple query SELECT * INTO [Queue] FROM [Hub] fails, saying there was no data.
That does make sense if the IoT Hub just pushes new data to its endpoints and then discards it. So I created a test data set in the Stream Analytics job, and the query runs fine against it.
However, the data I do get into the hub is not (all) picked up or forwarded by the job to the Service Bus queue. I do see some activity on the queue, but not nearly enough to account for the data I receive.
This seems to be a very common scenario: ingesting data into IoT Hub and sending notifications to email or Slack if they are of a certain type. Can you explain the steps to take, or point me to a resource that does? Maybe I'm on the wrong path, as I cannot find anything that describes this.
Thanks

Azure Event Hub processing multiple protocols to the same topic

Upon reading about Azure Event Hubs,
I note that we can send data via
HTTP(S)
AMQP
Kafka
As I am not an integration (messaging) expert, my questions are the following:
Can I use both AMQP and HTTP(S) to write to the same Event Hub topic,
and subsequently, can a single Azure Function read from that same single Event Hub topic regardless of how it was written to?
My understanding is that Kafka will always need a separate Event Hub (topic).
The Event Hubs Kafka-compatible API means that if you send, say, the same JSON payload using all three protocols, the events can be mapped to the same Event Hub (= topic), and you can then read that Event Hub in Kafka mode, for example.
This is a good read: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-exchange-events-different-protocols, but I checked with a more experienced person to confirm.
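To make the "same hub, different protocols" point concrete, here is a rough sketch that writes the same JSON payload once over AMQP (via the azure-eventhub SDK) and once over HTTPS (via the REST send endpoint); both events land in the same Event Hub and can be read by the same consumer. The namespace, hub name, and key are placeholders, and the SAS helper follows the documented Event Hubs token format:

```python
# pip install azure-eventhub requests
import base64, hashlib, hmac, json, time, urllib.parse
import requests
from azure.eventhub import EventHubProducerClient, EventData

NAMESPACE = "<namespace>"                       # placeholder
HUB = "<event-hub-name>"                        # placeholder
KEY_NAME = "RootManageSharedAccessKey"
KEY = "<key>"                                   # placeholder
CONN_STR = (f"Endpoint=sb://{NAMESPACE}.servicebus.windows.net/;"
            f"SharedAccessKeyName={KEY_NAME};SharedAccessKey={KEY}")

payload = {"deviceId": "dev-1", "temperature": 21.5}

# 1) AMQP: send through the SDK.
producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name=HUB)
with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(payload)))
    producer.send_batch(batch)

# 2) HTTPS: POST the same payload to the REST endpoint with a SAS token.
def sas_token(uri: str, key_name: str, key: str, ttl: int = 3600) -> str:
    expiry = str(int(time.time()) + ttl)
    to_sign = urllib.parse.quote_plus(uri) + "\n" + expiry
    sig = base64.b64encode(hmac.new(key.encode(), to_sign.encode(), hashlib.sha256).digest())
    return ("SharedAccessSignature sr=" + urllib.parse.quote_plus(uri)
            + "&sig=" + urllib.parse.quote_plus(sig) + "&se=" + expiry + "&skn=" + key_name)

uri = f"https://{NAMESPACE}.servicebus.windows.net/{HUB}"
resp = requests.post(uri + "/messages",
                     headers={"Authorization": sas_token(uri, KEY_NAME, KEY),
                              "Content-Type": "application/json"},
                     data=json.dumps(payload), timeout=10)
resp.raise_for_status()  # 201 Created when the event is accepted
```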

Using partitionId or partitionKey with Azure IoT Hub

We are developing an application where IoT devices will publish events to Azure IoT Hub using the MQTT protocol (using one topic to push messages). We want to consume these messages using the Stream Analytics service, and to scale Stream Analytics it is recommended to use the PARTITION BY clause.
Since we are not using the Azure Event Hub SDK, can we somehow attach a partitionId to the events?
Thanks In Advance
As Rita mentioned in the comments, Event Hub will automatically associate each device with a particular partition.
Then, you can use PARTITION BY PartitionId for the steps closer to the input to efficiently parallelize processing of the input and reduce/aggregate the data.
Then, you can have another, non-partitioned step that outputs to SQL, sending some aggregated data.
Doing that, you will be able to assign more than 6 SUs, even with an output to SQL.
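To see that device-to-partition affinity for yourself, you can read the IoT Hub's built-in Event Hub-compatible endpoint and print which partition each device's messages arrive on. A rough sketch with the azure-eventhub Python SDK; the connection string is a placeholder for the "Event Hub-compatible" one from the IoT Hub's Built-in endpoints blade, and the property key is the system property IoT Hub stamps on D2C messages:

```python
# pip install azure-eventhub
# Read the IoT Hub built-in endpoint and show which partition each device maps to.
from azure.eventhub import EventHubConsumerClient

# Placeholder: the "Event Hub-compatible" connection string from the IoT Hub portal.
CONN_STR = ("Endpoint=sb://<iothub-ns>.servicebus.windows.net/;"
            "SharedAccessKeyName=iothubowner;SharedAccessKey=<key>;EntityPath=<iothub-name>")

def on_event(partition_context, event):
    device_id = event.system_properties.get(b"iothub-connection-device-id", b"?").decode()
    print(f"device {device_id} -> partition {partition_context.partition_id}")

client = EventHubConsumerClient.from_connection_string(CONN_STR, consumer_group="$Default")
with client:
    client.receive(on_event=on_event, starting_position="-1")  # stop with Ctrl+C
```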
We will update our documentation to give more info about scaling ASA jobs and describe the different possible scenarios.
Thanks,
JS - Azure Stream Analytics

Azure Function: IoTHub as Input and Output

I've developed an Azure Function to handle decompression of messages as they enter the IoT Hub.
The Function is connected to the IoT Hub's built-in messaging endpoint, so it can function like an Event Hub.
What I would like to do is have the Function output the decompressed content back into the IoT Hub, so that the Stream Analytics and other jobs I have running will not have to be connected to a different endpoint to continue receiving telemetry.
There seems to be a fair amount of documentation surrounding the Azure Functions and hooking them up to IoTHubs, but some of it is from last year and I know things have changed quite a bit.
This is my current connection string to read and write to the same IoTHub:
Endpoint=sb://iothub-ns-34997-5db385cb1f.servicebus.windows.net/;SharedAccessKeyName=iothubowner;SharedAccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=;EntityPath=IoTHub
Right now I've set up the output to go to the IoT Hub endpoint, and I'm getting an error:
Exception while executing function: Functions.DecompressionJS. Microsoft.Azure.WebJobs.Host: Error while handling parameter _binder after function returned:. Microsoft.ServiceBus: Unauthorized access. 'Send' claim(s) are required to perform this operation. Resource: 'sb://iothub-ns-34997-5db385cb1f.servicebus.windows.net/iothub'. TrackingId:e85de1ed565243bcb30bc622a2cab252_G4, SystemTracker:gateway6, Timestamp:6/22/2017 9:20:16 PM.
So I figured there was something wrong with the connection string and so I modified it to include the /iothub that the exception was telling me to use, since the rest of the endpoint matched the current connection string.
Once I updated the connection string and reran the function I got a different exception:
Exception while executing function: Functions.DecompressionJS. Microsoft.Azure.WebJobs.Host: Error while handling parameter _binder after function returned:. Microsoft.ServiceBus: Invalid EventHub address. It must be either of the following. Sender: <EventHubName>. Partition Sender: <EventHubName>/Partitions/<PartitionNumber>. Partition Receiver: <EventHubName>/ConsumerGroups/<ConsumerGroupName>/Partitions/<PartitionNumber>. TrackingId:ecb290822f494a86a61c21712656ea4c_G0, SystemTracker:gateway6, Timestamp:6/22/2017 8:44:14 PM.
So at this point I'm thinking that the IoTHub endpoint is only for reading messages and there is no way to get the decompressed content back into the IoTHub.
I'm hoping someone can prove me wrong and help me to configure my connection strings so I can have a closed loop and retrieve and send messages to and from the IoTHub without an intermediary.
Azure IoT Hub is a bidirectional gateway between devices and Azure cloud back-end solutions. Communication with the Azure IoT Hub is done via its device-facing and service-facing endpoints. See more details here.
Your scenario requires decompressing a device event before it is passed to the telemetry stream pipeline. Basically, this telemetry pre-processing in a typical Azure stream pipeline can be done in an Azure Function (or worker role) and/or an Azure Stream Analytics (ASA) job, as shown in the following picture:
As you can see, the Azure Function and/or ASA job transform the real-time telemetry data in the stream pipeline, and their output is stored in the next entity, such as an Event Hub. That's the common and recommended pattern for a real-time stream pipeline and push model.
Your scenario also requires keeping the same telemetry path (source) as you have for uncompressed device events, so a "non-standard" solution is needed. The following screen snippet shows an example of this solution:
The concept of the above solution is based on a device emulator on the back-end side. Azure IoT Hub routes forward all events that need preprocessing to a custom endpoint, such as an Event Hub.
Behind that, the Azure Function is responsible for decompressing an ingested event and creating a new one on behalf of that device, i.e. an emulated device. This emulated device can then send a D2C message to the Azure IoT Hub like any other real device.
Note that the emulated device uses the HTTPS protocol (connectionless) and Azure IoT Hub authorization.
The events from the emulated devices in the Azure IoT Hub are routed to the default Event Hub, i.e. the default telemetry path.
Note that the above solution lets you select which events are preprocessed based on the routes/rules; its usage depends on your business model.
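A rough sketch of the "emulated device" step, assuming the payloads are gzip-compressed JSON and using the azure-iot-device Python SDK (which speaks MQTT by default, whereas the answer describes an HTTPS sender; the re-ingestion idea is the same either way). The connection string, the gzip assumption, and the content type are placeholders, not anything confirmed by the original answer:

```python
# pip install azure-iot-device
# Decompress an event pulled off the routed custom endpoint and re-send it to
# IoT Hub as a device-to-cloud message from an "emulated" device identity.
import gzip
from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder: connection string of a device identity created for the emulator.
EMULATED_DEVICE_CONN_STR = "HostName=<iothub>.azure-devices.net;DeviceId=<emulated-device>;SharedAccessKey=<key>"

def reingest(compressed_body: bytes) -> None:
    decompressed = gzip.decompress(compressed_body)    # assumes gzip compression
    client = IoTHubDeviceClient.create_from_connection_string(EMULATED_DEVICE_CONN_STR)
    try:
        msg = Message(decompressed)
        msg.content_type = "application/json"          # assuming the original payload was JSON
        msg.content_encoding = "utf-8"
        client.send_message(msg)                       # lands back on the default telemetry path
    finally:
        client.shutdown()
```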
