In Azure Event Hub, how to send incoming data to a SQL database

I have some data being collected that is in an xml format. Something that looks like
<OLDI_MODULE xmlns="">
  <StStoHMI_IBE>
    <PRack>0</PRack>
    <PRackSlotNo>0</PRackSlotNo>
    <RChNo>0</RChNo>
    <RChSlotNo>0</RChSlotNo>
This data is sent to Azure Event Hub. I wanted to send this data to a SQL database. I created a stream in Azure Stream Analytics that takes this input and puts it in a SQL database, but when the input format is asked for the input stream, the only options are JSON, CSV and Avro. Which of these formats can I use? Or which of the Azure services should I use to move data from Event Hub to a SQL database?

By far the easiest option is to use Azure Stream Analytics as you intended to do. But yes, you will have to convert the XML to JSON or another supported format before you can use the data.
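If the conversion happens in the sender (or in a small relay between the device and the Event Hub), a minimal sketch using Json.NET's SerializeXmlNode could look like this; the variable names are placeholders, not from the question:
using System.Xml;
using Newtonsoft.Json;

// xmlPayload holds the <OLDI_MODULE> document as a string
var doc = new XmlDocument();
doc.LoadXml(xmlPayload);
string json = JsonConvert.SerializeXmlNode(doc);
// send 'json' to the Event Hub instead of the raw XML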
The other option is more complex, requires some code and a way to host it (using a worker role or web job, for instance), but gives the most flexibility. That option is to use an EventProcessor to read the data from the Event Hub and put it in a database.
See https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/ for how to set this up.
The main work is done in the Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages) method. Based on the example it will be something like:
async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (EventData eventData in messages)
    {
        string xmlData = Encoding.UTF8.GetString(eventData.GetBytes());
        // Parse the xml and store the data in the db using ADO.NET or whatever you're comfortable with
    }

    // Call checkpoint every 5 minutes, so that the worker can resume processing from 5 minutes back if it restarts.
    if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(5))
    {
        await context.CheckpointAsync();
        this.checkpointStopWatch.Restart();
    }
}
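To fill in the placeholder comment above, a rough sketch of the parse-and-store step could look like the following; the table and column names here are illustrative assumptions, not taken from the question:
using System.Data.SqlClient;
using System.Xml.Linq;

var doc = XDocument.Parse(xmlData);
var module = doc.Root.Element("StStoHMI_IBE");
using (var connection = new SqlConnection(sqlConnectionString))
using (var command = new SqlCommand(
    "INSERT INTO StHmiReadings (PRack, PRackSlotNo, RChNo, RChSlotNo) " +
    "VALUES (@pRack, @pRackSlotNo, @rChNo, @rChSlotNo)", connection))
{
    command.Parameters.AddWithValue("@pRack", (int)module.Element("PRack"));
    command.Parameters.AddWithValue("@pRackSlotNo", (int)module.Element("PRackSlotNo"));
    command.Parameters.AddWithValue("@rChNo", (int)module.Element("RChNo"));
    command.Parameters.AddWithValue("@rChSlotNo", (int)module.Element("RChSlotNo"));
    connection.Open();
    command.ExecuteNonQuery();
}
The processor class itself is registered with an EventProcessorHost, as shown in the linked getting-started article.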

JSON would be a good data format to use in Azure Event Hub. Once you receive the data in Azure Event Hub, you can use Azure Stream Analytics to move the data to SQL DB.
Azure Stream Analytics consists of 3 parts: input, query and output, where the input is the Event Hub and the output is the SQL DB. The query should be written by you to select the desired fields and output them.
Check out the below article:
https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-define-outputs/
Stream Analytics is the Azure resource you should look into for moving the data from Event Hub.

Related

Processing Azure Data Factory Event Trigger Properties

I have a data factory which triggers based on a storage blob event. In the triggered event, I see two properties, TriggerTime and EventPayload. As I need to read the Storage Blob related information, I am trying to process the EventPayload in the Data Factory. I would like to access a property like 'url' from the data tag.
A sample payload looks like this:
{
  "topic": "/subscriptions/7xxxxe5bbccccc85/resourceGroups/das00/providers/Microsoft.Storage/storageAccounts/datxxxxxx61",
  "subject": "/blobServices/default/containers/raw/blobs/sample.parquet",
  "eventType": "Microsoft.Storage.BlobCreated",
  "id": "a1c320d7-501f-0047-362c-xxxxxxxxxxxx",
  "data": {
    "api": "FlushWithClose",
    "requestId": "5010",
    "eTag": "0x8D82743B5D86E72",
    "contentType": "application/octet-stream",
    "contentLength": 203665463,
    "contentOffset": 0,
    "blobType": "BlockBlob",
    "url": "https://mystorage.dfs.core.windows.net/raw/sample.parquet",
    "sequencer": "000000000000000000000000000066f10000000000000232",
    "storageDiagnostics": {
      "batchId": "89308627-6e28-xxxxx-96e2-xxxxxx"
    }
  },
  "dataVersion": "3",
  "metadataVersion": "1",
  "eventTime": "2020-07-13T15:45:04.0076557Z"
}
Is there any shorthand for processing the EventPayload in the Data Factory? For example, the filename and folderpath of an event can be accessed using @triggerBody() in the Data Factory. Does this require custom code such as an Azure Function?

Azure Storage Queue message to Azure Blob Storage

I have access to an Azure Storage Queue using a connection string which was provided to me (not a queue I created). The messages are sent once every minute. I want to take all the messages and place them in Azure Blob Storage.
My issue is that I haven't been successful in getting the messages from the attached Storage Queue. What is the "easiest" way of doing this data storage?
I've tried accessing the external queue using Logic Apps and then tried to place it in my own queue before moving it to Blob Storage, however without luck.
If you want to access an external storage account in the Logic App, you will need the name of the storage account and its key.
You have to choose the trigger for Azure Queues and then click "Manually enter connection information".
In the next step you will be able to choose the queue you want to listen to.
I recommend you use an Azure Function, something like in this article:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-output?tabs=csharp
First you can try only reading the messages, and then add the output binding that creates your blob:
[FunctionName("GetMessagesFromQueue")]
public void GetMessagesFromQueue(
    [QueueTrigger("%ExternalStorage.QueueName%", Connection = "ExternalStorage.StorageConnection")] ModelMessage modelmessage,
    [Blob("%YourStorage.ContainerName%/{id}", FileAccess.Write, Connection = "YourStorage.StorageConnection")] Stream myBlob)
{
    // Put the modelmessage into the stream
    // ({id} in the blob path is bound to the Id property of the incoming ModelMessage)
}
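As a rough sketch of what the body could do, assuming ModelMessage is your own POCO and that System.Text and Newtonsoft.Json are referenced (this part is not from the original answer):
// Serialize the queue message and write it into the output blob
var json = JsonConvert.SerializeObject(modelmessage);
var bytes = Encoding.UTF8.GetBytes(json);
myBlob.Write(bytes, 0, bytes.Length);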
You can bind to a lot of types, not only Stream. In the link you have all the information.
I hope I've helped

Azure Event Hub - Can't understand Java flow

From the Microsoft Event Hub Java SDK examples (https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-java-get-started-send), these are the steps that need to be taken to be able to consume messages from an Event Hub via the Java SDK:
1. Create a storage account.
2. Create a new class called EventProcessorSample. Replace the placeholders with the values used when you created the event hub and storage account:
String consumerGroupName = "$Default";
String namespaceName = "----NamespaceName----";
String eventHubName = "----EventHubName----";
String sasKeyName = "----SharedAccessSignatureKeyName----";
String sasKey = "----SharedAccessSignatureKey----";
String storageConnectionString = "----AzureStorageConnectionString----";
String storageContainerName = "----StorageContainerName----";
String hostNamePrefix = "----HostNamePrefix----";
ConnectionStringBuilder eventHubConnectionString = new ConnectionStringBuilder()
    .setNamespaceName(namespaceName)
    .setEventHubName(eventHubName)
    .setSasKeyName(sasKeyName)
    .setSasKey(sasKey);
There are several things I don't understand about this flow:
A. Why is a storage account required? Why does it need to be created only when creating a consumer and not when creating the event hub itself?
B. What is 'hostNamePrefix' and why is it required?
C. More of a generalization of A, but I am failing to understand why this flow is so complicated and needs so much configuration. Event Hub is the default and only way of exporting metrics/monitoring data from Azure, which is a pretty straightforward flow - Azure -> Event Hub -> Java application. Am I missing a simpler way or a simpler client option?
All your questions are around consuming events from Event Hub.
Why is a storage account required?
Read the events only once: whenever your application reads events from the event hub, it needs to store the offset (an identifier for how many events have already been read) somewhere. Storing this information is known as 'checkpointing', and it is kept in the storage account.
Read the events from the start every time your app connects: in this case, your application will keep reading the events from the very beginning whenever it starts.
So the storage account is required to store the offset value while consuming events from the event hub, in case you want to read each event only once.
Why does it need to be created only when creating a consumer and not when creating the event hub itself?
It depends upon the scenario: whether you want to read your events only once or from the start every time your app starts. That's why a storage account is not required while creating the event hub.
What is 'hostNamePrefix' and why is it required?
As the name states, 'hostNamePrefix' is the name for your host. The host means the application which is consuming the events, and it's good practice to use a GUID as the hostNamePrefix. The hostNamePrefix is required by the event hub to manage the connection with the host. For example, if you have 32 partitions and you have deployed 4 instances of the same application, then 8 partitions each will be assigned to your 4 different instances, and that's where the host name helps the event hub manage the information about which partitions are connected to which host.
I suggest you read this article on Event Hubs for a clearer picture of the event processor host.

Create a Node.js web app that reads data from Azure (Stream Analytics, Event Hubs or Log Analytics)

I have connected several devices to Azure Stream Analytics that will send in various data (temp, light, humidity, etc.).
I am not sure how I can read data from Azure resources and display it on my web application that I've published on Azure. For example, reading device_name and the device's data.
What I need is probably sample code that reads some data from Azure and then displays it in a simple 'h1' or 'p' tag.
PS: I've seen lots of tutorials that teach how to publish a web app to Azure, but there are hardly any tutorials that specifically teach how to read and grab data from Azure resources.
You can use the Azure SDK for Node.js to manage Azure resources.
This is an example that retrieves information about an existing event hub. And here is the Azure Node SDK reference.
const msRestAzure = require('ms-rest-azure');
const EventHubManagement = require('azure-arm-eventhub');
const resourceGroupName = 'testRG';
const namespaceName = 'testNS';
const eventHubName = 'testEH';
const subscriptionId = 'your-subscription-id';
msRestAzure
  .interactiveLogin()
  .then(credentials => {
    const client = new EventHubManagement(credentials, subscriptionId);
    return client.eventHubs.get(resourceGroupName, namespaceName, eventHubName);
  })
  .then(eventHub => console.dir(eventHub, { depth: null, colors: true }))
  .catch(err => console.log(err));
I assume that you are taking some shortcuts and that you are sending events from the devices to Event Hub.
So the architecture right now looks like this:
Device -> Event Hub -> Azure Stream Analytics
plus an App Service hosting your web application.
Azure Stream Analytics just helps you to do some aggregation, calculation and so on.
On the other hand, you can use e.g. an Azure Function.
I would suggest storing the data in storage, e.g. in Azure Table Storage.
This is the proposed architecture:
Device -> Event Hub -> Azure Stream Analytics or Azure Function -> Azure Table Storage
AppService <-> Azure Table Storage
And later display the data in your web app from storage.
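If you go the Azure Function route, a rough sketch of the Event Hub to Table Storage leg could look like the following (written in C# just to illustrate the shape; the same can be done in JavaScript, and the hub, connection and entity names are placeholders I made up):
using System;
using Microsoft.Azure.WebJobs;
using Newtonsoft.Json;

public class DeviceReading
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string DeviceName { get; set; }
    public double Temperature { get; set; }
}

public static class StoreDeviceData
{
    [FunctionName("StoreDeviceData")]
    public static void Run(
        [EventHubTrigger("devicehub", Connection = "EventHubConnection")] string message,
        [Table("DeviceData", Connection = "StorageConnection")] out DeviceReading row)
    {
        // Deserialize the incoming JSON event and map it onto a table entity
        row = JsonConvert.DeserializeObject<DeviceReading>(message);
        row.PartitionKey = row.DeviceName;
        row.RowKey = Guid.NewGuid().ToString();
    }
}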
For reading the data back, here is an example from the docs:
Retrieve an entity by key:
tableSvc.retrieveEntity('mytable', 'hometasks', '1', function(error, result, response){
  if(!error){
    // result contains the entity
  }
});
The easiest way to visualize the output of Azure Stream Analytics is to use Power BI, if you have access to it. In a few minutes you can create a dashboard and show values or graphs. More info here. Your dashboard can also be embedded in your own app using "Power BI Embedded".
If you want to create your own application to visualize the output, there are several possible ways depending on your latency requirements. E.g. you can output to Cosmos DB or SQL and then use their client libraries. You can also output to an Azure Function and use SignalR to create a dynamic page.
Let us know if you have any further questions.

Blob Storage Input returns null in Stream Analytics Job output query

I'm a beginner in Azure. I have created a Stream Analytics job in Windows Azure. Here I'm using two inputs in the job, one of type Event Hub and another of type Blob Storage.
Below is the SQL query for the ASA job (to store the output in a SQL database):
SELECT
    IP.DeviceId,
    IP.CaptureTime,
    IP.Value,
    [TEST-SAJ-DEMO-BLOB-Input].[DataType] AS TempVal -- Blob Storage Input
INTO
    [Test-Output-Demo] -- SQL Table to store output
FROM
    [TEST-SAJ-DEMO-Input] IP -- Event Hub Input
Below is the JSON data in my Blob Storage container (Blob Storage input [TEST-SAJ-DEMO-BLOB-Input]):
{"DataType":"DEMO"}
Everything is working fine except that [TEST-SAJ-DEMO-BLOB-Input].[DataType] returns null instead of the string 'DEMO'.
All data sent by the Event Hub input is being stored into the SQL table and there is no error in the process.
Any help is appreciated ...
I was trying possible changes to resolve this issue and it's finally resolved.
It was a configuration mistake in the Blob Storage input [TEST-SAJ-DEMO-BLOB-Input]: in the configuration I had defined the Path Pattern as {date}{time}/Test_Demo.json. I have now changed it to simply Test_Demo.json and it works.
So the issue was in the Path Pattern of the Blob Storage input ...
But I'm still not clear about the Path Pattern (how does 'Path Pattern' work?) and why "{date}{time}/Test_Demo.json" was not working.
Is this just an alias issue? You've used IP as the FROM alias, but then used the full source name for the DataType field. I know that in T-SQL this wouldn't matter.
Try:
SELECT
    IP.DeviceId,
    IP.CaptureTime,
    IP.Value,
    IP.DataType AS TempVal -- Blob Storage Input
INTO
    [Test-Output-Demo] -- SQL Table to store output
FROM
    [TEST-SAJ-DEMO-Input] IP -- Event Hub Input
Also check that the input for the stream job is set to JSON encoding.
