Combine static and real-time data in Azure Stream Analytics

I am looking into combining static data (stored in Azure SQL) with real-time stream data (coming via IoT Hub) in Stream Analytics. One way I found is to copy the Azure SQL data to blob storage, use it as an input of type "Reference Data", and JOIN it with the streaming data in the Stream Analytics query editor, which works fine. However, is it possible to use the JavaScript UDF capability in Stream Analytics to fetch data from Azure SQL and combine it with the streaming IoT data? And which of the two is the suggested approach for combining these types of data?
Thanks

UDFs in Stream Analytics won't allow you to call out to external services like SQL. They're used for things like basic data manipulation, regex, Math, etc. If your SQL data is slow-moving in nature, the approach you've outlined, using something like Data Factory to move the SQL data into Blob storage and then consuming it as Reference Data inside your Stream Analytics query, is the correct way (and currently the only way) to solve your problem.
If the data in SQL is fast-moving, you'd want to investigate hooking into the SQL database changes and publishing them on to Event Hubs. You could then pull this into your query as a second input of type Data Stream and do the appropriate joins in your query.
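For reference, a minimal sketch of the Reference Data join described above (the input and output names [iothub-input], [blob-reference], and [sql-output] are hypothetical placeholders):

```sql
-- Hypothetical ASA query: enrich streaming IoT events with slow-moving
-- reference data that was copied from Azure SQL to blob storage.
SELECT
    s.deviceId,
    s.temperature,
    r.deviceName,
    r.location
INTO
    [sql-output]
FROM
    [iothub-input] s TIMESTAMP BY s.eventTime
JOIN
    [blob-reference] r
    ON s.deviceId = r.deviceId
```

Unlike a join between two data streams, a join against Reference Data needs no DATEDIFF time bound, since the reference input is treated as a slowly changing lookup table.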

Related

Processing an Event Stream in Azure Databricks

I am looking to implement a solution that populates some tables in Azure SQL based on events that flow through Azure Event Hubs into Azure Data Lake Storage (Gen2) via Event Hubs Capture.
The current ingestion architecture is attached (diagram: "Current Architecture").
I need to find an efficient way of processing each event that lands in ADLS and writing it into a SQL database, joining it with other tables in the same database, using Azure Databricks. The flow in Databricks should look like this:
1. Read event from ADLS
2. Validate schema of event
3. Load event data into Azure SQL table (Table 1)
4. Join certain elements of Table 1 with other tables in the same database
5. Load joined data into a new table (Table 2; see the T-SQL sketch after this list)
6. Repeat steps 1-5 for each incoming event
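For illustration, steps 4 and 5 might reduce to a single T-SQL statement along these lines (Table1, DeviceMetadata, and Table2 are hypothetical placeholder names):

```sql
-- Hypothetical sketch of steps 4-5: join the newly loaded event row(s)
-- in Table1 with a lookup table and persist the result into Table2.
INSERT INTO dbo.Table2 (EventId, DeviceId, DeviceName, Reading, EventTime)
SELECT  t1.EventId,
        t1.DeviceId,
        d.DeviceName,
        t1.Reading,
        t1.EventTime
FROM    dbo.Table1 AS t1
JOIN    dbo.DeviceMetadata AS d
        ON d.DeviceId = t1.DeviceId
WHERE   t1.EventId = @EventId;  -- limit to the event just loaded
```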
Does anyone have a reference implementation that has delivered against a similar requirement? I have looked at using Azure Data Factory to pick up and trigger a Notebook whenever an event lands in ADLS (note: there is very low throughput of events, roughly 1 every 10 seconds), but that solution would be too costly.
I am considering the following options:
Using Stream Analytics to stream the data into SQL (however, the joining part is quite complex and requires multiple tables)
Streaming from the Event Hub into Databricks (however this solution would require a new Event Hub, and to my knowledge would not make use of the existing data capture architecture)
Use Event Grid to trigger a Databricks Notebook for each Event that lands in ADLS (this could be the best solution, but I am not sure if it is feasible)
Any suggestions and working examples would be greatly appreciated.

Grafana as Azure Stream Analytics output

I am pushing events to my Event Hub, and this data is then analyzed in Azure Stream Analytics. I'd like to visualize the output from Stream Analytics in Grafana.
What is the easiest approach to achieve this?
An Azure Stream Analytics job can natively ingest data into Azure Data Explorer: https://techcommunity.microsoft.com/t5/azure-data-explorer/azure-data-explorer-is-now-supported-as-output-for-azure-stream/ba-p/2923654
You can then use the Azure Data Explorer plugin in Grafana.
Another option is to use Power BI instead of Grafana: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
If I remember correctly, Grafana doesn't store data locally; you need to define a data source on top of one of the compatible storage systems.
Azure Stream Analytics doesn't come with a storage layer either; it's compute only.
So if you want to use ASA and Grafana, you need to output data from ASA to a data source that Grafana supports on ingress.
Looking at both lists, that leaves only MSSQL via Azure SQL (hopefully it's compatible) as a native option. It's not a bad option for narrow dashboards, or if you intend to store your data in an RDBMS anyway. You can store your entire payload in an NVARCHAR(MAX) column if you don't plan to consume the data in SQL.
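As an illustration of that last point, a minimal sketch of such a landing table (the table and column names are hypothetical):

```sql
-- Hypothetical landing table for raw ASA output; Grafana's MSSQL
-- data source can then query it directly.
CREATE TABLE dbo.StreamEvents (
    EventTime  DATETIME2     NOT NULL,  -- event or arrival timestamp
    DeviceId   NVARCHAR(100) NOT NULL,
    Payload    NVARCHAR(MAX) NOT NULL   -- full event body kept as JSON
);
```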
But being clever, we can actually use the Azure Functions output to write to any other store, or call any API. I'm not sure whether Grafana has a direct ingestion API, but Azure Monitor does, and it's a supported data source in Grafana.
The other option would be to go through ADX as explained in the other answer.
Not straightforward but doable ;)

How good is Azure Data Lake for storing an SQL database used for Power BI visualizations?

We have an Azure SQL database where we collect a large amount of sensor data, and we regularly extract the data from it and transform it a bit with a Python script. The end result is a pandas DataFrame. We would like to store the transformed data in an Azure database and use it as the source of a Power BI dashboard.
On the one hand, we want to show the "almost" real-time data on a dashboard (the latency due to the transformation etc. is acceptable, but the dashboard needs to refresh very frequently, let's say once a minute), but we also want to store the transformed data and query it later e.g. to visualize the data only for a given day.
Is it possible to convert the pandas DataFrame into SQL and store it on Data Lake and stream the data from there? I read that it is possible to store structured data on Data Lake and even query it, but I am unsure if this would be the best solution.
(My current task is to choose the best database for storing the transformed data to enable both streaming it and querying it later. I am very new to Azure products and I don't have a sandbox account yet to try things out and identify possible pitfalls. I've just figured out that Power BI does not support DirectQuery for Data Lake, and I feel like this can be an issue, meaning we would have to query the data on Data Lake first and store it somewhere else if we wanted to visualize a subset. Is that correct?)
Azure Data Lake is not a database, just a store for data, both structured and unstructured, so as mentioned you can't DirectQuery it unless you have some compute capacity on top (Databricks, Azure Synapse, Azure Data Lake Analytics, or Power BI Premium with enhanced compute).
Depending on your approach, it may be best to move from Azure SQL Database and pandas to Azure Databricks, which can ingest the streaming data, transform it, and produce an output table stored in the data lake. You would then connect Power BI to the Databricks instance and query that. Note that the data will only be available while the cluster is running.
Moving to Databricks will involve rewriting your pandas code in Koalas or, preferably, PySpark.
You do have the option of using Databricks to write the results back to an Azure SQL Database table. Depending on what transformations you are doing, you could keep it all in Azure SQL; or, since it is streaming sensor data, take the data through Azure Event Hubs to Azure Stream Analytics (which does the transformations) and on to Azure SQL Database (which stores both real-time and historical data).
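For the Event Hubs → Stream Analytics → Azure SQL path, a minimal sketch of the kind of transformation an ASA job could do in flight (the input/output names and fields are hypothetical):

```sql
-- Hypothetical ASA query: condense raw sensor readings into one
-- row per device per minute before landing them in Azure SQL.
SELECT
    deviceId,
    AVG(temperature)   AS avgTemperature,
    MAX(temperature)   AS maxTemperature,
    System.Timestamp() AS windowEnd
INTO
    [azuresql-output]
FROM
    [eventhub-input] TIMESTAMP BY eventTime
GROUP BY
    deviceId,
    TumblingWindow(minute, 1)
```

A one-minute tumbling window like this would also line up with the dashboard's once-a-minute refresh requirement.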

Azure IOT Suite how to create a dashboard to display sensor data

I am new to Azure. I have sensors and would like to send data from them to the Azure backend, preferably to a database. After collecting the sensor data I would like to display it on a dashboard. I wonder if there is a sample tutorial or source code to implement such a solution. Hope you can help me.
Thanks in advance & Best Regards.
The Azure IoT Suite is an accelerator that configures a solution using standard Azure services and each one comes with a dashboard. The source code is available on GitHub: Remote Monitoring and Predictive Maintenance
There are multiple ways of achieving this as Azure is not a final product but consists of different "modules" if you will.
If the idea is to create a dashboard that shows the sensor data, you don't necessarily need to store it in a database; you can stream it live and display it directly. If storing the data is also a concern, you can do that in parallel as well.
The redirection logic of the data here would be Stream Analytics, which works with the concepts of an input sink, a query, and an output sink. In your case you might want to create an Event Hub / IoT Hub, use it as an input sink to Stream Analytics, and use Power BI as the output sink. This will get the data and display it in Power BI. If you also want to store the data, you can add another output sink for different options like Blob storage, DocumentDB, Azure SQL Database, etc.
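A minimal sketch of such a job query with both a live output and a storage output (all input/output names are hypothetical placeholders):

```sql
-- Hypothetical two-output ASA query: one SELECT feeds the live
-- Power BI dashboard, the other archives every event to blob storage.
SELECT
    deviceId,
    temperature,
    humidity,
    System.Timestamp() AS ts
INTO [powerbi-output]
FROM [iothub-input]

SELECT *
INTO [blob-archive]
FROM [iothub-input]
```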
Also, there is the Azure IoT Suite remote monitoring solution, which automates most of those tasks with some extra modules; you can use it to create a solution and treat it as a boilerplate.
Below are step-by-step tutorials for:
Event Hubs
Stream analytics
PowerBI
Hope this helps!
Mert

Azure Storm vs Azure Stream Analytics

Looking to do real-time metric calculations on event streams, what is a good choice in Azure: Stream Analytics or Storm? I am comfortable with either SQL or Java, so I'm wondering what the other differences are.
It depends on your needs and requirements. I'll try to lay out the strengths and limits of both. In terms of setup, Stream Analytics has Storm beat. Stream Analytics is great if you need to ask a lot of different questions often. However, it can only handle CSV or JSON data, it can only send output to Azure Blob, Azure Tables, Azure SQL, and Power BI (any other output will require Storm), and it lacks the data transformation capabilities of Storm.
Storm:
Data Transformation
Can handle more dynamic data (if you're willing to program)
Requires programming
Stream Analytics:
Ease of Setup
JSON and CSV format only
Can change queries within 4 minutes
Only takes inputs from Event Hub, Blob Storage
Only outputs to Azure Blob, Azure Tables, Azure SQL, PowerBI
If you are looking for versatility over flexibility, I'd go with Stream Analytics. If you require specific operations that are limited by Stream Analytics, it's worth looking into Spark, which gives you data persistence options. On the Stream Analytics output side, one interesting approach is to output into an Event Hub and consume the data from there, giving you unlimited flexibility in how you consume it.
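To make the comparison concrete, the kind of real-time metric calculation the question asks about is a few lines of SQL in Stream Analytics, whereas Storm would require writing a Java topology. A minimal sketch (input/output names are hypothetical):

```sql
-- Hypothetical ASA metric query: rolling event count per device over
-- the last minute, re-emitted every 5 seconds.
SELECT
    deviceId,
    COUNT(*)           AS eventsLastMinute,
    System.Timestamp() AS computedAt
INTO [eventhub-output]
FROM [eventhub-input]
GROUP BY
    deviceId,
    HoppingWindow(second, 60, 5)
```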
Below are the output options for Stream Analytics and the link for Apache Spark on Azure.
Hope this helps.
