Azure Storm vs Azure Stream Analytics

Looking to do real-time metric calculations on event streams, what is a good choice in Azure: Stream Analytics or Storm? I am comfortable with either SQL or Java, so I am wondering what the other differences are.

It depends on your needs and requirements. I'll try to lay out the strengths and benefits of both. In terms of setup, Stream Analytics has Storm beat, and it is great if you need to ask a lot of different questions often. However, Stream Analytics can only handle CSV or JSON data, and it can only send output to Azure Blob, Azure Tables, Azure SQL, and Power BI; any other output will require Storm. Stream Analytics also lacks the data transformation capabilities of Storm.
Storm:
Data Transformation
Can handle more dynamic data (if you're willing to program)
Requires programming
Stream Analytics:
Ease of setup
JSON and CSV format only
Can change queries within 4 minutes
Only takes inputs from Event Hub and Blob Storage
Only outputs to Azure Blob, Azure Tables, Azure SQL, and Power BI

If you are looking for simplicity over flexibility, I'd go with Stream Analytics. If you require specific operations that Stream Analytics is limited in, it's worth looking into Spark, which also gives you data persistence options. On the Stream Analytics output side, one interesting approach is to output into an Event Hub and consume the data from there, giving you unlimited flexibility in how you consume it.
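To give a feel for the Stream Analytics side, here is a minimal sketch of the kind of real-time metric query asked about, a per-device average over a tumbling window; the input/output aliases and field names are assumptions for illustration, not part of the question:

```sql
-- Compute a per-device average temperature every minute and emit it to an
-- Event Hub output for downstream consumers.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd          -- end of the current window
INTO
    [eventhub-output]                        -- hypothetical output alias on the job
FROM
    [eventhub-input] TIMESTAMP BY eventTime  -- hypothetical input alias
GROUP BY
    deviceId,
    TumblingWindow(minute, 1)
```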
Hope this helps.

Related

Grafana as Azure Stream Analytics output

I am pushing events to my Event hub, then this data is being analyzed in Azure Stream Analytics. I'd like to visualize output from stream analytics in Grafana.
What is the easiest approach to achieve this?
An Azure Stream Analytics job can natively ingest data into Azure Data Explorer: https://techcommunity.microsoft.com/t5/azure-data-explorer/azure-data-explorer-is-now-supported-as-output-for-azure-stream/ba-p/2923654
You can then use the Azure Data Explorer plugin in Grafana.
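If you go that route, the Stream Analytics side is just configuration plus a query. A minimal sketch, assuming an Event Hub input alias and an ADX output alias configured on the job (both names are made up for illustration):

```sql
-- Pass events straight through to the Azure Data Explorer output;
-- the projected columns must match the target ADX table's schema.
SELECT
    deviceId,
    eventTime,
    temperature
INTO
    [adx-output]        -- hypothetical ADX output alias
FROM
    [eventhub-input]    -- hypothetical Event Hub input alias
```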
Another option is to use Power BI instead of Grafana. https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
If I remember correctly, Grafana doesn't store data locally; you need to define a data source on top of one of the compatible storage systems.
Azure Stream Analytics doesn't come with a storage layer either, it's compute only.
So if you want to use ASA and Grafana, you need to output data from ASA to a data source that is supported by Grafana in ingress.
Looking at both lists, that leaves only MSSQL via Azure SQL (hopefully it's compatible) as a native option. It's not a bad option for narrow dashboards, or if you intend to store your data in an RDBMS anyway. You can store your entire payload in an NVARCHAR(MAX) if you don't plan to consume the data in SQL.
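As a hedged sketch of that last idea, the target table could be as small as this (all names here are assumptions):

```sql
-- Minimal Azure SQL table for Stream Analytics output when the payload stays
-- opaque: only the time and key columns are queryable, the rest is raw JSON.
CREATE TABLE dbo.StreamEvents (
    EventTime DATETIME2     NOT NULL,  -- timestamp for the dashboard's time axis
    DeviceId  NVARCHAR(128) NOT NULL,  -- key you might filter on
    Payload   NVARCHAR(MAX) NOT NULL   -- full JSON payload, not parsed in SQL
);
```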
But being clever, we can actually use the Functions output to write to any other store, or call any API. I'm not sure if Grafana has a direct ingestion API, but Azure Monitor does and it's a supported data source in Grafana.
The other option would be to go through ADX as explained in the other answer.
Not straightforward but doable ;)

Azure Data Explorer (ADX) vs Polybase vs Databricks

Question
Today I discovered another Azure service called Azure Data Explorer (ADX). Sorry for such a comparison of services; I have a good understanding of all of them except ADX. I feel like there is a big functionality overlap, so I want to know the exact role of ADX in the Azure infrastructure.
What is the use case when ADX is significantly better than Synapse/Databricks?
My understanding of ADX
AFAIK, ADX is a cluster (with per-hour billing, like Databricks or Synapse, unlike ADLA) that handles the database for you and is optimized for streaming ingestion and ad hoc queries at scale. It also supports external tables, which have worse performance but are cheaper (you pay only for Blob/ADLS storage).
Details
I don't understand why we need ADX if:
Azure Synapse has a similar pricing model (cluster, per-hour) and also supports streaming ingestion and ad hoc querying at scale. Azure Synapse supports querying Blob Storage/ADLS through Polybase external tables (see the sketch after this list).
Databricks is another service that is capable of this. Using Databricks Ingest and Delta Lake, you can ingest streaming data and consume it in both streaming and batch fashion. You can even have an interactive cluster that handles ad hoc queries for you.
Also, if you want real-time analytics, use Azure Stream Analytics. If you want an Athena-like experience, use ADLA (though it still doesn't support ADLS Gen2).
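To make the Polybase point above concrete, here is a rough sketch of exposing files in ADLS as an external table in a Synapse dedicated SQL pool; every name, the file format, and the location are assumptions, and in practice a database-scoped credential is typically also required:

```sql
-- Hypothetical Polybase setup: query Parquet files in ADLS as an external table.
CREATE EXTERNAL DATA SOURCE AdlsStore
WITH (
    TYPE = HADOOP,  -- Polybase access to ADLS from a dedicated SQL pool
    LOCATION = 'abfss://container@account.dfs.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.ExternalEvents (
    DeviceId    NVARCHAR(128),
    EventTime   DATETIME2,
    Temperature FLOAT
)
WITH (
    LOCATION = '/events/',          -- folder under the data source
    DATA_SOURCE = AdlsStore,
    FILE_FORMAT = ParquetFormat
);
```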
Azure Data Explorer is focused on high-velocity, high-volume, high-variance data (the three Vs of big data). It provides super fast interactive queries over such data as it streams in. It supports JSON and text natively, including full text search and indexing.
It is used in a broad set of scenarios associated with sensing activity and time series in a large set of verticals: IoT, API logs, transaction monitoring and ad hoc data exploration.
Microsoft offers ADX as a service because it is the major service Microsoft uses for its own telemetry, and all the analytical solutions we offer as a service in security, operational monitoring, game analytics, product insights, usage analytics, IoT, and connected vehicles are built on ADX. You can find a full list in our docs. For clarity: SQL, Synapse, and Cosmos DB store their telemetry in Azure Data Explorer.
SQL DW (aka Synapse SQL pool) is an excellent data warehouse and implements the modern data warehouse pattern: ETL -> curated data model -> load and serve via Analysis Services or Power BI.
ADX is for real-time analytics, enabling schema on read (SOR) over data as fresh as seconds old.
Consider ADX a fully managed platform when replacing Solr/Lucene-based variants used for logs, time series databases, and more.
Try it out in large workloads and you will see it is dramatically cheaper than the alternatives and much more powerful and performant.
Reach out to me if you need help.
Azure Data Explorer, alias Kusto, is focused on high-volume data ingestion and near real-time query and analytics. It was invented at Microsoft for log and telemetry analytics, but it can be used for other purposes, e.g. IoT, sensor data, or web analytics. The same technology is used in internal Azure services like Azure Monitor and Log Analytics.
Similar capabilities could be built on Synapse, Databricks, or HDInsight, but I see those as tools that fit much broader use cases. ADX has quite a narrow focus. ADX does support queries ("KQL") but has very limited SQL support. It is good for append-only data, not for updates. It is not a data warehouse, database, or data lake.
Microsoft material refers to the technology behind ADX by the name Kusto. More info on this at https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/. A good comparison of services can be found in this blog post: https://vincentlauzon.com/2020/02/19/azure-data-explorer-kusto

Combine static and real-time data in Azure Stream Analytics

I am looking into combining static data (stored in Azure SQL) and real-time stream data (coming via IoT Hub) in Stream Analytics. One way I found is to copy the Azure SQL data to Blob storage, use it as an input of type "Reference Data", and JOIN it with the streaming data in the Stream Analytics query editor, which works fine. However, I am wondering whether it is possible to use the JavaScript UDF capability in Stream Analytics to get data from Azure SQL and combine it with the streaming IoT data. Also, which is the suggested approach for combining these types of data?
Thanks
UDFs in Stream Analytics won't allow you to call out to external services like SQL. They're used for things like basic data manipulation, regex, math, etc. If your SQL data is slow moving in nature, the approach you've outlined, using something like Data Factory to move the SQL data into Blob storage and then using it as Reference Data inside your Stream Analytics query, is the correct way (and currently the only way) to solve your problem.
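As a minimal sketch of that reference-data join (the input/output aliases and column names below are assumptions, not part of the question):

```sql
-- Enrich each streaming IoT event with slow-moving metadata from the
-- Reference Data input (the blob copy of the Azure SQL table).
SELECT
    s.deviceId,
    s.temperature,
    r.deviceName,
    r.location
INTO
    [output]                                 -- hypothetical output alias
FROM
    [iothub-input] s TIMESTAMP BY eventTime  -- streaming input
JOIN
    [sqlref-input] r                         -- Reference Data input
ON
    s.deviceId = r.deviceId
```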
If it's fast-moving data in SQL, you'd want to investigate hooking into the SQL database changes and then publishing them on to Event Hubs. You could then pull this into your job as a second Data Stream input and do the appropriate joins in your query.
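A stream-to-stream join like that must be time-bounded with DATEDIFF. A hedged sketch, with all aliases and the 10-minute window chosen purely for illustration:

```sql
-- Join live IoT events against recently published SQL change events;
-- DATEDIFF bounds how far apart in time two matching events may be.
SELECT
    s.deviceId,
    s.temperature,
    c.newThreshold
FROM
    [iothub-input] s TIMESTAMP BY eventTime
JOIN
    [sqlchanges-input] c TIMESTAMP BY changeTime
ON
    s.deviceId = c.deviceId
    AND DATEDIFF(minute, c, s) BETWEEN 0 AND 10
```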

Azure IoT Suite: how to create a dashboard to display sensor data

I am new to Azure. I have sensors and would like to send data from sensors to the Azure backend, preferably to a database. After collecting those sensor data I would like to display them on a dashboard. I wonder if there is a sample tutorial or source code to implement such a solution. Hope you can help me.
Thanks in advance & Best Regards.
The Azure IoT Suite is an accelerator that configures a solution using standard Azure services and each one comes with a dashboard. The source code is available on GitHub: Remote Monitoring and Predictive Maintenance
There are multiple ways of achieving this as Azure is not a final product but consists of different "modules" if you will.
If the idea is to create a dashboard that shows the sensor data, you don't necessarily need to store it in a database; you can create live streams and display them directly. If storing the data is also a concern, you can do that in parallel as well.
The redirection logic for the data here would be Stream Analytics; it works with the concepts of an input source, a query, and an output sink. In your case, you might create an Event Hub/IoT Hub, use it as an input to Stream Analytics, and use Power BI as the output sink. This will get the data and display it in Power BI. If you also want to store the data, you can add another output sink for different options like Blob storage, DocumentDB, Azure SQL Database, etc.
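A single job can fan out to both destinations. A minimal sketch, where every alias and field name is an assumption for illustration:

```sql
-- Feed a Power BI dataset with 10-second averages for the live dashboard...
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd
INTO
    [powerbi-output]
FROM
    [iothub-input] TIMESTAMP BY eventTime
GROUP BY
    deviceId,
    TumblingWindow(second, 10)

-- ...and, in the same job, archive the raw events to Blob storage.
SELECT *
INTO
    [blob-output]
FROM
    [iothub-input]
```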
There is also an Azure IoT Suite remote monitoring solution that automates most of those tasks with some extra modules; you can create a solution from it and use it as a boilerplate.
Below are step-by-step tutorials for:
Event Hubs
Stream analytics
PowerBI
Hope this helps!
Mert

Azure Blob storage and Stream Analytics

I read that Azure Blob storage is a nice place to save data for statistics or other purposes, and that you can then run requests against the blobs and show statistics on a website (dashboard).
But I don't know how to use Stream Analytics for showing statistics. Is there an SDK for creating queries against blobs and generating JSON data? Or something else? I don't know.
And I have more questions about it:
How do I save data to a blob (is it JSON data or something else)? I don't know the data format for this.
How do I use Stream Analytics to query a blob and then get the data to show in a dashboard?
Maybe you know how to use this technology. Please help me. Thanks, and have a nice day.
@Taras - did you get a chance to toy with the Stream Analytics UI?
When you add a blob input, you can either add an entire container, which means Stream Analytics will scan the whole container for new files, or specify a path prefix pattern, which makes Stream Analytics look only in that path.
You can also use tokens such as {date} and {time} in the path prefix pattern to help guide Stream Analytics to the files to read.
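For example, a path prefix pattern like sensors/{date}/{time} with a YYYY/MM/DD date format and an HH time format would make the job read blobs such as sensors/2016/05/14/09/events.json (the container layout and file name here are made up for illustration).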
Generally speaking, it is highly recommended to use Event Hub as the input for the improved latency.
As for output, you can either use Power BI, which gives you an interactive dashboard, or output to some storage (blob, table, SQL, etc.) and build a dashboard on top of that.
You can also try to do one of the walkthroughs to get a feel for Stream Analytics: https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-twitter-sentiment-analysis-trends/
Thanks!
Ziv.
