Good architecture for Azure for streaming analytics? [closed] - azure

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I have JSON data coming from sensors per seconds to Azure IoT hub. Data is time series with 15 variables. I want to process this data real time using c# application which is quite complex and send outputs events to some other service(can be storage or PowerBI)
What do you think is the best architectural approach for it?
1. Try to process the data in stream analytics with c# code, I know there is .Net support for azure stream analytics but i think is very premature? Any experience in this approach?Does azure stream analytics support complex c# algorithms?
2. Store data to azure data lake and use data lake analytics to process the data?
Your experiences and recommendations are very much appreciated.
Many thanks

Try to process the data in stream analytics with c# code
Azure Stream Analytics uses Stream Analytics Query Language to perform transformations and computations over streams of events. The C# SDK is just a way to create and run a Stream Analytics job. All the transformations and computations work should be written in Stream Analytics Query Language.
Store data to azure data lake and use data lake analytics to process the data?
Stream Analytics is better in real-time data handle scenarios. I suggest you combine these 2 ways together. Use Azure Stream Analytics to do a preliminary and necessary data processing and conversion and output the data to azure data lake and use data lake analytics to further process the data.

If you're open to alternative solutions you could also use a HTTP API like Stride, which enables the creation of networks of continuous SQL queries, chained together, with the ability to subscribe to streams of changes as a means to streaming data out to applications.
If your computational needs fit within the confines of SQL this approach might work well for you. You can check out the Stride docs to see some examples.

Related

ADF, Azure Function or Hybrid [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 days ago.
Improve this question
I need to download data from several APIs some using basic REST some using GraphQL, parse the data and map some of the fields to various Azure SQL tables discarding the unneeded data (will be later visualised in PowerBI).
I started using Azure Data Factory but got frustrated with the lack of simple functions like converting json field containing html into text.
I then looked at Azure Functions, thought Python (although I’m open to NodeJS) however I’ve got a lot of data to download and upSert into the database and there is mentions on the internet the ADF is the most efficient to bulk upSert data.
Then I thought ADF using function to get the data and ADF to bill copy.
So my question is what should I be using for my use case? I’m open to any suggestions but it needs to be on Azure and cost sensitive. The ingestion needs to run daily upSerting around 300,000 records.
I think this pretty much comes down to taste, as you can probably solve this entirely only using ADF or an azure function, depending on more specific circumstances of your case. In my personal experience I've often ended up using the hybrid variant because I can be easier due to more flexibility compared to the standard API components of ADF, doing the extraction from an API using an azure function, storing the data in blob storage/data lake and then loading the data into a database using ADF. This setup can be pretty cost effective from my experience, depending on if you can use an azure function consumption plan (cheaper than alternatives) and/or can void using data flows in ADF (a significant cost driver in ADF)

Select best Azure storage for visualization and analysis

I am writing tool to analyze data coming from a race simulator, I have two use cases:
Display live telemetry on a chart - so mostly visualization of incoming stuff, to detect manually anomalies
Calculate own metrics, analyze data and suggest actions based on them - this can be done after a session, doesn't have to be calculated live. Now I am focusing solely on storing data but I have to keep in mind that later it needs to be analyzed.
I was thinking about utilizing Event Hub to handle streaming of events, question is how to visualize data in the easiest way and what's the optimal storage for second use case - it has to be big data solution I believe, there will be many datapoints to analyze.
For visualization you can use Power Bi or another visualization tool running on containers.
For storing, you can go with Azure Time Series Insights or just sink from Event Hubs to Azure Cosmos DB and then, connect power bi on it to create your charts.
https://learn.microsoft.com/en-us/azure/time-series-insights/overview-what-is-tsi

How can I decide, if I should use the Power BI API to push data into my streaming dataset or Azure Stream Analytics?

I am very new to Azure. I need to create a Power BI dashboard to visualize some data produced by a sensor. The dashboard needs to get updated "almost" real-time. I have identified that I need a push data set, as I want to visualize some historic data on a line chart. However, from the architecture point of view, I could use the Power BI REST APIs (which would be completely fine in my case, as we process the data with a Python app and I could use that to call Power BI) or Azure Stream Analytics (which could also work, I could dump the data to the Azure Blob storage from the Python app and then stream it).
Can you tell me generally speaking, what are the advantages/disadvantages of the two approaches?
Azure stream analytics lets you have multiple sources and define multiple targets and one of those targets could be Power-BI and Blob ... and at the same time you can use windowing function on the data as it comes in. It also provides you a visual way of managing your pipeline including windowing function.
In your case you are kind of replicating the incoming data to Blob first and secondly to power-BI. But if you have a use case to apply windowing function(1 minutes or so) as your data is coming in from multiple sources e.g. more than one sensor or a senor and other source, you have to fiddle around a lot to get it working manually, where as in stream analytics you can easily do it.
Following article highlights some of the pros and cons of Azure Analytics...
https://www.axonize.com/blog/iot-technology/the-advantages-and-disadvantages-of-using-azure-stream-analytics-for-iot-applications/
If possible, I would recommend streaming data to IoT Hub first, and then ASA can pick it up and render the same on Power BI. It will provide you better latency than streaming data from Blob to ASA and then Power BI. It is the recommended IoT pattern for remote monitoring, predictive maintenance etc , and provides you longer term options to add a lot of logic in the real-time pipelines (ML scoring, windowing, custom code etc).

Real time Streaming data into Azure Datawarehouse from sql server

I'm trying to build a real-time reporting service on top of Microsoft Azure Data Warehouse. Currently I have a SQL server with about 5 TB of data. I want to stream the data to the data warehouse and use Azure DW's computation power to generate real-time reporting based on data. Is there any ready to use/best practices to do that?
One approach I was considering is to load data into Kafka and then stream it into Azure DW via Spark streaming. However, this approach is more near-real-time than real-time. Is there any way to utilize SQL Server Change Data Capture to stream data into data warehouse?
I don't personally see Azure SQL Data Warehouse in a real-time architecture. It's a batch MPP system optimised for shredding billions of rows over multiple nodes. Such a pattern is not synonymous with sub-second or real-time performance, in my humble opinion. Real-time architectures tend to look more like Event Hubs > Stream Analytics in Azure. The low concurrency available (ie currently a max of 32 concurrent users) is also not a good fit for reporting.
As an alternative you might consider Azure SQL Database in-memory tables for fast load and then hand off to the warehouse at a convenient point.
You could Azure SQL Data Warehouse in a so-called Lambda architecture with a batch and real-time element, where is supports the batch stream. See here for further reading:
https://social.technet.microsoft.com/wiki/contents/articles/33626.lambda-architecture-implementation-using-microsoft-azure.aspx
If you’re looking for a SQL-based SaaS solution to power realtime reporting applications we recently released a HTTP API product called Stride, which is based on the open-source streaming-SQL database we build, PipelineDB, that can handle this type of workload.
The Stride API enables developers to run continuous SQL queries on streaming data and store the results of continuous queries in tables that get incrementally updated as new data arrives. This might be a simpler approach to adding the type of realtime analytics layer you mentioned above.
Feel free to check out the Stride technical docs for more detail.

Application insight -> export -> Power BI Data Warehouse Architecture

Our team have just recently started using Application Insights to add telemetry data to our windows desktop application. This data is sent almost exclusively in the form of events (rather than page views etc). Application Insights is useful only up to a point; to answer anything other than basic questions we are exporting to Azure storage and then using Power BI.
My question is one of data structure. We are new to analytics in general and have just been reading about star/snowflake structures for data warehousing. This looks like it might help in providing the answers we need.
My question is quite simple: Is this the right approach? Have we over complicated things? My current feeling is that a better approach will be to pull the latest data and transform it into a SQL database of facts and dimensions for Power BI to query. Does this make sense? Is this what other people are doing? We have realised that this is more work than we initially thought.
Definitely pursue Michael Milirud's answer, if your source product has suitable analytics you might not need a data warehouse.
Traditionally, a data warehouse has three advantages - integrating information from different data sources, both internal and external; data is cleansed and standardised across sources, and the history of change over time ensures that data is available in its historic context.
What you are describing is becoming a very common case in data warehousing, where star schemas are created for access by tools like PowerBI, Qlik or Tableau. In smaller scenarios the entire warehouse might be held in the PowerBI data engine, but larger data might need pass through queries.
In your scenario, you might be interested in some tools that appear to handle at least some of the migration of Application Insights data:
https://sesitai.codeplex.com/
https://github.com/Azure/azure-content/blob/master/articles/application-insights/app-insights-code-sample-export-telemetry-sql-database.md
Our product Ajilius automates the development of star schema data warehouses, speeding the development time to days or weeks. There are a number of other products doing a similar job, we maintain a complete list of industry competitors to help you choose.
I would continue with Power BI - it actually has a very sophisticated and powerful data integration and modeling engine built in. Historically I've worked with SQL Server Integration Services and Analysis Services for these tasks - Power BI Desktop is superior in many aspects. The design approaches remain consistent - star schemas etc, but you build them in-memory within PBI. It's way more flexible and agile.
Also are you aware that AI can be connected directly to PBI Web? This connects to your AI data in minutes and gives you PBI content ready to use (dashboards, reports, datasets). You can customize these and build new reports from the datasets.
https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-application-insights/
What we ended up doing was not sending events from our WinForms app directly to AI but to the Azure EventHub
We then created a job that reads from the eventhub and send the data to
AI using the SDK
Blob storage for later processing
Azure table storage to create powerbi reports
You can of course add more destinations.
So basically all events are send to one destination and from there stored in many destinations, each for their own purposes. We definitely did not want to be restricted to 7 days of raw data and since storage is cheap and blob storage can be used in many analytics solutions of Azure and Microsoft.
The eventhub can be linked to stream analytics as well.
More information about eventhubs can be found at https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
You can start using the recently released Application Insights Analytics' feature. In Application Insights we now let you write any query you would like so that you can get more insights out of your data. Analytics runs your queries in seconds, lets you filter / join / group by any possible property and you can also run these queries from Power BI.
More information can be found at https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics/

Resources