Using Azure Metrics Advisor with workout data - azure

So I saw Azure Metrics Advisor at Microsoft Ignite this year and decided to try using it against some workout data that I had from Endomondo.
I uploaded 4 years of running data (only about 400 records) into an Azure SQL instance.
Then, I ingested that as a data feed into an Azure Metrics Advisor instance.
Literally, I'm just taking Metrics Advisor for a spin, so now I want to use this historical data against new running data.
I'd like to be able to upload a new workout, run it against this data, and have anomalies in either my distance or duration detected.
Does that use case make sense?
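For reference, a minimal sketch of what the "check new workouts for anomalies" step could look like with the azure-ai-metricsadvisor Python SDK is below. The endpoint, keys, detection configuration IDs, and metric names are placeholders, not values from this setup, and the detection configurations themselves would still need to be created in the Metrics Advisor portal first.

```python
# Hypothetical sketch: after the data feed ingests a new workout row from Azure SQL,
# ask Metrics Advisor for anomalies on the distance and duration metrics.
# Endpoint, keys, and configuration IDs below are placeholders.
import datetime

from azure.ai.metricsadvisor import MetricsAdvisorClient, MetricsAdvisorKeyCredential

client = MetricsAdvisorClient(
    "https://<your-instance>.cognitiveservices.azure.com/",
    MetricsAdvisorKeyCredential("<subscription-key>", "<api-key>"),
)

# One detection configuration per metric (distance, duration), created in the portal.
detection_config_ids = {
    "distance_km": "<distance-detection-config-id>",
    "duration_min": "<duration-detection-config-id>",
}

# Look for anomalies over the last week of ingested data.
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(days=7)

for metric_name, config_id in detection_config_ids.items():
    anomalies = client.list_anomalies(
        detection_configuration_id=config_id,
        start_time=start,
        end_time=end,
    )
    for anomaly in anomalies:
        print(metric_name, anomaly.timestamp, anomaly.severity)
```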

Related

Is Azure Synapse a good choice for Time Series Data?

We are in the process of analyzing which database will be the best choice for Time Series data (like stock market data / trading data, market sentiments, etc.).
Is Azure Synapse a good choice for Time Series Data?
Azure Synapse data explorer (Preview) provides you with a dedicated query engine optimized and built for log and time series data workloads.
With this new capability now part of Azure Synapse's unified analytics platform, you can easily access your machine and user data to surface insights that can directly improve business decisions.
To complement the existing SQL and Apache Spark analytical runtimes, Azure Synapse data explorer is optimized for efficient log analytics, using powerful indexing technology to automatically index structured, semi-structured, and free-text data commonly found in telemetry data.
For more info, please refer to the related articles below:
https://learn.microsoft.com/en-us/azure/synapse-analytics/data-explorer/data-explorer-overview
Time series solution - Azure Architecture
Please note that the feature is in public preview.
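If you do go the Synapse Data Explorer route, queries are written in KQL. Below is a hedged sketch using the azure-kusto-data Python package; the pool URI, database, table, and column names are made-up placeholders, and since the feature is in preview the endpoint format may change.

```python
# Hypothetical sketch: running a KQL aggregation against a Synapse Data Explorer pool.
# The cluster URI, database, and table/column names are placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster_uri = "https://<pool-name>.<workspace-name>.kusto.azuresynapse.net"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
client = KustoClient(kcsb)

# Hourly average price over the last day for a hypothetical Trades table.
query = """
Trades
| where Timestamp > ago(1d)
| summarize avg_price = avg(Price) by bin(Timestamp, 1h)
| order by Timestamp asc
"""

response = client.execute("MarketData", query)
for row in response.primary_results[0]:
    print(row["Timestamp"], row["avg_price"])
```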

Batch processing in Azure

We are planning to do batch processing on a daily basis. We generate 1 GB of CSV files every day and will manually put them into Azure Data Lake Store. I have read the Microsoft Azure documents regarding batch processing and have decided to use Spark for it. My question is: after we process the data using RDDs/DataFrames, what would be the next step? How can we visualize the data? Since this process is supposed to run every day, once the data transformation is done using Spark, do we need to push the data to some kind of data store like Hive, HDFS, or Cosmos DB before we can visualize it?
There are several options for doing this on Azure. It really depends on your requirements (e.g. number of users, needed visualizations, etc.). Some examples (a rough PySpark sketch of the daily flow follows after the list):
Running Spark on Azure Databricks, you could use the Notebook capabilities to visualize your data
Use HDInsight with Jupyter or Zeppelin Notebooks
Define Spark tables on Azure Databricks and visualize them with Power BI
Load the data with Azure Data Factory V2 to Azure SQL DB or Azure SQL Data Warehouse and visualize it with Power BI.
For time series data, you could push the data via Spark to Azure Event Hubs (see the example notebook with an Event Hubs sink in the following documentation) and consume it via Azure Time Series Insights. If you have an event data stream, this could also replace your batch-oriented architecture in the future. Parquet files are used by Azure Time Series Insights as long-term storage (see the following link). For Spark, also have a look at the Time Series Package, which adds some time series capabilities to Spark.
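To make the daily flow concrete, here is a minimal PySpark sketch: read the day's CSV files from Data Lake Store, transform them, and persist the result as a table that a notebook or Power BI can query. All paths, column names, and table names are assumptions, not values from your environment.

```python
# Hypothetical daily batch job: read the day's CSVs from the lake, aggregate,
# and persist the result as a table that notebooks or Power BI can read.
# All paths, column names, and table names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-batch").getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("adl://<your-adls-account>.azuredatalakestore.net/incoming/2018-01-01/*.csv")
)

# Example transformation: daily totals per customer.
daily_totals = (
    raw.groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("row_count"))
)

# Persist so downstream tools (notebooks, Power BI via a Spark/SQL endpoint)
# can read the result without re-running the transformation.
daily_totals.write.mode("overwrite").saveAsTable("reporting.daily_totals")
```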

Batch processing with Spark and Azure

I am working for an energy provider company. Currently, we generate 1 GB of data per day in the form of flat files. We have decided to use Azure Data Lake Store to store our data, on which we want to do batch processing on a daily basis. My question is: what is the best way to transfer the flat files into Azure Data Lake Store? And after the data is pushed into Azure, is it a good idea to process the data with HDInsight Spark (e.g. the DataFrame API or Spark SQL) and finally visualize it with Azure?
For a daily load from a local file system I would recommend using Azure Data Factory Version 2. You have to install a self-hosted Integration Runtime on premises (more than one node for high availability), and you have to consider several security topics (local firewalls, network connectivity, etc.). Detailed documentation can be found here, and there are also some good tutorials available. With Azure Data Factory you can trigger your upload to Azure with a Get-Metadata activity and use, e.g., an Azure Databricks Notebook activity for further Spark processing.
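As a hedged sketch of the Databricks side of that pipeline: the ADF Notebook activity can pass the file path discovered by the Get-Metadata activity as a notebook parameter. The widget name, column names, and output path below are assumptions, and spark/dbutils are provided by the Databricks runtime rather than imported.

```python
# Hypothetical Databricks notebook invoked by an ADF Notebook activity.
# The "input_path" widget is populated by the ADF activity; its name,
# the column names, and the output path are assumptions.
from pyspark.sql import functions as F

dbutils.widgets.text("input_path", "")          # parameter supplied by ADF
input_path = dbutils.widgets.get("input_path")

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(input_path)
)

# Example transformation: hourly energy consumption per meter.
hourly = (
    df.withColumn("hour", F.date_trunc("hour", F.col("reading_time")))
    .groupBy("meter_id", "hour")
    .agg(F.sum("consumption_kwh").alias("consumption_kwh"))
)

# Write curated output back to the lake for reporting.
hourly.write.mode("append").parquet("adl://<account>.azuredatalakestore.net/curated/hourly/")
```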

Azure storage for saving tweets and related information

I am planning to write a Windows service that consumes the Twitter streaming API to save tweets and related information (sentiment score, Twitter user, date of creation) on a specific topic into Azure storage. I need a way to query this information later, like "show me the average sentiment score of tweets in the last 24h", therefore SQL or LINQ must be available.
Some numbers:
Number of tweets saved per day: approx. 20,000
Save data for 3 months (20,000 tweets * 90 days)
Data saved: tweet text (140 chars), sentiment score, Twitter user name, date (maybe some more properties)
Saving frequency: Since I am using the streaming api, I get tweets in real time which have to be saved into the storage.
Query frequency: About every 30 minutes.
I wonder which Azure Storage is suited for this purpose. I think I have to decide between Azure Table Storage and SQL database.
There are two things to consider when choosing between these two:
1. Price:
SQL Azure: check the calculator: https://azure.microsoft.com/en-us/pricing/calculator/
Table storage: check the calculator: https://azure.microsoft.com/en-us/pricing/calculator/?service=storage
You should consider capacity, transactions, and the service tier to see which one is cheaper...
2. Performance:
If you design it right, in many cases Table storage should be faster than SQL Azure because of its NoSQL/denormalized nature, but it probably depends on the queries that you are going to write for it.
In SQL Azure you will use T-SQL, but in Table storage you will use C# and LINQ to query the data...
If you look at David's comment below, there are limitations based on the queries you are interested in if you use Table Storage, so you have to be aware of those limitations as well...
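To make those Table Storage limitations concrete: the "average sentiment over the last 24 hours" query only stays cheap if the keys are designed for it, because Table Storage filters efficiently only on PartitionKey/RowKey and has no server-side aggregation. Below is a hedged Python sketch (using today's azure-data-tables package rather than the C#/LINQ approach mentioned above; the table name, key design, and property names are assumptions) where PartitionKey is the UTC date, so a 24-hour window touches at most two partitions and the averaging happens client-side.

```python
# Hypothetical sketch: average sentiment over the last 24 hours from Table Storage.
# Table name, key design, and property names ("CreatedAt", "Score") are assumptions.
# PartitionKey is the UTC date, so a 24-hour window spans at most two partitions.
import datetime

from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection-string>", table_name="Tweets")

now = datetime.datetime.now(datetime.timezone.utc)
since = now - datetime.timedelta(hours=24)
partitions = {since.strftime("%Y%m%d"), now.strftime("%Y%m%d")}

scores = []
for pk in partitions:
    entities = table.query_entities(
        query_filter="PartitionKey eq @pk and CreatedAt ge @since",
        parameters={"pk": pk, "since": since},
    )
    scores.extend(float(e["Score"]) for e in entities)

if scores:
    print("average sentiment over last 24h:", sum(scores) / len(scores))
```

In Azure SQL the same question would be a single AVG() query, which is the trade-off the answer above is pointing at.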

How well does Azure SQL Data Warehouse scale?

I want to replace all of my on-prem DWs on SQL Server and use Azure SQL DW. My plan is to remove the hub and spoke model that I currently use for my on-prem SQL and basically have a large Azure SQL DW instance that scales with my client base (currently at ~1000). Would SQL DW scale, or do I need to retain my hub and spoke model?
Azure SQL Data Warehouse is a great choice for removing your hub and spoke model. The service allows you to scale storage and compute independently to meet the compute and storage needs of your data warehouse. For example, you may have a large number of data sets that are infrequently accessed. SQL Data Warehouse allows you to have a small number of compute resources (to save costs) and, using SQL Server features like table partitioning, access only the data in the "hot" partitions efficiently - say, the last 6 months.
The service offers the ability to adjust the compute power by moving a slider up or down - with the compute change happening in about 60 seconds (during the preview). This allows you to start small - say, with a single spoke - and add more over time, making the migration to the cloud easy. As you need more power, you can simply add DWU/compute resources by moving the slider to the right.
As the compute model scales, the number of query and loading concurrency slots increases, offering you the ability to support larger numbers of customers. You can read more about elastic performance and scale with SQL Data Warehouse on Azure.com.
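For completeness, the same scale operation the slider performs can be issued programmatically. Below is a hedged sketch doing it from Python via pyodbc against the logical server's master database; the server, database, credentials, and the target service objective ('DW400') are all placeholders to adjust for your environment.

```python
# Hypothetical sketch: scaling Azure SQL Data Warehouse compute from code.
# Server, database, and credentials are placeholders; pick the DWU target
# ('DW400' here) that matches the load you expect.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"
    "DATABASE=master;"
    "UID=<user>;PWD=<password>",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)

# Scale the warehouse up before a heavy load window, e.g. from DW100 to DW400.
conn.cursor().execute("ALTER DATABASE [MyWarehouse] MODIFY (SERVICE_OBJECTIVE = 'DW400')")
conn.close()
```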

Resources