I know that Application Insights stores data only for a limited period of time.
What do I need to do if I want to run analysis over a long time period, let's say a year?
I know about the Continuous Export feature, and as suggested, we can use Power BI on the data stored in blobs, but that has a cost associated with it. Another way is to write code that transforms the JSON data in the blobs into some Excel representation.
Is there any other way, apart from these two, to analyze Application Insights data over long time periods? Something that picks up the data stored in blobs and uses it to show analytics?
There are two things being developed: the ability to specify a different retention period (1 year will be more expensive) and the ability to run analytics queries on top of blobs. Unfortunately, neither is available yet.
I will keep updating this answer.
Update: it is now possible to specify a retention period for Application Insights resources.
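For the "analytics on top of blobs" scenario, here is a minimal sketch of reading the Continuous Export output yourself (the container name, connection string, and pandas-based approach below are assumptions, not official tooling):

    # Sketch: read Application Insights Continuous Export blobs into pandas
    # for ad-hoc analysis over long periods.
    # Assumes: pip install azure-storage-blob pandas
    # "<storage-connection-string>" and "ai-export" are placeholders.
    import json
    import pandas as pd
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        conn_str="<storage-connection-string>", container_name="ai-export")

    rows = []
    for blob in container.list_blobs():                # one blob per export batch
        text = container.download_blob(blob).readall().decode("utf-8")
        for line in text.splitlines():                 # each line is one JSON document
            if line.strip():
                rows.append(json.loads(line))

    df = pd.json_normalize(rows)                       # flatten nested telemetry fields
    print(df.shape)                                    # analyze / chart from here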
Related
We are storing our Windows/Linux VM metrics and logs in an Azure diagnostics storage account for long-term retention. We keep this data in Log Analytics as well, but being cost conscious we keep only the minimal essential set, and only for 1 month. However, there seems to be no way to efficiently query the Table storage data when we need it, e.g. checking historical CPU usage for a particular machine over a specific period in the past, or checking the logs captured during that period. The partition key and row key are highly convoluted, with only very basic documentation available for the WAD table schema and none for the LinuxsyslogVer2v0 table schema. I was curious whether anyone else is using the diagnostics Table storage for any querying/reporting. If so, how do you query it for a specific host and time period? I can query on non-partition/row-key columns, but besides being time consuming, that will eventually cost a lot since it results in a table scan. I'd really appreciate any advice.
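For context, the kind of tick-range query I can put together today looks roughly like this (a sketch, assuming the WAD PartitionKey is '0' followed by the event time in UTC .NET ticks, which is what I see in my tables; the hostname column varies by table):

    # Sketch: range-scan a WAD diagnostics table by PartitionKey instead of a
    # full table scan. Table and column names are the defaults for Windows
    # diagnostics; verify them against your own storage account.
    # Assumes: pip install azure-data-tables
    from datetime import datetime, timezone
    from azure.data.tables import TableClient

    def to_ticks(dt):
        # .NET ticks: 100-nanosecond intervals since 0001-01-01 UTC
        return int((dt - datetime(1, 1, 1, tzinfo=timezone.utc)).total_seconds() * 10_000_000)

    start = to_ticks(datetime(2021, 6, 1, tzinfo=timezone.utc))
    end = to_ticks(datetime(2021, 6, 2, tzinfo=timezone.utc))

    table = TableClient.from_connection_string(
        "<storage-connection-string>", table_name="WADPerformanceCountersTable")
    flt = (f"PartitionKey ge '0{start}' and PartitionKey lt '0{end}' "
           f"and RoleInstance eq 'myhost01'")          # adjust the hostname column per table
    for entity in table.query_entities(flt):
        print(dict(entity))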
You should consider using Azure Data Explorer (ADX) for your long-term storage solution. It allows KQL queries over your long-term data and is the preferred method for keeping log/security data past the default retention of services like Log Analytics and Sentinel.
The pricing page for ADX can be a bit confusing and there is a website to help you estimate costs here: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
By default, logs ingested into Azure Sentinel are stored in Azure Monitor Log Analytics. This article explains how to reduce retention costs in Azure Sentinel by sending logs to Azure Data Explorer for long-term retention.
Storing logs in Azure Data Explorer reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory reasons or to run periodic investigations on older data.
https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=adx-event-hub
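Once the data is in ADX, a query for something like historical CPU for one host is plain KQL. A rough sketch of doing it programmatically (cluster, database, table, and column names below are placeholders assuming a Perf-like schema):

    # Sketch: run a KQL query against an ADX cluster holding long-term diagnostics.
    # Cluster/database/table/column names are placeholders.
    # Assumes: pip install azure-kusto-data
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://<cluster>.<region>.kusto.windows.net")
    client = KustoClient(kcsb)

    query = """
    Perf
    | where TimeGenerated between (datetime(2021-01-01) .. datetime(2021-02-01))
    | where Computer == "myhost01" and CounterName == "% Processor Time"
    | summarize avg(CounterValue) by bin(TimeGenerated, 1h)
    """
    response = client.execute("<database>", query)
    for row in response.primary_results[0]:
        print(row["TimeGenerated"], row["avg_CounterValue"])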
I am working on modernizing a reporting solution where the data sources are on-premises on the customers' SQL Servers (2014) and the reports are displayed as Power BI reports in the customer's Power BI Service portal. Today I use SSIS to build a data warehouse, plus an on-premises data gateway to transport the data up to Azure Analysis Services, which in turn is used by the Power BI reports.
I have been wondering whether I could use Azure Synapse to connect to the customer data, transport it to Azure in the most cost-effective way, and link it to the Power BI workspace as a shared dataset. There are many possibilities, but it is important that the customer experiences the reports as fast and stable, and, if possible, that they can cope with near real time.
I find SSIS cumbersome and expensive in Azure. Are there mechanisms that make it cheap and fast to get data into Azure? Do I need a data warehouse (Azure SQL Database), or is it better to use a data lake as storage? I also need to do incremental loads. And what if I need to do some transformations? Should I use Power BI dataflows, or do I need to create Azure Data Factory data flows to achieve this?
Does anyone have good experience using Synapse (also with DevOps in mind) and setting up proper DEV, TEST and PROD environments for this? Or is Synapse a cost driver and a simpler implementation will do? Give me your opinions, and if you have links to good articles, please share them. Looking forward to hearing from you.
Regards, Geir
The honest answer is it depends on a lot of different things and I don't know that I can give you a solid answer. What I can do is try to focus down which services might be the best option.
It is worth noting that a Power BI dataset is essentially an Analysis Services database behind the scenes, so unless you are using a feature that is only available in AAS and are using a live connection, you may be able to eliminate that step. Refresh options are one of the areas that are more limited in Power BI, though, so the separate AAS database might still be necessary for your scenario.
There is a good chance that Power BI dataflows will work just fine for you if you can eliminate the AAS instance, and they have the added advantage of having incremental refresh as a core feature. In that case, Power BI will store the data in a data lake for you.
Synapse is an option, but probably not the best one for your scenario unless your dataset is large. SQL pools can get quite expensive, especially if you aren't making use of any of the compute options to do transformations.
Data Factory (also available as Synapse pipelines) without the SSIS integration is generally the least expensive option for moving large amounts of data. It lets you use data flows to do some transformations and has features like incremental load. Outputting to a data lake is probably fine and the most cost effective, though in some scenarios something like an Azure SQL instance could be required if you specifically need some of its features.
If they want true real time, it can be done, but none of these tools are really built for it. In most cases the 48 refreshes per day (i.e. every 30 minutes) available on a Premium capacity are close enough to real time once you dig into the underlying purpose of a given report.
For true real-time reporting, you would look at push and/or streaming datasets in Power BI and feed them with something like a Logic App or possibly Stream Analytics. There are a lot of limitations with push datasets, though; more than likely you would want to set up a regular Power BI report and dataset and then add the real-time dataset as a separate entity in addition to that.
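For instance, feeding a push dataset is just a REST call per batch of rows, roughly like the sketch below (the dataset ID, table name, sample columns, and token acquisition are placeholders; the endpoint is the documented push-rows API):

    # Sketch: push rows into a Power BI push dataset via the REST API.
    # dataset_id, table name, columns, and the bearer token are placeholders;
    # a Logic App or Stream Analytics output can do the same job without code.
    from datetime import datetime, timezone
    import requests

    dataset_id = "<dataset-guid>"
    url = (f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}"
           "/tables/RealtimeEvents/rows")

    rows = [{"Timestamp": datetime.now(timezone.utc).isoformat(),
             "Source": "store-42", "Amount": 129.50}]

    resp = requests.post(url,
                         headers={"Authorization": "Bearer <access-token>"},
                         json={"rows": rows})
    resp.raise_for_status()    # on success the rows show up in the report within seconds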
As far as DevOps goes, pretty much any Azure service can be integrated with a pipeline. In addition to any code, any service or service setting can be deployed via an ARM template or CLI script.
Power BI has improved over the past couple of years to have much better support for DevOps and dev/test/prod environments. Current best practices can be found in the Power BI documentation: https://learn.microsoft.com/en-us/power-bi/create-reports/deployment-pipelines-best-practices
We have legacy applications that currently write various runtime metrics (SQL call run times, API/HTTP request run times, etc.) to a local SQL DB.
Format: (source, event, data, executionduration)
We are moving away from storing those in the local SQL DB and are now publishing the same metrics to Azure Event Hubs.
We are looking for a good place to store those metrics for the purpose of monitoring the health of the application. A simple solution would be to store them in some DB and build a custom application to visualize the data in custom ways.
We are also considering using Azure Monitor for this purpose via the Data Collector API (https://learn.microsoft.com/en-us/azure/azure-monitor/platform/data-collector-api); a rough sketch of the call we have in mind is below.
QUESTION: Are there any issues with Azure Monitor that would prevent us from achieving this type of health monitoring?
Details
each event is small (few hundred characters)
expecting ~ 10 million events per day
retention of 1-2 days is enough
the ability to aggregate older events per source and per event is important (to keep historical run-time information)
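For reference, the kind of Data Collector API call we have in mind looks roughly like this (a sketch; the signing follows the documented SharedKey scheme, and the LegacyAppMetrics log type is just a placeholder name that ends up as a LegacyAppMetrics_CL table):

    # Sketch: send a batch of events to the Azure Monitor HTTP Data Collector API.
    # workspace_id / shared_key are the Log Analytics workspace credentials.
    import base64, hashlib, hmac, json
    from datetime import datetime, timezone
    import requests

    workspace_id = "<workspace-id>"
    shared_key = "<primary-key>"

    def post_events(events, log_type="LegacyAppMetrics"):
        body = json.dumps(events).encode("utf-8")
        date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
        string_to_sign = (f"POST\n{len(body)}\napplication/json\n"
                          f"x-ms-date:{date}\n/api/logs")
        signature = base64.b64encode(
            hmac.new(base64.b64decode(shared_key),
                     string_to_sign.encode("utf-8"),
                     hashlib.sha256).digest()).decode()
        headers = {"Content-Type": "application/json",
                   "Authorization": f"SharedKey {workspace_id}:{signature}",
                   "Log-Type": log_type,
                   "x-ms-date": date}
        url = (f"https://{workspace_id}.ods.opinsights.azure.com"
               "/api/logs?api-version=2016-04-01")
        requests.post(url, data=body, headers=headers).raise_for_status()

    post_events([{"source": "billing-api", "event": "sql_call",
                  "data": "GetInvoices", "executionduration": 183}])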
Thank you
You can build some simple graphs, and with the Log Analytics query language (KQL) you can do just about any form of data analytics you need.
Here's a pretty good article on Azure Monitor visualizations:
learn.microsoft.com/en-us/azure/azure-monitor/log-query/charts
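As a rough sketch of the kind of per-source/per-event aggregation the question asks about (the LegacyAppMetrics_CL table and the _s/_d column suffixes are placeholders following the Data Collector API naming convention; adjust to your own custom log type):

    # Sketch: query a custom log with KQL and aggregate run times per source/event.
    # Assumes: pip install azure-monitor-query azure-identity
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    client = LogsQueryClient(DefaultAzureCredential())

    kql = """
    LegacyAppMetrics_CL
    | summarize count(), avg(executionduration_d), percentile(executionduration_d, 95)
        by source_s, event_s, bin(TimeGenerated, 1h)
    | order by TimeGenerated desc
    """
    result = client.query_workspace("<workspace-id>", kql,
                                    timespan=timedelta(days=2))
    for table in result.tables:
        for row in table.rows:
            print(row)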
We run a software application on Azure for one of our customers. The customer wants to see the performance of the systems. This consists of two parts: one is the metric information of the servers, and the other is some information I want to provide through custom logging.
My plan is to give the customer access to the portal and only allow him access to the metric information and the custom tables.
It seems to me that by assigning a role to the customer I should be able to block all the other possibilities.
Can someone tell me which actions I have to allow/forbid to achieve this? Or where can I find the information for this?
Solution #1
Instead of giving read access to the virtual machine, which may break your security policy, I'd recommend going with an Azure Log Analytics workspace (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-overview). You will need to create a workspace which collects and stores the server metrics (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-quick-collect-windows-computer) and other custom metrics.
Your customer will be given access to the workspace only, where they can see all metrics in a dashboard. If there is a need for log filtering, you can use the Log Analytics query language (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-log-search-transition).
Log Analytics is a paid service. You get up to 10 workspaces free per subscription. The workspace is considered an Azure resource, so the usual resource limits apply: up to 800 workspaces per resource group, and correspondingly more per subscription (for reference, if you would like to do capacity planning for a workspace-based solution). For Log Analytics pricing, read here: https://azure.microsoft.com/en-us/pricing/details/log-analytics/.
Log Analytics is a good choice, as its value proposition is to offer your customer an intuitive dashboard to monitor their virtual machine performance, along with near-real-time monitoring. The solution is also cloud native.
There is a management solution which offers bundled VM capacity and performance monitoring that you can try now: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-capacity
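To give a flavor of the log filtering mentioned above, a query over the standard Perf table could look roughly like this (a sketch; run the KQL directly in the workspace's Log Search, or programmatically as below; the workspace ID is a placeholder, and package method names may differ slightly between azure-monitor-query versions):

    # Sketch: the kind of CPU query your customer could run, or that you could
    # pin to a shared dashboard. Append "| render timechart" in the portal.
    # Assumes: pip install azure-monitor-query azure-identity
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    kql = """
    Perf
    | where ObjectName == "Processor" and CounterName == "% Processor Time"
    | summarize avg(CounterValue) by bin(TimeGenerated, 5m), Computer
    """
    client = LogsQueryClient(DefaultAzureCredential())
    result = client.query_workspace("<workspace-id>", kql,
                                    timespan=timedelta(hours=24))
    for table in result.tables:
        for row in table.rows:
            print(row)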
Solution #2
Log Analytics might not be your choice because it adds more Azure service and operational cost. If you need a cheaper option, you can collect your virtual machine metrics with Performance Counters, a built-in feature of Windows. With Performance Counters you can export to an Excel file, or visualize the data in Power BI or some custom chart.
Other Solutions
You can utilize the Azure Monitor API to get data, for example the metric definitions API: https://learn.microsoft.com/en-us/rest/api/monitor/metricdefinitions/list. You would then need to visualize or format the data in some intuitive way to satisfy your customer. That could be a custom front-end web app, Power BI, or even Excel with a chart.
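For example, pulling a platform metric programmatically could look roughly like this (a sketch using the azure-monitor-query package; the resource ID and metric name are placeholders, and method names may differ slightly between package versions):

    # Sketch: read platform metrics for a VM via Azure Monitor, then chart them
    # in Power BI, Excel, or a custom front end.
    # Assumes: pip install azure-monitor-query azure-identity
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import MetricsQueryClient

    resource_id = ("/subscriptions/<sub-id>/resourceGroups/<rg>"
                   "/providers/Microsoft.Compute/virtualMachines/<vm-name>")

    client = MetricsQueryClient(DefaultAzureCredential())
    result = client.query_resource(resource_id,
                                   metric_names=["Percentage CPU"],
                                   timespan=timedelta(days=1),
                                   granularity=timedelta(hours=1))
    for metric in result.metrics:
        for series in metric.timeseries:
            for point in series.data:
                print(point.timestamp, point.average)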
You can also query Azure Blob Storage and use Stream Analytics combined with Power BI to visualize your data (https://thuansoldier.net/?p=7187).
There is no single solution. It really depends on your existing resource capacity, budget, and so on.
I am building a service on Azure and wanted to know if there is any way to track how many resources (data downloaded or uploaded, time required for processing) a customer has used in a given session, and what level of services they have used, in order to bill them accordingly. We expose the whole framework as a service; it consists of various small services, like reading data from some external FTP server, downloading it to blob storage, reading the downloaded files and storing them in tables, performing some operations on the data in the tables, emailing some results from the service required by the user, etc.
So, depending on which services the customer has used, we would like to bill them accordingly.
Thanks!
The only Azure-specific feature I can think of that will help you with what you want to track is Azure Storage Logging, which records each and every request to Azure Storage. I'm not sure how much that is going to help you, though.
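If you do go down the Storage Logging route, the logs land as delimited text blobs in a $logs container that you can pull and aggregate yourself, roughly like the sketch below (the field position follows the documented Storage Analytics log format, but verify it against your log version; the path prefix is just an example):

    # Sketch: read classic Storage Analytics logs from the $logs container and
    # count operations per API type as raw material for per-customer billing.
    # Assumes: pip install azure-storage-blob
    from collections import Counter
    from azure.storage.blob import ContainerClient

    logs = ContainerClient.from_connection_string(
        conn_str="<storage-connection-string>", container_name="$logs")

    ops = Counter()
    for blob in logs.list_blobs(name_starts_with="blob/2023/06/"):  # service/yyyy/mm/... path
        text = logs.download_blob(blob).readall().decode("utf-8")
        for line in text.splitlines():
            fields = line.split(";")
            if len(fields) > 2:
                ops[fields[2]] += 1        # third field is the operation type, e.g. GetBlob
    print(ops.most_common(10))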
I think you will have to decide exactly what you want to bill your customers for and then start tracking that yourself. These might be items similar to what Microsoft charges you for (the size of incoming requests, the number of transactions, and the amount of data stored in Azure Storage), or maybe just some arbitrary values based on some of this information.