I have event logs from Application Insights, stored as JSON in text files in blob storage. I need to find the JSONs where a custom property meets a criterion. The number of matching JSONs is very small (around 10 or 20), but the volume of logged data is very large. Any suggestions on how this can be accomplished efficiently?
I have read in the Microsoft documentation that HDInsight understands blob storage and is efficient. Is this relevant to my scenario? If so, could someone provide some starting points?
HDInsight, being a Hadoop-compliant implementation, is a good technology for log analysis. This is stated on the Application Insights telemetry page as well.
"On larger scales, consider HDInsight - Hadoop clusters in the cloud. HDInsight provides a variety of technologies for managing and analyzing big data."
On the same page, you can find information about continuous export of Application Insights telemetry into Azure Blob storage.
The next step could be to use HDInsight to analyze that exported data, but you will need to implement some kind of filtering algorithm.
For uploading the data to HDInsight from Azure Blob storage, see that link (and this one for querying).
For an understanding of the log-processing pipeline, which is a common task for Hadoop/HDInsight, walkthroughs and manuals are available, for example this one. You will need to adapt the algorithm to your scenario.
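If the exported data still fits on one machine, a plain scan can be a reasonable baseline before standing up an HDInsight cluster. Here is a minimal Python sketch, assuming the azure-storage-blob package, that each exported blob holds one JSON event per line, and placeholder names for the connection string, container, and property path:

import json
from azure.storage.blob import ContainerClient

# Placeholder connection string and container; continuous export typically
# writes many small blobs, each holding one JSON event per line.
container = ContainerClient.from_connection_string(
    "<storage-connection-string>", container_name="appinsights-export")

hits = []
for blob in container.list_blobs():
    text = container.download_blob(blob.name).readall().decode("utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Hypothetical property path -- inspect one exported blob to find
        # where your custom property actually lives.
        custom = event.get("context", {}).get("custom", {})
        if custom.get("myProperty") == "target-value":
            hits.append(event)

print(len(hits), "matching events")

Since only 10 or 20 events are expected to match, the cost here is dominated by downloading the blobs, which is exactly the part HDInsight parallelizes at larger scales.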
In the case of Application Insights there is another option: a new analytics tool, Application Insights Analytics, has been introduced.
https://blogs.msdn.microsoft.com/bharry/2016/03/28/introducing-application-analytics
This tool also allows you to work with all logged data using a dedicated query language:
requests
| where timestamp >= ago(24h)
| summarize count() by client_CountryOrRegion
| order by count_ desc
You can then export the data you need.
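For example, the same query can be pulled programmatically through the Analytics REST API (api.applicationinsights.io) and exported from there. A hedged Python sketch; the application id and API key are placeholders you create under the Application Insights "API Access" blade:

import requests

APP_ID = "<application-id>"   # placeholder
API_KEY = "<api-key>"         # placeholder

query = """
requests
| where timestamp >= ago(24h)
| summarize count() by client_CountryOrRegion
| order by count_ desc
"""

resp = requests.get(
    f"https://api.applicationinsights.io/v1/apps/{APP_ID}/query",
    headers={"x-api-key": API_KEY},
    params={"query": query},
    timeout=30,
)
resp.raise_for_status()
for table in resp.json()["tables"]:   # rows come back as column-ordered lists
    for row in table["rows"]:
        print(row)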
We are storing our Windows/Linux VM metrics and logs in an Azure diagnostics storage account for long-term retention. We keep this data in Log Analytics as well, but being cost conscious we keep only the minimal essential set, and only for one month.
However, there seems to be no way to efficiently query the Table storage data when we need it: for example, checking historical CPU usage for a particular machine over a specific period in the past, or checking the logs captured during that period. The partition key and row key are highly convoluted, with only very basic help available for the WAD table schema and none for the LinuxsyslogVer2v0 table schema.
I was curious whether anyone else is using the diagnostics Table storage for querying or reporting. If so, how do you query it for a specific host and time period? I can query on non-partition/row-key properties, but besides being time consuming, that will eventually become very costly, since it results in a table scan. I really appreciate any advice.
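For what it's worth, one thing that can help with the WAD tables specifically: their PartitionKey is commonly "0" followed by the event time's .NET tick count, which lets you bound a time range with a key-range query instead of a table scan. A Python sketch under that assumption, using the azure-data-tables package; the table name, column names, and key format should all be verified against a few rows of your own data:

from datetime import datetime, timezone
from azure.data.tables import TableClient

EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def wad_partition_key(dt: datetime) -> str:
    # .NET ticks: 100 ns intervals since 0001-01-01 UTC. WAD PartitionKeys
    # are commonly "0" + this tick count (an assumption -- verify it).
    delta = dt - EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    return f"0{ticks}"

table = TableClient.from_connection_string(
    "<storage-connection-string>", table_name="WADPerformanceCountersTable")

start = wad_partition_key(datetime(2023, 1, 1, tzinfo=timezone.utc))
end = wad_partition_key(datetime(2023, 1, 2, tzinfo=timezone.utc))

# Bounding the PartitionKey turns this into a range scan rather than a full
# table scan; RoleInstance is then an ordinary property filter on top.
flt = (f"PartitionKey ge '{start}' and PartitionKey lt '{end}' "
       f"and RoleInstance eq '<host-name>'")
for entity in table.query_entities(flt):
    print(entity["CounterName"], entity["CounterValue"])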
You should consider using Azure Data Explorer (ADX) for your long-term storage solution. It allows KQL queries over your long-term data and is the preferred method for keeping log/security data past the default retention of services like Log Analytics and Sentinel.
The pricing page for ADX can be a bit confusing; there is a website to help you estimate costs here: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
By default, logs ingested into Azure Sentinel are stored in Azure Monitor Log Analytics. This article explains how to reduce retention costs in Azure Sentinel by sending them to Azure Data Explorer for long-term retention.
Storing logs in Azure Data Explorer reduces costs while retaining your ability to query your data, and is especially useful as your data grows. For example, while security data may lose value over time, you may be required to retain logs for regulatory requirements or to run periodic investigations on older data.
https://learn.microsoft.com/en-us/azure/sentinel/store-logs-in-azure-data-explorer?tabs=adx-event-hub
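As a rough illustration of what querying the archived data looks like afterwards, here is a small Python sketch with the azure-kusto-data package; the cluster URL, database, and table names are placeholders, and authentication here reuses the local Azure CLI login:

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholder cluster, database, and table names.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://<cluster>.<region>.kusto.windows.net")
client = KustoClient(kcsb)

results = client.execute("<database>", (
    "ArchivedSyslog "
    "| where TimeGenerated between (datetime(2023-01-01) .. datetime(2023-01-02)) "
    "| where Computer == '<host-name>' "
    "| take 100"
))
for row in results.primary_results[0]:
    print(row)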
I need to build a dashboard that will visualize the usage and cost of many Azure subscriptions, accounts, and departments.
My plan was:
Send the data that is 'behind' the Azure Cost Analysis view to the Log Analytics workspace.
In the Log Analytics workspace, perform custom aggregations / filters.
Display those aggregations as charts in Azure Metrics or directly in an Azure dashboard.
The problem is with step 1: I don't know how to send the data that is 'behind' the Azure Cost Analysis view to the Log Analytics workspace.
I thought of two solutions:
Fetch the data from the Azure Cost Management & Billing API.
Schedule an export of the cost analysis data to a storage account, and then somehow move the data from the storage account to the Log Analytics workspace.
Both solutions seem a bit like overkill to me. Is there a more direct approach to send the cost analysis data to a Log Analytics workspace?
If not, I would be happy to know how you would suggest moving the exported data from the storage account to Log Analytics, or whether you have some other idea.
Thank you!
The only native solution is to schedule an export of the costs as CSV from the Cost Management blade into a storage account. If you want to load the data into a Log Analytics workspace, Azure Automation with a scheduled script would work.
I believe a direct approach is currently not available, but I see a feature request for the same requirement raised in the UserVoice / feedback forum. If interested, you may upvote it: in general, the responsible Azure product / feature team triages and prioritizes received feedback based on various factors such as the number of votes it receives, feasibility, open prioritized backlog items, etc.
I would suggest fetching the data from the Azure Cost Management & Billing API and sending it to Log Analytics from a REST client by using the HTTP Data Collector API; for more information and worked examples, refer to this Azure document. Alternatively, if you want to fetch the data from the API and store it on a machine, you can go with custom logs; for more information, refer to this Azure document.
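To make the first option concrete, here is a hedged Python sketch of the HTTP Data Collector API call: POST a JSON array signed with the workspace shared key. The workspace id, key, record fields, and the custom log type name are placeholders (records in this example would land in a custom table named AzureCosts_CL):

import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone

import requests

WORKSPACE_ID = "<workspace-id>"   # placeholder
SHARED_KEY = "<primary-key>"      # placeholder, base64 key from the workspace
LOG_TYPE = "AzureCosts"           # records land in the AzureCosts_CL table

def signature(date: str, content_length: int) -> str:
    # Signing scheme documented for the HTTP Data Collector API.
    string_to_hash = (f"POST\n{content_length}\napplication/json\n"
                      f"x-ms-date:{date}\n/api/logs")
    digest = hmac.new(base64.b64decode(SHARED_KEY),
                      string_to_hash.encode("utf-8"), hashlib.sha256).digest()
    return f"SharedKey {WORKSPACE_ID}:{base64.b64encode(digest).decode()}"

body = json.dumps([{"subscriptionId": "<sub-id>", "costUSD": 12.34}])
date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")

resp = requests.post(
    f"https://{WORKSPACE_ID}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Log-Type": LOG_TYPE,
        "x-ms-date": date,
        "Authorization": signature(date, len(body)),
    },
    timeout=30,
)
resp.raise_for_status()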
Other related references:
Use cost alerts to monitor usage and spending
Supported metrics with Azure Monitor
Currently we are using a blob-triggered Azure Function to move JSON data into Cosmos DB. We are planning to replace the Azure Function with an Azure Data Factory (ADF) pipeline.
I am new to Azure Data Factory (ADF), so I am not sure: would an ADF pipeline be the better option or not?
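For context, the current setup is roughly the following pattern. This is only a sketch using the Python v2 programming model; the container, connection-setting, database, and collection names are placeholders, and binding parameter names vary between binding-extension versions:

import json

import azure.functions as func

app = func.FunctionApp()

# Placeholder blob container ("events"), connection settings, and Cosmos DB
# database/container names.
@app.blob_trigger(arg_name="blob", path="events/{name}",
                  connection="BlobStorageConnection")
@app.cosmos_db_output(arg_name="doc", database_name="Events",
                      container_name="Items", connection="CosmosDbConnection")
def move_json(blob: func.InputStream, doc: func.Out[func.Document]) -> None:
    # Parse the incoming blob and write it to Cosmos DB via the output binding.
    payload = json.loads(blob.read())
    doc.set(func.Document.from_dict(payload))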
Though my answer is a bit late, I would like to add that I would not recommend replacing your current setup with ADF, for these reasons:
Cost: ADF is too expensive; it costs far more than Azure Functions.
Custom logic: ADF is not built for cleansing logic or custom code. Its primary goal is data integration from external systems using its vast connector pool.
Latency: ADF has much higher latency due to the large overhead of its job framework.
Based on your requirements, Azure Data Factory is a good fit. You can follow this tutorial to configure the Cosmos DB output and the Azure Blob Storage input.
The advantage over an Azure Function is that you don't need to write any custom code unless data cleansing is involved; Azure Data Factory is the recommended option here, and even if you want an Azure Function for other purposes, you can add it within the pipeline.
The fundamental use of Azure Data Factory is data ingestion. Azure Functions is a serverless (Function-as-a-Service) offering whose best usage is short-lived instances; functions that execute for many seconds are far more expensive. Azure Functions is good for event-driven microservices. For data ingestion, Azure Data Factory is the better option, as its running cost for huge data volumes will be lower than Azure Functions. You can also integrate Spark processing pipelines in ADF for more advanced ingestion pipelines.
Moreover, it depends on your situation. Azure Functions is a serverless, lightweight process meant for quick responses to an event, rather than for the volumetric responses that suit batch processes.
So if your requirement is to respond quickly to an event with a little information, stay with Azure Functions; if you need a batch process, switch to ADF.
Cost
I took the pricing screenshots from here.
If your file is large (7.514 GB, copy duration 43:52 hours ≈ 43.867 h):
4 (DIU) * 43.867 (h) * 0.25 ($/DIU-hour) = $43.867
$43.867 / 7.514 GB ≈ $5.838 per GB
If your file is small (2.497 MB, taking about 45 seconds, billed as one minute):
4 (DIU) * 1/60 (h) * 0.25 ($/DIU-hour) = $0.0167
2.497 MB / 1024 = 0.00244 GB
$0.0167 / 0.00244 ≈ $6.844 per GB
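A quick Python sanity check of the per-GB figures above, using the durations and the $0.25/DIU-hour rate as quoted:

# Pricing and durations as quoted above; 4 DIUs at $0.25 per DIU-hour.
DIU, RATE = 4, 0.25

large_hours = 43 + 52 / 60                 # 43:52 ~= 43.867 h
large_cost = DIU * large_hours * RATE      # ~= $43.87
print(large_cost / 7.514)                  # ~= $5.84 per GB

small_hours = 1 / 60                       # 45 s, billed as one minute
small_cost = DIU * small_hours * RATE      # ~= $0.0167
print(small_cost / (2.497 / 1024))         # ~= $6.84 per GB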
Scale
The maximum number of instances an Azure Function app can scale out to is 200.
ADF can run 3,000 concurrent external activities, and in my test only 1,500 copy activities actually ran in parallel. (This test wasted a lot of money.)
We run a software application on Azure for one of our customers. The customer wants to see the performance of the systems. This consists of two parts: the metric information of the servers, and some information I want to provide via custom logging.
My plan is to give the customer access to the portal and only allow him access to the metric information and the custom tables.
It seems to me that by assigning a role to the customer I should be able to block all other possibilities.
Can someone tell me which actions I have to allow or forbid to achieve this? Or where can I find information on this?
Solution #1
Instead of giving read access to the virtual machine, which may break your security policy, I'd recommend going with an Azure Log Analytics workspace (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-overview). You will need to create a workspace that collects and stores server metrics (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-quick-collect-windows-computer) and other custom metrics.
Your customer will be given access to the workspace only, where he can see all metrics in a dashboard. If there is a need for log filtering, you can use the Log Analytics query language (ref: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-log-search-transition).
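As a rough illustration, a scoped query like the following could back the customer's dashboard. A Python sketch using the azure-monitor-query package; the workspace id is a placeholder, and the Perf table assumes the standard performance-counter collection is enabled:

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    "<workspace-id>",   # placeholder
    "Perf "
    "| where CounterName == '% Processor Time' "
    "| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 15m)",
    timespan=timedelta(days=1),
)
for table in response.tables:
    for row in table.rows:
        print(row)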
Log Analytics is a paid service, with up to 10 free workspaces per subscription. A workspace is an Azure resource, so the usual resource limits apply: you can create up to 800 workspaces per resource group, so a subscription can allow 800 * 800 (for reference, if you would like to do capacity planning for your workspace-based solution). For Log Analytics pricing, read here (https://azure.microsoft.com/en-us/pricing/details/log-analytics/).
Log Analytics is a good choice: its value proposition is to offer your customer an intuitive dashboard to monitor their virtual machine performance, with near-real-time monitoring, and the solution is cloud-native.
There is also a management solution offering a bundle of VM capacity and performance monitoring, which you can try now: https://learn.microsoft.com/en-us/azure/log-analytics/log-analytics-capacity
Solution #2
Log Analytics might not be your choice because it adds more Azure services and operational cost. If you need a cheaper option, you can collect your virtual machine metrics with Performance Counters, a built-in feature of Windows. With Performance Counters you can export to an Excel file, or visualize the data in Power BI or a custom chart.
Other Solutions
You can utilize Azure Monitor and its API to get the data, for example this API: https://learn.microsoft.com/en-us/rest/api/monitor/metricdefinitions/list. You would certainly need to visualize or format the results in some intuitive way to satisfy your customer; that could be a custom front-end web app, Power BI, or even Excel with a chart.
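For example, here is a hedged Python sketch of pulling a single platform metric through the azure-monitor-query package; the resource id and metric name are placeholders:

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())
resource_id = ("/subscriptions/<sub>/resourceGroups/<rg>/providers/"
               "Microsoft.Compute/virtualMachines/<vm>")   # placeholder

result = client.query_resource(
    resource_id,
    metric_names=["Percentage CPU"],
    timespan=timedelta(hours=24),
    aggregations=[MetricAggregationType.AVERAGE],
)
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.average)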
You can also query Azure Blob Storage and use Stream Analytics combined with Power BI to visualize your data (https://thuansoldier.net/?p=7187).
There is no single solution; it really depends on your existing resource capacity, budget, and so on.
I've recently begun using the Web Deployment Accelerator for my Windows Azure account. It is providing an immediate return in time saved and is an excellent offering.
However, since "everything" is now stored in Azure Storage rather than on the regular E: drive, I am immediately seeing a cost consequence for using the tool.
In one day I have racked up a mighty 4 cent NZD charge. To do that I had to burn through about 80,000 storage transactions, and frankly I can't figure out where they all went.
I uploaded 6 very small sites, none of which would have more than 300 files. So I'm wondering:
a. Is there a profiling tool for the Web Deployment Accelerator that will allow me to see where and how 80,000 storage transactions were used for such a small offering? Is it a storage-transaction-intensive tool? Has any cost analysis been carried out on how it operates? Has it been optimised with cost in mind?
b. If I'm using this tool, do I pay for 2 storage transactions per HTTP request to a site? Since the tool now writes the web server logs to table storage, that would be one storage request to pull the HTTP resource (img, script, etc.) and another to write the log entry, would it not?
I'm not concerned about current charges; I'm concerned about the future if I start rolling all my hosted business into the cloud. I mean, I'm now being charged even just to "look" at my data, right? If I list the contents of a storage folder using a tool like Azure Storage Explorer, is that x storage transactions, where x = the number of files in the folder?
I'm not sure of a third-party profiler tool, but Windows Azure Storage logging and metrics will give you very detailed info regarding both individual accesses and hourly rollups. It's pretty straightforward to enable, and the November 2011 SDK includes support for the API calls required to enable it. See here for an overview of what's offered for metrics and logging.
My team worked with Fullscale180 to build a storage library, Azure Store XRay, to demonstrate how to enable and query storage metrics and logging. Note: This was published before the SDK had logging and metrics support, so it uses the REST API calls instead. But that won't impact you if you try to use the library.
You can also look at another code demo, Cloud Ninja, which calls the XRay library for its metrics display (see here for running demo).
Regarding querying storage for objects in blob containers: that's not a 1:1 transaction-to-file scenario. You can specify the maximum number of blobs to return when listing items in a container, and it's possible that all blobs are returned in one transaction. Of course, if you then fetch each blob, each of those is at least one transaction (depending on blob size). See here for details about listing blobs.
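To illustrate the transaction math, a small Python sketch with the azure-storage-blob package: each page of a listing costs one List Blobs transaction regardless of how many blobs it returns. The connection string and container name are placeholders:

from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<storage-connection-string>", container_name="<container>")  # placeholders

# Each page of results is one List Blobs transaction, however many blobs
# it contains (up to results_per_page).
pages = container.list_blobs(results_per_page=1000).by_page()
for i, page in enumerate(pages, start=1):
    blobs = list(page)
    print(f"page {i}: {len(blobs)} blobs (one List Blobs transaction)")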