Is there any way/trick/workaround to reduce Azure Application Insights cost? I have a very large volume of data (around 20M) ingested every day. Data sampling is set to 5%, yet I still ingest about 5 GB of data daily into Application Insights.
Application Insights has a 90-day default retention period, but I don't need the data after 7 days.
Note: I'm only sending Info-level logs to Application Insights, with minimal information.
A few points I want to add here that helped me:
Review your logging and make sure log data goes into the correct category (Verbose, Info, Error, etc.). Most of the time I use Info and Error logs only (Verbose only for debugging).
Combine multiple log lines into a single line. It reduces the amount of system-generated column data.
Old:
log.Info($"SerialNumber: {serialNumber}");
log.Info($"Id: {id}");
log.Info($"Name: {name}");
New:
log.Info($"SerialNumber: {serialNumber}, Id: {id}, Name: {name}");
Check the customDimensions field of the traces collection. For me, a lot of system-generated data was in there, like the thread name, logger name, and class name. I made some changes in my logger, and now my customDimensions column is empty.
Reduce some system-generated logs (in my case, by filtering log categories in the logger section of host.json):
{
"logger": {
"categoryFilter": {
"defaultLevel": "Information",
"categoryLevels": {
"Host": "Error",
"Function": "Error",
"Host.Aggregator": "Information"
}
}
}
}
All the above points helped me reduce log data; I hope they help others too.
You can do more aggressive sampling. Also review what "info logs" are being sent; perhaps you can live without many of them. Also review all the auto-collected telemetry: if there is something you don't care about (e.g. performance counters), remove it.
Also check this MSDN Magazine article authored by Application Insights team members: https://msdn.microsoft.com/magazine/mt808502.aspx
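As an illustration of more aggressive fixed-rate sampling, here is a minimal sketch using the OpenCensus Azure exporter for Python. The asker's SDK isn't stated, so treat this only as an example of the idea; the connection string is a placeholder and the equivalent sampling setting exists in the other Application Insights SDKs.

# Sketch: more aggressive fixed-rate sampling via the OpenCensus Azure exporter.
# Assumes the opencensus-ext-azure package; adjust for your own SDK/stack.
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace.samplers import ProbabilitySampler
from opencensus.trace.tracer import Tracer

tracer = Tracer(
    exporter=AzureExporter(
        connection_string="InstrumentationKey=00000000-0000-0000-0000-000000000000"
    ),
    # Keep roughly 1% of traces instead of 5%.
    sampler=ProbabilitySampler(rate=0.01),
)

with tracer.span(name="example-operation"):
    pass  # telemetry recorded in this span is subject to the 1% sampling rate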
I am preparing for the DP-200 exam and am confused about lifecycle management.
The sample scenario is:
You manage a financial computation data analysis process. Microsoft Azure virtual machines (VMs) run the process in daily jobs and store the results in virtual hard drives (VHDs). The VMs produce results using data from the previous day and store the results in a snapshot of the VHD. When a new month begins, a process creates a new VHD.
You must implement the following data retention requirements:
- Daily results must be kept for 90 days
- Data for the current year must be available for weekly reports
- Data from the previous 10 years must be stored for auditing purposes
- Data required for an audit must be produced within 10 days of a request.
You need to enforce the data retention requirements while minimizing cost.
How should you configure the lifecycle policy? To answer, drag the appropriate JSON segments to the correct locations. Each JSON segment may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
"BaseBlob": {
  "TierToArchive": { "DaysAfterModificationGreaterThan": 365 },
  "Delete": { "DaysAfterModificationGreaterThan": 3650 }
},
"Snapshot": {
  "TierToCool": { "DaysAfterCreationGreaterThan": 90 }
}
That's the solution marked as correct. I tried to find the "Snapshot" documentation for lifecycle management, but cannot find it in the MS docs.
Can someone please explain the purpose of Snapshot here, and what the correct lifecycle policy should be?
The requirements again:
- Daily results must be kept for 90 days
- Data for the current year must be available for weekly reports
- Data from the previous 10 years must be stored for auditing purposes
- Data required for an audit must be produced within 10 days of a request

My take on the lifecycle (a full policy sketch follows the reference link below):

baseBlob
- delete: 10 years
- tierToCool: 10 days
- tierToArchive: 1 year

snapshot
- delete: 90 days
Reference link for the snapshot: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=template#rules
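For completeness, here is a sketch of the full policy that the mapping above corresponds to, built in Python so it can be dumped to JSON and applied, e.g. via an ARM template or az storage account management-policy. The rule name is a placeholder and the property casing follows the lifecycle management docs linked above.

import json

# A sketch of the full lifecycle policy implied by the mapping above.
# The day values (10 / 365 / 3650 for the base blob, 90 for snapshots)
# come from this answer; the rule name is a placeholder.
policy = {
    "rules": [
        {
            "enabled": True,
            "name": "vhd-retention",
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},  # blobTypes is required by the schema
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 10},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 365},
                        "delete": {"daysAfterModificationGreaterThan": 3650},
                    },
                    # Snapshots hold the daily results, so they only need 90 days.
                    "snapshot": {
                        "delete": {"daysAfterCreationGreaterThan": 90}
                    },
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))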
Sometimes we query one column and the value shows up in another column, so I thought of sharing this finding I came across.
I have created a Logic App with a trackedProperties entry "MessageId" and attached it to a Log Analytics workspace (via diagnostic settings).
How to add tracked properties to a Log Analytics workspace in a Logic App
"trackedProperties": {
"MessageId": "#{json(xml(triggerBody())).ABC.DEF.MessageID}"
}
When I queried it in Log Analytics, I saw 2 trackedProperties columns, named trackedProperties_MessageId_g and trackedProperties_MessageId_s.
Significance of the 2 column names above: when you provide a GUID value it populates trackedProperties_MessageId_g, and when you provide a string it populates trackedProperties_MessageId_s.
Thanks for sharing your finding(s). Yes, AFAIK, when you send a particular field/column to Log Analytics, its name is changed based on the type. This is true for almost any field/column. However, there are some fields/columns, called reserved fields, that you can send without a name change (if you send them in the right type, of course). An MVP, Stanislav Zhelyazkov, has covered this topic here.
If you were expecting only one trackedProperty rather than the two columns trackedProperties_MessageId_g and trackedProperties_MessageId_s, then I suggest you share your feedback in the UserVoice / feedback forum. The responsible product/feature team would check how feasible it is to resolve this by adding some kind of checkpoint in the background, and if it really is feasible, they would triage and prioritize the feedback based on various factors like the number of votes it receives, priority, pending backlog items, etc.
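As an aside, if you just want to read the value regardless of which typed column it landed in, a query can merge the two columns back together. Below is a sketch in Python using the azure-monitor-query package; the AzureDiagnostics table name and the coalesce()/tostring() combination are my assumptions, not something from the posts above.

# Sketch: read MessageId whether it arrived as a GUID (_g) or a string (_s).
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
AzureDiagnostics
| extend MessageId = coalesce(tostring(trackedProperties_MessageId_g),
                              trackedProperties_MessageId_s)
| project TimeGenerated, MessageId
"""

response = client.query_workspace(
    workspace_id="<workspace-id>",   # placeholder
    query=query,
    timespan=timedelta(days=1),
)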
I currently have an Azure Application Gateway that is configured for a minimum of 2 instances and a maximum of 10 instances. Its tier is "WAF V2" and autoscaling is enabled.
If autoscaling is enabled, then theoretically there should be somewhere between 2 and 10 instances. So where can I go to check the current number of instances that the gateway has scaled up to? This seems like important information if you want to figure out if your gateway is overloaded.
I was recently pointed to this topic by Microsoft after asking them the same question.
My assumption, which may not be accurate, is to look at the Current Capacity Units metric to see how many are in use at a given moment. Since the documentation says one instance supports around 10 capacity units, I do the simple math to work out how many instances we are using, and whether we need to increase the maximum or lower the minimum.
https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-autoscaling-zone-redundant
"Each capacity unit is composed of at most: 1 compute unit, or 2500 persistent connections, or 2.22-Mbps throughput."
"Note
Each instance can currently support approximately 10 capacity units. The number of requests a compute unit can handle depends on various criteria like TLS certificate key size, key exchange algorithm, header rewrites, and in case of WAF incoming request size. We recommend you perform application tests to determine request rate per compute unit. Both capacity unit and compute unit will be made available as a metric before billing starts."
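Putting the quoted guidance together, a rough instance count can be derived from that metric. Here is a sketch using the azure-monitor-query package; the metric name "CapacityUnits" and the divide-by-10 rule are my reading of the documentation quoted above, so verify both against your gateway's metric list.

# Sketch: estimate the instance count from the Current Capacity Units metric.
import math
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

gateway_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.Network/applicationGateways/<gateway-name>"   # placeholder
)

client = MetricsQueryClient(DefaultAzureCredential())
result = client.query_resource(
    gateway_id,
    metric_names=["CapacityUnits"],              # assumed metric name
    timespan=timedelta(minutes=30),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            if point.average is not None:
                # One instance handles roughly 10 capacity units.
                print(point.timestamp, math.ceil(point.average / 10))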
I've tried to obtain this value with Log Analytics.
Enable it and use this query:
AzureDiagnostics
| where TimeGenerated > ago(30m)
| summarize dcount(instanceId_s) by bin(TimeGenerated, 1m)
This gives you the number of distinct instances that handled requests, in one-minute bins. Consider adding additional filters to the query, since you may only be interested in certain types of events.
I think it can be a good approximation.
I don't think it shows you the current number of instances (if you switch to manual scaling, it will show the instance count under the Properties blade), because it doesn't make sense to expose it. That's what autoscale is for: you don't really care how many instances are running; what you care about is request latency and failed requests. If you see those increase, you can raise the maximum number of Application Gateway instances.
The API gives the following response with autoscale enabled:
"sku": {
"name": "Standard_v2",
"tier": "Standard_v2"
},
And this without autoscale enabled:
"sku": {
"name": "Standard_v2",
"tier": "Standard_v2",
"capacity": 4
},
So I guess it's hidden from the API, and there's no way to know it.
I tried to pull Azure resource usage data for billing metrics. I followed the steps mentioned in the article below to get the usage data of resources.
https://msdn.microsoft.com/en-us/library/azure/mt219001.aspx
Even if I set the start time and end time parameters in the URL, they don't take effect. It returns the entire output [from the time the resource was created/added].
For example:
https://management.azure.com/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Commerce/UsageAggregates?api-version=2015-06-01-preview&reportedStartTime=2017-03-03T00%3a00%3a00%2b00%3a00&reportedEndTime=2017-03-04T00%3a00%3a00%2b00%3a00&aggregationGranularity=Hourly&showDetails=true
As per the above URL, it should return the data between 2017-03-03 and 2017-03-04, but it shows data from 2nd March [2017-03-02]. I don't know why it returns the entire output and the time filter is not applied.
Note: the end time parameter does take effect, meaning the output only goes up to the specified end time, but the start time is not honored.
Does anyone have a suggestion on this?
So there are a few things to consider:
There is usage date/time and then there is reported date/time. The former tells you the date/time when the resources were used, while the latter tells you the date/time when this information was received by the billing sub-system. There will be some delay between when the resources are used and when they are reported. From this link:
Set {dateTimeOffset-value} for reportedStartTime and reportedEndTime to valid dateTime values. Please note that this dateTimeOffset value represents the timestamp at which the resource usage was recorded within the Azure billing system. As Azure is a distributed system, spanning across 19 datacenters around the world, there is bound to be a delay between the resource usage time (when the resource was actually consumed) and the resource usage reported time (when the usage event reached the billing system) and callers need a predictable way to get all usage events for a subscription for a given time period.
The query only lets you search by reported date/time; there is no provision for searching by usage date/time. However, the data returned to you contains the usage date/time, not the reported date/time.
Long story short, because of the delay in propagating the usage information to the billing sub-system, the behavior you're seeing is correct. In my experience, it takes about 24 hours for all the usage information to show up in the billing sub-system.
The way we handle this scenario in our application is that we fetch the data for a longer duration and then pick out only the data we're interested in. For example, if I need to see the data for 1st of March, we query the data with a reported date/time range from 1st March to, say, 4th March (i.e. today's date) and then discard any data where the usage date is not 1st of March.
If we don't find any data (which is quite possible and is happening in your case as well), we simply tell the users that usage information is not yet available.
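For what it's worth, here is a sketch of that approach in Python. The endpoint and query parameters come from the question above; the bearer-token handling and the exact response field names (value, properties, usageStartTime, nextLink) are my assumptions, so check them against the UsageAggregates documentation.

# Sketch: query a wider reported window, then filter by the usage date we want.
import requests

SUBSCRIPTION_ID = "<subscription-id>"        # placeholder
TOKEN = "<bearer-token>"                     # obtain via Azure AD, e.g. azure-identity

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    "/providers/Microsoft.Commerce/UsageAggregates"
)
params = {
    "api-version": "2015-06-01-preview",
    # Reported window is deliberately wider than the day we care about.
    "reportedStartTime": "2017-03-01T00:00:00+00:00",
    "reportedEndTime": "2017-03-04T00:00:00+00:00",
    "aggregationGranularity": "Hourly",
    "showDetails": "true",
}

wanted_day = "2017-03-01"
rows = []
while url:
    resp = requests.get(url, params=params, headers={"Authorization": f"Bearer {TOKEN}"})
    resp.raise_for_status()
    body = resp.json()
    # Keep only records whose *usage* date is the day we actually want.
    rows += [
        item for item in body.get("value", [])
        if item["properties"]["usageStartTime"].startswith(wanted_day)
    ]
    url, params = body.get("nextLink"), None   # follow paging, if any

print(f"{len(rows)} usage records for {wanted_day}")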
I want to collect different metrics for my Spark application. If someone has any idea about how I can get the HDFS bytes read and written, please tell me.
I'm looking for the same information and I can't find it anywhere: neither the Spark documentation nor the Spark users mailing list (even though some people have asked the question) gives me the answer.
However, I found some clues on the internet suggesting that Spark does provide it in its metrics.
I'm working with some application event logs (the ones provided by the history server), and it seems that the Input Metrics and Output Metrics present in the Task Metrics of each SparkListenerTaskEnd event give the amount of data read and written by each task.
{
"Event": "SparkListenerTaskEnd",
...
"Task Metrics": {
...
"Input Metrics": {
"Bytes Read": 268566528,
"Records Read": 2796202
},
"Output Metrics": {
"Bytes Written": 0,
"Records Written": 0
},
...
},
...
}
Note that I'm not 100% sure about that, but the log I got seems to be consistent with this assumption :)
Also, if you are reading from the local filesystem, I think this will be mixed into the same metric.
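To turn those events into totals, here is a small sketch that walks an event log file and sums the two fields. The field names come from the event shown above; the assumption that the (uncompressed) log is a JSON-lines file and the file path are mine.

# Sketch: sum Bytes Read / Bytes Written across all SparkListenerTaskEnd
# events in a Spark event log (e.g. one downloaded from the history server).
import json

bytes_read = 0
bytes_written = 0

with open("application_1234_0001") as log:          # placeholder path
    for line in log:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        metrics = event.get("Task Metrics") or {}
        bytes_read += metrics.get("Input Metrics", {}).get("Bytes Read", 0)
        bytes_written += metrics.get("Output Metrics", {}).get("Bytes Written", 0)

print(f"bytes read: {bytes_read}, bytes written: {bytes_written}")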