I have been trying to set up an alert on a .NET Core App Service hosted in Azure that fires if X% of the requests have failed in the past 24 hours. I have also tried setting up an alert from the service's Application Insights resource using the following metrics: Exception rate, Server exceptions, and Failed requests.
However, none of these can capture a percentage (failure rate); all of them use a count as the metric.
Does anyone know a workaround for this?
Please try the query-based alert:
1. Go to Application Insights Analytics and, in the query editor, enter the query below:
exceptions
| where timestamp > ago(24h)
| summarize exceptionsCount = sum(itemCount)
| extend t = ""
| join (
    requests
    | where timestamp > ago(24h)
    | summarize requestsCount = sum(itemCount)
    | extend t = ""
) on t
| project isFail = 1.0 * exceptionsCount / requestsCount > 0.5 // if the failure rate is greater than 50%, fail
| project rr = iff(isFail, "Fail", "Pass")
| where rr == "Fail"
2. Then click "New alert rule" in the upper-right corner.
3. On the Create rule page, configure the rule settings.
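As a rough illustration of what the query decides, here is the same threshold check in Python. The function name `failure_verdict` and the handling of a window with zero requests are my own assumptions, not part of Application Insights. Because the query only returns a row when the threshold is crossed, an alert based on the number of results fires only in that case. Note also that the query divides exception records by requests, so it approximates the failure rate: one failed request can log several exceptions, and a request can fail without logging any.

```python
def failure_verdict(exceptions_count: int, requests_count: int, threshold: float = 0.5) -> str:
    """Return "Fail" when exceptions per request exceed the threshold, else "Pass".

    Hypothetical helper mirroring the KQL query; it is not an Azure API.
    """
    if requests_count <= 0:
        # Assumption: with no requests in the window there is nothing to rate.
        return "Pass"
    rate = exceptions_count / requests_count
    # Strictly greater than, like `> 0.5` in the query.
    return "Fail" if rate > threshold else "Pass"


print(failure_verdict(60, 100))  # 60% is above 50%
print(failure_verdict(10, 100))
```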
I was looking for a way to avoid writing queries by using something already built into Application Insights, but in the end I also came up with something similar to your solution, using the requests table instead:
requests
| summarize count()
| extend a = "a"
| join (
    requests
    | summarize count() by resultCode
    | extend a = "a"
) on a
| extend percentage = todouble(count_1) * 100 / todouble(count_)
| where resultCode == "200" // resultCode is a string column
| where percentage < 90 // percentage of successful requests is less than 90%
| project percentage_of_failures = round(100 - percentage, 2), total_successful_req = count_1, total_failing_req = count_ - count_1, total_req = count_
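The logic of this query can be sketched in Python to make the column meanings explicit: `count_` is the total number of requests and `count_1` is the number of requests with a given result code. The function `success_summary` is a hypothetical name and takes a plain list of result codes instead of querying Application Insights. One behavioral difference worth knowing: if no request returned 200 during the window, the KQL version returns no rows at all (the `where resultCode == ...` filter removes everything), so it would not alert; the sketch reports that case as 100% failures.

```python
from collections import Counter


def success_summary(result_codes, success_code="200", min_success_pct=90.0):
    """Return failure stats when the share of success_code drops below min_success_pct.

    Returns None when the success rate is healthy (the query would return no rows).
    Hypothetical helper working on a list of result-code strings.
    """
    total = len(result_codes)  # corresponds to count_
    if total == 0:
        return None
    successful = Counter(result_codes)[success_code]  # corresponds to count_1
    pct = successful * 100 / total
    if pct >= min_success_pct:
        return None
    return {
        "percentage_of_failures": round(100 - pct, 2),
        "total_successful_req": successful,
        "total_failing_req": total - successful,
        "total_req": total,
    }


print(success_summary(["200"] * 85 + ["500"] * 15))
```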
I need a fixture in my behave code so that all the users I create during testing automatically get cleaned up. So I added the following code:
# test/features/steps/environment.py
from behave import fixture, use_fixture

@fixture
def user_cleanup(context):
    # -- SETUP-FIXTURE PART:
    context.users_to_be_cleaned_up = []
    print("Creating Fixture")
    yield context.users_to_be_cleaned_up
    # -- CLEANUP-FIXTURE PART:
    for userid in context.users_to_be_cleaned_up:
        # delete_database_entry is the project's own helper (not shown)
        resp = delete_database_entry("users", userid)
        print(resp)
    context.users_to_be_cleaned_up = []

def before_feature(context, feature):
    # Activate the fixture only for features tagged @fixture.user.cleanup
    if "fixture.user.cleanup" in feature.tags:
        use_fixture(user_cleanup, context)
In my feature file, I added the following:
@fixture.user.cleanup
Feature: Validating backend from the app side

  Scenario Outline: Super Admin has permission to create other users
    Given a set of existing users:
      | user       | details     |
      | superadmin | userdetails |
    When "superadmin" successfully logs in
    Then he can create non-existing "<user>" with "<details>"
    And "<user>" can login successfully with "<details>"

    Examples: User Roles
      | user         | details      |
      | superadmin_1 | user details |
The idea was to have the test steps append every created user to context.users_to_be_cleaned_up. But when a step tries to append, I get an error saying that users_to_be_cleaned_up is not present in context.
Any idea what I am doing wrong here?
I got an answer for this and am recording it here for posterity.
You need to keep environment.py at the features level, not at the steps level.
So the structure as it stands today is:
test/features/
|-- test.feature
|-- environment.py
|-- steps/
    |-- steps.py
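To catch this mistake early, a small check like the following could be run before the tests. It is a hypothetical helper (not part of behave) that only encodes the rule from this answer: behave picks up hooks such as before_feature from environment.py in the features directory, not from the steps directory.

```python
from pathlib import Path


def check_behave_layout(features_dir: str) -> list:
    """Return a list of problems with where environment.py lives (empty list means OK)."""
    root = Path(features_dir)
    problems = []
    if not (root / "environment.py").is_file():
        problems.append(f"missing {root / 'environment.py'}: hooks such as before_feature will not run")
    if (root / "steps" / "environment.py").is_file():
        problems.append(f"{root / 'steps' / 'environment.py'} is in the steps directory; "
                        "behave expects environment.py in the features directory")
    return problems


for problem in check_behave_layout("test/features"):
    print(problem)
```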
I'm running a Python 3 function app that contains multiple functions with different bindings (Cosmos DB, Blob, HTTP, etc.). I'm trying to get details about this function app in Application Insights, such as the number of requests, the exceptions raised during execution, and the number of requests per function app and per function.
I can already get a few details, such as the request count. Now I'm trying to correlate the request details with other tables, such as exceptions, but I can't map them and drill down to a particular function.
For example, suppose I have 10 functions in the function app, and each one runs after the previous one, based on its output. If the flow fails at any function, I want to know at which step/function the app failed, the details of the error, and which runs of the function app completed successfully and which did not.
Below are some of the queries I have used for monitoring.
Requests on the first function, to get the total request count for the function app:
requests
| where timestamp > ago(1d)
| where operation_Name =~ "function name"
| summarize RequestsCount=sum(itemCount) by cloud_RoleName,bin(timestamp,1d)
Requests and average duration per function:
requests
| summarize RequestsCount=sum(itemCount), AverageDuration=avg(duration) by operation_Name
| order by RequestsCount desc
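For a quick offline check of exported request data, the aggregation in this query can be sketched in Python. The row format (dicts with operation_Name, itemCount and duration) is an assumption about how you export the data, and requests_by_operation is a hypothetical name. Summing itemCount rather than counting rows matters when sampling is enabled, because a sampled record can stand for several requests.

```python
from collections import defaultdict


def requests_by_operation(rows):
    """Mirror `summarize sum(itemCount), avg(duration) by operation_Name | order by ... desc`."""
    counts = defaultdict(int)
    durations = defaultdict(list)
    for r in rows:
        counts[r["operation_Name"]] += r["itemCount"]  # sampling-aware request count
        durations[r["operation_Name"]].append(r["duration"])  # plain average over rows, as avg() does
    result = [(name, counts[name], sum(d) / len(d)) for name, d in durations.items()]
    return sorted(result, key=lambda t: t[1], reverse=True)


rows = [
    {"operation_Name": "A", "itemCount": 1, "duration": 100.0},
    {"operation_Name": "A", "itemCount": 3, "duration": 300.0},
    {"operation_Name": "B", "itemCount": 2, "duration": 50.0},
]
print(requests_by_operation(rows))
```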
You can check the exception per function with:
exceptions
| extend OperationName = iff(operation_Name == "","[No operation name]",operation_Name)
| summarize Count = count() by cloud_RoleName, OperationName, type, method
To join with requests:
requests
| where timestamp > ago(24h) and success == false
| join kind= inner (
exceptions
| where timestamp > ago(24h)
) on operation_Id
| project exceptionType = type, failedMethod = method, requestName = name, requestDuration = duration, success
Keep in mind that if you catch an error yourself, the function invocation will still be reported as successful.
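To answer "at which function did the flow fail", the requests can be grouped by operation_Id, ordered by time, and the earliest failed request in each operation taken as the failing step. This is a Python sketch over exported rows with a hypothetical function name; it assumes that all functions in one flow share the same operation_Id, which depends on how the functions trigger each other, so verify that in your own telemetry. Because of the caveat about caught errors, such a function will appear as successful here too.

```python
from collections import defaultdict


def first_failure_per_operation(requests):
    """Map each operation_Id to the name of its earliest failed request, or None if all succeeded.

    requests: dicts with operation_Id, name, timestamp (sortable) and success (bool).
    """
    by_op = defaultdict(list)
    for r in requests:
        by_op[r["operation_Id"]].append(r)
    result = {}
    for op_id, items in by_op.items():
        ordered = sorted(items, key=lambda r: r["timestamp"])
        failed = [r for r in ordered if not r["success"]]
        result[op_id] = failed[0]["name"] if failed else None
    return result


reqs = [
    {"operation_Id": "op1", "name": "Step1", "timestamp": 1, "success": True},
    {"operation_Id": "op1", "name": "Step3", "timestamp": 3, "success": False},
    {"operation_Id": "op1", "name": "Step2", "timestamp": 2, "success": False},
    {"operation_Id": "op2", "name": "Step1", "timestamp": 1, "success": True},
]
print(first_failure_per_operation(reqs))
```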
You could also write custom error logs in your functions, for example by logging a JSON object, which ends up in the message column of the traces table. You can then query it further:
traces
| where message contains "the error i am searching for"
| extend json = parse_json(message)
| project
timestamp,
errorSource = json.error_source,
step = json.step,
errors = json.errors,
url = json.url
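On the function side, a minimal sketch of writing such a JSON object with Python's standard logging module. The field names match the ones the query reads (error_source, step, errors, url); the function name log_step_error and the example values are hypothetical. Logging only the JSON, with no text prefix, keeps the whole message valid JSON for parse_json.

```python
import json
import logging

logger = logging.getLogger(__name__)


def log_step_error(step: str, error_source: str, errors: list, url: str) -> str:
    """Log a single JSON object as the message and return it."""
    message = json.dumps({
        "error_source": error_source,
        "step": step,
        "errors": errors,
        "url": url,
    })
    logger.error(message)  # the message is pure JSON, nothing before or after it
    return message


log_step_error("Step2", "blob_trigger", ["timeout"], "https://example.com/item")
```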
Can you please have a look at my query below and help me with this?
// please add a list of your servers here, these ones are the ones that are *shutdown* overnight
let shutdownComputers = dynamic(["machines"]);
// always exclude these computers
let excludeComputers = dynamic(["machines"]);
// config the hours to exclude
let startHour = 1900; // 07:00 PM
let endHour = 06; // 06:00 AM
Heartbeat
// Get just the excluded Servers
| where TimeGenerated > startofday(ago(24h))
| where Computer in (shutdownComputers)
| summarize LastCall = arg_max( TimeGenerated, datetime_part("hour", TimeGenerated) between( startHour .. endHour) )
by Computer, sComputer = strcat("Computer goes offline between ", startHour," to ", endHour," :",Computer), ComputerEnvironment
| where isnotempty(LastCall)
| project Computer , LastCall, sComputer
// Now join those excluded servers with the others...
| join kind= fullouter
(
Heartbeat
| where TimeGenerated > startofday(ago(24h))
| where Computer !in (shutdownComputers) and Computer !in(excludeComputers)
| summarize LastCall = arg_max(TimeGenerated,*) by Computer
) on Computer
// This bit can probably be improved if I get time
| extend Computer = iif(isempty(Computer),Computer1,Computer),
LastCall = iif(isempty(LastCall),LastCall1,LastCall)
| summarize by LastCall, Computer, sComputer
| where LastCall < ago(10m)
The Azure VM heartbeat alert is not working as expected, as shown in the screenshot below.
Some machines are not being reported.
The first machine in the example, net-ovuat2, is stopped and I am getting an alert for it. The second machine, NET-P2PTESTAPP1, is configured not to be reported during a certain period starting at 7:00 PM, but I get no alert for it either, even though the machine was switched off before 7:00 PM.
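One detail in the query may explain part of this: datetime_part("hour", TimeGenerated) returns an hour from 0 to 23, so comparing it with startHour = 1900 can never match, and a range such as between(19 .. 6), whose lower bound is greater than its upper bound, is never true either. A quiet window that wraps past midnight needs two conditions joined with or (hour >= start or hour < end). The check is sketched in Python below; the name in_quiet_window and the choice to treat the end hour as exclusive are my own assumptions.

```python
def in_quiet_window(hour: int, start_hour: int = 19, end_hour: int = 6) -> bool:
    """True if hour (0-23) falls in [start_hour, end_hour), including windows that wrap midnight."""
    if start_hour <= end_hour:
        # Same-day window, e.g. 09:00 to 17:00.
        return start_hour <= hour < end_hour
    # Overnight window, e.g. 19:00 to 06:00: late evening OR early morning.
    return hour >= start_hour or hour < end_hour


print([h for h in range(24) if in_quiet_window(h)])
```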
I'd like to use Azure Log Analytics to create a monitoring alert for possible brute-force attempts on my users' accounts. That is to say, I'd like to be notified by Azure (or, at the very least, be able to manually run the script to obtain the data) when a user's account is successfully authenticated into O365 following a number of failed attempts.
I know how to parse the logs to, for example, obtain the number of unsuccessful sign-in attempts by all users during a defined period (see the example below):
SigninLogs
| where TimeGenerated between(datetime("2018-11-19 00:00:00") .. datetime("2018-11-19 23:59:59"))
| where ResultType == "50074"
| summarize FailedSigninCount = count() by UserDisplayName
| sort by FailedSigninCount desc
But I don't know how to script the following:
A user has made 9 unsuccessful sign-in attempts (type 50074) followed by a successful sign-in attempt, all within a 60-second period.
Any help would be gratefully received.
Check out the Azure Sentinel community GitHub and see if the queries there help. Specifically, I added https://github.com/Azure/Azure-Sentinel/blob/master/Detections/SigninLogs/SigninBruteForce-AzurePortal.txt, which I think does more or less what you are after; it is also reposted below. Hope that helps.
// Evidence of Azure Portal brute force attack in SigninLogs:
// This query returns results if there are 5 or more authentication failures and at least one successful authentication
// within the same 20-minute time bin.
let failureCountThreshold = 5;
let successCountThreshold = 1;
let timeRange = ago(1d);
let authenticationWindow = 20m;
SigninLogs
| where TimeGenerated >= timeRange
| extend OS = DeviceDetail.operatingSystem, Browser = DeviceDetail.browser
| extend StatusCode = tostring(Status.errorCode), StatusDetails = tostring(Status.additionalDetails)
| extend State = tostring(LocationDetails.state), City = tostring(LocationDetails.city)
| where AppDisplayName contains "Azure Portal"
// Split out failure versus non-failure types
| extend FailureOrSuccess = iff(ResultType in ("0", "50125", "50140"), "Success", "Failure")
| summarize StartTimeUtc = min(TimeGenerated), EndTimeUtc = max(TimeGenerated),
makeset(IPAddress), makeset(OS), makeset(Browser), makeset(City), makeset(ResultType),
FailureCount=countif(FailureOrSuccess=="Failure"),
SuccessCount = countif(FailureOrSuccess=="Success")
by bin(TimeGenerated, authenticationWindow), UserDisplayName, UserPrincipalName, AppDisplayName
| where FailureCount>=failureCountThreshold and SuccessCount>=successCountThreshold
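The query above counts failures and successes inside fixed 20-minute bins (bin(TimeGenerated, authenticationWindow)), so a burst that straddles a bin boundary can be split across two bins, and it does not check that the success comes after the failures. The stricter rule from the question (at least 9 failures in the 60 seconds before a successful sign-in) is a sliding-window check, sketched here in Python over (user, timestamp, succeeded) tuples. The function name and input shape are assumptions, and the quadratic scan is fine for a sketch but not for large volumes.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def brute_force_suspects(events, failures_needed=9, window=timedelta(seconds=60)):
    """Return users with a successful sign-in preceded by >= failures_needed failures within window."""
    by_user = defaultdict(list)
    for user, ts, succeeded in events:
        by_user[user].append((ts, succeeded))
    suspects = set()
    for user, items in by_user.items():
        items.sort()  # chronological; at equal timestamps failures (False) sort before successes
        for i, (ts, succeeded) in enumerate(items):
            if not succeeded:
                continue
            # Failures earlier in the list that lie within the window before this success.
            failures = sum(1 for t, ok in items[:i] if not ok and ts - t <= window)
            if failures >= failures_needed:
                suspects.add(user)
                break
    return suspects


t0 = datetime(2018, 11, 19, 12, 0, 0)
events = [("alice", t0 + timedelta(seconds=i), False) for i in range(9)]
events.append(("alice", t0 + timedelta(seconds=30), True))
print(brute_force_suspects(events))
```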
I am trying to run a few tests that need the "Delete.feature" file to be called at the end of each scenario if the scenario succeeds; for a failure test, it should not call "Delete.feature".
My test looks something like this:
Given url ApiAdminURL
And path AdminPath
And header apigateway-apikey = apiGatewayKey
And header apigateway-basepath = 'lambdaTest'
* json myReq = read('users.json')
* set myReq.apiConf.subscriptionTiers = subscriptionTiers
* print 'my subscriptions : ', myReq.apiConf
And request myReq
When method post
Then status responseCode
* call read('Delete.feature')
Examples:
| subscriptionTiers |responseCode|
| [Unlimited,Gold,Bronze, Silver] |200 |
| [Unlimited,Gold,Bronze] |200 |
| [Unlimited,Gold,BronzeAuto-Approved] |400 |
If the response code is 200, it should run the step "* call read('Delete.feature')", and if the response code is 400, it should skip this step.
Can someone please help me with this?
Please refer to the documentation: https://github.com/intuit/karate#conditional-logic
Then assert responseStatus == 200 || responseStatus == 400
And if (responseStatus == 200) karate.call('Delete.feature')
One additional comment: I don't think Then status responseCode will work.
EDIT: also see "Check 2 differents status with Karate".