Query AzureDiagnostics for failed requests - azure

I have an Azure "Firewall" resource, with (under "Rules (classic)") a Network rule collection to allow webhook calls only from specific IP addresses. This is how the rule is configured:
Name = "Allow webhook"
Protocol = "TCP"
Source Type = "IP Group"
Source = (redacted)
Destination Type = "IP Address"
Destination Addresses = (list of ip addresses for allowed servers)
Destination Ports = 443
This rule is working well. However, I have to deal with a third party that seems to mess around with their source IP addresses at the moment, and they are slow to respond. Their dashboard merely says their system receives a 502 status code, nothing else.
So I turned on logging to a Log Analytics workspace, hoping to query the logs for failed access attempts and find the IP address they're using that way. None of the 5 built-in queries (nor any AzureDiagnostics | where msg_s contains "..." query) returns any failed requests.
There's also an Application Gateway in play that directs all this traffic. It has diagnostics enabled as well. I can use this query:
AzureDiagnostics
| where requestUri_s startswith "/webhook/MyRedactedPathHere"
| where TimeGenerated > ago(30d)
| order by TimeGenerated desc
Then I can see all the successful calls from the third party made in the past, but no responses with serverStatus_s set to 502.
I know there are dedicated 502 troubleshooting docs for Application Gateway and I should investigate further... but I'd expect these calls to show up in the logs regardless?
How do I query these failed requests in either my Application Gateway or in my Firewall logs?

I can't recall where, but I found a few posts online hinting that the Resource Specific logs won't work unless they're specifically enabled for your subscription. I switched to the generic one, waited a while, and the log entries started showing up. (After I also fixed a few unrelated errors.)
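Once the firewall entries do show up, the denied source IPs can be pulled out of the msg_s text. The following is a minimal Python sketch for rows exported from the workspace. It assumes msg_s values look like the hypothetical samples below ("TCP request from <ip>:<port> to <ip>:<port>. Action: Deny"); check that shape against your own logs before relying on it. The function name and sample addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a network-rule msg_s value (hypothetical; verify against real logs).
MSG_PATTERN = re.compile(
    r"^(?P<protocol>\w+) request from (?P<src_ip>[\d.]+):(?P<src_port>\d+) "
    r"to (?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\. Action: (?P<action>\w+)"
)

def denied_sources(messages, dst_port=443):
    """Count source IPs of denied requests aimed at dst_port."""
    counts = Counter()
    for msg in messages:
        match = MSG_PATTERN.match(msg)
        if match is None:
            continue  # skip entries that do not follow the assumed format
        if match["action"].lower() == "deny" and int(match["dst_port"]) == dst_port:
            counts[match["src_ip"]] += 1
    return counts

# Hypothetical sample rows using documentation-only IP ranges.
sample = [
    "TCP request from 203.0.113.7:51234 to 10.0.0.4:443. Action: Deny",
    "TCP request from 198.51.100.9:40000 to 10.0.0.4:443. Action: Allow",
    "TCP request from 203.0.113.7:51300 to 10.0.0.4:443. Action: Deny",
    "UDP request from 203.0.113.8:5353 to 10.0.0.4:53. Action: Deny",
]
print(denied_sources(sample).most_common())  # [('203.0.113.7', 2)]
```

Any source IP that appears here but not in the allowed IP Group is a candidate for the address the third party has switched to.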

Related

Azure Front-Door Route to API-M returns "DNSNameNotResolved" ErrorInfo

Randomly, without any warning, a request that should be routed to a backend returns a 503 error. After looking into it, it looks like a "DNSNameNotResolved" gets returned when forwarding the request.
I tried looking around but could not find out why this happens. There appear to be no problems when routing to the backend. I also can't find the request on the backend at all. The backend is an Azure API-M service.
When one machine has to connect to another machine, it has to perform DNS name resolution.
The error indicates that APIM wasn't able to convert the hostname of the backend (e.g. contoso.azurewebsites.com) to an IP address and couldn't connect to it.
The most frequent cause for this error is using an incorrect hostname while setting up the API configuration.
Refer to the common network configuration issues on APIM: https://learn.microsoft.com/en-us/azure/api-management/api-management-using-with-vnet#-common-network-configuration-issues
You may also check DNS resolution by referring to: https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-integrate-internal-vnet-appgateway#set-up-custom-domain-names-in-api-management
Along with the DNS configuration: https://learn.microsoft.com/en-us/azure/api-management/configure-custom-domain#dns-configuration
Also check whether the public IP address of the APIM service is unchanged: https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-ip-addresses#changes-to-the-ip-addresses
Other references:
Tutorial to add a custom domain to your Front Door: https://learn.microsoft.com/en-us/azure/frontdoor/front-door-custom-domain
Troubleshoot Azure Front Door configuration problems: https://learn.microsoft.com/en-us/azure/frontdoor/front-door-troubleshoot-routing
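As a quick sanity check on the hostname itself, name resolution can be tested from any machine with a few lines of Python. This is only a sketch: it uses the local resolver, which is not necessarily the one Front Door or APIM uses, so a successful lookup here rules out a typo in the hostname but not a resolver-specific problem. The function name is made up for illustration.

```python
import socket

def resolve(hostname, port=443):
    """Return the sorted IP addresses the local resolver gives for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name could not be resolved (the local equivalent of "DNSNameNotResolved").
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

# Example call (replace with the backend hostname configured in the API):
# print(resolve("contoso.azurewebsites.com"))
```

Running it a few times over a period and comparing the results can also show whether the returned addresses change.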
So according to Microsoft, the TTL on the DNS records on Front Door is really short, which makes it very DNS-aggressive, and the resulting failures fall within the 99.9% uptime. When this falls within the said uptime they will look into adjustments for Front-Door.

How to fetch all IP addresses that have pinged my application on Application Gateway

So, I'm new on Azure and I'm struggling with one problem.
I would like to fetch all IP addresses that pinged my application (Application Gateway).
I'm trying with Log Analytics but I cannot find an example that does something like that. What I want is a list of the IP addresses with the most requests in the past hour or so, but just fetching the IP addresses would be a great start. I thought this would be an easy task, but I'm really stuck.
Can anybody help me?
In the end, the logs are forwarded to Azure Monitor. Go there and use a Kusto query to grab the IP addresses:
requests
| where timestamp > ago(1h)
| project appName, operation_Name, url, resultCode, client_IP, customDimensions.["client-ip"]
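To get from a plain list of rows to "the IP addresses with the most requests in the past hour", the rows can be counted per address. The following Python sketch does that on exported rows. It assumes each row is a dict with the timestamp and client_IP columns from the query above; the function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def top_client_ips(rows, now, window=timedelta(hours=1), limit=10):
    """Return (ip, request_count) pairs for the busiest client IPs inside the window."""
    cutoff = now - window
    counts = Counter(
        row["client_IP"]
        for row in rows
        if row["timestamp"] > cutoff and row.get("client_IP")
    )
    return counts.most_common(limit)

# Hypothetical sample rows using documentation-only IP ranges.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"timestamp": now - timedelta(minutes=30), "client_IP": "203.0.113.7"},
    {"timestamp": now - timedelta(minutes=15), "client_IP": "203.0.113.7"},
    {"timestamp": now - timedelta(minutes=10), "client_IP": "198.51.100.9"},
    {"timestamp": now - timedelta(minutes=90), "client_IP": "198.51.100.9"},  # outside the window
]
print(top_client_ips(rows, now))  # [('203.0.113.7', 2), ('198.51.100.9', 1)]
```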

Azure application gateway -unknown error. Please try again

I am trying to set up ADFS Proxy servers behind an Azure Application Gateway but keep getting "unknown error. Please try again" when testing backend health.
I have 2 VMs in the backend pool with Windows 2012 Datacenter. I have set up the probes as follows:
Host: 127.0.0.1
Protocol : HTTPS
Path : /
Interval : 30
Timeout : 30
Unhealthy Threshold : 3
NSGs on the backend VMs have been opened to allow all traffic for testing, but I still get the error.
Since you say your NSGs allow traffic, check to ensure that your Firewalls on the VM itself are not blocking anything. In the Firewall settings check the boxes next to "enable file and printer sharing."
This seems obvious, but double-check that your VMs are all turned on and can ping each other. Also ensure that they are all joined to the domain.
Try removing the NSG temporarily to see if it works without it.
Allow the ports 65503-65534 in your NSGs and then check the status. These are necessary to be allowed to ensure that the App Gateway monitoring API can reach the endpoint for checking the health status.
Refer to this troubleshooting guide. https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-troubleshooting-502
Check the Azure status to make sure there is not an outage. We recently had some outages in the South Central US region. https://azure.microsoft.com/en-us/status/

Do I need a Site Down page for Azure Traffic Manager in Priority Mode

I'm setting up an Azure Traffic Manager, in Priority Mode, for my website. I have a primary and a failover location, both being monitored by a "FailoverMonitor.aspx" page: if any resources are down for the appropriate resource/region, I return a 500 error. I wanted to also make sure an error message was returned to the user if all locations were down.
In my testing, I decided to break both my primary (priority 1) and failover (priority 2), and in doing so, I saw that the primary location was served up.
This kind of surprised me. I half expected the site to not return anything at all, but instead it served up a site that is considered to be in a "degraded" status.
I added a 3rd endpoint to the traffic manager that returns a "sorry we're down" page - but is this the intended methodology to return such a message? I just want to make sure I'm going through all intended steps and not misusing the service. Thanks!
When all the endpoints being monitored by Traffic Manager for a given profile are down, it makes a "best effort" attempt and responds as if all the endpoints are actually in an online state, instead of not returning any endpoint at all.
More details of this and other endpoint monitoring details can be found at: https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-monitoring/
Relevant section copy pasted below:
What happens if all Traffic Manager endpoints (excluding endpoints with a Disabled or Stopped status) are failing their health checks, and show a Degraded status?
This most commonly is caused by an error in the configuration of the service (such as an access control list [ACL] blocking the Traffic Manager health checks), or an error in the configuration of the Traffic Manager profile (such as an incorrect monitoring path).
In this case, Traffic Manager makes a "best effort" attempt and responds as if all the Degraded status endpoints actually are in an online state. This is preferable to the alternative, which would be to not return any endpoint in the DNS response.
A consequence of this behavior is that if Traffic Manager health checks are not configured correctly, it might appear from the traffic routing as though Traffic Manager is working properly. However, in this case, endpoint failover will not happen if an endpoint fails, and this affects overall application availability. To ensure that this does not occur, it is important to check that the profile shows an Online status, and not a Degraded status. An Online status shows that the Traffic Manager health checks are working as expected.
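The behaviour described above can be modelled in a few lines of Python. This is a simplified illustration of the documented rule for Priority mode, not Traffic Manager's actual implementation. The endpoint names and the function are hypothetical.

```python
# Simplified model of Priority routing as described in the quoted section.
# Illustration only; this is not Traffic Manager's actual implementation.

def pick_endpoint(endpoints):
    """Return the endpoint name a Priority-mode profile would hand out.

    endpoints: list of (name, priority, monitor_status) tuples, where
    monitor_status is "Online", "Degraded", "Disabled" or "Stopped".
    """
    # Disabled and Stopped endpoints are never candidates.
    candidates = [e for e in endpoints if e[2] in ("Online", "Degraded")]
    online = [e for e in candidates if e[2] == "Online"]
    # Best effort: if nothing is Online, treat every Degraded endpoint as Online.
    pool = online or candidates
    if not pool:
        return None
    # The lowest priority number wins.
    return min(pool, key=lambda e: e[1])[0]

both_down = [("primary", 1, "Degraded"), ("failover", 2, "Degraded")]
with_sorry_page = both_down + [("sorry-page", 3, "Online")]

print(pick_endpoint(both_down))        # primary
print(pick_endpoint(with_sorry_page))  # sorry-page
```

The first call matches what the question observed: with both locations broken, the primary is still served. The second shows why a healthy third endpoint ends up receiving the traffic in that situation.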
"I wanted to also make sure an error message was returned to the user if all locations were down."
Since Traffic Manager is a DNS-only solution, I'm not sure who's supposed to serve the "We're down" page.
A 3rd endpoint serving a static page should do the job.

Azure Traffic Manager Endpoints show Degraded Even when 200 is Returned

I have an Azure Traffic Manager Profile with two Endpoints (Linux VM's running RabbitMQ).
The endpoints are of Type "Azure Endpoint" and the Target Resource Type is "Public IP Address".
When I look at the Traffic Manager Profile it reports that the Status of the profile is "Enabled", and the Monitor Status is "Degraded".
On Each of the endpoints it reports that their Status is "Enabled" and the Monitor Status is "Degraded".
I have the Traffic Manager Profile configured with Protocol as "HTTP" and Port as 15672 and the path as "/index.html".
The problem is I can't tell why it is reporting "Degraded", because if I run a wget command I get a 200:
wget <vmname1>.cloudapp.azure.com:15672/index.html
Resolving <vmname1>.cloudapp.azure.com... <ip address>
Connecting to <vmname1>.cloudapp.azure.com|<ip address>|:15672... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1419 (1.4K) [text/html]
All the "documentation" (which for Azure is frustratingly just blog posts) says that if it returns 200 then it should be "Online" and not "Degraded".
Based on your reply, it is very likely that the issue is that the Traffic Manager health checks are being blocked by your NSG rules.
We don't have an easy way to configure Traffic Manager in NSGs today, nor do we publish the Traffic Manager health check source IP addresses. These are gaps we are planning to fill. In the meantime, the recommended workaround is to use a dedicated health check page running on a different TCP port for Traffic Manager, and apply the NSG only to the port used by your application.
Please take a look at this article, which may help you.
I can't be sure from the description you gave, but my best guess is that in your case the endpoint is returning a 301/302 redirect to a different URL, and the second URL is what is actually returning 200 OK. Traffic Manager health probes don't support redirects. You can verify this, for example, using the F12 developer tools in IE.
Jonathan Tuliani, Program Manager, Azure Traffic Manager
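The redirect theory can also be checked from a script. wget follows redirects by default, so its final "200 OK" can hide an earlier 301/302. The Python sketch below sends a single GET with http.client, which does not follow redirects, and reports what the first response was. The function names are made up for illustration, and it only shows what a non-following client sees; it does not reproduce the actual Traffic Manager probe.

```python
import http.client

def probe_once(host, port, path, timeout=10):
    """Send one GET without following redirects; return (status, Location header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def describe(status, location):
    """Explain a first response from the point of view of a probe that ignores redirects."""
    if status == 200:
        return "200 OK returned directly"
    if status in (301, 302, 303, 307, 308):
        return f"redirect ({status}) to {location}; a probe that does not follow redirects sees a failure"
    return f"unexpected status {status}"

# Example call with the values from the question (replace <vmname1> with the real name):
# print(describe(*probe_once("<vmname1>.cloudapp.azure.com", 15672, "/index.html")))
```

If this reports a redirect, pointing the monitoring path at the final URL should be the simpler fix. If it reports 200 directly, the NSG explanation above is the more likely one.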
