Azure Cosmos DB: 503 Service Unavailable

While connecting from my code outside of the company network, the Cosmos DB connection works just fine. But from the company network, it throws a 503 ServiceUnavailable status code. What are the possible issues?

Based on the comments:
System.Exception: 'Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: ServiceUnavailable (503); Substatus: 0;
ActivityId: ;
Reason: (The request failed because the client was unable to establish connections to 4 endpoints across 1 regions. Please check for client resource starvation issues and verify connectivity between client and server.
Normally, if you take that exception and store or view the ToString(), it will show you more information. But from the message itself, it means the client tried to connect to all known endpoints available and failed.
This normally means there is either something on the network blocking your request or the machine executing this code is completely overloaded (CPU at 100% or port exhaustion) and cannot process any request.
If this is consistently failing for all operations, check that your network has the correct port range open:
By default, the SDK works in Direct mode, so check that ports in the 10000 through 20000 range are open and available. If you have private endpoint enabled, the range is 0 to 65535.
As @GauravMantri mentioned, you can also switch to Gateway mode if the network is restricted, since Gateway mode sends all requests over HTTPS on port 443:
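using Microsoft.Azure.Cosmos;

// Gateway mode avoids the Direct-mode TCP port range and only needs port 443.
string connectionString = "<your-account-connection-string>";
CosmosClient client = new CosmosClient(
    connectionString,
    new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Gateway
    });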
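To see from the affected machine whether a firewall is dropping traffic on the ports discussed above, a quick TCP probe can help. This is only a minimal Python sketch, not part of the Cosmos DB SDK: `probe_ports` is a hypothetical helper, and the host name in the usage comment is a placeholder. For Direct mode, the host and port pairs to test would have to come from the endpoints listed in the exception's ToString() output, since the backend endpoints are not the same as the account's gateway address.

```python
import socket


def probe_ports(host, ports, timeout=3.0):
    """Attempt a TCP connect to each port; map port -> True (reachable) or False."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers timeouts, refusals, unreachable networks and DNS failures.
            results[port] = False
    return results


# Example (placeholder host, run it from inside the company network):
# print(probe_ports("<your-account>.documents.azure.com", [443]))
```

A `True` result only shows that the TCP handshake completed. It says nothing about TLS inspection or proxies further along the path, so treat it as a first check rather than proof that the SDK will work.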

Related

FileSync local endpoint offline

I have 3 servers (one running Windows Server 2012 R2 and two running Windows Server 2019) and I use Azure File Sync to sync files between them.
For the past few days, the 2012 R2 server has been appearing offline in the Azure portal (it shows "no activity"). I ran the Test-StorageSyncNetworkConnectivity cmdlet and it fails with the following output:
Discovery service connectivity result:
Result: Success
HostUri: unknown
HostIPv4Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
HostIPv6Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
Management service connectivity result:
Result: Fail. Failed to run test
HostUri: unknown
HostIPv4Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
HostIPv6Addr: Fail. DNS name does not exist. Resolution through GetAddrInfo failed with error: 11001
HostNetworkLatency [min,avg,max]: Network Latency Request Failed.
Monitoring service connectivity result:
Result: No response from monitoring agent process.
HostUri: unknown
HostIPsAddr: IPv4 and Ipv6 addresses do not exist
ServerEndpoint: faf66731-1e22-47eb-93eb-b8d3331f0de2
SyncServiceResult:
SyncServiceHostUri:
SyncServiceHostIPsAddr: IPv4 and Ipv6 addresses do not exist
SyncServiceHostNetworkLatency: Request Failed.
ServerEndpoint: 80f3bb96-463b-4f86-9e26-8dcf0c92f915
SyncServiceResult:
SyncServiceHostUri:
SyncServiceHostIPsAddr: IPv4 and Ipv6 addresses do not exist
SyncServiceHostNetworkLatency: Request Failed.
ServerEndpoint: b9a874b4-7acd-4174-b5e8-26ac23c84c7e
SyncServiceResult:
SyncServiceHostUri:
SyncServiceHostIPsAddr: IPv4 and Ipv6 addresses do not exist
SyncServiceHostNetworkLatency: Request Failed.
Remediation Steps
For Azure File Sync to work correctly, you will need to configure your servers to communicate with multiple Azure services
Refer the following public document for details on proxy settings or firewall settings for Azure File Sync - https://aka.ms/AFS/ProxyAndFirewall
If you have configured a private endpoint refer the following public document for configuring private endpoint for Azure File Sync - https://aka.ms/AFS/PrivateEndpoint
NetworkTestPassed Report
----------------- ------
False ...
The problem seems to be DNS related, but I tried the Test-NetConnection -ComputerName <remote-host> -Port 443 cmdlet with the correct URLs (taken from https://learn.microsoft.com/it-it/azure/storage/file-sync/file-sync-firewall-and-proxy#test-network-connectivity-to-service-endpoints) and all the endpoints seem to be working fine. The ping fails, but I think that is normal behavior. For example:
PS C:\Program Files\Azure\StorageSyncAgent> Test-NetConnection -ComputerName tm-kailani7.one.microsoft.com -Port 443
WARNING: Ping to tm-kailani7.one.microsoft.com failed -- Status: TimedOut
ComputerName : tm-kailani7.one.microsoft.com
RemoteAddress : 20.38.85.153
RemotePort : 443
InterfaceAlias : Ethernet 2
SourceAddress : 192.168.0.185
PingSucceeded : False
PingReplyDetails (RTT) : 0 ms
TcpTestSucceeded : True
I also tried the FileSyncErrorsReport.ps1 but even that doesn't give me any error:
WARNING: There are no file sync errors to report. Either the last completed sync session did not have per-item errors or
the ItemResults event log on the server wrapped due to too many per-item errors and the event log no longer contains
errors for this sync group. To learn more, see the Azure File Sync troubleshooting documentation:
https://aka.ms/AFS/FileSyncErrorReport
I think the problem lies in the fact that the AzureStorageSyncMonitor.exe process is not running, and if I try to run it manually it just closes itself after a few seconds.
I have no event ID 9301 (specified here: https://learn.microsoft.com/it-it/azure/storage/file-sync/file-sync-troubleshoot?tabs=portal1%2Cazure-portal#server-endpoint-health). Searching the other Event Viewer folders, I could only find event 4104, which shows an error dated to the last time the server reached the Azure endpoint:
Querying for new jobs failed.
HttpErrorCode: 0x80C8700C
InternalErrorCode: 0x80C80300
Any help would be greatly appreciated, thank you.
• Check event ID 9302 in the ‘FileSync’ telemetry logs under ‘Applications and Services Logs’ in Event Viewer. It is logged every 5 to 10 minutes for active sync sessions, so it shows whether sync is making any progress. The ‘AzureStorageSyncMonitor.exe’ process is what reports the status of the server endpoint to the Storage Sync Service shown in the portal.
• You can also use Performance Monitor (‘Perfmon.msc’). Azure File Sync ships its own performance counters for monitoring sync activity locally on the server.
• Because ‘Test-StorageSyncNetworkConnectivity’ reports DNS resolution failures, check the server's IP address settings and verify that the configured DNS servers (preferred and alternate) are correct and reachable.
Also check the ‘hosts’ file in ‘C:\Windows\System32\drivers\etc’ and make sure that any entries for this server (the Windows Server 2012 R2 machine) and its DNS host name are correct. Entries in that file take precedence over DNS lookups for every service on the server, including ‘AzureStorageSyncMonitor’.
• Finally, consider disabling negative caching on the DNS client, putting the suffix that has the matching host A record last in the suffix search list, and, in code that calls GetAddrInfo, using ‘AF_UNSPEC’ for the address family so that both A and AAAA results are returned.
For more detail, see the following link:
https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/getaddrinfo-fails-error-11001-call-af-inet6-family#workaround
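The ‘AF_UNSPEC’ suggestion can be tried out independently of the sync agent. Below is a small Python sketch (an illustration only, not something the Azure File Sync agent uses) that resolves a name with the unspecified address family and separates the IPv4 (A) and IPv6 (AAAA) results. The `resolve_all` helper is hypothetical; on Windows a failed lookup would surface here as a `gaierror`, the same class of failure as the 11001 error in the cmdlet output.

```python
import socket


def resolve_all(host, port=443):
    """Resolve host with AF_UNSPEC and split the answers into IPv4 and IPv6 lists."""
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
        )
    except socket.gaierror as exc:
        # Name resolution failed for every address family.
        return {"ipv4": [], "ipv6": [], "error": str(exc)}
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    ipv4 = sorted({info[4][0] for info in infos if info[0] == socket.AF_INET})
    ipv6 = sorted({info[4][0] for info in infos if info[0] == socket.AF_INET6})
    return {"ipv4": ipv4, "ipv6": ipv6, "error": None}


# Example, using a host name from the question:
# print(resolve_all("tm-kailani7.one.microsoft.com"))
```

If this resolves the service host names while the cmdlet still reports "HostUri: unknown", the cmdlet may be failing before it gets to DNS at all, for instance because it could not determine which URIs to test.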

Could not find Host of Azure In App SQL Database

I've created a MySQL In App database for my Azure App, and got the connection string for it. This string is injected into the application.json, and then used to create the actual connection:
WebApplicationBuilder builder = WebApplication.CreateBuilder(args); // or obtain it elsewhere
var connectionString = builder.Configuration.GetConnectionString("DefaultConnection");
builder.Services.AddDbContext<DatabaseContext>(options => options.UseMySQL(connectionString));
Only... no connection string works. The one with the port (Database=localdb;Data Source=127.0.0.1:53844;User Id=azure;Password=password) throws:
System.Net.Sockets.SocketException (11001): No such host is known.
And the one without the port (Database=localdb;Data Source=127.0.0.1;User Id=azure;Password=password) throws:
System.Net.Sockets.SocketException (10013): An attempt was made to access a socket in a way forbidden by its access permissions.
This question suggested another connection string (Server=127.0.0.1; Port=53844; Database=localdb; Uid=azure; Pwd=password), which, oddly, throws the same exception even though the port is defined:
System.Net.Sockets.SocketException (10013): An attempt was made to access a socket in a way forbidden by its access permissions.
And the manual suggests yet another string (server=localhost;database=localdb;user=azure;password=password) which again throws one of the two exceptions depending on if the port is present.
Connecting via the browser works fine, so I can confirm port, username and password work normally.
Just to be sure, I tried "localhost" as the host, too. Same results.
What am I doing wrong?
What works is a mix of all of these connection strings:
server=localhost;port=53844;database=localdb;user=azure;password=password
(Port and server separated, but both present.)
Works for me right now.
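One plausible reading of the errors in the question is that the whole value 127.0.0.1:53844 was treated as a host name (hence "No such host is known"), and that leaving the port out made the connector fall back to its default port, which the app is not allowed to use. That is an interpretation, not something confirmed above. As an illustration of the "server and port as separate keys" rule, here is a hypothetical Python helper that rewrites the other formats from the question into the working one. It is a sketch only, and it does not handle values that contain a semicolon.

```python
def normalize_mysql_connection_string(conn_str):
    """Rewrite a connection string so that server and port are separate keys."""
    parts = {}
    for item in conn_str.split(";"):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        parts[key.strip().lower()] = value.strip()

    # Accept the key aliases that appear in the question.
    host = parts.get("server") or parts.get("data source") or "localhost"
    port = parts.get("port")
    if ":" in host:
        # Split an embedded "host:port" value into its two components.
        host, _, embedded_port = host.rpartition(":")
        port = port or embedded_port
    user = parts.get("user") or parts.get("user id") or parts.get("uid") or ""
    password = parts.get("password") or parts.get("pwd") or ""

    fields = [("server", host)]
    if port:
        fields.append(("port", port))
    fields += [
        ("database", parts.get("database", "")),
        ("user", user),
        ("password", password),
    ]
    return ";".join(f"{key}={value}" for key, value in fields)


# Example with the first string from the question:
# normalize_mysql_connection_string(
#     "Database=localdb;Data Source=127.0.0.1:53844;User Id=azure;Password=password")
```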

Could not get connection while getPartitionedTopicMetadata - io.netty.channel.ConnectTimeoutException: connection timed out

I have a basic Pulsar app, and when I try to connect to Pulsar, I get this exception:
2021-03-10 14:38:26.107 WARN 7 --- [r-client-io-1-1] o.a.pulsar.client.impl.ConnectionPool : Failed to open connection to my-pulsar-server-ms-tls.domain.com:6651 : io.netty.channel.ConnectTimeoutException: connection timed out: my-pulsar-server-ms-tls.domain.com/10.80.13.38:6651
2021-03-10 14:38:26.212 WARN 7 --- [al-listener-3-1] o.a.pulsar.client.impl.PulsarClientImpl : [topic: persistent://myTenant/myNamespace/myTopic] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
My Pulsar client is pretty basic:
PulsarClient.builder()
        .serviceUrl(serviceUrl)
        .authentication(AuthenticationFactory.token(authToken))
        .tlsTrustCertsFilePath(serverCertificateFilePath.toString())
        .enableTlsHostnameVerification(false)
        .allowTlsInsecureConnection(false)
        .build();
The producer is also pretty basic and looks like this:
pulsarClient.newProducer(Schema.STRING)
        .topic(topic)
        .create();
I've verified that the token and TLS cert are correct. I've also tried connecting a consumer from this same environment and got a similar exception, and I know that others with the same code are able to connect to the same Pulsar cluster from other environments. What is the issue?
Your connection is getting blocked by a firewall or network issue.
Verify that you can establish a connection to your endpoint my-pulsar-server-ms-tls.domain.com:6651 from your environment.
If you're able to run a network packet dump (like tcpdump), that should make it obvious if you're not able to establish a connection.
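You can also try running curl -v my-pulsar-server-ms-tls.domain.com:6651. If curl reports that it connected, you were able to reach the server, even if the response itself is unusable. If it reports a connection timeout instead, as the client log in the question does, the traffic is being dropped by a firewall or by the network configuration (such as a missing route). Could not resolve host would point to a DNS problem, although in the log the name already resolves to 10.80.13.38.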
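If neither tcpdump nor curl is available in the environment, the same distinction can be made with a few lines of Python. This is a rough sketch under that assumption: `classify_connection` is a hypothetical helper, it tries only the first resolved address, and it stops at the TCP handshake without attempting TLS or the Pulsar protocol.

```python
import socket


def classify_connection(host, port, timeout=5.0):
    """Return 'dns_failure', 'timeout', 'refused', 'other_error' or 'connected'."""
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns_failure"
    family, socktype, proto, _, sockaddr = addresses[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
        return "connected"
    except socket.timeout:
        # No answer to the handshake: typically a firewall drop or a missing route.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except OSError:
        return "other_error"
    finally:
        sock.close()


# Example, with the endpoint from the question:
# print(classify_connection("my-pulsar-server-ms-tls.domain.com", 6651))
```

A "timeout" result from the failing environment, next to "connected" from an environment where the client works, would support the firewall explanation.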

Random 21/42 seconds timeout in outgoing traffic on Azure Web Sites

I have an ASP.NET MVC 5 application running in the Azure Germany cloud as an Azure Web App (single instance, Standard S3 size).
I'm calling a non-Azure-hosted REST/SOAP service on a particular host, and the web requests either succeed promptly or time out after 21 or 42 seconds.
I've load tested the requests, and between 20% and 80% of them time out.
One particularly remarkable property of the timeouts is that they occur after exactly 21 or 42 seconds (this is serious, no reference to The Hitchhiker's Guide to the Galaxy intended).
Calling a different service from the web app works just fine, temporarily at least.
We've already checked the firewall of the non-Azure service: when the timeout occurs, not a single packet reaches the host.
This issue occurred once before, a year ago, and support was unable to determine the cause. It suddenly went away roughly two weeks after it first occurred, so the ticket was closed as having fixed itself, but now it's back.
The code is using https://github.com/canton7/RestEase (uses HttpClient underneath) and looks like
[Header("Content-Type", "application/json")]
public interface IApi
{
    [Post("/Login")]
    Task<LoginToken> Login([Body]LoginRequest request);
}

private static Dictionary<string, IApi> ApiClientsByHost = new Dictionary<string, IApi>();

private IApi GetApiForHost(string host)
{
    if (!ApiClientsByHost.TryGetValue(host, out var client))
    {
        lock (ApiClientsByHost)
        {
            if (!ApiClientsByHost.TryGetValue(host, out client))
            {
                ApiClientsByHost[host] = client = RestClient.For<IApi>(host);
            }
        }
    }
    return client;
}

var client = GetApiForHost("https://production/");
var loginToken = await client.Login(new LoginRequest { Username = username, Password = password });
By "different service", I mean using "https://testserver/" instead of "https://production/" (testserver is located in a different data center with a different IP and all).
The API authenticates by passing a token in the query string, but the request times out before a token can even be obtained.
The code caches the IApi instances to avoid the TCP starvation problems caused by disposing HttpClients (though I've never run into port exhaustion).
Restarting the app does not resolve the issue, and it currently only occurs against production (but a year ago, when this issue occurred on production, we switched to testserver, which worked initially but ran into the same problem after some time).
EDIT: Found some explanation in the last answer as to where those magical 21 seconds are coming from.
EDIT: One workaround I've found is to set up an Azure VM with a proxy on it and configure defaultProxy to route traffic through that VM.
That's TCP retransmission timing out. It's odd that you are getting different values though.
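The arithmetic behind that is easy to check. Assuming an initial retransmission timeout of 3 seconds that doubles after each attempt, and two SYN retransmissions before giving up (values commonly cited as Windows defaults, used here as assumptions rather than taken from the question), an unanswered connection attempt fails after 3 + 6 + 12 = 21 seconds. A small Python sketch of that calculation:

```python
def connect_timeout_seconds(initial_rto=3.0, syn_retransmissions=2):
    """Total wait before a connect attempt fails when no SYN is ever answered."""
    total = 0.0
    wait = initial_rto
    # One wait after the original SYN plus one after each retransmission,
    # doubling each time (exponential backoff).
    for _ in range(syn_retransmissions + 1):
        total += wait
        wait *= 2
    return total


print(connect_timeout_seconds())      # 21.0 with the assumed defaults
print(2 * connect_timeout_seconds())  # 42.0, i.e. two full attempts in a row
```

Under those assumptions, 42 seconds would correspond to two complete connection attempts in a row, for example an automatic retry or a second address being tried. That part is speculation, but it would be consistent with the observation that no packet reaches the target host when the timeout occurs.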

IIS Application pool identity

I am attempting to obtain a data feed from yahoo finance. I am doing this with the following code:
System.Net.WebRequest request = System.Net.WebRequest.Create("http://download.finance.yahoo.com/download/quotes.csv?format=sl&ext=.csv&symbols=^ftse,^ftmc,^ftas,^ftt1x,^dJA");
request.UseDefaultCredentials = true;
// set properties of the request
using (System.Net.WebResponse response = request.GetResponse())
{
    using (System.IO.StreamReader reader = new System.IO.StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
I have placed this code into a console application and, using Console.WriteLine on the output I receive the information I require. I have used the 'Run as..' command to execute this using a specific domain account.
When I use this code from within a Page load I receive the following error message "No connection could be made because the target machine actively refused it 76.13.114.90:80".
This seems to suggest that the call is reaching yahoo (is this true?) and that there is something missing.
This would suggest there is an identity difference in the calls between the console application and application pool.
Environment is: Windows Server 2003, IIS 6.0, .net 4.0
"Target machine actively refused it" indicates that the TCP connection itself is not succeeding. This could be due to the fact that the Proxy settings when run under IIS are not the same as those that apply when you run in the console.
You can fix this by setting a WebProxy on your request, that points to the proxy server being used in the environment.
Yes, an active refusal is indication that the target machine is receiving the request and the information in the headers is either incorrect or insufficient to process the request. It is entirely possible that if you had to run this call using a "run as" command in console that the application pool's identity user does not have the appropriate permission or username. You can attempt to change the identity user to this specific domain account to see if that alleviates the problem, but you may have to isolate this particular function into its own application pool in order to protect the rest of the website from having this specification.
