Disconnection issues with Azure Service Bus relay

We are running some long-running test apps with Azure Service Bus relay over HTTP, hosted in a Windows service, and most of the time these run fine for 2-3 days. However, every so often an internal network glitch occurs (e.g. a firewall reboot) that kills the internet connection.
At this point, the relay is dropped in Azure and our web app can no longer communicate with the on-premise service.
I would have thought that the Azure relay client was fault-tolerant, in that if it realises it has lost its connection with Azure it will re-establish the connection, and if it can't, keep trying until it can. But it appears that this is not the case. This seems pretty fundamental...?
Only once have I ever seen a "System.ServiceModel.CommunicationException" where the service can't communicate on the internet, and that was when the client was starting up and trying to establish the connection in the first place.
Is there any advice or feedback on handling transient disconnections through the relay service? (As it's a cloud --> on-prem direction, the client can't, AFAIK, ping the server.)

If you are still experiencing issues, you may want to contact Azure support to understand why it is disconnecting. The Relay client should reconnect if something happens to the existing connection.
You may want to add ConnectionStatusBehavior to your ChannelFactory to have it output when the status for the connection changes. It will contain the error that caused it to change status.
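// Raise events whenever the relay connection changes state.
var connectionStatusBehavior = new ConnectionStatusBehavior();
connectionStatusBehavior.Online += ConnectionStatusOnlineMethod;   // connection (re)established
connectionStatusBehavior.Offline += ConnectionStatusOfflineMethod; // connection lost
// Attach the behavior to the endpoint before the channel factory is opened.
channelFactory.Endpoint.Behaviors.Add(connectionStatusBehavior);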

This issue was fixed by Microsoft in version 2.6.5 of the Microsoft Azure Service Bus DLL. After one month of testing, it seems to work.

Related

Why does my Azure Hybrid Connection show a "Status Unknown"?

So...about 5pm 2 nights ago, all 14 of my listeners on my Azure Service Bus dropped. So I logged in to my on-prem SQL Server to check on my Hybrid Connections and both of them showed a status of "Status Unknown". I can't find anything on the internet about this specific status.
Nothing changed on my SQL Server other than the fact that I've pegged the RAM....it's at 100% usage.
If I go to the Azure Portal, navigate to either of my Hybrid Connection Overview pages and click on the "Hybrid Connection Url", I get the following message in the browser:
{
"error": {
"code": "TokenMissingOrInvalid",
"message": "MissingToken: Relay security token is required. TrackingId:*SOME GUID*, SystemTracker:*SERVICE BUS NAME*:*HYBRID CONNECTION NAME*, Timestamp:2021-08-04T04:19:16"
}
}
Now....I didn't change anything on my Hybrid Connection configurations. I haven't changed anything about tokens. I have no idea what's going on other than my Azure App Services have been down for 2 days, now.
Any help would be greatly appreciated....
This looks like an authentication error, where a token might not be generated when you try to make a call to the underlying on-prem server.
You can refer to the SO thread on ServiceBusAuthorization, and if you are still facing the issue, kindly raise a ticket with MS Q&A.
Microsoft support led me to this article where I found the following information:
Make Sure that the Date and Time are Correct
The Hybrid Connection Manager connects to Azure Relay using Secure Sockets Layer (SSL) on port 443. If there's a problem with your SSL handshake or connection, it will break your Hybrid Connection. If you find that your Hybrid Connection works initially, and then it stops working after about 10 minutes, that's a sign that you need to check the date and time on the machine running the Hybrid Connection Manager. Make sure they are correct because if they're not, your SSL connection may not work.
Well...the time on my server was off by about 16 minutes b/c of a group policy that I had never bothered to fix b/c I don't know anything about group policies. So I looked up how to fix the server's clock and, once that was done, the issue was resolved.
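For anyone who wants to check for this kind of clock drift from code, here is a minimal sketch. It compares the local clock with the Date header of an HTTPS response, which is only a rough reference clock. The class name ClockSkewCheck, the reference URL you pass in, and the tolerance value are all assumptions for illustration, not anything the Hybrid Connection Manager exposes.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Azure SDK.
public static class ClockSkewCheck
{
    // Positive result means the local clock is ahead of the reference clock.
    public static TimeSpan Skew(DateTimeOffset localUtcNow, DateTimeOffset referenceUtcNow)
        => localUtcNow - referenceUtcNow;

    // True when the absolute skew exceeds the tolerance chosen by the caller.
    public static bool IsSuspicious(TimeSpan skew, TimeSpan tolerance)
        => skew.Duration() > tolerance;

    // Sends a HEAD request and uses the response's Date header as the reference time.
    // Returns null when the server did not send a Date header.
    public static async Task<TimeSpan?> MeasureAsync(HttpClient http, Uri referenceUrl)
    {
        using var request = new HttpRequestMessage(HttpMethod.Head, referenceUrl);
        using var response = await http.SendAsync(request);
        DateTimeOffset? serverDate = response.Headers.Date;
        if (serverDate is null)
        {
            return null;
        }
        return Skew(DateTimeOffset.UtcNow, serverDate.Value);
    }
}
```

With a clock that is 16 minutes ahead, as in my case, Skew returns 16 minutes and IsSuspicious flags it for any tolerance below that (I would probably start with something like 5 minutes, but that number is a guess). Note that if the clock is far enough off, the HTTPS request itself may fail, which is also a hint.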

Connections lost on slot swap

I want to use deployment slots for my Blazor server side application, but it stops working for the current users during the swap and they have to refresh the page.
I'm using an Azure SignalR Service for performance reasons, so it kinda makes sense, I imagine it like this:
Connections are held in memory and when I swap, obviously that is gone. At least without a SignalR Service. But shouldn't my SignalR Service keep SignalR connections (see red)? Did I set it up incorrectly?
I found others having similar problems (without using Blazor), but I'm not sure if these are viable with Blazor, especially because I just want to mitigate that 1-2 minute downtime for an update...
Automated reconnect
SignalR client disconnected on Azure slots swap
Storing connections in an external storage. But manually handling connections is absurd effort?
https://learn.microsoft.com/en-us/aspnet/signalr/overview/guide-to-the-api/mapping-users-to-connections
Update:
See: https://www.youtube.com/watch?v=Vvjdqq8MB44&t=12m10s
It seems there is "Web traffic" going directly to the Blazor app. My guess: After a swap the "Web traffic" still goes to the previous instance, while SignalR traffic goes to the newly swapped in instance. That sounds like a problem.
But once again, I have no clue what "Web traffic" actually is or if that is the problem and if Azure offers a way to solve the issue, so a definitive answer would still be appreciated.
I don't think you have it set up incorrectly. I looked into doing a similar thing and had the same results.
The Azure SignalR Service is basically acting as a proxy. When you switch slots, the Azure SignalR Service loses its connection to the Blazor server hub, which holds the current state.
I don't think there is any way around it. When you update your Blazor server site, every connected client will lose its connection, because it's not possible to move the client 'states' over to the new slot/site.
What would be nice is the ability for Azure to switch the new slot in once all old connections have disconnected, but I don't think we'll ever get that, as it's a very specific Blazor server requirement.

Successfully established hybrid connection loses connection after 20 minutes (restart of Azure hybrid connection service required)

I have a little problem while using hybrid connections in Azure to connect a local resource. I installed the connection manager successfully and established a connection to the on premise resource, everything works fine...
But after around 20 minutes the connection gets lost until I restart the azure hybrid connection service on the on premise server.
The AppService is connected to more than one server and all other connections work well. Does anyone have an idea how to fix this problem and establish a stable connection?
We encountered similar issues with Azure endpoints. We resolved it by having a scheduled message sent to the local BizTalk web services every 3 minutes to keep it alive.
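A scheduled keep-alive like that can be sketched roughly as follows. This is only an illustration of the idea, not what we actually deployed: the KeepAlive class and the ping delegate are hypothetical names, and what the ping does (an HTTP call, a cheap query, etc.) is up to you.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: calls `ping` repeatedly until `stop` is cancelled.
public static class KeepAlive
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> ping,
        TimeSpan interval,
        CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            try
            {
                await ping(stop);
            }
            catch (OperationCanceledException) when (stop.IsCancellationRequested)
            {
                break; // shutting down
            }
            catch (Exception ex)
            {
                // A failed ping should not end the loop; log it and try again next time.
                Console.Error.WriteLine($"Keep-alive ping failed: {ex.Message}");
            }

            try
            {
                await Task.Delay(interval, stop);
            }
            catch (OperationCanceledException)
            {
                break; // cancelled while waiting
            }
        }
    }
}
```

For the 3-minute schedule mentioned above, the call would look something like KeepAlive.RunAsync(ct => SendPingAsync(ct), TimeSpan.FromMinutes(3), cts.Token), where SendPingAsync is a placeholder for whatever request keeps your endpoint warm.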

Getting an intermittent error while connecting to on-premise sql database from Azure service

We created an Azure MVC website, and from the service (controller) code we connect to an on-premise SQL Server using an Azure Hybrid Connection. Intermittently we see the following error:
"A transport-level error has occurred when receiving results from the
server. (provider: TCP Provider, error: 0 - The specified network name
is no longer available.)"
Please provide suggestions to resolve this issue.
You can try the following solutions:
Try increasing the connection timeout.
Check whether remote connections are enabled.
Try adding a firewall exception.
First of all, the error means that either the network has some extra latency, the database is down, or you have too many concurrent connections open to the database.
(Make sure you are closing all open data readers.)
It may also be due to this:
These are transient faults and are to be expected in the cloud. Implementing defensive programming is usually a must in the cloud. Try using some retry logic. Microsoft's transient fault exception library is an excellent start. Though meant primarily for SQL Azure and Azure Service bus, you can use the library for SQL IaaS.
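To illustrate what retry logic means here, below is a minimal hand-rolled sketch with exponential backoff. It is not the Microsoft transient fault library mentioned above, just the same idea in plain C#. The Retry class, the isTransient predicate, and the default attempt count and delay are assumptions; deciding which exceptions count as transient for your SQL connection is left to the caller.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an operation when the caller classifies the failure as transient.
public static class Retry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 4,
        TimeSpan? baseDelay = null)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        TimeSpan delay = baseDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait, then double the delay before the next attempt.
                await Task.Delay(delay);
                delay = delay + delay;
            }
        }
    }
}
```

Non-transient exceptions, and the last failure once maxAttempts is reached, propagate to the caller unchanged. Open a fresh connection inside the operation delegate rather than reusing one that has already failed.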
In my opinion (I'm 98% sure, because I recently had the same experience), it is a network issue on the server provider's side.
For instance, if you rent the server from Ionos, all remote connections are blocked by default, even if you disable the firewall on the server. You still won't be able to connect remotely. You can, however, do your work on the server itself without any problem.
To connect remotely, you have to contact the server provider. They will explain how to enable firewall ports from your control panel.
I contacted my server provider as I was getting quite frustrated; their response was attached as an image.
After this, every permitted client can connect remotely to the server.
I wish you success.

WorkerRole in Azure Cloud Service net connection

This afternoon I uploaded my WorkerRole to a Cloud Service on Azure; the service runs on a VM with Windows Server 2012. I realized that the WorkerRole can't query the databases (BigQuery, T-SQL). When I read the service log on the VM, I saw the following error:
The VM and host networking components failed to negotiate protocol version '5.0'
I think that Hyper-V-vsc has something to do with it. Does anybody know what is happening?
Thanks,
Roger
The first thing I would check is that the databases you are trying to connect to have whitelisted the VIP of the cloud service you're connecting from. And if you haven't already, remote into an instance of the worker and try reaching the DBs using as thin a client UI as you can.
In my experience, these issues are usually on the db end. Azure doesn't do much with blocking outbound connections. Those that fail are usually more a matter of protocol (UDP multicast for example).
