Websockets with Socket.io Node apps on Microsoft Azure - node.js

We have a Node.js server that communicates over the WebSocket protocol (WebRTC, Socket.IO).
During development on Heroku, we did not encounter any particular problem.
However, we ran into problems when we deployed the application to Azure:
Client/server communication is unstable. After analysis, we noticed that the WebSocket connection fails and that the transport in use is 'polling':
websocket.js:112 WebSocket connection to 'wss://hote.fr/socket.io/?EIO=3&transport=websocket&sid=EgjKLAtp89wrBKMzAAAG' failed: Error during WebSocket handshake: Unexpected response code: 503
When we enable the "Web sockets" setting in the Azure administration portal, the site becomes even more unstable and very slow, and communication is impossible. Enabling this setting does not solve the WebSocket communication problem.
After reading up on the issue, we disabled WebSockets in web.config, but nothing changed.
Note that with the same code, everything works well on Heroku and the protocol used is WebSocket.
Has anyone encountered this problem and found a solution?
I'm sorry about my English.
Thank you in advance for your time.

You need to do the following to make it work on Azure App Service:
Enable Web sockets via the Azure portal.
Disable the IIS WebSocket module, so that Node.js can provide its own WebSocket implementation, by adding this to your web.config file:
<webSocket enabled="false" />
Tell Socket.IO to use WebSocket only, instead of starting with a few XHR (polling) requests, by adding this to the Node.js server:
io.set('transports', ['websocket']);
And on the client add this:
var socket = io({transports: ['websocket']});
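Once both sides are restricted to WebSocket, every Socket.IO request URL should carry transport=websocket rather than transport=polling. Below is a minimal sketch for checking that from a logged URL, using only Node's built-in URL class. The helper name describeEngineIoUrl is made up for illustration, and the sample URL is the one from the question.

```javascript
// Hypothetical helper (not part of Socket.IO): extract the query
// parameters that Socket.IO's transport layer puts on each request URL.
function describeEngineIoUrl(requestUrl) {
  const url = new URL(requestUrl);
  return {
    secure: url.protocol === 'wss:' || url.protocol === 'https:',
    engineIoVersion: url.searchParams.get('EIO'),
    transport: url.searchParams.get('transport'), // 'websocket' or 'polling'
    sessionId: url.searchParams.get('sid'),
  };
}

// The failing URL from the question: the client did request WebSocket.
const info = describeEngineIoUrl(
  'wss://hote.fr/socket.io/?EIO=3&transport=websocket&sid=EgjKLAtp89wrBKMzAAAG'
);
console.log(info.transport);
```

If URLs with transport=polling keep appearing after the change, one side is probably still running with the default transport list.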

Try the suggestions outlined by Aaron to narrow down the issue. I would also like to highlight a few restrictions of the Azure sandbox to help isolate it further:
1. In App Service, limits are enforced on the maximum number of outbound connections that can be made from each VM instance.
As mentioned in the document Cross-VM numerical limits:
“These limits apply only for customers of Basic or higher plans; in other words, customers running on their own dedicated VMs. These limits are there to protect the entire VM even though one particular site may be within its limits described above. The limits are different depending on the size of VM configured.”
2. This error also might occur if you try to access a local address from your application.
As mentioned in the document Local address requests:
“Connection attempts to local addresses (e.g. localhost, 127.0.0.1) and the machine's own IP will fail, except if another process in the same sandbox has created a listening socket on the destination port.
Rejected connection attempts, such as the following example which attempts to connect to 127.0.0.1:80, from .NET will result in the following exception:
Exception Details: System.Net.Sockets.SocketException: An attempt was made to access a socket in a way forbidden by its access permissions 127.0.0.1:80.”
3. For more information about outbound connections in your web app, see the blog post about outgoing connections to Azure websites.
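To find calls in a Node.js app that would run into the local-address restriction, a rough check like the following can help. It is only a sketch that approximates the sandbox rule quoted above: the function names are hypothetical, and it uses nothing beyond Node's built-in os module and URL class.

```javascript
const os = require('os');

// Collect the addresses treated as "local" here: the loopback names
// plus every address assigned to this machine's network interfaces.
function localAddresses() {
  const addresses = new Set(['localhost', '127.0.0.1', '::1']);
  for (const entries of Object.values(os.networkInterfaces())) {
    for (const entry of entries || []) {
      addresses.add(entry.address.toLowerCase());
    }
  }
  return addresses;
}

// Hypothetical helper: true when the URL's host is a local address.
function targetsLocalAddress(targetUrl) {
  const { hostname } = new URL(targetUrl);
  // URL keeps IPv6 hosts in brackets, e.g. "[::1]"; strip them first.
  const host = hostname.replace(/^\[|\]$/g, '').toLowerCase();
  return localAddresses().has(host);
}

console.log(targetsLocalAddress('http://127.0.0.1:80/'));
console.log(targetsLocalAddress('https://example.com/'));
```

Any outbound URL for which this returns true is worth reviewing before deploying to App Service.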

Related

GraphQL subscription does not receive messages after successful handshake

I have created a simple GraphQL subscription using Nest.js/Apollo GraphQL over Node.js. My client application, which uses React.js and Apollo Client, works fine with the server. The client subscribes to the server via GraphQL with something similar to:
subscription {
  studentAdded {
    id
  }
}
My problem is that it works only locally. When I deploy my server back end to a hosted Docker container over the internet, the client no longer receives data.
I traced the client: it sends a GET request to ws://api.example.com:8010/graphql and receives a successful HTTP/1.1 101 Switching Protocols response. However, nothing arrives from the server afterwards, unlike when the server ran on my local machine. The remote server log shows that the client connects successfully; I can see the onConnect log messages there.
I would appreciate any guidance on solving the problem.
I checked several things myself. First, I thought the WebSocket address might be blocked on the network, but then realized that it uses the same port as normal HTTP. Second, I supposed that WebSocket messages/frames were transmitted over UDP, but I was wrong: they go over TCP, so there should be no need to worry about network settings.
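For context, the 101 Switching Protocols response only confirms that the HTTP upgrade on the existing TCP connection succeeded; it says nothing about whether frames are delivered afterwards. As a sketch of what that handshake involves, this is how a server derives the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key according to RFC 6455. The function name is chosen for illustration, and the sample key is the one used in the RFC.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Compute the Sec-WebSocket-Accept value that the server returns
// together with "HTTP/1.1 101 Switching Protocols".
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Sample key from RFC 6455; prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='));
```

Comparing this value with the header in the traced response is one way to confirm that the upgrade was answered by a real WebSocket endpoint.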
Additionally, I have read several GitHub threads and Stack Overflow questions but did not find any clue. I am not using Node.js WebSockets directly; I am using Nest.js GraphQL subscriptions, which has made my search harder.
Your help is highly appreciated.

Random connection drops with Azure Firewall - no pattern observable

The company I work for has had a web application running solidly in our hosted datacentre for years with no hiccups, using Route 53, NGINX, etc.
We started building in Azure recently and are noticing weird connection drops at random times. We can find no pattern except that the drops only occur when the Azure Firewall is involved.
Has anyone encountered this? The client-side traffic flow is as follows:
Client Machine --> Route 53 --> Azure Firewall --> NGINX server --> Azure application server
We've done multiple connection tests with our apps. To keep it short: internally, within the Azure environment, there are no problems.
The tests that involved only the app server stack, only the internal NGINX server, or Route 53 + the NGINX server (bypassing the firewall) were all fine.
It seems to be something specific to the firewall and how it keeps connections. I could provide some scrubbed logs, but I'm not sure where to look.
I found this, though I'm not sure it is related:
https://github.com/wbuchwalter/azure-content/blob/master/includes/guidance-tcp-session-timeout-include.md
Since the firewall was only being used for port forwarding, we bypassed it, and that solved the issue. I suspect NGINX didn't like the requests or didn't know how to forward responses back through the firewall because of IP encapsulation by the firewall. There may be a way to solve that, but we didn't investigate further.

Azure AspNetCore WebApp under high load returns "The specified CGI application encountered an error and the server terminated the process"

I'm hosting my AspNetCore app in Azure (Windows hosting plan, P3v2). It works perfectly fine under normal load (5-10 requests/sec), but under high load (100-200 requests/sec) it starts to hang and requests return the following response:
The specified CGI application encountered an error and the server terminated the process.
And from the event logs I can get even more details:
An attempt was made to access a socket in a way forbidden by its access permissions aaa.bbb.ccc.ddd
I have to scale out to 30 instances; with each instance getting just 3-5 requests per second, it works fine. I believe that 30 hosts are far more than this load should need and that the resources are underutilized, so I am trying to find the real bottleneck. If I set the instance count to 10, everything crashes and every request starts to return the error above. Resource utilization metrics for the high-load case with 30 instances enabled:
The service plan CPU usage is low, about 10-15% for each host
The service plan memory usage is around 30-40%
Dependencies respond quickly, 50-200 ms
Azure SQL DTU usage is about 5%
I discovered this useful article on current tier limits, and after running the Azure TCP connections diagnostics I found a few possible issues:
High outbound TCP connection
High TCP Socket handle count - High TCP Socket handle count was detected on the instance .... During this period, the process dotnet.exe of site ... with ProcessId 8144 had the maximum open handle count of 17004.
So I dug deeper and found the following information:
Per my service plan tier, my TCP connection limit should be 8064, which is far from the figure displayed above. Next, I checked the socket state:
Even though I see that the number of active TCP connections is below the limit, I'm wondering whether the open socket handle count could be the issue here. What can cause this socket handle leak (if any)? How can I troubleshoot and debug it?
I see that you have tried to isolate the possible cause of the error. Here are some of the reasons to revalidate or remediate:
1. On Azure App Service, connection attempts to local addresses (e.g. localhost, 127.0.0.1) and the machine's own IP will fail, except if another process in the same sandbox has created a listening socket on the destination port. Rejected connection attempts normally return the socket-forbidden error shown above.
For a peered VNet or an on-premises network, kindly ensure that the IP address used is in the ranges listed for routing to the VNet, and check for incorrect routing.
2. On Azure App Service, the error also occurs if the outbound TCP connections on the VM instance are exhausted. Limits are enforced on the maximum number of outbound connections that can be made from each VM instance.
Other causes, as highlighted in this blog:
Using client libraries which are not implemented to re-use TCP connections.
Application code or the client library is leaking TCP socket handles.
Burst load of requests opening too many TCP socket connections at once.
With a higher-level protocol like HTTP, this happens when the Keep-Alive option is not used.
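The connection-reuse causes in this list apply to any runtime. The app in the question is ASP.NET Core, so the following is only an illustration of the idea in Node.js, not the poster's code: one shared http.Agent with keep-alive enabled, so that requests reuse pooled sockets instead of opening a new TCP connection each time. The helper name getStatus and the maxSockets value are assumptions made for the sketch.

```javascript
const http = require('http');

// One shared agent for the whole process: sockets are kept open and
// reused instead of a new TCP connection being opened per request.
const keepAliveAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 50, // illustrative cap on concurrent sockets per host
});

// Hypothetical helper: GET a URL through the shared agent and
// resolve with the HTTP status code.
function getStatus(url) {
  return new Promise((resolve, reject) => {
    http
      .get(url, { agent: keepAliveAgent }, (res) => {
        res.resume(); // drain the body so the socket returns to the pool
        res.on('end', () => resolve(res.statusCode));
      })
      .on('error', reject);
  });
}
```

Creating a new agent or client object per request defeats the pooling, which is the pattern the list above describes as not reusing TCP connections.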
I'm unsure whether you have already tried App Service Diagnostics to fetch more details. If not, kindly give that a shot:
Navigate to the Diagnose and solve problems blade in the Azure portal:
In the Azure portal, open the app in App Services.
Select Diagnose and solve problems > "TCP Connections".
Consider optimizing the application to use connection pooling in your .NET code, and observe the behavior locally. If feasible, restart the web app and check whether that helps.
If the issue still persists, kindly file a support ticket for a detailed/deeper investigation of the backend logs.

Socket.io clients keep reconnecting on Azure host

I'm hosting a small Node.js app in Azure, but when a client joins it gets reconnected almost immediately, and this keeps happening!
If I switch "Web Sockets" on in Azure, the reconnections are gone, but the server doesn't seem to receive any disconnect event when I close the clients*, and the connection events are registered relatively slowly as well!
*Disconnect events do get registered after a minute's delay!
If I run the app locally, everything works fine!
You didn't share any code or web.config file with us. However, there is an official guide we can follow: Create a Node.js chat application with Socket.IO in Azure App Service.
You may need to pay attention to its "Verify web.config settings" section:
Azure web apps that host Node.js applications use the web.config
file to route incoming requests to the Node.js application. For
WebSockets to function correctly with Node.js applications, the
web.config must contain the following entry.
<webSocket enabled="false"/>
This disables the IIS WebSockets module, which includes its own
implementation of WebSockets and conflicts with Node.js specific
WebSocket modules such as Socket.IO. If this line is not present, or
is set to true, this may be the reason that the WebSocket transport
is not working for your application.

Azure a connection attempt failed

I have a Sitecore Azure 2.0 deployment. Unfortunately, when I try to run it from the company network I get the error below:
A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection
failed because connected host has failed to respond 213.199.180.206:80
When I try the URLs below on the same machine, they work:
http://www.google.com
https://www.google.com
I'm wondering what exactly is causing the above issue, given that both 443 and 80 work well via IE.
Thanks.
This definitely sounds like a corporate firewall/gateway problem. I have written a blog post about my experiences with just these types of issues: http://reservoirdevs.wordpress.com/2013/10/18/sitecore-azure-walkthrough-and-gotchas/
My solution was to try from outside the corporate network. It then worked fine.
This sounds like the firewall on your Azure machine is not set to allow incoming HTTP traffic on port 80, although there could be many other reasons for this timeout.
