CouchDB Ubuntu 14 Azure Socket Connection Timeout error - linux

I have CouchDB running on an Ubuntu 14.04 Linux VM and a .NET web application running under Azure Web Apps. In the ELMAH logging for the web application I keep seeing intermittent errors:
System.Net.Sockets.SocketException
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond [ipaddress]:5984
I've checked the CouchDB logs and there is no record of those requests, so I don't believe they are reaching the CouchDB server; I can confirm this from the web server logs on Azure, which show the error 500 responses. I've also tried a tcpdump, though with little success (a separate issue: writing the tcpdump output to another disk keeps failing with access denied).
We previously ran CouchDB on a Windows VM with no issues, so I wonder whether the problem relates to the OS-level TCP connection and timeout settings.
Does anyone have suggestions on where to look, or anything that immediately jumps to mind?
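To narrow down whether connects from the web tier ever reach the VM, one option is to run a repeated TCP connect probe against the CouchDB port from the same network as the web app (for an Azure web app, the Kudu console is one place to run it). This is only a diagnostic sketch; the IP, interval, and timeout below are placeholder assumptions, not values from the setup above.

    // probe-couchdb.ts -- repeatedly time TCP connects to the CouchDB port
    import * as net from "net";

    const HOST = "10.0.0.4";          // placeholder: the VM IP from the error message
    const PORT = 5984;                // CouchDB's default port
    const CONNECT_TIMEOUT_MS = 10000; // assumed probe timeout

    function probe(): void {
      const started = Date.now();
      const socket = net.connect({ host: HOST, port: PORT });

      socket.setTimeout(CONNECT_TIMEOUT_MS);

      socket.once("connect", () => {
        console.log(`${new Date().toISOString()} connected in ${Date.now() - started} ms`);
        socket.end();
      });

      socket.once("timeout", () => {
        console.log(`${new Date().toISOString()} connect stalled for ${Date.now() - started} ms`);
        socket.destroy();
      });

      socket.once("error", (err: Error) => {
        console.log(`${new Date().toISOString()} error: ${err.message}`);
      });
    }

    setInterval(probe, 15000); // one probe every 15 seconds; leave it running until a stall shows up
    probe();

If the probe occasionally stalls or errors while CouchDB's own logs stay quiet, that points at the network path between Azure and the VM rather than at CouchDB itself, which matches what the logs above suggest.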

Related

504 error on Node.js Sails API server randomly across all endpoints

We have Sails.js as an API server and are intermittently seeing 504 "upstream timed out" errors. The call gets as far as Nginx, which then reports an upstream timeout, and no log appears on the application server. It happens across all the APIs intermittently, not all the time, and traffic on the server is very low (5-10 requests per minute) with no heavy computation going on, so I'm not sure how to debug this; it is a very random issue. There are also no server restarts or other errors in the application logs, and the app runs fine under PM2 on an AWS EC2 instance. The current timeout is 60 seconds, but none of our APIs takes more than 500 milliseconds. We are on Node 6.6; it is a legacy monolith with multiple dependencies and no single owner, so we cannot upgrade it. Requests pass through the load balancer to Nginx but sometimes never reach the application server. The instance is quite large in terms of CPU and memory while traffic is extremely low. The behaviour is random and not specific to any API; sometimes 1 out of 10 requests throws a gateway timeout, sometimes 2 out of 5.
Some of the Nginx logs are below:
[error] 31688#31688: *38998779 connect() failed (110: Connection timed out) while connecting to upstream, client 10.X.X.X
2022/04/22 16:36:37 [error] 31690#31690: *38998991 connect() failed (110: Connection timed out) while connecting to upstream, client: <server_ip>, server: <DNS_URL>
I have tried almost everything I could find on Stack Overflow but nothing has helped, so please help me find the root cause so I can mitigate the issue.
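Not an answer, but for orientation: a connect() timeout at Nginx means the TCP handshake to the Node process never completed, before any application code runs, which matches the empty application logs. The 60-second figure usually comes from Nginx's proxy timeouts. Below is only a sketch of a generic reverse-proxy block to show where those knobs live; the upstream address (Sails' default 1337) and the values are assumptions, not your actual config.

    # Hypothetical reverse-proxy config for the Sails app (assumed to listen on 127.0.0.1:1337)
    upstream sails_app {
        server 127.0.0.1:1337;
        keepalive 32;                      # reuse upstream connections instead of opening one per request
    }

    server {
        listen 80;

        location / {
            proxy_http_version 1.1;
            proxy_set_header Connection "";  # required for upstream keepalive
            proxy_connect_timeout 60s;       # the timeout behind "connect() failed (110)"
            proxy_send_timeout    60s;
            proxy_read_timeout    60s;
            proxy_pass http://sails_app;
        }
    }

Enabling upstream keepalive as above at least reduces how often Nginx has to open a fresh connection to the app.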

How to fix Oracle TNS - Connection timed out error when connecting to the database remotely?

I am trying to connect remotely to an Oracle database hosted on a Linux server from my Windows machine and am getting the error ORA-12170: TNS: Connect Timeout. I've already checked the following:
the listener.ora configuration and its status.
the tnsnames.ora naming parameters.
that the firewall allows connections to the IP and the default port 1521.
If there is no firewall between the client and the target database then you should not normally need to adjust the timeout. You can try adjusting the sqlnet parameters mentioned in the error message:
*Action: If the error occurred because of a slow network or system, reconfigure one or all of the parameters SQLNET.INBOUND_CONNECT_TIMEOUT, SQLNET.SEND_TIMEOUT, SQLNET.RECV_TIMEOUT in sqlnet.ora to larger values.
However, I think you should ask your network team to trace your connection attempt, especially if you still get the error after increasing the timeout. The full Oracle version in use and the platforms (client and target) may be important.
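If you do try larger values, they go in sqlnet.ora on the database server (SQLNET.SEND_TIMEOUT and SQLNET.RECV_TIMEOUT can also be set on the client side). The numbers below are purely illustrative, not recommendations:

    # sqlnet.ora -- illustrative values only; the unit is seconds
    SQLNET.INBOUND_CONNECT_TIMEOUT = 120
    SQLNET.SEND_TIMEOUT = 120
    SQLNET.RECV_TIMEOUT = 120

The listener has its own INBOUND_CONNECT_TIMEOUT_<listener_name> parameter in listener.ora if you need to adjust that side as well.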

Random timeouts in a Node.js + gRPC application on Kubernetes

We have a weird networking issue.
We have a Hyperledger Fabric client application written in Node.js running in Kubernetes which communicates with an external Hyperledger Fabric Network.
We randomly get timeout errors on this communication. When the pod is restarted, all goes well for a while, then the timeout errors start; sometimes they resolve on their own and then come back again.
This is on Azure AKS; we also set up a quick Kubernetes cluster in AWS with Rancher, deployed the app there, and the same timeout error happened there too.
We ran scripts in the same container all night long, hitting the external Hyperledger endpoint with both cURL and a small Node.js script every minute, and we didn't get a single error.
We ran the application in another VM as plain Docker containers and there was no issue there.
We inspected the network traffic inside the container: when the issue happens, netstat shows an established connection, but tcpdump shows no traffic; no packets are even attempted.
Checking the Hyperledger Fabric SDK code, it uses gRPC (with protocol buffers) behind the scenes.
So any clues maybe?
This turned out not to be a Kubernetes problem but a dropped-connection issue.
gRPC keeps the connection open, and after some period of inactivity intermediary components drop it. In the Azure AKS case this is the load balancer, as every outbound connection goes through a load balancer, and there is a non-configurable idle timeout of 4 minutes after which the load balancer drops the connection.
The fix is configuring gRPC to send keep-alive messages.
The scripts in the container worked without a problem because they open a new connection every time they run.
The application running as plain Docker containers didn't have this issue because we were hitting the endpoints every minute and so never reached the idle timeout threshold. When we hit the endpoints every 10 minutes instead, the timeout issue started there too.
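For reference, this is roughly what client-side keep-alives look like with @grpc/grpc-js. It is only a sketch of the relevant channel options, not Fabric-specific code; the endpoint and intervals are assumptions, and if I remember correctly the Fabric SDK forwards the same keys through the grpcOptions section of its connection profile, so check the docs for your SDK version.

    // keepalive-sketch.ts -- minimal @grpc/grpc-js example of the keep-alive channel options
    import * as grpc from "@grpc/grpc-js";

    // Ping the server during idle periods so intermediaries (e.g. a load balancer with a
    // 4-minute idle timeout) keep seeing traffic on the connection.
    const keepaliveOptions: grpc.ChannelOptions = {
      "grpc.keepalive_time_ms": 120000,          // send a keep-alive ping after 2 minutes of inactivity
      "grpc.keepalive_timeout_ms": 20000,        // consider the connection dead if no ack within 20 s
      "grpc.keepalive_permit_without_calls": 1,  // ping even when no RPC is in flight
    };

    // A generated service client takes the same three constructor arguments; the endpoint is a placeholder.
    const client = new grpc.Client(
      "peer0.example.com:7051",
      grpc.credentials.createSsl(),
      keepaliveOptions
    );

The ping interval just needs to sit comfortably below the 4-minute idle timeout; 2 minutes here is an arbitrary choice.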

IISWMSVC_STARTUP_UNABLE_TO_ACTIVATE_HWC

We've been able to successfully and automatically deploy our MVC and WebApi applications to IIS 7.5 on Win2008 R2 for some time now. Just this week MSDeploy stopped working. The System event log shows this error when attempting to restart the Web Management Service on the targeted Win2008 R2 server:
The Web Management Service service terminated with service-specific
error %%-2147483640.
The Application log shows this error at exactly the same time the above error occurs.
IISWMSVC_STARTUP_UNABLE_TO_ACTIVATE_HWC
Failed to activate the Hostable Web Core (HWC). Web Management Service startup failed. Please reference the Win32 error in this
event for further information.
Exception:System.Runtime.InteropServices.COMException (0x8007007F): The specified procedure could not be found. (Exception from HRESULT: 0x8007007F)
at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
at Microsoft.Web.Management.Server.WebServer.Microsoft.Web.Management.Server.Interop.IWebServer.Start()
Process:WMSvc User=NT AUTHORITY\LOCAL SERVICE
The above error is preceded by this warning:
A listener channel for protocol 'http' in worker process '7164' serving application pool 'WMSvcAppPool' reported a listener channel
failure. The data field contains the error number.
Which is preceded by this error:
Failed to find the RegisterModule entrypoint in the module DLL C:\Windows\Microsoft.NET\Framework64\v4.0.30319\webengine.dll. The
data is the error.
We've attempted to reach the target server using https://ourservername:8172/MsDeploy.axd. The response is:
Error 102 (net::ERR_CONNECTION_REFUSED): The server refused the
connection
Is this occurring because the port is blocked, or because the Web Management Service is not running? (The Windows Firewall with Advanced Security dialog says "Windows Firewall is off", and there are no entries in the firewall log at C:\Windows\System32\LogFiles\Firewall.)
We've seen some posts that indicate the certificate may be the issue. Not sure how to actually tell whether this is the case, though. The certificate we have says it is valid through 2029.
I resolved the problem on a cloned Win2012 VM by changing the SSL certificate, which was still set to the original host. I used the self-signed certificate.
I've put this here as a potential answer because I didn't read the question far enough to see the SSL comment at the end ;) and although the fix took 30 seconds, we spent at least two days trying everything else.
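For anyone checking the same thing: you can see which certificate is bound to the management service port from an elevated command prompt, and rebind it if it points at the wrong host. The thumbprint and GUID below are placeholders.

    rem Show which certificate is currently bound to WMSvc's port
    netsh http show sslcert ipport=0.0.0.0:8172

    rem Rebind it if needed (placeholder certhash; appid is any GUID identifying the owning application)
    netsh http delete sslcert ipport=0.0.0.0:8172
    netsh http add sslcert ipport=0.0.0.0:8172 certhash=<thumbprint> appid={<guid>}

The same binding can also be changed from the Management Service feature in IIS Manager.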

Unable to connect to Azure from a specific server

I have an Azure Service Bus queue that my application can't connect to. On my PC it works fine, and on our dev server it also works fine. We have deployed it to our test box and we are getting this error when trying to receive messages from the queue:
Microsoft.ServiceBus.Messaging.MessagingCommunicationException: Could
not connect to net.tcp://jeportal.servicebus.windows.net:9354/. The
connection attempt lasted for a time span of 00:00:14.9062482. TCP
error code 10060: A connection attempt failed because the connected
party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond
168.62.48.238:9354. ---> System.ServiceModel.EndpointNotFoundException: Could not connect to
net.tcp://jeportal.servicebus.windows.net:9354/. The connection
attempt lasted for a time span of 00:00:14.9062482. TCP error code
10060: A connection attempt failed because the connected party did
not properly respond after a period of time, or established
connection failed because connected host has failed to respond
168.62.48.238:9354. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed
because connected host has failed to respond
168.62.48.238:9354
We have disabled the firewall and it still doesn't work. Any suggestions on troubleshooting?
If this is related to a firewall setting, you may want to try setting the connectivity mode to Http. More details at:
http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.servicebus.connectivitysettings.mode.aspx
and:
http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.servicebus.connectivitymode.aspx
Try to increase the timeouts on your bindings to 1 minute and add your server application as an exception in Windows Firewall manually.
So this ended up being a simple issue of our network firewall being restricted. We had told our SAs to open up port 9354 going to the Service Bus. They said they did open it... but they didn't. I walked through it with them and we discovered it wasn't open.
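For anyone needing to verify the same thing before going back and forth with the network team, a raw TCP check against port 9354 from the affected box settles it. Both of the following are standard Windows tools (the namespace is the one from this question; substitute your own), assuming a reasonably recent Windows version for Test-NetConnection and the Telnet client feature for telnet.

    rem Should report the TCP test as succeeded if outbound 9354 is really open from this box
    powershell -Command "Test-NetConnection -ComputerName jeportal.servicebus.windows.net -Port 9354"

    rem Or, where the Telnet client is installed:
    telnet jeportal.servicebus.windows.net 9354

If the check fails here while the same command succeeds from the dev server, the problem is on the network path, exactly as it turned out to be in this case.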
