Requests return blank page after 60 seconds - IIS

Running ColdFusion on IIS, every request that takes more than 60 seconds returns a blank page to the browser.
I've tried changing every setting that might affect this and it's still happening. I'm out of ideas other than posting here; I'm not sure if it's IIS or ColdFusion timing out.

I worked it out: it's not IIS or ColdFusion, it's the AWS load balancer. If I bypass that, it works fine.

In our case, too, it was the load balancer causing the issue, not IIS. It also resulted in ASP.NET scripts being run twice when the load balancer timed out: it retried the request after the first attempt timed out. Accessing the scripts via a whitelisted "direct" server IP address, bypassing the load balancer, fixed the problem.
Also, FYI, once we bypassed the load balancer, the timeouts we were setting manually were effective again, and Response.Flush() started working again. It could well be that some caching servers behind the load balancer were adding to the problem.
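If bypassing the load balancer isn't an option, another approach is to raise its idle timeout above your longest-running request. For an AWS ALB, a sketch with the AWS CLI (the ARN and the 120-second value are placeholders for your own):

```shell
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/1234567890abcdef \
    --attributes Key=idle_timeout.timeout_seconds,Value=120
```

The default ALB idle timeout is 60 seconds, which matches the blank-page-after-60-seconds symptom above.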

Related

AWS Load Balancer 502 Bad Gateway

I have microservices written in node/express hosted on EC2 with an application load balancer.
Some users are getting a 502 even before the request reaches the server.
I log every request inside each instance, and I don't have logs for those requests; I have the request immediately before the 502 and the requests right after it, which is why I assume the request never reaches the servers. Most users work around it by refreshing the page or using an incognito tab, which connects them to a different machine (we have 6).
I can tell from the load balancer logs that the load balancer responds to the request with a 502 almost immediately. I suspect this could be a TCP RST.
I had a similar problem a long time ago and had to add keepAliveTimeout and headersTimeout to the Node configuration. Here are my settings (still using the load balancer's default idle timeout of 60 seconds):
server.keepAliveTimeout = 65000;
server.headersTimeout = 80000;
The metrics, especially memory and CPU usage of all instances are fine.
These 502 errors started after an update in which we introduced several packages, for instance axios. At first I thought it could be axios, because keep-alive is not enabled there by default, but enabling it didn't help. Other than axios, we just use the request library.
Any tips on how should I debug/fix this issue?
HTTP 502 errors are usually caused by a problem with the load balancer, which would explain why the requests never reach your server: presumably the load balancer can't reach the server for some reason.
This link has some hints regarding how to get logs from a classic load balancer. However, since you didn't specify, you might be using an application load balancer, in which case this link might be more useful.
From the ALB access logs I knew that either the ALB couldn't connect the target or the connection was being immediately terminated by the target.
The most difficult part was figuring out how to replicate the 502 error.
It turned out that the Node version I was using has a request header size limit of 8 KB. If any request exceeded that limit, the target rejected the connection, and the ALB returned a 502 error.
Solution:
I solved the issue by adding --max-http-header-size=size to the node start command line, where size is a value greater than 8 KB.
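For example, starting the app as `node --max-http-header-size=16384 server.js` (server.js being a placeholder for your entry point). You can verify the effective limit a node process sees via the read-only http.maxHeaderSize property:

```shell
# Raise the header limit to 16 KB (older Node versions default to 8 KB)
# and print the limit the process actually applies:
node --max-http-header-size=16384 -p 'require("http").maxHeaderSize'
```

If the printed value matches what you passed, requests with headers up to that size should no longer be rejected by the target.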
A few common reasons for an AWS Load Balancer 502 Bad Gateway:
Make sure the public subnets that your ALB targets are set to auto-assign a public IP (so that instances deployed into them are automatically assigned one).
Make sure the security group for your ALB allows HTTP and/or HTTPS traffic from the IPs you are connecting from.
I had the same problem for a month or two and couldn't find the solution; even AWS Premium Support couldn't. I was getting 502 errors randomly, maybe 10 times per day. I finally found the cause after reading the AWS docs:
The target receives the request and starts to process it, but closes the connection to the load balancer too early. This usually occurs when the duration of the keep-alive timeout for the target is shorter than the idle timeout value of the load balancer.
https://aws.amazon.com/premiumsupport/knowledge-center/elb-alb-troubleshoot-502-errors/
SOLUTION:
I was running the Apache web server on EC2, so I increased KeepAliveTimeout to 65. That did the trick for me.
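For reference, the relevant directives in the Apache configuration (httpd.conf or an included file; 65 is chosen to exceed the ALB's default 60-second idle timeout):

```apache
KeepAlive On
KeepAliveTimeout 65
```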

Error 503 (Service Unavailable) on old server after moving with cPanel Transfer Tool

I used cPanel's Transfer Tool to move my websites to a new IP address. It was a temporary move, and I wanted to revert to my old server today. The first thing I noticed was that the transfer tool had changed the A records for all sites. I changed these back using swapip and then tried accessing the sites. They load for a very long time and finally fail with:
Service Unavailable The server is temporarily unable to service your
request due to maintenance downtime or capacity problems. Please try
again later.
Additionally, a 503 Service Unavailable error was encountered while
trying to use an ErrorDocument to handle the request.
From numerous threads, I gathered that a 503 usually occurs when system PHP-FPM is on. However, I didn't enable that either before or after moving. I didn't change any settings other than DNS, so I'm guessing it's a DNS issue, though I'm not sure DNS issues can cause 503 errors. I've been struggling with this for a day now.
Checking Apache Error log, I see attempts to connect to the server I temporarily moved to:
[proxy_http:error] [pid 1659:tid 47454830633216] (110)Connection timed out: AH00957: HTTPS: attempt to connect to [new.ip.address]:443
After a few days digging, I found my mistake and how to rectify it thanks to the cPanel Support. I thought it worthwhile sharing in case anyone else faces the same problem:
First, I should have disabled the live transfer feature before doing the transfer; it prevents the tool from disabling IPs and proxying domains.
Since I hadn't, I had to revert the changes the tool had made, which basically involved running the following commands from the command prompt:
$ whmapi1 unset_all_service_proxy_backends username=$USER
$ /scripts/xfertool --unblockdynamiccontent $username
$ whmapi1 unset_manual_mx_redirects domain=domain.tld
cPanel's documentation explains what each of these commands does.

Windows DNS problem with Python socket.getaddrinfo()

I have DNS problems with my Python scripts, but not with network tools or browser on my Windows 10 desktop.
With my scripts, every network request takes at least 5-10 seconds. Profiling with py-spy with the --idle flag identified socket.getaddrinfo() as the function most of the time was spent in. I tested in the Python REPL with the following command:
socket.getaddrinfo("example.org", 80, proto=socket.IPPROTO_TCP)
It took around 5-10 seconds to return. Setting a fixed DNS server on my active network interface didn't change anything.
Rebooting fixes the problem and brings response times back down below a second, but after the computer has been up for a few days, the problem returns.
It looks like socket.getaddrinfo() hits some timeout and only then resolves correctly.
nslookup works just fine, with response times in milliseconds. Browsing in a web browser also works fine.
Any ideas where I could start to dig?
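One place to start is timing the lookup directly, so you can compare hosts and watch when the slowdown kicks in. A minimal sketch ("localhost" is used here only so the example runs offline; substitute the hostname you're debugging):

```python
import socket
import time


def timed_lookup(host, port=80):
    """Time a single getaddrinfo() call and return (seconds, results)."""
    start = time.monotonic()
    results = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return time.monotonic() - start, results


elapsed, results = timed_lookup("localhost")
print(f"getaddrinfo took {elapsed:.3f}s, {len(results)} result(s)")
```

Running this periodically against the same hostname would show whether the delay is constant (suggesting a fixed timeout on a dead resolver before a fallback succeeds) or degrades gradually.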
What are you trying to accomplish?
Resolving a domain name to an IP address?
Simply try:
socket.gethostbyname("example.com")
Check which DNS servers are configured (in cmd: ipconfig /all, under the "DNS Servers" entries).
Try using a dedicated Python DNS client and see if the problem persists,
e.g. dnspython.
(See this answer -
Socket resolve DNS with specific DNS server)

Cloudflare 524 w/ nodejs + express

I'm running a Node.js web server on Azure, using the Express library for HTTP handling. We've been attempting to enable Cloudflare protection on the domains pointing to this box, but when we turn Cloudflare proxying on, we see cycling periods of requests succeeding and requests failing with a 524 error. I understand this error is returned when the server fails to respond to the connection with an HTTP response in time, but I'm having a hard time figuring out why it is:
A. Only failing sometimes as opposed to all the time
B. Immediately fixed when we turn cloudflare proxying off.
I've been attempting to confirm the TCP connection using
tcpdump -i eth0 port 443 | grep cloudflare (the requests come over HTTPS), and have seen curl requests fail seemingly without any traffic hitting the box, while others do arrive. For reference, these requests should be, and are, quite quick when they succeed, so I find it hard to believe the issue is a long-running process stalling the response.
I do not believe we have any sort of IP based throttling or firewall (at least not intentionally?)
Any ideas greatly appreciated, thanks
It seems that the issue was caused by DNS resolution.
On Azure, you can configure a custom domain name for your web app, and to use CloudFlare you need to switch DNS resolution to CloudFlare's DNS servers. See this article for more information on configuring a custom domain name: https://azure.microsoft.com/en-us/documentation/articles/web-sites-custom-domain-name/.
You can also refer to the CloudFlare FAQ "How do I enter Windows Azure DNS records in CloudFlare?" to make sure the DNS settings are correct.
Try clearing your cookies.
I had a similar issue when I pointed Cloudflare at a new host: Cloudflare cookies for the domain were doing something funky to the request (my guess is they were causing it to contact the old host?).

Webserver failover

I will be running a dynamic website, and if the server ever stops responding I'd like to fail over to a static website that displays a "We are down for maintenance" page. From my reading, switching DNS dynamically may be an option, but how quickly does that change take effect? And will everyone see the change immediately? Are there any better ways to fail over to another server?
DNS has a TTL (time to live) and gets cached until the TTL expires. So a DNS cutover does not happen immediately. Everyone with a cached DNS lookup of your site still uses the old value. You could set an insanely short TTL but this is crappy for performance. DNS is almost certainly not the right way to accomplish what you are doing.
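You can check the TTL your records currently advertise with dig (the hostname is an example); the second column of each answer line is the remaining TTL in seconds:

```shell
dig +noall +answer example.com A
```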
A load balancer can do this kind of immediate switchover. All traffic always hits the load balancer first which under normal circumstances proxies requests along to your main web server(s). In the event of web server crash, you can just have the load balancer direct all web traffic to your failover web server.
pound, perlbal, or another software load balancer could do that, I believe.
Perhaps even Apache rewrite rules could allow this? I'm not sure there's a way to branch when the dynamic server is unavailable, though. Perhaps customize the Apache 404 response to your liking?
First of all, it's important to understand which kind of failure you want to fail over from. If it's an app/db error and the server remains up, you can create a script that runs some checks and fails your website over to a temporary page (by changing the Apache config or .htaccess).
If it's a hardware failure, the DNS solution works, but it's not immediate, so you will lose some user traffic.
The ideal solution is to use a proxy (like HAProxy) that forwards HTTP requests to at least two web servers, automatically detects when one of them fails, and switches over to the working one.
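A minimal HAProxy sketch of that setup, with the static maintenance server marked as backup so it only receives traffic when health checks on the main servers fail (names, addresses, and ports are placeholders):

```
frontend www
    bind *:80
    default_backend webservers

backend webservers
    option httpchk GET /
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
    server maintenance 10.0.0.99:80 backup
```

The backup keyword is what gives you the automatic "down for maintenance" behavior: HAProxy sends traffic there only when every non-backup server is marked down.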
If you're using Amazon AWS, you can use ELB (Elastic Load Balancer).