Spring Cloud Gateway memory leak with non-standard response content type

We are facing a strange memory leak in Spring Cloud Gateway that appears on only one route, which points to a regular Spring Web REST service that returns a response with the non-standard content type "application/jwt" and a JWT token string as the response body.
A heap dump shows that all the memory is allocated by reactor.netty.http.client.HttpClientConfig.
So I tried setting:
spring.cloud.gateway.httpclient.pool.max-idle-time: "20s"
spring.cloud.gateway.httpclient.pool.max-life-time: "30s"
And it helped: no more memory leak.
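For context, the two properties above map onto Reactor Netty's ConnectionProvider settings. A minimal standalone sketch of the equivalent programmatic configuration (the pool name and endpoint are placeholders for illustration):
import java.time.Duration;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class GatewayPoolSettings {
    public static void main(String[] args) {
        // Equivalent of spring.cloud.gateway.httpclient.pool.max-idle-time / max-life-time:
        ConnectionProvider provider = ConnectionProvider.builder("gateway-pool")
                .maxIdleTime(Duration.ofSeconds(20))  // evict connections idle for 20s
                .maxLifeTime(Duration.ofSeconds(30))  // evict connections older than 30s
                .build();
        // Evicting connections by age caps how long anything retained per
        // connection (buffers, handlers) can survive, which is presumably
        // why these settings mask the leak.
        HttpClient client = HttpClient.create(provider);
        System.out.println("configured client: " + client);
    }
}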
But I'm curious: what is the problem with the non-standard content type?
Secondly, I've tested the pool timeouts locally and see no difference in response times once a channel should have been closed.
The first request creates a new connection, so its response time is fairly high (300 ms+); then, after 30+ seconds (I verify via metrics that there are no more active or idle connections), I make the next request and get the standard response time of 10-30 ms.
Is the underlying pooled connection not actually being closed, or what?
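For what it's worth, this is the kind of self-contained probe I would use to check whether eviction really closes the pooled connection: time one request, wait past max-life-time, then time the next one (the endpoint is a placeholder):
import java.time.Duration;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

public class PoolEvictionProbe {
    static long timedGetMillis(HttpClient client, String uri) {
        long start = System.nanoTime();
        client.get().uri(uri).responseContent().aggregate().asString().block();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionProvider provider = ConnectionProvider.builder("probe-pool")
                .maxIdleTime(Duration.ofSeconds(20))
                .maxLifeTime(Duration.ofSeconds(30))
                .build();
        HttpClient client = HttpClient.create(provider);
        String uri = "http://localhost:8080/token"; // placeholder endpoint

        System.out.println("first request (new connection): " + timedGetMillis(client, uri) + " ms");
        Thread.sleep(35_000); // wait past max-life-time so the connection should be evicted
        // If the connection was really closed, this should again pay the TCP
        // (and TLS) setup cost rather than the ~10-30 ms pooled-connection time.
        System.out.println("request after eviction window: " + timedGetMillis(client, uri) + " ms");
    }
}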
Dependencies:
Spring Boot 2.7.5
Spring Cloud Gateway 3.1.4
netty-all 4.1.84.Final
reactor-netty 1.0.24

Related

HTTP 413 Error When App is Deployed on Azure

There is a strange problem that we have when we deploy our application to the Azure environment. When I start the application on my laptop (no Azure, no Docker or anything) and send requests (which are a little bit big), I don't face any issues.
Our test and production environments are all on Azure right now. So when the application is deployed there, I get this strange error:
log4javascript error: AjaxAppender.append: XMLHttpRequest request to URL ./common/logToServer.jsp?controllerName=6c3eaf3e-897d-4b30-a15e-62f9d3d3ce78 returned status code 413
Now I know what the HTTP 413 error code is, but I am not sure why my local setup does not show the same error. This leads me to believe it might be some Azure configuration I need to change, but I don't know which.
It is a simple web application in Java, using Servlets, running on Tomcat.
log4javascript is a logging framework for JavaScript with no runtime dependencies. As the error statement says, the issue is caused by the length of the payload, which is too large.
The HTTP status code 413 ("Payload Too Large") indicates that the request entity is larger than the limits defined by the server; the server might close the connection.
Fix:
In application.properties, add these two lines:
server.tomcat.max-swallow-size=***MB
server.tomcat.max-http-post-size=***MB
NOTE:
*** is your desired integer value in megabytes; max-swallow-size caps the size of the request body/payload the server will swallow, and max-http-post-size caps the size of the entire POST request.
See the reference article for more information and the solution.
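If you would rather set these limits in code than in application.properties, here is a minimal Spring Boot sketch of the same two settings (the 100 MB figure is only an example value):
import org.apache.coyote.http11.AbstractHttp11Protocol;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatLimitsConfig {
    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatLimits() {
        final int limitBytes = 100 * 1024 * 1024; // example: 100 MB
        return factory -> factory.addConnectorCustomizers(connector -> {
            // server.tomcat.max-http-post-size
            connector.setMaxPostSize(limitBytes);
            // server.tomcat.max-swallow-size
            if (connector.getProtocolHandler() instanceof AbstractHttp11Protocol) {
                ((AbstractHttp11Protocol<?>) connector.getProtocolHandler())
                        .setMaxSwallowSize(limitBytes);
            }
        });
    }
}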

Azure App service returns 502 bad gateway from HttpClient

I have an app service (plan B2) running on Azure.
My integration tests, running from a Docker container, call some app service endpoints one by one and sometimes receive a 500 or 502 error.
When I debug the tests and add pauses between calls, all requests succeed. Also, when I scale up my app service, everything works properly. (I don't want to scale up, because CPU and the other metrics are low.)
In my tests I have only one HttpClient, and I dispose of it at the end, so I don't think there should be any connection leaks.
Also, under TCP Connections I have around 60 total connections, while the limit in the Azure docs is 1,920.
This app is not accessed by any users, but here it says that I hit the maximum number of connections. Is there any way to track these connections? Why do I see nothing in App Insights when I receive these 5xx errors? Also, how can 15 connections exceed the limit when the limit is 1,920? Are these connections related to my errors, and how can they be fixed?
You don't see them in Application Insights because they are happening at the IIS level, which breaks the request before any telemetry is sent to Application Insights.
The place to look for information is "Diagnose and solve problems", then "Availability and Performance". More info here:
https://learn.microsoft.com/en-us/azure/app-service/overview-diagnostics
PS: I do think the problem is related to the disposal of your HttpClient. It's a well-known issue and the reason HttpClientFactory was introduced. More info here:
https://www.stevejgordon.co.uk/httpclient-creation-and-disposal-internals-should-i-dispose-of-httpclient
https://stackoverflow.com/a/15708633/1384539
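The specifics in those links are .NET, but the underlying rule is language-agnostic: keep one long-lived client and share it, instead of creating and disposing a client per request, so pooled sockets are reused rather than piling up in TIME_WAIT. A minimal sketch of the pattern (shown in Java purely for illustration; HttpClientFactory is the idiomatic .NET equivalent):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SharedHttpClient {
    // One client for the whole application: its connection pool is reused
    // across requests instead of a fresh socket being opened per call.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static String get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}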

Memory leak/consumption in Hubs due to large messages?

I'm currently trying SignalR and RabbitMQ in order to round-robin / load-balance JSON web service queries, and I'm having trouble with the memory consumption of one of the applications when it processes large (~300-2500 KB) messages.
I have an IIS server hosting a web application (named "Backend") that needs to query another web application (named "Pricing"), also hosted on an IIS server.
In order to keep a connection alive with my RabbitMQ server, I developed console applications that are connected to Backend and Pricing using SignalR.
So when Backend needs to query Pricing, it asks its console to publish the message to the queue, and the console attached to Pricing takes the message and gives it to Pricing (with the Invoke<> method). When Pricing finishes its job, it asks its console to publish the reply message, and the console attached to Backend takes it and gives it to Backend.
To sum up :
[Backend] -> [Console] -> [RabbitMQ] <- [Console] <- [Pricing]
And I have two Pricing instances taking messages from their consoles off the RabbitMQ queue.
This setup replaces a traditional web service call between the two IIS servers and benefits from the advantages of RabbitMQ (load balancing and asynchronous calls in a micro/web services architecture).
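The stack described here is .NET, but the load-balancing half of the setup is RabbitMQ's standard competing-consumers pattern: several consumers on one queue receive messages round-robin. A minimal sketch of that pattern (in Java purely for illustration; the host and queue name are placeholders):
import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class PricingConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder RabbitMQ host

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("pricing-requests", true, false, false, null);
        channel.basicQos(1); // give each consumer one message at a time -> round-robin

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String query = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // ... hand the query to the Pricing app and publish the reply here ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("pricing-requests", false, onMessage, consumerTag -> { });
        // Start two instances of this program and RabbitMQ alternates messages
        // between them, which is the load balancing described above.
    }
}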
I added
GlobalHost.Configuration.MaxIncomingWebSocketMessageSize = null;
in Startup.cs on both IIS servers in order to accept large messages.
When I look at Pricing's memory consumption in Windows Task Manager, it quickly grows from 500 MB to 1500 MB (in 5 minutes, while handling never-ending queries from Backend to exercise the setup).
I tried something else: writing the query content to files in a shared folder and publishing only the file name in the RabbitMQ messages. With that change (and, of course, the code modified to load the file), Pricing's memory consumption doesn't move and stays around 500 MB.
So it seems to have something to do with the length of the messages my console passes to IIS.
I tried disconnecting the console from the IIS hubs because I thought that might free some memory, but no.
Has anyone experienced this issue of memory consumption with large messages in hubs? How can I check whether there is indeed a memory leak in my application?
What about using SignalR and RabbitMQ in a web/micro services environment? Any feedback?
Many thanks,
Jean-Francois
.NET Framework: 4.5
Microsoft.AspNet.SignalR: 2.4.1
So it seems that the version of SignalR I use (.NET Framework) allows tuning the number of messages per hub per connection that are kept in memory.
I set it to an arbitrary 50 in Startup.cs:
GlobalHost.Configuration.DefaultMessageBufferSize = 50;
Its default value is 1000, meaning (if I understood correctly) that IIS keeps a circular buffer of 1000 messages in memory. Some of my messages weigh 2.5 MB, meaning the memory used could grow to 1000 × 2.5 MB = 2500 MB per connection.
As my IIS application has only one connection (its console) and doesn't need to keep track of messages (it works as a web service), 1000 messages is way too much.
With the limit of 50 messages, the memory used by the application in Windows Task Manager stays put (around 500 MB).
Is there any flaw in the way I'm using it ?
Thanks !

Does IIS Request Content Filtering load the full request before filtering?

I'm looking into IIS request filtering by content length. I've set the maximum allowed content length:
appcmd set config /section:requestfiltering /requestlimits.maxallowedcontentlength:30000000
My question is about when the filtering will occur.
Will IIS first read ALL of the request into memory and then throw an error, or will it raise the error as soon as the threshold is reached?
The IIS Request Filtering module is processed very early in the request pipeline. Unwanted requests are quickly discarded before proceeding to application code, which is slower and has a much larger attack surface. For this reason, some have reported performance increases after implementing Request Filtering settings.
Limitations
Request Filtering Limitations include the following:
Stateless - Request Filtering has no knowledge of application or session state. Each request is processed individually regardless of whether a session has or has not been established.
Request Header Only - Request Filtering can only inspect the request header. It has no visibility into the request body or any part of the response.
Basic Logic - Regular expressions and wildcard matches are not available. Most settings consist of establishing size constraints while others perform simple string matching.
maxAllowedContentLength
Request Filtering checks the value of the Content-Length request header. If the value exceeds the configured maxAllowedContentLength, the client will receive an HTTP 404.13.
The IIS 8.5 STIG recommends a value of 30000000 or less.
IISRFBaseline
The information above is based on my PowerShell module, IISRFBaseline. It helps establish an IIS Request Filtering baseline by leveraging Microsoft LogParser to scan a website's content directory and IIS logs.
Many of the settings have a dedicated markdown file providing more information about the setting. The one for maxAllowedContentLength can be found at the following:
https://github.com/phbits/IISRFBaseline/blob/master/IISRFBaseline-maxAllowedContentLength.md
Update - in response to the @johnny-5 comment
The filtering happens immediately, which makes sense because Request Filtering only has visibility into the request header. This was confirmed via the following methods (a minimal probe sketch follows the list):
Failed Request Tracing - the Request Filtering module responded to the request with an HTTP 413 Request entity too large.
http.sys event tracing - the request is accepted and handed off to the IIS website. Shortly thereafter, there is an entry showing the HTTP 413 response. The time in between was not nearly long enough for the upload to complete.
Packet capture - Using Microsoft Network Monitor, the HTTP conversation shows IIS immediately responded with an HTTP 413 Request entity too large.
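One way to reproduce the header-only check yourself is to send just the headers with a huge declared Content-Length and watch for the immediate rejection. A minimal raw-socket probe (host, port, and path are placeholders):
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class EarlyRejectProbe {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 80)) { // placeholder host/port
            OutputStream out = socket.getOutputStream();
            // Declare a 2 GB body but send none of it.
            String headers = "POST /upload HTTP/1.1\r\n"
                    + "Host: localhost\r\n"
                    + "Content-Length: 2000000000\r\n"
                    + "Content-Type: application/octet-stream\r\n"
                    + "\r\n";
            out.write(headers.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // If filtering acts on the header alone, a 413/404.13 status line
            // arrives here immediately, long before 2 GB could be uploaded.
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[4096];
            int read = in.read(buffer);
            if (read > 0) {
                System.out.println(new String(buffer, 0, read, StandardCharsets.US_ASCII));
            }
        }
    }
}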
The part you're rightfully concerned about is that IIS still accepts the upload regardless of file size. I found the limiting factor to be connectionTimeout, which has a default setting of 120 seconds. If the file is "completed" before the timeout, an HTTP 413 error message is displayed. When a timeout occurs, the browser shows a connection reset, since the TCP connection is destroyed by IIS after sending a TCP ACK/RST.
To test this further, the timeout was increased to connectionTimeout=6000. Then a large upload was submitted, and the following IIS components were stopped one at a time. After each stop, the upload was checked via Network Monitor and confirmed to still be running.
Website
Application Pool (Stop-WebAppPool -Name AppPoolName)
World Wide Web Publishing Service (Stop-Service -Name W3SVC)
With all three stopped, I verified there was no IIS process still running, and yet bytes were still being uploaded. This leads me to conclude that the connection is maintained by http.sys. The fact that connectionTimeout is closely tied to http.sys seems to support this. I do not know whether the uploaded bytes go to a buffer or are simply discarded; the event tracing messages didn't provide anything helpful in this context.
Leaving out the Content-Length request header will result in an RFC protocol error (i.e. HTTP 400 Bad Request) generated by http.sys, since the size of the HTTP payload isn't being declared.

Azure - App Availability percentage is Zero

Our API app is in UAT on Azure with a Standard 3 (large) service plan. What should we do if App Availability is zero? We are getting slow responses or timeouts; when I restart the application, it returns to normal. (We are using parallel programming with async/await.)
How do we find the root cause of the slowness issue?
Ensure that the Always On feature is enabled.
Such problems may be caused by application level issues, such as:
network requests taking a long time
application code or database queries being inefficient
application using high memory/CPU
application crashing due to an exception
You could enable web server diagnostics to fetch more details on the issue.
Detailed Error Logging - Detailed error information for HTTP status codes that indicate a failure (status code 400 or greater). This may contain information that can help determine why the server returned the error code.
Failed Request Tracing - Detailed information on failed requests, including a trace of the IIS components used to process the request and the time taken in each component. This can be useful if you are attempting to improve web app performance or isolate what is causing a specific HTTP error.
Web Server Logging - Information about HTTP transactions using the W3C extended log file format. This is useful when determining overall web app metrics, such as the number of requests handled or how many requests are from a specific IP address.
Also, Azure Application Insights collects telemetry from your application to help analyze its operation and performance. You can use this information to identify problems that may be occurring or to identify improvements to the application that would most impact users. This tutorial takes you through the process of analyzing the performance of both the server components of your application and the perspective of the client: https://learn.microsoft.com/en-us/azure/application-insights/app-insights-tutorial-performance
Ref: https://learn.microsoft.com/en-us/azure/app-service/app-service-web-troubleshoot-performance-degradation
