http: Accept error: accept tcp [::]:8080: accept4: too many open files; - linux

I have written a REST API in Go and I am doing a performance test of my API using JMeter.
When I run the test with 300 or more users, each sending 20 requests with a 500 ms gap between requests, I get the error below:
http: Accept error: accept tcp [::]:8080: accept4: too many open files;
I am running this Go application on an AWS EC2 instance with 8 GB of RAM.
Below is what I have tried already:
I have increased the ulimit to a sufficiently large number. When I run the ulimit -n command, the output is 1048576.
In my code I made sure that the response body is closed.
But none of these solved the issue. Any help is appreciated.
Thanks in advance.

One problem could be not closing opened files or otherwise not releasing resources.
For example, the body of an HTTP request is of type io.ReadCloser.
This ReadCloser has a Close method which you must call once your processing has finished, to release the underlying resources.
func UserHandler(w http.ResponseWriter, r *http.Request) {
    defer r.Body.Close()

    var user User
    if err := json.NewDecoder(r.Body).Decode(&user); err != nil {
        // handle error
    }
    // more code
}
Here, deferring r.Body.Close() releases the associated resources once the handler returns.
Similar to this, many other types implement this interface, such as os.File, sql.DB, and mgo.Session, so check that you are properly closing those resources as well.
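For instance, here is a minimal sketch of the same defer-Close pattern applied to an os.File and to the body of an outgoing HTTP response (the URL and path are only illustrative, not from the question):

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
)

// readConfig shows the usual defer-Close pattern for files.
func readConfig(path string) ([]byte, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close() // the descriptor is released when the function returns
    return io.ReadAll(f)
}

// fetch shows the same pattern for an outgoing HTTP call: an unclosed
// response body keeps the underlying socket (an open file) alive.
func fetch(url string) ([]byte, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return io.ReadAll(resp.Body)
}

func main() {
    if _, err := fetch("http://example.com/"); err != nil {
        fmt.Println(err)
    }
    if _, err := readConfig("/tmp/app.conf"); err != nil {
        fmt.Println(err)
    }
}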

Related

Pyzmq swallows error when connecting to blocked port (firewall)

I'm trying to connect to a server using Python's pyzmq package.
In my case I'm expecting an error, because the server the client connects to blocks the designated port with a firewall.
However, my code runs through until I terminate the context, and then it blocks indefinitely.
I tried several things to catch the error condition beforehand, but none of them succeeded.
My base example looks like this:
import zmq
endpoint = "tcp://{IP}:{PORT}"
zmq_ctx = zmq.Context()
sock = zmq_ctx.socket(zmq.PAIR)
sock.connect(endpoint) # <--- I would expect an exception thrown here, however this is not the case
sock.disconnect(endpoint)
sock.close()
zmq_ctx.term() # <--- This blocks indefinitely
I extended the sample with sock.poll(1000, zmq.POLLOUT | zmq.POLLIN), hoping that the poll call would fail if the connection could not be established due to the firewall.
Then, I tried to solve the issue by setting some sock options, before the sock = zmq_ctx.socket(zmq.PAIR):
zmq_ctx.setsockopt(zmq.IMMEDIATE, 1) # hoping this would lead to an error on the first `send`
zmq_ctx.setsockopt(zmq.HEARTBEAT_IVL, 100) # hoping the heartbeat would fail if the connection could not be established
zmq_ctx.setsockopt(zmq.HEARTBEAT_TTL, 500)
zmq_ctx.setsockopt(zmq.LINGER, 500) # hoping `zmq_ctx.term()` would throw an exception once the linger period is over
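Put together, the attempt with those options and the poll call looked roughly like this (a sketch; the IP/PORT placeholders are as in the base example):

import zmq

endpoint = "tcp://{IP}:{PORT}"  # placeholder, as in the base example

zmq_ctx = zmq.Context()
# default options applied to sockets created afterwards
zmq_ctx.setsockopt(zmq.IMMEDIATE, 1)
zmq_ctx.setsockopt(zmq.HEARTBEAT_IVL, 100)
zmq_ctx.setsockopt(zmq.HEARTBEAT_TTL, 500)
zmq_ctx.setsockopt(zmq.LINGER, 500)

sock = zmq_ctx.socket(zmq.PAIR)
sock.connect(endpoint)  # connect() is asynchronous, so no exception is raised here

# Hoped this would report that nothing is readable/writable, but it did not
# surface the blocked port either.
events = sock.poll(1000, zmq.POLLOUT | zmq.POLLIN)
print("poll events:", events)

sock.close()
zmq_ctx.term()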
I also temporarily added a sock.send_string("bla"), but it just enqueued the message without returning an error and did not provide any new insight.
The only thing I can imagine solving the problem would be using the telnet package and attempting a connection to the endpoint.
However, adding a dependency just for the purpose of testing a connection is not really satisfactory.
Do you have any idea how to determine a blocked port from the client side while still using pyzmq? I'm not happy that the code always runs into the blocking zmq_ctx.term() in that case.

First call to Microsoft.Azure.ServiceBus.Core.MessageSender.SendAsync times out, subsequent calls don't

I have some code written to communicate with an Azure Service Bus. It sends messages to a queue. It's in a project targeting .NET Standard 2.0.
When I run it from a .NET Core terminal app it runs fine. But when the same code is called from a .NET Framework 4.7.2 project, the first attempt to send a message results in the following exception after 30 to 90 seconds:
"The remote party closed the WebSocket connection without completing the close handshake."
But any further messages will be sent without problem.
// This is using Microsoft.Azure.ServiceBus, if that makes any difference...
MessageSender messageSender = new MessageSender(ConnectionString, SendQueueName);
try
{
    await messageSender.SendAsync(new Message(Encoding.UTF8.GetBytes("Test that won't work")));
}
catch (Exception e)
{
    // Error will be caught here:
    // "The remote party closed the WebSocket connection without completing the close handshake."
}
await messageSender.SendAsync(new Message(Encoding.UTF8.GetBytes("Test that will work")));
Does anybody know why the first call fails? And how to make it not fail, or at least fail more quickly? I've tried changing the OperationTimeout and RetryPolicy but they don't seem to have any effect.
These first connections are made via ports 5671/5672, which Trend antivirus intercepts. Once these have timed out, the framework falls back to using port 443, which works fine.
We tried turning Trend off and testing the connection, and it is pretty much instantaneous.
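If the environment will always intercept 5671/5672, one option worth trying (an untested sketch based on the fallback behaviour described above, not something from the original answer) is to start on AMQP over WebSockets, i.e. port 443, from the beginning:

using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

class Sender
{
    // Placeholders, as in the question
    const string ConnectionString = "<service bus connection string>";
    const string SendQueueName = "<queue name>";

    static async Task Main()
    {
        var builder = new ServiceBusConnectionStringBuilder(ConnectionString)
        {
            EntityPath = SendQueueName,
            // start on AMQP over WebSockets (port 443) instead of plain AMQP (5671/5672)
            TransportType = TransportType.AmqpWebSockets
        };

        var sender = new MessageSender(builder);
        await sender.SendAsync(new Message(Encoding.UTF8.GetBytes("Test")));
        await sender.CloseAsync();
    }
}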

Random 'ECONNABORTED' error when using sendFile in Express/Node

I have set up a Node server with Express middleware. I get the ECONNABORTED error randomly on some files when loading an HTML file which triggers about 10 other loads (js, css, etc.). The exact error is:
{ [Error: Request aborted] code: 'ECONNABORTED' }
Generated by this simplified code (after I tried to debug the issue):
res.sendFile(res.locals.physicalUrl, function (err) {
  if (err)
    console.log(err);
  ...
});
Many posts attribute this error to not specifying the full path name. That is not the situation here: I do specify the full path, and the error really does occur randomly. There are times when the page and all its subsequent links load perfectly and times when they do not. I tried flushing the cache and did not find any pattern connecting it to this.
This specific error appears to be a generic term for a socket connection getting aborted and is discussed in the context of other applications like FTP.
Having realized that the node worker threads can be increased, I tried to do so using:
process.env.UV_THREADPOOL_SIZE = 20;
However, my understanding is that even without this, at most a file transfer might have to wait for a worker thread to become free, not get aborted. I am not talking about big files here; all files are less than 1 MB.
I have a gut feeling that this has nothing to do with node directly.
Please point out any other possibilities (Node or otherwise) for handling this error. Also, any other indirect solutions? Retrying a few times could be one, but that would be clumsy. EDIT: No, I cannot retry. The headers have already been sent when the error occurs!
A SIDE NOTE:
Many examples of using sendFile skip the callback, giving the impression that it is a synchronous call. It is not. Always use the callback, check for success, and only then move on to the "next" middleware, or take appropriate steps if the send fails for whatever reason. Not doing so can make the consequences difficult to debug in an asynchronous environment.
See https://stackoverflow.com/a/36949631/2798152
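For illustration, a minimal sketch of that pattern (the route, directory, and port here are made up, not taken from the question):

// Illustrative sketch only: always handle the sendFile callback.
const express = require('express');
const path = require('path');
const app = express();

app.get('/files/:name', (req, res, next) => {
  // A hypothetical absolute path; the question builds it in res.locals.physicalUrl instead.
  const absolutePath = path.join('/var/www/static', path.basename(req.params.name));

  res.sendFile(absolutePath, (err) => {
    if (err) {
      // The headers may already have been sent, so don't try to respond again;
      // just log and hand the error to Express.
      console.log(err);
      return next(err);
    }
    // Success: safe to do any follow-up work here.
  });
});

app.listen(3000);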
Could it be possible that in some cases you terminate the connection by calling res.end before the asynchronous call to res.sendFile ends?
If that's not the case - can you pastebin more of your application code?
Uninstalling and re-installing MongoDB solved this for me.
I was facing the same problem. It started happening when I had to force-restart my laptop because it became unresponsive. After restarting, trying to connect to the Mongo server using Node.js always threw an ECONNABORTED error.

Meteor: “Failed to receive keepalive! Exiting.”

I'm working on a project which uses the npm request package for making requests to an API server. On getting a response, the callback processes the returned response. During this response processing I get the error: Failed to receive keepalive! Exiting. The following code will help you understand:
request({
  url: 'http://api-link-from-where-data-is-to-be-fetched'
}, function (err, res, body) {
  // The code for processing the response
});
Can anybody who knows how to resolve this issue please help me?
This might help answer this for you:
https://github.com/meteor/meteor/issues/1302
The last post on that page says:
Note that this is just a behavior of the develop-mode meteor run (and any hosting environment that chooses to turn on the keepalive option, which probably isn't most of them), not a production issue. And in any case, if your Node process is churning CPU for seconds, it's not going to be able to respond to any network traffic.
This post might help you: Meteor error message: "Failed to receive keepalive! Exiting."
Removing autopublish with meteor remove autopublish and then writing my own publish and subscribe functions fixed the problem.
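For context, a minimal sketch of what that looks like (the collection, field names, and file layout here are hypothetical):

// imports/api/posts.js (shared) -- a hypothetical collection
import { Mongo } from 'meteor/mongo';
export const Posts = new Mongo.Collection('posts');

// server/publications.js -- publish only the data the client needs
import { Meteor } from 'meteor/meteor';
import { Posts } from '/imports/api/posts';

Meteor.publish('posts.recent', function () {
  return Posts.find({}, { fields: { title: 1, createdAt: 1 }, limit: 50 });
});

// client/main.js -- subscribe explicitly, since autopublish is removed
import { Meteor } from 'meteor/meteor';
Meteor.subscribe('posts.recent');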

connect EADDRNOTAVAIL in nodejs under high load - how to faster free or reuse TCP ports?

I have a small wiki-like web application based on the Express framework which uses Elasticsearch as its back end. For each request it basically only goes to the Elasticsearch DB, retrieves the object and returns it rendered by the Handlebars template engine. The communication with Elasticsearch is over HTTP.
This works great as long as I have only one Node.js instance running. After I updated my code to use cluster (as described in the Node.js documentation), I started to encounter the following error: connect EADDRNOTAVAIL
This error shows up when I have 3 or more Python scripts running which constantly retrieve URLs from my server. With 3 scripts I can retrieve ~45,000 pages; with 4 or more scripts running it is between 30,000 and 37,000 pages. Running only 2 scripts or 1 script, I stopped them after half an hour, by which time they had retrieved 310,000 and 160,000 pages respectively.
I've found this similar question and tried changing http.globalAgent.maxSockets but that didn't have any effect.
This is the part of the code which listens for the URLs and retrieves the data from elastic search.
app.get('/wiki/:contentId', (req, res) ->
  http.get(elasticSearchUrl(req.params.contentId), (innerRes) ->
    if (innerRes.statusCode != 200)
      res.send(innerRes.statusCode)
      innerRes.resume()
    else
      body = ''
      innerRes.on('data', (bodyChunk) ->
        body += bodyChunk
      )
      innerRes.on('end', () ->
        res.render('page', {'title': req.params.contentId, 'content': JSON.parse(body)._source.html})
      )
  ).on('error', (e) ->
    console.log('Got error: ' + e.message) # the error is reported here
  )
)
UPDATE:
After looking into it more, I now understand the root of the problem. I ran the command netstat -an | grep -e tcp -e udp | wc -l several times during my test runs to see how many ports were in use, as described in the post Linux: EADDRNOTAVAIL (Address not available) error. I could observe that at the time I received the EADDRNOTAVAIL error, 56,677 ports were in use (instead of ~180 normally).
Also, when using only 2 simultaneous scripts, the number of used ports saturates at around 40,000 (+/- 2,000), which means ~20,000 ports are used per script (that is the point at which Node.js cleans up old ports before new ones are created), while with 3 scripts running it goes beyond those 56,677 ports (~60,000). This explains why it fails with 3 scripts requesting data, but not with 2.
So now my question changes to: how can I force Node.js to free up the ports more quickly, or to reuse the same ports all the time (which would be the preferable solution)?
Thanks
For now, my solution is setting the agent in my request options to false. According to the documentation, this
opts out of connection pooling with an Agent, defaults request to Connection: close.
As a result, my number of used ports doesn't exceed 26,000. This is still not a great solution, all the more so since I don't understand why port reuse doesn't work, but it solves the problem for now.
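For reference, a sketch of what that change looks like for the http.get call in the handler above (url.parse is just one way to turn the existing URL string into an options object; this is illustrative, not the exact code):

url = require('url')

# Build the request options from the same Elasticsearch URL and disable pooling.
opts = url.parse(elasticSearchUrl(req.params.contentId))
opts.agent = false  # opt out of the shared Agent; the request is sent with "Connection: close"

http.get(opts, (innerRes) ->
  innerRes.resume()  # ...or the same response handling as in the handler above
)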
