"Server x timed out" during MongoDB aggregation - node.js

I have a script that periodically runs aggregation on a mongodb collection. As the dataset has grown, the amount of time it takes to aggregate has also grown. My aggregation script has recently stopped working consistently, and the error logs show:
error: { [MongoError: server <x> timed out]
name: 'MongoError',
message: 'server <x> timed out' }
I've tried debugging this, and the only pattern I can find is that the timeout seems to occur only when the aggregation takes longer than 2 minutes (it times out right around the 2-minute mark). Does anyone have additional debugging tips? The 2-minute threshold gives me the impression that I just need to configure a timeout somewhere, but I can't figure out where, or whether I'm just chasing a red herring.
About the system configuration: the aggregation script is a node.js (v5.9.1) application running in an alpine-based docker (v1.9.1) container. It uses the mongodb node driver (v2.1.19). There is a single mongodb server running mongod (v3.2.6), though this also happens in a separate environment with a replSet.

I had the same problem with a long-running aggregation, and I think I have a solution for you.
I found that the socketTimeoutMS option is responsible.
Check the default socketTimeoutMS value in your mongo_client.js. For me it was 2 minutes (mongodb module version 2.1.18).
So just add this option to your URL:
mongodb://localhost:27017/test?maxPoolSize=2&socketTimeoutMS=600000
This sets the timeout to 10 minutes (600000 ms). That did the trick for me.
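To make the arithmetic explicit, here is a minimal sketch (not part of the driver) that converts minutes to milliseconds and appends socketTimeoutMS to a connection string. The helper names minutesToMs and withSocketTimeout are made up for illustration, and the sketch assumes the URL does not already contain a socketTimeoutMS parameter.

```javascript
// Hypothetical helpers, for illustration only.

// socketTimeoutMS is expressed in milliseconds.
function minutesToMs(minutes) {
  return minutes * 60 * 1000;
}

// Appends socketTimeoutMS to a connection string.
// Assumption: the string does not already contain socketTimeoutMS.
function withSocketTimeout(uri, minutes) {
  const separator = uri.includes('?') ? '&' : '?';
  return uri + separator + 'socketTimeoutMS=' + minutesToMs(minutes);
}

console.log(withSocketTimeout('mongodb://localhost:27017/test?maxPoolSize=2', 10));
// mongodb://localhost:27017/test?maxPoolSize=2&socketTimeoutMS=600000
```

The resulting string is what you would pass to the driver when connecting. Pick a value comfortably above your slowest aggregation.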

Related

Elasticsearch nodejs check if queue is full

I have the following error with elasticsearch
[remote_transport_exception] [es-0][x.x.x.x:9300][indices:data/write/bulk[s]]
Or
[remote_transport_exception] [es-0][x.x.x.x:9300][indices:data/write/bulk[s][p]]
It seems that the elasticsearch queue is full.
I am using the nodejs lib https://www.npmjs.com/package/elasticsearch and this error occurred after calling client.index.
I call index as a promise inside a rabbitmq consumer, and no more than 8 messages arrive at the same time.
client.index().then(...)
It seems that then is called while the update or create is still in the queue. I tried adding {wait_for_active_shards: 'all'} but I have the same issue.
The issue was that the elasticsearch server was too busy.
I added a retry mechanism for the 429 error code, and now it works fine.
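A retry like the one described could look roughly like this. It is only a sketch: the helper names (isTooManyRequests, sleep, retryOn429) and the backoff numbers are invented for illustration, and it assumes the client reports the HTTP status as err.status or err.statusCode, which you should check against the errors you actually receive.

```javascript
// Assumption: the rejected error carries the HTTP status as
// err.status or err.statusCode.
function isTooManyRequests(err) {
  return Boolean(err) && (err.status === 429 || err.statusCode === 429);
}

// Resolves after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs operation() (a function returning a promise) and retries it only when
// it fails with a 429, doubling the wait before each new attempt.
// Any other error, or a 429 after maxRetries retries, is rethrown.
function retryOn429(operation, maxRetries, baseDelayMs) {
  function attempt(n) {
    return Promise.resolve()
      .then(operation)
      .catch((err) => {
        if (!isTooManyRequests(err) || n >= maxRetries) {
          throw err;
        }
        return sleep(baseDelayMs * Math.pow(2, n)).then(() => attempt(n + 1));
      });
  }
  return attempt(0);
}

// Usage inside the consumer (client and params are placeholders):
// retryOn429(() => client.index(params), 5, 200).then(...)
```

Only acknowledge the rabbitmq message once the returned promise resolves, so a message that exhausts its retries is not lost.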

Random connection errors to MS SQL from nodeJS app

We have an AWS server running some nodeJS services. The services connecting to MS SQL are randomly crashing with the message "Failed to connect to databaseserver:1433 - Could not connect (sequence)".
We are running on:
App server:
Linux Ubuntu 14.4
AWS m5
NodeJS: 8.11.2
Services are using package mssql latest version (4.3.0). This includes tedious 2.7.1.
DB server:
Windows server 2012.
sql server 2012
throughput: about 300 rpm, error also happens when throughput is lower (about 20 rpm).
App is running in a cluster through PM2 (runs 4 times). We see the error happening on all 4 at the same time, but sometimes also on 1 or 2 instances.
What we tried:
Upgrading to alpha version of mssql with tedious 3.0.1. Did not make a difference
Upgrading from Amazon M4 machine to M5 machine with enhanced networking
Changing the pool settings in the app. We tried setting min connections to 0 or to a low/high value, and max also to a low/high value, but to no avail.
Duplicate server to new machine.
Setting idleTimeoutMillis to 1 second
Pinging DB server to see if there is a connection problem, but we see no weird pings when the error happens.
Connection on app startup:
    App.sqlConnection = new App.SQL.ConnectionPool(config, function(err) {
        if (err) {
            Log.error(err);
            process.exit(1);
        }
        App.sqlConnection.on('error', err => {
            Log.error(`There was a connection err : ${err}`);
            process.exit(1);
        });
    });
Request:
    var request = new App.SQL.Request(App.sqlConnection);
    request.query(sQuery, function(err, results) {
    });
Errors are caught by the "on error" handler.
The error happens randomly across services. Some have more instances of the error than others.
We are running out of options. Any idea how we can see more detailed errors?
I have a couple suggestions.
First, how sure are you that these errors are actually a problem? If your code simply retries, instead of exiting, are the connections stable afterwards, or can a connection drop in the middle of a query?
(Connections dropping in the middle of queries are obviously not good, but random failures on connection, that can be fixed by retries, are the best kind of problem to have IMHO.)
Setting aside the potential in-code fix: when you say you "duplicated server to new machine", did you launch a new AMI using the latest Windows Server 2012, or did you image and clone? If your database server is a couple of years old, you might be running outdated network drivers in your instance, which could cause some hiccups.
If you want to explore that, you could rebuild the entire database server from scratch on a newly launched AMI. Alternatively, you can upgrade the PV driver, network adapter, and EC2Config on your existing instance; the instructions are at the following links:
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/Upgrading_PV_drivers.html#aws-pv-upgrade
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/sriov-networking.html#enable-enhanced-networking
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_Install.html

MongoError: Cannot create collection users - database is in the process of being dropped

I have a REST API and I'm following TDD for this project. My tests consist of two parts: route and service. I chose to use Jest. I have a MongoDB database that I use for testing. When the tests are completed, I reset my database in the afterAll() method, where I call the mongoose.connection.dropDatabase function.
There is no error when I run only one test file, but when I run multiple test files, I get an error. The error message:
MongoError: Cannot create collection auth-db.users - database is in the process of being dropped.
Here is the sample code:
users.route.test.ts:
https://gist.github.com/mksglu/8c4c4a3ddcb0e56782725d6457d97a0e
users.service.test.ts:
https://gist.github.com/mksglu/837202c1048687ad33b4d1dee01bd29c
When all my tests run, I "sometimes" get the error shown above. The reason is that the reset process is still in progress when the next test file starts. I can't solve this problem, and I'd appreciate it if you could help.
Thanks.
https://jestjs.io/docs/en/cli.html#runinband
What you are looking for is the --runInBand option, which makes Jest run all tests serially in the current process instead of creating a worker pool of child processes that run tests.

Random 'ECONNABORTED' error when using sendFile in Express/Node

I have set up a node server with Express middleware. I get the ECONNABORTED error randomly on some files when loading an HTML file that triggers about 10 other loads (js, css, etc.). The exact error is:
{ [Error: Request aborted] code: 'ECONNABORTED' }
Generated by this simplified code (after I tried to debug the issue):
    res.sendFile(res.locals.physicalUrl, function (err) {
        if (err)
            console.log(err);
        ...
    });
Many posts talk about this error resulting from not specifying the full path name. That is not the situation here. I do specify the full path and indeed the error is randomly generated. There are times when the page and all its subsequent links load perfectly and there are times when they do not. I tried to flush the cache and did not find any pattern to connect it with this.
This specific error appears to be a generic term for a socket connection getting aborted and is discussed in the context of other applications like FTP.
Having realized that the node worker threads can be increased, I tried to do so using:
process.env.UV_THREADPOOL_SIZE = 20;
However, my understanding is that even absent this, at most the file transfer may have to wait for a worker thread to be free and not get aborted. I am not talking about big files here, all files are less than 1 MB.
I have a gut feeling that this has nothing to do with node directly.
Please point to any other possibilities (node or otherwise) to handle this error. Also, any other indirect solutions? Retrying a few times could be one but that would be clumsy. EDIT: No, I cannot retry. Headers are already sent with the error!
A SIDE NOTE:
Many examples on the use of sendFile skip using the callback thereby giving the impression that it is a synchronous call. It is not. Do use the callback at all times, check for success and only then move on to the "next" middleware or take appropriate steps if the send fails for whatever reason. Not doing so can make it difficult to debug the consequences in an asynchronous environment.
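As a rough sketch of what the side note describes, the wrapper below waits for the sendFile callback before moving on. The function name sendFileThenNext is made up for illustration; res and next are the usual Express response object and next callback, and absolutePath is assumed to be a full path.

```javascript
// Hypothetical helper: send a file and continue only after the transfer
// has finished or failed.
function sendFileThenNext(res, absolutePath, next) {
  res.sendFile(absolutePath, (err) => {
    if (!err) {
      // Transfer completed; later middleware must not write to the response.
      return next();
    }
    if (res.headersSent) {
      // Too late for an error response (for example, the client aborted),
      // so just log the failure.
      console.error('sendFile failed after headers were sent:', err.code || err.message);
      return;
    }
    // Nothing was sent yet: let the error-handling middleware respond.
    next(err);
  });
}
```

The headersSent check matters for the ECONNABORTED case above: once headers have gone out, logging is about all that can still be done on the server side.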
See https://stackoverflow.com/a/36949631/2798152
Could it be possible that in some cases you terminate the connection by calling res.end before the asynchronous call to res.sendFile ends?
If that's not the case - can you pastebin more of your application code?
Uninstalling and Re-installing MongoDB solved this for me.
I was facing the same problem. It started happening after I had to force-restart my laptop because it became unresponsive. After the restart, trying to connect to the mongo server from nodejs always threw the ECONNABORTED error.

Using memcached failover servers in nodejs app

I'm trying to set up a robust memcached configuration for a nodejs app with the node-memcached driver, but it does not seem to use the specified failover servers when one server dies.
My local experiment goes as follows:
shell
memcached -p 11212
node
MC = require('memcached')
c = new MC('localhost:11211', //this process does not exist
{failOverServers: ['localhost:11212']})
c.get('foo', console.log) //this will eventually time out
c.get('foo', console.log) //repeat 5 or 6 times to exceed the retries number
//wait until all the connection errors appear in the console
//at this point, the failover server should be in use
c.get('foo', console.log) //this still times out :(
Any ideas about what we might be doing wrong?
It seems that the failover feature is somewhat buggy in node-memcached.
To enable failover you must set the remove option:
c = new MC('localhost:11211', //this process does not exist
{failOverServers: ['localhost:11212'],
remove : true})
Unfortunately, this is not going to work because of the following error:
[depricated] HashRing#replaceServer is removed.
[depricated] the API has no replacement
That is, when trying to replace a dead server with a replacement from the failover list, node-memcached outputs a deprecation error from the HashRing library (which, in turn, is maintained by the same author as node-memcached). IMHO, feel free to open a bug :-)
This happens when your nodejs server is not getting a session id from memcached.
Please check that the memcached settings in your php.ini file are correct:
session.save = 'memcache'
session.path = 'tcp://localhost:11212'
