Excessive open connections to Mongos instances - node.js

We're moving from a single replica set to a sharded cluster and are running into some issues. We have 3 mongos instances, 3 config servers, and 15 data nodes (5 shards, each a 3-member replica set). We're seeing really poor query performance, and looking at the mongos instances I'm seeing something like 25k open connections per instance!
For example, I'm seeing log lines like
[listener] connection accepted from 10.10.36.122:35098 #521622 (23858 connections now open)
and
[conn498875] end connection 10.10.36.122:41520 (23695 connections now open)
For reference, we have another nearly identical environment that we have not yet moved to sharding, and it shows ~250 total open connections.
The application code uses the Node.js driver with a connection URL that looks something like
mongodb://mongos0.some.internal.domain:27017,mongos1.some.internal.domain:27017,mongos2.some.internal.domain:27017
I'm at a bit of a loss for how to track this issue down. Is this not the correct way to connect to mongos?
EDIT (7/7/18)
After some experimenting, I found that we were using a connectTimeoutMS of 180000 (3 minutes). Removing this value resolved the issue. However, it's still not clear why this configuration works with a standalone replica set, but causes issues when sharding. Can anyone explain what's going on here?
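For reference, this is roughly how such a connection is created with the Node.js driver. This is a minimal sketch only: the option values and database name are illustrative, not our actual configuration, and the hostnames are the placeholders from above.

const { MongoClient } = require('mongodb');

// Hypothetical sketch (Node.js driver 4.x API); values are illustrative.
const uri =
  'mongodb://mongos0.some.internal.domain:27017,' +
  'mongos1.some.internal.domain:27017,' +
  'mongos2.some.internal.domain:27017';

const client = new MongoClient(uri, {
  maxPoolSize: 100,        // cap on pooled connections per client
  connectTimeoutMS: 10000  // keep the connect timeout short; 180000 (3 minutes) is the value we removed
});

async function main() {
  await client.connect();          // connect once and reuse the client for all operations
  const db = client.db('appdb');   // 'appdb' is a placeholder database name
  console.log(await db.command({ ping: 1 }));
}

main().catch(console.error);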

Related

MongoDB NodeJS driver pooling connection (Question)

I've just set up a full Node.js bot using MongoDB. The Discord server has roughly 24k people spamming the bot left and right with commands, and therefore I've used
(Connection info blurred out, since it contains the username, password, and IPs)
"url": "mongodb://XXXX:XXXX#XXX.XX.XXX.XX.XXX:25000/?authSource=admin?maxPoolSize=500&poolSize=300&autoReconnect=true",
This is my URI, and as you can see I've allowed a fairly large pool size.
Normally my application (before I enabled pooling) would hit 300-600 connections on average, because there were multiple instances of "MongoClient.connect(uri)" etc. scattered around the code, as well as a massive number of db.close() calls after the collection operations.
I've cleaned up the entire thing, and I now call MongoClient.connect() only once and pass that single connection around in the code.
After that I made sure to remove everything that would close the db (db.close();).
I've started it up, and everything still seems responsive, so there are no database/Mongo errors.
However, looking through MongoDB Compass, my connection count is stable at around 29. Which is good, obviously, but since I enabled a pool size of 300, shouldn't this be higher?
Is there something I have missed, or is it all behaving as it should?
Each client connects to each server once or twice for monitoring purposes. If you create a client that performs a single operation, then while that operation is running against a 4.4 replica set you have 7 open connections (for a three-node replica set, that's two monitoring connections per node plus the connection running the operation).
By reusing clients you can dramatically reduce the total number of connections.
A further reduction is also expected because each of your operations can complete faster (it doesn't have to wait for server discovery), so fewer connections are needed at any given moment.
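As an illustration, here is a minimal "one shared client" sketch with the Node.js driver. File and database names are hypothetical, and the URI is a placeholder rather than the redacted one above. Note that maxPoolSize is only an upper bound: the driver opens connections on demand, so a stable count around 29 simply means the pool has not needed more.

// db.js - hypothetical module: create a single MongoClient at startup and share it.
const { MongoClient } = require('mongodb');

const uri = 'mongodb://localhost:27017'; // placeholder; the real URI carries the auth options shown above

const client = new MongoClient(uri, {
  maxPoolSize: 300 // upper bound only; connections are opened lazily as operations demand them
});

// Connect once; every other module imports this instead of calling connect() again.
const clientPromise = client.connect();

module.exports = { client, clientPromise };

// usage elsewhere (hypothetical):
// const { clientPromise } = require('./db');
// const db = (await clientPromise).db('botdb');
// await db.collection('users').findOne({ userId: someId });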

MongoNetworkError Connection Timed Out

I have a MongoDB server on an EC2 instance. My Meteor app is hosted on Heroku and connects to that server. We'd had about two months of uptime, and just yesterday things crapped out, causing the app to crash.
Logs show:
Exception while polling query {"collectionName":"Foo","selector":{"barId":"9hcnn7vreGbM9dKSH"},"options":{"transform":null}} { MongoNetworkError: connection 5 to IP:27017 timed out
at TLSSocket.<anonymous> (/app/.meteor/heroku_build/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:259:7)
at Promise.asyncApply (packages/mongo/mongo_driver.js:1042:14)
This then repeats for what seems like an endless number of lines, and I can see the same thing happening for several other queries. It seems clients had several connections trying to query data and the logs show the failure for all of them?
Restarting the Heroku dynos seems to have resolved things. I also checked the mongod.log file. I'm seeing "msg":"Slow query" entries on some lines, but other than that, nothing stands out (or rather, I'm not sure what to look for).
Never had this issue before. Sounds like it could just be an anomaly with the connection, or maybe the DB being bogged down? Any insights? Thanks!

Understand Cassandra pooling options (setCoreConnectionsPerHost and setMaxConnectionsPerHost)?

I recently started working with Cassandra and was reading more about connection pooling here. I was confused about pool size and couldn't understand what this means here:
poolingOptions
.setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
.setMaxConnectionsPerHost( HostDistance.LOCAL, 10)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost( HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(2000);
Below is what I want to understand in detail:
What do setCoreConnectionsPerHost, setMaxConnectionsPerHost, and setMaxRequestsPerConnection mean?
What do LOCAL and REMOTE mean here?
If someone can explain with an example, it will really help me understand.
We have a 6-node cluster, all in one DC, with RF 3, and we read/write at LOCAL_QUORUM.
The Cassandra protocol allows multiple queries to be submitted for execution over the same network connection in parallel, without waiting for an answer. setMaxRequestsPerConnection sets how many in-flight queries a single connection can carry simultaneously. The maximum depends on the protocol version (since protocol v3 it's 32k), but in practice you should keep it around 1000-2000; if you need more, it's a sign that the server is not keeping up with your queries.
The driver opens connections to every node in the cluster, and these connections are marked either as LOCAL, if they go to nodes in the data center that is local to the application (either set explicitly in the load balancing policy or inferred from the first contact point), or as REMOTE, if they go to nodes in other data centers.
The driver can also open several connections to each node, and there are 2 values that control their number: core, the minimum number of connections, and max, the upper limit. The driver will open new connections if you submit new requests that don't fit within the capacity of the existing connections.
So in your example:
poolingOptions
.setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
.setMaxConnectionsPerHost( HostDistance.LOCAL, 10)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost( HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(2000);
for the local data center, it will open 4 connections per node initially, and this may grow up to 10 connections
for other data centers, it will open 2 connections per node, which could grow up to 4 connections
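Since the rest of this page is Node.js-focused, here is a rough equivalent with the DataStax Node.js driver (cassandra-driver). This is a hedged sketch with placeholder contact points and data center name; the Node.js driver manages pool growth a bit differently from the Java driver, but the LOCAL/REMOTE distinction and the per-connection request limit work the same way.

// Rough Node.js equivalent of the Java pooling options above (cassandra-driver).
// Contact points and data center name are placeholders.
const cassandra = require('cassandra-driver');
const distance = cassandra.types.distance;

const client = new cassandra.Client({
  contactPoints: ['10.0.0.1', '10.0.0.2'],
  localDataCenter: 'dc1',            // nodes in this DC are LOCAL, all others REMOTE
  pooling: {
    coreConnectionsPerHost: {
      [distance.local]: 4,           // connections opened to each LOCAL node
      [distance.remote]: 2           // connections opened to each REMOTE node
    },
    maxRequestsPerConnection: 2000   // in-flight requests allowed per connection
  }
});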

cassandra connections spikes load issue

I am using Cassandra with the following setup:
21 nodes, AWS EC2 i3.2xlarge, version 3.11.4.
The application opens about 5000 connections per node (so roughly 100k connections across the cluster) using the DataStax Java driver.
The application autoscales and frequently opens/closes connections.
The number of connections the app servers open at once can reach 500 per node (opened simultaneously on all nodes, so about 10k connections opening at the same time across the cluster).
This causes load spikes on Cassandra and read/write latency.
I have noticed that each time connections open/close, there is a high number of reads from system_auth.roles and system_auth.role_permissions.
How can I prevent the load and resolve this issue?
You need to modify your application to work with as few connections as possible. Keep the following in mind:
Create the Cluster/Session object once at startup and keep it. Session initialization is a very expensive operation; it adds load to Cassandra as well as to your application (see the sketch after this list).
You may increase the number of simultaneous requests per connection instead of opening new connections. The protocol allows up to 32k requests per connection, although if you have too many requests in flight, it's a sign that your Cassandra cluster isn't keeping up with the workload and can't answer fast enough. See the documentation on connection pooling.
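A minimal sketch of the "create the session once at startup and keep it" pattern. It is shown with the DataStax Node.js driver to stay consistent with the rest of this page; the question itself uses the Java driver, where the same idea applies to the Cluster/Session objects. Contact points, data center, and keyspace are placeholders.

// cassandra.js - hypothetical module: one Client (session) created at startup and reused.
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['10.0.0.1', '10.0.0.2'], // placeholder contact points
  localDataCenter: 'dc1',                  // placeholder data center name
  keyspace: 'app'                          // placeholder keyspace
});

// Connect once at startup; every request handler reuses this client instead of
// opening (and later closing) its own connections.
const ready = client.connect();

module.exports = { client, ready };

// usage elsewhere (hypothetical):
// const { client, ready } = require('./cassandra');
// await ready;
// await client.execute('SELECT * FROM users WHERE id = ?', [id], { prepare: true });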

pgbouncer - auroraDB cluster not load balancing correctly

I am using an Aurora DB cluster with 2 readers and pgBouncer to maintain a connection pool.
My application is very read-intensive and fires a lot of SELECT queries.
The problem I am facing is that my 2 read replicas are not being used in parallel.
I can see a pattern where all connections move to one replica while the other serves 0 connections, and after some time the situation flips: the second replica serves all connections and the first serves 0.
I investigated this and found that Aurora cluster load balancing is done by time-slicing in 1-second intervals.
My guess is that when pgBouncer creates the connection pool, all connections are created within a 1-second window and therefore all end up on one read replica.
Is there any way I can correct this?
The DB endpoint is a Route 53 DNS name, and load balancing is done via DNS round robin each time you resolve the DNS. When you use pgBouncer, is it resolving the DNS once and opening connections to the resolved IP? If so, then it is expected that all your connections land on the same instance. You could fix this conceptually in multiple ways (I'm not too familiar with pgBouncer), but you basically need to make the library resolve the DNS explicitly for each connection, or explicitly add all the instance endpoints to the configuration. The latter is not recommended if you plan on issuing writes through this connection pool: you don't have any control over which instance stays the writer, so you may inadvertently end up sending your writes to a replica.
"Aurora cluster load balancing is done by time-slicing in 1-second intervals"
I'm not too sure where you read that. Could you share some references?
