I've been working with MongoDB for a while now and I've been liking it a lot. One thing I do not understand, however, is "connections". I've searched online and everything just has very vague and basic answers. I'm using MongoDB's cloud service called "Atlas", and it describes the connection count as:
The number of currently active connections to this server. A stack is allocated per connection; thus very many connections can result in significant RAM usage.
However, I have a few questions.
First, what exactly is a connection? As I understand it, a connection is made between the server and the database service: essentially, when I call mongoose.connect(...), a connection is made, so there should be at most one connection. However, while testing my program I noticed the connection count was at 2, and at some moments it spiked all the way to 7, then dropped to 5 and kept fluctuating. Does a "connection" have anything to do with the client? The Atlas dashboard also says my maximum number of connections is 500. What does this value represent? Does it mean only 500 users can use my website at once? If so, how can I increase that number, or how can I make sure it is never exceeded? Or is a connection something that gets opened and that I have to close manually myself? I ask because none of the tutorials I've learned from ever mentioned anything like that.
Thanks!
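mongoose.connect doesn't limit itself to one connection to the MongoDB server.
By default, Mongoose 5.x creates a pool of 5 connections to MongoDB. In Mongoose 6 and later the option is called maxPoolSize and, as far as I know, defaults to 100.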
You can change this default if necessary.
mongoose.connect(mongoURI, { poolSize: 200 }); // Mongoose 5.x; Mongoose 6+ uses maxPoolSize instead
See https://mongoosejs.com/docs/connections.html
You see more connections in Atlas than you might expect because the cluster also makes internal connections to keep itself running. These may include connections from:
Client applications (for example, your app's connection pool).
The primary and secondaries, which connect to each other.
The monitoring agent, since Atlas is a hosted service and everything is monitored.
The automation agent, which also connects to the cluster.
This is why a newly created Atlas cluster always shows some connections on the Metrics page, even when no client is connected.
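To make the "500 connections" question more concrete: a connection is not a website user. Each app process holds one pool that many users' requests share, so the number to budget is roughly (app instances × pool size) plus whatever the cluster uses internally. Below is a small sketch of that arithmetic; the function name connectionBudget, the pool size of 100 and the reserve of 20 internal connections are illustrative assumptions, not Atlas figures.

```javascript
// Rough connection budget for a hosted cluster.
// All inputs are illustrative assumptions; check your own pool size and the
// Atlas metrics page for the real numbers.
function connectionBudget({ limit, appInstances, poolSizePerInstance, internalReserve }) {
  // Worst case: every app instance has filled its pool.
  const appConnections = appInstances * poolSizePerInstance;
  const total = appConnections + internalReserve;
  return {
    appConnections,
    total,
    fits: total <= limit,
    // Largest number of instances that still fits with the same pool size.
    maxInstances: Math.floor((limit - internalReserve) / poolSizePerInstance),
  };
}

// 4 app servers, each with a pool of 100, plus ~20 internal connections (assumed).
const budget = connectionBudget({
  limit: 500,
  appInstances: 4,
  poolSizePerInstance: 100,
  internalReserve: 20,
});
console.log(budget.total, budget.fits, budget.maxInstances); // 420 true 4
```

Lowering the pool size (poolSize / maxPoolSize) is the lever you control from the app; as far as I know, the connection limit itself is tied to the Atlas cluster tier.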
Related
I have a Node.js/Express/MongoDB/Mongoose app hosted on AWS Elastic Beanstalk.
The problem is that the Elastic Beanstalk health status degrades at random times. It happens because every request that needs the database produces the following in the logs:
*1360931 upstream timed out (110: Connection timed out) while reading response header from upstream
This happens no matter how much data I load, even with the smallest amount. It can last from one minute up to 20 minutes, and then it works again on its own; it is completely random.
I can also make it work immediately by restarting the environment (I connect to MongoDB with a connection string on app startup).
Meanwhile, requests that don't touch the database work 100% of the time.
The thing is, while database queries are failing, I can connect to the same database from localhost and queries work like a charm, and they are really fast too.
What's even stranger is that I have 4 other identical apps with the same setup, and none of them has this problem; only this app does!
What is the problem here?
The above error usually means that the server closed the connection because of a shorter timeout, but your application is not aware of it. Check your connection string and adjust the timeouts (possibly decreasing them). Example connection string:
MONGO_URI=mongodb://user:password@127.0.0.1:27017/dbname?keepAlive=true&poolSize=30&autoReconnect=true&socketTimeoutMS=360000&connectTimeoutMS=360000
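The separator between the credentials and the host in a MongoDB URI must be @; a # would start a URI fragment and break the string. If you assemble the URI in code, a sketch like the one below avoids that and also percent-encodes the credentials, which MongoDB requires for characters such as @, : or # in a password. buildMongoUri is a hypothetical helper, and only the two timeout options are included to keep it short.

```javascript
// Hypothetical helper: assembles a single-host MongoDB connection string.
// Credentials are percent-encoded, because characters such as '@', ':' or '#'
// in a user name or password would otherwise break URI parsing.
function buildMongoUri({ user, password, host, port, db, options = {} }) {
  const auth = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  // URLSearchParams turns { socketTimeoutMS: 360000 } into "socketTimeoutMS=360000".
  const query = new URLSearchParams(options).toString();
  return `mongodb://${auth}@${host}:${port}/${db}${query ? `?${query}` : ""}`;
}

const uri = buildMongoUri({
  user: "user",
  password: "p@ss#word", // made-up password containing reserved characters
  host: "127.0.0.1",
  port: 27017,
  db: "dbname",
  options: { socketTimeoutMS: 360000, connectTimeoutMS: 360000 },
});
console.log(uri);
// mongodb://user:p%40ss%23word@127.0.0.1:27017/dbname?socketTimeoutMS=360000&connectTimeoutMS=360000
```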
I've gone through many articles and the official TypeORM documentation on setting up connection pooling with TypeORM and PostgreSQL, but couldn't find a solution.
All the articles I've seen so far explain adding the max/poolSize attribute to the ORM configuration or the connection pool settings, but this does not set up a pool of idle connections in the database.
When I check the pg_stat_activity table after the application bootstraps, I don't see any idle connections in the DB, but when a request is sent to the application I can see an active connection to the DB.
The max/poolSize attribute defined under extra in the ORM configuration merely sets the maximum number of connections the application can open to the DB concurrently.
What I expect is that during bootstrap, the application opens a predefined number of connections to the database and keeps them idle. When a request comes in, one of the idle connections is picked up and the request is served.
Can anyone share insights on how to define this configuration with TypeORM and PostgreSQL?
TypeORM uses node-postgres, which has a built-in pool (pg-pool) and, as far as I can tell, doesn't have that kind of option. It supports a max, and it creates more connections as your app needs them. So if you want to pre-warm the pool (or load/stress test it) and see those additional connections, you'll need to write some code that kicks off a bunch of concurrent async queries/inserts.
I think I understand what you're looking for, as I used to do enterprise Java, and connection pools in servers like GlassFish and JBoss have more options, such as keeping hot, unused connections in the pool. There are no such options in TypeORM/node-postgres, though.
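A minimal sketch of the pre-warming idea from this answer, assuming a lazily growing pool: LazyPool below is a hypothetical in-memory stand-in (not pg-pool), and prewarm fires n queries at once so that none of them finds an idle connection and the pool has to open n of them. With node-postgres you would pass its Pool instead, but keep in mind that pg-pool closes idle clients after idleTimeoutMillis (10 seconds by default), so pre-warmed connections won't stay idle indefinitely unless you change that setting.

```javascript
// Hypothetical in-memory stand-in for a lazily growing pool (NOT pg-pool):
// it opens a new connection only when a query finds no idle one, up to `max`.
class LazyPool {
  constructor(max) {
    this.max = max;
    this.idle = 0; // open connections not currently in use
    this.busy = 0; // open connections running a query
    this.waiting = []; // resolvers of callers queued because the pool is full
  }

  get size() {
    return this.idle + this.busy;
  }

  acquire() {
    if (this.idle > 0) {
      this.idle -= 1; // reuse an idle connection
      this.busy += 1;
      return Promise.resolve();
    }
    if (this.size < this.max) {
      this.busy += 1; // "open" a new connection
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  release() {
    const next = this.waiting.shift();
    if (next) {
      next(); // hand the connection straight to a queued caller; it stays busy
      return;
    }
    this.busy -= 1;
    this.idle += 1;
  }

  async query(sql) {
    await this.acquire();
    try {
      await new Promise((resolve) => setTimeout(resolve, 10)); // pretend the DB takes 10 ms
      return `result of ${sql}`;
    } finally {
      this.release();
    }
  }
}

// Pre-warm: start n queries at once so each one needs its own connection.
function prewarm(pool, n) {
  return Promise.all(Array.from({ length: n }, () => pool.query("SELECT 1")));
}

const pool = new LazyPool(10);
console.log(pool.size); // 0: nothing is opened at startup
prewarm(pool, 5).then(() => {
  console.log(pool.size, pool.idle); // 5 5: five connections now sit idle
});
```

The same trick doubles as a tiny load test: asking for more queries than max shows the extra callers queueing instead of opening new connections.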
I'm thinking about building an API on Node.js with Mongoose. I read that Mongoose uses one connection per app.
But say 300,000 users join a room to answer questions in real time: will Mongoose/MongoDB handle it? Will the server itself even handle it?
Thinking on the database side only:
The mongod executable has a parameter (--maxConns) for setting the maximum number of incoming connections; prior to v2.6 there was a hard limit on it. Now, as the docs say: "This setting has no effect if it is higher than your operating system’s configured maximum connection tracking threshold", so also check the corresponding settings on Linux.
Besides that, you MUST consider a sharded cluster for this kind of load.
I am using MongoDB, so I connect through MongoClient.connect,
but I have to call it in every route where I want to work with the database.
I tried to preload it into an object, but then changes are not visible until the server is restarted. Right now it works properly; I'm only a bit worried about performance.
Is there a better way to do that?
The correct answer is "it depends".
For a simple desktop application where your Node.js program is the only client: sure, a persistent connection is fine.
For an enterprise application with 100s or 1000s of concurrent users each connecting independently: no, you probably do NOT want to hold the connection open "forever".
One possible solution for the latter scenario is Connection Pooling.
You should only need to connect to your DB once - when the server starts. As long as the server is running, the connection should persist. There is no reason to connect multiple times.
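A minimal sketch of "connect once, reuse everywhere": createDbGetter (a hypothetical helper) memoizes the connection promise, so every route that calls getClient() shares a single client and its pool. It caches the client, not query results, so each query still sees current data. With the official driver, connect would be something like () => MongoClient.connect(uri); here a stand-in function is used so the sketch runs on its own.

```javascript
// Hypothetical helper: memoizes the connection promise so the app connects once
// and every route reuses the same client (and therefore the same pool).
function createDbGetter(connect) {
  let clientPromise = null;
  return function getClient() {
    if (!clientPromise) {
      clientPromise = connect().catch((err) => {
        clientPromise = null; // let the next call retry if connecting failed
        throw err;
      });
    }
    return clientPromise;
  };
}

// Stand-in for () => MongoClient.connect(uri), so the sketch runs without a server.
let connectCalls = 0;
const getClient = createDbGetter(async () => {
  connectCalls += 1;
  return { name: "fake-client" };
});

// Two "routes" asking for the client at the same time still share one connection.
Promise.all([getClient(), getClient()]).then(([a, b]) => {
  console.log(connectCalls, a === b); // 1 true
});
```

Storing the promise rather than the resolved client means routes that run before the connection finishes simply wait for it instead of opening their own.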
We have a Node.js project based on restify, and we use RethinkDB as the database. The problem is that RethinkDB needs to be accessed from different parts of the code (route handlers, middlewares), but not on every request. What is the best way to connect to RethinkDB in this case?
I see the following options:
have one long-lived connection stored somewhere (the approach we use now),
connect to RethinkDB on each HTTP request, with some of those connections potentially never used,
connect in each part individually, with potentially several connections per HTTP request, but without useless connections.
I ask because I'm not sure how well RethinkDB handles short- versus long-lived connections, or how expensive they are. For instance, MongoDB prefers long-lived connections, but all examples in the RethinkDB docs use one connection per HTTP request.
I recommend a connection pool or one connection per query, especially if you use features like changefeeds, which are recommended to run on their own connection.
When you use a single connection for everything, you also have to handle reconnecting when the connection times out or breaks. I think it's easier to just use one connection per query, or to share one connection per request/response.
Just make sure to close each connection after using it; otherwise you will leak connections and eventually new connections cannot be created.
Some drivers go further and don't require you to think about connections at all, such as https://github.com/neumino/rethinkdbdash
The Elixir RethinkDB driver also has an issue about adding a connection pool: https://github.com/hamiltop/rethinkdb-elixir/issues/32
RethinkDB itself has an issue about connection pooling: https://github.com/rethinkdb/rethinkdb/issues/281
That's probably where the community is heading, too.
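The "close each connection after use" advice can be sketched with a small wrapper, under the assumption that the driver's connect and close calls return promises (the official rethinkdb JavaScript driver's r.connect and connection.close should when called without a callback, but verify against your driver version). withConnection and the fake connection below are hypothetical; in a restify handler, work would run the actual query.

```javascript
// Hypothetical helper: open a connection, run the work, and always close it,
// even if the work throws, so connections are never leaked.
async function withConnection(open, work) {
  const conn = await open();
  try {
    return await work(conn);
  } finally {
    await conn.close();
  }
}

// Stand-in for a real driver connection; open/close are async like most drivers'.
let liveConnections = 0;
async function openFakeConnection() {
  liveConnections += 1;
  return {
    close: async () => {
      liveConnections -= 1;
    },
  };
}

withConnection(openFakeConnection, async () => "query result").then((result) => {
  console.log(result, liveConnections); // query result 0
});
```

This is the one-connection-per-query option; a pool (or a driver like rethinkdbdash) would replace the open/close pair with borrowing and returning a connection.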