MongoDB queries are taking 2-3 seconds from Node.js app on Heroku

I am having major performance problems with MongoDB. Simple find() queries sometimes take 2,000-3,000 ms to complete in a database with fewer than 100 documents.
I am seeing this both with a MongoDB Atlas M10 instance and with a cluster I set up on DigitalOcean on VMs with 4 GB of RAM. When I restart my Node.js app on Heroku, queries perform well (less than 100 ms) for 10-15 minutes, but then they slow down.
Am I connecting to MongoDB or querying it incorrectly from Node.js (my application code is below)? Or is this a lack of hardware resources in a shared VM environment?
I've done all the troubleshooting I know how to do with explain() and the mongo shell. Any help would be greatly appreciated.
var Koa = require('koa'); // v2.4.1
var Router = require('koa-router'); // v7.3.0
var MongoClient = require('mongodb').MongoClient; // v3.1.3

var app = new Koa();
var router = new Router();
app.use(router.routes());

// Connect to MongoDB
async function connect() {
    try {
        var client = await MongoClient.connect(process.env.MONGODB_URI, {
            readConcern: { level: 'local' }
        });
        var db = client.db(process.env.MONGODB_DATABASE);
        return db;
    }
    catch (error) {
        console.log(error);
    }
}

// Add MongoDB to Koa's ctx object
connect().then(db => {
    app.context.db = db;
});

// Get company's collection in MongoDB
router.get('/documents/:collection', async (ctx) => {
    try {
        var query = { company_id: ctx.state.session.company_id };
        var res = await ctx.db.collection(ctx.params.collection).find(query).toArray();
        ctx.body = { ok: true, docs: res };
    }
    catch (error) {
        ctx.status = 500;
        ctx.body = { ok: false };
    }
});

app.listen(process.env.PORT || 3000);
UPDATE
I am using MongoDB Change Streams and standard Server-Sent Events to provide real-time updates to the application UI. I turned these off and now MongoDB appears to be performing well again.
Are MongoDB Change Streams known to impact read/write performance?

Change Streams can indeed affect the performance of your server, as noted in this SO question.
As mentioned in the accepted answer there:
The default connection pool size in the Node.js client for MongoDB is 5. Since each change stream cursor opens a new connection, the connection pool needs to be at least as large as the number of cursors.
const mongoConnection = await MongoClient.connect(URL, { poolSize: 100 });
(Thanks to MongoDB Inc. for investigating this issue.)
Increasing your pool size should bring back your normal performance.
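For illustration, here is a rough sketch of what that can look like together with a change stream. The collection name, pool size of 100, and environment variables are assumptions to adapt to your setup; `poolSize` applies to the 3.x driver used in the question, and was renamed `maxPoolSize` in 4.x+.

const { MongoClient } = require('mongodb');

async function start() {
    // Larger pool so regular queries are not starved by change stream cursors.
    const client = await MongoClient.connect(process.env.MONGODB_URI, { poolSize: 100 });
    const db = client.db(process.env.MONGODB_DATABASE);

    // Each change stream holds its own cursor, so budget the pool for every stream you open.
    const changeStream = db.collection('documents').watch();
    changeStream.on('change', (change) => {
        console.log('change event:', change.operationType);
    });

    return db;
}

start().catch(console.error);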

I'd suggest you do more logging. Slow queries that only show up a while after a restart may point to something worse than you think.
For a modern database/web app running on a normal machine, it is not easy to hit performance issues like this if everything is set up correctly. There might be a memory leak, other unreleased resources, or network congestion.
IMHO, you should first determine whether it's a network problem. You can do this by enabling the slow query log on MongoDB and logging in your code where each query begins and ends.
If the network is totally fine and you see no MongoDB slow queries, then something is going wrong inside your own application, and detailed logging should show where the queries become slow.
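For example, a minimal sketch of both sides of that logging; the 100 ms threshold, collection name, and db handle are assumptions:

// In the mongo shell: profile (and log) operations slower than 100 ms.
// db.setProfilingLevel(1, 100)

// In the application: time each query around the driver call.
async function timedFind(db, collectionName, query) {
    const startedAt = Date.now();
    const docs = await db.collection(collectionName).find(query).toArray();
    console.log(`find on ${collectionName}: ${Date.now() - startedAt} ms, ${docs.length} docs`);
    return docs;
}

Comparing the application-side timings with the profiler output tells you whether the time is spent in MongoDB itself or somewhere between your app and the database.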
Hope this helps.

Related

next.js and mongodb coherence?

I googled a lot but still have no clear solution to my issue.
When connecting to MongoDB, you usually establish a connection and close it once the job is done.
Since Next.js (and probably Node.js) is single-threaded, it sometimes happens that two requests are processed asynchronously: while one request has established the connection to the database, the other one is closing that exact same connection, so the first request runs into a "Topology is closed" exception. I have the feeling that the MongoDB driver client is shared.
Is there something I have not understood correctly here?
try {
    await client.connect()
    const database = client.db("test")
    const collection = database.collection("test")
    const newDataset = await collection.insertOne({})
    return newDataset.insertedId.toString()
} finally {
    await client.close();
}
As stated in the comments, I've seen a lot of examples and questions here on Stack Overflow where a database connection is established in each received request (example below). This has no benefit and is "bad" because it just takes time and makes no sense. E.g.:
app.get("/", (req, res) => {
MongoClient.connect("...", (err, client) => {
// do what ever you want here
client.close();
});
});
If your application needs a database connection, establish the connection "in the startup phase" and keep it open. There is no reason to open and close the database connection for each request.
const mongodb = require("monogdb");
const express = require("express");
const app = express();
// some custom init stuff
// e.g. require your route handler etc.
mongodb.MongoClient("...", (err, client) => {
// do what ever you want with the db connection now
// e.g. monkey patch it, so you can use it in other files
// (There are better ways to handle that)
mongodb.client = client;
// or the better way
// pass it as function parameter
require("./routes")(app, client);
app.listen(8080, () => {
console.log("http server listening");
});
});
As you can see in the code above, we first create a database connection and then do other stuff. This has some advantages:
- If your credentials are invalid, your application is not externally reachable because the HTTP server is not started
- You have a single connection for all requests
- Database queries are potentially faster because you don't have to wait for a db connection to be established first
NOTE: the code above was "inline coded" here and is not tested, but I think it illustrates the concept behind my statement.
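For instance, the `require("./routes")(app, client)` line could be backed by a module like the sketch below; the file name, database name, and route are made up for illustration:

// routes.js (hypothetical) - receives the Express app and the shared MongoClient
module.exports = function (app, client) {
    const db = client.db("mydb"); // assumed database name

    app.get("/users", (req, res) => {
        // Reuses the single connection pool that was opened at startup.
        db.collection("users").find({}).toArray((err, docs) => {
            if (err) return res.status(500).json({ ok: false });
            res.json(docs);
        });
    });
};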

Can you keep a PostgreSQL connection alive from within a Next.js API?

I'm using Next.js for my side project. I have a PostgreSQL database hosted on ElephantSQL. Inside the Next.js project, I have a GraphQL API set up using the apollo-server-micro package.
Inside the file where the GraphQL API is set up (/api/graphql), I import a database helper-module. Inside that, I set up a pool connection and export a function which uses a client from the pool to execute a query and return the result. This looks something like this:
// import node-postgres module
import { Pool } from 'pg'

// set up pool connection using environment variables with a maximum of three active clients at a time
const pool = new Pool({ max: 3 })

// query function which uses next available client to execute a single query and return results on success
export async function queryPool(query) {
    let payload
    // checkout a client
    try {
        // try executing queries
        const res = await pool.query(query)
        payload = res.rows
    } catch (e) {
        console.error(e)
    }
    return payload
}
The problem I'm running into is that the Next.js API doesn't (always) seem to keep the connection alive but rather opens a new one (either for every connected user or maybe even for every API query), which results in the database quickly running out of connections.
I believe that what I'm trying to achieve is possible for example in AWS Lambda (by setting context.callbackWaitsForEmptyEventLoop to false).
It is very possible that I don't have a proper understanding of how serverless functions work and this might not be possible at all but maybe someone can suggest me a solution.
I have found a package called serverless-postgres and I wonder if that might solve it, but I'd prefer to use the node-postgres package instead as it has much better documentation. Another option would probably be to move away from the integrated API functionality entirely and build a dedicated backend server that maintains the database connection, but obviously this would be a last resort.
I haven't stress-tested this yet, but it appears that the mongodb next.js example solves this problem by attaching the database connection to global in a helper function. The important bit in their example is here.
Since the pg connection is a bit more abstract than mongodb, it appears this approach just takes a few lines for us pg enthusiasts:
// eg, lib/db.js
const { Pool } = require("pg");

if (!global.db) {
    global.db = { pool: null };
}

export function connectToDatabase() {
    if (!global.db.pool) {
        console.log("No pool available, creating new pool.");
        global.db.pool = new Pool();
    }
    return global.db;
}
then in, eg, our API route, we can just:
// eg, pages/api/now
export default async (req, res) => {
    const { pool } = connectToDatabase();
    try {
        const time = (await pool.query("SELECT NOW()")).rows[0].now;
        res.end(`time: ${time}`);
    } catch (e) {
        console.error(e);
        res.status(500).end("Error");
    }
};
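If you want to keep the connection cap from the question's ElephantSQL setup, the same helper can pass options through to the pool. This is only a sketch: the `max: 3` mirrors the question, and pg is assumed to read the usual PG* environment variables.

// lib/db.js variant with an explicit client limit
const { Pool } = require("pg");

if (!global.db) {
    global.db = { pool: null };
}

export function connectToDatabase() {
    if (!global.db.pool) {
        console.log("No pool available, creating new pool.");
        // pg reads PGHOST, PGUSER, PGPASSWORD, PGDATABASE and PGPORT from the environment;
        // `max` caps how many clients this instance keeps open at once.
        global.db.pool = new Pool({ max: 3 });
    }
    return global.db;
}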

How can I verify I don't need the mLab add-on for my Heroku node.js app?

After reading through the mLab -> Atlas migration plan a few times, I decided I'd try a different way. My coding background is mainly asm on mcs51 so I'm something of a n00b in the node.js/mongo/heroku world. I barely understood half of the migration process.
So I wrote a small test app following this blog entry and then used what I'd learned to modify my actual app to talk to Atlas directly. I exported the collections from the old db to JSON, then imported them into the Atlas version to recreate the database. Everything appears to be working correctly; I don't see any data going into the old db and it looks like the new Atlas db is getting all the action.
But I'm leery of deleting the mLab add-on from Heroku until I've verified that it's truly not needed any more, because I'm pretty sure that I won't be able to recreate it if it turns out I've missed something.
So my question is, how can I ensure I'm no longer using the mLab add-on? I don't really understand what it was doing for me in the first place so I'm not sure how to verify I'm not using it any more.
Here are the relevant code snippets I'm using to access the Atlas db...
function myEncode(str) { // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent
    return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
        return '%' + c.charCodeAt(0).toString(16);
    });
}

const ATLASURI = process.env.ATLASURI;
const ATLASDB = process.env.ATLASDB;
const ATLASUSER = process.env.ATLASUSER;
const ATLASPW = myEncode(process.env.ATLASPW); // wrapper needed to handle strong passwords...
const dbURL = "mongodb+srv://" + ATLASUSER + ":" + ATLASPW + "@" + ATLASURI + "/" + ATLASDB + "?retryWrites=true&w=majority";

var GoogleStrategy = require('passport-google-oauth20').Strategy;
const { MongoClient } = require('mongodb');
const client = new MongoClient(dbURL, { useNewUrlParser: true, useUnifiedTopology: true });
var store = new MongoDBStore({ uri: dbURL, collection: 'Sessions' });

var db = undefined;
client.connect(async function(err) {
    if (err) { console.log("Error:\n" + String(err)); }
    db = await client.db(ATLASDB);
    console.log("Connected to db!");
    banner();
});

PouchDb in NodeJs: replication ceases after half an hour. Why?

I've developed a system with CouchDB 2.2.0 as the master database, PouchDB 7.0.0 in VueJS clients and a database monitor server using PouchDB under NodeJS 8.11.1.
I can change data in CouchDB using Fauxton, and the browser and mobile (PWA) clients update quickly even if left running for days. This is NOT true of the server running PouchDB in NodeJS: it will faithfully respond to the same changes unless there are no changes for 20 minutes or more; after that it simply and silently ignores any and all events in CouchDB.
I am setting about preparing a skeletal implementation with NodeJS and Pouch and as few other dependencies as possible and will update this question if I discover something; in the meantime I would like to ask...
Is there some well known reason why this might be happening?
How can I track down the cause without starting from scratch and gradually rebuilding the complete app brick by brick until it fails?
Update 18-10-03
I seem to have solved the problem by using an fs writeStream instead of console.log, without really understanding why that should make a difference.
My complete test app looks like this :
const fs = require('fs');
const PouchDB = require('pouchdb');
const adptrMemory = require('pouchdb-adapter-memory');

var stream = fs.createWriteStream("/tmp/pouchLog", { flags: 'a' });
const LG = (msg) => stream.write(`${msg}\n`);

const movesDB = process.env.LOCAL_DB;
LG(`Local :: ${movesDB}`);
LG(`Remote :: ${process.env.REMOTE_DB}`);

PouchDB.plugin(adptrMemory);
const movesDatabaseLocal = new PouchDB(movesDB);
const movesDatabaseRemote = new PouchDB(process.env.REMOTE_DB);
const repFromFilter = 'post_processing/by_new_inventory';

movesDatabaseLocal.replicate.from(movesDatabaseRemote, {
    live: true,
    retry: true,
    filter: repFromFilter,
})
    .on('change', (response) => {
        LG(`${movesDB} *** NEW EXCHANGE REQUEST DELTA *** `);
        LG(`Database replication from: ${response.docs.length} records.`);
    })
    .on('active', () => {
        LG(`${movesDB} *** NEW EXCHANGE REQUEST REPLICATION RESUMED ***`);
    })
    .on('paused', () => {
        LG(`${movesDB} *** NEW EXCHANGE REQUEST REPLICATION ON HOLD ***`);
    })
    .on('denied', (info) => {
        LG(`${movesDB} *** NEW EXCHANGE REQUEST REPLICATION DENIED *** ${info}`);
    })
    .on('error', err => LG(`Database error ${err}`));
Note that I still have not built back all the original functionality. I can say that the failure after an idle period does occur if the above code uses console.log, but goes away after switching to streamed logging.

Is it necessary to close mongodb connection in nodejs?

I'm new to Node.js and MongoDB. On the MongoDB native driver website they close the connection after each request, but that seems very slow and problematic for high-traffic websites. I'm just curious to know: is it necessary to do that, or can I declare a global variable and keep a reference to the DB like this:
var mongodbClient = require('mongodb').MongoClient;
var db;

function connect() {
    mongodbClient.connect('connection string', function (err, mdb) {
        db = mdb;
    });
}
connect();

function insert(query, collection, fn) {
    db.collection(collection)
        .insert(query, function (er, da) {
            fn(er, da);
        });
}

function find(query, collection, fn) {
    db.collection(collection)
        .find(query).toArray(function (er, da) {
            fn(er, da);
        });
}
I don't want to use mongoose and prefer to learn and understand what's going on under the hood.
The examples available in the documentation are not actually good for real-life use cases. If you are using a server framework, you can normally connect to Mongo once and share a reference to the connection throughout the application. I use hapi and connect to the server via a plugin, which allows me to store the handle to the open connection. This also lets you clean up on server shutdown. There are many modules for managing Mongo, such as mongoose, waterline, or wadofgum-mongodb, which I recently wrote.
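If you are not using a framework plugin, the same idea can be done by hand with a tiny helper that connects once and hands out the same handle everywhere. This is only a sketch, assuming the 3.x driver (where connect() resolves to a client) and a made-up database name:

// db.js (hypothetical helper) - connect once, share the handle everywhere
const { MongoClient } = require('mongodb');

let dbPromise = null;

function getDb() {
    if (!dbPromise) {
        // Connection string and database name are assumptions; adjust to your setup.
        dbPromise = MongoClient.connect(process.env.MONGODB_URI)
            .then((client) => client.db('mydb'));
    }
    return dbPromise;
}

module.exports = { getDb };

// Usage elsewhere, e.g. in a request handler:
// const { getDb } = require('./db');
// const db = await getDb();
// const docs = await db.collection('users').find({}).toArray();

Awaiting the promise also avoids the race in the question's snippet, where `db` can still be undefined if a query fires before the connect callback has run.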
