Nodejs application hangs on heavy requests - node.js

I am using a Node.js Express server with pg-promise. Some of my database queries take a long time to return results. For such queries I set a 3-second timeout that fails the promise if the pg-promise query takes longer, and the server returns an error. However, if I send subsequent requests with the same (heavy) queries, the application hangs and takes a while to start processing the new request. It does not throw any error, which makes it difficult to debug. What could cause the Node application to hang?

Whenever somebody asks about queries taking too long to execute from the very start, it always points to a misunderstanding of the fundamentals of developing and implementing database services.
Those issues typically stem from the following problems:
Bad database design, or lack of essential performance considerations
Bad query execution planning, i.e. use of very inefficient query logic
Bad use of the connection pool, i.e. the database connectivity issues
Combinations of the above
So when you try to address such a huge pool of possible problems with a brief problem description and no code examples, you will never get a usable answer. The question is far too broad, and answering it would require covering too many topics related to writing database services.
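As one illustration of the "bad use of the connection pool" item, and only as a guess at what might be happening in the question: a timeout built with Promise.race rejects the caller's promise but does not cancel the work behind it. The sketch below uses no database at all. createFakeDb, slowQuery and withTimeout are hypothetical names, and the "connection" is just a counter.

```javascript
// Hypothetical stand-in for a database: each slow query "holds a connection"
// until its own timer fires, no matter who is still waiting for the result.
function createFakeDb() {
  let inUse = 0;
  return {
    inUse: () => inUse,
    slowQuery(durationMs) {
      inUse += 1;
      return new Promise((resolve) => {
        setTimeout(() => {
          inUse -= 1; // the connection is only released here
          resolve('rows');
        }, durationMs);
      });
    },
  };
}

// Rejects after `ms`, but has no way to stop the work behind `promise`.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function demo() {
  const fakeDb = createFakeDb();
  try {
    await withTimeout(fakeDb.slowQuery(200), 50);
  } catch (err) {
    // The caller has given up, yet the simulated connection is still busy.
    console.log(err.message, '| connections still in use:', fakeDb.inUse());
  }
}

demo();
```

If a real application behaves like this, repeated heavy requests could occupy every pooled connection even though each request already returned an error, which would look like a hang to the next request.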

Related

Entity framework core stress testing is slow

I built a .NET Core 2.1 application with EF Core.
I use a transaction with the Read Uncommitted isolation level.
I built an async API and created a simple async EF query (it gets 5 fields of the first user, with no references to other tables).
When I send a single request, the query takes little time.
When I stress test with 10 threads (ramp-up: 5, loop forever, using JMeter), the query time is the same.
However, when I stress test the API using JMeter (100 threads, ramp-up: 20s, loop forever), some queries take little time, some take a long time (maybe 5s, 10s, 25s ...), and others throw a connection timeout exception.
What should I do?
Issue resolved: after some days of investigation, I tried the following solution and it works well, so I am sharing it in this post. If you have other solutions to increase performance, please tell me about them.
Creating database connections is an expensive process that takes time. You can specify that you want a minimum pool of connections that should be created and kept open for the lifetime of the application. These are then reused for each database call.
Use the transaction isolation level "Read Uncommitted".
Use the same database connection for multiple operations within one request.
All APIs and methods should be async; make sure not to mix async with sync.
Thanks all !!!
First, when using JMeter, run your test in non-GUI mode to make sure you don't get wrong results, and follow best practices, see:
https://www.ubik-ingenierie.com/blog/jmeter_performance_tuning_tips/
Once you have confirmed the issues are real, check multiple things:
No N+1 Select issue (loops of queries)
Granularity of retrieved data: are you retrieving too much data?
Performance of the SQL queries issued, checked by looking at the DB
Pool size
See some interesting blogs:
http://www.progware.org/Blog/post/Slow-Performance-Is-it-the-Entity-Framework-or-you.aspx
https://www.thereformedprogrammer.net/entity-framework-core-performance-tuning-a-worked-example/
https://medium.com/@hoagsie/youre-all-doing-entity-framework-wrong-ea0c40e20502

How can I instrument and log my KnexJS transactions?

I have a serious problem in production causing the application to become unresponsive and output the following error:
Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
A running hypothesis is some operations are holding onto long-running Knex transactions. Enough of them to reach the pool size, basically.
Is there a way to query the KnexJS API for how many pool connections are in use at any one time? Unfortunately since KnexJS occupies the max pool settings from the config, it can be hard to know how many are actually in use. From the postgres end, it seems like KnexJS is idling on all of its connections when they are not in use.
Is there a good way to instrument Knex transaction and transacting with some kind of middleware or hook? Another useful thing is to log the callstack of any transaction (or any longer than, say, 7 seconds). One challenge is I have calls to Knex transaction and transacting throughout my project. Maybe it's a long shot.
Any advice is greatly appreciated.
System Information
KnexJS version: 0.12.6 (we will update in the next month)
Database + version: Postgres 9.6
OS: Heroku Linux (Ubuntu?)
The easiest way to see what's happening at the connection pool level is to run knex with the DEBUG=knex:* environment variable set, which prints quite a lot of debug info about what's happening inside knex. Those logs show, for example, when connections are fetched from the pool and returned to it, as well as every query that is run.
There are a couple of global events that you can use to hook into every query, but there isn't one for hooking into transactions. Here is a related question where I have written some example code showing how to measure transaction durations with query hooks: Tracking DB querying time - Bookshelf/knex. It probably leaks some memory, so it's not a very production-ready solution, but it might be helpful for your debugging purposes.
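For the "log the call stack of any transaction longer than N seconds" part of the question, a generic wrapper may be enough for debugging. This is a minimal sketch in plain JavaScript, not a Knex hook or API: warnIfSlow is a hypothetical helper, and it assumes every transaction call site can be routed through it.

```javascript
// Wraps any promise-returning function. If the returned promise has not
// settled after `thresholdMs`, reports the stack captured at call time.
function warnIfSlow(label, thresholdMs, fn, report = console.warn) {
  return function wrapped(...args) {
    // Capture the stack now, while it still points at the caller.
    const origin = new Error(`${label} still running after ${thresholdMs} ms`);
    const timer = setTimeout(() => report(origin.stack), thresholdMs);
    // The Promise executor runs synchronously and turns a throw into a rejection.
    return new Promise((resolve) => resolve(fn.apply(this, args)))
      .finally(() => clearTimeout(timer));
  };
}

// Demo: a hypothetical 80 ms "transaction" checked against a 30 ms threshold.
const slowWork = warnIfSlow('demo transaction', 30, () =>
  new Promise((resolve) => setTimeout(resolve, 80)));
slowWork();

// Possible use with Knex (an assumption, untested here):
//   const transaction = warnIfSlow('transaction', 7000, (work) => knex.transaction(work));
```

The wrapper only reports; it does not end the slow transaction or free its connection.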

Connection pool using pg-promise

I'm using Node js and Postgresql and trying to be most efficient in the connections implementation.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
I also read that "more than 100 clients at a time is a very bad thing" (node-postgres).
I'm using pg-promise and wanted to know:
what is the recommended poolSize for a very big load of data.
what happens if poolSize = 100 and the application gets 101 request simultaneously (or even more)?
Does Postgres handles the order and makes the 101 request wait until it can run it?
I'm the author of pg-promise.
I'm using Node js and Postgresql and trying to be most efficient in the connections implementation.
There are several levels of optimization for database communications. The most important of them is to minimize the number of queries per HTTP request, because IO is expensive, so is the connection pool.
If you have to execute more than one query per HTTP request, always use tasks, via method task.
If your task requires a transaction, execute it as a transaction, via method tx.
If you need to do multiple inserts or updates, always use multi-row operations. See Multi-row insert with pg-promise and PostgreSQL multi-row updates in Node.js.
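To make the multi-row point concrete, here is a hand-rolled sketch of what a single multi-row INSERT looks like compared with one query per row. It is plain JavaScript and not pg-promise API: buildMultiRowInsert, the table name and the columns are hypothetical. In real code the library's own helpers namespace, covered in the linked answers, would be the more natural choice.

```javascript
// Builds one parameterized INSERT for many rows instead of one query per row.
function buildMultiRowInsert(table, columns, rows) {
  if (rows.length === 0) throw new Error('rows must not be empty');
  // Quote identifiers and escape embedded double quotes.
  const quote = (name) => `"${String(name).replace(/"/g, '""')}"`;
  const values = [];
  const tuples = rows.map((row) => {
    const placeholders = columns.map((col) => {
      values.push(row[col]);
      return `$${values.length}`; // $1, $2, ... numbered across all rows
    });
    return `(${placeholders.join(', ')})`;
  });
  const text =
    `INSERT INTO ${quote(table)} (${columns.map(quote).join(', ')}) ` +
    `VALUES ${tuples.join(', ')}`;
  return { text, values };
}

const insert = buildMultiRowInsert('users', ['name', 'age'], [
  { name: 'Ann', age: 31 },
  { name: 'Bob', age: 27 },
]);
console.log(insert.text);
// INSERT INTO "users" ("name", "age") VALUES ($1, $2), ($3, $4)
console.log(insert.values);
// [ 'Ann', 31, 'Bob', 27 ]
```

Two rows cost one round trip and one pooled connection here, instead of two.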
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
node-postgres started using pg-pool from version 6.x, while pg-promise remains on version 5.x which uses the internal connection pool implementation. Here's the reason why.
I also read that "more than 100 clients at a time is a very bad thing"
My long practice in this area suggests: If you cannot fit your service into a pool of 20 connections, you will not be saved by going for more connections, you will need to fix your implementation instead. Also, by going over 20 you start putting additional strain on the CPU, and that translates into further slow-down.
what is the recommended poolSize for a very big load of data.
The size of the data has nothing to do with the size of the pool. You typically use just one connection for a single download or upload, no matter how large. If your implementation is wrong and you end up using more than one connection, you need to fix it if you want your app to be scalable.
what happens if poolSize = 100 and the application gets 101 request simultaneously
It will wait for the next available connection.
See also:
Chaining Queries
Performance Boost
what happens if poolSize = 100 and the application gets 101 request simultaneously (or even more)? Does Postgres handles the order and makes the 101 request wait until it can run it?
Right, the request will be queued. But it's not handled by Postgres itself, but by your app (pg-pool). So whenever you run out of free connections, the app will wait for a connection to release, and then the next pending request will be performed. That's what pools are for.
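The queuing described above can be sketched with a toy pool. This is not how pg-pool or pg-promise is implemented; TinyPool and handleRequest are hypothetical names, and a "connection" is only a counted slot. It just shows that request N+1 waits inside the app until a slot is released.

```javascript
// Toy pool: at most `size` slots are in use; extra requests wait in a FIFO queue.
class TinyPool {
  constructor(size) {
    this.size = size;
    this.inUse = 0;
    this.waiters = []; // resolve callbacks of requests waiting for a slot
  }

  acquire() {
    if (this.inUse < this.size) {
      this.inUse += 1;
      return Promise.resolve();
    }
    // Pool is full: park the request until someone calls release().
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot straight to the oldest waiter
    else this.inUse -= 1;
  }
}

// Simulated request: holds a slot for `workMs`, then always releases it.
async function handleRequest(pool, id, workMs) {
  await pool.acquire();
  try {
    console.log(`request ${id} got a connection`);
    await new Promise((resolve) => setTimeout(resolve, workMs));
  } finally {
    pool.release();
  }
}

// Three requests against a pool of two: the third waits for a release.
const demoPool = new TinyPool(2);
[1, 2, 3].forEach((id) => handleRequest(demoPool, id, 50));
```

The try/finally matters: a request that never releases its slot shrinks the pool for everyone queued behind it.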
what is the recommended poolSize for a very big load of data.
It really depends on many factors, and no one can tell you the exact number. Why not test your app under heavy load, see how it performs in practice, and find the bottlenecks?
Also I find the node-postgres documentation quite confusing and misleading on the matter:
Once you get >100 simultaneous requests your web server will attempt to open 100 connections to the PostgreSQL backend and 💥 you'll run out of memory on the PostgreSQL server, your database will become unresponsive, your app will seem to hang, and everything will break. Boooo!
https://github.com/brianc/node-postgres
It's not quite true. If you reach the connection limit at Postgres side, you simply won't be able to establish a new connection until any previous connection is closed. Nothing will break, if you handle this situation in your node app.

MongoDB + NodeJS: MapReduce or manual calculation

I am creating a REST API in NodeJS that connects to MongoDB, runs a MapReduce, and stores the results in a different collection.
The code is pretty simple. It takes a User ID, gets all other users who are related to this user somehow using some algorithm, and then for each one calculates a likeness percentage. Assuming there are 50k users in the test database, this MapReduce takes around 200-800ms. And that is ideal for me. If this were to get famous and have hundreds of concurrent requests like this, I'm pretty sure that will not be the case any more. I understand that MongoDB might need to be sharded as needed.
The other scenario is to just do a normal find(), loop over the cursor and do the same logic. It takes the same amount of time as MapReduce mind you. However, I just thought about this to try and put the heavy lifting of the calculations on the client side (NodeJS) and not on the server side like MapReduce. Does this idea even have merit? I thought that this way, I can scale APIs horizontally behind a load balancer or something.
It would be better to keep heavy lifting off of the server which processes each request and put it onto the database.
If you have 1000 requests and 200 of them require you to perform the calculation, the other 800 requests can be processed as normal by the server, so long as mongo does the calculation with mapReduce or aggregation.
If you instead run the calculations manually on your node server, all requests will be affected by the server having to do the heavy lifting.
Mongo is certainly quite efficient at aggregation, and I would imagine at mapReduce as well.
I recently moved a ton of logic from my server onto mongoDB where I could and it made a world of difference.
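The effect of doing the heavy lifting on the Node server can be seen without any database. In this minimal sketch, busyWait is a hypothetical stand-in for a CPU-heavy calculation and the zero-delay timer stands in for another request waiting to be served.

```javascript
// Stand-in for a CPU-heavy calculation done in the Node process itself.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU; nothing else can run on the event loop meanwhile
  }
}

// Stand-in for "another request": it asks to run as soon as possible.
const scheduledAt = Date.now();
setTimeout(() => {
  console.log(`other request ran after ${Date.now() - scheduledAt} ms`);
}, 0);

// The calculation blocks the event loop, so the timer above cannot fire
// until it finishes, even though it was scheduled with a 0 ms delay.
busyWait(100);
```

The logged delay is at least the length of the calculation. When the database does the work instead, the Node process only waits on IO and stays free to serve the other requests.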

When is blocking code acceptable in node.js?

I know that blocking code is discouraged in node.js because it is single-threaded. My question is asking whether or not blocking code is acceptable in certain circumstances.
For example, if I was running an Express webserver that requires a MongoDB connection, would it be acceptable to block the event loop until the database connection was established? This is assuming that all pages served by Express require a database query (which would fail if MongoDB was not initialized).
Another example would be an application that requires the contents of a configuration file before being initialized. Is there any benefit in using fs.readFile over fs.readFileSync in this case?
Is there a way to work around this? Is wrapping all the code in a callback or promise the best way to go? How would that be different from using blocking code in the above examples?
It is really up to you to decide what is acceptable. And you would do that by determining what the consequences of blocking would be ... on a case-by-case basis. That analysis would take into account:
how often it occurs,
how long the event loop is likely to be blocked, and
the impact that blocking in that context will have on usability (see note 1).
Obviously, there are ways to avoid blocking, but these tend to add complexity to your application. Really, you need to decide ... on a case-by-case basis ... whether that added complexity is warranted.
Bottom line: >>you<< need to decide what is acceptable based on your understanding of your application and your users.
1 - For example, in a game it would be more acceptable to block the UI while switching "levels" than during active play. Or for a general web service, "once off" blocking while a config file is loaded or a DB connection is established during webserver startup is more acceptable than if this happened on every request.
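The startup case from the note can be sketched like this. The file name and its contents are made up for the demo, and loadConfigSync and loadConfig are hypothetical helpers. The blocking read runs once before anything is served; the promise-based variant is what the same read would look like on a per-request path.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Blocking read: tolerable here because it runs once, before any request is served.
function loadConfigSync(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Non-blocking equivalent for code that runs while requests are in flight.
async function loadConfig(file) {
  return JSON.parse(await fs.promises.readFile(file, 'utf8'));
}

// Demo with a throwaway config file (hypothetical contents).
const demoFile = path.join(os.tmpdir(), `demo-config-${process.pid}.json`);
fs.writeFileSync(demoFile, JSON.stringify({ port: 3000 }));

const startupConfig = loadConfigSync(demoFile);
console.log('startup (sync) read, port:', startupConfig.port);

loadConfig(demoFile).then((cfg) => {
  console.log('runtime (async) read, port:', cfg.port);
  fs.unlinkSync(demoFile); // clean up the throwaway file
});
```

Both functions return the same data. The difference is only whether the event loop is free to do other work while the file is being read.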
From my experience, most tasks should be handled in a callback or by returning a promise. You DO NOT want to block code in a Node application. That's what makes it so nice! With MongoDB, the app will mostly crash before it has a chance to connect if there is no connection. It won't really have an effect on an API call because your server will be dead!
Source: I'm a developer at a bootcamp that teaches MEAN stack.
Your two examples are completely different. The distinction actually answers the question in and of itself.
Grabbing data from a database is dependent on being connected to that database. Any code that is dependent upon that data is then dependent upon that connection. These things have to happen serially for the app to function and be meaningful.
On the other hand, readFileSync will block ALL code, not just code that is reliant on it. You could start reading a csv file while simultaneously establishing a database connection. Once both are done, you could add that csv data to the database.

Resources