What can cause "idle in transaction" for "BEGIN" statements - node.js

We have a node.js application that connects via pg-promise to a Postgres 11 server - all processes are running on a single cloud server in docker containers.
Sometimes we hit a situation where the application does not react anymore.
The last time this happened, I had a little time to check the db via pgadmin and it showed that the connections were idle in transaction with statement BEGIN and an exclusive lock of virtualxid
I think the situation is like this:
the application has started a transaction by sending the BEGIN sql command to the db
the db got this command and started a new transaction and thus acquired an exclusive lock of mode virtualxid
now the db waits for the application to send the next statement/s (until it receives COMMIT or ROLLBACK) - and then it will release the exclusive lock of mode virtualxid
but for some reason it does not get anymore statements:
I think that the node.js event-loop is blocked - because at the time, when we see these locks, the node.js application does not log anymore statements. But the webserver still gets requests and reported some upstream timed out requests.
Does this make sense (I'm really not sure about 2. and 3.)?
Why would all transactions block at the beginning? Is this just coincidence or is the displayed SQL maybe wrong?
BTW: In this answer I found, that we can set idle_in_transaction_session_timeout so that these transactions will be released after a timeout - which is great, but I try to understand what's causing this issue.

The transactions are not blocking at all. The database is waiting for the application to send the next statement.
The lock on the transaction ID is just a technique for transactions to block each other, even if they are not contending for a table lock (for example, if they are waiting for a row lock): each transaction holds an exclusive lock on its own transaction ID, and if it has to wait for a concurrent transaction to complete, it can just request a lock on that transaction's ID (and be blocked).
If all transactions look like this, then the lock must be somewhere in your application; the database is not involved.
When looking for processes blocked in the database, look for rows in pg_locks where granted is false.

Your interpretation is correct. As for why it is happening, that is hard to say. It seems like there is some kind of bug (maybe an undetected deadlock) in your application, or maybe in nodes.js or pg-promise. You will have to debug at that level.

As expected the problems were caused by our application code. Transactions were used incorrectly:
One of the REST endpoints started a new transaction right away, using Database.tx().
This transaction was passed down multiple levels, but one function in the chain had an error and passed undefined instead of the transaction to the next level
the lowest repository level function started a new transaction (because the transaction parameter was undefined), by using Database.tx() a second time
This started to fail, under heavy load:
The connection pool size was set to 10
When there were many simultaneous requests for this endpoint, we had a situation where 10 of the requests started (opened the outer transaction) and had not yet reached the repository code that will request the 2nd transaction.
When these requests reached the repository code, they request a new (2nd) connection from the connection-pool. But this call will block because there are currently all connections in use.
So we have a nasty application level deadlock
So the solution was to fix the application code (the intermediate function must pass down the transaction correctly). Then everything works.
Moreover I strongly recommend to set a sensible idle_in_transaction_session_timeout and connection-timeout. Then, even if such an application-deadlock is introduced again in future versions, the application can recover automatically after this timeout.
Notes:
pg-postgres before v 10.3.4 contained a small bug #682 related to the connection-timeout
pg-promise before version 10.3.5 could not reocver from an idle-in-transaction-timeout and left the connection in a broken state: see pg-promise #680
Basically there was another issue: there was no need to use a transaction - because all functions were just reading data: so we can just use Database.task() instead of Database.tx()

Related

Do I really need to call client.shutdown() when finished with Cassandra in Node.js script?

I've been trying to find information about Cassandra sessions relating to the Node.js cassandra-driver by Datastax. I read something which said that cassandra-driver automatically manages a session and that I don't need to call client.shutdown().
I'm looking for general information about how cassandra-driver manages sessions, how can I see all active Cassandra sessions, and do I need to call shutdown() or is that counter productive having to reopen a session every time the script is run?
Based on "pm2 info" I don't see a ton of active handles so I don't think anything wrong is going on but I may be mistaken. Ram usage does seem a bit high for a small script (85mb).
In the DataStax drivers, Session is a stateful object handling a pool of connections and aware of the status of nodes in the Cluster at any time (avoiding sending request to unavailable node). TCP sockets are opened and it is a best practice to close when you don't need it anymore. See here to get more infos : https://docs.datastax.com/en/developer/nodejs-driver-dse/2.1/features/connection-pooling/
Now session.connect() may takes a bit of time: the more nodes you have in your cluster, the longer it will be to open connections to every single one. This is the reason why, it is better to init connections in a "cold start" when you work with FAAS (avoiding to open/close for each request)
So:
Always close your connections (shutdown()) when you don't need it anymore (shutdown hook in your applications)
Keep your connections "alive" as long as you need it, do not shutdown for each request, this is NOT stateless.
yes, it is "better" to connect the client outside of the handler function. to keep it state-Full.
however, AWS Lambda with nodeJS, by default function execution continues until the event loop is empty or the function times out.
create the client outside of handler, set the context.callbackWaitsForEmptyEventLoop = false and don't call client.shutdown.

Clustered socket.io server hangs

I'm writing a socket.io based server in Node.js (6.9.0). I am using the builtin cluster module to enable multiple processes. For now, there is only two process: a master and a worker. The master receives the connections and maintains an in-memory global data structure (which the worker can query via IPC). The worker process does the majority of work by handling each incoming connection.
I am finding a hanging condition that I cannot attribute to any internal failure when the server is stressed at 300 concurrent users. Under lower concurrency, I don't see the hanging condition.
I'm enabling all forms of debugging (using the debug module: socket.io:socket, socket.io:client as well as my own custom calls to debug).
The last activity I can see is in socket.io, however, the messages indicate that sockets are closing ("reason client namespace disconnect") due to their own "end of test" cycle. It just seems like incoming connections are not be serviced.
I'm using Artillery.io as the test client.
In the server application, I have handlers for uncaught exceptions and try-catch blocks around everything.
In a prior iteration, I also used cluster, but reversed the responsibilities so that the master process handled the connections (with the worker handling global data). That didn't exhibit the same failure. Not sure if something is wrong with the connection distribution. For that, I have also dumped internalMessage events to monitor the internal workings of cluster.
I am not using any other module for connection distribution or sticky sessions. As there is only a single process handling connections (at this time), it doesn't seem relevant.
I was able to remove the hanging condition by changing the cluster scheduling policy from Round Robin (SCHED_RR) to None, which is OS specific (SCHED_NONE). I can't tell whether this is due to a bug in connection distribution (or something else inherent in the scheduling policy), but this one change seems to prevent the hanging condition.

What happens internally on knex.transaction()

I would like to add a knex transaction to my request parameter from a middleware for every incoming request.
How is the performance of knex.transaction()? Does it do something costly like opening a database connection?
Thanks in advance! :)
Knex transactions are opened eagerly. So when you call knex.transaction it immediately reserves database connection from the pool, even if you are not sending any queries there.
Also if you are creating that implicitly opened transaction, remember to make sure that it will get committed / rolled back. Otherwise they will remain alive after request is handled and fill up the connection pool.
So instead of always opening transaction you might consider exposing req.trx()middleware, which will create singleton transaction lazily when you call it for the first time...

What assumptions can I make when using Redis with a single connection from a NodeJS server

I'm using SocketIO on a NodeJS instance with a single connection to a Redis cache. This cache is being used as a means to maintain state in a real time environment.
My premises include that concurrency issues will likely occur due to high volume of transactions occurring, however, I'm not sure exactly which concurrency issues I need to account for...
My initial design implements using Lua scripts and EVAL (a script called with EVAL is considered an atomic transaction to Redis) in order enable checks on the state of a given key, but aside from this I'm not sure if I need to implement locks anywhere else.
The main concern I have is when SocketIO catches a connection and subsequently an event to execute, what can I guarantee about the Redis EVAL that happens in that event. A specific use case:
1) Client A emits an event that is caught by the server
2) Server executes requested event, which includes a call to EVAL a Lua script on Redis
3) Client B emits an event that is caught by the server
4) Server executes requested event, which includes a call to the EVAL a different Lua script on Redis
Due to the asynchronous nature of NodeJS, am I able to assume that the EVAL from Client A will always be received by the Redis server before Client B's? Am I understanding the event loop completely wrong?
The answer completely depends on your code. Basically yes, for such type of events nodejs will process them in order they appear on the event loop queue.
However, you say that request processing includes a call to EVAL, that means if your processing includes other I/O (like querying persistent database), the order of steps in processing request from a Client A can interleave with steps in processing requests from Client B.
In general, you should try to avoid creating a logic that breaks on concurrency if possible. If something needs to be done in the exact same order, consider creating a processing queues (global queue where next item can be processed only after previous was completed) or localise critical parts in atomic sequences (like LUA script).

JDBC: Can I share a connection in a multithreading app, and enjoy nice transactions?

It seems like the classical way to handle transactions with JDBC is to set auto-commit to false. This creates a new transaction, and each call to commit marks the beginning the next transactions.
On multithreading app, I understand that it is common practice to open a new connection for each thread.
I am writing a RMI based multi-client server application, so that basically my server is seamlessly spawning one thread for each new connection.
To handle transactions correctly should I go and create a new connection for each of those thread ?
Isn't the cost of such an architecture prohibitive?
Yes, in general you need to create a new connection for each thread. You don't have control over how the operating system timeslices execution of threads (notwithstanding defining your own critical sections), so you could inadvertently have multiple threads trying to send data down that one pipe.
Note the same applies to any network communications. If you had two threads trying to share one socket with an HTTP connection, for instance.
Thread 1 makes a request
Thread 2 makes a request
Thread 1 reads bytes from the socket, unwittingly reading the response from thread 2's request
If you wrapped all your transactions in critical sections, and therefore lock out any other threads for an entire begin/commit cycle, then you might be able to share a database connection between threads. But I wouldn't do that even then, unless you really have innate knowledge of the JDBC protocol.
If most of your threads have infrequent need for database connections (or no need at all), you might be able to designate one thread to do your database work, and have other threads queue their requests to that one thread. That would reduce the overhead of so many connections. But you'll have to figure out how to manage connections per thread in your environment (or ask another specific question about that on StackOverflow).
update: To answer your question in the comment, most database brands don't support multiple concurrent transactions on a single connection (InterBase/Firebird is the only exception I know of).
It'd be nice to have a separate transaction object, and to be able to start and commit multiple transactions per connection. But vendors simply don't support it.
Likewise, standard vendor-independent APIs like JDBC and ODBC make the same assumption, that transaction state is merely a property of the connection object.
It's uncommon practice to open a new connection for each thread.
Usually you use a connection pool like c3po library.
If you are in an application server, or using Hibernate for example, look at the documentation and you will find how to configure the connection pool.
The same connection object can be used to create multiple statement objects and these statement objects can then used by different threads concurrently. Most modern DBs interfaced by JDBC can do that. The JDBC is thus able to make use of concurrent cursors as follows. PostgreSQL is no exception here, see for example:
http://doc.postgresintl.com/jdbc/ch10.html
This allows connection pooling where the connection are only used for a short time, namely to created the statement object and but after that returned to the pool. This short time pooling is only recommended when the JDBC connection does also parallelization of statement operations, otherwise normal connection pooling might show better results. Anyhow the thread can continue work with the statement object and close it later, but not the connection.
1. Thread 1 opens statement
3. Thread 2 opens statement
4. Thread 1 does something Thread 2 does something
5. ... ...
6. Thread 1 closes statement ...
7. Thread 2 closes statement
The above only works in auto commit mode. If transactions are needed there is still no need to tie the transaction to a thread. You can just partition the pooling along the transactions that is all and use the same approach as above. But this is only needed not because of some socket connection limitation but because the JDBC then equates the session ID with the transaction ID.
If I remember well there should be APIs and products around with a less simplistic design, where teh session ID and the transaction ID are not equated. In this APIs you could write your server with one single database connection object, even when it does
transactions. Will need to check and tell you later what this APIs and products are.

Resources