When does Pyramid commit the ZODB transaction?

I followed the tutorial on http://docs.pylonsproject.org/docs/pyramid/en/latest/tutorials/wiki/index.html
I know that when I add or change persistent objects (in this case Page objects), the change will not be persisted until transaction.commit() is called. And in order to cancel changes, I can call transaction.abort().
In the tutorial, however, these calls are not shown in the view callables. I assume that there is some middleware in place that will catch exceptions and call .abort() or call .commit() just before sending the HTTP response, but I don't see any mention of it anywhere in the code or config files.
Could you point me in the right direction? I just need to know what happens behind the scenes, so I know whether I need to add something myself.

The pyramid_tm package is used; it installs a Tween that manages the transaction.
It simply starts a transaction for every request, and if the request was successful the transaction is committed, and aborted otherwise.
From the documentation:
At the beginning of a request a new transaction is started using the transaction.begin() function. Once the request has finished all of its works (ie views have finished running), a few checks are tested:
Did some transaction.doom() cause the transaction to become “doomed”? if so, transaction.abort().
Did an exception occur in the underlying code? if so, transaction.abort()
If the tm.commit_veto configuration setting was used, did the commit veto callback, called with the response generated by the application, return a result that evaluates to True? if so, transaction.abort().
If none of these checks calls transaction.abort() then the transaction is instead committed using transaction.commit().
It'll also retry requests (re-start them from the beginning) if there was a retryable exception (such as a ZODB commit conflict):
When the transaction manager calls the downstream handler, if the handler raises a “retryable” exception, the transaction manager can be configured to attempt to call the downstream handler again with the same request, in effect “replaying” the request.
This behaviour is disabled by default; you can set the tm.attempts option to a number larger than 1 to enable it.
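
The checks quoted above can be sketched in plain Python. This is only an illustration of the decision logic, not pyramid_tm's actual code: FakeTransaction, tm_tween_sketch and the commit_veto(request, response) signature are assumptions made for this example, and the retry behaviour is left out.

```python
class FakeTransaction:
    """Hypothetical stand-in for the transaction manager; it only records calls."""

    def __init__(self):
        self.log = []
        self.doomed = False

    def begin(self):
        self.doomed = False
        self.log.append("begin")

    def doom(self):
        self.doomed = True

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")


def tm_tween_sketch(handler, txn, commit_veto=None):
    """Wrap a view-like handler so every request runs in its own transaction."""

    def tween(request):
        txn.begin()
        try:
            response = handler(request)
        except Exception:
            txn.abort()  # an exception occurred in the underlying code
            raise
        # commit_veto is an assumed callback; a truthy result means "do not commit"
        vetoed = commit_veto is not None and commit_veto(request, response)
        if txn.doomed or vetoed:
            txn.abort()
        else:
            txn.commit()
        return response

    return tween
```

This is why the tutorial's view callables never call transaction.commit() themselves: a wrapper of this shape sits around every view and makes the commit or abort decision once the view has returned or raised.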

Related

Preventing Potential Race Condition in Calls to an API?

There's an API that my node.js server accesses quite a bit. It requires me to change my password every 3 months. Fortunately, there's also an API call for changing the password. :) I have a cron job that runs regularly and changes the password when necessary.
If my app is accessing the API at the exact time the password is being changed, there's a potential race condition and the API call could fail. What are some good patterns for dealing with this?
I could put all the API calls into a queue, and use a cron job to pull the most recent one off the queue and run it. If the API call fails, it would stay in the queue and get run next time the cron job runs. But that seems like it might be overkill.
I could use a try/catch handler with the API call, inside a while loop, and just run the while loop until the API call completes successfully. But that's going to block the rest of my app.
I could use a try/catch handler with the API call, inside a setTimeout, and just re-run the setTimeout until the API call completes successfully. This way the API call would only run when the main thread is done with other work and gets around to it. But would this be a mistake if the server is under heavy load?
Is there a better pattern for dealing with this sort of thing?
The try/catch handlers would lose data in the event of a server crash, so I went with the cron job/queue approach. I'm using a queue maintained as a table in my db, so that if something interrupts the server, nothing will be lost.
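
A minimal sketch of such a table-backed queue, written in Python with the standard sqlite3 module purely for illustration (the original setup is Node.js with its own database). The table name api_queue and the functions make_queue, enqueue and drain are hypothetical, and processing the oldest entry first is an assumption of this sketch.

```python
import sqlite3


def make_queue(conn):
    """Create the (hypothetical) queue table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    conn.commit()


def enqueue(conn, payload):
    """Persist a pending API call so it survives a crash."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO api_queue (payload) VALUES (?)", (payload,))


def drain(conn, call_api):
    """Run queued calls oldest first; stop at the first failure.

    A failed call stays in the table and is retried on the next run,
    for example the next time the cron job fires.
    """
    done = 0
    rows = conn.execute("SELECT id, payload FROM api_queue ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            call_api(payload)
        except Exception:
            break  # e.g. the password is being rotated right now
        with conn:
            conn.execute("DELETE FROM api_queue WHERE id = ?", (row_id,))
        done += 1
    return done
```

A row is deleted only after its call succeeded, so a crash between the call and the delete leads to a repeated call rather than a lost one. That trade-off is only acceptable if the API calls are safe to repeat.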

What can cause "idle in transaction" for "BEGIN" statements

We have a node.js application that connects via pg-promise to a Postgres 11 server - all processes are running on a single cloud server in docker containers.
Sometimes we hit a situation where the application does not react anymore.
The last time this happened, I had a little time to check the db via pgAdmin, and it showed that the connections were idle in transaction with the statement BEGIN and an exclusive lock of type virtualxid.
I think the situation is like this:
1. The application has started a transaction by sending the BEGIN SQL command to the db.
2. The db got this command, started a new transaction, and thus acquired an exclusive lock of type virtualxid.
3. Now the db waits for the application to send the next statement(s) (until it receives COMMIT or ROLLBACK), and then it will release the exclusive virtualxid lock.
4. But for some reason it does not get any more statements. I think that the node.js event loop is blocked, because at the time when we see these locks, the node.js application does not log any more statements. The webserver still gets requests, though, and reported some "upstream timed out" requests.
Does this make sense (I'm really not sure about 2. and 3.)?
Why would all transactions block at the beginning? Is this just coincidence or is the displayed SQL maybe wrong?
BTW: I found in another answer that we can set idle_in_transaction_session_timeout so that these transactions will be released after a timeout. That is great, but I am trying to understand what is causing this issue.
The transactions are not blocking at all. The database is waiting for the application to send the next statement.
The lock on the transaction ID is just a technique for transactions to block each other, even if they are not contending for a table lock (for example, if they are waiting for a row lock): each transaction holds an exclusive lock on its own transaction ID, and if it has to wait for a concurrent transaction to complete, it can just request a lock on that transaction's ID (and be blocked).
If all transactions look like this, then the lock must be somewhere in your application; the database is not involved.
When looking for processes blocked in the database, look for rows in pg_locks where granted is false.
Your interpretation is correct. As for why it is happening, that is hard to say. It seems like there is some kind of bug (maybe an undetected deadlock) in your application, or maybe in Node.js or pg-promise. You will have to debug at that level.
As expected the problems were caused by our application code. Transactions were used incorrectly:
One of the REST endpoints started a new transaction right away, using Database.tx().
This transaction was passed down multiple levels, but one function in the chain had an error and passed undefined instead of the transaction to the next level.
The lowest repository-level function started a new transaction (because the transaction parameter was undefined) by calling Database.tx() a second time.
This started to fail, under heavy load:
The connection pool size was set to 10
When there were many simultaneous requests for this endpoint, we had a situation where 10 of the requests had started (opened the outer transaction) but had not yet reached the repository code that requests the 2nd transaction.
When these requests reached the repository code, they each requested a new (2nd) connection from the connection pool. But this call blocks, because all connections are currently in use.
So we have a nasty application-level deadlock.
So the solution was to fix the application code (the intermediate function must pass down the transaction correctly). Then everything works.
Moreover, I strongly recommend setting a sensible idle_in_transaction_session_timeout and connection timeout. Then, even if such an application deadlock is introduced again in a future version, the application can recover automatically after this timeout.
Notes:
pg-promise before v10.3.4 contained a small bug (#682) related to the connection timeout.
pg-promise before v10.3.5 could not recover from an idle-in-transaction timeout and left the connection in a broken state: see pg-promise #680.
Basically there was another issue: there was no need to use a transaction - because all functions were just reading data: so we can just use Database.task() instead of Database.tx()
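
The pool exhaustion described above can be reproduced without a database. The following Python sketch is an illustration only: a semaphore stands in for the connection pool, POOL_SIZE is reduced to 2, and the 0.2 second acquire timeout plays the role of the connection timeout recommended above. All names are hypothetical, and none of this is pg-promise code.

```python
import threading

POOL_SIZE = 2  # the real pool had 10 connections; 2 keeps the demo small


def simulate(reuse_outer_connection):
    """Return, per request, whether its repository step got a connection."""
    pool = threading.BoundedSemaphore(POOL_SIZE)
    all_hold_outer = threading.Barrier(POOL_SIZE)
    all_tried_inner = threading.Barrier(POOL_SIZE)
    results = []
    results_lock = threading.Lock()

    def handle_request():
        pool.acquire()  # the outer transaction takes one connection
        try:
            all_hold_outer.wait()  # from here on the pool is fully used
            took_second = False
            if reuse_outer_connection:
                ok = True  # fixed code: the transaction was passed down
            else:
                # buggy code: ask the exhausted pool for a second connection
                took_second = pool.acquire(timeout=0.2)
                ok = took_second
            all_tried_inner.wait()  # nobody releases before everyone has tried
            if took_second:
                pool.release()
            with results_lock:
                results.append(ok)
        finally:
            pool.release()

    threads = [threading.Thread(target=handle_request) for _ in range(POOL_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With reuse_outer_connection=False every request times out waiting for its second connection. Without the timeout the threads would wait forever, which is the application-level deadlock. With reuse_outer_connection=True no second connection is needed and every request succeeds.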

How to get the status of all requests to one API in nodejs

I want to get the API server status in nodejs. I'm using nodejs to expose an interface: "api/request?connId=50&timeout=90". This API keeps the request running on the server side for the provided time. After the provided time has elapsed, it should return status/OK. When we have multiple connection ids and timeouts, we want the API to return all the requests running on the server together with their time left until completion, something like below, where 4 and 8 are the connIds and 25 and 15 are the times remaining (in seconds) for the requests to complete:
{"4":"25","8":"15"}
please help.
Node.js uses an asynchronous model on a single thread, which means that at any moment only one request (connId) is being executed by Node (unless you run multiple Node.js instances, but let's keep the scenario simple and ignore that case).
While a request is being processed (its handler code is running), it may start an async task such as reading a file and then continue executing. The handler code runs without waiting for the async task, and once the handler code has finished, the request handling itself is done from Node.js's point of view. Handling the async task's result happens separately at a later time, and Node does not track its progress.
Thus, in order to return the remaining time of all requests (I guess this means the remaining time of the other requests' async tasks, because the remaining time of a handler's own code execution does not make sense), there must be some place to store information about all requests, including:
request's connId and startTime (the time when request is received).
request's timeout value, which is passed as parameter in URL.
request's estimated remaining time. This information is task-specific and must be retrieved from the services that run the async task (you can poll them periodically using setInterval, or have those services push the latest remaining time). Node.js itself doesn't know the remaining time of any async task.
In this way, you can track all running requests and their remaining time. Before a request returns, you can check the above "some place" to calculate the remaining time of all requests. This "some place" could be a global variable, an in-memory database such as Redis, or even a plain database such as MySQL.
Please note: the calculated remaining time would not be accurate, as the read&calculation itself would cost time and introduce error.
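
As a rough sketch of the "some place" idea, here is an in-memory registry, written in Python for illustration rather than Node.js. It assumes the simplest case from the question, where the remaining time is just the timeout minus the elapsed time. RequestRegistry and its methods are hypothetical names, and the clock is injectable so the example can be checked deterministically.

```python
import time


class RequestRegistry:
    """Tracks running requests and reports the seconds left for each."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._deadlines = {}  # connId -> time at which the request completes

    def start(self, conn_id, timeout_seconds):
        self._deadlines[conn_id] = self._clock() + timeout_seconds

    def finish(self, conn_id):
        self._deadlines.pop(conn_id, None)

    def status(self):
        """Return e.g. {"4": "25", "8": "15"} for the requests still running."""
        now = self._clock()
        return {
            str(conn_id): str(max(0, round(deadline - now)))
            for conn_id, deadline in self._deadlines.items()
        }
```

For example, if request 4 starts at t=0 with a timeout of 90 and request 8 starts at t=40 with a timeout of 40, then status() at t=65 returns {"4": "25", "8": "15"}. With several server processes, a plain dictionary like this is no longer enough and a shared store such as Redis would have to take its place.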

What happens internally on knex.transaction()

I would like to add a knex transaction to my request parameter from a middleware for every incoming request.
How is the performance of knex.transaction()? Does it do something costly like opening a database connection?
Thanks in advance! :)
Knex transactions are opened eagerly. So when you call knex.transaction, it immediately reserves a database connection from the pool, even if you are not sending any queries through it.
Also, if you create such an implicitly opened transaction, make sure that it gets committed or rolled back. Otherwise the transactions will stay alive after the request has been handled and fill up the connection pool.
So instead of always opening a transaction, you might consider exposing a req.trx() middleware, which creates a singleton transaction lazily when you call it for the first time...
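
The lazy singleton idea is not specific to knex. A language-neutral sketch in Python follows, where LazyTransaction and the begin callable are hypothetical and the transaction object is assumed to offer commit() and rollback():

```python
class LazyTransaction:
    """Opens a transaction on first use and reuses it for the rest of the request."""

    def __init__(self, begin):
        self._begin = begin  # callable that really opens a transaction
        self._trx = None

    def __call__(self):
        if self._trx is None:
            self._trx = self._begin()  # a connection is reserved only now
        return self._trx

    def finish(self, success):
        """Commit or roll back if a transaction was opened; return whether one was."""
        if self._trx is None:
            return False  # the request never touched the database
        if success:
            self._trx.commit()
        else:
            self._trx.rollback()
        self._trx = None
        return True
```

A middleware would attach one such object to each request and call finish() once the response has been produced, so that requests which never call it do not take a connection from the pool.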

Commit protocol

I'm building a REST web service that receives a request and must return "Ok" if the operation was done correctly. How could I deal with the possibility of losing the connection while returning this "Ok" message?
For example, a system like Amazon SimpleDB.
1) It receives a request.
2) It processes the request (stores and replicates the content).
3) It returns a confirmation message.
If the connection is lost between phases 2 and 3, the client thinks the operation was not successful and submits the request again.
Thanks!
A system I reviewed earlier this year had a process similar to this. The solution they implemented was to have the client reply to the commit message, and clear a flag on the record at that point. There was a periodic process that checked every N minutes, and if an entry existed that was completed, but that the client hadn't acknowledged, that transaction was rolled back. This allowed a client to repost the transaction, but not have 2 'real' records committed on the server side.
In the event of the timeout scenario, you could do the following:
Send a client generated unique id with the initial request in a header.
If the client doesn't get a response, then it can resend the request with the same id.
The server can keep a list of ids successfully processed and return an OK, rather than repeating the action.
The only issue with this is that the server will need to eventually remove the client ids. So there would need to be a time window for the server to keep the ids before purging them.
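
A minimal sketch of this scheme, assuming an in-memory store and a fixed time window. IdempotentHandler, ttl_seconds and the injectable clock are hypothetical names for illustration; a real service would keep the ids in durable storage.

```python
import time


class IdempotentHandler:
    """Runs the action once per client-generated request id within a time window."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._processed = {}  # request id -> time it was processed

    def _purge(self, now):
        expired = [rid for rid, t in self._processed.items() if now - t >= self._ttl]
        for rid in expired:
            del self._processed[rid]

    def handle(self, request_id, action):
        now = self._clock()
        self._purge(now)
        if request_id not in self._processed:
            action()  # first time this id is seen: do the real work
            self._processed[request_id] = now
        return "OK"  # a resend gets the same answer without repeating the work
```

The id is recorded only after the action has succeeded, so a request whose action raised can be retried with the same id.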
Depends on the type of web service. The whole nature of HTTP and REST is that it's basically stateless.
For example, in the SimpleDB case, if you're simply requesting a value for a given key and the client connection is dropped while the value is being returned, the client can simply re-request the data at a later time. That data is likely to have been cached by the db engine or the operating system disk cache anyway.
If you're storing or updating a value and the data is identical then quite often the database engines know the data hasn't changed and so the update won't take very long at all.
Even complex queries can run quicker the second time on some database engines.
In short, I wouldn't worry about it unless you can prove there is a performance problem. In which case, start caching the results of some recent queries yourself. Some REST based frameworks will do this for you. I suspect you won't even find it to be an issue in practice though.
