In my current webapp the display can contain multiple editable objects, the data for which is either fetched from the server (and then stored for future use) or picked up from the local IndexedDB objectstore.
This I have implemented and it works perfectly. However, now I want to go a step further: fetching data that are not available locally at the moment the user needs to work with them is liable to break the user's rhythm of work.
So I am thinking of implementing a lookahead that gets server-side data before the user wants to work with them. The way this would work:
When the app is launched I spawn a web worker that watches an entry, call it PreFetch, in an IndexedDB that it shares with the main app.
The user hovers over an editable item bearing, say, the HTML id, abcd1234.
In the app I add this id to the IndexedDB PreFetch key value, which is a comma-separated list of ids.
The web worker periodically picks up the PreFetch CSV list, resets it, fetches those data that are not locally available, and stores them in the objectstore.
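To make the worker side concrete, here is a minimal sketch of what one periodic pass might look like. It assumes a "meta" object store (with out-of-line keys) holding the PreFetch entry, an "items" store for the cached objects, and a /api/items/ endpoint; all of these names are my own placeholders, not part of the actual app.

```typescript
// Sketch of the worker's periodic pass. The read-and-reset of the PreFetch
// list happens inside one readwrite transaction, so the main thread cannot
// interleave with it. Store names, key layout, and the URL are assumptions.
async function prefetchPass(db: IDBDatabase): Promise<void> {
  // 1. Atomically take the current id list and reset it to empty.
  const ids = await new Promise<string[]>((resolve, reject) => {
    const tx = db.transaction("meta", "readwrite");
    const meta = tx.objectStore("meta");
    const get = meta.get("PreFetch");
    get.onsuccess = () => {
      const csv = (get.result as string | undefined) ?? "";
      meta.put("", "PreFetch");                      // reset inside the same tx
      resolve(csv ? csv.split(",") : []);
    };
    tx.onerror = () => reject(tx.error);
  });

  // 2. Fetch anything not already cached, then store it.
  for (const id of ids) {
    const cached = await new Promise<unknown>((resolve) => {
      const req = db.transaction("items").objectStore("items").get(id);
      req.onsuccess = () => resolve(req.result);
    });
    if (cached !== undefined) continue;              // already available locally
    const data = await (await fetch(`/api/items/${id}`)).json(); // hypothetical endpoint
    db.transaction("items", "readwrite").objectStore("items").put(data, id);
  }
}
```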
IndexedDB is nice, no doubts about that. However, it is not clear to me that what I am planning, having two threads updating the same objectstore, will not create a deadlock (or worse, bring the whole house crashing down around my ears).
Given the asynchronous nature of IndexedDB operations, I am concerned about two kinds of issues:
a. The main thread is writing the PreFetch key whilst the worker is deleting its contents.
b. The main thread attempts to fetch data from IndexedDB and decides "it is not there" while at the same time the worker has just fetched those data and is storing them.
The former is liable to defeat the purpose of doing worker-driven data prefetches, whilst the latter is liable to trigger unnecessary server traffic to fetch information that has already been fetched.
The former I can probably avoid by using localStorage to share the PreFetch list. The latter I cannot control.
My question then: are IndexedDB methods thread-safe? Googling for IndexedDB and thread safety has not yielded anything terribly useful other than one or two posts on this forum.
I have thought of a way to avoid this issue - the main thread and the worker both check a flag variable in localStorage prior to attempting to read/write the objectstore. However, it is not clear to me that I need to do this.
JavaScript runs your code single-threaded within each context, and IndexedDB serializes conflicting access through its transaction model, so it is effectively thread-safe for what you describe.
Use a transaction as the synchronization lock: create an object store keyed by task id whose value is an enum of 'pending', 'working', 'done'.
The producer thread creates a task with a 'pending' value if one does not already exist in the object store. The consumer thread moves a 'pending' task to 'working' and changes it to 'done' after finishing (see the sketch below). It should work.
You can use the Web Storage change event as a sync signal among pages, but there is no equivalent for IndexedDB.
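A minimal sketch of the claiming step described above, assuming an object store named "prefetchTasks" holding records shaped like { id, status }; both names are placeholders. Because the claim happens inside a single readwrite transaction, IndexedDB serializes it against any other readwrite transaction on the same store, so two contexts cannot claim the same task.

```typescript
// Claim all 'pending' tasks by flipping them to 'working' inside one
// readwrite transaction; the claims either all commit together or none do.
function claimPendingTasks(db: IDBDatabase): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const claimed: string[] = [];
    const tx = db.transaction("prefetchTasks", "readwrite");
    const store = tx.objectStore("prefetchTasks");
    const cursorReq = store.openCursor();

    cursorReq.onsuccess = () => {
      const cursor = cursorReq.result;
      if (!cursor) return;                          // no more records
      const task = cursor.value as { id: string; status: string };
      if (task.status === "pending") {
        task.status = "working";                    // claim it within this tx
        cursor.update(task);
        claimed.push(task.id);
      }
      cursor.continue();
    };

    tx.oncomplete = () => resolve(claimed);         // all claims committed together
    tx.onerror = () => reject(tx.error);
  });
}
```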
Related
I just read this article from Node.js: Don't Block the Event Loop
The Ask
I'm hoping that someone can read over the use case I describe below and tell me whether I'm understanding how the event loop gets blocked, and whether or not I'm actually blocking it. Also, any tips on how I can find this information out for myself would be useful.
My use case
I think I have a use case in my application that could potentially cause problems. I have a feature which enables a group to add members to its roster. Each member that doesn't represent an existing system user (the common case) gets an account created, including a dummy password.
The password is hashed with argon2 (using the default hash type), which means that even before I get to waiting on a DB promise to resolve (with a Prisma transaction), I have to wait for each member's password to be generated.
I'm using Prisma for the ORM and Sendgrid for the email service and no other external packages.
A takeaway that I get from the article is that this is blocking the event loop. Since there could potentially be hundreds of records generated (such as when importing contacts from a CSV or a cloud contact service), this seems significant.
To sum up what the route in question does, including some details omitted before:
Remove duplicates (requires one DB request & then some synchronous checking)
Check remaining for existing user
For non-existing users:
Synchronously create many records & push each to a separate array. One of these records requires async password generation for each non-existing user
Once the arrays are populated, send a DB transaction with all records
Once the transaction is cleared, create invitation records for each member
Once the invitation records are created, send emails in a MailData[] through SendGrid.
Clearly, there are quite a few tasks that must be done sequentially. If it matters, the asynchronous functions are also nested: createUsers calls createInvites calls sendEmails. In fact, from the controller, there is: updateRoster calls createUsers calls createInvites calls sendEmails.
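For what it's worth, here is a minimal sketch of how the password-generation step could be kept gentle on the runtime. It assumes the npm argon2 package (whose hash() is asynchronous and, being a native addon, does its work on libuv's worker pool rather than on the event loop itself) plus made-up Member and batch-size names; it is an illustration under those assumptions, not your actual createUsers code.

```typescript
import argon2 from "argon2";
import { randomUUID } from "node:crypto";

// Hypothetical member shape for the sketch.
interface Member { email: string; passwordHash?: string }

// Hash dummy passwords in small batches: awaiting argon2.hash() does not block
// the event loop, but firing hundreds of hashes at once can saturate the libuv
// thread pool; a bounded batch keeps other async work responsive.
async function hashInBatches(members: Member[], batchSize = 4): Promise<void> {
  for (let i = 0; i < members.length; i += batchSize) {
    const batch = members.slice(i, i + batchSize);
    await Promise.all(
      batch.map(async (m) => {
        m.passwordHash = await argon2.hash(randomUUID()); // dummy password
      })
    );
  }
}
```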
There are architectural patterns aimed at avoiding the issues brought by potentially long-running operations. Note that while your example is specific, any long-running process could be harmful here.
The first obvious pattern is the cluster. If your app is handled by multiple concurrent, independent event loops in a cluster, blocking one, ten, or even a thousand loops could be insignificant if your app is scaled to handle it.
Imagine an example scenario where you have 10 concurrent loops, one is blocked for a longer time but 9 remaining are still serving short requests. Chances are, users would not even notice the temporary bottleneck caused by the one long running request.
Another, more general pattern is a separate long-running-process service, or Command-Query Responsibility Segregation (I bring CQRS up here because the pattern's description may introduce interesting ideas you might not be familiar with).
In this approach, long-running operations are not handled directly by the backend servers. Instead, the backend uses a Message Queue to send requests to yet another service layer of your app, one dedicated solely to running those long-running jobs. The Message Queue is configured with a specific throughput, so if multiple long-running requests arrive in a short time they are queued: some of them may be delayed, but your resources always stay under control. The backend that sends requests to the Message Queue does not wait synchronously; instead you need another form of return communication.
This auxiliary process service can be maintained and scaled independently. The important part here is that the service is never accessed directly from the frontend, it's always behind a message queue with controlled throughput.
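As a rough illustration of that split, the sketch below assumes BullMQ backed by Redis as the Message Queue; the queue name, payload shape, and connection details are placeholders, and the real roster logic would replace the comment inside the worker.

```typescript
import { Queue, Worker } from "bullmq";

// Assumed local Redis instance backing the queue.
const connection = { host: "localhost", port: 6379 };

// In the web backend (request handler): enqueue and respond right away.
const rosterQueue = new Queue("roster-import", { connection });
export async function enqueueRosterImport(groupId: string, members: unknown[]) {
  await rosterQueue.add("import", { groupId, members });
}

// In the separate long-running-process service: consume jobs with controlled
// concurrency, so heavy work never ties up the request-serving event loops.
new Worker(
  "roster-import",
  async (job) => {
    // createUsers -> createInvites -> sendEmails would run here, off the web tier
    console.log(`importing roster for group ${job.data.groupId}`);
  },
  { connection, concurrency: 1 }
);
```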
Note that while the second approach is often implemented in real-life systems and solves most issues, it can still be incapable of handling some edge cases, e.g. when long-running requests arrive faster than they are handled and the queue grows infinitely.
Such cases require careful maintenance and you either scale your app to handle the traffic or you introduce other rules that prevent users from running long processes too often.
We have a node.js application that connects via pg-promise to a Postgres 11 server - all processes are running on a single cloud server in docker containers.
Sometimes we hit a situation where the application does not react anymore.
The last time this happened, I had a little time to check the DB via pgAdmin, and it showed that the connections were "idle in transaction" with the statement BEGIN and an exclusive lock of virtualxid.
I think the situation is like this:
1. the application has started a transaction by sending the BEGIN SQL command to the db
2. the db got this command and started a new transaction, and thus acquired an exclusive lock of mode virtualxid
3. now the db waits for the application to send the next statement(s) (until it receives COMMIT or ROLLBACK) - and then it will release the exclusive lock of mode virtualxid
4. but for some reason it does not get any more statements:
I think that the node.js event loop is blocked, because at the time when we see these locks, the node.js application does not log any more statements. But the web server still gets requests and reported some "upstream timed out" errors.
Does this make sense (I'm really not sure about 2. and 3.)?
Why would all transactions block at the beginning? Is this just coincidence or is the displayed SQL maybe wrong?
BTW: In this answer I found that we can set idle_in_transaction_session_timeout so that these transactions will be released after a timeout, which is great, but I am trying to understand what's causing this issue.
The transactions are not blocking at all. The database is waiting for the application to send the next statement.
The lock on the transaction ID is just a technique for transactions to block each other, even if they are not contending for a table lock (for example, if they are waiting for a row lock): each transaction holds an exclusive lock on its own transaction ID, and if it has to wait for a concurrent transaction to complete, it can just request a lock on that transaction's ID (and be blocked).
If all transactions look like this, then the lock must be somewhere in your application; the database is not involved.
When looking for processes blocked in the database, look for rows in pg_locks where granted is false.
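For reference, a minimal way to run that check from the application itself, assuming an existing pg-promise Database instance named db (the same SQL can of course be run directly in psql or pgAdmin):

```typescript
// Sketch only: "db" is assumed to be an existing pg-promise Database instance.
async function findBlockedLocks(db: { any: (q: string) => Promise<unknown[]> }) {
  // Rows returned here are lock requests that have not been granted,
  // i.e. sessions that are genuinely blocked inside the database.
  return db.any("SELECT pid, locktype, mode, relation FROM pg_locks WHERE NOT granted");
}
```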
Your interpretation is correct. As for why it is happening, that is hard to say. It seems like there is some kind of bug (maybe an undetected deadlock) in your application, or maybe in node.js or pg-promise. You will have to debug at that level.
As expected the problems were caused by our application code. Transactions were used incorrectly:
One of the REST endpoints started a new transaction right away, using Database.tx().
This transaction was passed down multiple levels, but one function in the chain had an error and passed undefined instead of the transaction to the next level
the lowest repository-level function started a new transaction (because the transaction parameter was undefined) by using Database.tx() a second time
This started to fail under heavy load:
The connection pool size was set to 10
When there were many simultaneous requests for this endpoint, we had a situation where 10 of the requests had started (opened the outer transaction) but had not yet reached the repository code that would request the 2nd transaction.
When these requests reached the repository code, they requested a new (2nd) connection from the connection pool. But this call blocks, because all connections were currently in use.
So we have a nasty application-level deadlock.
So the solution was to fix the application code (the intermediate function must pass down the transaction correctly). Then everything works.
Moreover, I strongly recommend setting a sensible idle_in_transaction_session_timeout and connection timeout. Then, even if such an application deadlock is introduced again in a future version, the application can recover automatically after the timeout.
Notes:
pg-promise before v10.3.4 contained a small bug #682 related to the connection timeout
pg-promise before version 10.3.5 could not recover from an idle-in-transaction timeout and left the connection in a broken state: see pg-promise #680
Basically there was another issue: there was no need to use a transaction at all, because all the functions were just reading data, so we can simply use Database.task() instead of Database.tx().
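To illustrate both fixes, here is a small pg-promise sketch; the table, column, and function names are made up and the connection string is a placeholder. It shows passing the same transaction/task context "t" all the way down the call chain (never falling back to "db"), and using db.task() when the work is read-only so no BEGIN/COMMIT is issued at all.

```typescript
import pgPromise, { ITask } from "pg-promise";

const pgp = pgPromise();
const db = pgp("postgres://user:pass@localhost:5432/appdb"); // placeholder connection string

type Ctx = ITask<{}>; // the context pg-promise passes to task/tx callbacks

// Read-only work: a task just borrows one pooled connection, no transaction.
export function loadRoster(groupId: string) {
  return db.task((t: Ctx) => t.any("SELECT * FROM members WHERE group_id = $1", [groupId]));
}

// Writes: one outer transaction, and every nested helper receives "t".
export function updateRoster(groupId: string, names: string[]) {
  return db.tx(async (t: Ctx) => {
    for (const name of names) {
      await insertMember(t, groupId, name); // pass t down; never call db.tx() again here
    }
  });
}

function insertMember(t: Ctx, groupId: string, name: string) {
  return t.none("INSERT INTO members(group_id, name) VALUES($1, $2)", [groupId, name]);
}
```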
I have a bunch of user-generated messages with timestamps, text, profile images, and other stuff. All clients (phones) using my Web API are able to request the latest messages, then scroll down and request older items. Obviously, the top messages are the hottest data in the whole list. Obviously, I want to build a cache which has a caching policy and a clear understanding of newly requested messages: are the requested messages hot or not?
I created a stateless service with MemoryCache and now use it for my purposes. Are there any pitfalls I should take into account while working with it? Except, of course, the fact that I have five nodes and a user may hit a service instance that has no cache inside. In that case this service goes to the data-layer service, then gets and loads some data from it.
UPD #1
Forgot to mention that this list of messages is updated from time to time with new entries.
UPD #2
I wrapped MemoryCache in an IReliableDictionary implementation and palmed it off inside a stateful Service with my own StateManager implementation. Every time a request doesn't find an item in the collection, I go to Azure Storage and retrieve the actual data. After I finished, I realized that my experiment is not useful, because there is no way to scale such an approach. I mean, if my app has fixed partitioned Reliable Services working as a cache, I have no way to grow them by scaling up my Service Fabric cluster. In case of a load increase, after some time this fact hits me in the face :)
I still do not know how to cache my super-hot, most-read messages in a more efficient way. And I still have doubts about the Reliable Actors approach: it creates a huge amount of replicated data.
I think this is an ideal use of an actor.
The actor will be garbage collected after a period of time, so data won't stay in memory.
One actor per user.
I'm building a multi-threaded service application in Delphi XE2. Each thread serves its own purpose apart from the other ones. The main service thread is only responsible for keeping the other threads going and saving a log file, etc. Each of these threads reports back to the main service thread through synchronized event triggers. These threads are created when the service starts and destroyed when the service ends.
I'd like to introduce a separate thread as a centralized database connection to avoid having to create many instances of TADOConnection. My service code can call standard functions such as UserListDataSet := DBThread.GetUserList(SomeUserListDataSet); or it would also be nice if I could send direct SQL statements like SomeDataSet := DBThread.Get(MySqlText);. I'd also like to avoid too many occasions of CoInitialize() etc.
The job threads will need to use this db thread. I need to figure out how to "ask" it for certain data, "wait" for a response, and "acquire" that response back in the thread which requested it. I'm sure there are many approaches to this, but I need to know which one is best suited for my scenario. Windows messages? Events? Should I have some sort of queue? Should it send data sets or something else? Is there already something that can do this? I need to figure out how to structure this DB thread in a way that it can be re-used from other threads.
The structure looks like this:
+ SvcThread
  + DBThread
    + TADOConnection
  + Thread1
  + Thread2
  + Thread3
I need threads 1, 2, and 3 to send requests to the DBThread. When a thread sends a request to it, it needs to wait until it gets a response. Once there's a response, the DB thread needs to notify the asking thread. Each of the threads might send a request to this DB thread at the same time, too.
A good tutorial on how to accomplish this would be perfect - it just needs to be a suitable fit for my scenario. I don't need to know just "how to make two threads talk together" but rather "how to make many threads talk to a centralized database thread". These job threads are created as children of the main service thread, and are not owned by the db thread. The db thread has no knowledge of the job threads.
Normally, you'd have a request queue where all the requests are stored. Your database thread reads a request from the queue, handles it, then invokes a callback routine specified by the requester to handle the result. Not sure how this maps to Delphi paradigms, but the basics should be the same.
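The shape of that pattern, sketched below in TypeScript rather than Delphi purely because the idea is language-agnostic; all names are made up. Worker threads push a request plus a completion callback onto a shared queue; the single DB loop drains the queue one request at a time and invokes each callback with the result.

```typescript
// Hypothetical request shape: the SQL to run and the callback to invoke when done.
type DbRequest = { sql: string; onDone: (rows: unknown[]) => void };

const queue: DbRequest[] = [];

// Called by any "job thread": enqueue the request and wait for its callback.
export function runQuery(sql: string): Promise<unknown[]> {
  return new Promise(resolve => queue.push({ sql, onDone: resolve }));
}

// The dedicated DB loop: handles one request at a time against the single connection.
export async function dbLoop(execute: (sql: string) => Promise<unknown[]>) {
  for (;;) {
    const req = queue.shift();
    if (!req) { await new Promise(r => setTimeout(r, 10)); continue; } // idle wait
    req.onDone(await execute(req.sql));
  }
}
```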
Do any of the "requesting" threads have anything profitable that they could be doing while they are waiting for a response to be obtained from the database? If the answer is "no," as I suspect that it is quite likely to be, then perhaps you can simplify your situation quite a bit by eliminating the need for "a DB thread" completely. Perhaps all of the threads can simply share a single database-connection in turn, employing a mutual-exclusion object to cause them to "wait their turn."
Under this scenario, there would be one database-connection, and any thread which needed to use it would do so. But they would be obliged to obtain a mutex object first, hold on to the mutex during the time they were doing database queries, and then release the mutex so that the next thread could have its turn.
If you decide that it is somehow advantageous (or a necessity...) to dedicate a thread to managing the connection, then perhaps you could achieve the result using (a) a mutex to serialize the requests, as before; (b) one event object to signal the DB thread that a new request has been posted; and (c) another event object to signal the requester that the request has been completed.
In either case, if you have indeed determined that the requester threads have nothing useful that they could be doing in the meantime, you have the threads "simply sleeping" until their turn comes up. Then, they do their business, either directly or indirectly. There are no "queues," no complicated shared data-structures, simply because you have (say...) determined that there is no need for them.
I think using a DB connection pool would be a better fit for your problem. It would also allow you to scale your application later on, without having to create additional DB threads and then manage "load balancing" across them.
Since you are mentioning using TADOConnection please have a look at this implementation made by Cary Jensen http://cc.embarcadero.com/item/19975.
I am successfully using this DB connection pool in several applications. I have modified it in several ways, including using an INI file to control the maximum number of connections, cleanup time, timeouts, etc.
Cary has written several articles that serves as documentation for it. One is here http://edn.embarcadero.com/article/30027.
I'm writing an application which connects to the DB and repeatedly (at a 1-minute interval) reads data from the database. It's something like an RSS feed reader, but with a local DB. If reading the data fails, I try to re-establish the connection. I've designed it with TADOConnection and TADOQuery placed on the form (so with no dynamic creation). My aim is to keep the application "alive" from the user's point of view, so I placed the connection and the reading part into a single thread. The question is, what is the best way to do it?
My design looks like this:
on application start, the TADOConnection and TADOQuery are created along with the form
open connection in a separate thread (TADOConnection)
if the connection is established, suspend the connection thread, start the timer on the form, which periodically resumes another thread for data reading
if the reading thread succeeds, nothing happens and the form timer keeps going; if it fails, the thread stops the timer and resumes the connection thread
Is it better to create TADOConnection and TADOQuery dynamically, or does it not matter? Is it better to use e.g. a critical section in the threads, or something similar (only one thread accesses the components, and only one access happens at a time)?
Thanks for your suggestions
This question is fairly subjective, probably not subjective enough to get closed, but subjective anyway. Here's why I'd go for dynamically created ADO objects:
It keeps everything together: the code and the objects used to access the data. Using data access objects created on the form requires the thread to have intimate knowledge of the form's inner workings, which is never a good idea.
It's safer, because you can't access those objects from other threads (including the main VCL thread). Sure, you're not planning on using those connections for anything else, and you're not planning on using multiple threads, etc., but maybe some day you'll forget about those restrictions.
It's future-proof. You might want to use that same thread in another project. You might want to add a second thread accessing some other data to the same app.
I have a personal preference for creating data access objects dynamically from code. Yes, a subjective answer to a subjective question.
Run everything in the thread. Have a periodic timer in the thread that opens the DB connection, reads the data, "posts" it back to the main thread, and then disconnects. The thread needs to "sleep" while waiting for the timer, e.g. on a Windows event that is signalled by the timer. The DB components, which are local and private to the thread, can be created inside the thread when thread execution starts (on application startup) and freed when thread execution finishes (on application shutdown). This will always work, regardless of whether the DB connection is temporarily available or not, and the main thread does not even have to communicate with the "DB thread". It is an architecture that I use all the time and it is absolutely bullet-proof.