SQLite: Fsyncing with journal_mode = wal and synchronous = normal - linux

I have an interesting issue using SQLite 3.7.13 on Debian 7.
I'm using SQLite with "PRAGMA journal_mode = wal" and "PRAGMA synchronous = normal" to try to avoid fsyncs in the main event loop of a Python program. As suggested by the documentation, I have disabled automatic checkpoints and I am periodically running "PRAGMA wal_checkpoint" in a different thread to sync the most recent data to disk.
This is working; however, I have found that the first insert operation after a checkpoint in the main program thread causes a one-off fsync call for the WAL file itself. Any further insert operations do not cause an fsync. I have verified this using strace.
Looking at the SQLite docs it says:
WAL file header is synchronized when a WAL file begins to be reused after a checkpoint
I'm looking for a way to prevent any fsyncs occurring in the main thread, but that still allows me to perform periodic checkpoints from another thread. Is there something more that can be done in the check point thread to avoid the fsync in the main thread?
I have looked at "synchronous = off", but that also disables the fsyncs for the checkpoints.
Note: I have a separate connection to the database for each of the two threads (in case that is relevant)
Further note: the documentation seems to say elsewhere that there shouldn't be an fsync, but observed behavior obviously differs:
Note that with PRAGMA synchronous set to NORMAL, the checkpoint is the only operation to issue an I/O barrier or sync operation (fsync() on unix or FlushFileBuffers() on windows). If an application therefore runs checkpoint in a separate thread or process, the main thread or process that is doing database queries and updates will never block on a sync operation.
Thanks

To move that fsync into the other thread, do the first operation that changes the database in that thread.
You can use something harmless like PRAGMA user_version = 42 (assuming you don't use the user version).
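As a minimal sketch of that idea in Python's sqlite3 module (the database path, checkpoint interval, and user_version value here are placeholders, not taken from the question):

import sqlite3
import threading
import time

def checkpoint_loop(db_path="app.db", interval=5.0):
    # This thread gets its own connection, just like in the question.
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = wal")
    conn.execute("PRAGMA synchronous = normal")
    while True:
        time.sleep(interval)
        # Sync the WAL and database file from this thread.
        conn.execute("PRAGMA wal_checkpoint")
        # Harmless first post-checkpoint write: the WAL header is rewritten
        # (and fsynced) here instead of on the main thread's next insert.
        conn.execute("PRAGMA user_version = 42")

threading.Thread(target=checkpoint_loop, daemon=True).start()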

Related

File writing from multiple threads.

I have an application A which calls another application B which does some calculation and writes to a file File.txt
A invokes multiple instances of B through multiple threads and each instances tries to write to same file File.txt
Here comes the actual problem :
Since multiple threads try to access the same file, the file access throws exceptions, which is to be expected.
I tried an approach of using a concurrent queue in a singleton class: each instance of B adds to the queue, and another thread in this class takes care of dequeuing the items and writing them to File.txt. The queue is consumed synchronously and the write operations succeed. This works fine.
If I have too many threads and too many items in the queue, the file writing still works, but if for some reason my queue crashes or stops abruptly, all the information that was supposed to be written to the file is lost.
If I make the file writing synchronous from B without using the queue, then it will be slow because it needs to check for file locking, but there is less chance of data being lost since B writes to the file immediately.
What would be the best approach or design to handle this scenario? I don't need a response after the file writing is completed. I can't make B wait for the file writing to be completed.
Would async/await file writing be of any use here?
I think what you've done is the best that can be done. You may have to tune your producer/consumer queue solution if there are still problems, but it seems to me that you've done rather well with this approach.
If an in-memory queue isn't the answer, perhaps externalizing that to a message queue and a pool of listeners would be an improvement.
Relational databases and transaction managers are born to solve this problem. Why continue with a file based solution? Is it possible to explore an alternative?
is there a better approach or design to handle this scenario?
You can make each producer thread write to its own rolling file instead of queuing the operation. Every X seconds the producers move to new files and an aggregation thread wakes up, reads the previous files (one per producer) and writes the results to the final File.txt output file. No read/write locks are required here.
This ensures safe recovery since the rolling files exist until you process and delete them.
This also means that you always write to disk, which is much slower than queuing tasks in memory and writing to disk in bulk. But that's the price you pay for consistency.
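A rough sketch of the rolling-file idea, shown in Python for brevity (the file-name pattern, roll interval, and output path are made up for illustration):

import glob
import os
import time

ROLL_INTERVAL = 5  # seconds; producers switch to a fresh file this often

def producer_write(producer_id, text):
    # Each producer appends to its own private rolling file - no shared lock needed.
    epoch = int(time.time()) // ROLL_INTERVAL
    with open(f"producer-{producer_id}.{epoch}.part", "a") as f:
        f.write(text + "\n")

def aggregate_once():
    # Merge only files from past intervals (no longer being written to) into File.txt,
    # deleting each part file after it has been merged so recovery stays safe.
    # (The boundary race between intervals is ignored here for brevity.)
    current_epoch = int(time.time()) // ROLL_INTERVAL
    for part in sorted(glob.glob("producer-*.part")):
        epoch = int(part.rsplit(".", 2)[1])
        if epoch < current_epoch:
            with open(part) as src, open("File.txt", "a") as dst:
                dst.write(src.read())
            os.remove(part)

An aggregation thread would simply call aggregate_once() every ROLL_INTERVAL seconds.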
Would async/await file writing be of any use here?
Using asynchronous IO has nothing to do with this. The problems you mentioned were 1) shared resources (the output file) and 2) lack of consistency (when the queue crashes), neither of which is what async programming addresses.
The reason async is in the picture is that I don't want to delay B's existing work because of this file writing operation.
async would indeed help you with that. Whatever pattern you choose to implement (to solve the original problem), it can always be async by merely using the asynchronous IO APIs.

SQLite: Modifying locking criteria inside begin - commit

As per the SQLite documentation, when we use a deferred transaction with begin - commit, the database is locked from the first write onwards.
Most probably this lock is held until the transaction is committed. So if I do begin and perform the first write, and the commit comes 180 seconds later, my database is locked for that whole time. Hence, I cannot perform write operations from another thread during that period.
Is there any way I can tell SQLite not to hold locks until the commit and to acquire locks only when it is actually writing within the transaction, so that I have some chance of concurrent writing from another thread during that transaction? Or is there any other solution?
I am using C Sqlite library in an embedded environment.
Allowing others to write data that you are reading would result in inconsistent data.
To allow a writer and readers at the same time, enable WAL mode.
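The question uses the C library, but as a quick sketch of what enabling WAL looks like (Python's sqlite3 here for brevity; the file name is a placeholder):

import sqlite3

# WAL mode is a persistent property of the database file, so setting it once
# from any connection is enough. Readers then no longer block the writer.
conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode = wal")   # returns 'wal' on success
conn.execute("PRAGMA busy_timeout = 5000")  # a blocked writer waits up to 5 s instead of failing

Note that WAL removes the reader/writer conflict but still allows only one writer at a time.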

Understanding the Event-Loop in node.js

I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile(foo.txt)) and just pass in a callback that will be executed once a particular event, indicating that the file read has completed, is fired. However, what I do not understand is where the function that does the actual work of reading the file is executed. JavaScript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function run in relation to Node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However, for some standard library function calls, the Node C++ side and libuv decide to do expensive work outside of the event loop entirely, so they do not block the main/event loop. Instead they make use of something called a thread pool: a series of (by default) four threads that can be used for running computationally intensive tasks. Only four kinds of things use this thread pool - DNS lookups, fs, crypto and zlib. Everything else executes in the main thread.
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
It's like using setTimeout(function(){/*file reading code here*/},1000);. JavaScript can run multiple things side by side, like having three setInterval(function(){/*code to execute*/},1000); timers. So in a way, JavaScript gives the appearance of multi-threading. And for actually reading from or writing to the hard drive in NodeJS, you can use:
var child = require("child_process");
var fs = require("fs");

function put_text(file, text) {
    // shell out and redirect echo's output into the file
    child.exec("echo " + text + ">" + file);
}

function get_text(file) {
    // read the file contents with the built-in fs module
    return fs.readFileSync(file, "utf8");
}
These can also be used for reading and writing to/from the hard drive using NodeJS.

postgresql concurrent queries debug

There is a multithreaded application executing some PL/pgSQL function. That function inserts records into a critically important resource (table). It also executes some select/update/etc. operations while running.
The issue is, sometimes we get duplicate (2-3) records, each one passed to the function in a parallel thread. They all end up inserted into the table as a result of the function execution, when they should not be.
It happens because both transactions are executed in parallel and have no idea that the same record is being prepared for insert in a parallel transaction.
The table is critically important, and all kinds of LOCK TABLE are extremely unwelcome (LOCK FOR SHARE MODE meanwhile gave us some useful experience).
So, the question is: is there any best practice for organizing a PL/pgSQL function that works with a critical resource (table), is executed by a multithreaded app, and produces no harmful locks on this resource?
PS. I know that some thread partitioning by record.ID in the app is a possible solution, but I'm interested in a PL/pgSQL solution first of all.
Sometimes you can use advisory locks - http://www.postgresql.org/docs/current/static/explicit-locking.html. With these locks you lock on some application-chosen number (for example one derived from the record key). I have used them for synchronizing parallel inserts with success.
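For example, a sketch using Python/psycopg2 rather than PL/pgSQL (the table, column, and record key are hypothetical; the point is the pg_advisory_xact_lock call keyed on the record):

import psycopg2

conn = psycopg2.connect("dbname=mydb")    # hypothetical connection string
record_key = "order-12345"                # hypothetical record identifier

with conn, conn.cursor() as cur:
    # Lock on a number derived from the record key; a parallel transaction
    # preparing the same record blocks here until this transaction commits.
    cur.execute("SELECT pg_advisory_xact_lock(hashtext(%s))", (record_key,))
    cur.execute("SELECT 1 FROM important_table WHERE key = %s", (record_key,))
    if cur.fetchone() is None:
        cur.execute("INSERT INTO important_table (key) VALUES (%s)", (record_key,))
# the advisory lock is released automatically when the transaction ends

The same pg_advisory_xact_lock call can equally be placed at the top of the PL/pgSQL function itself, which keeps the solution server-side.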

Thread-safety and concurrent modification of a table in SQLite3

Does thread-safety of SQLite3 mean different threads can modify the same table of a database concurrently?
No - SQLite does not support concurrent write access to the same database file. SQLite will simply block one of the transactions until the other one has finished.
Note that if you're using Python, to access a sqlite3 connection from different threads you need to disable the check_same_thread argument, e.g.:
sqlite3.connect(":memory:", check_same_thread=False)
As of 24 May 2010, the docs omit this option; the omission is listed as a bug here.
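For instance, a minimal sketch (the table and helper function are made up; the threading.Lock is just one simple way to serialize access to the shared connection yourself):

import sqlite3
import threading

# One shared connection; check_same_thread=False only disables Python's safety
# check, so access still has to be serialized by the application.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE items (value TEXT)")
conn_lock = threading.Lock()

def insert_item(value):
    with conn_lock:   # only one thread talks to the connection at a time
        conn.execute("INSERT INTO items (value) VALUES (?)", (value,))
        conn.commit()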
Not necessarily. If sqlite3 is compiled with the thread safe macro (check via the int sqlite3_threadsafe(void) function), then you can try to access the same DB from multiple threads without the risk of corruption. Depending on the lock(s) required, however, you may or may not be able to actually modify data (I don't believe sqlite3 supports row locking, which means that to write, you'll need to get a table lock). However, you can try; if one thread blocks, then it will automatically write as soon as the other thread finishes with the DB.
You can use SQLite in 3 different modes:
http://www.sqlite.org/threadsafe.html
If you decide on multi-thread mode or serialized mode, you can easily use SQLite in a multi-threaded application.
In those situations you can read from all your threads simultaneously anyway. If you need to write simultaneously, the opened table will be locked automatically for the current writing thread and unlocked afterwards (the next thread waits (mutex) for its turn until the table is unlocked). In all those cases, you need to create a separate connection for every thread (.NET Data.Sqlite.dll). If you're using another implementation (e.g. any Android wrapper), things can sometimes differ.
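As a small illustration of the separate-connection-per-thread pattern (Python rather than .NET here; the file and table names are placeholders):

import sqlite3
import threading

def worker(n):
    # Each thread opens its own connection; SQLite serializes the actual writes,
    # and the timeout makes a blocked writer wait instead of erroring out immediately.
    conn = sqlite3.connect("shared.db", timeout=5.0)
    conn.execute("INSERT INTO items (value) VALUES (?)", (f"thread-{n}",))
    conn.commit()
    conn.close()

setup = sqlite3.connect("shared.db")
setup.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
setup.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()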
