As per the SQLite documentation, when using a deferred transaction (BEGIN ... COMMIT), the database is locked from the first write onward.
Most probably this lock is held until the transaction is committed. So if I BEGIN, do the first write, and the COMMIT comes 180 seconds later, my database is locked all that time. Hence, I cannot do write operations from another thread until then.
Is there any way I can tell SQLite not to hold locks until the commit, and to acquire locks only while it is actually writing within the transaction? That would give me some chance of concurrent writes from another thread during that transaction. Or is there any other solution?
I am using the C SQLite library in an embedded environment.
Allowing others to write data that you are reading would result in inconsistent data.
To allow a writer and readers at the same time, enable WAL mode.
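A minimal sketch of enabling it, in Python for brevity ("app.db" is a placeholder path; from the C API the same statements can be run with sqlite3_exec()):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")    # persistent: recorded in the database file
    conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL

Note that WAL still allows only one writer at a time; what it removes is readers and the writer blocking each other.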
My scenario is as follows:
I have 10 datasets I need to process. I will be using 10 threads to process them all in parallel (each can take up to an hour). Once I find the info I want in a dataset, I will write it to the SQLite database. I might also have to update an existing row. I won't be doing any selects or deletes until after all the datasets have finished being processed.
From what I understand, SQLite will not handle this scenario well, since only one thread can lock the file to write, and I don't want to hold up the other threads waiting for the lock to be acquired.
So my idea is to create another thread to handle all these writes. When a processing thread finds something it wants to write to the db, it sends it to the writer thread. The writer thread can then spawn a new thread to do the actual write, so that it stays free to receive further requests and queue them while a write is in progress. That way, only one thread is ever actually writing to the db (see the sketch below).
My main question is as follows:
Will this work / is this sane? Also is there something that does this already?
I'm using Python, if that matters.
Thanks
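A minimal sketch of the single-writer idea, assuming results arrive as (sql, params) pairs; "results.db" and the findings table are placeholders:

    import queue
    import sqlite3
    import threading

    write_q = queue.Queue()
    STOP = (None, None)  # sentinel that tells the writer to shut down

    def writer():
        conn = sqlite3.connect("results.db")
        conn.execute("CREATE TABLE IF NOT EXISTS findings (dataset TEXT, info TEXT)")
        while True:
            sql, params = write_q.get()
            if sql is None:
                break
            conn.execute(sql, params)  # only this thread ever touches the db
            conn.commit()
        conn.close()

    t = threading.Thread(target=writer)
    t.start()

    # A processing thread just enqueues and carries on; it never blocks on the db:
    write_q.put(("INSERT INTO findings (dataset, info) VALUES (?, ?)", ("ds1", "hit")))

    write_q.put(STOP)
    t.join()

Note that the queue already serializes the writes, so the extra thread-per-write layer may not be needed.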
I have an interesting issue using SQLite 3.7.13 on Debian 7.
I'm using SQLite with "PRAGMA journal_mode = wal" and "PRAGMA synchronous = normal" to try and avoid fsyncs in the main event loop of a Python program. As suggested by the documentation, I have disabled automatic checkpoints, and I am periodically running "PRAGMA wal_checkpoint" in a different thread to sync the most recent data to disk.
This is working; however, I have found that the first insert operation after a checkpoint in the main program thread causes a one-off fsync call for the WAL file itself. Any further insert operations do not cause an fsync. I have verified this using strace.
Looking at the SQLite docs it says:
WAL file header is synchronized when a WAL file begins to be reused after a checkpoint
I'm looking for a way to prevent any fsyncs occurring in the main thread, but that still allows me to perform periodic checkpoints from another thread. Is there something more that can be done in the check point thread to avoid the fsync in the main thread?
I have looked at "synchronous = off", however that also suppresses the fsyncs for the checkpoints.
Note: I have a separate connection to the database for each of the two threads (in case that is relevant)
Further note: the documentation seems to say elsewhere that there shouldn't be an fsync, but observed behavior obviously differs:
Note that with PRAGMA synchronous set to NORMAL, the checkpoint is the only operation to issue an I/O barrier or sync operation (fsync() on unix or FlushFileBuffers() on windows). If an application therefore runs checkpoint in a separate thread or process, the main thread or process that is doing database queries and updates will never block on a sync operation.
Thanks
To move that fsync into the other thread, do the first operation that changes the database in that thread.
You can use something harmless like PRAGMA user_version = 42 (assuming you don't use the user version).
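A sketch of what the checkpoint thread could run, assuming it has its own connection to the already-WAL-mode database ("app.db" is a placeholder):

    import sqlite3

    def checkpoint(path="app.db"):
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA wal_checkpoint")
        # Touch the database once, so the one-off fsync that follows WAL
        # reuse happens here instead of on the main thread's first insert.
        # 42 is arbitrary; only safe if the app doesn't use user_version itself.
        conn.execute("PRAGMA user_version = 42")
        conn.close()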
I have 2 processes that connect to the same DB.
The first one is used to read from the DB and the second is used to write to the DB.
The first process sends write procedures to the second process for execution via a message queue on Linux.
Every SQL statement goes through the prepare, step, finalize routine, where the prepare and step are retried in a loop up to 10000 times until they succeed (I did this to overcome 'database is locked' issues).
To add a table I do the following:
The first process sends a request via the message queue to the second process to add a table and insert garbage into its rows, with journal_mode=OFF.
Then the first process checks for the existence of the table so it can continue with its algorithm. (It checks in a loop, with a usleep call between iterations.)
The problem is that the second process gets stuck in the step execution of 'PRAGMA journal_mode=OFF;' because it says the DB is locked. (Here too, I use a loop of 10000 iterations with usleep, checking 10000 times for the DB to be free, as mentioned before.)
When I make the first process close its connection inside the 'check for existing table' loop, the second process is OK. But now, when I add tables and values, I sometimes get 'callback requested query abort' from the step statement.
Any idea what's happening here?
Use WAL mode. It allows one writer and any number of readers without any problems. You don't need to check for the locked state and do retries, etc.
WAL limitation: The DB has to be on the local drive.
Performance: large transactions (thousands of inserts or similar) are slower than with the classic rollback journal, but apart from that the speed is very similar, sometimes even better. Perceived performance (UI waiting for a DB write to finish) improves dramatically.
WAL is a relatively new technology, but it is already used in Firefox, Android/iOS phones, etc. I did tests with 2 threads running at full speed - one writing and the other one reading - and did not encounter a single problem.
You may be able to simplify your app when adopting the WAL mode.
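A sketch of how each process might open its connection under WAL, with SQLite's built-in busy handler standing in for the 10000-iteration retry loops (in Python for brevity; the C equivalent of the timeout is sqlite3_busy_timeout(), and "shared.db" is a placeholder):

    import sqlite3

    def open_conn(path="shared.db"):
        # timeout installs SQLite's busy handler: brief lock contention is
        # waited out internally instead of surfacing as "database is locked"
        conn = sqlite3.connect(path, timeout=5.0)
        conn.execute("PRAGMA journal_mode=WAL")  # persistent across connections
        return conn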
In our scenario,
the consumer takes at least half a second to complete one processing cycle (against a row in a data table).
the producer produces at least 8 items a second (no worries, we don't mind how long consuming takes).
the shared data is simply a data table.
we should never ask the producer to wait (it is a server, and we don't want it to block on this).
How can we achieve the above without locking the data table at all (as we don't want the producer to wait in any way)?
We cannot use .NET 4.0 yet in our org.
There is a great example of a producer/consumer queue using Monitors at this page under the "Producer/Consumer Queue" section. In order to synchronize access to the underlying data table, you can have a single consumer.
That page is probably the best resource for threading in .NET on the net.
Create a buffer that holds the data while it is being processed.
It takes you half a second to process an item, and you get 8 items a second... unless you have at least 4 processors working on it, you'll have a problem.
Just to be safe, I'd use a buffer at least twice the size needed (16 rows), and make sure that's possible with the hardware.
There is no magic bullet that is going to let you access a DataTable from multiple threads without using a blocking synchronization mechanism. What I would do is to hold the lock for as short a duration as possible. Keep in mind that modifying any object in the data table's hierarchy will require locking the whole data table. This is because modifying a column value on a DataRow can change the internal indexing structures inside the parent DataTable.
So what I would do is: from the producer, acquire a lock, add a new row, and release the lock. Then, in the consumer, acquire the same lock, copy the data contained in a DataRow into a separate data structure, and release the lock immediately. Now you can operate on the copied data without synchronization mechanisms, since it is isolated. After you have completed the operation on it, acquire the lock again, merge the changes back into the DataRow, then release the lock and start the process all over again.
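A generic sketch of that lock-copy-release discipline, with a plain list of dicts standing in for the DataTable (an assumption for brevity; the same pattern applies to the real object):

    import threading

    table = []                 # stand-in for the shared DataTable
    table_lock = threading.Lock()

    def expensive_work(row):
        return len(row)        # placeholder for the half-second processing step

    def produce(row):
        with table_lock:       # hold the lock only while touching the table
            table.append(row)

    def consume(index):
        with table_lock:
            copy = dict(table[index])          # snapshot the row, release the lock
        copy["result"] = expensive_work(copy)  # long work on the isolated copy
        with table_lock:
            table[index].update(copy)          # merge the changes back under the lock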
Does thread-safety of SQLite3 mean different threads can modify the same table of a database concurrently?
No - SQLite does not support concurrent write access to the same database file. SQLite will simply block one of the transactions until the other one has finished.
Note that if you're using Python, to access a sqlite3 connection from different threads you need to disable the same-thread check via the check_same_thread argument, e.g.:
sqlite3.connect(":memory:", check_same_thread=False)
As of 24 May 2010 the docs omit this option; the omission is listed as a bug here.
Not necessarily. If sqlite3 is compiled with the thread-safe macro (check via the int sqlite3_threadsafe(void) function), then you can try to access the same DB from multiple threads without the risk of corruption. Depending on the lock(s) required, however, you may or may not be able to actually modify data (SQLite does not support row locking, so to write you'll need a lock on the whole database). However, you can try; if one thread blocks, it will automatically write as soon as the other thread finishes with the DB.
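From Python the compile-time mode can be inspected too; on Python 3.11+ the sqlite3.threadsafety attribute is derived from the library's sqlite3_threadsafe() setting (on older versions it is hard-coded to 1):

    import sqlite3

    # DB-API 2.0 levels: 0 = threads may not share the module,
    # 1 = threads may share the module but not connections,
    # 3 = threads may share the module, connections and cursors.
    print(sqlite3.threadsafety)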
You can use SQLite in 3 different modes:
http://www.sqlite.org/threadsafe.html
If you choose multi-thread or serialized mode, you can easily use SQLite in a multi-threaded application.
In those situations you can read from all your threads simultaneously anyway. If you need to write simultaneously, the database will be locked automatically for the current writing thread and unlocked after that (the next thread will wait (mutex) for its turn until the database is unlocked). In all those cases, you need to create a separate connection for every thread (.NET Data.Sqlite.dll). If you're using another implementation (e.g. an Android wrapper), things are sometimes different.
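A sketch of the one-connection-per-thread rule, using thread-local storage in Python ("shared.db" is a placeholder):

    import sqlite3
    import threading

    local = threading.local()

    def get_conn(path="shared.db"):
        # each thread lazily opens and then reuses its own connection
        if not hasattr(local, "conn"):
            local.conn = sqlite3.connect(path)
        return local.conn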