QSQLite Error: Database is locked - multithreading

I am new to Qt development, the way it handles threads (signals and slots) and databases (and SQLite at that). It has been 4 weeks that I have started working on the mentioned technologies. This is the first time I'm posting a question on SO and I feel I have done research before coming to you all. This may look a little long and possibly a duplicate, but I request you all to read it thoroughly once before dismissing it off as a duplicate or tl;dr.
Context:
I am working on a Windows application that performs a certain operation X on a database. The application is developed in Qt and uses QSQLite as database engine. It's a single threaded application, i.e., the tables are processed sequentially. However, as the DB size grows (in number of tables and records), this processing becomes slower. The result of this operation X is written in a separate results table in the same DB. The processing being done is immaterial to the problem, but in basic terms here's what it does:
Read a row from Table_X_1
Read a row from Table_X_2
Do some operations on the rows (only read)
Push the results in Table_X_Results table (this is the only write being performed on the DB)
Table_X_1 and Table_X_2 are identical in number and types of columns and number of rows, only the data may differ.
What I'm trying to do:
In order to improve the performance, I am trying to make the application multi-threaded. Initially I am spawning two threads (using QtConcurrentRun). The two tables can be categorized in two types, say A and B. Each thread will take care of the tables of two types. Processing within the threads remains same, i.e., within each thread the tables are being processed sequentially.
The function is such that it uses SELECT to fetch rows for processing and INSERT to insert result in results table. For inserting the results I am using transactions.
I am creating all the intermediate tables, result tables and indices before starting my actual operation. I am opening and closing connections everytime. For the threads, I create and open a connection before entering the loop (one for each thread).
THE PROBLEM:
Inside my processing function, I get following (nasty, infamous, stubborn) error:
QSqlError(5, "Unable to fetch row", "database is locked")
I am getting this error when I'm trying to read a row from DB (using SELECT). This is in the same function in which I'm performing my INSERTs into results table. The SELECT and the INSERT are in the same transaction (begin and commit pair). For INSERT I'm using prepared statement (SQLiteStatement).
Reasons for seemingly peculiar things that I am doing:
I am using QtConcurrentRun to create the threads because it is straightforward to do! I have tried using QThread (not subclassing QThread, but the other method). That also leads to same problem.
I am compiling with DSQLITE_THREADSAFE=0 to avoid application from crashing. If I use the default (DSQLITE_THREADSAFE=1), my application crashes at SQLiteStatement::recordSet->Reset(). Also, with the default option, internal SQLITE sync mechanism comes into play which may not be reliable. If the need be, I'll employ explicit sync.
Making the application multi-threaded to improve performance, and not doing this. I'm taking care of all the optimizations recommended there.
Using QSqlDatabase::setConnectOptions with QSQLITE_BUSY_TIMEOUT=0. A link suggested that it will prevent the DB to get locked immediately and hence may give my thread(s) appropriate amount of time to "die peacefully". This failed: the DB got locked much frequently than before.
Observations:
The database goes into lock only and as soon as when one of the threads return. This behavior is consistent.
When compiling with DSQLITE_THREADSAFE=1, the application crashes when one of the threads return. Call stack points at SQLiteStatement::recordSet->Reset() in my function, and at winMutexEnter() (called from EnterCriticalSection()) in sqlite3.c. This is consistent as well.
The threads created using QtConcurrentRun do not die immediately.
If I use QThreads, I can't get them to return. That is to say, I feel the thread never returns even though I have connected the signals and the slots correctly. What is the correct way to wait for threads and how long it takes them to die?
The thread that finishes execution never returns, it has locked the DB and hence the error.
I checked for SQLITE_BUSY and tried to make the thread sleep but could not get it to work. What is the correct way to sleep in Qt (for threads created with QtConcurrentRun or QThreads)?
When I close my connections, I get this warning:
QSqlDatabasePrivate::removeDatabase: connection 'DB_CONN_CREATE_RESULTS' is still in use, all queries will cease to work.
Is this of any significance? Some links suggested that this warning arises because of using local QSqlDatabase, and will not arise if the connection is made a class member. However, could it be the reason for my problem?
Further experiments:
I am thinking of creating another database which will only contain results table (Table_X_Results). The rationale is that while the threads will read from one DB (the one that I have currently), they will get to write to another DB. However, I may still face the same problem. Moreover, I read on the forums and wikis that it IS possible to have two threads doing read and write on same DB. So why can I not get this scenario to work?
I am currently using SQLITE version 3.6.17. Could that be the problem? Will things be better if I used version 3.8.5?
I was trying to post the web resources that I have already explored, but I get a message saying "I'd need 10 reps to post more than 2 links". Any help/suggestions would be much appreciated.

Related

Conceptual approach of threads in Delphi

Over 2 years ago, Remy Lebeau gave me invaluable tips on threads in Delphi. His answers were very useful to me and I feel like I made great progress thanks to him. This post can be found here.
Today, I now face a "conceptual problem" about threads. This is not really about code, this is about the approach one should choose for a certain problem. I know we are not supposed to ask for personal opinions, I am merely asking if, on a technical point a view, one of these approach must be avoided or if they are both viable.
My application has a list of unique product numbers (named SKU) in a database. Querying an API with theses SKUS, I get back a JSON file containing details about these products. This JSON file is processed and results are displayed on screen, and saved in database. So, at one step, a download process is involved and it is executed in a worker thread.
I see two different approaches possible for this whole procedure :
When the user clicks on the start button, a query is fired, building a list of SKUs based on the user criteria. A Tstringlist is then built and, for each element of the list, a thread is launched, downloads the JSON, sends back the result to the main thread and terminates.
This can be pictured like this :
When the user clicks on the start button, a query is fired, building a list of SKUs based on the user criteria. Instead of sending SKU numbers one after another to the worker thread, the whole list is sent, and the worker thread iterates through the list, sending back results for displaying and saving to the main thread (via a synchronize event). So we only have one worker thread working the whole list before terminating.
This can be pictured like this :
I have coded these two different approaches and they both work... with each their downsides that I have experienced.
I am not a professional developer, this is a hobby and, before working my way further down a path or another for "polishing", I would like to know if, on a technical point of view and according to your knowledge and experience, one of the approaches I depicted should be avoided and why.
Thanks for your time
Mathias
Another thing to consider in this case is latency to your API that is producing the JSON. For example, if it takes 30 msec to go back and forth to the server, and 0.01 msec to create the JSON on the server, then querying a single JSON record per request, even if each request is in a different thread, does not make much sense. In that case, it would make sense to do fewer requests to the server, returning more data on each request, and partition the results up among different threads.
The other thing is that threads are not a solution to every problem. I would question why you need to break each sku into a single thread. how long is each individual thread running and how much processing is each thread doing? In general, creating lots of threads, for each thread to work for a fraction of a msec does not make sense. You want the threads to be alive for as long as possible, processing as much data as they can for the job. You don't want the computer to be using as much time creating/destroying threads as actually doing useful work.

multithread database is locked Sqlite windows phone 8.1 RT

I can't Access database with multithread. It's Exception database is locked or database is busy. I dont understand why database is locked when I read or write in different table.
I try code below to multithread
SQLite3.Config(SQLite3.ConfigOption.MultiThread);
It's not working. Anyone know? I need it so much!
If you have multi threaded application, then both thread have the liberty to update the DB. But inside DB, The first update will take lock on the rows you are trying to update, and if the second update also tries to work on the locked rows, then you have the possibility of getting "locked" or "busy", if the first update request take more the x amount of time, where "x" is configurable.
From the SQLite web site:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writer queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
So, you could use SQL from different threads for reading, but not for writing concurrently. There are many answers for this in stackoverflow. See for instance: How to use SQLite in a multi-threaded application?

Designing concurrency in a Python program

I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
Motivation
It seems to be like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing modules makes me reconsider- there doesn't seem to be any simple way to share large data structures between threads. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple threads to access the same list/dict/etc... for both reading and writing at the same time? Can I just launch multiple instances of my star generator, give it access to the dict that holds all the stars, and have new objects appear to just pop into existence in the dict from the perspective of other threads (that is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main thread had put it there itself).
If not, is there any practical way to allow multiple threads to read the same data structure at the same time, but feed their resultant data back to a main thread to be rolled into that same data structure safely?
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always explicitly have to send data from one process to another as I would with processes communicating over a TCP stream? I know there are objects that abstract away that sort of thing, but I'm asking if it can be done away with entirely; have the object each thread is looking at actually be the same block of memory.
How flexible are the objects that the modules provide to abstract away the communication between processes? Can I use them as a drop-in replacement for data structures used in existing code and not notice any differences? If I do such a thing, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, then pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one...
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
Even though your problem seems very complicated, there is a very easy solution. You can hide away all the complicated stuff of sharing you objects across processes using a proxy.
The basic idea is that you create some manager that manages all your objects that should be shared across processes. This manager then creates its own process where it waits that some other process instructs it to change the object. But enough said. It looks like this:
import multiprocessing as m
manager = m.Manager()
starsdict = manager.dict()
process = Process(target=yourfunction, args=(starsdict,))
process.run()
The object stored in starsdict is not the real dict. instead it sends all changes and requests, you do with it, to its manager. This is called a "proxy", it has almost exactly the same API as the object it mimics. These proxies are pickleable, so you can pass as arguments to functions in new processes (like shown above) or send them through queues.
You can read more about this in the documentation.
I don't know how proxies react if two processes are accessing them simultaneously. Since they're made for parallelism I guess they should be safe, even though I heard they're not. It would be best if you test this yourself or look for it in the documentation.

oracle row contention causing deadlock errors in high throughtput JMS application

Summary:
I am interested in knowing what's the best practice for high throughput applications that have bulk messages trying to update the same row and get oracle deadlock errors. I know you cannot avoid those errors but how do you recover from them gracefully without getting bogged down by such deadlock errors happening over and over again.
Details:
We are building a high throughput JMS messaging application. Production environment will be two weblogic 11g nodes (running 6 MDB listener instances each). We were getting Oracle deadlock errors (ORA-00060) when we get around 1000 messages all trying to update the same row in oracle database. Java synchronization across nodes is not possible in standard java threading API (unless there's no other solution we don't want to use any 3rd party solutions like terracotta etc).
We were hoping Oracle "select for update WAIT n secs" statement will help because that will essentially make the competing threads (for the same row) wait few seconds before the first thread (who got the lock on the row first) gets done with it.
First issue with "SELECT FOR UPDATE WAIT n" is it doesn't allow using milliseconds for wait times. This starts negatively affecting our application's throughput because putting 1 sec WAIT (least wait time) causes delays on the messages.
Second thing we are fiddling with weblogic queue re-delivery delay parameter (30 secs in our case). Whenever a thread bounces back because of the deadlock error, it will wait 30 seconds before being re-tried.
In our experience 1000 competing messages, in a lot of situations take forever to get processed because the deadlock keeps on happening over and over.
I understand that with the current architecture we are supposed to get deadlock errors regardless ( in case of 1000 competing messages) but application should be resilient enough to recover from these errors after retrying the looping messages.
Any idea what we are missing here ? anybody who has dealt with similar issues before?
I am looking for some design ideas that can make this work resiliently so that it recovers from this deadlock situation and eventually processes all messages in reasonable amount of time without using much additional hardware.
COMPUTATION DETAILS:
These 1000 messages will EACH create 4 objects of 4 different position types each having a quantity associated with it. These quantities will have to merged into those 4 different slots (depending on the position type). The deadlock is happening when those 4 individual slots are being updated by each individual thread. We have already ordered those individual updates in a specific order before being applied to the database rows to avoid any possible race conditions.
A deadlock implies that each thread is trying to update multiple rows in a single transaction and that those updates are being done in a different order across threads. The simplest possible answer, therefore, would be to modify the code so that messages within the same transaction are applied in some defined order (i.e. in order of the primary key). That would ensure that you would never get a deadlock though you'd still get blocking locks while one thread waits for another thread to commit its transaction.
Taking a step back, though, it seems unlikely that you would really want many threads updating the same row in a table when you can't predict the order of the updates. It seems highly likely that would lead to lots of lost updates and some rather unpredictable behavior. What, exactly, is your application doing that would make this sort of thing sensible? Are you doing something like updating aggregate tables after inserting rows into a detail table (i.e. updating the count of the number of views a post has in addition to logging information about a particular view)? If so, do those operations really need to be synchronous? Or could you update the view count periodically in another thread by aggregating the views over the past N second?
As for the MDB
Let it consume the messages, and update instance variables which contain the delta of the quantities of the processed messages (an MDB can carry state in its instance variables across multiple messages).
A #Schedule method in the same MDB persists the quantities in a single database transaction using a single SQL statement every second (for example)
update x set q1 = q1 + delta1, q2 = q2 + delta2, ...
I have done some tests:
It takes 6s to create 1000 messages (JBoss 7 using HornetQ)
During that time, 840 messages were already persisted.
It takes another 2s to persist the remaining ones (the scheduled method ran every second)
This required seven SQL update commands in seven DB transcations
The load is completely caused by creating the messages; there is not real load on the DB
Notes
You need another #PreDestroy method to persist the pending deltas to make sure that nothing gets lost
If you must guarantee transactional correctness, this approach is not suitable. In that case I suggest using a normal queue receiver (= no MDB), transacted session and receive(timeout) to collect 100 - 10000 messages (or until a timeout), do one DB transaction, and right after that the commit on the queue session. This is better, but it's still not XA transactional. If you need this, both commits need to be coordinated by a single XA transaction.

Sqlite thread modes and sqlite misuse paradox

I have a project where i should use multiple tables to avoid keeping dublicated data in my sqlite file(Even though i knew usage of several tables was nightmare).
In my application i am reading data from one table in some method and inserting data into another table in some other method. When i do this i am getting from sqlite step function, error code 21 which is sqlite misuse.
Accoding to my researches that was because i was not able to reach tables from multi threads.
Up to now, i read the sqlite website and learned that there are 3 modes to configurate sqlite database:
1) singlethread: you have no chances to call several threads.
2) multithread: yeah multi thread; but there are some obstacles.
3) serialized: this is the best match with multithread database applications.
if sqlite3_threadsafe() == 2 returns true then yes your sqlite database is serialized and this returned true, so i proved it for myself.
then i have a code to configurate my sqlite database for serialized to take it under guarantee.
sqlite3_config(SQLITE_CONFIG_SERIALIZED);
when i use above codes in class where i read and insert data from 1 table works perfectly :). But if i try to use it in class where i read and insert data from 2 tables (actually where i really need it) problem sqlite misuse comes up.
I checked my code where i open and close database, there is no problem with them. they work unless i delete the other.
I am using ios5 and this is really a big problem for my project. i heard that instagram uses postgresql may be this was the reason ha? Would you suggest postgresql or sqlite at first?
It seems to me like you've got two things mixed up.
Single vs. multi-threaded
Single threaded builds are only ever safe to use from one thread of your code because they lack the mechanisms (mutexes, critical sections, etc.) internally that permit safe use from several. If you are using multiple threads, use a multi-threaded build (or expect “interesting” trouble; you have been warned).
SQLite's thread support is pretty simple. With a multi-threaded build, particular connections should only be used from a single thread (except that they can be initially opened in another).
All recent (last few years?) SQLite builds are happy with access to a single database from multiple processes, but the degree of parallelism depends on the…
Transaction type
SQL in general supports multiple types of transaction. SQLite supports only a subset of them, and its default is SERIALIZABLE. This is the safest mode of access; it simulates what you would see if only one thing could happen at a time. (Internally, it's implemented using a scheme that lets many readers in at once, but only one writer; there's some cleverness to prevent anyone from starving anyone else.)
SQLite also supports read-uncommitted transactions. This increases the amount of parallelism available to code, but at the risk of readers seeing information that's not yet been guaranteed to persist. Whether this matters to you depends on your application.

Resources