SQLite thread modes and the SQLITE_MISUSE paradox - multithreading

I have a project where I need to use multiple tables to avoid keeping duplicated data in my SQLite file (even though I knew that using several tables would be a nightmare).
In my application I read data from one table in one method and insert data into another table in another method. When I do this, sqlite3_step returns error code 21, which is SQLITE_MISUSE.
According to my research, this happens because I cannot access the tables from multiple threads.
So far I have read the SQLite website and learned that there are 3 modes for configuring an SQLite database:
1) single-thread: you cannot use it from several threads.
2) multi-thread: multiple threads are allowed, but there are some restrictions.
3) serialized: this is the best match for multithreaded database applications.
sqlite3_threadsafe() == 2 returned true for me, which I took as proof that my SQLite database is serialized.
I also have code that configures my SQLite database as serialized, to be sure of it:
sqlite3_config(SQLITE_CONFIG_SERIALIZED);
When I use the code above in a class where I read and insert data from 1 table, it works perfectly :). But if I try to use it in a class where I read and insert data from 2 tables (which is where I really need it), the SQLITE_MISUSE problem comes up.
I checked the code where I open and close the database, and there is no problem with it. They work unless I delete the other.
I am using iOS 5 and this is really a big problem for my project. I heard that Instagram uses PostgreSQL; maybe this was the reason? Would you suggest PostgreSQL or SQLite to start with?

It seems to me like you've got two things mixed up.
Single vs. multi-threaded
Single threaded builds are only ever safe to use from one thread of your code because they lack the mechanisms (mutexes, critical sections, etc.) internally that permit safe use from several. If you are using multiple threads, use a multi-threaded build (or expect “interesting” trouble; you have been warned).
SQLite's thread support is pretty simple. With a multi-threaded build, particular connections should only be used from a single thread (except that they can be initially opened in another).
All recent (last few years?) SQLite builds are happy with access to a single database from multiple processes, but the degree of parallelism depends on the…
Transaction type
SQL in general supports multiple types of transaction. SQLite supports only a subset of them, and its default is SERIALIZABLE. This is the safest mode of access; it simulates what you would see if only one thing could happen at a time. (Internally, it's implemented using a scheme that lets many readers in at once, but only one writer; there's some cleverness to prevent anyone from starving anyone else.)
SQLite also supports read-uncommitted transactions. This increases the amount of parallelism available to code, but at the risk of readers seeing information that's not yet been guaranteed to persist. Whether this matters to you depends on your application.
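The one-connection-per-thread rule described above can be sketched with Python's standard sqlite3 module, used here only as a stand-in for the C API in the question. The tables a and b, the helper count_rows and the temporary database file are all made up for illustration; the point is only that each thread opens, uses and closes its own connection.

```python
import os
import sqlite3
import tempfile
import threading


def count_rows(db_path, table, results):
    # Each thread opens its own connection and never shares it.
    conn = sqlite3.connect(db_path)
    try:
        # Table names are fixed literals in this sketch, never user input.
        results[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()


def demo():
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE a (x INTEGER)")
    setup.execute("CREATE TABLE b (x INTEGER)")
    setup.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
    setup.executemany("INSERT INTO b VALUES (?)", [(1,), (2,), (3,)])
    setup.commit()
    setup.close()

    results = {}
    threads = [
        threading.Thread(target=count_rows, args=(db_path, name, results))
        for name in ("a", "b")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling demo() reads both tables from two threads at once; reads do not block each other, so no busy handling is needed in this read-only case.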

Related

multithread database is locked Sqlite windows phone 8.1 RT

I can't access the database from multiple threads. I get an exception saying the database is locked or the database is busy. I don't understand why the database is locked when I read or write different tables.
I tried the code below to enable multithreading:
SQLite3.Config(SQLite3.ConfigOption.MultiThread);
It's not working. Does anyone know why? I need this badly!
If you have a multi-threaded application, then every thread is free to update the DB. Inside the DB, however, the first update takes a lock, and SQLite locks the whole database file rather than individual rows or tables. If a second update arrives while that lock is held, it can fail with "locked" or "busy" when the first update takes longer than x amount of time, where "x" is the configurable busy timeout.
From the SQLite web site:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
So you can use SQLite from different threads for reading, but not for concurrent writing. There are many answers about this on Stack Overflow. See for instance: How to use SQLite in a multi-threaded application?
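The one-writer rule can be reproduced in a few lines of Python with the standard sqlite3 module (a stand-in for the Windows Phone wrapper in the question). This is only a sketch: the tables t1 and t2 and the function name are made up, and timeout=0 is chosen so that the second writer fails immediately instead of waiting. Note that the two writers touch different tables and still collide.

```python
import os
import sqlite3
import tempfile


def demo_single_writer():
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: the module issues no implicit BEGIN,
    # so the transactions below are exactly the ones written out.
    first = sqlite3.connect(db_path, isolation_level=None)
    first.execute("CREATE TABLE t1 (x INTEGER)")
    first.execute("CREATE TABLE t2 (x INTEGER)")

    # timeout=0: give up at once instead of waiting for the lock.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)

    # The first connection becomes the writer.
    first.execute("BEGIN IMMEDIATE")
    first.execute("INSERT INTO t1 VALUES (1)")

    try:
        # A different table, but the same database file.
        second.execute("BEGIN IMMEDIATE")
        second.execute("INSERT INTO t2 VALUES (1)")
        second.execute("COMMIT")
        blocked = False
    except sqlite3.OperationalError as exc:
        blocked = "locked" in str(exc)

    # Once the first writer commits, the second one can proceed.
    first.execute("COMMIT")
    second.execute("BEGIN IMMEDIATE")
    second.execute("INSERT INTO t2 VALUES (2)")
    second.execute("COMMIT")
    rows = second.execute("SELECT x FROM t2").fetchall()

    first.close()
    second.close()
    return blocked, rows
```

With a non-zero timeout the second connection would wait for the first to commit rather than fail, which is usually what a short-lived writer wants.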

QSQLite Error: Database is locked

I am new to Qt development, the way it handles threads (signals and slots) and databases (and SQLite at that). It has been 4 weeks that I have started working on the mentioned technologies. This is the first time I'm posting a question on SO and I feel I have done research before coming to you all. This may look a little long and possibly a duplicate, but I request you all to read it thoroughly once before dismissing it off as a duplicate or tl;dr.
Context:
I am working on a Windows application that performs a certain operation X on a database. The application is developed in Qt and uses QSQLite as database engine. It's a single threaded application, i.e., the tables are processed sequentially. However, as the DB size grows (in number of tables and records), this processing becomes slower. The result of this operation X is written in a separate results table in the same DB. The processing being done is immaterial to the problem, but in basic terms here's what it does:
Read a row from Table_X_1
Read a row from Table_X_2
Do some operations on the rows (only read)
Push the results in Table_X_Results table (this is the only write being performed on the DB)
Table_X_1 and Table_X_2 are identical in number and types of columns and number of rows, only the data may differ.
What I'm trying to do:
In order to improve the performance, I am trying to make the application multi-threaded. Initially I am spawning two threads (using QtConcurrentRun). The two tables can be categorized in two types, say A and B. Each thread will take care of the tables of two types. Processing within the threads remains same, i.e., within each thread the tables are being processed sequentially.
The function is such that it uses SELECT to fetch rows for processing and INSERT to insert result in results table. For inserting the results I am using transactions.
I am creating all the intermediate tables, result tables and indices before starting my actual operation. I am opening and closing connections every time. For the threads, I create and open a connection before entering the loop (one for each thread).
THE PROBLEM:
Inside my processing function, I get following (nasty, infamous, stubborn) error:
QSqlError(5, "Unable to fetch row", "database is locked")
I am getting this error when I'm trying to read a row from DB (using SELECT). This is in the same function in which I'm performing my INSERTs into results table. The SELECT and the INSERT are in the same transaction (begin and commit pair). For INSERT I'm using prepared statement (SQLiteStatement).
Reasons for seemingly peculiar things that I am doing:
I am using QtConcurrentRun to create the threads because it is straightforward to do! I have tried using QThread (not subclassing QThread, but the other method). That also leads to same problem.
I am compiling with -DSQLITE_THREADSAFE=0 to keep the application from crashing. If I use the default (-DSQLITE_THREADSAFE=1), my application crashes at SQLiteStatement::recordSet->Reset(). Also, with the default option, the internal SQLite sync mechanism comes into play, which may not be reliable. If need be, I'll employ explicit sync.
Making the application multi-threaded to improve performance, and not doing this. I'm taking care of all the optimizations recommended there.
Using QSqlDatabase::setConnectOptions with QSQLITE_BUSY_TIMEOUT=0. A link suggested that it will prevent the DB from getting locked immediately and hence may give my thread(s) an appropriate amount of time to "die peacefully". This failed: the DB got locked much more frequently than before.
Observations:
The database becomes locked only when, and as soon as, one of the threads returns. This behavior is consistent.
When compiling with -DSQLITE_THREADSAFE=1, the application crashes when one of the threads returns. The call stack points at SQLiteStatement::recordSet->Reset() in my function, and at winMutexEnter() (called from EnterCriticalSection()) in sqlite3.c. This is consistent as well.
The threads created using QtConcurrentRun do not die immediately.
If I use QThreads, I can't get them to return. That is to say, I feel the thread never returns even though I have connected the signals and the slots correctly. What is the correct way to wait for threads and how long it takes them to die?
The thread that finishes execution never returns, it has locked the DB and hence the error.
I checked for SQLITE_BUSY and tried to make the thread sleep but could not get it to work. What is the correct way to sleep in Qt (for threads created with QtConcurrentRun or QThreads)?
When I close my connections, I get this warning:
QSqlDatabasePrivate::removeDatabase: connection 'DB_CONN_CREATE_RESULTS' is still in use, all queries will cease to work.
Is this of any significance? Some links suggested that this warning arises because of using local QSqlDatabase, and will not arise if the connection is made a class member. However, could it be the reason for my problem?
Further experiments:
I am thinking of creating another database which will only contain results table (Table_X_Results). The rationale is that while the threads will read from one DB (the one that I have currently), they will get to write to another DB. However, I may still face the same problem. Moreover, I read on the forums and wikis that it IS possible to have two threads doing read and write on same DB. So why can I not get this scenario to work?
I am currently using SQLITE version 3.6.17. Could that be the problem? Will things be better if I used version 3.8.5?
I was trying to post the web resources that I have already explored, but I get a message saying "I'd need 10 reps to post more than 2 links". Any help/suggestions would be much appreciated.
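For reference, the four steps listed in the question (read two rows, process, write one result) can be sketched in Python's standard sqlite3 module. This is not the asker's Qt code and not a diagnosis of it. The table names, the function process_pair, the 5-second busy timeout and the use of BEGIN IMMEDIATE are all assumptions made for illustration. Each thread owns its connection, and BEGIN IMMEDIATE takes the write lock before the first SELECT so the transaction never has to upgrade a read lock to a write lock halfway through.

```python
import os
import sqlite3
import tempfile
import threading


def process_pair(db_path, left, right):
    # One connection per thread; timeout is the busy timeout in seconds (assumed value).
    conn = sqlite3.connect(db_path, timeout=5.0, isolation_level=None)
    try:
        # Assumption: acquire the write lock up front for the whole transaction.
        conn.execute("BEGIN IMMEDIATE")
        left_rows = conn.execute(f"SELECT value FROM {left} ORDER BY rowid").fetchall()
        right_rows = conn.execute(f"SELECT value FROM {right} ORDER BY rowid").fetchall()
        for (a,), (b,) in zip(left_rows, right_rows):
            # Stand-in for "operation X": add the two values.
            conn.execute("INSERT INTO results VALUES (?, ?)", (left, a + b))
        conn.execute("COMMIT")
    finally:
        conn.close()


def run_demo():
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(db_path)
    setup.execute("CREATE TABLE results (source TEXT, value INTEGER)")
    tables = {"a_1": [1, 2], "a_2": [10, 20], "b_1": [3], "b_2": [30]}
    for name, values in tables.items():
        setup.execute(f"CREATE TABLE {name} (value INTEGER)")
        setup.executemany(f"INSERT INTO {name} VALUES (?)", [(v,) for v in values])
    setup.commit()
    setup.close()

    threads = [
        threading.Thread(target=process_pair, args=(db_path, left, right))
        for left, right in (("a_1", "a_2"), ("b_1", "b_2"))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    check = sqlite3.connect(db_path)
    rows = sorted(check.execute("SELECT source, value FROM results").fetchall())
    check.close()
    return rows
```

Holding the write lock for the whole transaction means the two threads take turns, so this sketch shows a lock-safe ordering rather than a speedup.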

Are there greenDAO thread safety best practices?

I'm having a go with greenDAO and so far it's going pretty well. One thing that doesn't seem to be covered by the docs or website (or anywhere :( ) is how it handles thread safety.
I know the basics mentioned elsewhere, like "use a single dao session" (general practice for Android + SQLite), and I understand the Java memory model quite well. The library internals even appear threadsafe, or at least built with that intention. But nothing I've seen covers this:
greenDAO caches entities by default. This is excellent for a completely single-threaded program - transparent and a massive performance boost for most uses. But if I e.g. loadAll() and then modify one of the elements, I'm modifying the same object globally across my app. If I'm using it on the main thread (e.g. for display), and updating the DB on a background thread (as is right and proper), there are obvious threading problems unless extra care is taken.
Does greenDAO do anything "under the hood" to protect against common application-level threading problems? For example, modifying a cached entity in the UI thread while saving it in a background thread (better hope they don't interleave! especially when modifying a list!)? Are there any "best practices" to protect against them, beyond general thread safety concerns (i.e. something that greenDAO expects and works well with)? Or is the whole cache fatally flawed from a multithreaded-application safety standpoint?
I've no experience with greenDAO but the documentation here:
http://greendao-orm.com/documentation/queries/
Says:
If you use queries in multiple threads, you must call forCurrentThread() on the query to get a Query instance for the current thread. Starting with greenDAO 1.3, object instances of Query are bound to their owning thread that build the query. This lets you safely set parameters on the Query object while other threads cannot interfere. If other threads try to set parameters on the query or execute the query bound to another thread, an exception will be thrown. Like this, you don’t need a synchronized statement. In fact you should avoid locking because this may lead to deadlocks if concurrent transactions use the same Query object.
To avoid those potential deadlocks completely, greenDAO 1.3 introduced the method forCurrentThread(). This will return a thread-local instance of the Query, which is safe to use in the current thread. Every time, forCurrentThread() is called, the parameters are set to the initial parameters at the time the query was built using its builder.
While the documentation, so far as I can see, doesn't explicitly say anything about multithreading other than this, it seems pretty clear that it is handled. This is talking about multiple threads using the same Query object, so clearly multiple threads can access the same database. Certainly it's normal for databases and DAOs to handle concurrent access, and there are a lot of proven techniques for working with caches in this situation.
By default GreenDAO caches and returns cached entity instances to improve performance. To prevent this behaviour, you need to call:
daoSession.clear()
to clear all cached instances. Alternatively you can call:
objectDao.detachAll()
to clear cached instances only for the specific DAO object.
You will need to call these methods every time you want to clear the cached instances, so if you want to disable all caching, I recommend calling them in your Session or DAO accessor methods.
Documentation:
http://greenrobot.org/greendao/documentation/sessions/#Clear_the_identity_scope
Discussion: https://github.com/greenrobot/greenDAO/issues/776
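Why a cached entity is shared across the whole app, and what clearing the cache changes, can be shown with a toy identity map in Python. This is not greenDAO's implementation or API; the class IdentityScope, its methods and the sample row are hypothetical and exist only to illustrate the behaviour described above.

```python
class IdentityScope:
    """Toy identity map: one object instance per primary key (hypothetical)."""

    def __init__(self, loader):
        self._loader = loader  # function that builds a fresh entity from storage
        self._cache = {}

    def load(self, key):
        # Return the cached instance if there is one, otherwise load and remember it.
        if key not in self._cache:
            self._cache[key] = self._loader(key)
        return self._cache[key]

    def clear(self):
        # Forget all cached instances; later loads build new objects.
        self._cache.clear()


def demo():
    stored = {1: {"name": "original"}}
    scope = IdentityScope(lambda key: dict(stored[key]))

    first = scope.load(1)
    second = scope.load(1)
    shared = first is second          # both callers hold the same object
    first["name"] = "changed"         # a change made through one reference...
    seen_by_other = second["name"]    # ...is visible through the other

    scope.clear()
    third = scope.load(1)             # a fresh object, built from storage again
    return shared, seen_by_other, third is first, third["name"]
```

If first and second lived on different threads, the modification would be exactly the unsynchronized shared-state problem the question describes; after clear(), holders of the old instance keep a detached copy.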

Nodejs - How to maintain a global datastructure

So I have a backend implementation in node.js which mainly contains a global array of JSON objects. The JSON objects are populated by user requests (POSTS). So the size of the global array increases proportionally with the number of users. The JSON objects inside the array are not identical. This is a really bad architecture to begin with. But I just went with what I knew and decided to learn on the fly.
I'm running this on a AWS micro instance with 6GB RAM.
How to purge this global array before it explodes?
Options that I have thought of:
At a periodic interval write the global array to a file and purge. Disadvantage here is that if there are any clients in the middle of a transaction, that transaction state is lost.
Restart the server every day and write the global array into a file at that time. Same disadvantage as above.
Follow 1 or 2, and for every incoming request - if the global array is empty look for the corresponding JSON object in the file. This seems absolutely absurd and stupid.
Somehow I can't think of any other solution without having to completely rewrite the nodejs application. Can you guys think of any .. ? Will greatly appreciate any discussion on this.
I see that you are using memory as storage. If that is the case and your code is synchronous (you don't seem to use a database, so it might be), then solution 1 is actually correct. This is because JavaScript is single-threaded, which means that while one piece of code is running, no other code can run. Your JavaScript never runs in parallel with itself; the concurrency is only an illusion, because Node.js is sooooo fast.
So your cleaning code won't fire until the transaction is over. This is of course assuming that your code is synchronous (and from what I see it might be).
But there are still like 150 reasons for not doing that. The most important is that you are reinventing the wheel! Let the database do the hard work for you. Using a proper database will save you all the trouble in the future. There are many possibilities: MySQL, PostgreSQL, MongoDB (my favourite), CouchDB and many, many others. It shouldn't matter at this point which one. Just pick one.
I would suggest that you start saving your JSON to a non-relational DB like http://www.couchbase.com/.
Couchbase is extremely easy to setup and use even in a cluster. It uses a simple key-value design so saving data is as simple as:
couchbaseClient.set("someKey", "yourJSON")
then to retrieve your data:
data = couchbaseClient.get("someKey")
The system is also extremely fast and is used by OMGPOP for Draw Something. http://blog.couchbase.com/preparing-massive-growth-revisited

Thread-safety and concurrent modification of a table in SQLite3

Does thread-safety of SQLite3 mean different threads can modify the same table of a database concurrently?
No - SQLite does not support concurrent write access to the same database file. SQLite will simply block one of the transactions until the other one has finished.
Note that if you're using Python, to access a sqlite3 connection from different threads you need to disable the check_same_thread argument, e.g.:
sqlite3.connect(":memory:", check_same_thread=False)
As of the 24th of May 2010, the docs omit this option. The omission is listed as a bug here
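A short sketch of what check_same_thread changes in practice: with the default setting, Python's sqlite3 module refuses to use a connection from a thread other than the one that created it; with check_same_thread=False it allows it, and serializing access becomes the caller's job. The helper name use_from_other_thread and the explicit lock are illustrative choices, not part of the answer above.

```python
import sqlite3
import threading


def use_from_other_thread(conn, lock):
    outcome = []

    def worker():
        try:
            # The lock keeps access to a shared connection one-at-a-time.
            with lock:
                outcome.append(conn.execute("SELECT 1").fetchone()[0])
        except sqlite3.ProgrammingError:
            # Raised by the module's same-thread check.
            outcome.append("refused")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


lock = threading.Lock()
strict = sqlite3.connect(":memory:")                           # default check
shared = sqlite3.connect(":memory:", check_same_thread=False)  # check disabled

strict_result = use_from_other_thread(strict, lock)
shared_result = use_from_other_thread(shared, lock)
```

Here strict_result is "refused" and shared_result is 1.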
Not necessarily. If sqlite3 is compiled with the thread safe macro (check via the int sqlite3_threadsafe(void) function), then you can try to access the same DB from multiple threads without the risk of corruption. Depending on the lock(s) required, however, you may or may not be able to actually modify data (I don't believe sqlite3 supports row locking, which means that to write, you'll need to get a table lock). However, you can try; if one thread blocks, then it will automatically write as soon as the other thread finishes with the DB.
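From Python there is no direct binding for sqlite3_threadsafe(), but the same compile-time setting can usually be read back with PRAGMA compile_options. The sketch below assumes the library was built with compile-option diagnostics enabled; if it was not, the function returns None. The function name and the MODES table are illustrative. The numeric values follow the SQLite documentation for SQLITE_THREADSAFE: 0 is single-thread, 1 is serialized, 2 is multi-thread.

```python
import sqlite3

# Meaning of the SQLITE_THREADSAFE compile-time value.
MODES = {0: "single-thread", 1: "serialized", 2: "multi-thread"}


def compiled_thread_mode():
    conn = sqlite3.connect(":memory:")
    try:
        options = [row[0] for row in conn.execute("PRAGMA compile_options")]
    finally:
        conn.close()
    for option in options:
        # Reported without the SQLITE_ prefix, e.g. "THREADSAFE=1".
        if option.startswith("THREADSAFE="):
            return int(option.split("=", 1)[1])
    return None  # option not reported by this build


mode = compiled_thread_mode()
description = MODES.get(mode, "unknown")
```

A value of 2 therefore indicates multi-thread mode, not serialized mode.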
You can use SQLite in 3 different modes:
http://www.sqlite.org/threadsafe.html
If you choose multi-thread mode or serialized mode, you can easily use SQLite in a multi-threaded application.
In those situations you can read from all your threads simultaneously anyway. If you need to write simultaneously, the opened table will be locked automatically for the currently writing thread and unlocked afterwards (the next thread will wait (mutex) for its turn until the table is unlocked). In all those cases, you need to create a separate connection for every thread (.NET Data.Sqlite.dll). If you're using another implementation (e.g. any Android wrapper), things are sometimes different.
