adding elements to a query using multithreading - multithreading

Consider you are running simulations and each simulation writes results to a output.txt file. I want to run thousands of simulations using multithreading, while doing that even though if i use locking, unlocking, i was still having errors when multiple threads access to the file at the same time.
To solve this, I am going to add result texts to a query that stores them. That is, each thread will add result to this query instead of writing it to the output.txt file. And in the end, i'll take stored texts from query and write to output.txt
My question here is: whenever multiple threads are adding such items to a query, do you think an error might happen in the end, like, missing simulation ? I've come up to this question because whenever you increase single value by multithreads, that value will not be incremented as much as you want if you don't be careful. (i.e, in a multithreaded for loop, add +1 to previously declared int a for 1000 times; then in the end, a will not be 1000 (ofc this can be prevented by other things))

Related

Recalculation of exchanges done twice

When using the function ParameterManager.recalculate() to get the actualised values of all the parameterized exchanges of my database, the functions ActivityParameter.recalculate() and ActivityParameter.recalculate_exchanges() are applied to all the parameters groups. But it seems that the function ActivityParameter.recalculate_exchanges() is ran twice because it is also used inside the function ActivityParameter.recalculate(). When deleting one, I get the same results but twice faster (that is what I was looking for because otherwise my calculation is a bit long). Is there a reason for running the function twice ? Is it right to delete one to get faster results ? Would there be a way to reduce the duration of this calculation ?
You are totally right - ParameterManager.recalculate calls both ActivityParameter.recalculate and ActivityParameter.recalculate_exchanges, but ActivityParameter.recalculate already calls ActivityParameter.recalculate_exchanges. This makes things slower, but doesn't break anything.
This duplication has been removed; you can safely do the same.

Celery chains: Is it necessary to wait before getting the results?

So, I have a chain of tasks in Python 3 that a celery worker runs. Currently, I use the following piece of code to get and print the final result of the chain :
while not result.ready():
pass
print(result.get())
I have run the code with and without the while-loop, and it seems that the while-loop is redundant.
My question is: "is it necessary to have that while-loop?"
If by redundant, you mean that the code works fine without the while loop, then I would venture to say that the loop is not necessary. If, however, you throw an error without the loop because you're trying to print something that doesn't exist yet, then you should keep it. This can be a problem, though, because an empty while loop means you're just checking the same variable as fast as your computer can physically handle it, which tends to eat up your CPU. I recommend something like the following:
import time
t = 1 #The number of seconds you want to wait between checking if the result is ready
while not result.ready():
time.sleep(t)
print(result.get())
You can set t to whatever makes sense. If the task you're running takes several hours, maybe set it to 60, and you'll get the result within a minute. If you want the result faster, you can make the interval smaller. This will keep the program from dragging down the rest of your computer. However, if you don't mind your fans blowing and you absolutely need to know the moment the result is ready, ignore all of the above and leave your code the way it is :)

Bi-Threaded processing in Matlab

I have a Large-Scale Gradient Descent optimization problem that I am running using Matlab. The code has got two parts:
A Sequential update part that fires every iteration that updates the parameter vector.
A validation error computation part that fires every 10 iterations or so using the parameter value at the end of the corresponding iteration in which its fired.
The way that I am running this now is to do (1) and (2) sequentially. But (2) takes a lot of time and its not the core part of my routine - I made it just to check the progress and plot the error of my model. Is it possible in Matlab to run (2) in a parallel manner to (1) ? Please note that (1) cannot be run in parallel since it performs sequential update. So a simple 'parfor' usage is not a solution, unless there is a really smart way of doing that.
I don't think Matlab has any way of multi-threading outside of the (rather restricted) parallel computing toolbox. There is a work over which may help you though:
Open 2 sessions of Matlab, sessions A and B (or instances, or workspaces, however you call it)
Matlab session A:
Calculate the 10 iterations of your sequential process (1)
Saves the result in a file (adequately and uniquely named)
Goes on to calculate the next 10 iterations (back to the top of this loop basically)
In parralel:
Matlab session B:
Check periodically for the existence of the file written by process A (define a timer that will do that at the time interval which make sense for your process, a few seconds or a few minutes ...)
If the file exist => load it then do the validation computation (your process (2)) and display/report the results.
note: This only works if process (1) doesn't need the result of process (2) to run its iterations, but if it is the case I don't know how you could parallelise anyway.
If you have multiple cores on your machine that should run smoothly, if you have a single core then the 2 sessions will have to share and you will see a performance impact.

QSQLite Error: Database is locked

I am new to Qt development, the way it handles threads (signals and slots) and databases (and SQLite at that). It has been 4 weeks that I have started working on the mentioned technologies. This is the first time I'm posting a question on SO and I feel I have done research before coming to you all. This may look a little long and possibly a duplicate, but I request you all to read it thoroughly once before dismissing it off as a duplicate or tl;dr.
Context:
I am working on a Windows application that performs a certain operation X on a database. The application is developed in Qt and uses QSQLite as database engine. It's a single threaded application, i.e., the tables are processed sequentially. However, as the DB size grows (in number of tables and records), this processing becomes slower. The result of this operation X is written in a separate results table in the same DB. The processing being done is immaterial to the problem, but in basic terms here's what it does:
Read a row from Table_X_1
Read a row from Table_X_2
Do some operations on the rows (only read)
Push the results in Table_X_Results table (this is the only write being performed on the DB)
Table_X_1 and Table_X_2 are identical in number and types of columns and number of rows, only the data may differ.
What I'm trying to do:
In order to improve the performance, I am trying to make the application multi-threaded. Initially I am spawning two threads (using QtConcurrentRun). The two tables can be categorized in two types, say A and B. Each thread will take care of the tables of two types. Processing within the threads remains same, i.e., within each thread the tables are being processed sequentially.
The function is such that it uses SELECT to fetch rows for processing and INSERT to insert result in results table. For inserting the results I am using transactions.
I am creating all the intermediate tables, result tables and indices before starting my actual operation. I am opening and closing connections everytime. For the threads, I create and open a connection before entering the loop (one for each thread).
THE PROBLEM:
Inside my processing function, I get following (nasty, infamous, stubborn) error:
QSqlError(5, "Unable to fetch row", "database is locked")
I am getting this error when I'm trying to read a row from DB (using SELECT). This is in the same function in which I'm performing my INSERTs into results table. The SELECT and the INSERT are in the same transaction (begin and commit pair). For INSERT I'm using prepared statement (SQLiteStatement).
Reasons for seemingly peculiar things that I am doing:
I am using QtConcurrentRun to create the threads because it is straightforward to do! I have tried using QThread (not subclassing QThread, but the other method). That also leads to same problem.
I am compiling with DSQLITE_THREADSAFE=0 to avoid application from crashing. If I use the default (DSQLITE_THREADSAFE=1), my application crashes at SQLiteStatement::recordSet->Reset(). Also, with the default option, internal SQLITE sync mechanism comes into play which may not be reliable. If the need be, I'll employ explicit sync.
Making the application multi-threaded to improve performance, and not doing this. I'm taking care of all the optimizations recommended there.
Using QSqlDatabase::setConnectOptions with QSQLITE_BUSY_TIMEOUT=0. A link suggested that it will prevent the DB to get locked immediately and hence may give my thread(s) appropriate amount of time to "die peacefully". This failed: the DB got locked much frequently than before.
Observations:
The database goes into lock only and as soon as when one of the threads return. This behavior is consistent.
When compiling with DSQLITE_THREADSAFE=1, the application crashes when one of the threads return. Call stack points at SQLiteStatement::recordSet->Reset() in my function, and at winMutexEnter() (called from EnterCriticalSection()) in sqlite3.c. This is consistent as well.
The threads created using QtConcurrentRun do not die immediately.
If I use QThreads, I can't get them to return. That is to say, I feel the thread never returns even though I have connected the signals and the slots correctly. What is the correct way to wait for threads and how long it takes them to die?
The thread that finishes execution never returns, it has locked the DB and hence the error.
I checked for SQLITE_BUSY and tried to make the thread sleep but could not get it to work. What is the correct way to sleep in Qt (for threads created with QtConcurrentRun or QThreads)?
When I close my connections, I get this warning:
QSqlDatabasePrivate::removeDatabase: connection 'DB_CONN_CREATE_RESULTS' is still in use, all queries will cease to work.
Is this of any significance? Some links suggested that this warning arises because of using local QSqlDatabase, and will not arise if the connection is made a class member. However, could it be the reason for my problem?
Further experiments:
I am thinking of creating another database which will only contain results table (Table_X_Results). The rationale is that while the threads will read from one DB (the one that I have currently), they will get to write to another DB. However, I may still face the same problem. Moreover, I read on the forums and wikis that it IS possible to have two threads doing read and write on same DB. So why can I not get this scenario to work?
I am currently using SQLITE version 3.6.17. Could that be the problem? Will things be better if I used version 3.8.5?
I was trying to post the web resources that I have already explored, but I get a message saying "I'd need 10 reps to post more than 2 links". Any help/suggestions would be much appreciated.

Does joblib's memoization support being called from multiple tasks?

if a function that is being memoized called in parallel from two jobs, what happens? One call's result is saved and other is retrieved or both run without using each other results? Or this case is not supported at all?
Couldn't find a reference to this in the documentation
If a result has already been computed and saved (by the same process or by a concurrent process) it is reused.
If 2 concurrent processes compute the same result for the first time, the first process to complete saves the result on the drive for later reuse and the second process use its own computation result the first time and later can reuse the cached result.
Also the cache is preserved on the hard drive after a Python program ends so that it can be reused when the same script / program is restarted later.

Resources