I am a newbie to Flask. I need to implement connection pooling in Flask; the following is my software stack:
1. Flask - 0.12.2
2. Python version - 3.4.3
3. Python couchbase driver version - 2.2.1
4. Couchbase version - 4.5.0-2601 Community Edition (build-2601)
My requirement is to repeatedly run 3-4 tasks at server startup, each after every 'n' seconds, i.e. some task runs every 1 second, some run every 5 seconds, etc. I used the "threading" module in Python to run these tasks in parallel.
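Each task is essentially a daemon thread that loops and sleeps; a simplified sketch of the idea (the lambdas stand in for my real tasks):
import threading
import time

def run_periodically(task, interval_s):
    # Invoke the task, then sleep, forever; the daemon flag lets the server exit cleanly.
    def loop():
        while True:
            task()
            time.sleep(interval_s)
    threading.Thread(target=loop, daemon=True).start()

run_periodically(lambda: print("1-second task"), 1)
run_periodically(lambda: print("5-second task"), 5)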
I did not want to create and close a connection for each of the above tasks, so I create the required connections to the Couchbase buckets at Flask startup, as below:
(The following lines of code are in "app.py" file)
cbBkt1Conn = Couchbase.connect(host=<host_name>,
                               bucket=<bucket1>, password=<bucket1Password>)
cbBkt2Conn = Couchbase.connect(host=<host_name>,
                               bucket=<bucket2>, password=<bucket2Password>)
cbBkt3Conn = Couchbase.connect(host=<host_name>,
                               bucket=<bucket3>, password=<bucket3Password>)
# etc.
But after Flask starts, within 3-4 runs I get the following exception:
File "/usr/local/lib/python3.4/dist-packages/couchbase/n1ql.py", line 384,
in __iter__
self._start()
File "/usr/local/lib/python3.4/dist-packages/couchbase/n1ql.py", line 297,
in _start
cross_bucket=self._params.cross_bucket)
couchbase.exceptions.ObjectThreadError: <Couldn't lock. If LOCKMODE_WAIT
was passed, then this means that something has gone wrong internally.
Otherwise, this means you are using the Connection object from multiple
threads. This is not allowed (without an explicit lockmode=LOCKMODE_WAIT
constructor argument, C Source=(src/oputil.c,428)>
On further investigation I found that using the same connection object across multiple threads is not permitted (as per the error trace above and the link below):
http://docs.couchbase.com/sdk-api/couchbase-python-client-2.2.1/api/threads.html
The link suggests using the "LOCKMODE_WAIT" option with "threading", but this defeats the entire purpose of running the tasks in parallel.
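For reference, the option the docs describe is passed at connect time (keeping my placeholders from above); it serializes all access to the shared connection, which is why it defeats the parallelism:
from couchbase import Couchbase, LOCKMODE_WAIT

# Threads now block on an internal lock instead of raising ObjectThreadError,
# but every operation on the shared connection is serialized across threads.
cbBkt1Conn = Couchbase.connect(host=<host_name>, bucket=<bucket1>,
                               password=<bucket1Password>,
                               lockmode=LOCKMODE_WAIT)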
I also came across SQLAlchemy, but it does not support Couchbase.
Query:
1) How do I implement connection pooling in Flask?
Please guide me to an example / reference to implement this.
Thanks,
Sachin Vyas.
You're better off creating new connections in a threaded environment. (And maybe think about why you don't want to share them to begin with.)
It is best to think of a connection like an old-style shared house phone line. If one person picks up and dials from the living room, then anyone who picks up another handset on the same line hears the same conversation. And if someone picks up and starts dialing for pizza while the line is in use, the people already on the call are interrupted and he won't get his pizza.
There are two options: either the second person who wants to use the phone waits until the line is free (that is LOCKMODE_WAIT), or you install a new line for each person in the house (a connection per thread).
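A minimal sketch of the connection-per-thread idea, assuming the 2.x SDK's Bucket class and reusing the question's placeholders; threading.local gives each thread its own lazily opened connections:
import threading
from couchbase.bucket import Bucket

_local = threading.local()

def get_bucket(name, password):
    # Open at most one Bucket per (thread, bucket) pair; nothing is shared across threads.
    cache = getattr(_local, 'buckets', None)
    if cache is None:
        cache = _local.buckets = {}
    if name not in cache:
        cache[name] = Bucket('couchbase://<host_name>/' + name, password=password)
    return cache[name]
Each periodic task then calls get_bucket(...) itself instead of touching a module-level connection, so the ObjectThreadError cannot occur.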
I'm writing a game in Rust where each player can submit Python scripts to the server in order to automate various tasks in the game. I plan on using pyo3 to run the Python from Rust.
However, I can see an issue arising if a player submits a script like this:
def on_event(e):
    while True:
        pass
Now when the server calls the function (using something like PyAny::call1()), the thread will hang when it reaches the infinite loop.
My first thought was to have pyo3 execute the Python one statement at a time, so that it could bail out if the script had been running for longer than a certain threshold, but I don't think pyo3 supports this.
My next idea was to give each player their own thread to run their scripts on; that way, if one of their scripts got stuck, it would only affect their gameplay. However, I still have the issue of not being able to kill a thread stuck in an infinite loop: if a lot of players submitted scripts that just looped, lots of threads would start using a lot of CPU time.
All I need is a way to execute the Python scripts such that if one of them does loop forever, it does not affect the server's performance at all.
Thanks :)
One solution is to restrict the time that you give each user script to run.
You can do it via PyThreadState_SetAsyncExc; see here for some code. It uses C calls into the interpreter, which you can probably access from Rust (with PyO3 FFI magic).
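In Python, the usual recipe looks roughly like this (a sketch; note the exception is only delivered when the target thread next executes Python bytecode, so it won't interrupt code stuck inside a C extension):
import ctypes

def async_raise(thread_ident, exc_type=TimeoutError):
    # Ask the interpreter to raise exc_type inside the thread with this ident.
    ret = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread_ident), ctypes.py_object(exc_type))
    if ret > 1:
        # More than one thread state was affected: undo and report failure.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(thread_ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")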
Another way would be to do it at the OS level: spawn a process for the user script, then kill it when it runs for too long. This can also be more secure if you limit what the process can access (with some OS calls), but it requires some boilerplate to communicate between the host and the child process.
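A minimal sketch of that process-per-script idea, shown in Python for brevity (from Rust you'd do the equivalent with std::process plus a timeout):
import subprocess

def run_user_script(path, timeout_s=1.0):
    # Run the player's script in its own process with a hard time budget.
    try:
        result = subprocess.run(['python3', path],
                                capture_output=True, timeout=timeout_s)
        return result.stdout
    except subprocess.TimeoutExpired:
        # The script looped or stalled; subprocess.run has already killed it.
        return None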
I am designing a program in Python which:
1. reads data via USB at two-second intervals from an Arduino into a SQLite table (128 kB per readout),
2. processes the incoming data and stores the results in another table,
3. finally queries the data from that table, shows it in a GUI created with tkinter, and sends the same data over the network to a server.
The question is: for which parts should I use multiprocessing or threading? Do I even need them? If I run the first part from a separate Python file in the background, does it necessarily use a different CPU core?
EDIT:
I found out about pickling; now the question is: is it a good idea to pickle a 1 kB string every 3 seconds (to a RAM drive, of course) and unpickle it in another script?
I already tested the two scripts and it works, but I am not sure whether this solution can be used for long-term running.
It looks promising! Especially since I don't see myself stuck in the multithreading or multiprocessing modules, and it seems the OS will assign the necessary cores and threads.
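A sketch of the kind of handoff I mean (/mnt/ramdrive stands in for wherever the RAM drive is mounted); the atomic rename keeps the reader from ever seeing a half-written pickle:
import os
import pickle
import tempfile

DATA_PATH = '/mnt/ramdrive/latest.pkl'  # hypothetical RAM-drive location

def write_snapshot(data):
    # Writer script: dump to a temp file, then atomically rename it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(DATA_PATH))
    with os.fdopen(fd, 'wb') as f:
        pickle.dump(data, f)
    os.replace(tmp, DATA_PATH)

def read_snapshot():
    # Reader script: always sees either the old or the new complete snapshot.
    with open(DATA_PATH, 'rb') as f:
        return pickle.load(f)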
In my Flask application there are some REST endpoints which take too much time to respond. When invoked, they mostly carry out CRUD operations in the database, some of which could be made asynchronous. I have no issue with sending the response to the client while the database inserts keep going in the background. I wanted to use asyncio, but heard that Flask does not support asyncio. That leaves me with just the choice of threading. Any suggestions? I do not have the option of dumping Flask, and I do not want to use Celery, as it would be too big a change.
When using threading, it works in some places and not in others. It looks like it is not finding the application context:
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
In the thread, my first line of code is the following, and it is here where it fails:
user: AuthUser = g.logged_in_user
Edit:
In the view function I have to do multiple things. Some of them are chained, so they cannot be made asynchronous: they need to happen in order, because the result returned from one database call is used to invoke the next call, and the final result is used in composing the JSON output the method returns. There is only one database call which is independent of the others; being an insertion of about 1K records, it is the greatest contributor to the slow API response.
If I place this heavy method at the end of the view function, it says that the psycopg2 connection has already been closed, though I never closed the connection explicitly; maybe it is because the view function has already returned the JSON payload.
If I place the heavy method at the beginning of the view function, it works.
Edit 2:
From the view function I pass the application context and the user. Instead of passing the database connection to the thread, I connect to the PostgreSQL database from inside the thread.
threading.Thread(target=set_responder_contacts,
                 args=(user, template_id),
                 kwargs={'app': current_app._get_current_object()}).start()
Code snippet from the function the thread invokes:
def set_responder_contacts(user: AuthUser, template_id: int, app=None):
    with app.app_context():
        try:
            db = dbConn()
            # database insertion code
asyncio wouldn't directly help here.
If you trust your server process to remain up for the duration of the background function, then sure, just spin up a thread and do the background work there:
def heavy_work(some_id):
    pass

@app.post(...)
def view(...):
    some_id = create_thing(...)
    threading.Thread(target=heavy_work, args=(some_id,)).start()
    return "Okay (though processing in the background)"
There are caveats:
As I alluded to earlier, if the WSGI server process is killed for some reason (for instance, a memory or request-count limit is exceeded, or it outright crashes), the background operation will be taken down with it.
If the heavy operation is heavily CPU-bound, it may affect the performance of other requests being served by the same server process.
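And touching the error from the question: flask.g only exists inside a request context, so read whatever the thread needs from it before spawning the thread and pass plain values in. Roughly, reusing the question's names:
import threading
from flask import current_app, g

@app.post(...)
def view(...):
    user = g.logged_in_user                      # read g while the request is live
    app_obj = current_app._get_current_object()  # the real app, not the context-local proxy
    threading.Thread(target=set_responder_contacts,
                     args=(user, template_id),
                     kwargs={'app': app_obj}).start()
    return "Okay"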
I am new to Qt development, the way it handles threads (signals and slots) and databases (and SQLite at that). It has been 4 weeks since I started working on the mentioned technologies. This is the first time I'm posting a question on SO, and I feel I have done my research before coming to you all. This may look a little long and possibly a duplicate, but I request you all to read it thoroughly once before dismissing it as a duplicate or tl;dr.
Context:
I am working on a Windows application that performs a certain operation X on a database. The application is developed in Qt and uses the QSQLITE driver as the database engine. It's a single-threaded application, i.e., the tables are processed sequentially. However, as the DB size grows (in number of tables and records), this processing becomes slower. The result of this operation X is written to a separate results table in the same DB. The processing being done is immaterial to the problem, but in basic terms here's what it does:
Read a row from Table_X_1
Read a row from Table_X_2
Do some operations on the rows (only read)
Push the results into the Table_X_Results table (this is the only write being performed on the DB)
Table_X_1 and Table_X_2 are identical in number and types of columns and number of rows, only the data may differ.
What I'm trying to do:
In order to improve the performance, I am trying to make the application multi-threaded. Initially I am spawning two threads (using QtConcurrentRun). The tables can be categorized into two types, say A and B, and each thread takes care of the tables of one type. Processing within the threads remains the same, i.e., within each thread the tables are processed sequentially.
The function uses SELECT to fetch rows for processing and INSERT to insert the result into the results table. For inserting the results I am using transactions.
I am creating all the intermediate tables, result tables and indices before starting my actual operation. I am opening and closing connections every time. For the threads, I create and open a connection before entering the loop (one for each thread).
THE PROBLEM:
Inside my processing function, I get the following (nasty, infamous, stubborn) error:
QSqlError(5, "Unable to fetch row", "database is locked")
I am getting this error when I'm trying to read a row from the DB (using SELECT). This is in the same function in which I'm performing my INSERTs into the results table. The SELECT and the INSERT are in the same transaction (begin and commit pair). For the INSERT I'm using a prepared statement (SQLiteStatement).
Reasons for seemingly peculiar things that I am doing:
I am using QtConcurrentRun to create the threads because it is straightforward to do! I have tried using QThread (not subclassing QThread, but the other method). That also leads to the same problem.
I am compiling with DSQLITE_THREADSAFE=0 to keep the application from crashing. If I use the default (DSQLITE_THREADSAFE=1), my application crashes at SQLiteStatement::recordSet->Reset(). Also, with the default option, SQLite's internal sync mechanism comes into play, which may not be reliable. If the need be, I'll employ explicit sync.
I am making the application multi-threaded to improve performance, not doing this; I'm taking care of all the optimizations recommended there.
I am using QSqlDatabase::setConnectOptions with QSQLITE_BUSY_TIMEOUT=0. A link suggested that it would prevent the DB from getting locked immediately and hence give my thread(s) an appropriate amount of time to "die peacefully". This failed: the DB got locked much more frequently than before.
Observations:
The database gets locked exactly when one of the threads returns, and only then. This behavior is consistent.
When compiling with DSQLITE_THREADSAFE=1, the application crashes when one of the threads returns. The call stack points at SQLiteStatement::recordSet->Reset() in my function, and at winMutexEnter() (called from EnterCriticalSection()) in sqlite3.c. This is consistent as well.
The threads created using QtConcurrentRun do not die immediately.
If I use QThreads, I can't get them to return. That is to say, I feel the thread never returns even though I have connected the signals and slots correctly. What is the correct way to wait for threads, and how long does it take them to die?
The thread that finishes execution never returns; it has locked the DB, and hence the error.
I checked for SQLITE_BUSY and tried to make the thread sleep, but could not get it to work. What is the correct way to sleep in Qt (for threads created with QtConcurrentRun or QThreads)?
When I close my connections, I get this warning:
QSqlDatabasePrivate::removeDatabase: connection 'DB_CONN_CREATE_RESULTS' is still in use, all queries will cease to work.
Is this of any significance? Some links suggested that this warning arises because of using a local QSqlDatabase, and that it will not arise if the connection is made a class member. However, could it be the reason for my problem?
Further experiments:
I am thinking of creating another database which will contain only the results table (Table_X_Results). The rationale is that while the threads read from one DB (the one I currently have), they will write to another DB. However, I may still face the same problem. Moreover, I have read on forums and wikis that it IS possible to have two threads doing read and write on the same DB. So why can I not get this scenario to work?
I am currently using SQLite version 3.6.17. Could that be the problem? Will things be better if I use version 3.8.5?
I was trying to post the web resources that I have already explored, but I get a message saying "I'd need 10 reps to post more than 2 links". Any help/suggestions would be much appreciated.
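(For illustration, the per-thread connection setup I keep reading about looks like the sketch below; it's written with Python's sqlite3 module only because that exposes the same pragmas concisely. Note that WAL mode requires SQLite 3.7+, which my 3.6.17 predates, so this may be exactly what upgrading would buy me.)
import sqlite3

def open_connection(path):
    # One connection per thread; never share a connection object across threads.
    conn = sqlite3.connect(path, timeout=5.0)   # retry for up to 5 s on SQLITE_BUSY
    conn.execute("PRAGMA journal_mode=WAL;")    # WAL: readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL;")  # common companion setting for WAL
    return conn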
I would like to distribute the data on multiple machines connected by a TCP/IP network using OpenMPI. Can anyone point me to the right resources and direction? I am new to OpenMPI.
Thanks
It depends on the language you're going to write the software in. But basically, an OpenMPI application looks like this:
Call MPI_INIT for MPI to initialize the necessary communications between the nodes for you.
Use the MPI_SEND and MPI_RECV functions to send or receive data. There are blocking and non-blocking variants of these calls, along with several others: broadcast (send to everyone), scatter (distribute data from an array in equal portions to every host), etc.
Use MPI_FINALIZE to finish the communication process.
In MPI, the following workflow is almost always involved:
A master host is assigned, usually the one with process id 0. Its function is to coordinate the work of the slave hosts. Basically, if you have to find the maximum value of an array in parallel, it's the master's job to take the array, distribute it in equal portions to the slaves, gather the results from the slaves, and choose the maximum from the list.
The slave hosts wait for data to arrive, perform the processing, and send the results back to the master.
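Here's a minimal sketch of exactly that max-of-an-array workflow, using mpi4py (Python bindings that run on top of OpenMPI; it calls MPI_INIT on import and MPI_FINALIZE at exit for you). Run it with something like mpiexec -n 4 python max_demo.py:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # this process's id; 0 acts as the master
size = comm.Get_size()     # total number of processes

if rank == 0:
    data = list(range(1000))
    # Master: split the array into one roughly equal chunk per process.
    chunks = [data[i::size] for i in range(size)]
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)     # every process receives its portion
local_max = max(chunk)                   # all processes work in parallel
maxima = comm.gather(local_max, root=0)  # master collects the partial results

if rank == 0:
    print("global max:", max(maxima))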
I'd recommend this MPI tutorial for C++ development, and also check out this SO post regarding books on the topic.
Here's just one of the many MPI tutorials on the net; I'm surprised you didn't find this yourself.