So I am developing this online telnet-like game. It isn't very popular (who knows, one day), so the database connection of my game engine goes unused for hours at night. The engine is a single script that waits for events, so it keeps running.
The first time a query runs after several hours of inactivity, I get a mariadb.DatabaseError when executing on the cursor. If I redo the query, it works again. So while the function throws an exception saying the connection is lost, it does repair the connection.
My question: how should I handle this?
These are the solutions I can see, but in my opinion none of them is very good:
wrapping every query in a try-except structure, which makes the code bulky with mostly unnecessary and repetitive code
writing my own 'decorator' function to execute a query, which reinitializes the database connection when I get mariadb.DatabaseError; this seems better, but makes me write wrapper functions around (almost) perfectly working library functions (see the sketch after this list)
doing a mostly pointless 'ping' query every N minutes, which puts load on the db and is useless 99.9% of the time
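To illustrate the decorator option, here is a minimal sketch; it assumes the connector offers a Connection.reconnect() method and simply retries the call once after reconnecting:

import functools
import mariadb

def reconnecting(method):
    # Retry a Db method once after re-establishing the connection.
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except mariadb.DatabaseError:
            self.conn.reconnect()  # assumption: reconnect() exists; otherwise redo the connect logic
            return method(self, *args, **kwargs)
    return wrapper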
Here is some code to illustrate my current setup:
import mariadb

class Db:
    ...
    def __init__(self):
        self.conn = mariadb.connect(
            user=self.__db_user,
            password=self.__db_pass,
            host=self.__db_host,
            port=self.__db_port,
            database=self.__db_name,
        )

    def one_of_many_functions(self, ...):
        cur = self.conn.cursor()
        # Here is where the mariadb.DatabaseError happens after long inactivity;
        # otherwise this runs fine
        cur.execute('SELECT ...')
        ...
I actually really don't understand why Python's mariadb connector doesn't handle this itself. When the connection is lost, cur.execute throws a mariadb.DatabaseError, but no action needs to be taken: if I re-run the query on that same database connection, it works again. So the connection does repair itself. Why does the connector make me requery when it 'repairs' the connection itself and could simply run the query again?
But as it is what it is, my question is: what is the nicest way to handle this?
Even if you set a long timeout value, there is no guarantee that the connection won't drop for other reasons (client timeout, 24-hour disconnect, ...).
An option would be to set auto_reconnect, as in the following example:
import mariadb

conn1 = mariadb.connect()
conn2 = mariadb.connect()

# Force MariaDB Connector/Python to reconnect automatically
conn2.auto_reconnect = True

cursor1 = conn1.cursor()
print("Connid of connection 2: %s" % conn2.connection_id)

# Since we don't want to wait for a timeout, we kill conn2 intentionally
cursor1.execute("KILL %s" % conn2.connection_id)

cursor2 = conn2.cursor()
cursor2.execute("SELECT connection_id()")
row = cursor2.fetchall()
print("Connid of connection 2: %s" % conn2.connection_id)
print(row)
Output:
Connid of connection 2: 174
Connid of connection 2: 175
[(175,)]
So after connection 2 was killed, the next cursor.execute() establishes a new connection before executing the statement. Note that this will not work on an already-open cursor, since its internal statement handle becomes invalid after the reconnect.
Are you using a Unix socket or TCP/IP for the connection?
TCP/IP connections are designed to be cleaned up after a period of no traffic. You might call that idiotic, but there's really no better way to detect that the program at the other end has crashed.
For the same reason, databases have their own timeout mechanism; for MySQL it's called wait_timeout.
Normally, a connection object (or its wrapper) takes care of running a no-op query such as SELECT 1 when nothing else is going on over the connection. This is standard practice. Check the documentation for your connection object: the feature may already be there and only need configuring. Use something like 30-60 seconds.
If not, you will have to implement it yourself. It doesn't matter how; the point is that you cannot expect connections to stay open forever. Either make connections short-lived (open one only when you need it and close it afterwards), or implement a timer that issues a no-op query periodically. In the latter case, note that you will need a synchronization mechanism to make sure an application query never runs at the same time as the no-op query.
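As an illustration only, a minimal sketch of such a keepalive timer, assuming the mariadb connector from the question; the lock serializes application queries and the periodic ping so they never hit the connection at the same time:

import threading
import mariadb

class KeepAliveDb:
    def __init__(self, **connect_args):
        self.conn = mariadb.connect(**connect_args)
        self._lock = threading.Lock()
        self._schedule_ping()

    def _schedule_ping(self):
        # Re-arm a daemon timer; 60 seconds is an arbitrary choice
        t = threading.Timer(60, self._ping)
        t.daemon = True
        t.start()

    def _ping(self):
        with self._lock:
            try:
                cur = self.conn.cursor()
                cur.execute("SELECT 1")  # no-op keepalive
                cur.fetchall()
            except mariadb.DatabaseError:
                self.conn.reconnect()  # assumption: reconnect() exists on the connection
        self._schedule_ping()

    def query(self, sql, params=()):
        with self._lock:
            cur = self.conn.cursor()
            cur.execute(sql, params)
            return cur.fetchall()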
Have you considered using a connection pool?
# Create a connection pool
pool = mariadb.ConnectionPool(
    # ...,
    pool_size=1
)
Then, in your connection method:

try:
    pconn = pool.get_connection()
except mariadb.PoolError as e:
    # Report the error
    print(f"Error opening connection from pool: {e}")
The documentation doesn't say what happens when connections are closed or broken. I expect that the pool takes care of that and always tries to hand out a valid connection (as long as you're not asking for more connections than are in the pool).
I got the code above from their docs.
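A hypothetical sketch of using the pool per query — borrow a connection, use it, and return it by closing it (assuming that close() on a pooled connection returns it to the pool rather than really closing it):

def fetch_one(pool, sql):
    conn = pool.get_connection()
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchone()
    finally:
        conn.close()  # assumption: returns the connection to the pool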
Related
I am trying to run multiple queries in parallel with PyGreSQL and multiprocessing, but the code below hangs without returning:
from pg import DB
from multiprocessing import Pool
from functools import partial

def create_query(table_name):
    return f"""create table {table_name} (id integer);
    CREATE INDEX ON {table_name} USING BTREE (id);"""

my_queries = [create_query('foo'), create_query('bar'), create_query('baz')]

def execute_query(conn_string, query):
    con = DB(conn_string)
    con.query(query)
    con.close()

rs_conn_string = "host=localhost port=5432 dbname=postgres user=postgres password="

pool = Pool(processes=len(my_queries))
pool.map(partial(execute_query, rs_conn_string), my_queries)
Is there any way to make it work? Also, is it possible to run the 3 queries in the same "transaction", so that if one query fails the others get rolled back?
One obvious problem is that you run pool.map unconditionally: not only in the main process, but also when the interpreters of the parallel sub-processes import the script. You should do something like this instead:
def run_all():
    with Pool(processes=len(my_queries)) as pool:
        pool.map(partial(execute_query, rs_conn_string), my_queries)

if __name__ == '__main__':
    run_all()
Regarding your second question: that's not possible, since transactions are per connection, and the connections live in separate processes if you do it like that.
Asynchronous command processing might be what you want, but it is not yet supported by PyGreSQL. Psycopg + aiopg is probably better suited for doing things like that.
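For illustration, a minimal aiopg sketch of running several queries concurrently on one event loop; the DSN and the query strings are placeholders:

import asyncio
import aiopg

dsn = "host=localhost port=5432 dbname=postgres user=postgres"

async def execute_query(pool, query):
    # Each coroutine borrows its own connection from the pool
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(query)

async def run_all(queries):
    async with aiopg.create_pool(dsn) as pool:
        await asyncio.gather(*(execute_query(pool, q) for q in queries))

asyncio.run(run_all(["SELECT 1", "SELECT 2", "SELECT 3"]))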
PyGreSQL has since added async support with the connection.poll() method. As for pooling, I like to override mysql.connector's pooling wrappers to handle pgdb connection objects. There are a few 'optional' connection method calls that fail and have to be commented out (e.g. checking connection status; these can be implemented at the pgdb connection object level if you want them, but the calls don't match the mysql.connector API). There are probably some low-level bugs lurking, since the libraries are only abstracted similarly, but this solution has been running in prod for a few months now without any problems.
I am using redis-py 2.10.6 and redis 4.0.11.
My application uses redis both as the db and for pubsub. When I shut down, I often get either a hang or a crash. The latter usually complains about a bad file descriptor or an I/O error on a file (I don't use any files) while handling a pubsub callback, so I'm guessing the underlying issue is the same: somehow I don't get disconnected properly, and the connection pool used by my redis.Redis object is still alive and kicking.
An example of the output of the former kind of error (during _read_from_socket):
redis.exceptions.ConnectionError: Error while reading from socket: (9, 'Bad file descriptor')
Other times the stacktrace clearly shows redis/connection.py -> redis/client.py -> threading.py, which proves that redis isn't killing the threads it uses.
When I start the application I run:
self.redis = redis.Redis(host=XXXX, port=XXXX)
self.pubsub = self.redis.pubsub()
subscriptions = {'chan1': self.cb1, 'chan2': self.cb2} # cb1 and cb2 are functions
self.pubsub.subscribe(**subscriptions)
self.pubsub_thread = self.pubsub.run_in_thread(sleep_time=1)
When I want to exit the application, the last instruction I execute in main is a call to a function in my redis-using class, whose implementation is:
self.pubsub.close()
self.pubsub_thread.stop()
self.redis.connection_pool.disconnect()
My understanding is that in theory I do not even need to do any of these 'closing' calls, and yet, with or without them, I still can't guarantee a clean shutdown.
My question is, how am I supposed to guarantee a clean shutdown?
I ran into this same issue; it's largely caused by improper handling of the shutdown by the redis library. During cleanup, the worker thread continues to process new messages and doesn't account for the socket no longer being available. After scouring the code a bit, I couldn't find a way to prevent additional processing short of just waiting.
Since this is run during a shutdown phase and it's a remedy for a 3rd party library, I'm not overly concerned about the sleep, but ideally the library should be updated to prevent further action while shutting down.
import time  # needed for the sleep below

self.pubsub_thread.stop()
time.sleep(0.5)  # give the worker thread time to finish its current iteration
self.pubsub.reset()
This might be worth an issue log or PR on the redis-py library.
The PubSubWorkerThread class checks self._running.is_set() inside its loop.
To do a "clean shutdown" you should call self.pubsub_thread._running.clear() to clear the thread's event; the loop will then stop.
Check how it works here:
https://redis.readthedocs.io/en/latest/_modules/redis/client.html?highlight=PubSubWorkerThread#
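Putting the answers above together, a hedged sketch of a fuller shutdown sequence (assuming PubSubWorkerThread behaves as a regular threading.Thread, so join() is available):

def shutdown(self):
    self.pubsub_thread.stop()      # ask the worker loop to exit
    self.pubsub_thread.join(2.0)   # wait briefly for the loop to actually finish
    self.pubsub.close()            # release the pubsub connection
    self.redis.connection_pool.disconnect()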
Using Delphi 7 & UIB, I'm running database operations in a background thread to eliminate problems like:
Timeout
Priority
Immediate Force-reconnect after network-loss
Non-blocked UI
Keeping an opened DB connection alive
User canceling
I've read ALL the related topics here, and realized that using while isMyThreadStillRunning and not UserCanceled do sleep(100); isn't the recommended way to do this; using TEvent.WaitFor(3000) is.
The solutions here are either about sending signals FROM the thread or TO the thread, or doing it with messages, but never both ways.
Reading the help file, I've also found TSimpleEvent, which seems to be easier to use.
So what is the recommended way to communicate between Main-UI + DB-Thread in both ways?
Should I simply create 2+2 TSimpleEvents?
to start a new transaction (thread should stop sleeping)
to force-STOP execution
to signal back when it has moved to a new stage (transaction started / executed / committed=done)
to signal back if any error happened
or should there be only 1 TEvent?
Update 2:
First tests show:
2x TSimpleEvent is enough (1 for the thread + 1 for the GUI)
Both are created as public properties of the background thread
Force-terminating the thread does not work (too many errors that are impossible to handle)
It is better to set a variable like Stop_yourself and let the thread cancel and free itself, while creating a new instance of the same class and trying again
(still work in progress...)
You should move the query to a TThread. Unfortunately, anonymous threads are not available in D7, so you need to write your own TThread-derived class. Inside it, use its own DB connection so no resources are shared. From the caller method, you can wait for the thread to end. The results should be stored somewhere in the caller class. Ensure that access to the query's parameters and to the stored result is thread-safe, using a TMutex or TMonitor.
Recently I observed rather odd behaviour that only happens on Linux but not on FreeBSD, and I was wondering whether anyone has an explanation, or at least a guess, of what might really be going on.
The problem:
The socket creation call, socket.socket(), sometimes fails. This only happens when multiple threads are creating the sockets; single-threaded works just fine.
To expand on how socket.socket() fails: most of the time I get "error 13: Permission denied", but I have also seen "error 93: Protocol not supported".
Notes:
I have tried this on Ubuntu 18.04 (the bug is there) and FreeBSD 12.0 (the bug is not there)
It only happens when multiple threads are creating sockets
I've used UDP as the protocol for the sockets, although it seems to be more fault-tolerant. I have tried TCP as well; it goes haywire even faster, with similar errors
It only happens sometimes, so multiple runs might be required, or (as in the case I provide below) a bloated number of threads will also do the trick
Code:
Here's some minimal code that you can use to reproduce that:
from threading import Thread
import socket

def foo():
    udp = socket.getprotobyname('udp')
    try:
        send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
    except Exception as e:
        print(type(e))
        print(repr(e))

def main():
    for _ in range(6000):
        t = Thread(target=foo)
        t.start()

main()
Note:
I have used an artificially large number of threads just to maximize the probability of hitting the error at least once per run with UDP. As I said earlier, if you try TCP you'll see A LOT of errors with that number of threads. But in reality even a more realistic number of threads, like 20 or even 10, triggers the error; you'd just likely need multiple runs to observe it.
Surrounding the socket creation with a while loop and try/except causes all subsequent calls to fail as well.
Surrounding the socket creation with try/except and, in the exception handler, restarting the function (i.e. calling it again) works and does not fail.
Any ideas, suggestions or explanations are welcome!!!
P.S.
Technically I know I can get around my problem by having a single thread create as many sockets as I need and pass them as arguments to my other threads, but that is not the point really. I am more interested in why this is happening and how to solve it, rather than what workarounds there might be, even though these are also welcome. :)
I managed to solve it. The problem comes from getprotobyname() not being thread safe!
See the Linux man page.
On another note, the FreeBSD man page also hints that this might cause problems with concurrency; however, my experiments suggest that it does not. Maybe someone can follow up?
Anyway, the fix is to get the protocol number once in the main thread (which seems sensible, and I should have done that in the first place) and then pass it as an argument. This both reduces the number of system calls you perform and fixes the concurrency problem within the program. The fixed code looks as follows:
from threading import Thread
import socket

def foo(proto_num):
    try:
        send_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, proto_num)
    except Exception as e:
        print(type(e))
        print(repr(e))

def main():
    # Look up the protocol number once, in the main thread,
    # since getprotobyname() is not thread safe
    proto_num = socket.getprotobyname('udp')
    for _ in range(6000):
        t = Thread(target=foo, args=(proto_num,))
        t.start()

main()
With this change, the "Permission denied" and "Protocol not supported" exceptions during socket creation no longer occur. Also, note that with SOCK_DGRAM the protocol number is redundant and can be skipped altogether; the solution is more relevant if someone wants to create a SOCK_RAW socket.
I am using PostgreSQL as a job queue. Following is my query to retrieve a job and update its state:
UPDATE requests AS re
SET
    started_at = NOW(),
    finished_at = NULL
FROM (
    SELECT _re.*
    FROM requests AS _re
    WHERE
        _re.state = 'pending'
        AND _re.started_at IS NULL
    LIMIT 1
    FOR UPDATE SKIP LOCKED
) AS sub
WHERE re.id = sub.id
RETURNING sub.*
Now, I have several machines, on each machine I have 1 process with several threads, and on each thread I have a worker. All workers in the same process shared a connection pool, typically having 10 - 20 connections.
The problem is, the above query will return some rows more than once!
I cannot find any reason. Could anyone help?
To be more detailed, I am using Python3 and psycopg2.
Update:
I have tried #a_horse_with_no_name's answer, but it doesn't seem to work.
I noticed that, one request is retrieved by two queries with the started_at updated to:
2016-04-21 14:23:06.970897+08
and
2016-04-21 14:23:06.831345+08
which differ by only 0.14 s.
I am wondering whether, at the time those two connections executed the inner SELECT subquery, neither lock had been established yet.
Update:
To be more precise, I have 200 workers (i.e. 200 threads) in 1 process on 1 machine.
Please also note that it's essential that each thread has its own connection if you do not want them to get in each other's way.
If your application uses multiple threads of execution, they cannot
share a connection concurrently. You must either explicitly control
access to the connection (using mutexes) or use a connection for each
thread. If each thread uses its own connection, you will need to use
the AT clause to specify which connection the thread will use.
from: http://www.postgresql.org/docs/9.5/static/ecpg-connect.html
All kinds of weird things happen if two threads share the same connection. I believe this is what is happening in your case: if you take a lock with one connection, all other threads that use the same connection have access to the locked objects.
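A minimal sketch of the one-connection-per-thread pattern with psycopg2's thread-safe pool (the DSN and pool sizes here are placeholders):

from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(1, 200, "dbname=mydb")  # minconn, maxconn, dsn

def worker():
    conn = pool.getconn()  # a dedicated connection for this thread
    try:
        with conn:  # wraps the block in a transaction, commits on success
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                cur.fetchone()
    finally:
        pool.putconn(conn)  # return the connection to the pool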
Permit me to suggest an alternative approach that is really simple: using Redis as a queue. You can either make use of redis-py and the lpush/rpop methods, or use python-rq.
There is a chance that the locking transaction has not been issued yet at the time of the SELECT, or that the lock is lost by the time the SELECT's results are ready and the UPDATE statement begins. Have you tried explicitly beginning a transaction?
BEGIN;

WITH req AS (
    SELECT id
    FROM requests AS _re
    WHERE _re.state = 'pending'
      AND _re.started_at IS NULL
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE requests
SET started_at = NOW(), finished_at = NULL
FROM req
WHERE requests.id = req.id;

COMMIT;
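For what it's worth, a hedged psycopg2 sketch of issuing such a claim inside an explicit transaction; psycopg2 opens a transaction implicitly, and the with conn block commits it (the DSN and the query text below are placeholders based on the question):

import psycopg2

CLAIM_SQL = """
UPDATE requests SET started_at = NOW(), finished_at = NULL
FROM (
    SELECT id FROM requests
    WHERE state = 'pending' AND started_at IS NULL
    LIMIT 1 FOR UPDATE SKIP LOCKED
) AS sub
WHERE requests.id = sub.id
RETURNING requests.*
"""

conn = psycopg2.connect("dbname=mydb")  # one connection per worker thread
with conn:  # BEGIN ... COMMIT around the block
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        job = cur.fetchone()  # None if no pending request was available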