Connection pool sqlalchemy always return a cached result - python-3.x

I use the following code to create connection pooling and run SQL query.
engine = sqlalchemy.create_engine(
f'mysql+pymysql://{usr}:{pw}#{db_config["host"]}:{db_config["port"]}/'
f'{sub_db}', pool_size=24, max_overflow=40, poolclass=pool.QueuePool)
connection = engine.connect()
pandas.read_sql(sql=my_query, con=connection)
However, I found that connection I created sometime returns a cached DB return:
E.g. I run SQL function validatePW to validate tokens based on a DB table. I have two users: A and B with different passwords. User A's password is 'PW1'.
If I run pandas.read_sql(sql="select validatePW('A', 'PW1')", con=connection), I got a True return which is as expected. and if I first run pandas.read_sql(sql="select validatePW('B', 'PW1')", con=connection), I will get a False return (B's password is not 'PW1').
Now if I run these two queries sequentially, the results will become funny
pd.read_sql(sql="select validatePW('A', 'PW1')", con=connection)
pd.read_sql(sql="select validatePW('B', 'PW1')", con=connection)
I will get two True return, although I am expecting the second query would return a False. It seems like connection is returning a cached result from the previous run. Why is that? Is there anyway to avoid it? Thanks

Related

relation does not exist in postgreSQL but already exist

I've read a lot of articles about my problem but no one solve it. So u can see my code here
DATABASE_URL = os.environ.get('url_of_my_db')
con = None
try:
con = psycopg2.connect(DATABASE_URL)
cur = con.cursor()
print('PostgreSQL database version:')
#cur.execute('SELECT version()')
#cur.execute('SELECT * FROM qwerty')
#cur.execute(sql.SQL('SELECT * FROM {}').format(sql.Identifier('qwerty')))
#cur.execute(sql.SQL("INSERT INTO {} (chat_id, username, created_on) VALUES (8985972942, vovakirdan, 2022-01-05)").format(sql.Identifier('users')))
cur.execute("""INSERT INTO users (chat_id, username, created_on)
VALUES (3131,
vovakirdan,
2022-01-05)""")
# display the PostgreSQL database server version
db_version = cur.fetchone()
print(db_version)
# close the communication with the HerokuPostgres
cur.close()
except Exception as error:
print('Cause: {}'.format(error))
finally:
# close the communication with the database server by calling the close()
if con is not None:
con.close()
print('Database connection closed.')
and in my DB table named "users" (named without quotes) are exist, but I still have this error:
error
...relation "users" does not exist
all the #commented code doesn't work and send the same error besides SELECT version(), it works perfectly that proves that connection works.
The problem was that PostgreDB wants me to use SELECT colum FROM schema.table instead of SELECT colum FROM table. And that's all. Thanks everyone

Python3 pika channel.basic_consume() causing MySQL too many connections

I had using pika to make a connection to RabbitMQ and consume message, once I start the script on ubuntu prod environment it is working as expected but is opening mysql connection and never closes them and ends up in Too many connection on mysql server.
Will appreciate any recommendation on the code below, as well can not understand what is going wrong. Thanking you in advance.
The flow is the following
Starting pika on Python3
Subscribe to a channel and waiting for messages
In callback i do various validation and save or update data inside MySql
The result that is showing the problem is the at the end of question a screenshot from ubuntu htop, that is showing new connection on MySql and keep adding them on the top
Pika Verion = 0.13.0
For MySql I use pymysql.
Pika Script
def main():
credentials = pika.PlainCredentials(tunnel['queue']['name'], tunnel['queue']['password'])
while True:
try:
cp = pika.ConnectionParameters(
host=tunnel['queue']['host'],
port=tunnel['queue']['port'],
credentials=credentials,
ssl=tunnel['queue']['ssl'],
heartbeat=600,
blocked_connection_timeout=300
)
connection = pika.BlockingConnection(cp)
channel = connection.channel()
def callback(ch, method, properties, body):
if 'messageType' in properties.headers:
message_type = properties.headers['messageType']
if events.get(message_type):
result = Descriptors._reflection.ParseMessage(events[message_type]['decode'], body)
if result:
result = protobuf_to_dict(result)
model.write_response(external_response=result, message_type=message_type)
else:
app_log.warning('Message type not in allowed list = ' + str(message_type))
app_log.warning('continue listening...')
channel.basic_consume(callback, queue=tunnel['queue']['name'], no_ack=True)
try:
channel.start_consuming()
except KeyboardInterrupt:
channel.stop_consuming()
connection.close()
break
except pika.connection.exceptions.ConnectionClosed as e:
app_log.error('ConnectionClosed :: %s' % str(e))
continue
except pika.connection.exceptions.AMQPChannelError as e:
app_log.error('AMQPChannelError :: %s' % str(e))
continue
except Exception as e:
app_log.error('Connection was closed, retrying... %s' % str(e))
continue
if __name__ == '__main__':
main()
Inside the script i have a model that doing inserts or updated in the database, code below
def write_response(self, external_response, message_type):
table_name = events[message_type]['table_name']
original_response = external_response[events[message_type]['response']]
if isinstance(original_response, list):
external_response = []
for o in original_response:
record = self.map_keys(o, message_type, events[message_type].get('values_fix', {}))
external_response.append(self.validate_fields(record))
else:
external_response = self.map_keys(original_response, message_type, events[message_type].get('values_fix', {}))
external_response = self.validate_fields(external_response)
if not self.mysql.open:
self.mysql.ping(reconnect=True)
with self.mysql.cursor() as cursor:
if isinstance(original_response, list):
for e in external_response:
id_name = events[message_type]['id_name']
filters = {id_name: e[id_name]}
self.event(
cursor=cursor,
table_name=table_name,
filters=filters,
external_response=e,
message_type=message_type,
event_id=e[id_name],
original_response=e # not required here
)
else:
id_name = events[message_type]['id_name']
filters = {id_name: external_response[id_name]}
self.event(
cursor=cursor,
table_name=table_name,
filters=filters,
external_response=external_response,
message_type=message_type,
event_id=external_response[id_name],
original_response=original_response
)
cursor.close()
self.mysql.close()
return
On ubuntu i use systemd to run the script and restart in case something goes wrong, below is systemd file
[Unit]
Description=Pika Script
Requires=stunnel4.service
Requires=mysql.service
Requires=mongod.service
[Service]
User=user
Group=group
WorkingDirectory=/home/pika_script
ExecStart=/home/user/venv/bin/python pika_script.py
Restart=always
[Install]
WantedBy=multi-user.target
Image from ubuntu htop, how the MySql keeps adding in the list and never close it
Error
tornado_mysql.err.OperationalError: (1040, 'Too many connections')
i have found the issue, posting if will help somebody else.
the problem was that mysqld went into infinite loop trying to create indexing to a specific database, after found to which database was trying to create the indexes and never succeed and was trying again and again.
solution was to remove the database and recreate it, and the mysqld process went back to normal. and the infinite loop to create indexes dissapeared as well.
I would say increasing connection may solve your problem temperately.
1st find out why the application is not closing the connection after completion of task.
2nd Any slow queries/calls on the DB and fix them if any.
3rd considering no slow queries/calls on DB and also application is closing the connection/thread after immediately completing the task, then consider playing with "wait_timeout" on mysql side.
According to this answer, if you have MySQL 5.7 and 5.8 :
It is worth knowing that if you run out of usable disc space on your
server partition or drive, that this will also cause MySQL to return
this error. If you're sure it's not the actual number of users
connected then the next step is to check that you have free space on
your MySQL server drive/partition.
From the same thread. You can inspect and increase number of MySQL connections.

Elastic Search via python gives wrong count

I’m new to python and I need to get connected to “Kibana” via python. we’re using Kibana 7.4.1. The requirement is to get them just the count (hits).
Due to some restrictions, I need to use Python 3.6 only. I’ve added the “ElasticSearch” & “ElasticSearch-dsl” library.
I’m able to get connected to the Kibana via the client, but I’m getting the wrong hits count.
Code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import MultiSearch, Search
from elasticsearch_dsl.query import QueryString, Range, SimpleQueryString
es = Elasticsearch(['host2', 'host2'], http_auth=('usr', 'pass'), port=9200)
s = Search(using=es, index='c*')
s.filter(SimpleQueryString(query="tags:prod AND severity:INFO AND service: finder AND msg:* is processed"))
s.filter(Range(** {'#timestamp': {'gte': 'now-5m', 'lt': 'now'}}))
response = s.execute()
print("Got %d Hits:" % response['hits']['total']['value']) # Always coming as 1000 so this is wrong
Can I get some help with this, please?
First of all a little clarification. You are connecting to Elasticsearch and not Kibana (Kibana is a client, like the program you are writing).
You are receiving always 10000 as result, because your index has more than 10000 hits. It is a documented feature. Indeed, since the count computation is expensive in the general case it is performed only when needed. In order to obtain the right number of results you have two possibilities
to set the query parameter track_total_hits to true
use the count API.
track_total_hits
You can add this extra parameter to the search object as reported here as follows:
s = Search(using=es, index='c*')
s = s.extra(track_total_hits=True)
<the-rest of your code>
Count API approach
Instead of invoking the execute() function, you can simply use the count() function:
s = Search(using=es, index='c*')
s.filter(SimpleQueryString(query="tags:prod AND severity:INFO AND service: finder AND msg:* is processed"))
s.filter(Range(** {'#timestamp': {'gte': 'now-5m', 'lt': 'now'}}))
response = s.cpunt()
print("Got %d Hits:" % response)
Kind regards

Is there any way to put timeout in pandas read_sql function?

I connect to a DB2 server through ODBC connection in my python code. The DB2 server gets reboot for maintainence or disconnects me while running specific server side tasks, happens 1 or 2 times in a day. At that time if my code has started executing the pandas read_sql function to fetch result of query, it goes into a infinite wait even when the server is up after lets say 1 hour.
I want to put a timeout in the execution of read_sql and whenever that timeout occurs I want to refresh the connection with DB2 server so that a fresh connection is made again before continuing the query.
I have tried making a while loop and picking chunks of data from DB2 instead of pulling whole result at once, but problem is if DB2 disconnects in pulling chunk python code still goes into infinite wait.
chunk_size = 1000
offset = 0
while True:
sql = "SELECT * FROM table_name limit %d offset %d" % (chunk_size,offset)
df = pd.read_sql(sql, conn)
df.index += (offset+1)
offset += chunk_size
sys.stdout.write('.')
sys.stdout.flush()
if df.shape[0] < chunk_size:
break
I need the read_sql to throw some exception or return a value if the sql execution takes more than 3 minutes. If that happenes I need the connection to DB2 to refresh.
You could use the package func-timeout. You can install via pip as below:
pip install func-timeout
So, for example, if you have a function “doit(‘arg1’, ‘arg2’)” that you want to limit to running for 5 seconds, with func_timeout you can call it like this:
from func_timeout import func_timeout, FunctionTimedOut
try:
doitReturnValue = func_timeout(5, doit, args=(‘arg1’, ‘arg2’))
except FunctionTimedOut:
print ( “doit(‘arg1’, ‘arg2’) could not complete within 5 seconds, hence terminated.\n”)
except Exception as e:
# Handle any exceptions that doit might raise here

Dump series back into InfluxDB after querying with replaced field value

Scenario
I want to send data to an MQTT Broker (Cloud) by querying measurements from InfluxDB.
I have a field in the schema which is called status. It can either be 1 or 0. status=0 indicated that series has not been sent to the cloud. If I get an acknowlegdment from the MQTT Broker then I wish to rewrite the query back into the database with status=1.
As mentioned in FAQs for InfluxDB regarding Duplicate data If the information has the same timestamp as the previous query but with a different field value => then the update field will be shown.
In order to test this I created the following:
CREATE DATABASE dummy
USE dummy
INSERT meas_1, type=t1, status=0,value=123 1536157064275338300
query:
SELECT * FROM meas_1
provides
time status type value
1536157064275338300 0 t1 234
now if I want to overwrite the series I do the following:
INSERT meas_1, type=t1, status=1,value=123 1536157064275338300
which will overwrite the series
time status type value
1536157064275338300 1 t1 234
(Note: this is not possible via Tags currently in InfluxDB)
Usage
Query some information using the client with "status"=0.
Restructure JSON to be sent to the cloud
Send the information to cloud
If successful then write the output from Step 1. back into the DB but with status=1.
I am using the InfluxDBClient Python3 to create the Application (MQTT + InfluxDB)
Within the write_points API there is a parameter which mentions batch_size which require int as input.
I am not sure how can I use this with the Application that I want. Can someone guide me with this or with the Schema of the DB so that I can upload actual and non-redundant information to the cloud ?
The batch_size is actually the length of the list of the measurements that needs to passed to write_points.
Steps
Create client and query from measurement (here, we query gps information)
client = InfluxDBClient(database='dummy')
op = client.query('SELECT * FROM gps WHERE "status"=0', epoch='ns')
Make the ResultSet into a list:
batch = list(op.get_points('gps'))
create an empty list for update
updated_batch = []
parse through each measurement and change the status flag to 1. Note, default values in InfluxDB are float
for each in batch:
new_mes = {
'measurement': 'gps',
'tags': {
'type': 'gps'
},
'time': each['time'],
'fields': {
'lat': float(each['lat']),
'lon': float(each['lon']),
'alt': float(each['alt']),
'status': float(1)
}
}
updated_batch.append(new_mes)
Finally dump the points back via the client with batch_size as the length of the updated_batch
client.write_points(updated_batch, batch_size=len(updated_batch))
This overwrites the series because it contains the same timestamps with status field set to 1

Resources