FastAPI leaving Postgres idle connections

I want to decrease the number of idle PostgreSQL connections coming from FastAPI calls. I am not able to figure out what exactly is leaving this many idle connections, as multiple people are using this database and the APIs associated with it.
Can someone suggest what I might have done wrong to leave this many idle connections, or an efficient way to figure out what is causing them, so that I can fix that portion and reduce the count?
I am not sure whether I have provided enough information to explain this, but if anything else is required I will be more than happy to add it.
postgresql idle connection screenshot
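For reference, the idle sessions themselves can be listed from pg_stat_activity; the sketch below (which assumes the same SQLALCHEMY_DATABASE_URL that is built in the code further down) groups them by client and application so it is easier to see where they come from:

from sqlalchemy import create_engine, text

# Diagnostic sketch: count idle backends per client/application.
engine = create_engine(SQLALCHEMY_DATABASE_URL, future=True)
with engine.connect() as conn:
    rows = conn.execute(text(
        "SELECT client_addr, application_name, state, count(*) AS n "
        "FROM pg_stat_activity "
        "WHERE state = 'idle' "
        "GROUP BY client_addr, application_name, state "
        "ORDER BY n DESC"))
    for row in rows:
        print(row)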
This is how I create the PostgreSQL object used by FastAPI:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class postgres:
    def __init__(self, config):
        try:
            SQLALCHEMY_DATABASE_URL = "postgresql://" + \
                config['postgresql']['user'] + ":" + config['postgresql']['password'] + "@" + \
                config['postgresql']['host'] + ":5432" + \
                "/" + config['postgresql']['database']
            # print(SQLALCHEMY_DATABASE_URL)
            engine = create_engine(SQLALCHEMY_DATABASE_URL, future=True)
            self.SessionLocal = sessionmaker(
                autocommit=False, autoflush=True, bind=engine)
        except Exception:
            raise

    def get_db(self):
        """
        Return the session object used by the ORM.
        :return: SessionLocal
        """
        try:
            db = self.SessionLocal()
            yield db
        finally:
            db.close()

I believe this is caused by get_db(self), since it creates (yields) a new connection/session object every time SessionLocal() is called.
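If the postgres class above is instantiated per request, every request also builds its own engine and connection pool, and those pooled connections linger as idle sessions. A sketch of the usual fix, one shared engine with a bounded pool created once at startup (the pool numbers are illustrative, not tuned for this workload):

from fastapi import Depends, FastAPI
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# One engine (and therefore one pool) for the whole process.
engine = create_engine(
    SQLALCHEMY_DATABASE_URL,  # same URL as in the question
    future=True,
    pool_size=5,        # connections kept open in the pool
    max_overflow=5,     # extra connections allowed under load
    pool_recycle=1800,  # replace connections older than 30 minutes
)
SessionLocal = sessionmaker(autocommit=False, autoflush=True, bind=engine)

app = FastAPI()

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/items")
def read_items(db=Depends(get_db)):
    ...

The important part is that create_engine() runs exactly once per process; the sessions yielded by get_db() then borrow and return connections from that single, capped pool instead of each request growing its own.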

Related

Updating tables on Postgres on AWS RDS is causing connection freezes

I have been using AWS for about six months now for simple tasks such as setting up Python scripts and cron jobs on EC2 that update tables on Postgres. So far everything had been great, but things recently went bad after the number of cron jobs updating tables grew to about 20, so several of them now collide while simultaneously trying to connect to the database and modify tables (no two scripts modify the same table at once). They started failing randomly, and now I see most of them fail. I need to reboot the AWS RDS instance for the cron jobs to work properly again, but then the same thing repeats. As you can see here, the DB connections keep increasing, and once I restart the instance they are back to normal; the same goes for the current activity sessions (I do wonder how that metric can be a float, to be honest).
Error:
Exception: Unexpected exception type: ReadTimeout
I am wondering whether the way I connect to the database is the problem, or something else. I can't even figure out whether it is some measure from RDS that freezes any modifications to the tables because of the many connection attempts. The thing is, I am sure the script works: when I stop all cron jobs, reboot the RDS instance and run this script, it works fine, but when I start all of them again it fails. Can someone please help me with some leads on how to figure this out?
Most of my scripts look pretty similar to the following; some run once a day and some run once every 5 minutes.
import datetime

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

def get_daily_data(coin, start_date, time_period):
    ...  # body omitted in the question; builds the DataFrame df
    return df

def main():
    coin1 = 'bitcoin'
    time_period = '1d'
    DATABASE_URI = f'postgresql+psycopg2://{USR}:{token}@{ENDPOINT}:{PORT}/{DBNAME}'
    engine = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)
    connection = engine.connect()
    if connection is not None:
        old_btc_data = pd.read_sql("SELECT * FROM daily_data", connection)
        start_date = str(old_btc_data['datetime'].iloc[-1].date() - datetime.timedelta(days=7))
        latest_data_btc = get_daily_data(coin1, start_date, time_period)
        if latest_data_btc is not None:
            try:
                latest_data_btc = latest_data_btc.reset_index()
                latest_data_btc['datetime'] = latest_data_btc['datetime'].dt.tz_localize(None)
                latest_data_btc = pd.concat([old_btc_data, latest_data_btc], ignore_index=True)
                latest_data_btc = latest_data_btc.drop_duplicates(keep='last')
                latest_data_btc = latest_data_btc.set_index('datetime')
                latest_data_btc.to_sql('daily_data', if_exists='replace', con=connection)
                print("Pushed BTC daily data to the database")
            except Exception:
                print("BTC daily data database update FAILED!!!!!!")
    connection.close()

if __name__ == '__main__':
    main()
I learnt that the error is related to how the connection is created: somehow creating a connection using connection = engine.connect() was freezing other modifications to the database tables. It started working for me once I used the engine directly as the connection, as follows:
connection = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)
Edit: it is much better to use sessions with psycopg2 cursors; exception handling becomes easy.
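For what it's worth, a minimal sketch of that psycopg2 pattern (same placeholder credentials as above; the connection block gives you commit/rollback per transaction and the cursor block closes the cursor for you):

import psycopg2

conn = psycopg2.connect(host=ENDPOINT, port=PORT, user=USR, password=token, dbname=DBNAME)
try:
    with conn:                      # transaction scope: commit on success, rollback on error
        with conn.cursor() as cur:  # cursor is closed when the block exits
            cur.execute("SELECT count(*) FROM daily_data")
            print(cur.fetchone())
except psycopg2.OperationalError as e:
    print(f"daily_data update FAILED: {e}")
finally:
    conn.close()                    # the with block does not close the connection itself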

Azure timer function: when to add cursor.close

I am very new to Azure Functions and have a question. I am working on an Azure timer function that pulls data via an API and inserts it into an Azure SQL DB. I am able to do all of that successfully. However, at the end of the script I get the following error:
Exception: ProgrammingError: Attempt to use a closed cursor.
My question is: when would I include cursor.close()? Should I have it in there at all? I assume yes, but if so, where do I use it?
If I comment it out, everything works fine, but I feel like I should have it in there.
Here's my code:
import azure.functions as func

# cnxn, cursor and get_properties() are defined elsewhere in the function app (not shown here).

def main(mytimer: func.TimerRequest) -> None:
    gp_data = get_properties()
    for index, row in gp_data.iterrows():
        cursor.execute("""INSERT INTO dbo.get_properties3 (propertyid, property_name, street_address,
                       city, state_code, zip_code, phone, email, manager, currentperiod_start,
                       currentperiod_end, as_of_date) values(?,?,?,?,?,?,?,?,?,?,?,?)""",
                       row.propertyid, row.property_name, row.street_address, row.city, row.state_code, row.zip_code,
                       row.phone, row.email, row.manager,
                       row.currentperiod_start, row.currentperiod_end, row.as_of_date)
    cnxn.commit()
    # cursor.close()
Any advice would be greatly appreciated.
Thanks!
In my opinion, the cursor.close() line is unnecessary, because the cursor can be garbage-collected like any other Python object. Each running instance of your timer function will not be affected even if you don't add cursor.close().
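That said, this ProgrammingError usually shows up when the connection and cursor are created at module level, so the cursor.close() at the end of the first invocation leaves later invocations with a dead cursor. If you would still rather close it explicitly, a sketch of one way around that, assuming pyodbc (which is what cnxn and cursor look like here), is to create them inside main() and close them in a finally block so every invocation gets fresh ones:

import azure.functions as func
import pyodbc  # assumption: the question's cnxn/cursor come from pyodbc

CONN_STR = "..."  # placeholder: the Azure SQL ODBC connection string

def main(mytimer: func.TimerRequest) -> None:
    gp_data = get_properties()        # same helper as in the question
    cnxn = pyodbc.connect(CONN_STR)   # fresh connection per invocation
    cursor = cnxn.cursor()
    try:
        for index, row in gp_data.iterrows():
            cursor.execute(
                "INSERT INTO dbo.get_properties3 (propertyid, property_name) VALUES (?, ?)",
                row.propertyid, row.property_name)  # column list shortened for the sketch
        cnxn.commit()
    finally:
        cursor.close()                # safe here: the next invocation builds a new cursor
        cnxn.close()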

Get username from local session Telethon

I'm using the Telethon library to crawl some Telegram channels. While crawling, I need to resolve many join links, usernames and channel IDs. To resolve these items I used the method client.get_entity(), but after a while the Telegram servers banned my crawler for resolving too many usernames. I searched around and found from this issue that I should use get_input_entity() instead of get_entity(). Telethon saves entities in a local SQLite file, and whenever get_input_entity() is called it first searches that local SQLite database; only if no match is found does it send a request to the Telegram servers. So far so good, but I have two problems with this approach:
1. get_input_entity() returns only two attributes, ID and hash, but there are other columns, such as username, phone and name, in the SQLite database. I need a method that returns those other columns too, not just ID and hash.
2. I need to control the number of resolve requests sent to the Telegram servers, but get_input_entity() sends a request whenever it finds no match in the local database, and I can't control when it does so. What I actually need is a boolean argument indicating whether or not the method should contact the Telegram servers when no local match is found.
I read some of the Telethon source code, mainly get_input_entity(), and wrote my own version of get_input_entity():
from telethon.errors import ChannelPrivateError
from telethon.tl import types
from telethon.tl.types import InputPeerChannel, InputPeerChat, InputPeerUser

def my_own_get_input_entity(self, target, with_info: bool = False):
    if self._client:
        if target in ('me', 'self'):
            return types.InputPeerSelf()

        def get_info():
            nonlocal self, result
            res_id = 0
            if isinstance(result, InputPeerChannel):
                res_id = result.channel_id
            elif isinstance(result, InputPeerChat):
                res_id = result.chat_id
            elif isinstance(result, InputPeerUser):
                res_id = result.user_id
            return self._sqlite_session._execute(
                'select username, name from entities where id = ?', res_id
            )

        try:
            # cache-only lookup; raises ValueError if the entity is not stored locally
            result = self._client.session.get_input_entity(target)
            info = get_info() if with_info else None
            return result, info
        except ValueError:
            record_current_time()
            try:
                # when we are here, we are actually going to
                # send a request to the Telegram servers
                if not check_if_appropriate_time_elapsed_from_last_telegram_request():
                    return None
                result = self._client.get_input_entity(target)
                info = get_info() if with_info else None
                return result, info
            except ChannelPrivateError:
                pass
            except ValueError:
                pass
            except Exception:
                pass
But my code is somewhat problematic performance-wise, because it makes redundant queries to the SQLite database. For example, if the target is actually an entity in the local database and with_info is True, it first queries the local database in the line result = self._client.session.get_input_entity(target) and then, because with_info is True, queries the database again to get the username and name columns. In another situation, if the target is not found in the local database, calling self._client.get_input_entity(target) makes a redundant call to the local database.
Knowing these performance issues, I delved deeper into the Telethon source code, but as I don't know much about asyncio I couldn't write anything better than the above.
Any ideas how to solve these problems?
client.session.get_input_entity will make no API call (it can't) and fails if there is no match in the local database, which is probably the behaviour you want.
You can, for now, access the client.session._conn private attribute. It's a sqlite3.Connection object, so you can use it to make all the queries you want. Note that this is prone to breaking, since you're accessing a private member, although no changes are expected soon. Ideally, you should subclass the session file to suit your needs; see Session Files in the documentation.
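For example, a small helper along these lines (a sketch; the entities table and its username/phone/name columns are an internal detail of the SQLite session file and may change between Telethon versions):

def entity_info_from_cache(client, entity_id):
    # Read extra columns straight from the session's private sqlite3 connection.
    cur = client.session._conn.cursor()
    try:
        cur.execute(
            "SELECT username, phone, name FROM entities WHERE id = ?",
            (entity_id,))
        return cur.fetchone()  # None if the entity is not cached locally
    finally:
        cur.close()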

Sqlalchemy Snowflake not closing connection after successfully retrieving results

I am connecting to the Snowflake data warehouse from Python and I encounter weird behaviour. The Python program exits successfully if I retrieve a small number of rows from Snowflake, but hangs indefinitely if I try to retrieve more than 200K rows. I am 100% sure there are no issues with my machine, because I am able to retrieve 5 to 10 million rows from other database systems such as Postgres.
My Python environment is Python 3.6 and I use the following library versions: SQLAlchemy 1.1.13, snowflake-connector-python 1.4.13, snowflake-sqlalchemy 1.0.7.
The following code prints the total number of rows and closes the connection:
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(
    account=xxxx,
    user=xxxxx,
    password=xxxxx,
    database=xxxxx,
    schema=xxxxxx,
    warehouse=xxxxx))

query = """SELECT * FROM db_name.schema_name.table_name LIMIT 1000"""
results = engine.execute(query)
print(results.rowcount)
engine.dispose()
The following code prints the total number of rows, but the connection doesn't close; it just hangs until I manually kill the Python process:
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(
    account=xxxx,
    user=xxxxx,
    password=xxxxx,
    database=xxxxx,
    schema=xxxxxx,
    warehouse=xxxxx))

query = """SELECT * FROM db_name.schema_name.table_name LIMIT 500000"""
results = engine.execute(query)
print(results.rowcount)
engine.dispose()
I tried multiple different tables and I encounter the same issue with Snowflake. Has anyone encountered similar issues?
Can you check the query status from the UI? The "History" page should include the query. If the warehouse is not ready, it may take a couple of minutes to start the query (I guess that's very unlikely, though).
Try changing the connection to this:
connection = engine.connect()
results = connection.execute(query)
print (results.rowcount)
connection.close()
engine.dispose()
SQLAlchemy's dispose() doesn't close the connection if the connection has not been explicitly closed. I inquired about this before, but so far the workaround is simply to close the connection.
https://groups.google.com/forum/#!searchin/sqlalchemy/shige%7Csort:date/sqlalchemy/M7IIJkrlv0Q/HGaQLBFGAQAJ
Lastly, if the issue still persists, add this logger at the top:
import logging

for logger_name in ['snowflake', 'botocore']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter(
        '%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
and collect the log.
If the output is too long to fit in here, I can take it at the issue page at https://github.com/snowflakedb/snowflake-sqlalchemy.
Note that I tried it myself but cannot reproduce the issue so far.
Have you tried using a with statement to make your connection instead?
Instead of this:
engine = create_engine(URL(account=xxxx, user=xxxxx, password=xxxxx, database=xxxxx, schema=xxxxxx, warehouse=xxxxx))
results = engine.execute(query)
do the following:
engine = create_engine(URL(account=xxxx, user=xxxxx, password=xxxxx, database=xxxxx, schema=xxxxxx, warehouse=xxxxx))
with engine.connect() as connection:
    # do work
    results = connection.execute(query)
    ...
After the with block, the connection is automatically closed out (the engine itself is not a context manager; the connection it hands out is).

How to handle the Lost connection error and BaseSSHTunnelForwarderError: Could not establish session to SSH gateway

This is the code I use to connect to the database and get some tables (shown below). I have used it for a while without problems, but recently I keep getting the two errors below after it runs a few times. I assume it is something to do with the database.
Basically, I can connect to the database and retrieve my table, but since the data is very large I do it in a for loop where I loop through dates within a given range and fetch the data. For example, I start from 20160401 and count up one day at a time until 20170331 (about 365 loops).
However, after about 20 loops I get this error:
OperationalError: Lost connection to MySQL server during query
and sometimes this one:
BaseSSHTunnelForwarderError: Could not establish session to SSH gateway
I would like to solve these errors, but if that is not possible I would like to wait a few seconds, reconnect to the database automatically, and execute the same query again. Any help solving this will be highly appreciated.
import pandas as pd
import pymysql
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
        (ssh_host, ssh_port),
        ssh_username=ssh_user,
        ssh_pkey=mypkey,
        remote_bind_address=(sql_ip, sql_port)) as tunnel:
    conn = pymysql.connect(host='127.0.0.1',
                           user=sql_username,
                           passwd=sql_password,
                           port=tunnel.local_bind_port,
                           db=db, charset='ujis')
    data = pd.read_sql_query(query_2, conn)
This is what I have tried so far, but it is not working:
while True:
    try:
        with SSHTunnelForwarder(
                (ssh_host, ssh_port),
                ssh_username=ssh_user,
                ssh_pkey=mypkey,
                remote_bind_address=(sql_ip, sql_port)) as tunnel:
            conn = pymysql.connect(host='127.0.0.1',
                                   user=sql_username,
                                   passwd=sql_password,
                                   port=tunnel.local_bind_port,
                                   db=db, charset='ujis')
            data = pd.read_sql_query(query, conn)
        break
    except pymysql.err.OperationalError as e:
        conn.ping(reconnect=True)
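A sketch of the "wait a few seconds and retry" behaviour you describe: keep the whole tunnel-plus-query attempt inside the loop, sleep between attempts instead of pinging a connection whose tunnel is already gone, and cap the number of retries (the delay and retry count below are arbitrary):

import time

import pandas as pd
import pymysql
from sshtunnel import BaseSSHTunnelForwarderError, SSHTunnelForwarder

data = None
for attempt in range(5):                      # arbitrary cap on retries
    try:
        with SSHTunnelForwarder(
                (ssh_host, ssh_port),
                ssh_username=ssh_user,
                ssh_pkey=mypkey,
                remote_bind_address=(sql_ip, sql_port)) as tunnel:
            conn = pymysql.connect(host='127.0.0.1',
                                   user=sql_username,
                                   passwd=sql_password,
                                   port=tunnel.local_bind_port,
                                   db=db, charset='ujis')
            try:
                data = pd.read_sql_query(query, conn)
            finally:
                conn.close()                  # always release the MySQL connection
        break                                 # success: leave the retry loop
    except (pymysql.err.OperationalError, BaseSSHTunnelForwarderError):
        time.sleep(10)                        # wait before building a new tunnel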
