I am connecting to a Snowflake data warehouse from Python and I am seeing odd behavior. The Python program exits successfully if I retrieve a small number of rows from Snowflake, but it hangs indefinitely if I try to retrieve more than 200K rows. I am 100% sure that there are no issues with my machine, because I am able to retrieve 5 to 10 million rows from other database systems such as Postgres.
My environment is Python 3.6 with the following library versions: SQLAlchemy 1.1.13, snowflake-connector-python 1.4.13, snowflake-sqlalchemy 1.0.7.
The following code prints the total number of rows and closes the connection.
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(
    account=xxxx,
    user=xxxxx,
    password=xxxxx,
    database=xxxxx,
    schema=xxxxxx,
    warehouse=xxxxx))

query = """SELECT * FROM db_name.schema_name.table_name LIMIT 1000"""
results = engine.execute(query)
print(results.rowcount)
engine.dispose()
The following code prints the total number of rows, but the connection does not close; it just hangs until I manually kill the Python process.
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(
    account=xxxx,
    user=xxxxx,
    password=xxxxx,
    database=xxxxx,
    schema=xxxxxx,
    warehouse=xxxxx))

query = """SELECT * FROM db_name.schema_name.table_name LIMIT 500000"""
results = engine.execute(query)
print(results.rowcount)
engine.dispose()
I tried several different tables and hit the same issue with Snowflake. Has anyone encountered a similar problem?
Can you check the query status from the UI? The "History" page should include the query. If the warehouse is not ready, it may take a couple of minutes to start the query (I guess that's very unlikely, though).
Try changing the connection to this:
connection = engine.connect()
results = connection.execute(query)
print(results.rowcount)
connection.close()
engine.dispose()
SQLAlchemy's dispose() doesn't close the connection if the connection was never explicitly closed. I asked about this before, but so far the workaround is simply to close the connection yourself:
https://groups.google.com/forum/#!searchin/sqlalchemy/shige%7Csort:date/sqlalchemy/M7IIJkrlv0Q/HGaQLBFGAQAJ
Lastly, if the issue still persists, add the following logging configuration at the top of your script:
import logging

for logger_name in ['snowflake', 'botocore']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter(
        '%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
and collect the log.
If the output is too long to fit here, I can take it on the issue page at https://github.com/snowflakedb/snowflake-sqlalchemy.
Note that I tried this myself but could not reproduce the issue so far.
Have you tried using a with statement to make your connection instead?
instead of this:
engine = create_engine(URL(account=xxxx,user=xxxxx,password=xxxxx,database=xxxxx,schema=xxxxxx,warehouse=xxxxx))
results = engine.execute(query)
do the following:
engine = create_engine(URL(account=xxxx, user=xxxxx, password=xxxxx, database=xxxxx, schema=xxxxxx, warehouse=xxxxx))
with engine.connect() as connection:
    # do work
    results = connection.execute(query)
    ...
After the with block exits, the connection is closed automatically (in these SQLAlchemy versions the Engine itself is not a context manager, so the with statement goes on engine.connect()).
Related
I have been using AWS for about 6 months now for some simple tasks, like using EC2 to run Python scripts and cron jobs that update tables on Postgres. Everything was fine until the number of cron jobs that update tables grew to about 20, so several of them now try to connect to the database and modify tables at the same time (though no two scripts modify the same table at once). They started failing randomly, and now most of them fail. I have to reboot the AWS RDS instance for the cron jobs to work properly again, but then the same thing repeats. As you can see here, the database connections keep increasing, and once I restart the instance they go back to normal; the same goes for the current activity sessions (I wonder how that can be a float, to be honest).
Error:
Exception: Unexpected exception type: ReadTimeout
I am wondering whether the way I connect to the database is the problem. I can't even figure out whether RDS is freezing modifications to the tables because of the many connection attempts. I am sure the script itself works: when I stop all the cron jobs, reboot the RDS instance, and run this script alone, it works fine, but when I start all of them again it fails. Can someone please give me some leads on how to figure this out?
Most of my scripts look similar to the following; some run once a day and others run once every 5 minutes.
import datetime

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool


def get_daily_data(coin, start_date, time_period):
    # body omitted in the question; returns the latest data as a DataFrame
    return df


def main():
    coin1 = 'bitcoin'
    time_period = '1d'
    DATABASE_URI = f'postgresql+psycopg2://{USR}:{token}@{ENDPOINT}:{PORT}/{DBNAME}'
    engine = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)
    connection = engine.connect()
    if connection is not None:
        old_btc_data = pd.read_sql("SELECT * FROM daily_data", connection)
        start_date = str(old_btc_data['datetime'].iloc[-1].date() - datetime.timedelta(days=7))
        latest_data_btc = get_daily_data(coin1, start_date, time_period)
        if latest_data_btc is not None:
            try:
                latest_data_btc = latest_data_btc.reset_index()
                latest_data_btc['datetime'] = latest_data_btc['datetime'].dt.tz_localize(None)
                latest_data_btc = pd.concat([old_btc_data, latest_data_btc], ignore_index=True)
                latest_data_btc = latest_data_btc.drop_duplicates(keep='last')
                latest_data_btc = latest_data_btc.set_index('datetime')
                latest_data_btc.to_sql('daily_data', if_exists='replace', con=connection)
                print("Pushed BTC daily data to the database")
            except Exception:
                print("BTC daily data database update FAILED!!!!!!")
    connection.close()


if __name__ == '__main__':
    main()
I learned that the error is related to how I create the connection: creating a connection with connection = engine.connect() was somehow blocking other modifications to the database table. It started working for me when I used the engine directly as the connection, as follows:
connection = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)
Edit: it is much better to use sessions with psycopg2 cursors; exception handling becomes much easier.
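For what it's worth, here is a minimal sketch of that approach. USR, token, ENDPOINT, PORT and DBNAME are the same placeholders used above; the function name push_daily_data and the column names in the INSERT are invented for illustration:

import psycopg2


def push_daily_data(rows):
    conn = psycopg2.connect(user=USR, password=token, host=ENDPOINT,
                            port=PORT, dbname=DBNAME)
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if the block raises.
        with conn:
            with conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO daily_data (datetime, price) VALUES (%s, %s)",
                    rows,
                )
    except psycopg2.Error as exc:
        print(f"BTC daily data database update FAILED: {exc}")
    finally:
        # The with-block only ends the transaction; close the connection here.
        conn.close()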
I'm trying to set up a simple web service with Python, Flask and SQLite3. It doesn't work.
The DB connection without the web service works; the web service without the DB connection works. Together they don't.
If I run this, it works:
import sqlite3
conn = sqlite3.connect('scuola.db')
sql = "SELECT matricola,cognome,nome FROM studenti"
cur = conn.cursor()
cur.execute(sql)
risultato = cur.fetchall()
conn.close()
print(risultato)
(so the query is correct)
and if I run this, it works:
import flask

app = flask.Flask(__name__)

def funzione():
    return 'Applicazione Flask'

app.add_url_rule('/', 'funzione', funzione)
but if I run this...
from flask import Flask
import sqlite3

app = Flask(__name__)

@app.route('/', methods=['GET'])
def getStudenti():
    conn = sqlite3.connect('scuola.db')
    sql = "SELECT matricola,cognome,nome FROM studenti"
    cur = conn.cursor()
    cur.execute(sql)
    risultato = cur.fetchall()
    conn.close()
    return risultato
It returns Internal Server Error in the browser, and
sqlite3.OperationalError: no such table: studenti
on the DOS prompt.
Thank you for your help!
You haven't provided the internal server error output, but my first guess is that you're trying to return the raw list object returned by fetchall.
When returning from a view function you need to send the results either by rendering a template or by jsonifying the output, so the browser receives a proper HTTP response.
You need to add
from flask import jsonify
in your imports, then when returning:
return jsonify(risultato)
If you get errors like something is not JSON serializable, it means you're trying to send an instance of a class or similar. Make sure you're returning only plain Python data structures (e.g. list/dict/str).
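Putting that together, a minimal sketch of the view from the question with jsonify applied (same scuola.db file and studenti table as above):

from flask import Flask, jsonify
import sqlite3

app = Flask(__name__)

@app.route('/', methods=['GET'])
def getStudenti():
    conn = sqlite3.connect('scuola.db')
    cur = conn.cursor()
    cur.execute("SELECT matricola,cognome,nome FROM studenti")
    risultato = cur.fetchall()
    conn.close()
    # fetchall() returns a list of tuples, which jsonify serializes as a JSON array.
    return jsonify(risultato)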
For the command line problem, you need to make sure you've run a CREATE TABLE statement to create the table in the database before you select from it. Also check that you're opening the correct SQLite database file, i.e. the one that actually contains the table.
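One common cause of "no such table" under Flask is that the relative path 'scuola.db' is resolved against the current working directory, so sqlite3 may silently create a brand-new, empty database file. A small sketch (not part of the original answer) that pins the path to the script's directory and lists the tables that really exist in that file:

import os
import sqlite3

# Resolve the database file relative to this script, not the working directory.
DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'scuola.db')

conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()
# Show which tables exist in this particular file.
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cur.fetchall())
conn.close()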
I'm not sure, but from the look of things I don't think you've configured the Flask app to point at the database you created. There should be some sort of app.config entry that integrates the db.
There are two Python scripts. Script A keeps reading the database for records and then sends out messages. Script B keeps finding data and inserting it into the database. Both of them keep running without being terminated.
If there are unhandled records in the database when script A starts, script A can see these records and handles them correctly. Script A then sits idle. After some time, script B inserts some new records. However, script A cannot see these newly added records.
In script B, after inserting the records, I call connection.commit(). The records really are in the database.
Script A does not work correctly when:
persist = PersistentDB(pymysql, host=DB_HOST, port=DB_PORT, user=DB_USERNAME,
                       passwd=DB_PASSWORD, db=DB_DATABASE, charset='utf8mb4')
while True:
    connection = persist.connection()
    cursor = connection.cursor()
    cursor.execute("SELECT...")
Script A works correctly when:
while True:
    persist = PersistentDB(pymysql, host=DB_HOST, port=DB_PORT, user=DB_USERNAME,
                           passwd=DB_PASSWORD, db=DB_DATABASE, charset='utf8mb4')
    connection = persist.connection()
    cursor = connection.cursor()
    cursor.execute("SELECT...")
Should I really be using PersistentDB this way? It seems to recreate the persistent connection pool every time I instantiate it. Please tell me what the correct way to do this is.
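In case it helps, my guess (only an assumption, not a confirmed diagnosis) is that the first pattern keeps one transaction open forever, so the SELECT keeps seeing the same snapshot under MySQL's default REPEATABLE READ isolation. A sketch that keeps the single PersistentDB instance but ends the transaction on every iteration, using the question's own placeholder settings and an invented 5-second pause, would look like this:

import time

import pymysql
from dbutils.persistent_db import PersistentDB  # DBUtils >= 2.0; older versions: from DBUtils.PersistentDB import PersistentDB

persist = PersistentDB(pymysql, host=DB_HOST, port=DB_PORT, user=DB_USERNAME,
                       passwd=DB_PASSWORD, db=DB_DATABASE, charset='utf8mb4')

while True:
    connection = persist.connection()
    cursor = connection.cursor()
    cursor.execute("SELECT ...")
    rows = cursor.fetchall()
    cursor.close()
    # Ending the transaction releases the old snapshot, so the next SELECT
    # can see rows committed by script B in the meantime.
    connection.commit()
    time.sleep(5)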
P.S.
Connecting to MySQL 8 with Python 3, pymysql and DBUtils
Below is the code I use to connect to the database and retrieve some tables. I have used it for a while without problems, but recently I keep getting the two errors below after it runs a few times. I assume it is something to do with the database.
Basically, I can connect to the database and retrieve my table, but since the data is very large I do it in a for loop, iterating through the dates in a given range and fetching the data for each one. For example, I start at 20160401 and step one day at a time until 20170331 (about 365 iterations).
However, after about 20 iterations I get this error:
OperationalError: "Lost connection to MySQL server during query (%s)" % (e,)
and sometimes this one:
BaseSSHTunnelForwarderError: Could not establish session to SSH gateway
I would like to fix these errors, but if that is not possible I would at least like to wait a few seconds, reconnect to the database automatically, and re-run the same query. Any help with this will be highly appreciated.
import pandas as pd
import pymysql
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
        (ssh_host, ssh_port),
        ssh_username=ssh_user,
        ssh_pkey=mypkey,
        remote_bind_address=(sql_ip, sql_port)) as tunnel:
    conn = pymysql.connect(host='127.0.0.1',
                           user=sql_username,
                           passwd=sql_password,
                           port=tunnel.local_bind_port,
                           db=db, charset='ujis')
    data = pd.read_sql_query(query_2, conn)
This is what I have tried so far, but it is not working:
while True:
    try:
        with SSHTunnelForwarder(
                (ssh_host, ssh_port),
                ssh_username=ssh_user,
                ssh_pkey=mypkey,
                remote_bind_address=(sql_ip, sql_port)) as tunnel:
            conn = pymysql.connect(host='127.0.0.1',
                                   user=sql_username,
                                   passwd=sql_password,
                                   port=tunnel.local_bind_port,
                                   db=db, charset='ujis')
            data = pd.read_sql_query(query, conn)
        break
    except pymysql.err.OperationalError as e:
        conn.ping(reconnect=True)
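For what it's worth, here is a hedged sketch of a retry loop that rebuilds both the tunnel and the MySQL connection and waits a few seconds between attempts. The variable names are the same placeholders as in the question; max_retries and the 10-second delay are made up and should be tuned:

import time

import pandas as pd
import pymysql
from sshtunnel import SSHTunnelForwarder, BaseSSHTunnelForwarderError

max_retries = 5  # hypothetical limit
data = None

for attempt in range(max_retries):
    try:
        # Re-open the tunnel and the connection on every attempt, so a dead
        # tunnel or dropped MySQL connection is rebuilt from scratch.
        with SSHTunnelForwarder(
                (ssh_host, ssh_port),
                ssh_username=ssh_user,
                ssh_pkey=mypkey,
                remote_bind_address=(sql_ip, sql_port)) as tunnel:
            conn = pymysql.connect(host='127.0.0.1',
                                   user=sql_username,
                                   passwd=sql_password,
                                   port=tunnel.local_bind_port,
                                   db=db, charset='ujis')
            try:
                data = pd.read_sql_query(query, conn)
            finally:
                conn.close()
        break  # success, stop retrying
    except (pymysql.err.OperationalError, BaseSSHTunnelForwarderError) as e:
        print(f"Attempt {attempt + 1} failed: {e}; retrying in 10 seconds")
        time.sleep(10)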
I'm trying to log all SQLAlchemy queries to the console with the bound parameters filled in (e.g. translating :param_1 to 123). I managed to find this answer on SO that does just that. The issue I'm running into is that the parameters don't always get translated.
Here is the event I'm latching onto -
@event.listens_for(Engine, 'after_execute', named=True)
def after_cursor_execute(**kw):
    conn = kw['conn']
    params = kw['params']
    result = kw['result']
    stmt = kw['clauseelement']
    multiparams = kw['multiparams']
    print(literalquery(stmt))
Running this query fails to translate my parameters; instead, I see :param_1 in the output:
Model.query.get(123)
It raises a CompileError exception with the message Bind parameter '%(38287064 param)s' without a renderable value not allowed here.
However, this query translates :param_1 to 123 as I would expect:
db.session.query(Model).filter(Model.id == 123).first()
Is there any way to translate any and all queries that are run using SQLAlchemy?
FWIW I'm targeting SQL Server using the pyodbc driver.
If you set up the logging framework, you can get the SQL statements logged by setting the sqlalchemy.engine logger at INFO level, e.g.:
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
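That logs the statement and its parameters on separate lines rather than substituting them in. If you specifically need the literal values rendered into the SQL, one common approach, offered here as a sketch rather than a guaranteed fix for the question's CompileError, is to compile the statement with literal_binds inside the event handler; it works for simple scalar parameters but can still raise for values SQLAlchemy cannot render as literals (the handler name log_query is just illustrative):

from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, 'after_execute', named=True)
def log_query(**kw):
    stmt = kw['clauseelement']
    try:
        # Render bound parameters as literals where SQLAlchemy knows how to.
        compiled = stmt.compile(compile_kwargs={"literal_binds": True})
        print(compiled)
    except Exception:
        # Fall back to the parameterized form (or plain text SQL) otherwise.
        print(stmt)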