
Base live update in Postgres
Hello guys.
I have a Postgres database from which I build my dataframe.
My graphics work, my calculations work, almost everything is fine.
The only problem is that when the database changes I have to stop the Flask service and restart it; only then is my dashboard updated.
I tried a lot before coming back here with this question.
I'm struggling with the database update: I can't get Dash to refresh my charts and tables by itself.
I asked on other forums but was not successful.
I'll post my code here, updated and minimized.
There will be two links to Google Drive, one with the Postgres database and the other with the code.
The code is relatively small; only one part of it doesn't work properly.
I will send you two links:
1 - Containing the complete code:
https://drive.google.com/file/d/1c0e8UelUgGVx_IrWLFncwc2JRPraUS79/view?usp=share_link
I have a few observations for this code to work:
a) Install the libraries highlighted in the teste.py file before the imports.
b) The database access configuration is in a file called pg_config.json; just edit it and enter your own database credentials.
2 - Containing the Postgres database:
https://drive.google.com/file/d/1Fwrc0xAMfVnv0_lUSHjNTO3HnQIkUvHN/view?usp=sharing
teste.py renders a table that gets its information from df_data_generals.

I did something like the below with MySQL. Please refer to it and check whether it matches your requirement.
    dcc.Store(id='store-data', data=[], storage_type='memory'),  # 'local' or 'session'
    dcc.Interval(id='update', n_intervals=0, interval=1000*30)   # fires every 30 seconds
])

# requires: dash (dcc, html, Input, Output), mysql.connector, pandas as pd
@app.callback(Output('store-data', 'data'),
              [Input('update', 'n_intervals')])
def update_data(n):
    global jobs_2
    db = mysql.connector.connect(
        host="localhost",
        user="root",
        password="",
        database="indeed_data_dump")
    cur = db.cursor()
    cur.execute("SELECT * FROM jobs")
    columns = [col[0] for col in cur.description]  # column names from the cursor
    data = cur.fetchall()
    jobs_2 = pd.DataFrame(data, columns=columns)
    db.close()
    return jobs_2.to_dict('records')  # data lands in dcc.Store for the other callbacks
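Since the original question uses Postgres rather than MySQL, here is a minimal sketch of the same interval-plus-store pattern with psycopg2. It is not from the answer above; it assumes app is the Dash app defined in teste.py, that pg_config.json holds host/port/user/password/dbname keys as described in the question, and that df_data_generals is the table to poll:
# Sketch only: re-query Postgres every 30 seconds and push the result into dcc.Store.
import json

import pandas as pd
import psycopg2
from dash import dcc, html, Input, Output

app.layout = html.Div([
    dcc.Store(id='store-data', data=[], storage_type='memory'),
    dcc.Interval(id='update', n_intervals=0, interval=1000 * 30),  # 30-second refresh
    # ... your existing tables and graphs ...
])

@app.callback(Output('store-data', 'data'),
              [Input('update', 'n_intervals')])
def refresh_from_postgres(n):
    with open('pg_config.json') as f:        # credentials file from the question
        cfg = json.load(f)
    conn = psycopg2.connect(host=cfg['host'], port=cfg.get('port', 5432),
                            user=cfg['user'], password=cfg['password'],
                            dbname=cfg['dbname'])
    try:
        df = pd.read_sql("SELECT * FROM df_data_generals", conn)
    finally:
        conn.close()                          # never leave the connection open between polls
    return df.to_dict('records')
Any callback that draws a chart or table can then take Input('store-data', 'data') and rebuild its figure from the stored records.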

Related

Updating tables on Postgres when AWS RDS causing connection freezes

I have been using AWS for about 6 months now for some simple tasks like setting up Python scripts and cronjobs on EC2 and updating tables on Postgres. So far everything had been great, but recently things went wrong after the number of cronjobs that update tables grew to 20, so several of them now collide while simultaneously trying to connect to the database and modify tables (though no two scripts try to modify the same table at once). They started failing randomly, and by now most of them fail; I need to reboot the AWS RDS instance for the cronjobs to work properly, but then the same thing repeats. The DB connection count keeps increasing, and once I restart the instance it goes back to normal; the same goes for the current activity sessions (I wonder how that can be a float, to be honest).
Error:
Exception: Unexpected exception type: ReadTimeout
I am wondering whether the way I connect to the database is the problem. I can't even tell whether RDS is deliberately freezing modifications to the tables because of the many connection attempts. The thing is, I am sure the script works: when I stop all cronjobs, reboot the RDS instance and run this script, it works fine, but when I start all of them again, it fails. Can someone please give me some leads on how to figure this out?
Most of my scripts look pretty similar to the following; some run once a day and some run once every 5 minutes.
import datetime

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# USR, token, ENDPOINT, PORT and DBNAME come from config (not shown in the question)
def get_daily_data(coin, start_date, time_period):
    # ... body omitted in the question: fetches the latest daily data for `coin` ...
    return df

def main():
    coin1 = 'bitcoin'
    time_period = '1d'
    # note the '@' between the credentials and the host
    DATABASE_URI = f'postgresql+psycopg2://{USR}:{token}@{ENDPOINT}:{PORT}/{DBNAME}'
    engine = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)
    connection = engine.connect()
    if connection is not None:
        old_btc_data = pd.read_sql("SELECT * FROM daily_data", connection)
        start_date = str(old_btc_data['datetime'].iloc[-1].date() - datetime.timedelta(days=7))
        latest_data_btc = get_daily_data(coin1, start_date, time_period)
        if latest_data_btc is not None:
            try:
                latest_data_btc = latest_data_btc.reset_index()
                latest_data_btc['datetime'] = latest_data_btc['datetime'].dt.tz_localize(None)
                latest_data_btc = pd.concat([old_btc_data, latest_data_btc], ignore_index=True)
                latest_data_btc = latest_data_btc.drop_duplicates(keep='last')
                latest_data_btc = latest_data_btc.set_index('datetime')
                latest_data_btc.to_sql('daily_data', if_exists='replace', con=connection)
                print("Pushed BTC daily data to the database")
            except Exception:
                print("BTC daily data to the database update FAILED!!!!!!")
    connection.close()

if __name__ == '__main__':
    main()
I learned that the error is related to creating a connection: somehow creating a connection with connection = engine.connect() was freezing any other modifications to the database table. It started working for me once I used the engine as the connection directly, as follows:
connection = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)
Edit: it is much better to use sessions with psycopg2 cursors; exception handling becomes easy.
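As a related pattern (not from the answer above, just a sketch assuming the same SQLAlchemy engine and table), each read and write can be wrapped in a short-lived, context-managed connection so nothing stays open between cronjob runs:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(DATABASE_URI, echo=False, poolclass=NullPool)  # DATABASE_URI as above

with engine.connect() as connection:          # connection is closed automatically on exit
    old_btc_data = pd.read_sql("SELECT * FROM daily_data", connection)

# ... build latest_data_btc exactly as in the question ...

with engine.begin() as connection:            # commits on success, rolls back on error
    latest_data_btc.to_sql('daily_data', if_exists='replace', con=connection)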

In python pyramid web framework, how can I drop all db table rows before seeding?

I am using a cookiecutter to make a pyramid web app.
It has a function to seed the db here:
https://github.com/Pylons/pyramid-cookiecutter-starter/blob/latest/%7B%7Bcookiecutter.repo_name%7D%7D/%7B%7Bcookiecutter.repo_name%7D%7D/sqlalchemy_scripts/initialize_db.py#L15
But if I run it twice, or change the entries that I am adding, I get duplicate entries and errors. I am using a sqlite db with sqlalchemy.
What code can I add inside setup_models that will drop all db rows before writing the new model instances?
It would be great if this looped over all models and deleted all instances of them.
def setup_models(dbsession):
    """
    Add or update models / fixtures in the database.
    """
    model = models.mymodel.MyModel(name='one', value=1)
    dbsession.add(model)
I am updating the db by running:
# to run the initial migration that adds the tables to the db, run this once
venv/bin/alembic -c development.ini upgrade head
# seed the data, I want to be able to keep editing the seed data
# and re-run this command and have it wipe the db rows and insert the seed data defined in setup_models
venv/bin/initialize_suppah_db development.ini
By default, SQLite does not enforce foreign key constraints at the engine level (even if you have declared them in the table DDL), so you could probably just use something as simple as
from sqlalchemy import inspect

insp = inspect(engine)
with engine.begin() as conn:
    for table_name in insp.get_table_names():
        conn.exec_driver_sql(f'DELETE FROM "{table_name}"')
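To call that from setup_models, one possibility is the sketch below. It assumes the bind returned by dbsession.get_bind() is the SQLite engine and that the session has not yet written anything (otherwise SQLite may report the database as locked):
from sqlalchemy import inspect

def delete_all_rows(engine):
    # wipe every table in one transaction before seeding
    insp = inspect(engine)
    with engine.begin() as conn:
        for table_name in insp.get_table_names():
            conn.exec_driver_sql(f'DELETE FROM "{table_name}"')

def setup_models(dbsession):
    delete_all_rows(dbsession.get_bind())   # assumes the session's bind is the engine
    model = models.mymodel.MyModel(name='one', value=1)
    dbsession.add(model)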
Alternatively, one can do this by:
looping over all model classes,
marking all instances of those classes for deletion,
committing the session/transaction to delete them,
and THEN seeding the data.
The code below does this:
import transaction
from ..models.meta import Base

def delete_table_rows(dbsession):
    model_classes = [cls for cls in Base.__subclasses__()]
    with transaction.manager:
        for model_class in model_classes:
            for instance in dbsession.query(model_class).all():
                dbsession.delete(instance)
        # the transaction manager commits the deletes when the block exits
def setup_models(dbsession):
    """
    Add or update models / fixtures in the database.
    """
    delete_table_rows(dbsession)
    # your custom seed code here
    model = models.mymodel.MyModel(name='one', value=1)
    dbsession.add(model)

How to update, and then select the updated data with Python and DBUtils PersistentDB?

There are two Python scripts. Script A keeps reading records from the database and then sends out messages. Script B keeps finding data and inserting it into the database. Both of them keep running without being terminated.
If there are unhandled records in the database when script A starts, script A can see these records and handles them correctly. Script A is then idle. After some time, script B inserts some new records. However, script A cannot see these newly added records.
In script B, after inserting the records, I call connection.commit(). The records really are in the database.
Script A does not work correctly with:
persist = PersistentDB(pymysql, host=DB_HOST, port=DB_PORT, user=DB_USERNAME, passwd=DB_PASSWORD, db=DB_DATABASE, charset='utf8mb4')
while True:
    connection = persist.connection()
    cursor = connection.cursor()
    cursor.execute("SELECT...")
Script A works correctly with:
while True:
    persist = PersistentDB(pymysql, host=DB_HOST, port=DB_PORT, user=DB_USERNAME, passwd=DB_PASSWORD, db=DB_DATABASE, charset='utf8mb4')
    connection = persist.connection()
    cursor = connection.cursor()
    cursor.execute("SELECT...")
Should I really use PersistentDB this way? It seems I am recreating the persistent connection pool every time I declare it. Please tell me what the correct way to do this is.
P.S.
Connecting to MySQL 8 with Python 3, pymysql and DBUtils
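No answer is quoted above. As a hedged sketch only: a likely cause is MySQL's default REPEATABLE READ isolation, under which a long-lived connection keeps reading from the snapshot of its open transaction, so rows committed by script B stay invisible until script A ends its own transaction. Creating the pool once and committing after each poll refreshes the snapshot. The names below follow the question, and the import path assumes DBUtils 2.x:
import time

import pymysql
from dbutils.persistent_db import PersistentDB  # older DBUtils: from DBUtils.PersistentDB import PersistentDB

# build the pool once, outside the loop
persist = PersistentDB(pymysql, host=DB_HOST, port=DB_PORT, user=DB_USERNAME,
                       passwd=DB_PASSWORD, db=DB_DATABASE, charset='utf8mb4')

while True:
    connection = persist.connection()    # reuses this thread's persistent connection
    cursor = connection.cursor()
    cursor.execute("SELECT...")
    rows = cursor.fetchall()
    cursor.close()
    connection.commit()                  # end the read snapshot so the next SELECT sees newly committed rows
    # ... send out messages for `rows` ...
    time.sleep(5)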

Sqlalchemy Snowflake not closing connection after successfully retrieving results

I am connecting to the Snowflake data warehouse from Python and I encounter weird behavior. The Python program exits successfully if I retrieve a small number of rows from Snowflake, but hangs indefinitely if I try to retrieve more than 200K rows. I am 100% sure that there are no issues with my machine, because I am able to retrieve 5 to 10 million rows from other database systems such as Postgres.
My Python environment is Python 3.6 and I use the following library versions: SQLAlchemy 1.1.13, snowflake-connector-python 1.4.13, snowflake-sqlalchemy 1.0.7.
The following code prints the total number of rows and closes the connection.
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
engine = create_engine(URL(
    account=xxxx,
    user=xxxxx,
    password=xxxxx,
    database=xxxxx,
    schema=xxxxxx,
    warehouse=xxxxx))
query = """SELECT * FROM db_name.schema_name.table_name LIMIT 1000"""
results = engine.execute(query)
print(results.rowcount)
engine.dispose()
The following code prints the total number of rows, but the connection doesn't close; it just hangs until I manually kill the Python process.
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
engine = create_engine(URL(
    account=xxxx,
    user=xxxxx,
    password=xxxxx,
    database=xxxxx,
    schema=xxxxxx,
    warehouse=xxxxx))
query = """SELECT * FROM db_name.schema_name.table_name LIMIT 500000"""
results = engine.execute(query)
print(results.rowcount)
engine.dispose()
I tried multiple different tables and I encounter the same issue with Snowflake. Has anyone encountered similar issues?
Can you check the query status from the UI? The "History" page should include the query. If the warehouse is not ready, it may take a couple of minutes to start the query (I guess that's very unlikely, though).
Try changing the connection to this:
connection = engine.connect()
results = connection.execute(query)
print(results.rowcount)
connection.close()
engine.dispose()
SQLAlchemy's dispose doesn't close a connection that has not been explicitly closed. I inquired about this before, but so far the workaround is just to close the connection.
https://groups.google.com/forum/#!searchin/sqlalchemy/shige%7Csort:date/sqlalchemy/M7IIJkrlv0Q/HGaQLBFGAQAJ
Lastly, if the issue still persists, add this logger at the top:
import logging

for logger_name in ['snowflake', 'botocore']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
and collect the log.
If the output is too long to fit here, I can take it on the issue page at https://github.com/snowflakedb/snowflake-sqlalchemy.
Note that I tried this myself but cannot reproduce the issue so far.
Have you tried using a with statement instead to manage your connection?
instead of this:
engine = create_engine(URL(account=xxxx,user=xxxxx,password=xxxxx,database=xxxxx,schema=xxxxxx,warehouse=xxxxx))
results = engine.execute(query)
do the following:
engine = create_engine(URL(account=xxxx, user=xxxxx, password=xxxxx, database=xxxxx, schema=xxxxxx, warehouse=xxxxx))
with engine.connect() as connection:
    # do work
    results = connection.execute(query)
    ...
After the with block the connection is automatically closed, and engine.dispose() can then release the pool.

Resource Conflict after syncing with PouchDB

I am new to CouchDB / PouchDB and until now I somehow managed to get started. I am using the couchdb-python library to send initial values to my CouchDB before I start developing the actual application. I have one database with templates of the data I want to include and the actual database with all the data I will use in the application.
couch = couchdb.Server()
templates = couch['templates']
couch.delete('data')
data = couch.create('data')
In Python I have a loop in which I send one value after another to CouchDB:
value = templates['Template01']
value.update({ '_id' : 'Some ID' })
value.update({'Other Attribute': 'Some Value'})
...
data.save(value)
It was working fine the whole time; I needed to run this several times as my data had to be adjusted. After I was satisfied with the results, I started to create my application in JavaScript. I synced PouchDB with the data database and that was also working. However, I found out that I needed to change something in the Python code, so I ran the first Python script again, but now I get this error:
couchdb.http.ResourceConflict: (u'conflict', u'Document update conflict.')
I tried to destroy() the PouchDB database data and to delete the CouchDB database as well, but I still get this error at this part of the code:
data.save(value)
What I also don't understand is that a few values are actually written to the database before this error appears, so some values are save()d into the db.
I read that it has something to do with the _rev values of the documents, but I cannot find an answer. I hope someone can help here.
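No answer is included above. As a hedged sketch only: CouchDB rejects a save() for an _id that already exists unless the document carries that _id's current _rev, and it likewise rejects a document carrying a _rev the target database does not know about, which is why re-running the seed script raises ResourceConflict. Assuming the same couchdb-python objects as in the question, one way to make the script re-runnable is:
# Sketch (assumes templates and data are the couchdb databases from the question).
# ResourceConflict means the saved document's _id/_rev do not match what the server has.
doc_id = 'Some ID'

value = dict(templates['Template01'])   # plain copy of the template document
value.pop('_id', None)                  # drop the template's own id ...
value.pop('_rev', None)                 # ... and its revision, which belongs to 'templates'
value['_id'] = doc_id
value['Other Attribute'] = 'Some Value'

existing = data.get(doc_id)             # None if the id is not taken yet
if existing is not None:
    value['_rev'] = existing['_rev']    # supply the current revision to update in place

data.save(value)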
