aiomysql select data problem: not updated - python-3.x

version:
Python 3.6.9
aiomysql: 0.0.20
aiohttp: 3.6.2
problem:
When MySQL table data is deleted or inserted, the query results are not updated for hours unless the web app is restarted.
Code using the aiomysql pool:
# initial
pool = await aiomysql.create_pool(
    # echo=True,
    db=conf['database'],
    user=conf['user'],
    password=conf['password'],
    host=conf['host'],
    port=conf['port'],
    minsize=conf['minsize'],
    maxsize=conf['maxsize'],
)
# query
async def get_data(request):
    cmd = 'select a,b,c from tbl where d = 0'
    # request.app['db'] == pool
    async with request.app['db'].acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(cmd)
            ...
current solution:
Setting pool_recycle=20 in aiomysql.create_pool seems to solve the problem, but why? Is there a better way?
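A likely explanation (an assumption, not confirmed in the question): aiomysql connections default to autocommit=False, so a pooled connection can stay inside an old REPEATABLE READ transaction and keep returning the same snapshot; pool_recycle only hides this by forcing reconnects. A minimal sketch of the alternative, enabling autocommit on the pool so each SELECT sees current data:
# Sketch only: same pool setup as above, with autocommit enabled so
# pooled connections do not hold an open REPEATABLE READ snapshot.
pool = await aiomysql.create_pool(
    db=conf['database'],
    user=conf['user'],
    password=conf['password'],
    host=conf['host'],
    port=conf['port'],
    minsize=conf['minsize'],
    maxsize=conf['maxsize'],
    autocommit=True,   # alternatively, call `await conn.commit()` after each query
)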

Related

Using "UPDATE" and "SET" in Python to Update Snowflake Table

I have been using Python to read and write data to Snowflake for some time now, to a table I have full update rights on, using a Snowflake helper class my colleague found on the internet. Please see below for the class I have been using (with my personal Snowflake connection information abstracted) and a simple read query that works, given you have a 'TEST' table in your schema.
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
import keyring
import pandas as pd
from sqlalchemy import text

# Pull the username and password to be used to connect to snowflake
stored_username = keyring.get_password('my_username', 'username')
stored_password = keyring.get_password('my_password', 'password')

class SNOWDBHelper:
    def __init__(self):
        self.user = stored_username
        self.password = stored_password
        self.account = 'account'
        self.authenticator = 'authenticator'
        self.role = stored_username + '_DEV_ROLE'
        self.warehouse = 'warehouse'
        self.database = 'database'
        self.schema = 'schema'

    def __connect__(self):
        self.url = URL(
            user=stored_username,
            password=stored_password,
            account='account',
            authenticator='authenticator',
            role=stored_username + '_DEV_ROLE',
            warehouse='warehouse',
            database='database',
            schema='schema'
        )
        # =============================================================================
        self.url = URL(
            user=self.user,
            password=self.password,
            account=self.account,
            authenticator=self.authenticator,
            role=self.role,
            warehouse=self.warehouse,
            database=self.database,
            schema=self.schema
        )
        self.engine = create_engine(self.url)
        self.connection = self.engine.connect()

    def __disconnect__(self):
        self.connection.close()

    def read(self, sql):
        self.__connect__()
        result = pd.read_sql_query(sql, self.engine)
        self.__disconnect__()
        return result

    def write(self, wdf, tablename):
        self.__connect__()
        wdf.to_sql(tablename.lower(), con=self.engine, if_exists='append', index=False)
        self.__disconnect__()

# Initiate the SNOWDBHelper()
SNOWDB = SNOWDBHelper()
query = """SELECT * FROM """ + 'TEST'
snow_table = SNOWDB.read(query)
I now need to update an existing Snowflake table, and my colleague suggested I could use the read function to send a query containing the update SQL to my Snowflake table. So I adapted an update query that I use successfully in the Snowflake UI and sent it through the read function. Snowflake actually tells me that the relevant rows in the table have been updated, but they have not. Please see below for the update query I use to attempt to change a field "field" in the "test" table to "X", and the success message I get back. I'm not thrilled with this hacky update method overall (where the table update is a side effect of sorts), but could someone please help with a method to update within this framework?
# Query I actually store in file: '0-Query-Update-Effective-Dating.sql'
UPDATE "Database"."Schema"."Test" AS UP
SET UP.FIELD = 'X'

# Read the query in from file and utilize it
update_test = open('0-Query-Update-Effective-Dating.sql')
update_query = text(update_test.read())
SNOWDB.read(update_query)

# Returns a message of updated rows, but no rows are actually updated:
   number of rows updated  number of multi-joined rows updated
0                     316                                    0
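For reference, one common way to run DML through SQLAlchemy is inside an explicit transaction so the statement is committed. This is only a sketch reusing the helper's engine; the method name run_dml is hypothetical and not part of the original class:
    def run_dml(self, sql):
        """Hypothetical helper: execute UPDATE/INSERT/DELETE and commit."""
        self.__connect__()
        # engine.begin() opens a transaction and commits it on successful exit
        with self.engine.begin() as conn:
            result = conn.execute(text(sql))
        self.__disconnect__()
        return result.rowcount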

Python3: Multiprocessing closing psycopg2 connections to Postgres at AWS RDS

I'm trying to write chunks of 100,000 rows to an AWS RDS PostgreSQL server.
I'm using psycopg2.8 and multiprocessing. I'm creating a new connection in each process and preparing the SQL statement as well. But every time a random number of rows gets inserted. I assume the issue is the Python multiprocessing library closing the wrong connections, which is mentioned here: multiprocessing module and distinct psycopg2 connections
and here: https://github.com/psycopg/psycopg2/issues/829 in one of the comments.
The RDS server logs say:
LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection with an open transaction
Here is the skeleton of the code:
from multiprocessing import Pool
import csv
from psycopg2 import sql
import psycopg2
from psycopg2.extensions import connection

def gen_chunks(reader, chunksize=10 ** 5):
    """
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices.
    """
    chunk = []
    for index, line in enumerate(reader):
        if (index % chunksize == 0 and index > 0):
            yield chunk
            del chunk[:]
        chunk.append(line)
    yield chunk

def write_process(chunk, postgres_conn_uri):
    conn = psycopg2.connect(dsn=postgres_conn_uri)
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                '''PREPARE scrape_info_query_plan (int, bool, bool) AS
                   INSERT INTO schema_name.table_name (a, b, c)
                   VALUES ($1, $2, $3)
                   ON CONFLICT (a, b) DO UPDATE SET (c) = (EXCLUDED.c)
                '''
            )
            for row in chunk:
                cur.execute(
                    sql.SQL(''' EXECUTE scrape_info_query_plan ({})''').format(
                        sql.SQL(', ').join([sql.Literal(value) for value in [1, True, True]])
                    )
                )

pool = Pool()
reader = csv.DictReader('csv file path', skipinitialspace=True)
for chunk in gen_chunks(reader):
    # chunk is an array of rows (100000) from the csv
    pool.apply_async(write_process, [chunk, postgres_conn_uri])
commands to create the required DB objects:
1. CREATE DATABASE foo;
2. CREATE SCHEMA schema_name;
3. CREATE TABLE table_name (
       x serial PRIMARY KEY,
       a integer,
       b boolean,
       c boolean
   );
Any suggestions on this?
Note: I have an EC2 instance with 64 vCPUs, and I can see 60 to 64 parallel connections on my RDS instance.
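One thing worth checking (an assumption based on the "unexpected EOF ... open transaction" log lines, not something stated in the question): the main process never waits on the apply_async results, so it can exit and tear down worker processes while their transactions are still open. A minimal sketch of draining the pool before exiting, replacing the submission loop at the bottom of the skeleton:
# Sketch only: wait for all submitted chunks before the main process exits,
# so worker connections are closed cleanly instead of being reset.
results = []
for chunk in gen_chunks(reader):
    results.append(pool.apply_async(write_process, [chunk, postgres_conn_uri]))

pool.close()          # stop accepting new work
for r in results:
    r.get()           # wait for completion and re-raise any worker exception
pool.join()           # wait for worker processes to shut down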

While exporting SELECT statement result to BigQuery only empty table is created

I am trying to export the results of a SELECT statement to another table as permanent storage. But whenever that new table is created, it is schemaless. When I try to query that result table, an error is shown:
Table project-id.dataset_name.temp_table does not have a schema.
Here is my code to export the result of the SELECT statement to a temporary table:
import time
import logging

from google.cloud import bigquery
from google.cloud.bigquery import Table
from google.oauth2.service_account import Credentials

def query_to_table():
    service_account_info = {}  # account info
    credentials = Credentials.from_service_account_info(
        service_account_info)
    client = bigquery.Client(
        project=service_account_info.get("project_id"),
        credentials=credentials)
    query = """
    SELECT
        a,
        b
    FROM `project.dataset.table`
    WHERE a NOT IN ('error', 'warning')
    """
    destination_dataset = client.dataset("abc_123")  # this is another dataset
    destination_table = destination_dataset.table("temp_table")  # destination table
    try:
        client.get_table(destination_table)
        client.delete_table(destination_table)
    except Exception as e:
        # Some logging
        pass
    client.create_table(Table(destination_table))

    # Execute the job and save to table
    job_config = bigquery.QueryJobConfig()
    job_config.allow_large_results = True
    job_config.use_legacy_sql = False
    job_config.destination = destination_table
    job_config.dry_run = True
    query_job = client.query(query, job_config=job_config)

    # Wait till the job is done
    while not query_job.done():
        time.sleep(1)
    logging.info(f"Processed {query_job.total_bytes_processed} bytes.")
    return destination_table
Where is the mistake? Are there any API changes on the Google Cloud side?
This script was working a month ago.
Please help.
Damn! I just figured it out: it was because I set dry_run to True.
According to this: https://stackoverflow.com/a/28355802/4494547, if dry_run is set to True, the query is just evaluated without actually running the job.
Took me 5 hours of busting my head. :(
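For completeness, a sketch of the job setup with the fix applied (assuming the rest of query_to_table stays the same): drop dry_run and let result() block until the destination table has been written:
# Sketch only: run the query for real and write into the destination table.
job_config = bigquery.QueryJobConfig()
job_config.allow_large_results = True
job_config.use_legacy_sql = False
job_config.destination = destination_table
# no dry_run here -- a dry-run job only estimates the query and writes nothing

query_job = client.query(query, job_config=job_config)
query_job.result()  # blocks until the job finishes
logging.info(f"Processed {query_job.total_bytes_processed} bytes.")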

pymssql - SELECT works but UPDATE doesn't

import pymssql
import decimal

CONN = pymssql.connect(server='1233123123', user='s123', password='sa1231231', database='DBforTEST')
CURSOR = CONN.cursor()

"""it is good code. here is no problem"""
CURSOR.execute("SELECT ttt from test where w=2")
ROW = CURSOR.fetchone()
tmp = list()
tmp.append(ROW)
if ROW is None:
    print("table has nothing")
else:
    while ROW:
        ROW = CURSOR.fetchone()
        tmp.append(ROW)
print(tmp)
"""it works!"""

CURSOR.execute("""
UPDATE test
SET
    w = 16
where ttt = 1
""")
"""it doesn't work"""
I'm using Python 3.5 with pymssql.
In my code the SELECT statement works, so I can guarantee the connection is fine.
But the UPDATE statement doesn't take effect in Python, while the same statement works in SSMS.
What is the problem?
My guess is that a SELECT only reads, so the DB can return data, but an UPDATE modifies the DB, so the DB blocks it.
How can I solve it?
CONN.commit()
If autocommit is not set, you have to commit yourself.
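A short sketch of both options, reusing the connection parameters from the question:
# Option 1: commit explicitly after the UPDATE
CURSOR.execute("UPDATE test SET w = 16 WHERE ttt = 1")
CONN.commit()

# Option 2: turn autocommit on for the connection, so every statement
# is committed immediately
CONN = pymssql.connect(server='1233123123', user='s123', password='sa1231231',
                       database='DBforTEST', autocommit=True)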

How do I insert new data to database every minute from api in python

Hi, I am getting weather data from an API every 10 minutes using Python. I have managed to connect to the API, get the data, and run the code every 10 minutes using a timer thread. However, the data recorded every 10 minutes is the same; no new weather data is fetched. I would like new rows to be inserted, since the station publishes new records every 10 minutes. Thanks in advance for your help.
Below is my code.
from datetime import datetime
import time
import threading
import psycopg2

timestr = datetime.now()

for data in retrieve_data():
    dashboard_data = data['dashboard_data']
    Dew_point = dashboard_data['Temperature'] - (100 - dashboard_data['Humidity']) / 5
    weather_list = [time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(dashboard_data['time_utc'])),
                    dashboard_data['Temperature'],
                    dashboard_data['Humidity'],
                    Dew_point,
                    dashboard_data['Pressure'],
                    dashboard_data['Noise'],
                    dashboard_data['CO2']
                    ]

def insert_weather_data(*arg):
    sql = (
        """INSERT INTO weather_data(time,
                                    temperature,
                                    humidity,
                                    dew_point,
                                    pressure,
                                    noise,
                                    CO2) VALUES(%s,%s,%s,%s,%s,%s,%s);
        """
    )
    conn = None
    weather_id = None
    # read database configuration
    params = config()
    # connect to postgres database
    conn = psycopg2.connect(**params)
    # create a new cursor
    cur = conn.cursor()
    try:
        # execute the insert statement
        cur.execute(sql, arg)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()

def repeat_func_call():
    insert_weather_data(weather_list)
    threading.Timer(120, repeat_func_call, weather_list).start()

repeat_func_call()
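The core issue appears to be that weather_list is built once, before the timer starts, and the same list is re-inserted on every tick. A minimal sketch of one way around it (assuming retrieve_data() returns the latest station readings on each call; build_weather_list is a hypothetical helper wrapping the list construction shown above):
# Sketch only: fetch fresh data inside the repeated call, then reschedule.
def repeat_func_call():
    for data in retrieve_data():                          # re-query the API each run
        dashboard_data = data['dashboard_data']
        weather_list = build_weather_list(dashboard_data) # hypothetical helper
        insert_weather_data(*weather_list)                # unpack so each value maps to a %s
    threading.Timer(600, repeat_func_call).start()        # 600 s = 10 minutes

repeat_func_call()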
