BigQuery update how to get number of updated rows - python-3.x

I am using Google Cloud Functions to connect to a Google BigQuery database and update some rows. The cloud function is written using Python 3.
I need help figuring out how to get the result message or the number of updated/changed rows whenever I run an UPDATE DML statement through the function. Any ideas?
from google.cloud import bigquery

def my_update_function(context, data):
    BQ = bigquery.Client()
    query_job = BQ.query("Update table set etc...")
    rows = query_job.result()
    return rows
I understand that rows always comes back as an _EmptyRowIterator object. Is there any way I can get the result or a result message? The documentation says I have to get it from a BigQuery job method, but I can't seem to figure it out.

I think that you are searching for QueryJob.num_dml_affected_rows. It contains the number of rows affected by an UPDATE or any other DML statement. If you just paste it into your code in place of rows in the return statement you will get the number as an int, or you can build a message like:
return("Number of updated rows: " + str(query_job.num_dml_affected_rows))
I hope it will help :)

There seems to be no mention in the BigQuery Python DB-API documentation of how to get at the rows affected: https://googleapis.dev/python/bigquery/latest/reference.html
I decided to use a roundabout method to deal with this issue, by generating a SELECT statement first to check whether there are any matches for the WHERE clause of the UPDATE statement.
Example:
from google.cloud.bigquery import dbapi as bq

def my_update_function(context, data):
    try:
        bq_connection = bq.connect()
        bq_cursor = bq_connection.cursor()
        bq_cursor.execute("select * from table where ID = 1")
        results = bq_cursor.fetchone()
        if results is None:
            print("Row not found.")
        else:
            bq_cursor.execute("UPDATE table set name = 'etc' where ID = 1")
            bq_connection.commit()
        bq_connection.close()
    except Exception as e:
        db_error = str(e)

Related

DynamoDB Query Reserved Words using boto3 Resource?

I've been playing around with some "weirder" querying of DynamoDB with reserved words using the boto3.resource method, and came across a pretty annoying issue that I haven't been able to resolve for quite some time (always the same error, sigh), and can't seem to find the answer anywhere.
My code is the following:
import json
import logging

import boto3
from boto3.dynamodb.conditions import Key

logger = logging.getLogger()
logger.setLevel(logging.INFO)

TABLE_NAME = "some-table"

def getItems(record, table=None):
    if table is None:
        dynamodb = boto3.resource("dynamodb")
        table = dynamodb.Table(TABLE_NAME)
    record = str(record)
    # this query is what raises the ValidationException described below
    get_item = table.query(
        KeyConditionExpression=Key("#pk").eq(":pk"),
        ExpressionAttributeNames={"#pk": "PK"},
        ExpressionAttributeValues={":pk": record},
    )
    logger.info(
        f"getItem parameters\n{json.dumps(get_item, indent=4, sort_keys=True, default=str)}"
    )
    return get_item

if __name__ == "__main__":
    record = 5532941
    getItems(record)
It's nothing fancy, as I mentioned I'm just playing around, but I'm constantly getting the following error no matter what I try:
"An error occurred (ValidationException) when calling the Query operation: Value provided in ExpressionAttributeNames unused in expressions: keys: {#pk}"
As far as I understand, in order to "replace" reserved keys/values with something arbitrary you put them into ExpressionAttributeNames and ExpressionAttributeValues, but I can't wrap my head around why it's telling me that this key is not used.
I should mention that this Primary Key exists with this value in the record var in DynamoDB.
Any suggestions?
Thanks
If you're using Key then just provide the string values and don't be fancy with substitution. See this example:
https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithQueries/query_equals.py
If you're writing an equality expression as a single string, then you need the substitution. See this example:
https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithQueries/query-consistent-read.py
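To make the difference concrete, here is a rough, untested sketch of both styles against the question's table (the table name, key name and value are taken from the question):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("some-table")

# Style 1: Key condition objects -- use the real attribute name and value directly,
# no ExpressionAttributeNames/ExpressionAttributeValues needed.
response = table.query(KeyConditionExpression=Key("PK").eq("5532941"))

# Style 2: a single expression string -- here the #name/:value placeholders are required.
response = table.query(
    KeyConditionExpression="#pk = :pk",
    ExpressionAttributeNames={"#pk": "PK"},
    ExpressionAttributeValues={":pk": "5532941"},
)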

Cassandra Python driver not fetching the next page of results

DataStax driver for Cassandra Version 3.25.0,
Python version 3.9
Session.execute() fetches the first 100 records. As per the documentation, the driver is supposed to transparently fetch the next pages as we reach the end of the first page. However, it fetches the same page again and again, so the first 100 records are all that is ever accessible.
The for loop that prints the records goes infinite.
from ssl import CERT_NONE

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement

# ssl_context, db_host, db_port, db_user and db_pwd are set up in the omitted part of the script
ssl_context.verify_mode = CERT_NONE

cluster = Cluster(contact_points=[db_host], port=db_port,
                  auth_provider=PlainTextAuthProvider(db_user, db_pwd),
                  ssl_context=ssl_context)
session = cluster.connect()

query = "SELECT * FROM content_usage"
statement = SimpleStatement(query, fetch_size=100)
results = session.execute(statement)
for row in results:
    print(f"{row}")
I can see other similar threads, but they are unanswered too. Has anyone encountered this issue before? Any help is appreciated.
I'm a bit confused by the initial statement of the problem. You mentioned that the initial page of results is fetched repeatedly and that these are the only results available to your program. You also indicated that the for loop responsible for printing results turns into an infinite loop when you run the program. These statements seem contradictory to me; how can you know what the driver has fetched if you never get any output? I'm assuming that's what you meant by "goes infinite"... if I'm wrong please correct me.
The following code seems to run as expected against Cassandra 4.0.0 using cassandra-driver 3.25.0 on Python 3.9.0:
import argparse
import logging
import time

from cassandra.cluster import Cluster, SimpleStatement

def setupLogging():
    log = logging.getLogger()
    log.setLevel('DEBUG')
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
    log.addHandler(handler)

def setupSchema(session):
    session.execute("""create keyspace if not exists "baz" with replication = {'class':'SimpleStrategy', 'replication_factor':1};""")
    session.execute("""create table if not exists baz.qux (run_ts bigint, idx int, uuid timeuuid, primary key (run_ts,idx))""")
    session.execute("""truncate baz.qux""")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--debug', action='store_true')
    args = parser.parse_args()

    cluster = Cluster()
    session = cluster.connect()

    if args.debug:
        setupLogging()

    setupSchema(session)

    run_ts = int(time.time())
    insert_stmt = session.prepare("""insert into baz.qux (run_ts,idx,uuid) values (?,?,now())""")
    for idx in range(10000):
        session.execute(insert_stmt, [run_ts, idx])

    query = "select * from baz.qux"
    stmt = SimpleStatement(query, fetch_size=100)
    results = session.execute(stmt)
    for row in results:
        print(f"{row}")

    cluster.shutdown()
$ time (python foo.py | wc -l)
10000
real 0m12.452s
user 0m3.786s
sys 0m2.197s
You might try running your sample app with debug logging enabled (see sample code above for how to enable this). It sounds like something might be off in your Cassandra configuration (or perhaps your client setup); the additional logging might help you identify what (if anything) is getting in the way.
The logic in your code is only calling execute() once so the contents of results will only ever be the same list of 100 rows.
You need to call execute() in your loop to get the next page of results like this:
query = "SELECT * FROM content_usage"
statement = SimpleStatement(query, fetch_size=100)
for row in session.execute(statement):
process_row(row)
For more info, see Paging with the Python driver. Cheers!
Below is the code snippet that finally worked for me, after restricting the driver version to 3.20:
statement = session.prepare(query)

# Execute the query once and retrieve the first page of results
results = session.execute(statement, params)
for row in results.current_rows:
    process_row(row)

# Fetch more pages until they are exhausted
while results.has_more_pages:
    page_state = results.paging_state
    results = session.execute(statement, parameters=params, paging_state=page_state)
    for row in results.current_rows:
        process_row(row)

How to test mysql queries using sqlalchemy and sqlite?

I have the following code structure written in Python 3.6, which I need to test using sqlite3 (because of standards defined in my project):
class BigSecretService:
    """ Class designed to make calculations based on data stored in MySQL. """

    def load_data(self):
        # load some data using sqlalchemy ORM
        ...

    def get_values_from_fields(self, fields):
        # here's getting values via sqlalchemy execute with raw query:
        self.sql_service.execute(SOME_QUERY)

    def process_data(self, data, values):
        # again execute some raw query
        # process data and put into result list
        return result_list

    def make_calculations(self, params):
        data = self.load_data()
        values = self.get_values_from_fields(fields)
        result_vector = self.process_data(data, values)
SOME_QUERY is in a separate module and its format looks like this:
"SELECT SUM(some_field) FROM some_table WHERE col1 = :col1 AND col2 = :col2"
To cover make_calculations in my component test I designed awful patches:
class PatchedConnection:
    """ Class is used to transform queries to sqlite format before executing. """

    def __init__(self, connection, engine):
        self.connection = connection
        self.engine = engine

    def __call__(self):
        conn = self.connection()
        conn.execute = self.patched_execute(conn.execute)
        return conn

    def transform_date(self, date):
        try:
            # quick check just for testing
            if '+00:00' in date:
                date = date.replace('T', ' ').replace('+00:00', '.000000')
        finally:
            return date

    def patched_execute(self, f_execute):
        def prepare_args_for_sqlite(query, *args):
            # check if query is in sqlite format
            if args:
                if '?' in str(query):
                    args = list(map(self.transform_date, list(args[0].values())))
                    return self.engine.execute(str(query), args)
                return f_execute(query, args[0])
            else:
                return f_execute(query)
        return prepare_args_for_sqlite
Then in the test it looks like this:
QUERY_TEMPLATE_SQLITE = 'SELECT SUM(some_field) FROM some_table WHERE col1 = ? AND col2 = ?'

with mock.patch('path_to_my_service.SOME_QUERY', QUERY_TEMPLATE_SQLITE):
    self.sql_service.get_connection = PatchedConnection(self.sql_service.get_connection, self.engine)
    response = self.client.simulate_post("/v1/secret_service/make_calculations",
                                         headers=self.auth_header,
                                         body=json.dumps(payload))

self.assertEqual(response.status_code, 200)
# then check response.text
It works so far, but I believe there must be a much better solution. Moreover, in patched_execute the args from the dict are converted to a list, and who knows if the order of the dict values will be the same every time.
So, my question is how to perform such testing in a correct way with given tools?
If you need to intercept and manipulate the SQL being sent to the database, then using core events (https://docs.sqlalchemy.org/en/13/core/events.html) would be the most straightforward way of doing this. The before_cursor_execute event would suit your purposes, as outlined in the following example from the SQLAlchemy documentation.
@event.listens_for(engine, "before_cursor_execute", retval=True)
def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    # do something with statement, parameters
    return statement, parameters
From the example you have given, however, I'm not sure that this is necessary. The MySQL query you have listed is also a valid SQLite query and needs no manipulation. Also, if you pass your parameters as Python objects rather than as strings, then again no manipulation should be needed, as SQLAlchemy will map these correctly to the backend (see the sketch below).
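For illustration, a minimal self-contained sketch of hooking the before_cursor_execute event on an in-memory SQLite engine; the table and column names are borrowed from the question, everything else is assumed:

from sqlalchemy import create_engine, event, text

engine = create_engine("sqlite:///:memory:")

@event.listens_for(engine, "before_cursor_execute", retval=True)
def rewrite_statement(conn, cursor, statement, parameters, context, executemany):
    # inspect (or rewrite) the SQL and parameters just before they reach the DBAPI cursor
    print("about to execute:", statement, parameters)
    return statement, parameters

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE some_table (some_field INTEGER, col1 TEXT, col2 TEXT)"))
    result = conn.execute(
        text("SELECT SUM(some_field) FROM some_table WHERE col1 = :col1 AND col2 = :col2"),
        {"col1": "a", "col2": "b"},
    )
    print(result.scalar())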

Cx_Oracle fetch crash

So I've queried data from an Oracle database using cursor.execute(). A relatively simple SELECT query. It works.
But when I try to fetch data from it, python crashes.
The same occurs for fetchall(), fetchmany() and fetchone().
When the query first broke in fetchmany() I decided to loop through with fetchone(), and it worked for the first two rows then broke at the third.
I'm guessing it is because there's too much data in third row.
So, is there any way to bypass this issue and pull the data?
(Please ignore the wrong indents; I could not copy properly on my phone.)
EDIT:
I removed four columns with type "ROWID". There was no issue after that. I was easily able to fetch 100 rows in one go.
So to confirm my suspicion I went ahead and created another copy with only those ROWID columns, and it crashes as expected.
So is there any issue with ROWID type?
Test table for the same.
Insert into TEST_FOR_CX_ORACLE (Z$OEX0_LINES,Z$OEX0_ORDER_INVOICES,Z$OEX0_ORDERS,Z$ITEM_ROWID) values ('ABoeqvAEyAAB0HOAAM','AAAL0DAEzAAClz7AAN','AAAVeuABHAAA4vdAAH','ABoeo+AIVAAE6dKAAQ');
Insert into TEST_FOR_CX_ORACLE (Z$OEX0_LINES,Z$OEX0_ORDER_INVOICES,Z$OEX0_ORDERS,Z$ITEM_ROWID) values ('ABoeqvABQAABKo6AAI','AAAL0DAEzAAClz7AAO','AAAVeuABHAAA4vdAAH','ABoeo+AIVAAE6dKAAQ');
Insert into TEST_FOR_CX_ORACLE (Z$OEX0_LINES,Z$OEX0_ORDER_INVOICES,Z$OEX0_ORDERS,Z$ITEM_ROWID) values ('ABoeqvABQAABKo6AAG','AAAL0DAEzAAClz7AAP','AAAVeuABHAAA4vdAAH','ABoeo+AHIAAN+OIAAM');
Insert into TEST_FOR_CX_ORACLE (Z$OEX0_LINES,Z$OEX0_ORDER_INVOICES,Z$OEX0_ORDERS,Z$ITEM_ROWID) values ('ABoeqvAEyAAB0HOAAK','AAAL0DAEzAACl0EAAC','AAAVeuABHAAA4vdAAH','ABoeo+AHIAAN+OIAAM');
Script:
from cx_Oracle import makedsn, connect, Cursor
from pandas import read_sql_table, DataFrame, Series
from time import time

def create_conn(host_link, port, service_name, user_name, password):
    dsn = makedsn(host_link, port, service_name=service_name)
    return connect(user=user_name, password=password, dsn=dsn)

def initiate_connection(conn):
    try:
        dbconnection = create_conn(*conn)
        print('Connected to ' + conn[2] + ' !')
    except Exception as e:
        print(e)
        dbconnection = None
    return dbconnection

def execute_query(query, conn):
    dbconnection = initiate_connection(conn)
    try:
        cursor = dbconnection.cursor()
        print('Cursor Created!')
        return cursor.execute(query)
    except Exception as e:
        print(e)
        return None

start_time = time()
query = '''SELECT * FROM test_for_cx_oracle'''

try:
    cx_read_query = execute_query(query, ecspat_c)  # ecspat_c: connection details tuple defined elsewhere
    time_after_execute_query = time()
    print('Query Executed')
    columns = [i[0] for i in cx_read_query.description]
    time_after_getting_columns = time()
except Exception as e:
    print(e)

print(time_after_execute_query - start_time, time_after_getting_columns - time_after_execute_query)
Unfortunately, this is a bug in the Oracle Client libraries. You will see it if you attempt to fetch the same rowid value multiple times in consecutive rows. If you avoid that situation all is well. You can also set the environment variable ORA_OCI_NO_OPTIMIZED_FETCH to the value 1 before you run the query to avoid the problem.
This has been reported earlier here: https://github.com/oracle/python-cx_Oracle/issues/120
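For reference, a minimal sketch of the environment-variable workaround; the connection details are placeholders, and the key point is setting the variable before the query runs:

import os

# must be set before the Oracle Client libraries execute the query
os.environ["ORA_OCI_NO_OPTIMIZED_FETCH"] = "1"

import cx_Oracle

# placeholder credentials/DSN -- substitute your own
conn = cx_Oracle.connect(user="my_user", password="my_password", dsn="my_host:1521/my_service")
cursor = conn.cursor()
cursor.execute("SELECT * FROM test_for_cx_oracle")
for row in cursor.fetchall():
    print(row)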

asyncpg fetch feedback (python)

I have been using psycopg2 to manage items in my PostgreSQL database. Recently someone suggested that I could improve my database transactions by using asyncio and asyncpg in my code. I have looked around Stack Overflow and read through the documentation for examples. I have been able to create tables and insert records, but I haven't been able to get the execution feedback that I desire.
For example in my psycopg2 code, I can verify that a table exists or doesn't exist prior to inserting records.
def table_exists(self, verify_table_existence, name):
    '''Verifies the existence of a table within the PostgreSQL database'''
    try:
        self.cursor.execute(verify_table_existence, name)
        answer = self.cursor.fetchone()[0]
        if answer == True:
            print('The table - {} - exists'.format(name))
            return True
        else:
            print('The table - {} - does NOT exist'.format(name))
            return False
    except Exception as error:
        logger.info('An error has occurred while trying to verify the existence of the table {}'.format(name))
        logger.info('Error message: {}'.format(error))
        sys.exit(1)
I haven't been able to get the same feedback using asyncpg. How do I accomplish this?
import asyncpg
import asyncio

async def main():
    conn = await asyncpg.connect('postgresql://postgres:mypassword@localhost:5432/mydatabase')
    answer = await conn.fetch('''
        SELECT EXISTS (
            SELECT 1
            FROM pg_tables
            WHERE schemaname = 'public'
            AND tablename = 'test01'
        ); ''')
    await conn.close()

    #####################
    # the fetch returns
    # [<Record exists=True>]
    # but prints 'The table does NOT exist'
    #####################

    if answer == True:
        print('The table exists')
    else:
        print('The table does NOT exist')

asyncio.get_event_loop().run_until_complete(main())
You used fetchone()[0] with psycopg2, but just fetch(...) with asyncpg. The former will retrieve the first column of the first row, while the latter will retrieve a whole list of rows. Being a list, it doesn't compare as equal to True.
To fetch a single value from a single row, use something like answer = await conn.fetchval(...).
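Putting that together, a minimal sketch of the question's check using fetchval (connection string taken from the question; the credentials are obviously placeholders):

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect('postgresql://postgres:mypassword@localhost:5432/mydatabase')
    answer = await conn.fetchval('''
        SELECT EXISTS (
            SELECT 1
            FROM pg_tables
            WHERE schemaname = 'public'
            AND tablename = 'test01'
        );''')
    await conn.close()

    # fetchval returns the first column of the first row, so answer is a plain bool
    if answer:
        print('The table exists')
    else:
        print('The table does NOT exist')

asyncio.get_event_loop().run_until_complete(main())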
