DynamoDB Query Reserved Words using boto3 Resource? - python-3.x

I've been playing around with some "weirder" DynamoDB queries involving reserved words via the boto3.resource method, and I've run into a pretty annoying issue that I haven't been able to resolve for quite some time (always the same error, sigh) and can't seem to find the answer to anywhere.
My code is the following:
import json
import logging

import boto3
from boto3.dynamodb.conditions import Key

logger = logging.getLogger()
logger.setLevel(logging.INFO)

TABLE_NAME = "some-table"

def getItems(record, table=None):
    if table is None:
        dynamodb = boto3.resource("dynamodb")
        table = dynamodb.Table(TABLE_NAME)
    record = str(record)
    get_item = table.query(
        KeyConditionExpression=Key("#pk").eq(":pk"),
        ExpressionAttributeNames={"#pk": "PK"},
        ExpressionAttributeValues={":pk": record},
    )
    logger.info(
        f"getItem parameters\n{json.dumps(get_item, indent=4, sort_keys=True, default=str)}"
    )
    return get_item

if __name__ == "__main__":
    record = 5532941
    getItems(record)
It's nothing fancy, as I mentioned I'm just playing around, but I'm constantly getting the following error no matter what I try:
"An error occurred (ValidationException) when calling the Query operation: Value provided in ExpressionAttributeNames unused in expressions: keys: {#pk}"
As far as I understand, in order to "replace" reserved keys/values with arbitrary placeholders you put them into ExpressionAttributeNames and ExpressionAttributeValues, but I can't wrap my head around why it's telling me that this key is unused.
I should mention that this Primary Key exists with this value in the record var in DynamoDB.
Any suggestions?
Thanks

If you're using Key then just provide the string values and don't be fancy with substitution. See this example:
https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithQueries/query_equals.py
If you're writing an equality expression as a single string, then you need the substitution. See this example:
https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithQueries/query-consistent-read.py
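To make the difference concrete, here is a minimal sketch of both styles against the table from the question (the table name, key name PK, and value come from the question; everything else is illustrative):
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("some-table")

# Style 1: Key condition object -- pass the real attribute name and value,
# boto3 builds the expression and handles reserved words for you.
response = table.query(KeyConditionExpression=Key("PK").eq("5532941"))

# Style 2: expression as a single string -- here the #/: placeholders are required.
response = table.query(
    KeyConditionExpression="#pk = :pk",
    ExpressionAttributeNames={"#pk": "PK"},
    ExpressionAttributeValues={":pk": "5532941"},
)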

Related

How to pass RunProperties while calling the glue workflow using boto3 and python in lambda function?

My Python code in the Lambda function:
import json
import boto3
from botocore.exceptions import ClientError

glue_client = boto3.client('glue')
default_run_properties = {'s3_path': 's3://bucketname/abc.zip'}
response = glue_client.start_workflow_run(Name="Testing", RunProperties=default_run_properties)
print(response)
I am getting an error like this:
"errorMessage": "Parameter validation failed:\nUnknown parameter in input: \"RunProperties\", must be one of: Name",
"errorType": "ParamValidationError",
I also tried it like this:
session = boto3.session.Session()
glue_client = session.client('glue')
But got the same error.
Can anyone tell me how to pass the RunProperties when starting the Glue workflow? The RunProperties are dynamic and need to be passed in from the Lambda event.
I had the same issue and this is a bit tricky. I do not like my solution, so maybe someone else has a better idea? See here: https://github.com/boto/boto3/issues/2580
And also here: https://docs.aws.amazon.com/glue/latest/webapi/API_StartWorkflowRun.html
So, you cannot pass the parameters when starting the workflow, which is a shame in my opinion, because even the CLI documentation suggests it should be possible: https://docs.aws.amazon.com/cli/latest/reference/glue/start-workflow-run.html
However, you can update the parameters before you start the workflow. These values are then set for everyone. If you expect any "concurrency" issues, then this is not a good way to go. You need to decide whether to reset the values afterwards or just leave them for the next start of the workflow.
I start my workflows like this:
glue_client.update_workflow(
    Name=SHOPS_WORKFLOW_NAME,
    DefaultRunProperties={
        's3_key': file_key,
        'market_id': segments[0],
    },
)
workflow_run_id = glue_client.start_workflow_run(
    Name=SHOPS_WORKFLOW_NAME
)
This basically produces those values as the default run properties of the next run.
I had the same problem and asked on AWS re:Post. The problem is the old boto3 version used in Lambda. They recommended two ways to work around this issue:
Update the run properties immediately after start_workflow_run:
default_run_properties = {'s3_path': 's3://bucketname/abc.zip'}
response = glue_client.start_workflow_run(Name="Testing")
updateRun = glue_client.put_workflow_run_properties(
    Name="Testing",
    RunId=response['RunId'],
    RunProperties=default_run_properties
)
Or you can create a Lambda layer for your Lambda function and include a newer boto3 version there.
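As a rough end-to-end sketch of the first workaround (the workflow name and property key come from the question, the rest is illustrative), the properties attached to a run can also be read back later with get_workflow_run_properties:
import boto3

glue_client = boto3.client('glue')

# Start the run first, then attach the properties to that specific run
run_id = glue_client.start_workflow_run(Name="Testing")['RunId']
glue_client.put_workflow_run_properties(
    Name="Testing",
    RunId=run_id,
    RunProperties={'s3_path': 's3://bucketname/abc.zip'},
)

# A downstream step in the workflow can fetch the same properties again
props = glue_client.get_workflow_run_properties(Name="Testing", RunId=run_id)
print(props['RunProperties']['s3_path'])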

When to use SQL Foreign key using peewee?

I'm currently using PeeWee together with Python and I have managed to create a decent beginner schema:
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    store_name TEXT
);
CREATE TABLE products (
    id SERIAL,
    store_id INTEGER NOT NULL,
    title TEXT,
    image TEXT,
    url TEXT UNIQUE,
    added_date timestamp without time zone NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id, store_id)
);
ALTER TABLE products
    ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
    REFERENCES stores (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE RESTRICT;
which has been converted to peewee with the following code:
import peewee
from peewee import Model, IntegerField, TextField, ForeignKeyField

# postgres_pool (the database/connection pool) is assumed to be defined elsewhere

# ------------------------------------------------------------------------------- #
class Stores(Model):
    id = IntegerField(column_name='id')
    store_name = TextField(column_name='store_name')

    class Meta:
        database = postgres_pool
        db_table = "stores"

    @classmethod
    def get_all(cls):
        try:
            return cls.select(cls.id, cls.store_name).order_by(cls.store_name)
        except peewee.IntegrityError:
            return None

# ------------------------------------------------------------------------------- #
class Products(Model):
    id = IntegerField(column_name='id')
    store_id = TextField(column_name='store_id')
    title = TextField(column_name='title')
    url = TextField(column_name='url')
    image = TextField(column_name='image')
    store = ForeignKeyField(Stores, backref='products')

    class Meta:
        database = postgres_pool
        db_table = "products"

    @classmethod
    def get_all_products(cls, given_id):
        try:
            return cls.select().where(cls.store_id == given_id)
        except peewee.IntegrityError:
            return None

    @classmethod
    def add_product(cls, pageData, store_id):
        """
        INSERT
        INTO
            public.products(store_id, title, image, url)
        VALUES((SELECT id FROM stores WHERE store_name = 'footish'), 'Teva Flatform Universal Pride',
            'https://www.footish.se/sneakers/teva-flatform-universal-pride-t51116376',
            'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840')
        """
        try:
            return cls.insert(
                store_id=store_id,
                title=pageData.title,
                url=pageData.url,
                image=pageData.image,
            ).execute()
        except Products.DoesNotExist:
            return None
        except peewee.IntegrityError as err:
            print(f"error: {err}")
            return None
My idea is that when I start my application, I would have a constant variable with a store_id already set, e.g. 1. That would make query execution faster, as I wouldn't need another SELECT to look up the store_id by store_name. However, looking at my code, I have the field store = ForeignKeyField(Stores, backref='products') and I'm starting to wonder why I need it in my application at all.
I am aware that I do have a FK from my ALTER query, but in the application I've written I can't see a reason why I would need the foreign key field at all. I would like some help to understand why and how I could use the "store" field in my application. Could it be, as I suspect, that I don't need it at all?
Hello! Reading your initial idea about making "the execution of queries faster" with a constant variable, the first thing that came to mind was the hassle of always having to edit the variable manually. This is poor practice and not something you'd want in a professional application. To obtain the value you should use, I suggest running a query programmatically and fetching the highest id value with SQL's MAX() function.
As for the foreign key, you don't have to use it, but it can be good practice when it matters. In this case, look at your FK constraint: it has an ON DELETE RESTRICT statement, which cancels any delete operation on the parent table if it has data being used as a foreign key in another table. This would require going to the other table, the one with the foreign key, and deleting every row related to the one on the previous table before being able to delete it.
In general, if you have two tables with information linked in any way, I'd highly suggest using keys. It increases organization and, if proper constraints are added, it increases both readability for external users and reduces errors.
When it comes to using the store you mentioned, you might want to have an API return all products related to a single store. Or all products except from a specific one.
I tried to keep things simple due to not being fully confident I understood the question. I hope this was helpful.
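As a rough sketch of what the store field buys you (using the models from the question; the store name 'footish' is only illustrative), the FK lets peewee do the join and the backref gives you related rows without juggling raw ids:
# Products for one store, joining through the ForeignKeyField
query = (Products
         .select()
         .join(Stores)
         .where(Stores.store_name == 'footish'))

# Or start from a Stores row and walk the backref
store = Stores.get(Stores.store_name == 'footish')
for product in store.products:
    print(product.title)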

BigQuery update how to get number of updated rows

I am using Google Cloud Functions to connect to Google BigQuery and update some rows. The Cloud Function is written in Python 3.
I need help figuring out how to get the result message or the number of updated/changed rows whenever I run an UPDATE DML statement through the function. Any ideas?
from google.cloud import bigquery

def my_update_function(context, data):
    BQ = bigquery.Client()
    query_job = BQ.query("Update table set etc...")
    rows = query_job.result()
    return (rows)
I understand that rows always comes back as an _EmptyRowIterator object. Is there any way I can get a result or result message? The documentation says I have to get it from a BigQuery job method, but I can't seem to figure it out.
I think you are searching for QueryJob.num_dml_affected_rows. It contains the number of rows affected by an UPDATE or any other DML statement. If you use it in your code instead of rows in the return statement, you will get the number as an int, or you can build a message like:
return("Number of updated rows: " + str(query_job.num_dml_affected_rows))
I hope it will help :)
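Putting that together with the function from the question, a minimal sketch (the table name and WHERE clause are placeholders) could look like this:
from google.cloud import bigquery

def my_update_function(context, data):
    client = bigquery.Client()
    query_job = client.query("UPDATE mydataset.mytable SET name = 'etc' WHERE id = 1")
    query_job.result()  # wait for the DML statement to finish
    # num_dml_affected_rows is populated once the job has completed
    return "Number of updated rows: " + str(query_job.num_dml_affected_rows)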
It seems there is no mention of rows returned in the BigQuery Python DB-API documentation: https://googleapis.dev/python/bigquery/latest/reference.html
I decided to use a roundabout method of dealing with this issue by running a SELECT statement first to check whether there are any matches for the WHERE clause of the UPDATE statement.
Example:
from google.cloud.bigquery import dbapi as bq

def my_update_function(context, data):
    try:
        bq_connection = bq.connect()
        bq_cursor = bq_connection.cursor()
        bq_cursor.execute("select * from table where ID = 1")
        results = bq_cursor.fetchone()
        if results is None:
            print("Row not found.")
        else:
            bq_cursor.execute("UPDATE table set name = 'etc' where ID = 1")
            bq_connection.commit()
            bq_connection.close()
    except Exception as e:
        db_error = str(e)

How to perform Key based queries to Google Datastore from Python 3?

I managed to make a connection to a Google Cloud Datastore database. Now I want to get some entities given their Key/Id. Right now I am doing the following:
from google.cloud import datastore

client = datastore.Client()
query = client.query(kind='City')
query.key_filter("325899977574122")  # -> Exception here
I get "Invalid key: '325899977574122'".
What could be the cause of this error? That Id exists; a city does have that key/Id.
It looks like it needs to be of type google.cloud.datastore.key.Key
https://googleapis.dev/python/datastore/latest/queries.html#google.cloud.datastore.query.Query.key_filter
Also, 325899977574122 is probably supposed to be an integer, not a string
So something like this:
client = datastore.Client()
query = client.query(kind='City')
query.key_filter(datastore.Key('City', 325899977574122, project=project))
EDIT:
Also, if you're trying to retrieve a single entity by id, you should probably use this:
https://googleapis.dev/python/datastore/latest/client.html#google.cloud.datastore.client.Client.get
client = datastore.Client()
client.get(datastore.Key('City', 325899977574122, project=project))
Fetching by ID is faster than doing a query
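A minimal Python 3 sketch (kind and numeric id taken from the question, project inferred from the client) might look like this:
from google.cloud import datastore

client = datastore.Client()
# client.key() builds the Key using the client's project; the id must be an
# int in Python 3 (there is no long literal such as 325899977574122L)
key = client.key('City', 325899977574122)
city = client.get(key)
print(city)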

How to flatten a tuple of server ids to a string?

I'm trying to create a file that includes the IDs of multiple server hosts that were generated with the count attribute:
resource "aws_instance" "workers" {
count = "${var.worker_count}"
...
}
resource "local_file" "stop_instances" {
filename = "${path.module}/generated/stop_instances.py"
content =<<EOF
import boto3
# Boto Connection
ec2 = boto3.resource('ec2', '${var.region}')
def lambda_handler(event, context):
# Retrieve instance IDs
instance_ids = ["${aws_instance.controller.id}", "${aws_instance.gateway.id}", "${aws_instance.workers.*.id}"]
# stopping instances
stopping_instances = ec2.instances.filter(InstanceIds=instance_ids).stop()
EOF
}
However, I'm getting the following error:
447: instance_ids = ["${aws_instance.controller.id}",
"${aws_instance.gateway.id}", "${aws_instance.workers.*.id}"]
449:
450:
451:
|----------------
| aws_instance.workers is tuple with 3 elements
Cannot include the given value in a string template: string required.
Is there a way that I can flatten the tuple to a string?
I've tried the tostring() method, but that only accepts primitive types.
join was the solution for me:
instance_ids = ["${aws_instance.controller.id}","${aws_instance.gateway.id}","${join("\",\"", aws_instance.workers.*.id)}"]
The best way to produce a string from a list will depend on the specific situation, because there are lots of different ways to represent a list in a string.
For this particular situation, it seems likely that JSON array/string syntax is compatible enough with Python syntax that you could get away with using jsonencode to produce a Python list expression:
import boto3
# Boto Connection
ec2 = boto3.resource('ec2', ${jsonencode(var.region)})
def lambda_handler(event, context):
    # Retrieve instance IDs
    instance_ids = ${jsonencode(
      concat(
        [
          aws_instance.controller.id,
          aws_instance.gateway.id,
        ],
        aws_instance.workers.*.id
      )
    )}
    # stopping instances
    stopping_instances = ec2.instances.filter(InstanceIds=instance_ids).stop()
For situations where a lot of data needs to be passed into a program written in another language, and where the JSON syntax might not 100% align with the target language, a more general solution would be to pass the data structure in as JSON and then parse it using the language's own JSON parser.
If you know that all of your values are strings as in this case, you could also simplify things and join all of your string values together with some delimiter using the join function and then split it using Python's split method, at the expense of the resulting source code looking even less like a human might hand-write it.
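For illustration, a rough sketch of that join/split variant, replacing the instance_ids line inside the same heredoc (the comma delimiter is an arbitrary choice), might be:
def lambda_handler(event, context):
    # Terraform joins the ids into one comma-separated string; Python splits it back into a list
    instance_ids = "${join(",", concat([aws_instance.controller.id, aws_instance.gateway.id], aws_instance.workers.*.id))}".split(",")
    stopping_instances = ec2.instances.filter(InstanceIds=instance_ids).stop()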
