Partition key as variable in Python Azure Storage Table - python-3.x

I have a partition key in an Azure Storage Table as below; the partition key is a string here.
For each day, the partition key is the date in string format. If I take a count of records with partition key (03042018) and row key (1), I get a count of 2.
partitionkey  rowkey  timestamp                  desc
02042018      1       2018-11-26T01:23:57.149Z   'abc'
02042018      1       2018-11-26T23:46:57.149Z   'def'
03042018      1       2018-11-27T01:46:57.149Z   'fff'
03042018      1       2018-11-27T01:47:57.149Z   'ggg'
03042018      2       2018-11-27T01:48:01.149Z   'ggg'
How do I pass the partition key as a variable in the query below?
Here the partition key is 'tasksSeattle', but I want to pass 02042018 today, and when the Python script runs tomorrow it should pass 03042018 as the partition key in the query below.
a='02042018' (how do I pass a as the partition key below?)
tasks = table_service.query_entities('tasktable', "PartitionKey eq 'tasksSeattle'")

If you just want to pass a variable to the query, please use the code below:
from azure.cosmosdb.table.tableservice import TableService
from azure.cosmosdb.table.models import Entity

table_service = TableService(account_name='your_account', account_key='your_key')

a = '03042018'
# Build the OData filter string from the variable
tasks = table_service.query_entities('tasktable', filter="PartitionKey eq '" + a + "'")
for task in tasks:
    print(task.description)
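If the goal is for the script to pick up each day's partition key automatically, a minimal sketch like the one below could work. It assumes the key is simply today's date formatted as DDMMYYYY (that format is an assumption based on the sample data):

from datetime import datetime
from azure.cosmosdb.table.tableservice import TableService

table_service = TableService(account_name='your_account', account_key='your_key')

# Assumption: the partition key is today's date formatted as DDMMYYYY
# (e.g. '03042018' for 3 April 2018); adjust the format string if it is MMDDYYYY.
a = datetime.now().strftime('%d%m%Y')

tasks = table_service.query_entities('tasktable', filter=f"PartitionKey eq '{a}'")
for task in tasks:
    print(task.description)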

Related

Python boto3 get item with specific non-partition-key attribute value

My AWS DynamoDB table has id as the partition key and there is no sort key. The following does not return the existing record from the table:
response = producttable.scan(FilterExpression=Attr('title').eq("My Product"))
response['ScannedCount'] is less than the total count of the table.
You need to paginate.
For example code, look at https://github.com/aws-samples/aws-dynamodb-examples/blob/master/DynamoDB-SDK-Examples/python/WorkingWithScans/scan_paginate.py
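A minimal sketch of paginating the scan with LastEvaluatedKey, reusing the title filter from the question (the table name here is an assumption):

import boto3
from boto3.dynamodb.conditions import Attr

producttable = boto3.resource('dynamodb').Table('products')  # table name assumed

items = []
scan_kwargs = {'FilterExpression': Attr('title').eq("My Product")}

while True:
    response = producttable.scan(**scan_kwargs)
    items.extend(response.get('Items', []))
    # Keep scanning until DynamoDB stops returning a pagination token
    last_key = response.get('LastEvaluatedKey')
    if not last_key:
        break
    scan_kwargs['ExclusiveStartKey'] = last_key

print(len(items))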

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
    ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
    PREDICTED_PROBABILITY FLOAT,
    TIME_PREDICTED TIMESTAMP
);
This creates the correct table, which has been checked using the DESC command in SQL. Then, using the Snowflake Python connector, we are trying to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
Just before this query the variables are defined. The main challenge is getting the current timestamp written into Snowflake. Here the value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query, we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer @Lukasz provided, you could also think about defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
    ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
    PREDICTED_PROBABILITY FLOAT,
    TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED.
An educated guess. When performing the insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
This is string interpolation: ct is provided as the string representation of a datetime, which does not match the timestamp data type, hence the error.
I would suggest using proper variable binding instead:
ctx.cursor().execute("INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
"(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
"VALUES(:1, :2, :3)",
(accountId,
risk_score,
("TIMESTAMP_LTZ", ct)
)
);
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
    "INSERT INTO testtable(col1, col2) "
    "VALUES({col1}, '{col2}')".format(
        col1=789,
        col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
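As a sketch of the safer pattern using qmark binding (account and credential values are placeholders):

import snowflake.connector

# Use qmark binding so values are passed as bound parameters, not interpolated
snowflake.connector.paramstyle = 'qmark'

con = snowflake.connector.connect(
    account='your_account', user='your_user', password='your_password')

con.cursor().execute(
    "INSERT INTO testtable(col1, col2) VALUES (?, ?)",
    (789, 'test string3'))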
You forgot to place quotes before and after {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)

Data model in Cassandra and proper deletion strategy

I have the following table in Cassandra:
CREATE TABLE article (
    id text,
    price int,
    validFrom timestamp,
    PRIMARY KEY (id, validFrom)
) WITH CLUSTERING ORDER BY (validFrom DESC);
It holds articles and historical price information (validFrom is the timestamp of a new price). Article prices change often. I want to query for:
Historic prices for a particular article.
The last price for an article.
From my understanding, I can solve both problems with the following query:
select id, price from article where id = X and validFrom < Y limit 1;
This query uses the article id as a restriction, so the query uses the partition key. Since the clustering order is based on the validFrom timestamp in reversed order, Cassandra can perform this query efficiently.
Am I getting this right?
What is the best approach to delete old data (housekeeping)? Let's assume I want to delete all articles with validFrom > 20150101 and validFrom < 20151231. Since I don't have the full primary key, this would be inefficient, even if I use an index on validFrom, right? How can I achieve this?
You can use external tools for that:
Spark with the Spark Cassandra Connector (even in local mode). The code could look like the following (note that I'm using validfrom as the name, not validFrom, as it's not escaped in your schema):
import com.datastax.spark.connector._

val data = sc.cassandraTable("test", "article")
  .where("validfrom >= '2020-07-28T11:50:00Z' AND validfrom < '2020-07-28T12:50:00Z'")
  .select("id", "validfrom")
data.deleteFromCassandra("test", "article", keyColumns = SomeColumns("id", "validfrom"))
Use DSBulk to find the matching entries and output them into a file (output.csv in my case), and then perform their deletion:
bin/dsbulk unload -url output.csv \
-query "SELECT id, validfrom FROM test.article WHERE token(id) > :start AND token(id) <= :end AND validFrom >= '2020-07-28T11:50:00Z' AND validFrom < '2020-07-28T12:50:00Z' ALLOW FILTERING"
bin/dsbulk load -query "DELETE from test.article WHERE id = :id and validfrom = :validfrom" \
-url output.csv
To add to Alex Ott's answer, this comment of yours is incorrect:
This query uses article id as restriction, query uses the partition key. Since the clustering order is based on price, cassandra can efficient perform this query.
The rows are not ordered by price. They are ordered by validFrom in reverse-chronological order. Cheers!
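For completeness, a minimal Python sketch of the two reads using the DataStax cassandra-driver (the keyspace name test is borrowed from the examples above; the contact point is an assumption):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('test')  # keyspace 'test' assumed

article_id = 'X'

# 1. Historic prices for a particular article (newest first, per the clustering order)
history = session.execute(
    "SELECT validfrom, price FROM article WHERE id = %s", (article_id,))
for row in history:
    print(row.validfrom, row.price)

# 2. Last (most recent) price for the article
latest = session.execute(
    "SELECT price FROM article WHERE id = %s LIMIT 1", (article_id,)).one()
print(latest.price if latest else None)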

Batch conditional delete from dynamodb without sort key

I am shifting my database from MongoDB to DynamoDB. I have a problem with the delete function on a table where labName is the partition key, serialNumber is the sort key, and there is an id attribute, feedId. I want to delete all records from the table where labName is given and feedId is NOT IN (array of ids).
I am doing it in Mongo with the code below.
Is there a way with BatchWriteItem where I can add a condition on feedId without the sort key?
let dbHandle = await getMongoDbHandle(dbName);
let query = {
    feedid: {$nin: feedObjectIds}
}
let output = await dbModule.removePromisify(dbHandle,
    dbModule.collectionNames.feeds, query);
While working with DynamoDB, you can perform a conditional retrieval (GetItem) or deletion (DeleteItem) on a record if and only if you provide all of the attributes of the primary key. For example:
For a simple primary key, you only need to provide a value for the partition key.
For a composite primary key, you must provide values for both the partition key and the sort key (see the sketch below).
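A minimal boto3 sketch under that constraint: query the labName partition, filter on feedId, then delete each matching item by its full key. Attribute names come from the question; the table name is an assumption, and pagination is omitted for brevity:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource('dynamodb').Table('feeds')  # table name assumed

def delete_not_in(lab_name, keep_feed_ids):
    # Find all items in the labName partition whose feedId is not in the keep list
    # (pagination via LastEvaluatedKey omitted for brevity)
    resp = table.query(
        KeyConditionExpression=Key('labName').eq(lab_name),
        FilterExpression=~Attr('feedId').is_in(keep_feed_ids))

    # Each delete still needs the full primary key (partition key + sort key)
    with table.batch_writer() as batch:
        for item in resp['Items']:
            batch.delete_item(Key={'labName': item['labName'],
                                   'serialNumber': item['serialNumber']})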

Unable to query on partition key in DynamoDB by boto3

I have one table, TestTable, with partition key TestColumn.
Input dates:
from_date= "2017-04-20T16:31:54.451071+00:00"
to_date = "2018-04-20T16:31:54.451071+00:00"
When I query the date with an equality condition, it works:
key_expr = Key('TestColumn').eq(to_date)
query_resp = table.query(KeyConditionExpression=key_expr)
But when I use a between query, it does not work:
key_expr = Key('TestColumn').between(from_date, to_date)
query_resp = table.query(KeyConditionExpression=key_expr)
Error:
Unknown err_msg while querying dynamodb: An error occurred (ValidationException) when calling the Query operation: Query key condition not supported
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
DynamoDB Query will return data from one and only one partition, meaning you have to supply a single partition key in the request.
KeyConditionExpression
The condition that specifies the key value(s) for items to be retrieved by the Query action. The condition must perform an equality test on a single partition key value.
You can optionally use a BETWEEN operator on a sort key (but you still have to supply a single partition key).
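As an illustration (the sort key here, CreatedAt, is hypothetical since the question's table does not define one):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('TestTable')

# BETWEEN is only valid on the sort key; the partition key still needs an equality test.
response = table.query(
    KeyConditionExpression=Key('TestColumn').eq("2018-04-20T16:31:54.451071+00:00")
    & Key('CreatedAt').between("2017-01-01", "2018-01-01"))
print(response['Items'])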
If you use a Scan, you can use a FilterExpression with the BETWEEN operator on TestColumn (see the sketch below).
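A minimal sketch of that scan, reusing the table and dates from the question (note that a scan reads the whole table and may need pagination via LastEvaluatedKey):

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('TestTable')

from_date = "2017-04-20T16:31:54.451071+00:00"
to_date = "2018-04-20T16:31:54.451071+00:00"

# With Scan, TestColumn is filtered like an ordinary attribute, so BETWEEN is allowed
response = table.scan(
    FilterExpression=Attr('TestColumn').between(from_date, to_date))
print(response['Items'])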
