Can't add secondary index for DynamoDB in CDK using Python - python-3.x

I am trying to create a DynamoDB table with a secondary index that has a partition and sort key.
I can create the table without the secondary index, but haven't yet found a way to add the secondary index.
I've looked at both of these resources, but haven't found anything that actually shows me what code I need in my CDK Python script to create the resource with a secondary index:
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-dynamodb.Table.html
https://docs.aws.amazon.com/cdk/api/latest/docs/aws-dynamodb-readme.html
This is the code that will create the table
table_name = 'event-table-name'
event_table = dynamoDB.Table(self, 'EventsTable',
    table_name=table_name,
    partition_key=Attribute(
        name='composite',
        type=AttributeType.STRING
    ),
    sort_key=Attribute(
        name='switch_ref',
        type=AttributeType.STRING
    ),
    removal_policy=core.RemovalPolicy.DESTROY,
    billing_mode=BillingMode.PAY_PER_REQUEST,
    stream=StreamViewType.NEW_IMAGE,
)
and this is the secondary index I need to attach to it:
secondaryIndex = dynamoDB.GlobalSecondaryIndexProps(
    index_name='mpan-status-index',
    partition_key=Attribute(
        name='field1',
        type=AttributeType.STRING
    ),
    sort_key=Attribute(
        name='field2',
        type=AttributeType.STRING
    ),
)
I've tried adding the block inside the table creation and tried calling the addSecondaryindex method on the table, but both fail, either saying unexpected keyword or 'object has no attribute addGlobalSecondaryIndex'.

addGlobalSecondaryIndex should be called on the Table class.
The code below (in TypeScript) works perfectly for me:
const table = new ddb.Table(this, "EventsTable", {
  tableName: "event-table-name",
  partitionKey: { name: 'composite', type: ddb.AttributeType.STRING },
  sortKey: { name: 'switch_ref', type: ddb.AttributeType.STRING },
  removalPolicy: cdk.RemovalPolicy.DESTROY,
  billingMode: BillingMode.PAY_PER_REQUEST,
  stream: StreamViewType.NEW_IMAGE
});
table.addGlobalSecondaryIndex({
  indexName: 'mpan-status-index',
  partitionKey: { name: 'field1', type: ddb.AttributeType.STRING },
  sortKey: { name: 'field2', type: ddb.AttributeType.STRING }
});

For anyone looking for this and stumbling on it through Google search:
create your table with the usual:
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk.aws_dynamodb import Attribute, AttributeType, ProjectionType
table = dynamodb.Table(self, 'tableID',
    partition_key=Attribute(name='partition_key', type=AttributeType.STRING))
then add your global secondary indexes in much the same way:
table.add_global_secondary_index(
    partition_key=Attribute(name='index_hash_key', type=AttributeType.NUMBER),
    sort_key=Attribute(name='range_key', type=AttributeType.STRING),
    index_name='some_index')
you can add projection attributes with the keyword arguments:
    projection_type=ProjectionType.INCLUDE,
    non_key_attributes=['list', 'of', 'attribute', 'names']
and projection_type defaults to ProjectionType.ALL if you don't include it.
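Put together, an index with an INCLUDE projection would look something like this (a sketch reusing the same hypothetical attribute names as above):
table.add_global_secondary_index(
    partition_key=Attribute(name='index_hash_key', type=AttributeType.NUMBER),
    sort_key=Attribute(name='range_key', type=AttributeType.STRING),
    index_name='some_index',
    projection_type=ProjectionType.INCLUDE,
    non_key_attributes=['list', 'of', 'attribute', 'names'])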
I know the docs are incomplete in lots of areas, but this is found here:
https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_dynamodb/Table.html?highlight=add_global#aws_cdk.aws_dynamodb.Table.add_global_secondary_index

Have you tried using the addGlobalSecondaryIndex method (add_global_secondary_index in Python), as in
event_table.add_global_secondary_index(index_name="...", partition_key=..., ...)
Take a look at the documentation for the method.

aws_dynamodb.Table returns an ITable. To use add_global_secondary_index, first cast to Table like so:
table = aws_dynamodb.Table(self, "Table",
    partition_key=aws_dynamodb.Attribute(name="id", type=aws_dynamodb.AttributeType.STRING))
aws_dynamodb.Table(table).add_global_secondary_index(...)

Related

dynamodb query: ValidationException: The number of conditions on the keys is invalid

I have the following schema, where I am basically just trying to have a table with id as the primary key, and both code and secondCode as global secondary indexes to query the table by.
resource "aws_dynamodb_table" "myDb" {
name = "myTable"
billing_mode = "PAY_PER_REQUEST"
hash_key = "id"
attribute {
name = "id"
type = "S"
}
attribute {
name = "code"
type = "S"
}
attribute {
name = "secondCode"
type = "S"
}
global_secondary_index {
name = "code-index"
hash_key = "code"
projection_type = "ALL"
}
global_secondary_index {
name = "second_code-index"
hash_key = "secondCode"
projection_type = "ALL"
}
}
When I try to look for one item by code
const toGet = Object.assign(new Item(), {
  code: 'code_456',
});
item = await dataMapper.get<Item>(toGet);
locally I get
ValidationException: The number of conditions on the keys is invalid
and on the deployed instance of the DB I get
The provided key element does not match the schema
I can see from the logs that the key is not being populated
Serverless: [AWS dynamodb 400 0.082s 0 retries] getItem({ TableName: 'myTable', Key: {} })
Here is the class configuration for Item
@table(getEnv('MY_TABLE'))
export class Item {
  @hashKey({ type: 'String' })
  id: string;

  @attribute({
    indexKeyConfigurations: { 'code-index': 'HASH' },
    type: 'String',
  })
  code: string;

  @attribute({
    indexKeyConfigurations: { 'second_code-index': 'HASH' },
    type: 'String',
  })
  secondCode: string;

  @attribute({ memberType: embed(NestedItem) })
  nestedItems?: Array<NestedItem>;
}

class NestedItem {
  @attribute()
  name: string;

  @attribute()
  price: number;
}
I am using https://github.com/awslabs/dynamodb-data-mapper-js
I looked at the repo you linked for the package; I think you need to use the .query(...) method with the indexName parameter to tell DynamoDB you want to use that secondary index. Usually in DynamoDB, get operations use the default keys (in your case, you'd use get for queries on id, and query for queries on indices).
Checking the docs, it's not very clear: if you look at the GetItem reference, you'll see there's nowhere to supply an index name to actually use the index, whereas the Query operation allows you to supply one. As for why you need to query this way, you can read this: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html
The issue you are facing is due to calling GetItem on an index, which is not possible. GetItem must target a single item, and an index can contain multiple items with the same key (unlike the base table); for this reason you can only use the multi-item APIs on an index, which are Query and Scan.
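For illustration, querying the index with the low-level boto3 client looks like this (a sketch in Python rather than the JS data mapper the question uses; the table, index, and value come from the question):
import boto3

client = boto3.client('dynamodb')

# GetItem cannot target an index; Query accepts an IndexName
response = client.query(
    TableName='myTable',
    IndexName='code-index',
    KeyConditionExpression='code = :c',
    ExpressionAttributeValues={':c': {'S': 'code_456'}},
)
items = response['Items']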

DynamoDB - boto3 - batch_write_item: The provided key element does not match the schema

This issue has been raised before, but so far I couldn't find a solution that worked in boto3. The GSI is set on 'solutionId', and the partition key is 'emp_id'. Basically, I just want to delete all records in the table without deleting the table itself. What am I missing here?
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.batch_write_item
table_name = "solutions"
dynamodb_client = boto3.client('dynamodb')
dynamodb_resource = boto3.resource('dynamodb')
table = dynamodb_resource.Table(table_name)
data = table.scan()
delete_list = []
for item in data['Items']:
delete_list.append({
'DeleteRequest': {
'Key': {
'solutionId': {"S": f'{item["solutionId"]}'}
}
}
}
)
def list_spliter(list, size):
return (list[pos:pos + size] for pos in range(0, len(list), size))
for batch in list_spliter(delete_list, 25):
dynamodb_resource.batch_write_item(RequestItems={
f'{table_name}': batch
}
)
I think there are two small issues here:
you're using the high-level service resource interface, so you don't need to explicitly tell DynamoDB what the attribute types are; they are inferred through automatic marshaling. So you can simply use "key": "value" rather than "key": {"S": "value"} for the string keys
when deleting an item, you need to provide the full primary key, including both the partition key and the sort key
So, for example, if your partition and sort keys are named pk and sk:
'DeleteRequest': {
    'Key': {
        'pk': pk,
        'sk': sk
    }
}
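Putting both fixes together, the delete loop might look like this (a sketch; pk and sk are hypothetical names, substitute your table's real partition and sort keys):
import boto3

table_name = "solutions"
dynamodb_resource = boto3.resource('dynamodb')
table = dynamodb_resource.Table(table_name)

delete_list = []
for item in table.scan()['Items']:
    delete_list.append({
        'DeleteRequest': {
            'Key': {
                'pk': item['pk'],  # hypothetical partition key name
                'sk': item['sk'],  # hypothetical sort key name
            }
        }
    })

def list_spliter(items, size):
    return (items[pos:pos + size] for pos in range(0, len(items), size))

# batch_write_item accepts at most 25 requests per call
for batch in list_spliter(delete_list, 25):
    dynamodb_resource.batch_write_item(RequestItems={table_name: batch})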

Removing attribute from map in dynamodb

I have the following DynamoDB table and want to remove "oldInfo" from the attribute "accounts" using update_expression = f"REMOVE accounts.oldInfo", but that does not work. Does anyone have any ideas or suggestions? Note that the "accounts" attribute is a map, and so is "oldInfo".
{
  "accounts": {
    "oldInfo": {
      "oldStuff": []
    },
    "otherInfo": {
      "otherStuff": []
    }
  }
}
This is the code that I am using to remove "oldInfo"
self.clean_up_dynamo_db.updateitem(table_name=self.table_name,
    key={
        "partitionKey": item["partitionKey"],
        "sortKey": item["sortKey"]
    },
    update_expression=f"REMOVE {self.attribute}")
I didn't find a clear example in the DDB docs, so I replicated your situation in my AWS account.
I created a map attribute named accounts and gave it keys of oldInfo and otherInfo. I executed the following operation, which removed accounts.oldInfo as expected.
const params = {
  TableName: "YOUR_TABLE_NAME",
  Key: {
    PK: "PARTITION_KEY",
    SK: "SORT_KEY"
  },
  UpdateExpression: "REMOVE accounts.oldInfo"
}
await ddbClient.updateItem(params);
Although this snippet is in JavaScript, it confirms that you are going about the operation correctly according to the DDB API.
Perhaps there is something else going on in your code that is preventing the operation from succeeding? Are you certain the variables you are sending to update_item are what you expect?
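For reference, the same operation through boto3's resource interface would look roughly like this (a sketch; the table and key names are hypothetical, mirroring the placeholders above):
import boto3

table = boto3.resource('dynamodb').Table('YOUR_TABLE_NAME')

# REMOVE takes a document path into the map attribute
table.update_item(
    Key={
        'partitionKey': 'PARTITION_KEY',  # hypothetical key names
        'sortKey': 'SORT_KEY',
    },
    UpdateExpression='REMOVE accounts.oldInfo',
)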

knex js query many to many

I'm having trouble with Node & Knex.js.
I'm trying to build a mini blog, with posts and the ability to add multiple tags to a post.
I have a Post model with the following properties:
id SERIAL PRIMARY KEY NOT NULL,
name TEXT,
Second, I have a Tags model that is used for storing tags:
id SERIAL PRIMARY KEY NOT NULL,
name TEXT
And I have a many-to-many table, post_tags, that references posts & tags:
id SERIAL PRIMARY KEY NOT NULL,
post_id INTEGER NOT NULL REFERENCES posts ON DELETE CASCADE,
tag_id INTEGER NOT NULL REFERENCES tags ON DELETE CASCADE
I have managed to insert tags and create a post with tags,
but when I want to fetch the post data with the tags attached to that post, I run into trouble.
Here is the problem:
const data = await knex.select('posts.name as postName', 'tags.name as tagName')
  .from('posts')
  .leftJoin('post_tags', 'posts.id', 'post_tags.post_id')
  .leftJoin('tags', 'tags.id', 'post_tags.tag_id')
  .where('posts.id', id);
This query returns the following result:
[
  {
    postName: 'Post 1',
    tagName: 'Youtube',
  },
  {
    postName: 'Post 1',
    tagName: 'Funny',
  }
]
But I want the result to be formatted & returned like this:
{
  postName: 'Post 1',
  tagName: ['Youtube', 'Funny'],
}
Is that even possible with a query, or do I have to format the data manually?
One way of doing this is to use some kind of aggregate function. If you're using PostgreSQL:
const data = await knex.select('posts.name as postName', knex.raw('ARRAY_AGG (tags.name) tags'))
  .from('posts')
  .innerJoin('post_tags', 'posts.id', 'post_tags.post_id')
  .innerJoin('tags', 'tags.id', 'post_tags.tag_id')
  .where('posts.id', id)
  .groupBy("postName")
  .orderBy("postName")
  .first();
->
{ postName: 'post1', tags: [ 'tag1', 'tag2', 'tag3' ] }
For MySQL:
const data = await knex.select('posts.name as postName', knex.raw('GROUP_CONCAT (tags.name) as tags'))
  .from('posts')
  .innerJoin('post_tags', 'posts.id', 'post_tags.post_id')
  .innerJoin('tags', 'tags.id', 'post_tags.tag_id')
  .where('posts.id', id)
  .groupBy("postName")
  .orderBy("postName")
  .first()
  .then(res => Object.assign(res, { tags: res.tags.split(',') }));
There are no arrays in MySQL, and GROUP_CONCAT will just concatenate all the tags into a string, so we need to split them manually.
->
RowDataPacket { postName: 'post1', tags: [ 'tag1', 'tag2', 'tag3' ] }
The result is correct as that is how SQL works - it returns rows of data. SQL has no concept of returning anything other than a table (think CSV data or Excel spreadsheet).
There are some interesting things you can do in SQL to convert the tags to strings and concatenate them together, but that is not really what you want. Either way you will need to add a post-processing step.
With your current query you can simply do something like this:
function formatter (result) {
  let set = {};
  result.forEach(row => {
    if (set[row.postName] === undefined) {
      set[row.postName] = row;
      set[row.postName].tagName = [set[row.postName].tagName];
    } else {
      set[row.postName].tagName.push(row.tagName);
    }
  });
  return Object.values(set);
}
// ...
query.then(formatter);
This shouldn't be slow as you're only looping through the results once.

Sequelize - Join with multiple column

I'd like to convert the following query into Sequelize code:
select * from table_a
inner join table_b
on table_a.column_1 = table_b.column_1
and table_a.column_2 = table_b.column_2
I have tried many approaches and followed many of the provided solutions, but I am unable to produce the desired query from Sequelize code.
The most I have achieved is the following:
select * from table_a
inner join table_b
on table_a.column_1 = table_b.column_1
I want the second condition as well:
and table_a.column_2 = table_b.column_2
Is there a proper way to achieve it?
You need to define your own on clause for the JOIN statement:
ModelA.findAll({
  include: [
    {
      model: ModelB,
      on: {
        col1: sequelize.where(sequelize.col("ModelA.col1"), "=", sequelize.col("ModelB.col1")),
        col2: sequelize.where(sequelize.col("ModelA.col2"), "=", sequelize.col("ModelB.col2"))
      },
      attributes: [] // empty array means that no column from ModelB will be returned
    }
  ]
}).then((modelAInstances) => {
  // result...
});
Regarding @TophatGordon's doubt in the accepted answer's comments: whether we need to have any associations set up in the model or not.
I also went through the GitHub issue raised back in 2012 that is still open.
I was in the same situation, trying to set up my own ON condition for a left outer join.
When I directly tried to use on: {...} inside Table1.findAll(...include Table2 with ON condition...), it didn't work.
It threw an error:
EagerLoadingError [SequelizeEagerLoadingError]: Table2 is not associated to Table1!
My use case was to match two non-primary-key columns from Table1 to two columns in Table2 in a left outer join. I will show what I achieved and how.
Don't get confused by the table and column names, as I had to change them from the original ones that I used.
So I had to create an association in Table1 (Task) like:
Task.associate = (models) => {
  Task.hasOne(models.SubTask, {
    foreignKey: 'someId', // <--- a column of table2 (SubTask); not a primary key in my case, but it can be one
    sourceKey: 'someId', // <--- a column of table1 (Task); not a primary key in my case, but it can be one
    scope: {
      [Op.and]: sequelize.where(sequelize.col("Task.some_id_2"),
        Op.eq, // or you can use '='
        sequelize.col("subTask.some_id_2")),
    },
    as: 'subTask',
    // no constraints should be applied if Sequelize will be creating the tables and unique keys are not defined,
    // as it throws a unique constraint error otherwise
    constraints: false,
  });
};
So the find query looks like this:
Task.findAll({
  where: whereCondition,
  // attributes: ['id', 'name', 'someId', 'someId2'],
  include: [{
    model: SubTask, as: 'subTask', // <-- model name and alias as defined in the association
    attributes: [], // empty array if no attributes are needed from SubTask
  }],
});
Resultant query: one matching condition is taken from [foreignKey] = [sourceKey]; the second matching condition comes from the sequelize.where(...) used in scope: {...}.
select
  "Task"."id",
  "Task"."name",
  "Task"."some_id" as "someId",
  "Task"."some_id_2" as "someId2"
from
  "task" as "Task"
left outer join "sub_task" as "subTask" on
  "Task"."some_id" = "subTask"."some_id"
  and "Task"."some_id_2" = "subTask"."some_id_2";
Another approach achieves the same result and also solves issues when Table1 is itself included from another table, i.e. when Table1 appears as a 2nd-level table, say under Table0:
Task.associate = (models) => {
  Task.hasOne(models.SubTask, {
    foreignKey: 'someId', // <--- a column of table2 (SubTask); not a primary key in my case, but it can be one
    sourceKey: 'someId', // <--- a column of table1 (Task); not a primary key in my case, but it can be one
    as: 'subTask',
    // <-- scope removed -->
    // no constraints should be applied if Sequelize will be creating the tables and unique keys are not defined,
    // as it throws a unique constraint error otherwise
    constraints: false,
  });
};
So the find query from Table0 looks like this (the foreignKey and sourceKey will not be considered, as we now use a custom on: {...}):
Table0.findAll({
  where: whereCondition,
  // attributes: ['id', 'name', 'someId', 'someId2'],
  include: {
    model: Task, as: 'Table1AliasName', // if the association has been defined with an alias
    include: [{
      model: SubTask, as: 'subTask', // <-- model name and alias as defined in the association
      attributes: [], // empty array if no attributes are needed from SubTask
      on: {
        [Op.and]: [
          sequelize.where(
            sequelize.col('Table1AliasName_OR_ModelName.some_id'),
            Op.eq, // '='
            sequelize.col('Table1AliasName_OR_ModelName->subTask.some_id')
          ),
          sequelize.where(
            sequelize.col('Table1AliasName_OR_ModelName.some_id_2'),
            Op.eq, // '='
            sequelize.col('Table1AliasName_OR_ModelName->subTask.some_id_2')
          ),
        ],
      },
    }],
  }
});
Skip the part below if your tables are already created...
Set constraints to false: if Sequelize tries to create the 2nd table (SubTask), it might throw an error (DatabaseError [SequelizeDatabaseError]: there is no unique constraint matching given keys for referenced table "task") due to the following query:
create table if not exists "sub_task" (
  "some_id" INTEGER,
  "some_id_2" INTEGER references "task" ("some_id") on delete cascade on update cascade,
  "data" INTEGER
);
If we set constraints: false, it creates the query below instead, which will not throw the unique constraint error since the reference to the non-primary column is dropped:
create table if not exists "sub_task" ("some_id" INTEGER, "some_id_2" INTEGER, "data" INTEGER);
