Can I query both the Table and the Global Secondary Index in KeyConditionExpression - python-3.x

I have a table with a Hash Key on a column called org_id and a Global Secondary Index on a column called ts. I need to run a query against the table matching both conditions, but I am getting the error "Query key condition not supported". I can't use "ts" as a Sort Key because there might be repeated values in it.
Therefore I wanted to know whether it is possible to query both the index and the table in a single key condition, as I have done below.
from boto3.dynamodb.conditions import Key

KeyCondition = (
    Key("org_id").eq("some_id")
    & Key("ts").between(START_DATE, END_DATE)
)
ProjectionExpression = "ts,val"

response = GET_TABLE.query(
    TableName=DYNAMO_TABLE_NAME,
    IndexName="ts-index",
    KeyConditionExpression=KeyCondition,
    ProjectionExpression=ProjectionExpression,
    Limit=50,
)

It isn't possible to access base table attributes from a GSI query. You have to project the attributes you need into the GSI.
You can project other base table attributes into the index if you want. When you query the index, DynamoDB can retrieve these projected attributes efficiently. However, global secondary index queries cannot fetch attributes from the base table.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Note that the "primary key" of a GSI doesn't need to be unique.
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
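As an illustration of both points, here is a minimal boto3 sketch, assuming a GSI named "org_id-ts-index" that uses org_id as its partition key, ts as its sort key, and projects the val attribute (the index name and projection are assumptions for the example, not something from the original post):

from boto3.dynamodb.conditions import Key

# Hypothetical GSI: partition key org_id, sort key ts (duplicate ts values are fine),
# with the non-key attribute "val" projected into the index.
response = GET_TABLE.query(
    IndexName="org_id-ts-index",    # placeholder index name
    KeyConditionExpression=(
        Key("org_id").eq("some_id")
        & Key("ts").between(START_DATE, END_DATE)
    ),
    ProjectionExpression="ts,val",  # only attributes projected into the GSI are returned
    Limit=50,
)
items = response["Items"]

Because GSI key values don't have to be unique, repeated ts values are not a problem for such an index.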

Related

Regarding suggestion of best schema for a cassandra table?

I want to have a table in Cassandra that has a partition key, say column 'A', and a column, say 'B', which is of 'set' type and can have up to 10,000 elements in the set.
But when I retrieve a row from this table, the whole set is retrieved at once and because of that the JVM heap increases rapidly. So should I stick with this schema, or go with another schema where 'A' is the partition key and I make dynamic columns for each element of the set, say 'B1', 'B2', ..., 'B10000', where each of these columns is a clustering key?
Which schema is best suited and will give optimal performance? Please recommend.
NOTE: cqlsh 5.0.1v
Based off of what you've described, and the documentation I've read, I would not create a collection with 10k elements. Instead I would have two tables, one with everything but the collection, and then use the primary key values of the first table, as the partition key columns of the second table; adding the element name (or whatever you can use to identify an individual element) as a clustering column.
So for a given query, if you wanted everything for a particular primary key value (including all elements), you'd query the first table with the primary key, grab whatever you need, then hit the second table as well, looping/fetching through all elements.
If the query only filters on the partition key (not the full primary key, i.e. it retrieves multiple rows), the first query would have to retrieve the columns that make up the primary key for each row, and then you would query the second table in a nested loop: an outer loop over each primary key record retrieved from the first table, and an inner loop to grab all elements for each of those records.
That's probably the best way to go, and how I would tackle it.
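A rough sketch of that two-table layout with the Python driver; the keyspace, table, and column names here are illustrative, not from the question:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholder keyspace

# Table 1: everything except the large collection.
session.execute("""
    CREATE TABLE IF NOT EXISTS things (
        a int PRIMARY KEY,
        other_data text
    )
""")

# Table 2: one row per element, keyed by the same partition key,
# with the element identifier as a clustering column.
session.execute("""
    CREATE TABLE IF NOT EXISTS thing_elements (
        a int,
        element text,
        PRIMARY KEY (a, element)
    )
""")

# For a given 'A' value: fetch the row, then iterate over its elements.
row = session.execute("SELECT * FROM things WHERE a = %s", (42,)).one()
for r in session.execute("SELECT element FROM thing_elements WHERE a = %s", (42,)):
    print(r.element)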
Does that make sense?
-Jim

Query Dynamodb using columns other than partition and sort keys

I have a DynamoDB table named "client" with the following columns:
- userId(partition key)
- clientId(sort key)
- status(true/false).
I would like to get all the records from the "client" table with status="true" using node.js.
You can't query without a key.
If you want to query by status, you will have to create a global secondary index with 'status' as its partition key (which, like everything else in AWS, you pay extra for).
But unless you drop the columns you don't need from the index's projection, it won't be much faster than a full scan of the table, because status contains only two values...
You can read about it in
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general.html
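The question asks for node.js, but the trade-off is the same in any SDK; here is a boto3 sketch for illustration, assuming an index named "status-index" and that status is stored as a string, since key attributes cannot be booleans:

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("client")

# Option 1: query a GSI whose partition key is status.
# "status-index" is a made-up index name; the attribute would have to be
# a string or number, because boolean attributes cannot be index keys.
by_index = table.query(
    IndexName="status-index",
    KeyConditionExpression=Key("status").eq("true"),
)

# Option 2: a full scan with a filter - the whole table is read (and billed),
# which, for a two-valued attribute, is roughly what the index costs you anyway.
by_scan = table.scan(FilterExpression=Attr("status").eq("true"))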

Data Model for One To Many - Itemcontainer in Cassandra

I have two CFs, "ItemContainer" and "Items".
I used to have a secondary index in "Items" referring to the "Itemcontainer".
Something like:
CREATE table items (key uuid primary key, container uuid, slot int ....
CREATE INDEX items_container ON items(container)
I change the "container" cell quite often when changing the itemcontainer.
The documentation says that a secondary index shouldn't be used in this case.
So I tried something like:
primary key(container, key)
in items. Now I can query all items for an itemcontainer just fine,
but how do I put the item in another itemcontainer?
You can't override parts of the primary key.
So do I really have to delete the item and reinsert all the data with a different "container" field?
Doesn't this create a lot of tombstones?
Also "Items" has about 20 columns with maps and lists and everything...
any ideas?
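For reference, the delete-and-reinsert move described above could look roughly like this with the Python driver; the column set is trimmed to the ones mentioned in the question, and the logged batch is just one possible way to pair the two statements:

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholder keyspace

def move_item(item, new_container):
    """Move a row to another container by deleting it from the old partition
    and re-inserting it under the new one; 'item' is assumed to be a
    previously fetched row with container/key/slot attributes."""
    batch = BatchStatement()
    batch.add(
        "DELETE FROM items WHERE container = %s AND key = %s",
        (item.container, item.key),
    )
    batch.add(
        "INSERT INTO items (container, key, slot) VALUES (%s, %s, %s)",
        (new_container, item.key, item.slot),
    )
    session.execute(batch)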

Is It Possible To BatchGet Multiple Items By Partition Key Only DynamoDB

I have items with ItemIDs and Paths. ItemID is the partition key and Path is the range key. If I have multiple ItemIDs I want to query, but don't want to include the range key, is it possible to do it with batchGet, or will I have to use Query for each of the ItemIDs? I have tried batchGet but get the error "The provided key element does not match the schema".
No, it is not possible to get the items based on the partition key only. The BatchGetItem API requires both the partition and the range key.
Keys - An array of primary key attribute values that define specific items in the table. For each primary key, you must provide all of the key attributes. For example, with a simple primary key, you only need to provide the partition key value. For a composite key, you must provide both the partition key value and the sort key value.
However, you can use Query API to get the data by partition key only.
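A sketch of the query-per-ItemID approach with boto3 (the table name is a placeholder; ItemID is the attribute name from the question):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("my-table")  # placeholder table name

def items_for_ids(item_ids):
    """One Query per partition key value; each query returns every Path
    (range key) stored under that ItemID. Pagination is omitted here."""
    results = []
    for item_id in item_ids:
        resp = table.query(KeyConditionExpression=Key("ItemID").eq(item_id))
        results.extend(resp["Items"])
    return results

print(items_for_ids(["id-1", "id-2"]))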

Why does Cassandra/CQL restrict WHERE clauses on columns that are not indexed?

I have a table as follows in Cassandra 2.0.8:
CREATE TABLE emp (
empid int,
deptid int,
first_name text,
last_name text,
PRIMARY KEY (empid, deptid)
)
When I try to search by: "select * from emp where first_name='John';"
cql shell says:
"Bad Request: No indexed columns present in by-columns clause with Equal operator"
I searched for the issue, and every place says to add a secondary index for the column 'first_name'.
But I need to know the exact reason why that column needs to be indexed.
The only thing I can figure out is performance.
Any other reasons?
Cassandra does not support searching by an arbitrary column, because doing so would involve scanning all the rows.
The data are internally organised into something which one can compare to HashMap[X, SortedMap[Y, Z]]. The key of the outer map is a partition key value, and the key of the inner map is a kind of concatenation of all clustering column values and the name of some regular column.
Unless you have an index on a column, you need to provide the full (preferred) or a partial path to the data you want to collect with the query. Therefore, you should design your schema so that queries contain a partition key value and, possibly, a range on the clustering columns.
You may read about what is allowed and what is not here
Alternatively you can create an index in Cassandra, but that will hamper your write performance.
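To make the restriction concrete, here is a small sketch with the Python driver against the emp table above (the connection details are placeholders); only queries that pin down the partition key are accepted without an index:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholder keyspace

# Allowed: the partition key (empid) is specified, so Cassandra knows
# exactly which node and partition to read.
session.execute("SELECT * FROM emp WHERE empid = %s", (1,))
session.execute("SELECT * FROM emp WHERE empid = %s AND deptid = %s", (1, 10))

# Rejected without an index: first_name is a regular column, so answering
# this would mean scanning every row in the cluster.
# session.execute("SELECT * FROM emp WHERE first_name = %s", ("John",))

# The alternative mentioned in the answer, at the cost of extra write work:
# session.execute("CREATE INDEX ON emp (first_name)")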

Resources