Query DynamoDB using columns other than partition and sort keys - node.js

I have a DynamoDB table named "client" with the following columns:
- userId(partition key)
- clientId(sort key)
- status(true/false).
I would like to get all the records from the "client" table with status="true" using node.js.

You can't query without a key.
If you want to query by status, you will have to create a global secondary index (GSI) with 'status' as its partition key (which costs extra, like everything else in AWS).
But unless you leave the attributes you don't need out of the index projection (the result), it won't be much faster than a full scan of the table, because status contains only two values.
You can read about it in
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general.html
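As a rough sketch of what the index-based query could look like, the snippet below only builds the request parameters for a DynamoDB Query against a GSI; it does not call AWS. The index name status-index is hypothetical, and the sketch assumes status is stored as the string "true", since DynamoDB key attributes must be of type String, Number, or Binary. Python is used for illustration; the low-level query call in the JavaScript SDK takes the same parameter names.

```python
def build_status_query(table_name, index_name, status):
    """Build low-level DynamoDB Query parameters for a GSI keyed on status.

    Nothing is sent to AWS here; the dict would be passed to the SDK's
    query call (for example client.query(**params) in boto3).
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,  # hypothetical GSI with status as partition key
        # "status" is a DynamoDB reserved word, so alias it with #s.
        "KeyConditionExpression": "#s = :status",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":status": {"S": status}},
        # Ask only for the attributes that are needed, to keep reads small.
        "ProjectionExpression": "userId, clientId",
    }


params = build_status_query("client", "status-index", "true")
```

The attributes named in ProjectionExpression have to be projected into the index; the table's own key attributes (userId and clientId here) always are.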

Related

Regarding suggestion of best schema for a cassandra table?

I want to have a table in Cassandra that has a partition key say column 'A', and a column say 'B' which is of 'set' type and can have up to 10000 elements in the set.
But when I retrieve a row from this table, the whole set is retrieved at once, and because of that the JVM heap grows rapidly. So should I stick to this schema, or go with another schema where 'A' is the partition key and each element of the set gets its own dynamic column, say 'B1', 'B2' ... 'B10,000', where each of these columns is a clustering key?
Which schema is best suited and will give optimal performance? Please recommend.
NOTE: cqlsh 5.0.1v
Based off of what you've described, and the documentation I've read, I would not create a collection with 10k elements. Instead I would have two tables, one with everything but the collection, and then use the primary key values of the first table, as the partition key columns of the second table; adding the element name (or whatever you can use to identify an individual element) as a clustering column.
So for a given query, if you wanted everything for a particular primary key value (including all elements), you'd query the first table with the primary key, grab whatever you need, then hit the second table as well, looping/fetching through all elements.
If the query only provides a filter on the partition key (not the primary key - i.e. retrieving multiple rows) , the first query would have to retrieve all columns that make up the primary key for each row, and then query the second table looping for all elements - nested loop here - one loop for each primary key record retrieved from the first table, and a second loop to grab all elements for each pk record.
That's probably how I would tackle this.
Does that make sense?
-Jim
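The two-table layout described above can be modelled in memory to show the access pattern. This is a toy sketch, not Cassandra driver code: the dictionaries stand in for the two tables, and the key "a1", the element names, and the page size of 1000 are all made up for illustration.

```python
from itertools import islice

# Table 1: everything except the collection, keyed by the partition key A.
main_table = {"a1": {"name": "first row"}}

# Table 2: one row per element; A is the partition key and the element
# value is the clustering column, so elements stay sorted per partition.
elements_table = {"a1": sorted(f"elem-{i:05d}" for i in range(10000))}


def iter_element_pages(table, key, page_size):
    """Yield the elements of one partition in fixed-size pages, mimicking
    paged fetching so the caller handles one page at a time."""
    iterator = iter(table.get(key, []))
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# First query: the row itself. Second query: its elements, page by page.
row = main_table["a1"]
page_sizes = [len(page) for page in iter_element_pages(elements_table, "a1", 1000)]
```

The point of the split is that reading the row no longer drags all 10,000 elements along with it; the elements are read separately and in bounded chunks.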

Can I query both the Table and the Global Secondary Index in KeyConditionExpression

I have a table with a Hash Key on a column called org_id and a Global Secondary Index on a column called ts. I need to run a query against the table matching the condition below, but I am getting the error Query key condition not supported. I can't use "ts" as a Sort Key because there might be repeated values.
So I wanted to know whether it is possible to query both the index and the table in a single condition, as I have done below.
KeyCondition = (
    Key("org_id").eq("some_id")
    & Key("ts").between(START_DATE, END_DATE)
)
ProjectionExpression = "ts,val"
response = GET_TABLE.query(
    TableName=DYNAMO_TABLE_NAME,
    IndexName="ts-index",
    KeyConditionExpression=KeyCondition,
    ProjectionExpression=ProjectionExpression,
    Limit=50
)
It isn't possible to access base table attributes from a GSI query. You have to project the attributes you need into the GSI.
You can project other base table attributes into the index if you want. When you query the index, DynamoDB can retrieve these projected attributes efficiently. However, global secondary index queries cannot fetch attributes from the base table.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
Note that the "primary key" of a GSI doesn't need to be unique.
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
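One plausible reading of the Query key condition not supported error is that the key condition refers to org_id, which is not part of the ts-index key, and applies a range condition to ts, the index's partition key, where a Query only accepts equality. The sketch below builds the parameters for a query that fits those rules. It assumes a hypothetical GSI named org_id-ts-index with org_id as partition key and ts as sort key, and it assumes both are stored as strings; the table name and sample values are made up too. It only builds the request dictionary and does not call AWS.

```python
def build_range_query(table_name, index_name, org_id, start, end, limit=50):
    """Build low-level Query parameters: equality on the index partition key
    plus a BETWEEN range on the index sort key."""
    return {
        "TableName": table_name,
        "IndexName": index_name,  # hypothetical GSI: org_id (partition), ts (sort)
        "KeyConditionExpression": "#org = :org AND #ts BETWEEN :start AND :end",
        # Placeholders avoid any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#org": "org_id", "#ts": "ts", "#val": "val"},
        "ExpressionAttributeValues": {
            ":org": {"S": org_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
        # val must be projected into the index; key attributes such as ts always are.
        "ProjectionExpression": "#ts, #val",
        "Limit": limit,
    }


params = build_range_query(
    "my-table", "org_id-ts-index", "some_id", "2020-01-01", "2020-01-31"
)
```

Because key values in a GSI do not need to be unique, repeated ts values are not an obstacle to using ts as the index sort key.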

Filter on the partition and the clustering key with an additional criteria

I want to filter a table that has a partition key and a clustering key using an additional criterion on a regular column. I got the following warning.
InvalidQueryException: Cannot execute this query as it might involve
data filtering and thus may have unpredictable performance. If you
want to execute this query despite the performance unpredictability,
use ALLOW FILTERING
I understand the problem if the partition and the clustering key are not used. In my case, is it a relevant error or can I ignore it?
Here is an example of the table and query.
CREATE TABLE mytable(
    name text,
    id uuid,
    deleted boolean,
    PRIMARY KEY((name),id)
);
SELECT id FROM mytable WHERE name='myname' AND id='myid' AND deleted=false;
In Cassandra you can't filter on a non-primary-key column unless you create an index on it.
In Cassandra 3.0 or later, filtering on a non-primary-key column is allowed, but with unpredictable performance.
In Cassandra 3.0 or later, if you provide the full primary key (as in your query), you can run the query with ALLOW FILTERING and ignore the warning.
Otherwise, filter on the client side, or remove the deleted field and create another table:
Instead of updating the field to deleted = true, move the row to another table, say mytable_deleted.
CREATE TABLE mytable_deleted (
    name text,
    id uuid,
    PRIMARY KEY (name, id)
);
Now mytable holds only the non-deleted data and mytable_deleted holds the deleted data.
Or
Create an index on it:
The column deleted is a low-cardinality column, so keep the following in mind:
A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. You can avoid a performance hit when looking for a row in a large partition by narrowing the search.
Read More : When not to use an index

Azure Table Storage - Retrieving all entities matching a partial row key

I am just learning Azure Table Storage and I'm able to save and retrieve entities without any problem. However, I'd like to do the following. Say I have row keys (all with the same partition key) that look as follows:
KJV-C1-V1
KJV-C1-V2
KJV-C1-V3
KJV-C2-V1
KJV-C2-V2
KJV-C2-V3
I'd like to be able to perform two types of queries in .NET C#:
Retrieve all entities with row keys that start with 'KJV-C1'.
Retrieve all entities with row keys that contain '-C1-' in the key
Preferably I'd like to be able to do this without reading all entities in the partition and filtering out the ones that don't match the pattern I'm looking for. Is this possible with Azure Table Storage queries?
Retrieve all entities with row keys that start with 'KJV-C1'.
This is possible. Sample OData query:
PartitionKey eq 'your partition key' and (RowKey ge 'KJV-C1' and RowKey lt 'KJV-C2')
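The upper bound in the filter above is simply the prefix with its last character bumped to the next one. A small sketch of that rule follows, written in Python for illustration rather than C#. The helper names are made up, and it assumes plain ASCII keys that contain no single quotes (which would need escaping in a real filter string).

```python
def prefix_upper_bound(prefix):
    """Return the smallest string that sorts after every key starting with
    prefix: the prefix with its last character replaced by the next one."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_range_filter(partition_key, prefix):
    """Build an OData filter string for row keys that start with prefix."""
    upper = prefix_upper_bound(prefix)
    return (
        f"PartitionKey eq '{partition_key}' and "
        f"(RowKey ge '{prefix}' and RowKey lt '{upper}')"
    )


def matches_prefix_range(row_key, prefix):
    """Apply the same bounds locally with ordinary string comparison."""
    return prefix <= row_key < prefix_upper_bound(prefix)


row_keys = [
    "KJV-C1-V1", "KJV-C1-V2", "KJV-C1-V3",
    "KJV-C2-V1", "KJV-C2-V2", "KJV-C2-V3",
]
selected = [key for key in row_keys if matches_prefix_range(key, "KJV-C1")]
odata_filter = prefix_range_filter("your partition key", "KJV-C1")
```

Note that a plain prefix match on 'KJV-C1' would also pick up keys such as 'KJV-C10-V1'; using the prefix 'KJV-C1-' avoids that if such keys exist.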
Retrieve all entities with row keys that contain '-C1-' in the key
This unfortunately is not possible. You would have to fetch all entities and filter the data on the client side.
You cannot do something like contains() over keys. But since it supports the CompareTo("") method, you can get what you need by slightly modifying your table design.
Maintain multiple partition keys instead of a single one. You can simply move the 'KJV' part of your row key into the partition key, and then use C1-V1, C1-V2 and so on as your row keys.
Then, if you want
All entries of KJV - Query for partition key 'KJV'
All 'C1' entries of KJV - Query for partition key 'KJV' and row key starting with 'C1-'
All entries for C1 - Query for row key starting with 'C1-' (with no partition key in the filter, this scans across all partitions)
Or, without changing the table design, loop through your top-level prefixes such as 'KJV', build one query per prefix for row keys starting with e.g. 'KJV-C1-', then union the results to get the final result.
Please keep in mind that table storage does not support all LINQ operations, and you sometimes need to design the table keys with the majority of your queries in mind.

Primary Key related CQL3 Queries cases & errors when sorting

I have two issues while querying Cassandra:
Query 1
> select * from a where author='Amresh' order by tweet_id DESC;
Order by with 2ndary indexes is not supported
What I learned: secondary indexes are made to be used only with a WHERE clause and not ORDER BY? If so, then how can I sort?
Query 2
> select * from a where user_id='xamry' ORDER BY tweet_device DESC;
Order by currently only supports the ordering of columns following their
declared order in the PRIMARY KEY.
What I learned: The ORDER BY column should be in the 2nd place in the primary key, maybe? If so, then what if I need to sort by multiple columns?
Table:
CREATE TABLE a(
    user_id varchar,
    tweet_id varchar,
    tweet_device varchar,
    author varchar,
    body varchar,
    PRIMARY KEY(user_id,tweet_id,tweet_device)
);
INSERT INTO a (user_id, tweet_id, tweet_device, author, body)
VALUES ('xamry', 't1', 'web', 'Amresh', 'Here is my first tweet');
INSERT INTO a (user_id, tweet_id, tweet_device, author, body)
VALUES ('xamry', 't2', 'sms', 'Saurabh', 'Howz life Xamry');
INSERT INTO a (user_id, tweet_id, tweet_device, author, body)
VALUES ('mevivs', 't1', 'iPad', 'Kuldeep', 'You der?');
INSERT INTO a (user_id, tweet_id, tweet_device, author, body)
VALUES ('mevivs', 't2', 'mobile', 'Vivek', 'Yep, I suppose');
Create index user_index on a(author);
To answer your questions, let's focus on your choice of primary key for this table:
PRIMARY KEY(user_id,tweet_id,tweet_device)
As written, the user_id will be used as the partition key, which distributes your data around the cluster but also keeps all of the data for the same user ID on the same node. Within a single partition, unique rows are identified by the pair (tweet_id, tweet_device) and those rows will be automatically ordered by tweet_id because it is the second column listed in the primary key. (Or put another way, the first column in the PK that is not a part of the partition key determines the sort order of the partition.)
Query 1
The WHERE clause is author='Amresh'. Note that this clause does not involve any of the columns listed in the primary key; instead, it is filtering using a secondary index on author. Since the WHERE clause does not specify an exact value for the partition key column (user_id), using the index involves scanning all cluster nodes for possible matches. Results cannot be sorted when they come from more than one replica (node) because that would require holding the entire result set on the coordinator node before it could return any results to the client. The coordinator can't know which row is the real "first" result until it has confirmed that it has received and sorted every possible matching row.
If you need the information for a specific author name, separate from user ID, and sorted by tweet ID, then consider storing the data again in a different table. The data design philosophy with Cassandra is to store the data in the format you need when reading it and to actually denormalize (store redundant information) as necessary. This is because in Cassandra, writes are cheap (though it places the burden of managing multiple copies of the same logical data on the application developer).
Query 2
Here, the WHERE clause is user_id = 'xamry' which happens to be the partition key for this table. The good news is that this will go directly to the replica(s) holding this partition and not bother asking the other nodes. However, you cannot ORDER BY tweet_device because of what I explained at the top of this answer. Cassandra stores rows (within a single partition) sorted by the clustering columns in their declared order, starting with the second column in the primary key. In your case, you can access data for user_id = 'xamry' ORDER BY tweet_id but not ordered by tweet_device alone. The answer, if you really need the data sorted by device, is the same as for Query 1: store it in a table where that is the second column in the primary key.
If, when looking up the tweets by user_id you only ever need them sorted by device, simply flip the order of the last two columns in your primary key. If you need to be able to sort either way, store the data twice in two different tables.
The Cassandra storage engine does not offer multi-column sorting other than the order of columns listed in your primary key.
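To make the ordering rule concrete, here is a toy in-memory model written in Python. It is not Cassandra code: build_table is a made-up helper that groups rows by partition key and keeps each partition sorted by its clustering columns in declared order, which is the behaviour described above. The rows are the ones from the question.

```python
from collections import defaultdict

rows = [
    {"user_id": "xamry", "tweet_id": "t1", "tweet_device": "web", "author": "Amresh"},
    {"user_id": "xamry", "tweet_id": "t2", "tweet_device": "sms", "author": "Saurabh"},
    {"user_id": "mevivs", "tweet_id": "t1", "tweet_device": "iPad", "author": "Kuldeep"},
    {"user_id": "mevivs", "tweet_id": "t2", "tweet_device": "mobile", "author": "Vivek"},
]


def build_table(rows, partition_key, clustering_columns):
    """Group rows by partition key and keep each partition sorted by its
    clustering columns, in declared order (a toy model, not Cassandra)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition in partitions.values():
        partition.sort(key=lambda r: tuple(r[c] for c in clustering_columns))
    return dict(partitions)


# Original layout: PRIMARY KEY(user_id, tweet_id, tweet_device)
by_tweet = build_table(rows, "user_id", ["tweet_id", "tweet_device"])
# Flipped clustering columns: PRIMARY KEY(user_id, tweet_device, tweet_id)
by_device = build_table(rows, "user_id", ["tweet_device", "tweet_id"])
# Denormalized copy for Query 1: PRIMARY KEY(author, tweet_id)
by_author = build_table(rows, "author", ["tweet_id"])

tweet_order = [r["tweet_id"] for r in by_tweet["xamry"]]
device_order = [r["tweet_device"] for r in by_device["xamry"]]
```

Each "table" holds the same rows; only the key layout differs, and that layout alone decides which sort order is available within a partition.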
