Which is the correct way to scan a table on DynamoDB? - node.js

As the title says, I want to know which is the best way to scan a table in Amazon DynamoDB, searching by another field than the primary key.
I searched about this and read a lot, but I found this solution for me:
let DynamoDBServiceObj = new AWS.DynamoDB({apiVersion: '2012-08-10'});
let params = {
ExpressionAttributeValues: {
':hash' : { S: req.param('wildcard') }
},
ProjectionExpression: 'directory',
FilterExpression: 'qrCode = :hash',
TableName: 'business'
};
let business = await DynamoDBServiceObj.scan(params).promise();
if (business.Count == 1) return res.ok();
else return res.view('404');
This works for me, but I also read that perform an scan on a table is a bad idea, for performance and pricing. But, how to do it then?
Which is the correct way to scan a table, searching by another than the primary key?
What is the difference between DocumentClient and DynamoDB Object?
I always use .get() for obtain what I want on DynamoDB. Is this a good or a bad practice?
I read these posts, and I suppose that GSI is the solution, but I don't understand how it works.
Global Secondary Indexes (GSI)
DynamoDB: Scan on multiple non key attribute
Step 4: Query and Scan the table
Querying and Scanning a DynamoDB Table
What is the difference between scan and query in dynamodb?
How to fetch/scan all items from AWS dynamodb using node.js

Like you said, scanning the table is not a great idea and you have already read about it. I would suggest two things.
Use a composite primary key (if you're not doing so yet). Using the combination of partition key and sort key gives you more possibilities to query (and not scan) your table depending on your frequent access patterns.
If you still need to query the table by an attribute other than the ones included in your composite primary key, you are right that the GSI is the solution. You can check this post on how the GSI works. Choose primary index for Global secondary index
You can think of a GSI as a copy of your table with a different primary key.

Related

How can I query Cassandra with GraphQL using a non-primary key column?

I am using GraphQL for Cassandra database operation. The following search query work perfectly when filtering the column with partition key:
query oneUsers{
users(value: { username:"username" }) {
values {
id
name
username
}
}
}
But getting following errors when trying to search other columns:
graphql: Exception while fetching data (/users) : org.apache.cassandra.stargate.exceptions.InvalidRequestException: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
What is the best way to search columns in Cassandra using GraphQL?
You can only retrieve records from Cassandra by specifying (a) the partition key, or (b) the primary key column(s).
If you would like to filter by non-primary key columns, you will need to create an index on the column. For example if you want to filter by id, create an index with:
mutation createIndexes {
users: createIndex(
keyspaceName:"myks",
tableName:"users",
columnName:"id",
indexName:"id_idx"
)
}
You should then be able to query by id.
For details and more examples, see Developing with the Astra DB GraphQL API. Cheers!

How To Get Item From DynamoDB Based On Multiple (one primary) Attributes Lambda/NodeJS

My table structure in DynamoDB looks like the following:
uuid (Primary Key) | ip | userAgent
From within a NodeJS function inside of lambda, I would like to get the uuid of an item whose ip and useragent match the information I provide.
Scan becomes less and less efficient and more expensive over time as millions of items are added to the table every week.
Here is the code I am using to try and accomplish this:
function tieDown(sIP, uA){
const userQuery = {
Key : {
"ip" : "192.168.0.1",
"userAgent" : "sample"
},
TableName: "mytable"
};
return ddb.get(userQuery, function(err, data){
if (err) console.log(err.stack);
}).promise();
}
When this code executes, the following error is thrown ValidationException: The provided key element does not match the schema.
So I guess my questions are:
Is it even possible to get one specific item based on non-primary attributes
Are there any issues with the code sample I provided that could lead to this error being thrown? (I'm using DocumentClient so no need to explicitly declare strings, numbers etc.
Thanks!
You cannot get a single item using the get operation without specifying the partition key, and sort key if you have one. Scans should be avoided in most cases. What you probably need is a Global Secondary Index that allows you to query by ip and userAgent. Keep in mind that the records on a GSI are not guaranteed unique, so you may get more than one result.

DynamoDB begins with not returning expected results

I'm using NodeJS and DynamoDB. I'm never used DynamoDB before, and primary a C# developer (where this would simply just be a .Where(x => x...) call, not sure why Amazon made it any more complicated then that). I'm trying to simply just query the table based on if an id starts with certain characters. For example, we have the year as the first 2 characters of the Id field. So something like this: 180192, so the year is 2018. The 20 part is irrelevant, just wanted to give a human readable example. So the Id starts with either 18 or 17 and I simply want to query the db for all rows that Id starts with 18 (for example, could be 17 or whatever). I did look at the documentation and I'm not sure I fully understand it, here's what I have so far that is just returning all results and not the expected results.
let params = {
TableName: db.table,
ProjectionExpression: "id,CompetitorName,code",
KeyConditionExpression: "begins_with(id, :year)",
ExpressionAttributeValues: {
':year': '18'
}
return db.docClient.scan(params).promise();
So as you can see, I'm thinking that this would be a begins_with call, where I look for 18 against the Id. But again, this is returning all results (as if I didn't have KeyConditionExpression at all).
Would love to know where I'm wrong here. Thanks!
UPDATE
So I guess begin_with won't work since it only works on strings and my id is not a string. As per commenters suggestion, I can use BETWEEN, which even that is not working either. I either get back all the results or Query key condition not supported error (if I use .scan, I get back all results, if I use .query I get the error)
Here is the code I'm trying.
let params = {
TableName: db.table,
ProjectionExpression: "id,CompetitorName,code",
KeyConditionExpression: "id BETWEEN :start and :end",
ExpressionAttributeValues: {
':start': 18000,
':end': 189999
}
};
return db.docClient.query(params).promise();
It seems as if there's no actual solution for what I was originally trying to do unfortunately. Which is a huge downfall of DynamoDB. There really needs to be some way to do 'where' using the values of columns, like you can in virtually any other language. However, I have to admit, part of the problem was the way that id was structured. You shouldn't have to rely on the id to get info out of it. Anyways, I did find another column DateofFirstCapture which using with contains (all the dates are not the same format, it's a mess) and using a year 2018 or 2017 seems to be working.
if you want to fetch data by id, add it as the partition key. If you want to get data by part of the string, you can use "begins with" on sort key.
begins_with (a, substr)— true if the value of attribute a begins with a particular substring.
source: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/Query.html
begins_with and between can only be used on sort keys.
For query you must always supply partition key.
So if you change your design to have unique partition key (or unique combo of partition/sort keys) and strings like 180192 as sort key you will be able to query begins_with(sortkey, ...).

Cosmos DB succeeds and fails on randomly on the same query, saying they are cross partition when they aren't

I have a collection with the partition key "flightConversationId".
I am doing a very simple query, BY THE PARTITON KEY FIELD
SELECT * from root WHERE root.flightConversationId="b36d13c0-cbec-11e7-a4ad-8fcedf370f98"
When doing this query via the nodeJS SDK, it will work one second, and fail the next with the error:
Cross partition query is required but disabled. Please set x-ms-documentdb-query-enablecrosspartition to true, specify x-ms-documentdb-partitionkey, or revise your query to avoid this exception.
I realize I could enable cross-partition querying, but I do not need cross partition queries. What is going on???
This situation seemed to resolve itself over time.
My theory is that when we deleted a collection and recreated it with a new partition key, it took a long time for all remnants of the original collection to really be deleted from the cloud, and that some requests were going to the "old" collection that had the same name as the "new".
You have to explicitly scope the query to a partition by providing an FeedOptions or RequestOptions class with a partitionKey property. Using the PartitionKey in your where clause isn't enough without that explicit scope. This is for C# but should be same object model:
https://learn.microsoft.com/en-us/azure/cosmos-db/documentdb-partition-data
Document result = await client.ReadDocumentAsync(
UriFactory.CreateDocumentUri("db", "coll", "XMS-001-FE24C"),
new RequestOptions { PartitionKey = new PartitionKey("XMS-0001") });
jsDoc:
http://azure.github.io/azure-documentdb-node/global.html#RequestOptions
http://azure.github.io/azure-documentdb-node/global.html#FeedOptions

flask dyanmo query for count fileds

consider this schema in dyanmo db,we count of question
[
{
'TableName': "user_detail",
'KeySchema': [
{'AttributeName': "timestamp", 'KeyType': "HASH"},
{'AttributeName': "question", 'KeyType': "RANGE"},
],
'AttributeDefinitions': [
{'AttributeName': "timestamp", 'AttributeType': "S"},
{'AttributeName': "question", 'AttributeType': "N"},
],
'ProvisionedThroughput': {
'ReadCapacityUnits': 40,
'WriteCapacityUnits': 40] }
}
]
I'm beginner of dyanmo db can any one give idea for that one.we need query,the sql query goes like that select count(question) from user_detail where question =1
Thanks in advance
I will throw some pointers. DynamoDB has two types of APIs :-
Option 1:-
1) Scan API - will scan the whole table. The scan api should be used when the hash key value is not known
2) Query API - will query the table using hash key. The hash key is must for Query API
In your case, the hash key value is not known. So, you can't use Query API. However, you can use scan API which is a very costly operation in terms of performance and cost. So, it should be avoided if you have a table of millions of items.
The alternative is to create global secondary index (GSI) with question attribute as hash key and some other field as sort key (possibly timestamp). This way, you should be able to use Query API on GSI. However, this wouldn't solve the problem completely.
DynamoDB doesn't have aggregate functions like count,min and max. So, you need to count the number of items in the result set at client side.
Option 2:-
If you have an option to change the data model, you can redesign the above table as mentioned below:-
question - hash key
timestamp - range key
I have seen many use cases using timestamp as range key. Please analyse your query access patterns (QAP) for all your use cases and make the decision accordingly.

Resources