Consider this schema in DynamoDB. We need a count of question:
[
{
'TableName': "user_detail",
'KeySchema': [
{'AttributeName': "timestamp", 'KeyType': "HASH"},
{'AttributeName': "question", 'KeyType': "RANGE"},
],
'AttributeDefinitions': [
{'AttributeName': "timestamp", 'AttributeType': "S"},
{'AttributeName': "question", 'AttributeType': "N"},
],
'ProvisionedThroughput': {
'ReadCapacityUnits': 40,
'WriteCapacityUnits': 40 }
}
]
I'm a beginner with DynamoDB. Can anyone give me an idea of how to do this? We need a query equivalent to this SQL: select count(question) from user_detail where question = 1
Thanks in advance
Here are some pointers.
Option 1:-
DynamoDB has two APIs for reading multiple items:
1) Scan API - reads the whole table. Use it when the hash key value is not known.
2) Query API - reads the items stored under one hash key. The hash key value is mandatory for the Query API.
In your case the hash key (timestamp) value is not known, so you can't use the Query API on the table. You can use the Scan API, but it is expensive in both latency and consumed read capacity, so avoid it if the table has millions of items.
The alternative is to create a global secondary index (GSI) with the question attribute as hash key and some other field as sort key (possibly timestamp). This way you can use the Query API on the GSI. However, this doesn't solve the problem completely.
DynamoDB doesn't have SQL-style aggregate functions like count, min and max, so the counting has to be done on the client side: either count the items in the result set, or request Select=COUNT and add up the Count value of each result page.
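A minimal sketch of that client-side counting in Python, assuming a GSI keyed on question already exists. The index name "question-index" is hypothetical, and client is expected to be a boto3 DynamoDB client (or anything with a compatible query method). Select=COUNT still reads every matching item, it only avoids sending the items back.

```python
from typing import Any, Dict, Optional


def count_by_question(client: Any, question: int,
                      table: str = "user_detail",
                      index: str = "question-index") -> int:
    """Add up the Count of every result page of a Select=COUNT query."""
    total = 0
    start_key: Optional[Dict[str, Any]] = None
    while True:
        kwargs: Dict[str, Any] = {
            "TableName": table,
            "IndexName": index,  # hypothetical GSI with "question" as hash key
            "KeyConditionExpression": "#q = :q",
            "ExpressionAttributeNames": {"#q": "question"},
            # "question" is declared with AttributeType "N" in the schema
            "ExpressionAttributeValues": {":q": {"N": str(question)}},
            "Select": "COUNT",
        }
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        page = client.query(**kwargs)
        total += page["Count"]
        # A missing LastEvaluatedKey means this was the last page
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return total
```

With boto3 installed and credentials configured, the call would look like count_by_question(boto3.client("dynamodb"), 1).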
Option 2:-
If you have an option to change the data model, you can redesign the above table as mentioned below:-
question - hash key
timestamp - range key
I have seen many use cases using timestamp as range key. Please analyse your query access patterns (QAP) for all your use cases and make the decision accordingly.
Related
I am using GraphQL for Cassandra database operations. The following search query works perfectly when filtering on the partition key column:
query oneUsers{
users(value: { username:"username" }) {
values {
id
name
username
}
}
}
But I get the following error when trying to search on other columns:
graphql: Exception while fetching data (/users) : org.apache.cassandra.stargate.exceptions.InvalidRequestException: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
What is the best way to search columns in Cassandra using GraphQL?
You can only retrieve records from Cassandra by specifying (a) the partition key, or (b) the primary key column(s).
If you would like to filter by non-primary key columns, you will need to create an index on the column. For example, if you want to filter by id, create an index with:
mutation createIndexes {
users: createIndex(
keyspaceName:"myks",
tableName:"users",
columnName:"id",
indexName:"id_idx"
)
}
You should then be able to query by id.
For details and more examples, see Developing with the Astra DB GraphQL API. Cheers!
I would like to return items where a nested key exists. I have the following table:
"users": [
{
"active": true,
"apps": {
"app-name-1": {
"active": true,
"group": "aaaaaaaaa",
"settings": {}
}
},
"username": "user1"
},
{
"active": true,
"apps": {
"app-name-2": {
"active": true,
"group": "bbbbbb",
"settings": {}
}
},
"username": "user2"
}
]
So I want to return all users that have "app-name-1" under "apps". Which operation is the best for this purpose?
The question you need to ask yourself isn't just about the "operation", but also about how you model your data in DynamoDB. I.e., how does the JSON array you showed translate into a DynamoDB table, with hash and sort keys?
While DynamoDB nominally does support nested attributes, this support is only partial, with some features (notably secondary indexes) not supporting them, so as I'll show now it is better not to use them. To model your data without nested attributes, you can use a hash key "username" and a sort key "appname". Each item in this table is one app belonging to one user. The user's "active" flag is a bit of a problem in this modeling, but you can implement it by using a fake appname for storing such user parameters.
This modeling makes it efficient to list all applications belonging to one user (I assume you need this feature as well...) but not all users with a certain application. But you were looking for the reverse operation - to get a list of users given an app name.
You can get this reverse with a Scan operation but this is a full-table scan, and accordingly can be very slow and expensive (you'll be paying to read the entire database, even if only part of the data is actually returned to the user).
If efficient search by app is important, you should create a secondary index (GSI) whose hash key is app-name and sort key is user (i.e., the opposite key order from that of the base table). You can then query this index to get - efficiently - the list of usernames that have this app.
Note that such a GSI wouldn't have been possible if you were to insist on modeling your "user" item with nested attributes, because GSIs don't support nested attributes as the key.
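To make the modeling concrete, here is a rough in-memory sketch in Python. It flattens the nested users into one item per (username, appname) and builds a dictionary that plays the role of the GSI. The "#profile" appname is a made-up placeholder for the fake appname mentioned above, and nothing here talks to DynamoDB.

```python
from collections import defaultdict

users = [
    {"username": "user1", "active": True,
     "apps": {"app-name-1": {"active": True, "group": "aaaaaaaaa", "settings": {}}}},
    {"username": "user2", "active": True,
     "apps": {"app-name-2": {"active": True, "group": "bbbbbb", "settings": {}}}},
]


def to_items(users):
    """Flatten nested users into items keyed by (username, appname)."""
    items = []
    for user in users:
        # User-level flags live under a reserved, fake appname (hypothetical)
        items.append({"username": user["username"],
                      "appname": "#profile",
                      "active": user["active"]})
        for appname, app in user["apps"].items():
            items.append({"username": user["username"],
                          "appname": appname, **app})
    return items


def build_app_index(items):
    """Stand-in for the GSI: appname (hash key) -> sorted usernames."""
    index = defaultdict(list)
    for item in items:
        index[item["appname"]].append(item["username"])
    return {app: sorted(names) for app, names in index.items()}


app_index = build_app_index(to_items(users))
print(app_index["app-name-1"])
```

Looking up app_index["app-name-1"] corresponds to a Query on the GSI with app-name as the hash key, while listing one user's apps corresponds to a Query on the base table.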
As the title says, I want to know which is the best way to scan a table in Amazon DynamoDB, searching by another field than the primary key.
I searched about this and read a lot, but I found this solution for me:
let DynamoDBServiceObj = new AWS.DynamoDB({apiVersion: '2012-08-10'});
let params = {
ExpressionAttributeValues: {
':hash' : { S: req.param('wildcard') }
},
ProjectionExpression: 'directory',
FilterExpression: 'qrCode = :hash',
TableName: 'business'
};
let business = await DynamoDBServiceObj.scan(params).promise();
if (business.Count == 1) return res.ok();
else return res.view('404');
This works for me, but I also read that performing a scan on a table is a bad idea, for both performance and pricing. How should I do it then?
Which is the correct way to search a table by a field other than the primary key?
What is the difference between DocumentClient and DynamoDB Object?
I always use .get() to obtain what I want from DynamoDB. Is this a good or a bad practice?
I read these posts, and I suppose that GSI is the solution, but I don't understand how it works.
Global Secondary Indexes (GSI)
DynamoDB: Scan on multiple non key attribute
Step 4: Query and Scan the table
Querying and Scanning a DynamoDB Table
What is the difference between scan and query in dynamodb?
How to fetch/scan all items from AWS dynamodb using node.js
As you said, and as you have already read, scanning the table is not a great idea. I would suggest two things.
Use a composite primary key (if you're not doing so yet). Using the combination of partition key and sort key gives you more possibilities to query (and not scan) your table depending on your frequent access patterns.
If you still need to query the table by an attribute other than the ones included in your composite primary key, you are right that the GSI is the solution. You can check this post on how the GSI works. Choose primary index for Global secondary index
You can think of a GSI as a copy of your table with a different primary key.
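As a rough illustration in Python, the scan from the question could become a query against such an index. This sketch assumes a GSI on the business table with qrCode as its hash key and directory among its projected attributes. The index name "qrCode-index" is made up, and client is expected to be a boto3 DynamoDB client or a compatible object.

```python
from typing import Any, Dict, Optional


def find_business_by_qr_code(client: Any, qr_code: str,
                             index: str = "qrCode-index") -> Optional[Dict[str, Any]]:
    """Return the first business whose qrCode matches, or None."""
    response = client.query(
        TableName="business",
        IndexName=index,  # hypothetical GSI keyed on qrCode
        # A key condition reads only the matching items,
        # unlike a FilterExpression applied after a full scan
        KeyConditionExpression="qrCode = :hash",
        ExpressionAttributeValues={":hash": {"S": qr_code}},
        ProjectionExpression="directory",
    )
    items = response.get("Items", [])
    return items[0] if items else None
```

The request has the same shape as the scan parameters in the question. The difference is that the condition on qrCode moves from FilterExpression to KeyConditionExpression and an IndexName is supplied.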
Why doesn't CosmosDB index arrays by default? The default index path is
"path": "/*"
Doesn't that mean "index everything"? Not "index everything except arrays".
If I add my array field to the index with something like this:
"path": "/tags/[]/?"
It will work and start indexing that particular array field.
But my question is why doesn't "index everything" index everything?
EDIT: Here's a blog post that describes the behavior I'm seeing. http://www.devwithadam.com/2017/08/querying-for-items-in-array-in-cosmosdb.html Array_Contains queries are very slow, clearly not using the index. If you add the field in question to the index explicitly then the queries are fast (clearly they start using the index).
"New" index layout
As stated in Index Types
Azure Cosmos containers support a new index layout that no longer uses
the Hash index kind. If you specify a Hash index kind on the indexing
policy, the CRUD requests on the container will silently ignore the
index kind and the response from the container only contains the Range
index kind. All new Cosmos containers use the new index layout by
default.
The below issue does not apply to the new index layout. There the default indexing policy works fine (and delivers the results in 36.55 RUs). However pre-existing collections may still be using the old layout.
"Old" index layout
I was able to reproduce the issue with ARRAY_CONTAINS that you are asking about.
Setting up a CosmosDB collection with 100,000 posts from the SO data dump (e.g. this question would be represented as below)
{
"id": "50614926",
"title": "Indexing arrays in CosmosDB",
/*Other irrelevant properties omitted */
"tags": [
"azure",
"azure-cosmosdb"
]
}
And then performing the following query
SELECT COUNT(1)
FROM t IN c.tags
WHERE t = 'sql-server'
The query took over 2,000 RUs with default indexing policy and 93 with the following addition (as shown in your linked article)
{
"path": "/tags/[]/?",
"indexes": [
{
"kind": "Hash",
"dataType": "String",
"precision": -1
}
]
}
However what you are seeing here is not that the array values aren't being indexed by default. It is just that the default range index is not useful for your query.
The range index uses keys based on partial forward paths, so it will contain paths such as the following.
tags/0/azure
tags/0/c#
tags/0/oracle
tags/0/sql-server
tags/1/azure-cosmosdb
tags/1/c#
tags/1/sql-server
With this index structure it starts at tags/0/sql-server and then reads all of the remaining tags/0/ entries and the entirety of the entries for tags/n/ where n is an integer greater than 0. Each distinct document mapping to any of these needs to be retrieved and evaluated.
By contrast the hash index uses reverse paths (more details - PDF)
StackOverflow theoretically allows a maximum of 5 tags per question to be added by the UI so in this case (ignoring the fact that a few questions have more tags through site admin activities) the reverse paths of interest are
sql-server/0/tags
sql-server/1/tags
sql-server/2/tags
sql-server/3/tags
sql-server/4/tags
With the reverse path structure, finding all paths with leaf nodes of value sql-server is straightforward.
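The difference between the two key orders can be imitated with a toy model in Python. This is only a sorted-list illustration of forward versus reverse paths with made-up documents, not how Cosmos DB actually stores its index. It counts how many index entries each strategy has to read to find the documents tagged sql-server.

```python
from bisect import bisect_left

# Made-up documents: id -> tags
docs = {
    "d1": ["azure", "azure-cosmosdb"],
    "d2": ["sql-server", "c#"],
    "d3": ["oracle", "sql-server"],
    "d4": ["c#", "azure"],
}


def forward_index(docs):
    """Sorted entries keyed like tags/<position>/<value>."""
    return sorted((("tags", i, tag), doc_id)
                  for doc_id, tags in docs.items()
                  for i, tag in enumerate(tags))


def reverse_index(docs):
    """Sorted entries keyed like <value>/<position>/tags."""
    return sorted(((tag, i, "tags"), doc_id)
                  for doc_id, tags in docs.items()
                  for i, tag in enumerate(tags))


def scan_forward(index, value):
    """Start at tags/0/<value> and read every entry after it."""
    start = bisect_left(index, (("tags", 0, value),))
    touched = index[start:]
    matches = sorted({doc for (_, _, tag), doc in touched if tag == value})
    return matches, len(touched)


def lookup_reverse(index, value):
    """Read only the contiguous run of entries whose key starts with <value>."""
    start = bisect_left(index, ((value,),))
    touched = []
    for key, doc in index[start:]:
        if key[0] != value:
            break
        touched.append(doc)
    return sorted(set(touched)), len(touched)


print(scan_forward(forward_index(docs), "sql-server"))
print(lookup_reverse(reverse_index(docs), "sql-server"))
```

Both functions return the same documents, but the forward scan reads every entry after its starting point while the reverse lookup reads only the matching run.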
In this specific use case as the arrays are bounded to a maximum of 5 possible values it is also possible to use the original range index efficiently by looking at just those specific paths.
The following query took 97 RUs with default indexing policy in my test collection.
SELECT COUNT(1)
FROM c
WHERE 'sql-server' IN (c.tags[0], c.tags[1], c.tags[2], c.tags[3], c.tags[4])
Cosmos DB does index all the elements of an array. By default, all Azure Cosmos DB data is indexed. Read more here https://learn.microsoft.com/en-us/azure/cosmos-db/indexing-policies
Lets say I have these documents in my CosmosDB. (DocumentDB API, .NET SDK)
{
// partition key of the collection
"userId" : "0000-0000-0000-0000",
"emailAddresses": [
"someaddress#somedomain.com", "Another.Address#someotherdomain.com"
]
// some more fields
}
I now need to find out if I have a document for a given email address. However, I need the query to be case insensitive.
There are ways to search case insensitive on a field (they do a full scan however):
How to do a Case Insensitive search on Azure DocumentDb?
select * from json j where LOWER(j.name) = 'timbaktu'
e => e.Id.ToLower() == key.ToLower()
These do not work for arrays. Is there an alternative way? A user defined function looks like it could help.
I am mainly looking for a temporary low-effort solution to support the scenario (I have multiple collections like this). I probably need to switch to a data structure like this at some point:
{
"userId" : "0000-0000-0000-0000",
// Option A
"emailAddresses": [
{
"displayName": "someaddress#somedomain.com",
"normalizedName" : "someaddress#somedomain.com"
},
{
"displayName": "Another.Address#someotherdomain.com",
"normalizedName" : "another.address#someotherdomain.com"
}
],
// Option B
"emailAddressesNormalized": [
"someaddress#somedomain.com", "another.address#someotherdomain.com"
]
}
Unfortunately, my production database already contains documents that would need to be updated to support the new structure.
My production collections contain only 100s of these items, so I am even tempted to just get all items and do the comparison in memory on the client.
If performance matters, you should consider one of the normalization solutions you proposed yourself in the question. You could then index the normalized field and get results without doing a full scan.
If for some reason you really don't want to touch the documents, then perhaps the feature you are missing is a simple join?
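For the normalization route, the document rewrite itself is small. A sketch in Python of Option B from the question, working on plain dictionaries (reading and writing the documents through the SDK is left out, and the function names are made up):

```python
from typing import Any, Dict


def add_normalized_emails(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with a lower-cased copy of emailAddresses."""
    updated = dict(doc)
    updated["emailAddressesNormalized"] = [
        address.lower() for address in doc.get("emailAddresses", [])
    ]
    return updated


def has_email(doc: Dict[str, Any], email: str) -> bool:
    """Case-insensitive membership test against the normalized array."""
    return email.lower() in doc.get("emailAddressesNormalized", [])


doc = add_normalized_emails({
    "userId": "0000-0000-0000-0000",
    "emailAddresses": [
        "someaddress#somedomain.com",
        "Another.Address#someotherdomain.com",
    ],
})
print(has_email(doc, "ANOTHER.ADDRESS#someotherdomain.com"))
```

Once every document carries the normalized array, the search term only needs to be lower-cased once on the client and compared for equality, for example with ARRAY_CONTAINS on emailAddressesNormalized.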
Example query which will do case-insensitive search from within array with a scan:
SELECT c FROM c
join email in c.emailAddresses
where lower(email) = lower('ANOTHER.ADDRESS#someotherdomain.com')
You can find more examples about joining from Getting started with SQL commands in Cosmos DB.
Note that the where clause in this example cannot use an index, so consider using it only alongside another, more selective (indexed) criterion.