How can I query Cassandra with GraphQL using a non-primary key column?

I am using GraphQL for Cassandra database operations. The following search query works perfectly when filtering on the partition key column:
query oneUsers {
  users(value: { username: "username" }) {
    values {
      id
      name
      username
    }
  }
}
But I get the following error when trying to search on other columns:
graphql: Exception while fetching data (/users) : org.apache.cassandra.stargate.exceptions.InvalidRequestException: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
What is the best way to search columns in Cassandra using GraphQL?

You can only retrieve records from Cassandra by specifying (a) the partition key, or (b) the primary key column(s).
If you would like to filter by non-primary key columns, you will need to create an index on the column. For example, if you want to filter by id, create an index with:
mutation createIndexes {
  users: createIndex(
    keyspaceName: "myks",
    tableName: "users",
    columnName: "id",
    indexName: "id_idx"
  )
}
You should then be able to query by id.
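Once the index has been created, a query filtering on id can mirror the username query above (the id value shown here is illustrative; use whatever type your schema defines for that column):
query oneUserById {
  users(value: { id: 1 }) {
    values {
      id
      name
      username
    }
  }
}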
For details and more examples, see Developing with the Astra DB GraphQL API. Cheers!

Related

How To Get Item From DynamoDB Based On Multiple (one primary) Attributes Lambda/NodeJS

My table structure in DynamoDB looks like the following:
uuid (Primary Key) | ip | userAgent
From within a NodeJS function inside of Lambda, I would like to get the uuid of an item whose ip and userAgent match the information I provide.
Scan becomes less and less efficient and more expensive over time as millions of items are added to the table every week.
Here is the code I am using to try and accomplish this:
function tieDown(sIP, uA){
  const userQuery = {
    Key : {
      "ip" : "192.168.0.1",
      "userAgent" : "sample"
    },
    TableName: "mytable"
  };
  return ddb.get(userQuery, function(err, data){
    if (err) console.log(err.stack);
  }).promise();
}
When this code executes, the following error is thrown ValidationException: The provided key element does not match the schema.
So I guess my questions are:
Is it even possible to get one specific item based on non-primary attributes?
Are there any issues with the code sample I provided that could lead to this error being thrown? (I'm using DocumentClient, so there is no need to explicitly declare strings, numbers, etc.)
Thanks!
You cannot get a single item using the get operation without specifying the partition key, and sort key if you have one. Scans should be avoided in most cases. What you probably need is a Global Secondary Index that allows you to query by ip and userAgent. Keep in mind that the records on a GSI are not guaranteed unique, so you may get more than one result.
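As a rough sketch rather than a drop-in fix, assuming a GSI named ip-userAgent-index (partition key ip, sort key userAgent) has been added to mytable, the lookup could use query instead of get:
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

function tieDown(sIP, uA) {
  const userQuery = {
    TableName: 'mytable',
    IndexName: 'ip-userAgent-index',   // assumed GSI name
    KeyConditionExpression: '#ip = :ip AND #ua = :ua',
    ExpressionAttributeNames: { '#ip': 'ip', '#ua': 'userAgent' },
    ExpressionAttributeValues: { ':ip': sIP, ':ua': uA }
  };
  // query returns an Items array; it may hold more than one match,
  // since GSI keys are not required to be unique
  return ddb.query(userQuery).promise();
}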

Which is the correct way to scan a table on DynamoDB?

As the title says, I want to know the best way to scan a table in Amazon DynamoDB, searching by a field other than the primary key.
I searched and read a lot about this, and came up with the following solution:
let DynamoDBServiceObj = new AWS.DynamoDB({apiVersion: '2012-08-10'});
let params = {
  ExpressionAttributeValues: {
    ':hash' : { S: req.param('wildcard') }
  },
  ProjectionExpression: 'directory',
  FilterExpression: 'qrCode = :hash',
  TableName: 'business'
};
let business = await DynamoDBServiceObj.scan(params).promise();
if (business.Count == 1) return res.ok();
else return res.view('404');
This works for me, but I have also read that performing a scan on a table is a bad idea for both performance and pricing. How should I do it, then?
Which is the correct way to scan a table, searching by a field other than the primary key?
What is the difference between DocumentClient and the DynamoDB object?
I always use .get() to obtain what I want from DynamoDB. Is this good or bad practice?
I read these posts, and I suppose a GSI is the solution, but I don't understand how it works.
Global Secondary Indexes (GSI)
DynamoDB: Scan on multiple non key attribute
Step 4: Query and Scan the table
Querying and Scanning a DynamoDB Table
What is the difference between scan and query in dynamodb?
How to fetch/scan all items from AWS dynamodb using node.js
As you said, scanning the table is not a great idea, and you have already read about why. I would suggest two things.
Use a composite primary key (if you're not doing so yet). Using the combination of partition key and sort key gives you more possibilities to query (and not scan) your table depending on your frequent access patterns.
If you still need to query the table by an attribute other than the ones included in your composite primary key, you are right that a GSI is the solution. You can check this post on how a GSI works: Choose primary index for Global secondary index
You can think of a GSI as a copy of your table with a different primary key.
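For illustration, assuming a GSI named qrCode-index with qrCode as its partition key has been added to the business table, the scan above could be rewritten as a query:
const AWS = require('aws-sdk');
const DynamoDBServiceObj = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

async function findByQrCode(wildcard) {
  const params = {
    TableName: 'business',
    IndexName: 'qrCode-index',   // assumed index name
    KeyConditionExpression: '#qr = :hash',
    ExpressionAttributeNames: { '#qr': 'qrCode', '#dir': 'directory' },
    ExpressionAttributeValues: { ':hash': { S: wildcard } },
    ProjectionExpression: '#dir'
  };
  // query reads only the items whose qrCode matches, instead of scanning the whole table
  const business = await DynamoDBServiceObj.query(params).promise();
  return business.Count === 1;   // same check as in the scan version
}
(The DocumentClient exposes the same query operation but accepts plain JavaScript values instead of the { S: ... } type wrappers.)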

Join three tables using the knex.js library

I am using knex.js
Suppose we have three tables:
table1 -- id, name, address
table2 -- id, city, state, table1_id as fk
table3 -- id, housenumber, table1_id as fk
I want to join these three tables using the knex.js library with Node and Express, so that I get output JSON like this:
{
  "id": 1,
  "name": "abc",
  "address": "xyz",
  "table2": { "id": 1, "city": "ttt", "state": "www" },  // if table1.id == table2.table1_id, include the table2 details here
  "table3": []  // if no relation is found between table1.id and table3.table1_id, keep it as an empty array
}
tl;dr: knex is too low-level a tool for what you are trying to do; you should use an ORM for that kind of task.
However, you can do it with a lot of manual work.
First you have to write the query with the proper joins, creating aliases with table prefixes for each column, so that the result data comes back as a flat array like this:
knex('table1 as t1')
  .join('table2 as t2', 't2.table1_id', 't1.id')
  .select(
    't1.id as t1_id',
    't1.other_column as t1_other_column',
    't2.id as t2_id'
    // <more columns you want to extract>
  )
The result is something like:
[ { t1_id: 1, t1_other_column: 'foo', t2_id: 4 }, ...more rows with flat data... ]
Then you need to write JavaScript code to restructure the flat data into nested objects, along the lines of the sketch below.
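A minimal sketch of that restructuring step, assuming the query also left-joins table3 and aliases every needed column with a table prefix (the alias names below are illustrative):
function nestRows(rows) {
  const byId = new Map();
  for (const row of rows) {
    // create the parent object once per table1 id
    if (!byId.has(row.t1_id)) {
      byId.set(row.t1_id, {
        id: row.t1_id,
        name: row.t1_name,
        address: row.t1_address,
        table2: null,
        table3: []   // stays an empty array if table3 has no matching rows
      });
    }
    const parent = byId.get(row.t1_id);
    if (row.t2_id != null) {
      parent.table2 = { id: row.t2_id, city: row.t2_city, state: row.t2_state };
    }
    if (row.t3_id != null) {
      parent.table3.push({ id: row.t3_id, housenumber: row.t3_housenumber });
    }
  }
  return [...byId.values()];
}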
But you should not do that kind of work manually. All knex-based ORMs have already implemented general solutions for writing these kinds of queries in an easy manner.

Flask DynamoDB query for counting fields

Consider this schema in DynamoDB; we want a count of questions:
[
    {
        'TableName': "user_detail",
        'KeySchema': [
            {'AttributeName': "timestamp", 'KeyType': "HASH"},
            {'AttributeName': "question", 'KeyType': "RANGE"},
        ],
        'AttributeDefinitions': [
            {'AttributeName': "timestamp", 'AttributeType': "S"},
            {'AttributeName': "question", 'AttributeType': "N"},
        ],
        'ProvisionedThroughput': {
            'ReadCapacityUnits': 40,
            'WriteCapacityUnits': 40
        }
    }
]
I'm a beginner with DynamoDB; can anyone give me an idea of how to do this? We need a query; the SQL equivalent would be: select count(question) from user_detail where question = 1
Thanks in advance
I will throw out some pointers. DynamoDB has two types of APIs:
Option 1:
1) Scan API - scans the whole table. The Scan API should be used when the hash key value is not known.
2) Query API - queries the table using the hash key. The hash key is mandatory for the Query API.
In your case, the hash key value is not known, so you can't use the Query API. You could use the Scan API, but it is a very costly operation in terms of performance and cost, and it should be avoided on a table with millions of items.
The alternative is to create a global secondary index (GSI) with the question attribute as the hash key and some other field as the sort key (possibly timestamp). This way, you should be able to use the Query API on the GSI. However, this doesn't solve the problem completely.
DynamoDB doesn't have aggregate functions like count, min, and max, so you need to count the number of items in the result set on the client side (see the sketch below).
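A rough sketch of option 1 using the Node.js SDK (the GSI name question-index is assumed; boto3 offers the equivalent query call for a Flask app):
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

async function countQuestion(questionValue) {
  let total = 0;
  let lastKey;
  do {
    const params = {
      TableName: 'user_detail',
      IndexName: 'question-index',   // assumed GSI name (hash key: question)
      KeyConditionExpression: '#q = :q',
      ExpressionAttributeNames: { '#q': 'question' },
      ExpressionAttributeValues: { ':q': questionValue }
    };
    if (lastKey) params.ExclusiveStartKey = lastKey;   // continue from the previous page
    const page = await ddb.query(params).promise();
    total += page.Count;                               // count this page's items client-side
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return total;
}
// e.g. countQuestion(1) approximates: select count(question) from user_detail where question = 1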
Option 2:
If you have the option to change the data model, you can redesign the above table as follows:
question - hash key
timestamp - range key
I have seen many use cases using timestamp as range key. Please analyse your query access patterns (QAP) for all your use cases and make the decision accordingly.

Astyanax key range query

I am trying to write a query that will paginate through all rows in a column family using the Astyanax client and RowSliceQuery.
keyspace.prepareQuery(COLUMN_FAMILY).getKeyRange(null, null, null, null, 100);
I have done this successfully using Hector, where the first call is made with null start and end keys. After retrieving the first page, I use the last key from the result to query for the second page, and so on. This is the code for the first page using Hector:
HFactory.createRangeSlicesQuery(keyspace,
        LongSerializer.get(), new CompositeSerializer(),
        BytesArraySerializer.get())
    .setColumnFamily(COLUMN_FAMILY)
    .setRange(null, null, false, 100)
    .setRowCount(100);
Now, when I try to do this with Astyanax, I get errors about null and non-null keys and tokens. I am not sure what tokens do in this query. I am also able to use allRows(), but I would like to do this with a key range query as it gives me more flexibility.
Does anybody have an example of a key range query using Astyanax? I cannot find one in the "getting started" documentation or anywhere else on the net.
Thanks!
Anton
What you are referring to is the getRowRange method:
keyspace.prepareQuery(CF_STANDARD1)
.getRowRange(startKey, endKey, startToken, endToken, count)
Note however that this works only when the ByteOrderedPartitioner is used. Since by default Cassandra uses the Murmur3Partitioner, this will usually not work. Using an index to do this instead is recommended. Astyanax also provides the reverse index search recipe which takes advantage of a second column family which stores your keys as columns to allow efficient range searches on the original data.
Check this sample code; I hope it helps you with the paging.
IndexQuery<String, String> query = keyspace
    .prepareQuery(CF_STANDARD1)
    .searchWithIndex()
    .setRowLimit(10)
    .autoPaginateRows(true)
    .addExpression()
    .whereColumn("Index2").equals().value(42);
Best,
