DynamoDB begins with not returning expected results - node.js

I'm using NodeJS and DynamoDB. I'm never used DynamoDB before, and primary a C# developer (where this would simply just be a .Where(x => x...) call, not sure why Amazon made it any more complicated then that). I'm trying to simply just query the table based on if an id starts with certain characters. For example, we have the year as the first 2 characters of the Id field. So something like this: 180192, so the year is 2018. The 20 part is irrelevant, just wanted to give a human readable example. So the Id starts with either 18 or 17 and I simply want to query the db for all rows that Id starts with 18 (for example, could be 17 or whatever). I did look at the documentation and I'm not sure I fully understand it, here's what I have so far that is just returning all results and not the expected results.
let params = {
TableName: db.table,
ProjectionExpression: "id,CompetitorName,code",
KeyConditionExpression: "begins_with(id, :year)",
ExpressionAttributeValues: {
':year': '18'
}
return db.docClient.scan(params).promise();
So as you can see, I'm thinking that this would be a begins_with call, where I look for 18 against the Id. But again, this is returning all results (as if I didn't have KeyConditionExpression at all).
Would love to know where I'm wrong here. Thanks!
UPDATE
So I guess begin_with won't work since it only works on strings and my id is not a string. As per commenters suggestion, I can use BETWEEN, which even that is not working either. I either get back all the results or Query key condition not supported error (if I use .scan, I get back all results, if I use .query I get the error)
Here is the code I'm trying.
let params = {
TableName: db.table,
ProjectionExpression: "id,CompetitorName,code",
KeyConditionExpression: "id BETWEEN :start and :end",
ExpressionAttributeValues: {
':start': 18000,
':end': 189999
}
};
return db.docClient.query(params).promise();

It seems as if there's no actual solution for what I was originally trying to do unfortunately. Which is a huge downfall of DynamoDB. There really needs to be some way to do 'where' using the values of columns, like you can in virtually any other language. However, I have to admit, part of the problem was the way that id was structured. You shouldn't have to rely on the id to get info out of it. Anyways, I did find another column DateofFirstCapture which using with contains (all the dates are not the same format, it's a mess) and using a year 2018 or 2017 seems to be working.

if you want to fetch data by id, add it as the partition key. If you want to get data by part of the string, you can use "begins with" on sort key.
begins_with (a, substr)— true if the value of attribute a begins with a particular substring.
source: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/Query.html

begins_with and between can only be used on sort keys.
For query you must always supply partition key.
So if you change your design to have unique partition key (or unique combo of partition/sort keys) and strings like 180192 as sort key you will be able to query begins_with(sortkey, ...).

Related

flux query: filter out all records related to one matching the condition

I'm trying to filter an influx DB query (using the nodeJS influxdb-client library).
As far as I can tell, it only works with "flux" queries.
I would like to filter out all records that share a specific attribute with any record that matches a particular condition. I'm filtering using the filter-function, but I'm not sure how I can continue from there. Is this possible in a single query?
My filter looks something like this:
|> filter(fn:(r) => r["_value"] == 1 and r["button"] == "1" ) and I would like to leave out all the record that have the same r["session"] as any that match this filter.
Do I need two queries; one to get those r["session"]s and one to filter on those, or is it possible in one?
Update:
Trying the two-step process. Got the list of r["session"]s into an array, and attempting to use the contains() flux function now to filter values included in that array called sessionsExclude.
Flux query section:
|> filter(fn:(r) => contains(value: r["session"], set: ${sessionsExclude}))
Getting an error unexpected token for property key: INT ("102")'. Not sure why. Looks like flux tries to turn the values into Integers? The r["session"] is also a String (and the example in the docs also uses an array of Strings)...
Ended up doing it in two queries. Still confused about the Strings vs Integers, but casting the value as an Int and printing out the array of r["session"] within the query seems to work like this:
'|> filter(fn:(r) => not contains(value: int(v: r["session"]), set: [${sessionsExclude.join(",")}]))'
Added the "not" to exclude instead of retain the values matching the array...

Select one column from Type-ORM query - Node

I have a type ORM query that returns five columns. I just want the company column returned but I need to select all five columns to generate the correct response.
Is there a way to wrap my query in another select statement or transform the results to just get the company column I want?
See my code below:
This is what the query returns currently:
https://i.stack.imgur.com/MghEJ.png
I want it to return:
https://i.stack.imgur.com/qkXJK.png
const qb = createQueryBuilder(Entity, 'stats_table');
qb.select('stats_table.company', 'company');
qb.addSelect('stats_table.title', 'title');
qb.addSelect('city_code');
qb.addSelect('country_code');
qb.addSelect('SUM(count)', 'sum');
qb.where('city_code IS NOT NULL OR country_code IS NOT NULL');
qb.addGroupBy('company');
qb.addGroupBy('stats_table.title');
qb.addGroupBy('country_code');
qb.addGroupBy('city_code');
qb.addOrderBy('sum', 'DESC');
qb.addOrderBy('company');
qb.addOrderBy('title');
qb.limit(3);
qb.cache(true);
return qb.getRawMany();
};```
[1]: https://i.stack.imgur.com/MghEJ.png
[2]: https://i.stack.imgur.com/qkXJK.png
TypeORM didn't meet my criteria, so I'm not experienced with it, but as long as it doesn't cause problems with TypeORM, I see an easy SQL solution and an almost as easy TypeScript solution.
The SQL solution is to simply not select the undesired columns. SQL will allow you to use fields you did not select in WHERE, GROUP BY, and/or ORDER BY clauses, though obviously you'll need to use 'SUM(count)' instead of 'sum' for the order. I have encountered some ORMs that are not happy with this though.
The TS solution is to map the return from qb.getRawMany() so that you only have the field you're interested in. Assuming getRawMany() is returning an array of objects, that would look something like this:
getRawMany().map(companyRecord => {return {company: companyRecord.company}});
That may not be exactly correct, I've taken the day off precisely because I'm sick and my brain is fuzzy enough I was making too many stupid mistakes, but the concept should work even if the code itself doesn't.
EDIT: Also note that map returns a new array, it does not modify the existing array, so you would use this in place of the getRawMany() when assigning, not after the assignment.

How To Get Item From DynamoDB Based On Multiple (one primary) Attributes Lambda/NodeJS

My table structure in DynamoDB looks like the following:
uuid (Primary Key) | ip | userAgent
From within a NodeJS function inside of lambda, I would like to get the uuid of an item whose ip and useragent match the information I provide.
Scan becomes less and less efficient and more expensive over time as millions of items are added to the table every week.
Here is the code I am using to try and accomplish this:
function tieDown(sIP, uA){
const userQuery = {
Key : {
"ip" : "192.168.0.1",
"userAgent" : "sample"
},
TableName: "mytable"
};
return ddb.get(userQuery, function(err, data){
if (err) console.log(err.stack);
}).promise();
}
When this code executes, the following error is thrown ValidationException: The provided key element does not match the schema.
So I guess my questions are:
Is it even possible to get one specific item based on non-primary attributes
Are there any issues with the code sample I provided that could lead to this error being thrown? (I'm using DocumentClient so no need to explicitly declare strings, numbers etc.
Thanks!
You cannot get a single item using the get operation without specifying the partition key, and sort key if you have one. Scans should be avoided in most cases. What you probably need is a Global Secondary Index that allows you to query by ip and userAgent. Keep in mind that the records on a GSI are not guaranteed unique, so you may get more than one result.

SQLAlchemy query with conditional filters and results

I'm building a fastAPI app and I have a complicated query that I'm trying to avoid doing as multiple individual queries where I concat the results.
I have the following tables that all have foreign keys:
CHANGE_LOG: change_id | original (FK ROSTER.shift_id) | new (FK ROSTER.shift_id) | change_type (FK CONFIG_CHANGE_TYPES)
ROSTER: shift_id | shift_type (FK CONFIG_SHIFT_TYPES) | shift_start | shift_end | user_id (FK USERS)
CONFIG_CHANGE_TYPES: change_type_id | change_type_name
CONFIG_SHIFT_TYPES: shift_type_id | shift_type_name
USERS: user_id | user_name
FK= Foreign Key
I need to return the following information:
user_name, change_type_name, and shift_start shift_end and shift_type_name for those whose shift_id matches the original or new in the CHANGE_LOG row.
The catch is that the CHANGE_LOG table might have both original and new, only an original but no new, or only a new but no original. But as the user can select a few options from drop down boxes before submitting the request, I also need to be able to include a filter to single out:
just one user, or all users
any change_type, or a group of change_types
The issue is that I can't find a way to get the user_name guaranteed for each row without inspecting it afterwards because I don't know if the new or original exist or are set to null.
Is there a way in SQLalchemy to have an optional filter in the query where I can say if the original exists use that to get the user_id, but if not then use the new to get the user_id.
Also, if i have a query that definitely finds those with original and new shifts, it will never find those with only one of them as the criteria will never match.
I've also read this and similar ones, and while they'll resolve the issue of conditionally setting some of the filters, it doesn't get around the issue of part nulls returning nothing at all, rather than half the data.
This one seems to solve that problem, but I have no idea how to implement it.
I know it's complicated, so let me know if I've done a poor job of explaining the question.
Sorted. The solution was to use the outerjoin option.
I'm sure the syntax can be more elegant than my solution if I properly engage in adding relationships when defining each class, but what I end up with is explicit and I think it makes it easier to read... at least for me.
Since I'm using a few tables more than once in the same query for different information, it was important to alias those, otherwise I ended up with a conflict (which 'user_id' did you want - it's not clear). For those playing at home, here's my general solution:
new=aliased(ROSTER)
original=aliased(ROSTER)
o_name=aliased(CONFIG_SHIFT_TYPES)
n_name=aliased(CONFIG_SHIFT_TYPES)
pd.read_sql(
db.query(
CHANGE_LOG.change_id,
CHANGE_LOG.created,
CONFIG_CHANGE_TYPES.change_name,
o_name.shift_name.label('original_type'),
n_name.shift_name.label('new_type'),
OPERATORS.operator_name
)
.outerjoin(original, original.shift_id==CHANGE_LOG.original_shift)
.outerjoin(new, new.shift_id==CHANGE_LOG.new_shift)
.outerjoin (CONFIG_CHANGE_TYPES,CONFIG_CHANGE_TYPES.change_id==CHANGE_LOG.change_type)
.outerjoin(CONFIG_SHIFT_TYPES, CONFIG_SHIFT_TYPES.shift_id==new.roster_shift_id)
.outerjoin(o_name, o_name.shift_id==original.roster_shift_id)
.outerjoin(n_name, n_name.shift_id==new.roster_shift_id)
.outerjoin(USERS, or_(USERS.operator_id==original.user_id, USERS.user_id==new.user_id)
).statement, engine)

Rethinkdb replace document if document exists, else insert document

I would like to insert a document if it doesn't exist (client_nr not found).
If this exists, replace the whole document with new values.
The only other this is, that the client_nr is not the primary key. The primary key is the default id created by rethinkdb database.
I tried the below code in node js, but nothing happened. The data is in the variable jsonArray. I use the for loop to go through the whole jsonArray.
Any idea how to solve this problem?
Thanks!!!
for(var Ticker in jsonArray){
r.db(db).table('trades').filter({client_nr: jsonArray[Ticker].client_nr}).forEach(function(post) {
return r.branch(
post.eq(null),
r.db(db).table('log').insert(jsonArray[Ticker]),
r.db(db).table('log').replace(jsonArray[Ticker])
)
}).run()
}
This is much easier to do if client_nr is your primary key. I'd consider doing that instead of using the autogenerated IDs. That will also enforce uniqueness on the field, which is probably what you want.
I was also a little confused by your example because your description made it sound like you wanted to be inserting/replacing into the same table that you're filtering on, but your example is referencing two different tables.
Assuming you want to be using a single table, something like this should do it:
TABLE.filter({client_nr: jsonArray[Ticker].client_nr}).replace(function(row) {
return r(jsonArray[Ticker]).merge(row.pluck('id'));
}).do(function(res) {
return r.branch(
res('replaced').add(res('unchanged')).eq(0),
TABLE.insert(jsonArray[Ticker]),
res);
})

Resources