I have studied many articles and blogs and finally concluded that I need only one base table for the whole application, plus many Global Secondary Indexes according to my access patterns. Now I am stuck on a problem.
My base table structure is:
| **PK** | **SK** | **data** |
| ---------- | ----------- | -------- |
| university | uni_uuid | name |
| course | course_uuid | uni_uuid |
As you can see, when I add a new course it will always have a university uuid, which is saved under the university_uuid key with the course record.
Now I want to list all the courses to the Admin, so I query DynamoDB like this:
var params = {
    TableName: "BaseTable",
    FilterExpression: "PK = :type",
    ExpressionAttributeValues: {
        ":type": "Course"
    }
};

docClient.scan(params, onScan);

function onScan(err, result) {
    if (err) {
        console.error("Unable to scan the table. Error JSON:", JSON.stringify(err, null, 2));
    } else {
        resolve(result);
    }
}
This successfully returns all the added courses.
Now my question is: how can I show the university name in the University column? Currently university_uuid is displayed there. Do I need to run another query to find each university name by its uuid? If so, for 100 courses I would need to run 100 more queries, one per course, just for the university name.
Any help will be deeply appreciated!
Approaches:
If the university name will not be changed in Admin, you can use denormalization and include it in the course records. Even if the university name can be changed, on update you can fetch all corresponding courses by uni_uuid and update the redundant data.
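A minimal sketch of that update path, assuming a GSI named uni_uuid-index on the uni_uuid attribute and a denormalized university_name attribute on each course item (both names are illustrative):

// Sketch: when a university is renamed, fan the new name out to its courses.
// Assumes a GSI "uni_uuid-index" and a denormalized "university_name"
// attribute on course items (illustrative names).
async function propagateUniversityName(uniUuid, newName) {
    // Find all courses that reference this university.
    const courses = await docClient.query({
        TableName: "BaseTable",
        IndexName: "uni_uuid-index",
        KeyConditionExpression: "uni_uuid = :u",
        ExpressionAttributeValues: { ":u": uniUuid }
    }).promise();

    // Rewrite the redundant copy of the name on each course item.
    await Promise.all(courses.Items.map(item =>
        docClient.update({
            TableName: "BaseTable",
            Key: { PK: item.PK, SK: item.SK },
            UpdateExpression: "SET university_name = :n",
            ExpressionAttributeValues: { ":n": newName }
        }).promise()
    ));
}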
When you receive the courses list, take the distinct uni_uuid values and then request the matching universities in a single batch read (BatchGetItem is DynamoDB's closest analogue to a SQL IN clause).
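A sketch of that second approach with DocumentClient.batchGet, assuming university items are keyed as PK = "university" and SK = the uni_uuid, with the name stored in the data attribute, per the base table above:

// Sketch: resolve university names for a list of courses in one batch read.
// Assumes university items are keyed as PK = "university", SK = uni_uuid,
// and store the name in "data" (per the base table layout above).
async function attachUniversityNames(courses) {
    // Distinct uni_uuid values from the course items ("data" holds uni_uuid).
    const uuids = [...new Set(courses.map(c => c.data))];

    // BatchGetItem accepts at most 100 keys per call.
    const resp = await docClient.batchGet({
        RequestItems: {
            BaseTable: {
                Keys: uuids.map(u => ({ PK: "university", SK: u }))
            }
        }
    }).promise();

    // Build a uuid -> name map, then decorate each course.
    const names = {};
    resp.Responses.BaseTable.forEach(u => { names[u.SK] = u.data; });
    return courses.map(c => Object.assign({}, c, { university_name: names[c.data] }));
}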
Related
I have three collections in my Firebase project: one contains locations that users have checked in from, and the other two are intended to hold leaderboards of the cities and suburbs with the most check-ins.
However, as a bit of a newbie to NoSQL databases, I'm not quite sure how to write the queries I need to get and set the data I want.
Currently, my checkins collection has this structure:
{
    Suburb: ...,
    City: ...,
    Leaderboard: ...
}
The leaderboard entry is a boolean to mark if the check in has already been added to the leaderboard.
What I want to do is query for all results where Leaderboard is false, count the entries for each city and each suburb, add the city and suburb data to the separate collections, then update the Leaderboard boolean to indicate they've been counted.
exports.updateLeaderboard = functions.pubsub.schedule('30 * * * *').onRun(async context => {
    // Return the promise so the scheduled function waits for the work to finish.
    return db.collection('Bears')
        .where('Leaderboard', '==', false) // Leaderboard is a boolean, not the string 'false'
        .get()
        .then(snap => {
            snap.forEach(x => {
                // Count unique cities and return an object, i.e. the equivalent of
                // SELECT cities, COUNT(*) AS `count` FROM Bears GROUP BY cities
            });
        })
        .then(() => {
            console.log({ result: 'success' });
        })
        .catch(error => {
            console.error(error);
        });
});
Unfortunately, I've come to about the limit of my knowledge here and would love some help.
Firebase is meant to be a real-time platform, and most of your business logic is going to be expressed in Functions. Because the ability to query is so limited, lots of problems like this are usually solved with triggers and data denormalization.
For instance, if you want a count of all mentions of a city, then you have to maintain that count at event-time.
// On document create (e.g. in a Firestore onCreate trigger), bump the per-city count.
await admin.firestore()
    .collection("city-count")
    .doc(doc.city)
    .set({
        count: admin.firestore.FieldValue.increment(1),
    }, { merge: true });
Since it's a serverless platform, it's built to run a lot of very small, very fast functions like this. Firebase is very bad at doing large computations -- you can quickly run into MB/minute and doc/minute write limits.
Edit: Here is how Firebase solved this exact problem from the perspective of a SQL trained developer https://www.youtube.com/watch?v=vKqXSZLLnHA
As clarified in this other post from the Community here, Firestore doesn't have a built-in API for counting documents via a query. You will need to read the whole collection into a variable and work with the data there, counting how many documents have false as the value of their Leaderboard field. While doing this, you can start adding the cities and suburbs to arrays that will afterwards be written to the database, updating the other two collections.
The below sample code - untested - returns the values from the database where Leaderboard is false, increments a count, and shows where you need to copy the City and Suburb values to the other collections. I basically reordered some of your code and changed the variables to generic ones for better understanding, adding a comment where to add the copying of values to the other collections.
...
// Create a reference to the checkins collection
let checkinRef = db.collection('cities');
// Create a query against the collection
let queryRef = checkinRef.where('Leaderboard', '==', false);
var count = 0;
queryRef.get()
    .then(snap => {
        snap.forEach(x => {
            // add the cities and suburbs to their collections here and update the counter
            count++;
        });
    });
...
You are very close to the solution; you just need to copy the values from one collection to the others once you have all the documents with false in Leaderboard. You can find some good examples of copying documents from one collection to another in this other post from the Community: Cloud Functions: How to copy Firestore Collection to a new document?
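For the writing-out step, a minimal sketch, assuming count maps (cityCounts, suburbCounts) were accumulated inside the forEach above and that the two target collections are named city-leaderboard and suburb-leaderboard (all names illustrative; one batch holds up to 500 writes):

// Sketch: write the totals out and mark the checkins as counted.
// Assumes cityCounts/suburbCounts were built in the forEach above;
// collection names are illustrative. A single batch holds up to 500 writes.
const batch = db.batch();

Object.entries(cityCounts).forEach(([city, total]) => {
    batch.set(db.collection('city-leaderboard').doc(city),
        { count: admin.firestore.FieldValue.increment(total) }, { merge: true });
});
Object.entries(suburbCounts).forEach(([suburb, total]) => {
    batch.set(db.collection('suburb-leaderboard').doc(suburb),
        { count: admin.firestore.FieldValue.increment(total) }, { merge: true });
});

// Flip the flag so these checkins are not counted again.
snap.forEach(x => batch.update(x.ref, { Leaderboard: true }));

await batch.commit();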
Let me know if the information helped you!
I have been working with the Google Cloud library, and I can successfully save data in Datastore, specifically from my Particle Electron device (I used their tutorial here: https://docs.particle.io/tutorials/integrations/google-cloud-platform/).
The problem I am now having is retrieving the data again.
I am using this code, but it is not returning anything:
function getData() {
    var data = [];
    const query = datastore.createQuery('ParticleEvent').order('created');
    datastore.runQuery(query).then(results => {
        const events = results[0];
        console.log(results);
        events.forEach(event => data.push(event.data));
    });
    console.log(data);
}
But each time it returns empty, specifically this:
[ [], { moreResults: 'NO_MORE_RESULTS', endCursor: 'CgA=' } ]
and I can't figure out why, because I have multiple entities saved in this Datastore.
Thanks
In tutorial.js from the repo mentioned in the tutorial, I see the ParticleEvent entities are created using this data:
var obj = {
    gc_pub_sub_id: message.id,
    device_id: message.attributes.device_id,
    event: message.attributes.event,
    data: message.data,
    published_at: message.attributes.published_at
}
This means the entities don't have a created property. I suspect that ordering the query by such a property name is the reason the query doesn't return results. From Datastore Queries (emphasis mine):
"The results include all entities that have at least one value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria."
I'd try ordering the query by published_at instead; that appears to be the property with a meaning closest to created.
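A minimal sketch of that change, which also waits for the query promise before reading the results (the original getData logs data before runQuery has resolved):

// Sketch: order by a property the entities actually have, and await the
// query before reading it (runQuery resolves asynchronously).
async function getData() {
    const query = datastore.createQuery('ParticleEvent').order('published_at');
    const [entities] = await datastore.runQuery(query);
    return entities.map(event => event.data);
}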
I am trying to query DynamoDB using the following code:
const AWS = require('aws-sdk');

let dynamo = new AWS.DynamoDB.DocumentClient({
    service: new AWS.DynamoDB({
        apiVersion: "2012-08-10",
        region: "us-east-1"
    }),
    convertEmptyValues: true
});

dynamo.query({
    TableName: "Jobs",
    KeyConditionExpression: 'sstatus = :st',
    ExpressionAttributeValues: {
        ':st': 'processing'
    }
}, (err, resp) => {
    console.log(err, resp);
});
When I run this, I get an error saying:
ValidationException: Query condition missed key schema element: id
I do not understand this. I have defined id as the partition key for the Jobs table, and I need to find all the jobs that are in processing status.
You're trying to run a query using a condition that does not include the partition key; that is how queries work in DynamoDB. You would need to do a scan to get that info in your case; however, I don't think that is the best option.
I think you want to set up a global secondary index and use that to query for the processing status.
In another answer #smcstewart responded to this question, but he provides a link instead of explaining why this error occurs. I want to add a brief comment hoping it will save you time.
The AWS docs on Querying a Table state that you can do WHERE-condition queries (e.g. the SQL query SELECT * FROM Music WHERE Artist='No One You Know') the DynamoDB way, but with one important caveat:
You MUST specify an EQUALITY condition for the PARTITION key, and you can optionally provide another condition for the SORT key.
Meaning you can only use key attributes with Query. Doing it any other way would mean DynamoDB running a full scan for you, which is NOT efficient - less efficient than using Global Secondary Indexes.
So if you need to query on non-key attributes, using Query is usually NOT an option - the best option is using Global Secondary Indexes, as suggested by #smcstewart.
I found this guide to be useful to create a Global secondary index manually.
If you need to add it using CloudFormation, here is a relevant page.
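If you would rather add the index from code, here is a minimal sketch using the AWS SDK's updateTable call (the index name sstatus-index is illustrative, and the ProvisionedThroughput block should be omitted for on-demand tables):

// Sketch: add a GSI on the "sstatus" attribute to an existing table.
// The index name is illustrative; adjust attribute names to your schema.
const dynamodb = new AWS.DynamoDB({ region: "us-east-1" });

dynamodb.updateTable({
    TableName: "Jobs",
    AttributeDefinitions: [
        { AttributeName: "sstatus", AttributeType: "S" }
    ],
    GlobalSecondaryIndexUpdates: [{
        Create: {
            IndexName: "sstatus-index",
            KeySchema: [{ AttributeName: "sstatus", KeyType: "HASH" }],
            Projection: { ProjectionType: "ALL" },
            // Omit for tables using on-demand (PAY_PER_REQUEST) capacity.
            ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
        }
    }]
}, (err, data) => {
    if (err) console.error(err);
    else console.log(data);
});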
I was getting this error in a different scenario; here it is.
(It's very unlikely that anyone else ends up with this case, but just in case.)
I had a query working on a table (say Table A). Table A had a partition key m_id and sort key u_id.
I had a query to fetch data using m_id. The query was working:
var queryParams = {
    ExpressionAttributeValues: {
        ':m_id': mId
    },
    KeyConditionExpression: 'm_id = :m_id',
    TableName: "A"
};
let connections = await docClient.query(queryParams).promise();
I created another table, say Table B. I made some mistakes naming its keys, so I simply deleted it and created a table with the same name again. Table B had partition key m_id and sort key s_id.
I copied and pasted the same query I was using for Table A, changing only the table name, since the partition key had the same name.
To my shock, I got this exception:
"ValidationException: Query condition missed key schema element"
I rechecked all the names and compared the query with the working query. Everything was fine.
I thought that maybe because I had deleted and recreated Table B, it could have something to do with that. So I created a fresh table with a new name, Table B2, with the same key names as Table B.
In the query that was throwing the exception, I changed only the table name from B to B2.
And the exception was gone.
If you are getting this on a fresh table where no query has worked before, creating a new table with a new name is an option.
If you delete a table only to change partition key names, it may be safer to use a new name for the table as well (DynamoDB could be referring to metadata by table name rather than by internal identifier, so it is possible that old metadata survives even if you delete a table; just a guess given the case I faced).
EDIT: 2022-July-12
This error does not leave me alone. My own answer was helpful, but here is one more case: there was a trailing space in the name of a key in the table, and DynamoDB does not even flag spaces in key names.
You have to create a global secondary index for the status field.
Then your code could look something like this:
dynamo.query({
    TableName: "Jobs",
    IndexName: 'status',
    KeyConditionExpression: '#s = :st',
    ExpressionAttributeValues: {
        ':st': 'processing'
    },
    ExpressionAttributeNames: {
        '#s': 'status',
    },
}, (err, resp) => {
    console.log(err, resp);
});
Note: the scan operation is indeed very costly, especially if your table is huge in size.
I solved the problem using AWS.DynamoDB.DocumentClient() with scan; for example (Node.js):
var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
    TableName: "product",
    FilterExpression: "#cg = :data",
    ExpressionAttributeNames: {
        "#cg": "categoria",
    },
    ExpressionAttributeValues: {
        ":data": category, // `category` comes from the surrounding request
    }
};

docClient.scan(params, onScan);

function onScan(err, data) {
    if (err) {
        // log the error on the server
        console.error("Unable to scan the table. Error JSON:", JSON.stringify(err, null, 2));
        res.json(err);
    } else {
        console.log("Scan succeeded.");
        res.json(data);
    }
}
I have two collections for sales data - one for cities and one for towns within those cities. I have sales data at town level only (by month) and now want to add a new city.total_sales field, which will be the sum of the sales for the towns within each city.
I am using Node to run a script.
Set a cursor on the Towns collection (aggregation) to group all sales at the town level. This works fine.
Iterate over the cursor; for each town, find the city and add the value to city.total_sales.
Example code:
cursor.each(function(err, doc) {
    assert.equal(err, null);
    if (doc != null) {
        // debug - let's just find an example row to update
        var city_row = db.collection('city').findOne({ "city": "Liverpool" });
        console.log(city_row);
    } else {
        callback();
    }
});
The issue I am seeing: the console shows "Promise { <pending> }" rather than the document.
This is run as a batch process - I'm not overly concerned with performance at the moment - so what do I need to do to make the code wait for the find rather than continuing past the asynchronous operation?
Put all your findOne queries into a Promise array and then use Promise.all().
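A minimal sketch of that, assuming the aggregated town documents are first pulled into an array (cursor.toArray() is one way) and that each one carries the city name (field names are illustrative):

// Sketch: collect one findOne promise per town, then await them all at once.
// Assumes each aggregated doc carries the city name in `doc.city`
// (field names are illustrative).
cursor.toArray(function(err, towns) {
    assert.equal(err, null);

    var lookups = towns.map(function(doc) {
        // Without a callback, findOne returns a promise.
        return db.collection('city').findOne({ "city": doc.city });
    });

    Promise.all(lookups).then(function(cityRows) {
        // cityRows[i] corresponds to towns[i]; update total_sales here.
        console.log(cityRows);
        callback();
    });
});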
I have an app that stores user-uploaded spreadsheets as tables in PostgreSQL. Every time a user uploads a spreadsheet, I create a record in a Dataset table containing the physical table name, its alias, and the owner. I can retrieve a certain Dataset's information with
GET domain.com/v1/Datasets/{id}
AFAIK, the relation between rows in Dataset and physical tables can't be enforced by an FK - or at least I haven't seen anyone creating FKs on the information_schema of PostgreSQL - and FKs can't drop tables, or can they? So it's common to end up with orphan tables, or records in Dataset that point to tables that no longer exist. I have managed this with business logic and cleaning tasks.
Now, to access one of those physical tables, for example one called nba_teams, I would need to declare an NbaTeams model in LoopBack and restart the app, then query its records with
GET domain.com/v1/NbaTeams/{id}
But that can't scale, especially if I'm already getting around 100 uploads a day. So from where I'm standing, there are two ways to go:
1.- Create one model, then add 4 custom methods that accept a table name as a string and perform the corresponding CRUD operation on that table via raw queries. For example, to list the records:
GET domain.com/v1/Datasets/getTable/NbaTeams
or, to update one team
PUT domain.com/v1/Datasets/getTable/NbaTeams/{teamId}
This sounds inelegant, but it should work; a rough sketch of the idea follows.
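A hedged sketch of what such a custom method could look like in LoopBack 3 (getTable, the route, and the Dataset lookup are all illustrative; the table name must be validated against the Dataset records before being interpolated into SQL):

// Sketch of approach 1 (illustrative names, LoopBack 3 style).
// The table name is checked against Dataset before touching SQL,
// to avoid injection through the URL.
Dataset.getTable = function (table, cb) {
    Dataset.findOne({ where: { tableName: table } }, function (err, record) {
        if (err || !record) return cb(err || new Error('Unknown table'));
        var ds = Dataset.dataSource;
        // connector.execute runs a raw query on SQL connectors
        ds.connector.execute('SELECT * FROM "' + table + '"', [], cb);
    });
};

Dataset.remoteMethod('getTable', {
    accepts: { arg: 'table', type: 'string', required: true, http: { source: 'path' } },
    returns: { arg: 'rows', type: 'array', root: true },
    http: { path: '/getTable/:table', verb: 'get' }
});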
2.- Create a custom method that accepts a table name as a string, which in turn creates an ephemeral model and forwards the HTTP verb and the rest of the arguments to it:
dataSource.discoverAndBuildModels('nba_teams', {
    owner: 'uploader'
}, function (err, models) {
    console.log(models);
    models.NbaTeams.find(function (err, act) {
        if (err) {
            console.error(err);
        } else {
            console.log(act);
        }
        dataSource.disconnect();
    });
});
This second one I haven't gotten to work yet, and I don't know how much overhead it might have, but I'm sure it's doable.
So before I dig in deeper, I came to ask: has anybody dealt with this row-to-table relation? What are the good practices here?
In the end, I did my own hacky workaround, and I thought it might help someone some day.
What I did was add a middleware (with regular Express syntax) to listen for /v1/dataset{id_dataset}, create the model on the fly, and pass execution to the next middleware:
app.use('/v1/dataset:id_dataset', function(req, res, next) {
    var idDataset = req.params.id_dataset;
    app.getTheTable(idDataset, function(err, result) {
        if (err) {
            console.error(err);
            res.json({ "error": "couldn't retrieve related table" });
        } else {
            next();
        }
    });
});
Inside the app.getTheTable function, I'm creating the model dynamically and setting it up before the callback:
app.getTheTable = function (idDataset, callback) {
    var Table = app.models.Dataset,
        modelName = 'dataset' + idDataset,
        dataSource; // assigned elsewhere in the original app

    Table.findById(idDataset, function (err, resultados) {
        if (err) {
            callback(new Error('Unauthorized'));
        } else {
            if (app.models[modelName]) {
                callback(null, modelName); // model already exists
            } else {
                // `properties` and `options` are built from the Dataset record (omitted here)
                var theDataset = dataSource.createModel(modelName, properties, options);
                theDataset.settings.plural = modelName;
                theDataset.setup();
                app.model(theDataset);
                var restApiRoot = app.get('restApiRoot');
                app.use(restApiRoot, app.loopback.rest());
                callback(null, modelName);
            }
        }
    });
};
It's hacky, I know, and I believe there must be some kind of performance penalty for overloading the restApiRoot middleware, but it's still better than creating 500 models on startup to cover all possible dataset requests.