mongodb/node - update separate collection from a cursor - Promise condition - node.js

I have two collections for sales data - one for cities and one for towns within those cities. I have sales data at the town level only (by month) and now want to add a new city.total_sales field, which will be the sum of the sales for the towns within each city.
I am using node to run a script.
I set a cursor on the Towns collection (an aggregation) to group all sales at the town level. This works fine.
Then I iterate over the cursor and, for each town, find its city and add the value to city.total_sales.
Example code:
cursor.each(function(err, doc) {
    assert.equal(err, null);
    if (doc != null) {
        // debug - let's just find an example row to update
        var city_row = db.collection('city').findOne({ "city": "Liverpool" });
        console.log(city_row);
    } else {
        callback();
    }
});
The issue I am seeing: the console shows "Promise { }".
This is run as a batch process, so I'm not overly concerned with performance at the moment. What do I need to do to make the code wait for the find, rather than running it as an asynchronous operation?

Put all your findOne queries into a Promise array and then use Promise.all().
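A minimal sketch of that idea, with findOne stubbed out so the snippet is self-contained; in the real script it would be db.collection('city').findOne(...), which returns a Promise in recent driver versions:

```javascript
// Stub standing in for db.collection('city').findOne(query);
// the real driver call returns a Promise just like this one.
function findOne(query) {
  const cities = { Liverpool: { city: 'Liverpool', total_sales: 0 } };
  return Promise.resolve(cities[query.city] || null);
}

const townDocs = [
  { town: 'Anfield', city: 'Liverpool', sales: 10 },
  { town: 'Everton', city: 'Liverpool', sales: 5 },
];

// One promise per town, collected into an array...
const lookups = townDocs.map(doc => findOne({ city: doc.city }));

// ...then a single wait for all of them to resolve.
Promise.all(lookups).then(cityRows => {
  cityRows.forEach((row, i) => {
    if (row) console.log(townDocs[i].town, '->', row.city);
  });
});
```

Alternatively, inside an async function each lookup can simply be awaited, which makes the code read sequentially.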

Related

Firebase cloud function to count and update collections

I have three collections in my Firebase project: one contains locations that users have checked in from, and the other two are intended to hold leaderboards of the cities and suburbs with the most check-ins.
However, as a bit of a newbie to NOSQL databases, I'm not quite sure how to do the queries I need to get and set the data I want.
Currently, my checkins collection has this structure:
{ Suburb:,
City:,
Leaderboard:}
The leaderboard entry is a boolean to mark if the check in has already been added to the leaderboard.
What I want to do is query for all results where Leaderboard is false, count the entries per city and per suburb, add the city and suburb data to a separate collection, and then update the Leaderboard boolean to indicate they've been counted.
exports.updateLeaderboard = functions.pubsub.schedule('30 * * * *').onRun(async context => {
    db.collection('Bears')
        .where('Leaderboard', '==', 'false')
        .get()
        .then(snap => {
            snap.forEach(x => {
                // Count unique cities and return object
                // SELECT cities, COUNT(*) AS `count` FROM Bears GROUP BY cities
            })
        })
        .then(() => {
            console.log({result: 'success'});
        })
        .catch(error => {
            console.error(error);
        });
})
Unfortunately, I've come to about the limit of my knowledge here and would love some help.
Firebase is meant to be a real-time platform, and most of your business logic is going to be expressed in Functions. Because the ability to query is so limited, lots of problems like this are usually solved with triggers and data denormalization.
For instance, if you want a count of all mentions of a city, then you have to maintain that count at event-time.
// On document create
await firestore()
    .collection("city-count")
    .doc(doc.city)
    .set({
        count: firebase.firestore.FieldValue.increment(1),
    }, { merge: true });
Since it's a serverless platform, it's built to run a lot of very small, very fast functions like this. Firebase is very bad at doing large computations -- you can quickly run into MB/minute and doc/minute write limits.
Edit: Here is how Firebase solved this exact problem from the perspective of a SQL trained developer https://www.youtube.com/watch?v=vKqXSZLLnHA
As clarified in this other post from the Community here, Firestore doesn't have a built-in API for counting documents via query. You will need to read the whole collection, load it into a variable, and work with the data there, counting how many of them have false as the value of their Leaderboard field. While doing this, you can start adding these cities and suburbs to arrays that will afterwards be written to the database, updating the other two collections.
The sample code below - untested - returns the values from the database where Leaderboard is false, increments a count, and shows where you need to copy the City and Suburb values to the other collections. I basically reordered some of your code and changed the variables to generic names for better understanding, adding a comment where the values should be copied to the other collections.
...
// Create a reference to the collection of checkins
let checkinRef = db.collection('cities');

// Create a query against the collection
let queryRef = checkinRef.where('Leaderboard', '==', false);

var count = 0;
queryRef.get()
    .then(snap => {
        snap.forEach(x => {
            // add the cities and suburbs to their collections here and update the counter
            count++;
        })
    })
...
You are very close to the solution; you just need to copy the values from one collection to the others once you have all the documents with false in Leaderboard. You can find good examples of copying documents from one collection to another in this other post from the Community: Cloud Functions: How to copy Firestore Collection to a new document?
Let me know if the information helped you!

Get relational data in DynamoDB

I have studied many articles and blog posts and finally concluded that I need only one base table for the whole application, plus many Global Secondary Indexes according to my access patterns. Now I am stuck on a problem.
My base table structure is:-
PK          SK            data
university  uni_uuid      name
course      course_uuid   uni_uuid
As you can see, when I add a new course it will always carry a university uuid, which is saved under the uni_uuid key with the course record.
Now I want to list all the courses for the Admin. So I query DynamoDB like this:
var params = {
    TableName: "BaseTable",
    FilterExpression: "PK = :type",
    ExpressionAttributeValues: {
        ":type": "Course"
    }
};

docClient.scan(params, onScan);

var count = 0;
function onScan(err, result) {
    if (err) {
        console.error("Unable to scan the table. Error JSON:", JSON.stringify(err, null, 2));
    } else {
        resolve(result);
    }
}
This successfully returns all the added courses.
Now my question is: how can I show the university name in the University column? Currently the uni_uuid is displayed there. Do I need to run another query to find the university name by its uuid? If so, for 100 courses I would need to run 100 more queries, one per course, just for the university name.
Any help will be deeply appreciated!
Approaches:

1. If the university name will not change in Admin, you can use denormalization and include it in the courses table. Even if the university name can change, on update you can get all corresponding courses by uni_uuid and update the redundant data.
2. When you receive the courses list, take the distinct uni_uuid values and then request the data from universities using an IN clause.
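As a sketch of the second approach (table and attribute names are taken from the question; the exact key layout is an assumption), the ids can be deduplicated and fetched in one BatchGetItem round trip instead of one query per course. The docClient call is left commented out so the snippet stays self-contained:

```javascript
// Courses as returned by the scan; "data" holds the uni_uuid,
// matching the table layout described in the question.
const courses = [
  { PK: 'course', SK: 'c1', data: 'u1' },
  { PK: 'course', SK: 'c2', data: 'u1' },
  { PK: 'course', SK: 'c3', data: 'u2' },
];

// Distinct university uuids: 100 courses may share only a few universities.
const uniIds = [...new Set(courses.map(c => c.data))];

// One BatchGetItem request covering every university at once.
const params = {
  RequestItems: {
    BaseTable: {
      Keys: uniIds.map(id => ({ PK: 'university', SK: id })),
    },
  },
};

// docClient.batchGet(params, (err, result) => { /* map uuid -> name */ });
```

The responses can then be turned into a uuid-to-name map so each course row is filled in with a single lookup in memory.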

Mongoose express sort items order with populate function

How do I send through a sorted list of documents with mongoose when using the populate function?
The high level structure for my data is Project > Task > Todo item.
I'm currently finding the list of tasks for a given project, and for each task found I populate the associated todo items. This works as expected; however, I'm now trying to sort the todo items by their "rank" (each todo has an integer representing its rank).
The code below works to a certain extent: the todo items come back in the correct order, but grouped by their parent Task.
How can I apply a top-level sort that produces the true sorted list of todo items?
Task.find()
    .where("project.id").equals(req.params.id)
    .populate({ path: "todos", options: { sort: { "rank": 1 } } })
    .exec(function(err, projectTasks) {
        if (err) {
            console.log(err);
        } else {
            res.render("tasks/show", { currentUser: req.user, tasks: projectTasks });
        }
    });
Thanks!

MongoDB Node.js each method

I have an array of data which I'll store in the database. When I check whether the data already exists, each() is called twice, even when I'm using limit(1). I have no clue what's going on here...
collection.find({
    month: 'april'
}).limit(1).count(function(err, result) {
    console.log('counter', result);
});

collection.find({
    month: 'april'
}).limit(1).each(function(err, result) {
    console.log('each', result);
});

collection.find({
    month: 'april'
}).limit(1).toArray(function(err, result) {
    console.log('toArray', result);
});
At this time, there is exactly one document for the month of April stored in the collection.
The above queries generate output like this:
count 1
each {...}
each null
toArray {...}
In the mongo shell I have checked the count() and forEach() methods, and everything works as expected. Is it a driver problem? Am I doing something wrong?
This is the expected behavior. The driver returns the items in the loop, and then at the end it returns null to indicate that there are no items left. You can see this in the driver's examples too:
// Find returns a Cursor, which is Enumerable. You can iterate:
collection.find().each(function(err, item) {
    if (item != null) console.dir(item);
});
If you are interested in the details, you can check the source code for each:
if(this.items.length > 0) {
    // Trampoline all the entries
    while(fn = loop(self, callback)) fn(self, callback);
    // Call each again
    self.each(callback);
} else {
    self.nextObject(function(err, item) {
        if(err) {
            self.state = Cursor.CLOSED;
            return callback(utils.toError(err), item);
        }
        if(item == null) return callback(null, null);  // <-- the null you are seeing
        callback(null, item);
        self.each(callback);
    })
}
In this code, each iterates through the items using loop, which shifts items from the array (var doc = self.items.shift();). When this.items.length becomes 0, the else block is executed. This else block tries to get the next document from the cursor. If there are no more documents, nextObject returns null (item's value becomes null), which causes if(item == null) return callback(null, null); to execute. As you can see, the callback is called with null, and this is the null that you see in the console.
This is needed because MongoDB returns the matching documents using a cursor. If you have millions of documents in the collection and you run find(), not all documents are returned immediately because you would run out of memory. Instead MongoDB iterates through the items using a cursor. "For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte." So this.items.length becomes the number of the items that are in the first batch, but that's not necessarily the total number of the documents resulted by the query. That's why when you iterate through the documents and this.items.length becomes 0, MongoDB uses the cursor to check if there are more matching documents. If there are, it loads the next batch, otherwise it returns null.
It's easier to understand this if you use a large limit. For example, in the case of limit(100000) you would need a lot of memory if MongoDB returned all 100000 documents immediately, not to mention how slow processing would be. Instead, MongoDB returns results in batches. Let's say the first batch contains 101 documents. Then this.items.length becomes 101, but that's only the size of the first batch, not the total number of results. When you iterate through the results and reach the item after the last one in the current batch (the 102nd in this case), MongoDB uses the cursor to check if there are more matching documents. If there are, the next batch of documents is loaded; otherwise null is returned.
But you don't have to bother with nextObject() in your code; you only need to check for null, as in the MongoDB example.
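For what it's worth, newer versions of the Node.js driver also expose cursors as async iterables, so the null sentinel never surfaces in your code at all. A sketch, with a stub async generator standing in for the cursor returned by collection.find():

```javascript
// Stub async iterable standing in for a driver cursor; a real cursor
// from collection.find() is consumed with for await in the same way.
async function* makeCursor(docs) {
  for (const doc of docs) yield doc;
}

async function main() {
  const cursor = makeCursor([{ month: 'april' }]);
  for await (const doc of cursor) {
    console.log('each', doc); // only real documents; no trailing null
  }
}

main();
```

The loop ends when the iterable is exhausted, so there is no need for an explicit null check.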

CouchDB: Filter by List of Account IDs and Sort By Date

I'm trying to retrieve a list of events from a set of specific accounts (by account id) and sort these events by date.
Here is my current map function:
function(doc) {
    if (doc.doc_type == 'event' && doc.account_id) {
        emit([doc.date_start, doc.account_id], doc);
    }
}
This outputs the correct data for all accounts when run in Futon; however, I am unclear on how to construct my request query, or whether I need to modify my map function to accomplish this. All that matters is that the events belong to one of the specified accounts and that the event dates are sorted in descending order.
You have two options:

1. Do a query for each account and merge the results on the client:

function(doc) {
    if (doc.doc_type === 'event' && doc.account_id) {
        emit([doc.account_id, doc.date_start], 1);
    }
}

Query with (properly url-encoded) ?include_docs=true&startkey=["ACCOUNT_ID"]&endkey=["ACCOUNT_ID", {}] for each value of account_id. The merging is fast and easy because the results are already sorted by date.

2. Do a single query and sort the results on the client:

function(doc) {
    if (doc.doc_type === 'event' && doc.account_id) {
        emit(doc.account_id, 1);
    }
}

Query with ?include_docs=true&keys=["ACCOUNT_ID1", "ACCOUNT_ID2", ...]. Sorting will be slower because the results are sorted by account_id.
Only testing will tell you which option is best for your use case.
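The merge step for option 1 can be sketched like this. Plain objects stand in for the view rows, and each per-account list is assumed to have been queried in descending date order (e.g. with descending=true), so a simple multi-way merge preserves the overall order:

```javascript
// Merge several per-account result lists, each already sorted by
// date_start descending, into one list sorted descending overall.
function mergeByDateDesc(lists) {
  const merged = [];
  const idx = lists.map(() => 0); // cursor into each list
  for (;;) {
    let best = -1;
    // Pick the list whose current head has the latest date_start.
    for (let i = 0; i < lists.length; i++) {
      if (idx[i] < lists[i].length &&
          (best === -1 ||
           lists[i][idx[i]].date_start > lists[best][idx[best]].date_start)) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list exhausted
    merged.push(lists[best][idx[best]++]);
  }
}

const accountA = [{ date_start: '2020-03-01' }, { date_start: '2020-01-01' }];
const accountB = [{ date_start: '2020-02-01' }];
console.log(mergeByDateDesc([accountA, accountB]).map(e => e.date_start));
// → ['2020-03-01', '2020-02-01', '2020-01-01']
```

Because each input list is already sorted, the merge is linear in the total number of events, which is why option 1 keeps the client-side work cheap.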
