How to limit .once('value') in firebase-admin node.js

How do I limit .once('value') in firebase-admin?
Code:
async function GetStuff(limit, page){
  const data = await ref.limitToFirst(parseInt(limit)).once('value');
  return data.val();
}
I wanted to create a page system, where the client requests a limited amount of data and the user can change the page to get different data, but for some reason I can't get it to work.
The code above only gets the first 20 results (when limit is 20). How can I make it start at result 20, so I can build this page feature?
I thought:
Code:
async function GetStuff(limit, page){
  const data = await ref.startAt(limit*page).limitToFirst(parseInt(limit)).once('value');
  return data.val();
}

You might want to review the relevant documentation. It looks like you're trying to pass the offset of a child to startAt, but that's not how startAt works. It accepts the actual value of the child to start at. Pagination by offset index is not supported.
The way you use startAt is typically by passing the last sorted value retrieved by the prior query (or, if you don't want to retrieve that value again, 1 + that value, or a string that is lexically greater than the last string received). As such, some data sets might actually be difficult to paginate if they have the same sorted value repeated many times.
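For example, if the children are ordered by key, each page can be anchored on the key of the last item of the previous page. A minimal sketch, assuming ref points at a list keyed by push IDs (getPage, lastKey and the item shape are illustrative, not from the question):
// Sketch: cursor-based pagination over a list ordered by key.
// lastKey is the key of the last child on the previous page (null for page 1).
async function getPage(limit, lastKey) {
  let query = ref.orderByKey();
  if (lastKey) query = query.startAt(lastKey); // anchor: last item of the previous page
  // On later pages fetch one extra item, because the anchor itself is re-fetched.
  const snap = await query.limitToFirst(lastKey ? limit + 1 : limit).once('value');
  const items = [];
  snap.forEach((child) => { items.push({ key: child.key, ...child.val() }); });
  return lastKey ? items.slice(1) : items; // drop the re-fetched anchor
}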

Related

Nodejs compute gets slow after query big list from Mongodb

I am using mongoose to query a really big list from Mongodb
const chat_list = await chat_model.find({}).sort({uuid: 1}); // uuid is a index
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1});// create_time is a index of message collection, time: t1
// chat_list length is around 2,000, msg_list length is around 90,000
compute(chat_list, msg_list); // time: t2
function compute(chat_list, msg_list) {
  for (let i = 0, len = chat_list.length; i < len; i++) {
    msg_list.filter(msg => msg.uuid === chat_list[i].uuid);
    // consistent handling for every message
  }
}
For the above code, t1 is about 46s and t2 is about 150s.
t2 is really too big, which is weird.
Then I cached these lists to local JSON files:
const chat_list = require('./chat-list.json');
const msg_list = require('./msg-list.json');
compute(chat_list, msg_list); // time: t2
this time, t2 is around 10s.
So here comes the question: 150 seconds vs 10 seconds, why? What happened?
I tried to use a worker to do the compute step after the mongo query, but the time is still much bigger than 10s.
The mongodb query returns a FindCursor that includes arrayish methods like .filter() but the result is not an Array.
Use .toArray() on the cursor before filtering to process the mongodb result set like for like. That might not make the overall process any faster, as the result set still needs to be fetched from mongodb, but compute will be similar.
const chat_list = await chat_model
.find({})
.sort({uuid: 1})
.toArray()
const msg_list = await message_model
.find({}, {content: 1, xxx})
.sort({create_time: 1})
.toArray()
Matt typed faster than I did, so some of what was suggested aligns with part of this answer.
I think you are measuring and comparing something different than what you are expecting and implying.
Your expectation is that the compute() function takes around 10 seconds once all of the data is loaded by the application. This is (mostly) demonstrated by your second test, apart from the fact that that test includes the time it takes to load the data from the local files. But you're seeing that there is a difference of 104 seconds (150 - 46) between the completion of message_model.find() and compute() hence leading to the question.
The key thing is that successfully returning from the find against message_model is not the same thing as retrieving all of the results. As @Matt notes, find() will return with a cursor object once the initial batch of results is ready. That is very different from retrieving all of the results. So there is more work (apparently ~94 seconds' worth) left to do from the two find() operations to further iterate the cursors and retrieve the rest of the results. This additional time is getting reported inside of t2.
As suggested by @Matt, calling .toArray() should shift that time back into t1 as you are expecting. It also sounds like it may be more correct, given the ambiguity of the .filter() functions.
There are two other things that catch my attention. The first is: why are you retrieving all of this data client-side to do the filtering there? Perhaps you would like to do this uuid matching inside of the database via $lookup?
Secondly, this comment isn't clear to me:
// create_time is a index of message collection, time: t1
create_time itself is a field here, existent or not, that you are requesting an ascending sort against.
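For illustration, a minimal sketch of the $lookup idea; the collection and field names follow the question, but the pipeline itself is an assumption:
// Sketch: let MongoDB join messages onto chats by uuid, instead of shipping
// both full lists to Node.js and filtering there.
const chatsWithMessages = await chat_model.aggregate([
  { $sort: { uuid: 1 } },
  { $lookup: {
      from: 'messages',     // assumed name of the message collection
      localField: 'uuid',
      foreignField: 'uuid',
      as: 'messages'
  } }
]);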
You are taking data from two collections, then in a for loop you are comparing IDs using the filter function. What happens is that your loop executes 2,000 times, and so does the filter function, which scans 90,000 records each time.
So take the worst-case scenario: even if none of the 2,000 uuids are inside msg_list, you still execute 2000*90000 iterations without getting any data back.
It won't take more than 10 to 15 secs if you use the code below.
//This will generate array of uuid present in message_model
const msg_list = await message_model.find({}, {content: 1, xxx}).sort({create_time: 1}).distinct("uuid");
// Below query will match all uuid present in msg_list array with chat_list UUID
const chat_list = await chat_model.find({uuid:{$in:msg_list}}).sort({uuid: 1});
The above does the same as your code with the filter function and loop, but it is a cleaner and faster way to receive the data you need.
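Alternatively, if both original queries have to stay as they are, the quadratic scan can be avoided by grouping the messages by uuid once. A minimal sketch of that compute() variant:
// Sketch: index messages by uuid once (O(m)), then each chat lookup is O(1),
// instead of filtering all ~90,000 messages for each of the ~2,000 chats.
function compute(chat_list, msg_list) {
  const byUuid = new Map();
  for (const msg of msg_list) {
    if (!byUuid.has(msg.uuid)) byUuid.set(msg.uuid, []);
    byUuid.get(msg.uuid).push(msg);
  }
  for (const chat of chat_list) {
    const msgs = byUuid.get(chat.uuid) || [];
    // consistent handling for every message
  }
}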

OrderBy and StartAt with two different fields firestore

In my app, I have comments that have a field value of threadCommentCount. I want to order the comments using orderBy threadCommentCount descending and then have pagination continue this using startAfter(lastThreadCommentCount). The problem is when threadCommentCount is 0, which it is for a lot of them: the query returns the same data every time, since it starts at 0 every time. Here is the query:
popularCommentsQuery = db
.collection('comments')
.where('postId', '==', postId)
.orderBy('threadCommentCount', 'desc')
.startAfter(startAfter)
.limit(15)
.get()
This will return the same comments every time once threadCommentCount is 0. I'm unable to send the last document snapshot because I'm using cloud functions and I don't want to send the documentSnapshot in a GET query parameter. I don't really care how the comments are ordered after threadCommentCount is 0, I just need to not get any duplicates. Any help is great!
All Firestore queries have an implicit orderBy("__name__", direction) to resolve any ties between documents that have the same values for the other named orderBy fields. This makes the final sort order stable. But it also enables you to pass another argument to startAfter to provide the document ID of the anchor document that you wish to use for the purpose of pagination.
.startAfter(lastThreadCommentCount, lastDocumentId)
Between these two values, you should be able to uniquely identify the document in the result set to start the next page.
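Applied to the query above, page 2 and onward would look something like this (a sketch; lastThreadCommentCount and lastDocumentId are the values taken from the last comment of the previous page, which the client can send as plain query parameters):
// Sketch: resume after the previous page's last document, using its sort value
// plus its document ID as the tie-breaker on the implicit __name__ ordering.
const nextPage = await db
  .collection('comments')
  .where('postId', '==', postId)
  .orderBy('threadCommentCount', 'desc')
  .startAfter(lastThreadCommentCount, lastDocumentId)
  .limit(15)
  .get();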
So, I was trying orderBy and startAfter with two different fields (time, key) in Firestore to establish pagination in a FlatList.
The key point is that we can pass a document snapshot to define the query cursor [reference].
Here is how I managed to do it.
Step 1: get the document ID (which is auto-generated by Firebase) with where() [reference]
const docRef = firestore().collection('shots').where('key', '==', 'custom_key');
const querySnapshot = await docRef.get(); // get() returns a promise, so await it before reading .docs
const fbDocIdGeneratedByFirebase = querySnapshot.docs[0].id;
Step 2: get a document snapshot using the Firebase-generated document ID (which we got in step 1)
const docRef2= firestore().collection('shots').doc(fbDocIdGeneratedByFirebase)
const snapshot = await docRef2.get();
Step 3: pass the snapshot from step 2 to startAfter() so that the cursor points there [reference]
let additionalQuery = firestore().collection('shots')
.orderBy("time", "desc")
.startAfter(snapshot)
.limit(this.state.limit)
let documentSnapshots = await additionalQuery.get(); // you know what to do next
...
Can you improve the solution?

Firebase: Starting point was already set

I use firebase-admin and the Realtime Database on Node.js.
The data looks like this (screenshot not included here).
When I want to get data where batch = batch-7, I was doing:
let batch = "batch-7";
let ref = admin.database().ref('qr/');
ref.orderByChild("batch").equalTo(batch).on('value', (snapshot) =>
{
res.json(Object.assign({}, snapshot.val()));
ref.off();
});
All was OK!
But now I need to create pagination, i.e. I should receive the data 10 elements at a time depending on the page.
I use this code:
let page = req.query.page;// num page
let batch = req.params.batch;// batch name
let ref = admin.database().ref('qr/');
ref.orderByChild("batch").startAt(+page*10).limitToFirst(10).equalTo(batch)
.on('value', (snapshot) =>
{
res.json(Object.assign({}, snapshot.val()));
ref.off();
});
But I have error:
Query.equalTo: Starting point was already set (by another call to startAt or equalTo)
How do I get N items, starting at position M, where batch equals my batch?
You can only call one startAt (and/or endAt) OR equalTo. Calling both is not possible, nor does it make a lot of sense.
You seem to have a general misunderstanding of how startAt works though, as you're passing in an offset. Firebase queries are not offset based, but work purely on the value, often also referred to as an anchor node.
So when you want to get the data for a second page, and you order by batch, you need to pass in the value of batch for the anchor node: the first item that you want to be returned. This anchor node is typically the last item of the previous page, since you don't know the first item of the next page yet. And for this anchor node, you need to know the value of the item you order on (batch) and usually also its key (if/when there may be multiple nodes with the same value for batch).
It also means that you usually request one item more than you need, which is the anchor node.
So when you request the first page, you should track the key/batch of the last node:
var lastKey, lastValue;
ref.orderByChild("batch").equalTo(batch).limitToFirst(10).on('value', (snapshot) => {
snapshot.forEach((child) => {
lastKey = child.key;
lastValue = child.child('batch').value();
})
})
Then when you need the second page, you do a query like this:
ref.orderByChild("batch").start(lastValue, lastKey).endAt(lastValue+"\uf8ff").limitToFirst(11).on('value', (snapshot) => {
snapshot.forEach((child) => {
lastKey = child.key;
lastValue = child.child('batch').value();
})
})
There's one more trick in the above: I use startAt instead of equalTo, so that pagination can work. But I then use endAt to ensure we still end at the correct item, by using the last known Unicode character as the last batch value to return.
I'd also highly recommend checking out some of the previous questions on pagination with the Firebase Realtime Database.

Thread pool with Apps Script on Spreadsheet

I have a Google Spreadsheet with internal Apps Script code which processes each row of the sheet and performs a UrlFetch with the row data. The URL provides a value which is added to the values returned by each row's processing.
For now the code processes one row at a time with a simple for loop:
var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
var sheet = spreadsheet.getActiveSheet();
var range = sheet.getDataRange();
for (var i = 1; i < range.getValues().length; i++) {
  var payload = {
    // retrieve data from the row and make payload object
  };
  var options = {
    "method": "POST",
    "payload": payload
  };
  var result = UrlFetchApp.fetch("http://.......", options);
  var text = result.getContentText();
  // Save result for final processing
  // (with multi-thread function this value will be the return of the function)
}
Please note that this is only a simple example; in the real case the working function will be more complex (5-6 HTTP calls, where the output of some of them is used as input to the next one, ...).
For the example let's say that there is a generic "function" which executes some sort of processing and provides a result as output.
In order to speed up the process, I'd like to implement some sort of "multi-thread" processing, so I can process multiple rows at the same time.
I already know that JavaScript does not offer multi-threading, but I read about WebWorkers, which seem to allow a function to be processed asynchronously.
My goal is to obtain some sort of ThreadPool (like 5 threads at a time) and send every row that needs to be processed to the pool, obtaining as output the result of each function.
When all the rows have finished processing, a final action will be performed, gathering the results of all the functions.
So the capabilities I'm looking for are:
managed "ThreadPool" where I can submit N tasks to be performed
possibility to obtain a resulting value from each task processed by the pool
possibility to determine that all the tasks have been processed, so a final "event" can be executed
I already see that there are some ready-to-use libraries like:
https://www.hamsters.io/wiki#thread-pool
http://threadsjs.readthedocs.io/en/latest/
https://github.com/andywer/threadpool-js
but they work with NodeJS. Due to the nature of Apps Script, I need a simpler approach, provided by native JS. Also, it seems that minified JS is not accepted by the Apps Script editor, so I also need the "expanded" version.
Do you know a simple ThreadPool in JS where I can submit a function to be execute and I get back a Promise for the result?
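For the simple one-call-per-row case shown above, one native building block worth noting is UrlFetchApp.fetchAll(), which issues a batch of HTTP requests in parallel (it is not a general thread pool, and the chained multi-call case would still need its own orchestration). A minimal sketch, reusing the placeholder URL from the question:
// Sketch: fetchAll() sends all requests in parallel and returns one
// HTTPResponse per request, in order.
var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
var sheet = spreadsheet.getActiveSheet();
var rows = sheet.getDataRange().getValues();
var requests = [];
for (var i = 1; i < rows.length; i++) {
  requests.push({
    url: "http://.......",  // placeholder URL, as in the question
    method: "post",
    payload: { /* build the payload object from rows[i] */ }
  });
}
var responses = UrlFetchApp.fetchAll(requests);
var texts = responses.map(function(r) { return r.getContentText(); });
// Final processing over all results can happen here, once everything has returned.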

Paginating a mongoose mapReduce, for a ranking algorithm

I'm using a MongoDB mapReduce to code a ranking feed algorithm; it almost works, but the last thing to implement is pagination. mapReduce supports limiting the results, but how could I implement the offset (skipping), based e.g. on the latest viewed _id of the results, knowing that I'm using mongoose?
This is the procedure I wrote:
o = {};
o.map = function() {
  // log10(likes+comments) / elapsed hours from the post creation
  emit(Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1), this);
};
o.reduce = function(key, values) {
  // sort the values, when they have the same score
  values.sort(function(a, b) {
    return a.createdAt - b.createdAt;
  });
  // serialize the values, because mongoose does not support multiple returned values
  return JSON.stringify(values);
};
o.scope = {now: new Date()};
o.limit = 15;
Posts.mapReduce(o, function(err, results) {
  if (err) return console.log(err);
  console.log(results);
});
Also, if mapReduce is not the way to go, can you suggest other ways to implement something like this?
What you need is a page delimiter, which is not the id of the latest viewed item as you say, but your sorting property. In this case, it seems to be the formula Math.log(this.likes + this.comments + 1) / Math.LN10 / Math.abs((now - this.createdAt) / 6e7 + 1).
So your mapReduce query needs to hold a condition on the value of that formula, specifically formula >= the last score served. It also needs to hold the value of createdAt at the last page, since you don't sort by that (assuming createdAt is unique). So your mapReduce query would say something like where: theFormulaExpression, createdAt: { $lt: lastCreatedAt }.
If you do allow multiple identical createdAt values, you have to play a little outside of the database itself.
So you just search by formula.
Ideally, that gives you one element with exactly that value, and the next ones sorted after it. So in the reply to the module caller, remove this first element from the array (and make sure you actually ask for more results than you need because of this).
Now, since you allow for multiple similar values, you need another identifying prop, say, the object id or created_at. Your consumer (the caller of this module) will have to provide both (the last value of the score and the createdAt of the last object). Say you have a page split exactly in the middle: one or more objects with that score are on the previous page, another set on the next. You'd then have to remove not simply the top value (because that same score was already served on the previous page), but possibly several of them from the top.
Then it gets really hairy, because potentially your whole page was already served: compare the _ids, look for the first one after the one your module caller has provided you with. Or look into the data and determine how many matching values like that there are, and try to get at least that many more values from mapReduce than your actual page size.
Aside from that, I would do this with aggregation instead; it should be much more performant.
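A minimal sketch of that aggregation alternative, assuming the field names from the question; computing the score with $addFields and cursoring on (score, createdAt) is an illustration, not the only possible pipeline:
// Sketch: compute the score in the database, sort on it, and paginate with a
// (score, createdAt) cursor taken from the last item of the previous page.
const now = Date.now();
const page = await Posts.aggregate([
  { $addFields: { score: {
      $divide: [
        { $log10: { $add: ['$likes', '$comments', 1] } },
        { $abs: { $add: [{ $divide: [{ $subtract: [now, { $toLong: '$createdAt' }] }, 6e7] }, 1] } }
      ]
  } } },
  { $sort: { score: -1, createdAt: -1 } },
  // For pages after the first, anchor on the previous page's last item:
  // { $match: { $or: [
  //   { score: { $lt: lastScore } },
  //   { score: lastScore, createdAt: { $lt: lastCreatedAt } }
  // ] } },
  { $limit: 15 }
]);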
