I'm building a real-time chat app and I've run into trouble. What I want to do is read the previous 50 messages (documents) before a specified _id. Let me explain in more detail.
The first time a user enters a room, the app automatically loads the 50 most recent messages. Then, if the user scrolls up to the top, it loads the previous 50 messages.
The problem is that I don't know how to do this. My idea was to find all the documents and move a cursor through them, but everything I tried failed. If I log the cursor object to the console, it says:
Promise { <pending> }
so if I do this:
let cursor = db.find('room', { ... });
while (cursor.hasNext()) {
  cursor.next();
}
it goes into an infinite loop and never stops. I'd be very thankful if someone could give me a hand. :)
And if there is an alternative approach that doesn't require a cursor, that would be really nice.
One more question: does using a cursor hurt performance?
I'm not sure what library you're using, but it seems that the cursor is an asynchronous object (that's what the Promise suggests), so the while loop is incorrect anyway. It will always be pending because you never allow the other event (i.e. "I got a response") to occur, due to the single-threaded nature of Node.js. You probably have to use callbacks (or async/await), not synchronous loops.
But that aside I do believe that your whole approach is incorrect.
If you know how to load the 50 most recent messages, then you must have some kind of logical ordering on the collection. Perhaps a timestamp (which might be part of _id).
So what I propose instead is something similar to "pagination":
1. On the client side, set timestamp_pointer = now()
2. Query: get the 50 most recent messages such that timestamp < timestamp_pointer
3. On the client side, set timestamp_pointer = the smallest timestamp among the loaded messages
4. If the user scrolls up, go back to step 2.
There are several advantages to this method. One of them is that you don't have to worry if the connection drops for a short moment, since the state is tracked on the client side, not the database side. And with a proper index it will be very fast.
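To make the idea concrete, here is a minimal sketch in Node, assuming the official MongoDB driver and a `messages` collection with a numeric `timestamp` field (the collection and field names are illustrative, not from the question):

```javascript
// Load the previous page: the 50 newest messages older than the pointer.
// `collection` is an assumed MongoDB driver collection object.
async function loadOlderMessages(collection, timestampPointer, limit = 50) {
  return collection
    .find({ timestamp: { $lt: timestampPointer } })
    .sort({ timestamp: -1 }) // newest first
    .limit(limit)
    .toArray();
}

// Client-side pointer update: the next page starts before the
// oldest message loaded so far; an empty page leaves the pointer alone.
function nextPointer(messages, current) {
  if (messages.length === 0) return current;
  return Math.min(...messages.map((m) => m.timestamp));
}
```

With an index on `{ timestamp: -1 }`, each page is a cheap range scan and no server-side cursor has to outlive the request.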
And yes, using a cursor the way you do causes poor performance, because the database has to keep track of the query until it is fully iterated. Apart from pure memory and/or CPU usage, it has other nasty drawbacks; for example, Mongo puts timeouts on cursors. What if a user scrolls up after 15 minutes? By default the cursor timeout is 10 minutes. It would be very hard to implement your idea properly.
Use Postgres. #PostgresEvangelist
I'm using snowflake-sdk and snowflake-promise to stream results (to avoid loading too many objects in memory).
For each streamed row, I want to process the received information (an ETL-like job that performs write-backs). My code is quite basic and similar to this simplistic snowflake-promise example.
My current problem is that .on('data', ...) is called faster than I can handle. (My ETL-like job can't keep up with the received rows, and my DB connection pool for the write-backs gets exhausted.)
I tried setting rowStreamHighWaterMark to various values (1, 10 [default], 100, 1000, 2000 and 4000) in an effort to slow down/backpressure stream.Readable but, unfortunately, it didn't change anything.
What did I miss? How can I better control when to consume the read data?
If this were written synchronously, you would see that "being pushed more data than you can handle writing at the same time" cannot happen, because:
while (data.hasNext()) {
  const row = data.readRow();
  doSomethingAwesome(row);
  writeDataViaPoolThatBacksUp(row);
}
simply cannot spin too fast.
Now if you are accepting data on one async thread, pushing that data onto a queue, and draining the queue on another async thread, you will get the problem you describe (your queue explodes). So you need to slow or pause the reading thread when the writing thread falls too far behind.
Given that the reader is writing to the assumed queue, when that queue gets too long, stop reading.
The other way you might be doing this is with no work queue, firing an async write each time the conditions are met. This is bad because you have no tracking of outstanding work, and you are doing many small updates to the DB, which, if it is Snowflake, it really dislikes. A better approach is to build a local set of data changes (call it a batch), and when the batch reaches a given size, flush the change set in one operation (and flush the batch once the input is complete, to catch the dregs).
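A sketch of that batching approach, independent of Snowflake (the flush callback and batch size are placeholders for your own write-back logic):

```javascript
// Collects rows and flushes them in bulk, so the DB sees a few large
// writes instead of many small ones.
class Batcher {
  constructor(flushFn, batchSize = 1000) {
    this.flushFn = flushFn;     // called with an array of rows
    this.batchSize = batchSize; // flush threshold
    this.batch = [];
  }
  add(row) {
    this.batch.push(row);
    if (this.batch.length >= this.batchSize) this.flush();
  }
  // Call once more when the input stream ends, to catch the dregs.
  flush() {
    if (this.batch.length === 0) return;
    const rows = this.batch;
    this.batch = [];
    this.flushFn(rows);
  }
}
```

Call `add` from the stream's `data` handler and `flush` from its `end` handler; if the flush itself is async, pause the stream until it resolves.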
The Snowflake support got back to me with an answer.
They told me to create the connection this way:
var connection = snowflake.createConnection({
  account: "testaccount",
  username: "testusername",
  password: "testpassword",
  rowStreamHighWaterMark: 5
});
Full disclosure: my project has changed and I could NOT recreate the problem in my local environment. I couldn't assess the answer's validity; still, I wanted to share it in case somebody can get some hints from this information.
I have a Node.js web app with a route that marks some entity as deleted - flipping boolean field in a database. This route returns that entity. Right now I have code that looks like this:
UPDATE entity SET is_deleted=true WHERE entity.id = ?
SELECT * FROM entity WHERE entity.id = ?
For the moment I can't use RETURNING statement for other reasons.
So I got into an argument with a colleague. I think that putting both the UPDATE and the SELECT inside a transaction is unnecessary, because we are not doing anything significant with the data, just returning it. As a user of the app, I would expect the returned data to be as fresh as possible, meaning I would get the same results on a page refresh.
My question is: what is the best practice for reading data after a write? Do you always wrap the read together with the write in a transaction? Or does it depend?
Well, for performance reasons you want to keep your transactions as small and quick as possible. This minimizes the chance of locks and deadlocks that could bring your application to its knees. As such, unless there is a very good reason to do so, keep your SELECT statements outside of the transaction. This is especially important if you need to execute a long-running SELECT statement: by putting the SELECT inside the transaction, you hold the update locks much longer than needed.
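As a sketch of that shape, using node-postgres-style calls (the client API here is an assumption; adapt it to your driver): the UPDATE gets its own short transaction, and the SELECT runs after the commit, holding no locks.

```javascript
// Flip the flag in a short transaction, then read outside of it.
async function markDeleted(client, id) {
  await client.query('BEGIN');
  try {
    await client.query('UPDATE entity SET is_deleted = true WHERE id = $1', [id]);
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
  // The read happens after the commit, so it sees the fresh value
  // without keeping the update locks alive.
  const { rows } = await client.query('SELECT * FROM entity WHERE id = $1', [id]);
  return rows[0];
}
```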
I can't seem to find any info in the rethinkdb docs on how you might stop a changefeed before the first change is fired. Here's the problem that makes this necessary:
A client connects to the server via a socket, which begins a changefeed, something like this:
var changeCursors = {};

r.db('app').table('things').changes().run(conn, function(err, cursor) {
  // do something when changed, and keep the cursor around
  changeCursors[user.id] = cursor;
});

// later, when the user disconnects
changeCursors[user.id].close();
When the first change is dispatched, I can assign the cursor to a variable in memory, and if the client disconnects, close this cursor.
However, what if the user disconnects before the first change?
As far as I can tell, rethink doesn't support dispatching an initial state to the feed, so the cursor will only be available after a change. However, if the user disconnects, changeCursors[user.id] is undefined, and the changefeed stays open forever.
This can be solved by checking a state object inside the changefeed and just closing the feed after the first change, but in theory if there are no changes and many connected clients, we can potentially open many cursors that will eat memory for no reason (they'll be closed as soon as they update).
Is there a way to get the cursor from a changefeed without the run callback being executed? Alternatively, is there a way to force rethink to perform an initial state update to the run callback?
You'd have this problem even if the server responded immediately, because the user might disconnect after you've sent the query to the server and before the response has made it back. Unfortunately we can't create the cursor before sending the query to the server because in the general case figuring out the return type of the query is sort of hard, so we don't put that logic in the clients.
I think the best option is what you described, where if the cursor hasn't been returned yet you set a flag and close it inside the callback. You might be able to make the logic cleaner using promises.
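One way to keep that logic tidy is a small wrapper around run that tracks whether the user already disconnected; the names here are illustrative, not part of the RethinkDB API:

```javascript
// Returns a handle whose close() works whether or not the cursor has
// arrived yet: before arrival it sets a flag, after arrival it closes
// the cursor directly.
function trackedRun(query, conn) {
  const handle = { cursor: null, closed: false };
  handle.close = function () {
    handle.closed = true;
    if (handle.cursor) handle.cursor.close();
  };
  query.run(conn, function (err, cursor) {
    if (err) return;
    if (handle.closed) { cursor.close(); return; } // disconnected before arrival
    handle.cursor = cursor;
  });
  return handle;
}
```

Store the handle (rather than the raw cursor) in changeCursors[user.id] and call handle.close() on disconnect.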
I wouldn't worry about memory usage unless you're sure it's a problem; if some portion of a second passes without a change, we return a cursor with no initial values to the client, so your memory use in the case of a lot of users opening and then immediately closing connections will be proportional to how many users can do that in that portion of a second. If that portion of a second is too long for you, you can configure it to be smaller with the optargs to run (http://rethinkdb.com/api/javascript/run/). (I would just set firstBatchScaledownFactor to be higher in your case.)
My requirement:
Whenever there is a data change in a table (insert, update, or delete), I should be able to update my cache using my own logic, which does manipulation using the table(s).
Technology: Node, RethinkDB
My Implementation :
I came across table.changes() in RethinkDB, which emits a stream of objects representing changes to a table.
I tried this code
r.table('games').changes().run(conn, function(err, cursor) {
  cursor.each(console.log);
});
It's working fine; I am getting the events, and in the handler I put my logic for the manipulations.
My question is: for how long will it emit the changes? I mean, is there any limit?
And how does it work?
I read this in their docs:
The server will buffer up to 100,000 elements. If the buffer limit is hit, early changes will be discarded, and the client will receive an object of the form {error: "Changefeed cache over array size limit, skipped X elements."} where X is the number of elements skipped.
I didn't understand this properly. I guess after 100,000 changes it won't give the old_val and new_val in the event.
Please explain this constraint, and whether it will work for my requirement.
I am very new to this technology. Please help me.
Short answer: There's no limit.
The 100,000-element buffer only applies if you do not retrieve changes from the cursor: the server will keep buffering them, up to 100,000 elements. If you use each, you retrieve changes as soon as they are available, so you will not be affected by the limit.
I am running a transaction to update an item that needs to be stored in two keys. To accomplish this, I have setup a nested transaction as follows, and it seems to run as expected:
firebaseOOO.child('relationships/main').child(accountID).child(friendAccountID).transaction(function(data) {
  data.prop = 'newval';
  firebaseOOO.child('relationships/main').child(friendAccountID).child(accountID).transaction(function(data) {
    data.prop = 'newval';
    return data;
  });
  return data;
});
Are there any gotchas or possible unexpected implications to this? I am most worried about getting stuck in some sort of transaction loop under load, where each transaction cancels the other out forcing them both to restart, or similar.
Is there a better way of doing this?
I am using the NodeJS client.
You probably don't want to start another transaction from within the callback to the first one. There is no guarantee as to how many times the function for your first transaction will run, particularly if there is a lot of contention at the location you are trying to update.
A better solution, which I believe you hit on in your other question, is to start the second transaction from the completion callback, after checking that the first one committed.
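A sketch of that shape, with the refs and the update function as placeholders for your own (the completion callback receives (error, committed) in the Firebase client):

```javascript
// Run the second transaction only after the first one commits.
function updateBothSides(ref1, ref2, update, done) {
  ref1.transaction(update, function (error, committed) {
    if (error || !committed) return done(error || new Error('first transaction aborted'));
    ref2.transaction(update, function (error2, committed2) {
      done(error2, committed2);
    });
  });
}
```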