I can't seem to find any info in the rethinkdb docs on how you might stop a changefeed before the first change is fired. Here's the problem that makes this necessary:
A client connects to the server via a socket, which begins a changefeed, something like this:
var changeCursors = {};

r.db('app').table('things').changes().run(conn, function(err, cursor) {
  // do something when changed
  changeCursors[user.id] = cursor;
});

// later, when the user disconnects
changeCursors[user.id].close();
When the first change is dispatched, I can assign the cursor to a variable in memory, and if the client disconnects, close this cursor.
However, what if the user disconnects before the first change?
As far as I can tell, rethink doesn't support dispatching an initial state to the feed, so the cursor will only be available after a change. However, if the user disconnects, changeCursors[user.id] is undefined, and the changefeed stays open forever.
This can be solved by checking a state object inside the changefeed and just closing the feed after the first change, but in theory if there are no changes and many connected clients, we can potentially open many cursors that will eat memory for no reason (they'll be closed as soon as they update).
Is there a way to get the cursor from a changefeed without the run callback being executed? Alternatively, is there a way to force rethink to perform an initial state update to the run callback?
You'd have this problem even if the server responded immediately, because the user might disconnect after you've sent the query to the server and before the response has made it back. Unfortunately we can't create the cursor before sending the query to the server because in the general case figuring out the return type of the query is sort of hard, so we don't put that logic in the clients.
I think the best option is what you described, where if the cursor hasn't been returned yet you set a flag and close it inside the callback. You might be able to make the logic cleaner using promises.
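Something like this (untested) sketch of that flag-and-close pattern might work; here conn, user, and handleChange stand in for whatever your app already has:

var feeds = {};

function startFeed(conn, user) {
  feeds[user.id] = { cursor: null, closed: false };

  r.db('app').table('things').changes().run(conn, function(err, cursor) {
    if (err) { delete feeds[user.id]; return; }
    var state = feeds[user.id];
    if (!state || state.closed) {
      // the user disconnected before the cursor came back: close it immediately
      cursor.close();
      delete feeds[user.id];
      return;
    }
    state.cursor = cursor;
    cursor.each(handleChange); // do something when changed
  });
}

function stopFeed(user) {
  var state = feeds[user.id];
  if (!state) return;
  state.closed = true;                      // flag checked in the run callback above
  if (state.cursor) {
    state.cursor.close();
    delete feeds[user.id];
  }
}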
I wouldn't worry about memory usage unless you're sure it's a problem; if some portion of a second passes without a change, we return a cursor with no initial values to the client, so your memory use in the case of a lot of users opening and then immediately closing connections will be proportional to how many users can do that in that portion of a second. If that portion of a second is too long for you, you can configure it to be smaller with the optargs to run (http://rethinkdb.com/api/javascript/run/). (I would just set firstBatchScaledownFactor to be higher in your case.)
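For example (untested; the option name comes from the run docs linked above):

r.db('app').table('things').changes()
  .run(conn, { firstBatchScaledownFactor: 10 }, function(err, cursor) {
    // same handling as before
  });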
I'm using snowflake-sdk and snowflake-promise to stream results (to avoid loading too many objects in memory).
For each streamed row, I want to process the received information (an ETL-like job that performs write-backs). My code is quite basic and similar to this simplistic snowflake-promise example.
My current problem is that .on('data', ...) is called more often than I can manage to handle. (My ETL-like job can't keep up with the received rows and my DB connection pool to perform write-backs gets exhausted).
I tried setting rowStreamHighWaterMark to various values (1, 10 [default], 100, 1000, 2000 and 4000) in an effort to slow down/backpressure stream.Readable but, unfortunately, it didn't change anything.
What did I miss? How can I better control when to consume the read data?
If this were written synchronously, you would see that being "pushed more data than you can handle writing at the same time" cannot happen, because:
while (data) {
  data.readRow()
  doSomethingAwesome()
  writeDataViaPoolThatBacksUp()
}

just cannot spin too fast.
Now if you are accepting data on one async thread and pushing it onto a queue that is drained by another async thread, you will get the problem you describe (that is, your queue explodes). So you need to slow or pause the reading side when the writing side falls too far behind.
Given the reader is pushing onto that queue, when the queue gets too long, stop reading.
The other way you might be doing this is with no work queue, firing an async write each time the conditions are met. This is bad because you have no tracking of outstanding work, and you are doing many small updates to the DB, which Snowflake really dislikes. A better approach is to build a local set of data changes, call it a batch, and when the batch reaches a certain size, flush the change set in one operation (and flush the batch when the input is completed, to catch the dregs).
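To make that concrete, here is a rough, untested sketch of the pause/batch/flush idea. It assumes statement.streamRows() gives you a standard Node.js Readable stream, and flushBatch(rows) is your own bulk write-back that returns a Promise:

const BATCH_SIZE = 1000;                 // tune to what one write-back can comfortably handle
let batch = [];

const stream = statement.streamRows();

stream.on('data', (row) => {
  batch.push(row);
  if (batch.length >= BATCH_SIZE) {
    stream.pause();                      // stop receiving rows while we flush
    const toFlush = batch;
    batch = [];
    flushBatch(toFlush)
      .then(() => stream.resume())       // only resume once the write has finished
      .catch((err) => stream.destroy(err));
  }
});

stream.on('end', () => {
  if (batch.length > 0) {
    flushBatch(batch);                   // catch the dregs
  }
});

stream.on('error', (err) => console.error('stream failed', err));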
The Snowflake support got back to me with an answer.
They told me to create the connection this way:
var connection = snowflake.createConnection({
  account: "testaccount",
  username: "testusername",
  password: "testpassword",
  rowStreamHighWaterMark: 5
});
Full disclosure: my project has changed and I could NOT recreate the problem in my local environment, so I couldn't assess the answer's validity. Still, I wanted to share it in case somebody can get some hints from this information.
I'm creating a kind of real-time chat app and I'm having trouble. What I want to do is read the previous 50 messages (documents) before a specified _id. I'll explain in more detail.
The first time a user enters the room, the app automatically loads the 50 most recent messages. After that, if the user scrolls up to the top, it loads the previous 50 messages.
The problem is I don't know how to do this. What I thought of was to find all documents and move the cursor, but everything I tried failed. If I log the "cursor" object to the console, it says:
Promise { <pending> }
so if I do this:
let cursor = db.find('room', { ... });
while (cursor.hasNext()) {
  cursor.next();
}
it goes into an infinite loop and never stops. I'd be very thankful if you could give me a hand. :)
And if there is an alternative way that doesn't require a cursor, that would be really nice.
One final question: does using a cursor cause low performance?
I'm not sure which library you use, but it seems that cursor is an asynchronous object (that's what Promise suggests), so the while loop is incorrect anyway. It will always be pending because you never let the other event (i.e. "I got a response") occur, due to the single-threaded nature of NodeJS. You probably have to use callbacks, not synchronous loops.
But that aside I do believe that your whole approach is incorrect.
If you know how to load the most recent 50 messages, then you must have some kind of logical ordering on the collection. Perhaps a timestamp (which might be part of _id).
So what I propose instead is something similar to "pagination":
1. On the client side set timestamp_pointer = now()
2. Do a query: get me the 50 most recent messages such that timestamp < timestamp_pointer
3. On the client side set timestamp_pointer = the smallest timestamp of the loaded messages
4. If the user scrolls up, go back to point 2.
There are several advantages to this method; one of them is that you don't have to worry if a connection drops for a short moment, since the state is tracked on the client side, not on the database side. And with a proper index it will be very fast.
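For illustration, a rough sketch of that pagination with the official MongoDB Node.js driver could look like this; messages, room, and createdAt are made-up names, and the collection is assumed to have an index on { room: 1, createdAt: -1 }:

async function loadOlderMessages(db, roomId, timestampPointer) {
  const page = await db.collection('messages')
    .find({ room: roomId, createdAt: { $lt: timestampPointer } })
    .sort({ createdAt: -1 })             // newest first
    .limit(50)
    .toArray();

  // the client keeps the smallest timestamp it has seen and sends it back
  // the next time the user scrolls up
  const nextPointer = page.length > 0
    ? page[page.length - 1].createdAt
    : timestampPointer;

  return { messages: page, nextPointer };
}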
And yes, using a cursor the way you do causes low performance, because the database has to keep track of the query until it is fully iterated. Apart from pure memory and/or CPU usage it has some other nasty drawbacks; for example, Mongo has timeouts on cursors. What if a user scrolls up after 15 minutes? By default the timeout on a cursor is 10 minutes. It would be very hard to implement your idea properly.
Use Postgres. #PostgresEvangelist
My midlet uses two record stores. Currently, I create/open both record stores when the app starts and I leave them both open for the entire lifetime of the app.
If I open/close the record store after each operation (e.g., reading or writing) the delays are really bad in the emulator.
Similarly, if I close the record stores when the app exits, there is another very long delay.
So is it OK for me to never close the record stores in my code (presuming the device will do this itself when the app exits)? If not, what is the best practice I can employ without causing a noticeable delay for the user and without risking any data loss?
There is nothing in the docs regarding this, and nothing I could find on Google.
As far as I remember, on some phones changes to the DB are stored permanently only when the DB is closed, while in most J2ME implementations changes are saved on each record change.
I would suggest keeping the DB open for the whole app session if it significantly improves performance. It is worth handling the DB close in destroyApp(), of course.
You could also consider implementing an 'auto save' feature: close and reopen the DB if I/O is inactive for some time.
Usually heavy DB access is performed only in certain actions, not constantly. In that case you could wrap a bunch of I/O operations in a 'transaction', finishing it with a DB close.
In other words, on most devices you can go with the first approach (keeping the DB open), but on some devices (I do not remember exactly which, probably Nokia S40 or S60) it can lead to data loss when the app is terminated by the VM (which you can't handle, since destroyApp() is not guaranteed to be called) without a proper DB close. So in the general case it would be right to wrap critical transactions with DB.close() calls.
I am trying to debug an issue with the `node-pg-cursor` module in node.js against a PostgreSQL server (version 9.3).
This module allows for sequential reads of N rows in a select and works by sending
cur.read(N): 'Execute' on portal=unnamed, rows=N
this command fetches up to N rows and we can continue fetching rows incrementally until the end, where we receive
CommandComplete
ReadyForQuery
Now my problem is that I want to bail out of the extended command before fetching all the rows and reaching the end of the Execute sequence: I would like to fetch N rows, then N more, then N more... and at some point decide that I have enough.
When I do that (stop fetching via Execute), the query seems to never reach CommandComplete or ReadyForQuery. This seems normal, since nothing tells the extended query that I am never going to ask it for rows again.
Apart from closing the connection, is there a command to reach CommandComplete or ReadyForQuery while not fetching all the rows from the portal?
I tried to send Close and received CloseComplete, but it did not go to ReadyForQuery.
If I force an ErrorResponse by sending garbage on the protocol, I reach ReadyForQuery but that does not seem very clean ...
I think you're referring to this, in the documentation:
If Execute terminates before completing the execution of a portal (due to reaching a nonzero result-row count), it will send a PortalSuspended message; the appearance of this message tells the frontend that another Execute should be issued against the same portal to complete the operation. The CommandComplete message indicating completion of the source SQL command is not sent until the portal's execution is completed. Therefore, an Execute phase is always terminated by the appearance of exactly one of these messages: CommandComplete, EmptyQueryResponse (if the portal was created from an empty query string), ErrorResponse, or PortalSuspended.
Presumably, you're getting PortalSuspended and you want to discard the portal without executing any more of it or consuming any more results.
If so, I think you can just send a Sync message:
At completion of each series of extended-query messages, the frontend should issue a Sync message. This parameterless message causes the backend to close the current transaction if it's not inside a BEGIN/COMMIT transaction block ("close" meaning to commit if no error, or roll back if error). Then a ReadyForQuery response is issued.
You may wish to issue a Close against the portal first:
The Close message closes an existing prepared statement or portal and releases resources.
so what I think you need to do is, in message flow terms:
Parse
Bind a named portal
Describe
Loop:
  Execute with a row-count limit to fetch some rows
  If no more rows are needed:
    Close the portal
    Break out of the loop
  If CommandComplete received:
    Break out of the loop
Sync
Wait for ReadyForQuery
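If you would rather stay at the driver level, a rough equivalent with the `pg-cursor` package (assuming a recent version that exposes the promise API) is to read in batches and close the cursor as soon as you have enough, instead of draining the whole result set; big_table and haveEnough are placeholders:

const Cursor = require('pg-cursor');      // used together with an already-connected pg Client

async function readSome(client, haveEnough) {
  const cursor = client.query(new Cursor('SELECT * FROM big_table'));
  try {
    for (;;) {
      const rows = await cursor.read(100);  // one Execute with a row-count limit
      if (rows.length === 0) break;         // portal exhausted: CommandComplete
      if (haveEnough(rows)) break;          // enough data: bail out early
    }
  } finally {
    await cursor.close();                   // closes the portal and gets back to ReadyForQuery
  }
}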
It sounds like you might want to be using the asynchronous query processing API, if your driver is a libpq wrapper. If it's a native implementation the source code for libpq may offer you clues.
Overall, it looks like you'll need to cancel the query using a new connection, then continue to consume input until the buffer is empty. You'll receive however much result data was buffered, then an error message indicating the query was cancelled (if it didn't buffer all its output before you cancelled it) and finally a ReadyForQuery.
I quote the libpq manual:
A client that uses PQsendQuery/PQgetResult can also attempt to cancel a command that is still being processed by the server; see Section 31.6. But regardless of the return value of PQcancel, the application must continue with the normal result-reading sequence using PQgetResult. A successful cancellation will simply cause the command to terminate sooner than it would have otherwise.
Systems usually have quite big TCP send buffers, and they're typically dynamic. See Linux's tcp(7), the SO_SNDBUF option to setsockopt(2), etc. So quite a lot of data might be buffered before the PostgreSQL server blocks on writing to the socket. PostgreSQL doesn't offer per-connection control of the send buffer size, or even a global config option; you must do it at the operating system level. (That said, it'd be trivial to patch PostgreSQL to set a send buffer size with setsockopt and SO_SNDBUF if you wanted to.)
PostgreSQL can't just flush the output buffer when you cancel a query. Even if it were safe to do so and the platform supported it, Pg doesn't know for sure that the buffer has been emptied of results from prior queries and other relevant messages, since you might have pipelined multiple queries.
So all you can really do is reduce the maximum size of the TCP output buffer. That'll reduce the amount of data you must read and throw away, but it may impact the performance of other queries that send bulk data.
Instead of trying to run the query and cancelling it when you've seen enough, I suggest reading rows in batches, requesting a new batch when you've consumed the current one. You can do this by using protocol-level cursors. That way you can control how much data the server queues up and you don't have to mess with buffer sizes. You may already be doing this - using a named portal, and sending an Execute with a maximum row-count, waiting for the PortalSuspended to say there are more rows to read.
I create a new managed object context in a new thread and insert some objects into it. Can I discard them (just forget them) by simply not saving the context? My problem is this: I start a lengthy process which creates some NSManagedObjects at the beginning and saves them at the end (merges them back into the main store). This happens in an NSOperation. I want the user to be able to quit the app at any time without having to wait for the process to finish. Can I just kill the operation and be safe? My understanding is that this is possible because the context does not persist anything without saving. Right?
Yes, you can do that but you shouldn't if the background operation handles any user data.
The UI grammar on MacOS teaches users to expect that all of their data will be saved unless they specify otherwise.
Since saving is virtually instantaneous (from the user's perspective) in the vast majority of cases, it would be better to send a notification to the background operation telling it to stop and save.