MongoDB best practice for recomputing computed data - node.js

The MongoDB docs discuss modelling computed data saying:
The application can either recompute the value with every write that changes the computed value’s source data, or as part of a periodic job.
Updating the data periodically is straightforward but not real-time enough for our use case, so I wanted to know the best practice for hooking into every DB write in order to recompute some derived data.
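A minimal sketch of recomputing "with every write", assuming MongoDB 4.2+ (which accepts an aggregation pipeline as the update argument) and a hypothetical `orders` collection whose `total` is derived from `items[].price`. Because the recomputation runs in the same update it is atomic per document, but it only covers writes that go through this helper.

```javascript
// Sketch only: `items`, `price` and `total` are hypothetical field names.
// Requires MongoDB 4.2+ for pipeline-style updates.

// Build an update pipeline that first applies the caller's changes and
// then recomputes `total` from the (already updated) `items` array.
function withRecomputedTotal(changes) {
  return [
    { $set: changes },                             // stage 1: the actual change
    { $set: { total: { $sum: '$items.price' } } }, // stage 2: recompute the derived value
  ];
}

// With the native driver this is passed as the update argument (not executed here):
//   await orders.updateOne({ _id: orderId }, withRecomputedTotal({ items: newItems }));
```

Pipeline stages take aggregation expressions rather than update operators, so something like `$push` has to be written with `$concatArrays`, and literal strings beginning with `$` need wrapping in `$literal`.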
Our application is written in NodeJS, using the native MongoDB driver.
I understand Mongoose offers a 'save' hook, which sounds ideal for this; however, it falls down pretty quickly, as it doesn't apply to .update, .updateMany, etc. (and Mongoose's Model.prototype.save is pretty terrifying imo).
Does MongoDB support change events at the DB level which would be suitable, or perhaps the native driver (or another client?) exposes hooks designed for re-computing data?
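MongoDB does expose change events at the database level: change streams (`collection.watch()`), which require a replica set or sharded cluster. The sketch below reuses the hypothetical `orders`/`items`/`total` names. The pure function decides what to write back, and `watchOrders` shows one possible wiring with the native driver's `watch`, `hasNext`, `next` and `updateOne`. Unlike an in-write recomputation this is eventually consistent: readers can briefly see the old `total`.

```javascript
// Sketch only: collection and field names are hypothetical.

// Recompute the derived value from the source data.
function computeTotal(doc) {
  return (doc.items || []).reduce((sum, item) => sum + (item.price || 0), 0);
}

// Decide what (if anything) to write back for one change event.
// Returning null when the stored value already matches also stops the
// feedback loop caused by our own write-back producing a new event.
function recomputeFromChange(change) {
  if (!['insert', 'update', 'replace'].includes(change.operationType)) return null;
  const doc = change.fullDocument;
  if (!doc) return null; // e.g. the document was deleted before the lookup ran
  const total = computeTotal(doc);
  if (doc.total === total) return null;
  return { filter: { _id: change.documentKey._id }, update: { $set: { total } } };
}

// Possible wiring with the native driver (not executed here).
// fullDocument: 'updateLookup' makes update events carry the current document.
async function watchOrders(orders) {
  const stream = orders.watch([], { fullDocument: 'updateLookup' });
  while (await stream.hasNext()) {
    const op = recomputeFromChange(await stream.next());
    if (op) await orders.updateOne(op.filter, op.update);
  }
}
```

A production version would also persist the stream's resume token so it can continue where it left off after a restart.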

Related

How to track model changes nodejs/postgresql

I have an app persisting data with PostgreSQL/Express/Knex/Objection. I am looking for a way to track changes in my models, so that I can manage and revert versions, similar to paper_trail in Rails or this port for Sequelize: https://github.com/nielsgl/sequelize-paper-trail
Is there something I could use for this in Knex/Objection, or at the DB level, to track changes?
Answer: There is no generic way to do it in either Objection or Knex.
Random rambling:
You need to decide what kinds of changes you want to track and write the tracking code yourself, for example in Objection's model hooks.
One way to implement it would be to add a separate table to which all tracked changes are written, for example as a JSONB object storing the updated fields and their old values, indexed as needed. I'm pretty sure you don't want to track all the data in the database, since that will blow up the DB size very fast.
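A minimal sketch of the separate-table idea, assuming a hypothetical `audit_log` table with `table_name`, `row_id` and a JSONB `changes` column: record only the fields that changed, with old and new values, and skip the row when nothing changed. Where it is called from (an Objection hook, a service layer) is left open.

```javascript
// Sketch only: the audit_log table and its columns are hypothetical.

// Return { field: { old, new } } for every field whose value differs.
function diffRows(oldRow, newRow) {
  const changes = {};
  const keys = new Set([...Object.keys(oldRow), ...Object.keys(newRow)]);
  for (const key of keys) {
    const before = oldRow[key];
    const after = newRow[key];
    // JSON comparison also catches changes inside nested JSON values.
    if (JSON.stringify(before) !== JSON.stringify(after)) {
      changes[key] = {
        old: before === undefined ? null : before,
        new: after === undefined ? null : after,
      };
    }
  }
  return changes;
}

// Build a row for the audit table, or null if nothing changed.
// It could then be written with e.g. knex('audit_log').insert(entry).
function auditEntry(tableName, id, oldRow, newRow) {
  const changes = diffRows(oldRow, newRow);
  if (Object.keys(changes).length === 0) return null;
  return { table_name: tableName, row_id: id, changes: JSON.stringify(changes) };
}
```

Comparing via `JSON.stringify` is simple but sensitive to key order in nested objects, so it can report a change that isn't one; a deep-equality helper would avoid that.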
Anyway, the implementation depends on why you want or need to track the data and which use cases you actually need to support.
Also this might work for you: https://wiki.postgresql.org/wiki/Audit_trigger

Streaming data via Bookshelf.js

I'm looking at using Bookshelf.js as an ORM for an Express project, with Knex. My only question is whether it supports streaming: if we have a query that returns many results, I'd prefer to deal with a stream rather than hold all the results in memory. I don't see this functionality in the docs, but perhaps there's a way to do it?
Currently Bookshelf doesn't have that functionality, but you can use Knex directly which does. Of course you lose the benefits of using an ORM, but you gain a bit more performance in return, which is probably more important if you're dealing with huge amounts of data.
You can read more about it in Knex's documentation.
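A sketch of consuming a query as a stream, assuming Knex's query-builder `.stream()` method (which returns a readable stream) and a hypothetical `events` table with an `amount` column. `fakeRowStream` stands in for the database so the example runs on its own.

```javascript
const { Readable } = require('stream');

// Aggregate rows one at a time; only the stream's buffer is held in memory,
// never the full result set. Readable streams are async-iterable in Node 10+.
async function sumColumn(rowStream, column) {
  let total = 0;
  let count = 0;
  for await (const row of rowStream) {
    total += row[column];
    count += 1;
  }
  return { count, total };
}

// With Knex (not executed here; table and column are hypothetical):
//   const stats = await sumColumn(knex('events').select('amount').stream(), 'amount');

// Stand-in for a database stream so the sketch runs without a DB.
function fakeRowStream(rows) {
  return Readable.from(rows);
}
```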

Are the Node.JS MongoDB sorting/filtering functions available outside the database?

The MongoDB sorting functions are pretty neato. Can you use them on objects and/or arrays that have nothing to do with the database itself?
var mongo = require('mongodb'),
Server = mongo.Server,
Db = mongo.Db,
sortingFun = mongo.internalSortFilterFunction(); // By the miracle of imagination, this is a made-up line.
There is, for example, this awesome little Node project called sift: MongoDB-inspired array filtering. But there are more similar tools, different opinions, and projects merging and disappearing.
Considering its popularity, MongoDB is quite probably gonna hang around. For that reason, plus the added bonus of behaving exactly the same instead of pretty similarly, I was wondering if a specific object/model/function within node-mongodb could be accessed via require('mongodb') specifically for using the sorting and filtering functions on custom objects/arrays.
The sorting is done in the mongo server, not the client. It's also not particularly fast: big collections should be pre-sorted, but that's another issue.
As far as I know, the mongo server is written in C++ and stores data in its own binary format, BSON, separate from the JS engine.
So unless the client ships its own JavaScript sort implementation (which would be an absurd feature), you can't use the server's sort.
Edit: If you really, really want to use the sort, performance be damned, you could insert the JS objects into the DB, effectively converting them to BSON documents in a mongo collection, then sort them and pull them back from the DB. Indexes etc. would need to be recreated for every call to that function. MongoDB also refuses to sort large result sets without an index (I believe the limit is on the memory an unindexed sort may use, rather than a fixed number of documents).
PS. I haven't read the source. I can't imagine a real-time, indexless JS sort that matches the speed of MongoDB's sort, especially when distributed (sharded). But you can write Node.js modules in C++, and if BSON is similar enough to V8 JS objects (I wouldn't think so), you might be able to port it. I wouldn't go down that road, though, because it probably wouldn't be much faster than reimplementing it in JS, and a JS reimplementation would be a lot easier to create and maintain.
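If you go the reimplement-in-JS route, the core is small. This is a hypothetical minimal version, not MongoDB's or sift's code: it compares values with plain JS `<`/`>` instead of BSON type ordering, looks only at top-level fields, and supports just a handful of query operators.

```javascript
// Build a comparator from a Mongo-style sort spec like { age: -1, name: 1 }.
function compileSort(spec) {
  const fields = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction; // 1 = ascending, -1 = descending
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every sort field
  };
}

// Match a document against a query like { age: { $gte: 18 }, city: 'Oslo' }.
// Plain (non-object) values are compared with ===.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond === null || typeof cond !== 'object') return value === cond;
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case '$gt': return value > arg;
        case '$gte': return value >= arg;
        case '$lt': return value < arg;
        case '$lte': return value <= arg;
        case '$in': return arg.includes(value);
        default: throw new Error(`unsupported operator ${op}`);
      }
    });
  });
}
```

Usage: `people.filter((p) => matches(p, { age: { $gte: 25 } })).sort(compileSort({ age: -1, name: 1 }))`.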

Using CouchDB and Redis together for transactional data

As I was reading up about CouchDB, I stumbled upon a question about transactions and CouchDB. Apparently the way to handle transactions in Couch is to pull the latest version and compare it to the version you are currently working with. This can present problems if data is changing quickly. The other way is to separate the transactional data into multiple documents and aggregate them with map/reduce. This also seems less than optimal.
I was thinking about using redis for this sort of data. The increment and decrement functions seem fairly amazing for this sort of purpose.
So I could just write some sort of string for a transactional key like:
//some user document
{
name: "guy",
id: 10,
page_views: "redis user:page_views:10"
}
Then, if I read something like "redis" inside some piece of transactional data, I know to go get that information from Redis. I suppose I could decide these things beforehand, but since a document-oriented database's primary mission is to be flexible and not bind data to columns, I figured there might be an easier way?
Is there an easy way to link Redis data to CouchDB? Should I be doing this all manually for the few fields that come up? Any other thoughts? Would it be better to update this transactional data "eventually" in the user document, or simply not store it there?
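The manual approach from the question can be sketched as a resolver that replaces top-level `"redis <key>"` strings with live values. `get` is an assumed async key lookup; with node-redis v4 it could be `(key) => client.get(key)`, and the counter itself would be bumped elsewhere with `client.incr(key)`.

```javascript
// Sketch only: the "redis " prefix convention comes from the question;
// `get` is any async function mapping a Redis key to its value.
const PREFIX = 'redis ';

async function resolveRedisFields(doc, get) {
  const resolved = { ...doc }; // leave the original CouchDB document untouched
  const pending = Object.entries(doc)
    .filter(([, value]) => typeof value === 'string' && value.startsWith(PREFIX))
    .map(async ([field, value]) => {
      resolved[field] = await get(value.slice(PREFIX.length));
    });
  await Promise.all(pending); // fetch all placeholders concurrently
  return resolved;
}
```

Redis returns strings, so a counter comes back as `'42'` rather than `42`.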
Both Redis and CouchDB are "easy" (that is, simple). So in that regard, what you are describing is easy. Of course, by using two databases, you have increased the complexity of your application. But on the other hand, the CouchDB+Redis combination is gaining popularity.
The only tool I know that integrates the two is Mikeal Rogers's redcouch. It is a simple tool. Perhaps you could extend it to add what you need (and send a pull request!).
A broader consideration is that Redis does not have the full replication feature set that CouchDB does. So Redis might restrict your future options with CouchDB. Specifically, Redis does not support multi-master replication. In contrast with CouchDB, you will always have a centralized Redis database. (Correct me if I'm wrong; I am stronger with CouchDB than with Redis.)

Pick database for ads/analytics service

I have a project for an ad-exchange service (something like Google DoubleClick) and I have to pick a highly scalable database. I'm thinking about MongoDB or Cassandra.
Cassandra:
Fits our write-intensive system. (+)
Aggregation looks hard, and it is very important for analytics. Is there a good way? I just read a slide deck about Twitter's Rainbird, which seems good. (?)
I don't much like Java. (-)
MongoDB:
Seems easier for analytics (has built-in aggregate functions). (+)
More RAM-consuming? (because it is document-oriented vs. key-value Cassandra) (?)
Write performance compared to Cassandra? (?)
JavaScript shell and a natural fit with Node.js (an important part of our project). (+)
http://pastebin.com/raw.php?i=FD3xe6Jt - This article makes me cautious. (-)
Can you help me pick one, or answer some of my questions above?
Thanks.
I don't know about Cassandra, but MongoDB has some advantages for using it for analytics: high concurrency, sharding, storing everything about an event in a single document, features like upsert and $inc.
For more detailed explanations check the following resources:
MongoDB Analytics - videos
http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics
http://www.mongodb.org/display/DOCS/Use+Cases
http://www.slideshare.net/jrosoff/scalable-event-analytics-with-mongodb-ruby-on-rails
http://nosql.mypopescu.com/post/3508305955/fast-asynchronous-analytics-with-mongodb
http://blog.opengovernment.org/2011/02/24/fast-asynchronous-analytics-with-mongodb/
http://blog.10gen.com/post/4416876632/london-startup-ubervu-on-storing-5tb-of-data-in-mongodb
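To illustrate the upsert and `$inc` combination mentioned above, here is a sketch with hypothetical names (`ad_stats`, `adId`, `day`, `minutes`) that builds one update incrementing per-day and per-minute counters in a single document, creating that document on the first event of the day.

```javascript
// Sketch only: collection and field names are hypothetical.
function counterUpdate(adId, timestamp, event) {
  const day = timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  const minuteOfDay = timestamp.getUTCHours() * 60 + timestamp.getUTCMinutes();
  return {
    filter: { adId, day },
    update: {
      $inc: {
        [`total.${event}`]: 1,               // daily total per event type
        [`minutes.${minuteOfDay}.${event}`]: 1, // per-minute breakdown
      },
    },
    options: { upsert: true }, // create the day's document on the first event
  };
}

// With the native driver (not executed here):
//   const { filter, update, options } = counterUpdate(adId, new Date(), 'click');
//   await db.collection('ad_stats').updateOne(filter, update, options);
```

Each event is then a single atomic write, and reading a day's stats is a single document fetch.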
It depends a lot on your domain, but in most cases one would probably choose Mongo.
For example http://square.github.com/cube/ is built on Mongo.
Cube is an open-source system for visualizing time series data, built on MongoDB, Node and D3. If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards. For example, you might use Cube to monitor traffic to your website, counting the number of requests in 5-minute intervals.
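The 5-minute aggregation Cube describes boils down to flooring timestamps into buckets. A plain-JS sketch (not Cube's API), assuming hypothetical events shaped like `{ time: Date }`:

```javascript
const BUCKET_MS = 5 * 60 * 1000; // 5 minutes in milliseconds

function countPerInterval(events) {
  const counts = new Map();
  for (const { time } of events) {
    // Floor each timestamp to the start of its 5-minute bucket (UTC-aligned).
    const bucket = Math.floor(time.getTime() / BUCKET_MS) * BUCKET_MS;
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  // Return [bucketStartISO, count] pairs in time order.
  return [...counts.entries()]
    .sort(([a], [b]) => a - b)
    .map(([bucket, count]) => [new Date(bucket).toISOString(), count]);
}
```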
Most Cassandra use cases stem from the need for high availability, which as far as I know is its main feature. Your needs seem to center on having a cheap way to shove queryable data into a scale-out DB, and Mongo almost matches an RDBMS with regard to querying. Mongo is also probably easier to deal with.
I think Cassandra is a good fit for this problem.
You don't need to know much Java to get it running (beyond installing Java), as long as there is a client library for your chosen language.
Cassandra 0.8+ now has atomic counter support - perfect for impressions/click tracking.
You could also run Hadoop on top of Cassandra, giving you a proven platform for writing MapReduce jobs to do analytics/aggregations and store the results back in Cassandra.
Check out this slideshow about cassandra and hadoop: http://www.slideshare.net/jeromatron/cassandrahadoop-4399672
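The shape of such a MapReduce aggregation, shown in plain JS rather than as a Hadoop job: map each raw event (hypothetical `{ adId, type }` objects) to a key with a count of 1, then sum the counts per key.

```javascript
// Map phase: one (key, 1) pair per raw event.
function mapEvent(event) {
  return [`${event.adId}:${event.type}`, 1];
}

// Reduce phase: sum the values for each key.
function reduceCounts(pairs) {
  const totals = {};
  for (const [key, value] of pairs) {
    totals[key] = (totals[key] || 0) + value;
  }
  return totals;
}

function aggregate(events) {
  return reduceCounts(events.map(mapEvent));
}
```

In a real job the reduce step runs per key across many machines, which is why it has to be a simple associative operation like this sum.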
I hope that helps.
