Is it possible to cache, say, Mongoose document objects in Redis, perhaps to implement a write-back or write-through cache with a timeout-based cache-flush mechanism?
P.S.:
I am familiar with mongoose-redis-cache, but as far as I can tell it supports only lean queries, which do not quite serve the purpose here (though I may be wrong).
Since Mongoose objects wrap a MongoDB document, there's no reason you couldn't call
JSON.stringify(mongooseObject.toJSON())
which returns a string representing the MongoDB document (see toJSON). You could then store that result under a key in Redis.
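For example, a minimal sketch of that round trip, assuming the redis v4 client and a hypothetical User model (the 60-second TTL is an arbitrary choice):

const { createClient } = require('redis');

const redisClient = createClient();
const ready = redisClient.connect(); // connect once and reuse the client

async function cacheUser(userDoc) {
  await ready;
  // Serialize the Mongoose document to a plain JSON string
  const payload = JSON.stringify(userDoc.toJSON());
  // EX gives the key a TTL in seconds, so stale entries expire on their own
  await redisClient.set(`user:${userDoc._id}`, payload, { EX: 60 });
}

async function readUserFromCache(id) {
  await ready;
  const cached = await redisClient.get(`user:${id}`);
  // Model.hydrate() turns the plain object back into a full Mongoose document
  return cached ? User.hydrate(JSON.parse(cached)) : null;
}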
Where it starts to get more complex is that you'd need to first override the normal save and update functionality to write any modifications to your Redis store rather than to the database. While doable, Mongoose wasn't designed for that, and you'd probably be more successful just using the native MongoDB driver and managing general document interactions that way. There are a number of extremely handy operators that you'd need to handle independently ($push, for example, which adds a single value to an array).
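For illustration, this is the kind of update operator a write-back cache would have to reproduce itself; the posts collection, comments field and postId here are placeholders:

// $push atomically appends one value to an array field on the server side
await db.collection('posts').updateOne(
  { _id: postId },
  { $push: { comments: { author: 'alice', text: 'Nice post' } } }
);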
The real issue, though, is that you lose nearly all of the power of MongoDB by not being able to use the query engine or aggregation framework if all of the data isn't already stored in MongoDB (and even if it is, you're still bypassing your caching layer). And if you're not using any of that functionality, then MongoDB may not be the best match for your needs (and you might instead consider something like CouchDB).
While I can see the potential value of a caching layer for a high-performance MongoDB system, a write-back style cache may be more complex than it's worth (and not necessarily safe).
Of course, a write-through cache would be simpler (although you still have the complexity of two data stores and of making sure writes are committed consistently if you're going to trust both the cache and the DB).
(As an aside, I'm not sure how you'd actually manage timeouts, as I thought Redis deleted the values associated with keys if they were assigned a lifetime/timeout? I wouldn't want to lose data from the Redis cache if you were doing write-back.)
In Redis you can only cache raw JSON, but to cache whole Mongoose documents you can use my library, which handles both caching results in Redis and caching Mongoose documents in memory. It also has event-based logic to clear both caches when related changes occur.
https://www.npmjs.com/package/speedgoose
Related
My previous question: Errors saving data to Google Datastore
We're running into issues writing to Datastore. Based on the previous question, we think the issue is that we're indexing a "SeenTime" attribute with values in YYYY-MM-DDTHH:MM:SSZ format (e.g. 2021-04-29T17:42:58Z) and this is creating a hotspot (see: https://cloud.google.com/datastore/docs/best-practices#indexes).
We need to index this because we're querying the data by date and need the time for each observation in the end application. Is there a way around this issue where we can still query by date?
This answer is a bit late but:
On your previous question, before even getting to the query, the main issue seems to be the writes themselves (DEADLINE_EXCEEDED/UNAVAILABLE) failing on some saves, so it's not completely clear whether it's caused by data hot-spotting or by ingesting more data in shorter bursts, which causes contention (see "Designing for scale").
A single entity in Datastore mode should not be updated too rapidly. If you are using Datastore mode, design your application so that it will not need to update an entity more than once per second. If you update an entity too rapidly, then your Datastore mode writes will have higher latency, timeouts, and other types of error. This is known as contention.
You would need to add a prefix to the key to index monotonically increasing timestamps (as mentioned in the best-practices doc). Then you can test your queries using the GQL interface in the console. However, since you most likely want "all events", I don't think avoiding the monotonic index is really possible, and so it will still result in hot-spotting and read latency.
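A hedged sketch of that prefix idea with the @google-cloud/datastore client; the "Observation" kind, the shard count and the random shard choice are assumptions, not something from the original question:

const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

const NUM_SHARDS = 10;

async function saveObservation(seenTime, data) {
  // Spreading writes over N key prefixes avoids one monotonically
  // increasing key range becoming a hotspot
  const shard = Math.floor(Math.random() * NUM_SHARDS);
  const key = datastore.key(['Observation', `${shard}-${seenTime.toISOString()}`]);
  await datastore.save({ key, data: { ...data, SeenTime: seenTime } });
}

The trade-off is that a read then has to fan out across all prefixes and merge the results, which is why it may not help much if the query is effectively "all events".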
The impression is that the latency might be unavoidable. If so, then you would need to decide if it's acceptable, depending on the frequency of your query/number-of-elements returned, along with the amount of latency (performance impact).
Consider switching to Firestore Native Mode. It has a different architecture under the hood and is the next version of Datastore. While Firestore is not perfect, it can be more forgiving about hot-spotting and contention, so it's possible that you'll have fewer issues than in Datastore.
I just need a simple mutex, stored in MongoDB. I want a lock given a unique id. There seem to be many popular solutions with Redis, but in this case, since we are already using MongoDB, I am looking for some sort of library that I can use for locking with MongoDB, but I can't find any good packages. Is there a way to do a simple lock with Mongoose or the official MongoDB node.js driver?
I am especially looking for some mutex in MongoDB that has a built-in TTL (time to live). With Redis, you can give a key a TTL and it will remove itself after a period of time, that's an essential feature.
When I google "mongodb + ttl" this is what I see:
https://docs.mongodb.com/manual/core/index-ttl/
To recap our discussion in the comments...
DBMS Transaction Locking
If you're asking about locking at the DBMS transaction level, I think you will find that most DBMSs (SQL or NoSQL) handle transactions / locking on their own (i.e. a read operation on a record will wait until a write operation is finished). In MongoDB, since each single-document operation is atomic, they've provided a particularly helpful atomic operation called findAndModify (exposed in the Node.js driver as findOneAndUpdate).
Domain Specific Locking
Nothing is stopping you from creating some sort of "locks" collection which must be checked before certain operations are made. You will definitely need to consider and take note of the "edge" cases that could result in illegal state or data inconsistency. This is a good time to also reevaluate your architecture (hint: microservices).
TTL
Mongo supports specifying a TTL index on any date field. So, in your case you could consider adding an index like so: db.my_locks.createIndex( { "deleteAt": 1 }, { expireAfterSeconds: 1 } ) and specifying "deleteAt" on insert.
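To make that concrete, here is a minimal sketch of a TTL-backed mutex with the official Node.js driver; the my_locks collection name and the 30-second lease are assumptions, and note that the TTL monitor only runs about once a minute, so expiry is not instant:

const locks = db.collection('my_locks');

// One-time setup: documents are removed once their deleteAt time has passed
await locks.createIndex({ deleteAt: 1 }, { expireAfterSeconds: 0 });

async function acquireLock(lockId, leaseMs = 30000) {
  try {
    // The unique _id makes this atomic: only one caller can insert the lock
    await locks.insertOne({ _id: lockId, deleteAt: new Date(Date.now() + leaseMs) });
    return true;
  } catch (err) {
    if (err.code === 11000) return false; // duplicate key: the lock is already held
    throw err;
  }
}

async function releaseLock(lockId) {
  await locks.deleteOne({ _id: lockId });
}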
I have a feature in a MEAN stack application that involves inserts into, and creation of, multiple collections. If I do that in plain Mongoose, it is going to be multiple Mongo calls and it might be slow.
Can I use MongoDB stored JavaScript for this? I could pass some values to the stored JavaScript and it could do everything from there.
Is that a recommended approach?
The recommended way to do lots of inserts is to use the Bulk Write Operations feature. You can define a set of inserts to be done as a single batch, then pass them all to MongoDB in one go.
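For example, a minimal sketch of a batched insert, assuming a hypothetical User model in Mongoose (or the equivalent bulkWrite() call on the native driver):

// Mongoose: one round trip for the whole batch
await User.insertMany(docs, { ordered: false });

// Native driver: the more general bulkWrite() accepts mixed operations
await db.collection('users').bulkWrite(
  docs.map(d => ({ insertOne: { document: d } })),
  { ordered: false }
);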
However, that is really only appropriate for jobs such as a big data take-on, where you are importing a large number of similar records in one go. If you are running a normal application where there might be inserts, updates, deletes and reads in varying proportions and at varying rates, you would be better off letting Mongoose submit them as individual queries, and making sure your server hardware can cope.
I am going to do a project using Node.js and MongoDB. We are designing the database schema and are not sure whether we should use different collections or a single collection to store the data, because each approach has its own pros and cons.
If we use a single collection, then whenever the database is invoked the whole collection will be loaded into memory, which eats into RAM. If we use different collections, then to retrieve data we need to write different queries. With one collection retrieval will be easy, and with different collections the application will become faster. We are confused about whether to use a single collection or multiple collections. Please guide me on which one is better.
Usually you use different collections for different things. For example, when you have users and articles in the system, you usually create a "users" collection for users and an "articles" collection for articles. You could create one collection called "objects" or something like that and put everything there, but it would mean you would have to add some kind of type field and use it in searches and when storing data. You can use a single collection in the database, but it would make usage more complicated. Of course it would let you load the entire collection at once, but whether or not that is relevant to the performance of your application is something that would have to be profiled and tested to determine the performance impact for your particular use case.
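As a rough illustration of what that type field would look like in practice (the collection and field names here are placeholders, not a recommendation):

// Single "objects" collection: every document and every query needs a discriminator
await db.collection('objects').insertOne({ type: 'article', title: 'Hello', body: '...' });
const articles = await db.collection('objects').find({ type: 'article' }).toArray();

// Separate collections: the collection itself is the discriminator
const same = await db.collection('articles').find({}).toArray();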
Usually, developers create different collections for different things. For example, for post management people create a "posts" collection and save the posts there, and the same goes for users and so on.
Using different collections for different purposes is good practice.
MongoDB is great at scaling horizontally. It can shard a collection across a dynamic cluster to produce a fast, queryable collection of your data.
So having a smaller collection size is not really a pro, and I am not sure where the theory that it is comes from; it isn't true in SQL and it isn't true in MongoDB. The performance of sharding, if done well, should be comparable to the performance of querying a single small collection of data (with a small overhead). If it isn't, then you have set up your sharding wrong.
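For reference, sharding a single collection is only a couple of shell commands (the database, collection and shard key here are placeholders):

// From the mongo shell, against a sharded cluster
sh.enableSharding("mydb");
// A hashed _id spreads documents evenly across the shards
sh.shardCollection("mydb.users", { _id: "hashed" });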
MongoDB is not great at scaling vertically; as @Sushant quoted, the ns size of MongoDB would be a serious limitation here. One thing that quote does not mention is that index size and count also affect the ns size, hence why it says:
By default MongoDB has a limit of approximately 24,000 namespaces per database. Each namespace is 628 bytes, the .ns file is 16MB by default.

Each collection counts as a namespace, as does each index. Thus if every collection had one index, we can create up to 12,000 collections. The --nssize parameter allows you to increase this limit (see below).

Be aware that there is a certain minimum overhead per collection -- a few KB. Further, any index will require at least 8KB of data space as the b-tree page size is 8KB. Certain operations can get slow if there are a lot of collections and the meta data gets paged out.
So you won't be able to handle it gracefully if your users exceed the namespace limit, and it won't perform well as your userbase grows.
UPDATE
For MongoDB 3.0 or above using the WiredTiger storage engine, this is no longer a limit.
Yes, personally I think having multiple collections in a DB keeps it nice and clean. The only thing I would worry about is the size of the collections. Collections are used by a lot of developers to cut up their DB into, for example, posts, comments, and users.
Sorry about my grammar and lack of explanation, I'm on my phone.
For a project I am creating a queuing library and basically store URLs in a Set (it's actually an object where I set keys to true, but one can think of it as an array), so the queue only takes every URL once. This works really well, however I am facing the problem that there are many URLs and so the RAM usage becomes really high.
Therefore I want to use an on-disk key-value store (actually only keys are required, no idea whether there is some different approach) with the following requirements:
No need to load the whole data set into RAM
Speedy lookups
Node.js bindings
It doesn't have to be too safe (losing data once in a while isn't a huge problem; low RAM requirements are more important), and even though I use Node.js in this scenario, the lookup doesn't necessarily need to run async.
Actually a side question would be whether there is some better way than an on-disk key-value approach. A search term would be nice; "lookup tables" somehow always leads me to data sets (IPs, ZIP codes, etc.).
I'd use an SQL table with a single column (to store the URL). It gives better control over memory usage than Redis (which pretty much stores everything in memory), and it's (see the sketch after this list):
easy to check if there is already the same value
easy to insert
easy to remove one element
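One possible concrete take on this, using the better-sqlite3 package (the package choice, file name and table name are my assumptions, not part of the answer); it's synchronous, which fits the "doesn't need to run async" requirement:

const Database = require('better-sqlite3');

const db = new Database('urls.db');
db.exec('CREATE TABLE IF NOT EXISTS seen_urls (url TEXT PRIMARY KEY)');

const insert = db.prepare('INSERT OR IGNORE INTO seen_urls (url) VALUES (?)');
const exists = db.prepare('SELECT 1 FROM seen_urls WHERE url = ?');
const remove = db.prepare('DELETE FROM seen_urls WHERE url = ?');

// returns false if the URL was already queued, so each URL is only taken once
function markSeen(url) { return insert.run(url).changes === 1; }
function hasSeen(url) { return exists.get(url) !== undefined; }
function forget(url) { remove.run(url); }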
If it really "doesn't have to be too safe", another design would be to keep storing everything in memory but limit the number of URLs you store, for example by using an LRU cache.
You could either use a cache in node.js (easy to find via Google) or use a separate memcached server, possibly on the same machine.
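If you go the in-memory LRU route, a minimal hand-rolled sketch is enough; it relies on Map preserving insertion order, and the 100,000-entry cap is an arbitrary assumption:

class LruUrlSet {
  constructor(maxSize = 100000) {
    this.maxSize = maxSize;
    this.map = new Map();
  }
  has(url) {
    if (!this.map.has(url)) return false;
    // refresh recency by re-inserting the key at the end
    this.map.delete(url);
    this.map.set(url, true);
    return true;
  }
  add(url) {
    this.map.delete(url);
    this.map.set(url, true);
    if (this.map.size > this.maxSize) {
      // evict the least recently used entry (first key in insertion order)
      this.map.delete(this.map.keys().next().value);
    }
  }
}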