How to make MongoDB and Node.js responses fast?

I have a blog schema (title, topic, body, author, etc.) and a function that returns all the blogs in that collection.
Right now I am just calling Blog.find() to pull all the data at once and sending the response with res.send(). The problem is that the body property of each blog is very large, so it takes a long time to return the results.
Is there any way to make it faster?

You can get faster results if you add the fields you use the most in searches to the collection as an index.
Usage example:
//The following example creates a single key descending index on the name field:
db.collection.createIndex( { name: -1 } )
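As a rough sketch of how this could look for the blog schema in the question, assuming the Node.js MongoDB driver and assuming that blogs are usually searched by topic and author (the question does not say which fields are searched, so the field choice is hypothetical). Keep in mind that an index only helps queries that filter or sort on the indexed fields.

```javascript
// Sketch only: `collection` is assumed to be a MongoDB driver Collection,
// for example db.collection("blogs"). The indexed fields are hypothetical.
function createBlogIndexes(collection) {
  // Compound ascending index on topic, then author.
  // With the real driver this returns a Promise for the index name.
  return collection.createIndex({ topic: 1, author: 1 });
}
```

A query such as find({ topic: "databases" }) could then use this index, while a bare find() with no filter would not benefit from it.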


Contentful: How to get an entry using nothing but one of its fields? Or, how to set the entryId in web app?

I need to make some shareable blog post URLs. To do that, the URL must be something like webpage.com/blog-post-title. It cannot be webpage.com/5GFd5GDSg2345WD.
Since I am using dynamic routing, I need to get a Contentful entry using nothing but what is on the URL. There should not be any queries because queries are ugly and reduce shareability, like webpage.com/blog-post-title?query=queriesAreUgly.
Unfortunately, I need the entryId to get the entry. Also unfortunately, the entryIds are all very ugly and therefore completely useless/unusable. I wish I could set my own entryId, but this does not appear to be possible for mysterious reasons.
I could make a lookup table that pairs URLs with entryIds, but I'm going to be handing this Contentful project to someone who is not tech-savvy, and they should not have to manage a lookup table.
I could get all blog entries then filter by blog title, but, very obviously, this is inefficient, as I would be loading thousands of lines of text for no reason at all.
I could create my own backend API and do all this myself, but this is also a bad solution because it would take too much time and I could not give it to my non-tech-savvy client.
There are seemingly no solutions to this problem, which is created by Contentful's inherent needless inflexibility.
The only efficient way to get this to work is to find the entry not by its ID but by one of its fields. Is there a performant/efficient way to do this, or am I just going to have to filter through every single blog post until I find the one with the correct title?
How about adding a 'slug' field to the blog post content type, which you can auto-generate from the title using the field settings (so you don't have to type it out manually)?
You can then filter on the slug field in the query.
If you're using the JavaScript SDK (which it sounds like you are), you can use getEntries() and filter by the slug field to get a single blog post. Like so:
import { createClient } from "contentful";

// Replace the placeholder strings with your own space ID and access token.
const client = createClient({
  space: "SPACE_ID",
  accessToken: "ACCESS_TOKEN",
});

// Fetch at most one blogPost entry whose slug field matches the URL segment.
const response = await client
  .getEntries({
    content_type: "blogPost",
    limit: 1,
    "fields.slug": "blog-post-title",
  })
  .catch(console.error);
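To connect this to dynamic routing, the slug has to come from the URL path. A minimal sketch of that step, assuming the slug is the last path segment; the helper name slugQueryFromPath is made up for illustration, and "blogPost" is the content type ID used above:

```javascript
// Hypothetical helper: turns a URL path such as "/blog-post-title"
// into the query object that is passed to client.getEntries().
function slugQueryFromPath(pathname) {
  // Take the last non-empty path segment as the slug.
  const segments = pathname.split("/").filter(Boolean);
  const slug = decodeURIComponent(segments[segments.length - 1] || "");
  return {
    content_type: "blogPost",
    limit: 1,
    "fields.slug": slug,
  };
}
```

The result can be passed straight to client.getEntries(); the matching entry, if there is one, would be the first element of the returned items array.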

How to ignore a query field in MongoDB

Is there a way to ignore a field passed in a query? This issue is being caused by a query that is coming from an HTTP request.
For example this query would get all documents with title of some title and user's email of user@example.com
//From HTTP request
var query = {
  title: 'some title',
  'user.email': 'user@example.com'
};
somecollection.find(query, function(err, documents) {
  //Not good because we know who posted these documents
});
The difficulty I'm having is that I'm working on an API that basically lets you pass a query to MongoDB and it returns the response. However, the part that is sensitive is that I don't want you to query by the user's email (because the document is supposed to be anonymous). I know you can limit the fields that are returned, but if you can query for all documents by user@example.com then those posts are no longer anonymous.
I guess I could try and delete that part of the query that is passed in from the HTTP request, but then I get into issues with someone using $or or any other operator that I don't know about or forget. Or if they use a string to access deeper parts of the user object.
Is there a way to limit what fields the query can query against?
If you don't want to expose all of the query logic, then don't allow the client to pass in a query. Create a separate endpoint that only accepts the title as a search parameter.
That being said, you could easily retrofit this by doing something like this:
var title_only_query = {
'title': user_query.title
}
This way only the title property will be queried for.
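The same idea can be generalized to a small allow-list, so that operators like $or and dotted paths like 'user.email' never reach the database. This is only a sketch: the names ALLOWED_FIELDS and buildSafeQuery are made up, and restricting values to plain strings is an assumption about what the API needs.

```javascript
// Fields the client is allowed to query on (an assumption for this sketch).
const ALLOWED_FIELDS = ["title"];

// Builds a new query from scratch instead of deleting keys from the
// client's query, so unknown keys and operators are never copied over.
function buildSafeQuery(userQuery) {
  const safe = {};
  for (const field of ALLOWED_FIELDS) {
    const value = userQuery ? userQuery[field] : undefined;
    // Accept plain strings only, so operator objects such as
    // { $ne: null } are dropped as well.
    if (typeof value === "string") {
      safe[field] = value;
    }
  }
  return safe;
}
```

Because the query is rebuilt rather than filtered, anything not explicitly listed is ignored, including operators you don't know about.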

Is list function a good candidate for my scenario?

I have a view in couchDb that is defined like this
function (doc) {
  if (doc.url) {
    var a = new Date(doc.postedOn);
    emit([a.toLocaleDateString(), doc.count, doc.userId], {
      _id: doc.userId,
      postTitle: doc.postTitle,
      postSummary: doc.postSummary,
      url: doc.url,
      count: doc.count
    });
  }
};
This gives me the result in the format that I want, sorted first by date, then by count, and then by userId.
However, I have trouble querying it. What I want is to query this view just by userId, that is, leave the date and count parameters null.
_view/viewName?limit=20&descending=true&endkey=["","","userId"]
does not give me the desired result.
Should I be using a list function to filter the results of the view? Is there any impact on performance if I do this?
This quote from the definitive guide first gave me the idea that list functions could be used to filter and aggregate results.
The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.
List function has nothing to do with your case. From the docs you've linked to yourself:
While Show functions are used to customize document presentation, List functions are used for same purpose, but against View functions results.
Show functions are used to represent documents in various formats, commonly as HTML page with nicer formatting. They can also be used to run server-side functions without requiring a pre-existing document.
To solve your problem just change the order of the emitted keys, putting userId first, i.e.:
[ doc.userId, a.toLocaleDateString(), doc.count ]
and update your query appropriately.
If changing the order of emitted keys is not an option, just create another view.
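With userId as the first key element, "update your query appropriately" could mean a key range that covers every key starting with that userId. The helper below is a sketch under that assumption; the function name is made up, and _view/viewName is the placeholder path from the question.

```javascript
// Builds the query for all rows of one user, newest keys first.
// Assumes the view emits keys of the form [userId, date, count].
function userViewQuery(userId, limit) {
  const params = new URLSearchParams({
    descending: "true",
    limit: String(limit),
    // With descending=true the range is walked from high to low, so
    // startkey is the upper bound and endkey the lower bound.
    // {} collates after every other value in CouchDB view collation.
    startkey: JSON.stringify([userId, {}]),
    endkey: JSON.stringify([userId]),
  });
  return "_view/viewName?" + params.toString();
}
```

Without descending=true the two bounds would be swapped: startkey=[userId] and endkey=[userId, {}].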

Pagination in CouchDB using variable keys

There's a bunch of questions on here related to pagination using CouchDB, but none that quite fit what I'm wondering about.
Basically, I have a result set ranked by number of votes, and I want to page through the set in descending order.
Here's the map for reference.
function(doc) {
emit(doc.votes);
}
Now, the problem. I found out that startkey_docid doesn't work on its own. You have to use it in combination with startkey. The thing is, for the query, I don't use a startkey parameter (I'm not looking to restrict the results, just get the most->least). I was thinking I could just use startkey={{doc.votes}}&startkey_docid={{doc._id}} instead, but the number of votes for a document could have changed by the time someone clicks the "Next Page" link.
The way to solve this seemed obvious: just set startkey=99999999 so that it will return all documents in the database and I can just use startkey_docid to start at the one where we left off last time. Oddly, when I do that, the startkey_docid stopped working and just allowed all results to be returned again. Apparently startkey needs to exactly equal the key on the document whose _id is used in startkey_docid.
What I'm asking is whether anyone knows a workaround for using startkey_docid to page when the actual startkey could have changed by the time you want to use it? Should my application just look up the document by _id and immediately use the doc.votes value, hoping it hasn't changed in the few milliseconds between requests? Even that doesn't seem very reliable.
EDIT: Ended up switching to Mongo for the speed, so this question turned out to be kinda moot.
I have never done something like this, but I think I have some idea how to do it. What you can do is take a snapshot of the ratings and refer to it on every page. You probably don't want your view to consume too much space, so you should not map separate copies of the documents whose votes have not changed since the snapshot was taken. So, you can do the following:
1. Add some history of ratings with timestamps to your document.
2. Map the ratings AND the history.
3. In your app get the current time: start_time = Date.now() and query all pages.
4. Clean up the history older than the oldest active sessions.
The problem is that if you emit [votes, date] and try to paginate, you will never know how many documents you have to fetch to get the desired number per page. There can always be some older version that you will have to skip, and then you will have to make another request to the DB. That's why you can consider emitting [date, votes], always reading the view twice (once for start_time and once for the current time), and merging and sorting the results (like in merge sort).
Ad.1:
{ ...,
  votes: 12,
  history: [
    {date: 1357390271342, votes: 10},
    {date: 1357390294682, votes: 11}
  ]
}
Ad.2:
function (doc) {
  emit([{}, doc.votes], null);
  doc.history && doc.history.forEach(function(h) {
    emit([h.date, h.votes], null);
  });
}
Ad.3:
?startkey=[start_time, votes]&limit=items_per_page_plus1
?startkey=[{}, votes]&limit=items_per_page_plus1
Merge the lists and sort by votes in your app (or in a list function).
If you have problems using startkey_docid, you can emit [date, votes, id] and query with the ID explicitly. Even when this particular doc changes its votes, it will still be available in the history.
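The merge step from Ad.3 could look roughly like the following in the app. This is an untested-against-CouchDB sketch: it assumes each of the two result lists has already been reduced to rows of the hypothetical shape { id, votes } and is sorted by votes, highest first.

```javascript
// Merges two row lists that are each sorted by votes, highest first,
// into one list with the same ordering (the merge step of merge sort).
function mergeByVotesDesc(snapshotRows, currentRows) {
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < snapshotRows.length && j < currentRows.length) {
    // On ties the snapshot row goes first.
    if (snapshotRows[i].votes >= currentRows[j].votes) {
      merged.push(snapshotRows[i++]);
    } else {
      merged.push(currentRows[j++]);
    }
  }
  // One of the lists is exhausted; append whatever remains of the other.
  return merged.concat(snapshotRows.slice(i), currentRows.slice(j));
}
```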
Ad.4:
If you emit [date, votes] then you can just get the outdated history with: ?startkey=[0]&endkey=[oldest_active_session_time]&inclusive_end=false and update the documents with an update handler:
function(doc, req) {
  if (!doc || !doc.history) return [null, 'Error'];
  var history = new Array();
  var oldest = +(req.query.date);
  doc.history.forEach(function(h) {
    if (h.date >= oldest)
      history.push(h);
  });
  doc.history = history;
  return [doc, 'OK'];
}
Note: I have not tested it, so it is expected not to run without modifications :)
As far as I know, CouchDB uses b-tree shadowing to make updates, and in principle it should be possible to access older revisions of the view. I am not into the CouchDB design, so it is just a guess, and there does not seem to be any (documented) API for this.
I can't figure out any simple solution by now, but there are options:
Replicate your sorting list, infrequently, to a small dedicated db, so that it will be much more stale than stale=ok.
Modify your schema in a way that lets you sort by some more stable data. Look at the banking/ledger example in the CouchDB guide: http://guide.couchdb.org/draft/recipes.html#banking. Try to log every vote and reduce them hourly, for example. As a bonus you'll get a history/trends :)
I'm kind of surprised this question has been left unanswered because the functionality of CouchDB Futon basically does this when you are paginating through the results of a map function. I opened up firebug to see what was happening in the javascript console as I paginated and saw that for every set of paginated results it is passing the startkey along with startkey_docid. So although the question is how do I paginate without including startkey, CouchDB specifies that the startkey is required and demonstrates how it can work. The endkey is not specified, so if there is only one result for the specified startkey, the next set of paginated results will also contain the next key of the sorted results that do not match the startkey.
So to clarify a bit, the answer to this problem is that as you are paginating and keeping track of the startkey_docid, you also need to capture the startkey of the same document that will be the start of the next set of results. When you request the paginated results, use both the captured startkey and startkey_docid, as CouchDB requires. Leave endkey off so that the results will continue on to the next key of the sorted results.
The use case for wanting to paginate without specifying a key is kind of odd. Let's say that the start docid of the next paginated result changed its key value drastically, from a 9 to a 3. We are also assuming that there is only one instance of the docid in the map results, even though it could potentially appear multiple times (which I believe is why the startkey needs to be specified). As the user clicks the next button, the paginated results would move from looking at rank 9 to rank 3. But if you include the startkey in addition to the startkey_docid, the paginated results would just start over at the beginning of the rank 9 results, which is a more logical progression than potentially jumping over a large set of results.
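Capturing both values for the next page could be sketched like this. The helper name splitPage is made up, and fetching one extra row per request (limit = pageSize + 1) is an assumption borrowed from the common CouchDB pagination recipe; rows are expected in the usual view row shape with id and key.

```javascript
// rows: view rows fetched with limit = pageSize + 1.
// Returns the rows to display plus the parameters for the next request.
function splitPage(rows, pageSize) {
  const page = rows.slice(0, pageSize);
  // The extra row, if present, is the first row of the next page.
  const extra = rows[pageSize];
  const next = extra
    ? {
        startkey: JSON.stringify(extra.key),
        startkey_docid: extra.id,
        limit: pageSize + 1,
      }
    : null; // no extra row means this was the last page
  return { page, next };
}
```

The next object would be sent as query parameters together with the same descending setting as the first request, and without an endkey.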

Is a type property the correct way to store different data entities in CouchDB?

I'm trying to wrap my head around CouchDB. I'm trying to switch from MongoDB to CouchDB because I think the concept of views is more appealing to me. In CouchDB it looks like all records are stored in a single database. There is no concept of collections or anything, like in MongoDB. So, when storing different data entities such as users, blog posts, comments, etc., how do you differentiate between them from within your map reduce functions? I was thinking about just using some sort of type property, and for each item I'd just have to make sure to always specify the type. This line of thought was sort of reinforced when I read over the CouchDB cookbook website, in which an example does the same thing.
Is this the most reliable way of doing this, or is there a better method? I was thinking of alternatives, and I think the only other alternative way is to basically embed as much as I can into logical documents. Like, the immediate records inside of the database would all be User records, and each User would have an array of Posts, in which you just add all of the Posts to. The downside here would be that embedded documents wouldn't get their own id properties, correct?
Using type is convenient and fast when creating views. Alternatively you can consider using a part of the JSON document. I.e., instead of defining:
{
  type: "user",
  firstname: "John",
  lastname: "Smith"
}
You would have:
{
  user: {
    firstname: "John",
    lastname: "Smith"
  }
}
And then in the view for emitting documents containing user information, instead of using:
function (doc) {
  if (doc.type === "user") emit(null, doc);
}
You would write:
function (doc) {
  if (doc.user) emit(null, doc);
}
As you can see, there is not much difference. As you have already realized, the first approach is the most widely used, but the second is (afaik) also well accepted.
Regarding the question of storing all Posts of one User in a single document: it depends on how you plan to update the document. Remember that you need to write the whole document each time you update it (unless you use attachments). That means that each time a user writes a new Post, you need to retrieve the document containing the array of Posts, add or modify one element, and update the document. That is probably too heavy.
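For comparison, here is a sketch of the non-embedded layout using the type property: each Post is its own document that points back to its User, so adding a post never rewrites the user document. The document shape, the userId field, and both function names are assumptions made for illustration, not something the question specifies.

```javascript
// Hypothetical document shape: one document per post, tagged with a type
// and a reference to the owning user's _id.
function makePostDoc(userId, title, body) {
  return { type: "post", userId: userId, title: title, body: body };
}

// Map function for a view that lists posts by user.
// emit() is provided by CouchDB when the function runs inside a view.
function postsByUser(doc) {
  if (doc.type === "post") emit(doc.userId, null);
}
```

Each post document also gets its own _id when it is saved, which the embedded array elements would not have.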
