I am currently designing a blog-type application with a MongoDB backend. The blog application will use editor.js to allow the creation and editing of 'blogs'.
https://editorjs.io/
Editor.js is very friendly, and returns data like this:
{
"time" : 1610826755415,
"blocks" : [
{
"type" : "header",
"data" : {
"text" : "Editor.js",
"level" : 2
}
},
{
"type" : "paragraph",
"data" : {
"text" : "Hey. Meet the new Editor. On this page you can see it in action — try to edit this text."
}
},
{
"type" : "header",
"data" : {
"text" : "Key features",
"level" : 3
}
}
]
}
My concern is that, even though this is very MongoDB friendly, depending on the size of the blog the document could approach the 16MB document limit (which we should get close to). Is there a sensible way to split this up without hitting Mongo's limits? Perhaps by taking the different block types and dividing it up that way?
Thank you.
You can split out content that can be big by referring to the _id of the document the data is stored in. For example, if you decide that a paragraph can get too big:
{
"time" : 1610826755415,
"docId": ObjectId("00000001"),
"blocks" : [
{
"type" : "paragraph",
"summary": "Hey. Meet the new Editor. On this page you...",
"dataId" : ObjectId("123456789")
},
{/*other blocks... */}
]
}
Then have one collection per block type, from which you lazy-load the content. A MongoDB collection named paragraph would have records like this:
{
"_id" : ObjectId("123456789"),
"docId": ObjectId("00000001"),
"text" : "Hey. Meet the new Editor. On this page you can see it in action — try to edit this text."
}
This has the additional benefit of faster database loading and data transfer when you don't need the entire object: your queries from the browser may include an option "lazy":"true", which loads only the paragraph summaries and defers the big content to a later phase. The docId allows you to load all dependencies of a document at once.
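As a rough sketch of the split described above (not a definitive implementation), the helper below takes an Editor.js document and moves the text of large paragraphs into separate records. The function name, the length threshold, the summary length and the newId generator are all assumptions made for illustration; with the MongoDB driver, newId would typically create a new ObjectId.

```javascript
// Hypothetical helper: splits large paragraph blocks out of an Editor.js document.
// newId is a caller-supplied id generator (assumption), so the function stays
// independent of any particular MongoDB driver.
function splitLargeParagraphs(doc, docId, newId, maxInlineLength, summaryLength) {
  var paragraphs = []; // records destined for the separate "paragraph" collection

  var blocks = doc.blocks.map(function (block) {
    var text = block.data && block.data.text;
    var isLargeParagraph = block.type === 'paragraph' &&
      typeof text === 'string' &&
      text.length > maxInlineLength;

    if (!isLargeParagraph) {
      return block; // headers and small paragraphs stay inline, unchanged
    }

    var dataId = newId();
    paragraphs.push({ _id: dataId, docId: docId, text: text });

    // The main document keeps only a short summary and a reference.
    return {
      type: 'paragraph',
      summary: text.slice(0, summaryLength) + '...',
      dataId: dataId
    };
  });

  return {
    main: { time: doc.time, docId: docId, blocks: blocks },
    paragraphs: paragraphs
  };
}
```

The main part would then be saved to the blog collection and each entry of paragraphs to the paragraph collection; a lazy request only needs the main part.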
Suppose I have a data structure in the Firebase Realtime Database like this:
{ "donors" : {
"uid1" : { "name" : "x", "bloodGroup" : "A+", "location" : "some Place"},
"uid2" : { "name" : "y", "bloodGroup" : "A-", "location" : "some place"},
...
...
} }
Now suppose I have millions of donor records like this. How could I filter them by bloodGroup and location, fetching, say, 100 records at a time from the server using angularfire2?
I have found this page which was really helpful to me when using queries to query my firebase data:
https://howtofirebase.com/collection-queries-with-firebase-b95a0193745d
A very simple example would be along the lines of:
this.donorsData = af.database.list('/donors', {
query: {
orderByChild: 'bloodGroup',
equalTo: 'A+',
}
});
I'm not entirely sure how to fetch 100 records and then another 100. I am using DataTables in my app, which fetches all my data and uses DataTables for the pagination.
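One possible way to fetch the records 100 at a time is sketched below, untested against a real database. It assumes the angularfire2 query options limitToFirst, startAt and endAt, and the { value, key } form that lets startAt continue from the last key of the previous page. donorPageQuery is a hypothetical helper name. Note that a Realtime Database query orders by a single child, so this filters on bloodGroup only.

```javascript
// Hypothetical helper that builds the `query` object for one page of donors.
// lastKey is the key of the last record already shown (undefined for page one).
function donorPageQuery(bloodGroup, pageSize, lastKey) {
  var query = { orderByChild: 'bloodGroup' };

  if (lastKey === undefined) {
    // First page: plain equality filter plus a limit.
    query.equalTo = bloodGroup;
    query.limitToFirst = pageSize;
  } else {
    // Later pages: start at the last seen key within the same bloodGroup value
    // and stop at the end of that value. The cursor record itself comes back
    // again, so ask for one extra and drop the first result on the client.
    query.startAt = { value: bloodGroup, key: lastKey };
    query.endAt = bloodGroup;
    query.limitToFirst = pageSize + 1;
  }
  return query;
}
```

It would be used as af.database.list('/donors', { query: donorPageQuery('A+', 100) }) for the first page, and with the last key of that page as third argument for the next one.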
My question is fairly simple. Say that I have a type mapping in an index that looks like this:
"mappings" : {
"post" : {
"analyzer" : "my_custom_analyzer",
"properties" : {
"body" : {
"type" : "string",
"store" : true
}
}
}
}
Note that I specified my_custom_analyzer as the analyzer for the type. When I search the body field without specifying an analyzer in the query, I expect my_custom_analyzer to be used. However, when I use the Analyze API to query the field:
curl 'http://localhost:9200/myindex/_analyze?field=post.body&text=test'
It returns standard analysis results for string. When I specify the analyzer it works:
curl 'http://localhost:9200/myindex/_analyze?analyzer=my_custom_analyzer&text=test'
My question is: why doesn't the Analyze API use the default type analyzer when I specify a field?
An analyzer is set per string field.
You can't apply it to an object or nested object and expect all the fields under that object to inherit it.
The right approach is as follows:
"mappings" : {
"post" : {
"properties" : {
"body" : {
"type" : "string",
"analyzer" : "my_custom_analyzer",
"store" : true
}
}
}
}
The reason the analyzer worked in the Analyze API is that you have declared the analyzer for that index.
If you want to define an analyzer for all the string fields under a particular object, you need to declare it in a dynamic template on the type. You can get more information about that here - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-root-object-type.html#_dynamic_templates
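A minimal sketch of such a dynamic template, built as a JavaScript object that can be serialized with JSON.stringify and sent as the index body. The function name and the template name strings_with_custom_analyzer are made up for illustration. Keep in mind that dynamic templates only apply to fields that are added dynamically, so the explicitly mapped body field still declares its analyzer itself.

```javascript
// Hypothetical helper that builds the mappings for the "post" type.
// analyzerName must refer to an analyzer defined in the index settings.
function buildPostMapping(analyzerName) {
  return {
    mappings: {
      post: {
        dynamic_templates: [
          {
            // Template name is arbitrary (assumption for this sketch).
            strings_with_custom_analyzer: {
              match: '*',
              match_mapping_type: 'string',
              mapping: { type: 'string', analyzer: analyzerName }
            }
          }
        ],
        properties: {
          // Explicitly mapped fields are not touched by dynamic templates.
          body: { type: 'string', analyzer: analyzerName, store: true }
        }
      }
    }
  };
}
```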
I'm using mongify to migrate a mysql database into mongodb.
While doing that, two questions came up:
1- How can I declare my translation file so that I get an embedded array of ids referencing the objects (which are stored in a different collection and can be retrieved through populate), instead of embedding them as JSON objects?
2- Can embedded objects have a unique id, as objects in collections do? On other projects I've used that approach to query for embedded objects, but if that id is not present I would have to use a different field.
Unfortunately, the first request isn't possible with Mongify at the moment; it requires a custom script.
I could give you more details if you want to send me your translation file (Make sure to remove any sensitive data).
As for number two, the embedded object will get a unique ID. You don't need to do anything special.
Hope that answers your questions.
This isn't possible from Mongify, but in MongoDB you can transform the data afterwards as follows:
// Find posts whose _tags field is a non-empty array
db.getCollection('posts').find({'_tags.0': {$exists: true}}).forEach(function (post) {
    var items = [];
    var property = '_tags';
    // Collect the _id of every embedded object that has one
    post[property].forEach(function (element) {
        if (element._id !== undefined) {
            items.push(element._id);
        }
    });
    // Replace the embedded objects with the plain array of ids
    if (items.length > 0) {
        post[property] = items;
        db.posts.update({_id: post._id}, post);
    }
});
Source Document:
{
"_id" : ObjectId("576aa0389863482f64051c81"),
"id_post" : 130155,
"_tags" : [
{
"_id" : ObjectId("576a9efd9863482f64000044")
},
{
"_id" : ObjectId("576a9efd9863482f6400004b")
},
{
"_id" : ObjectId("576a9efd9863482f64000052")
},
{
"_id" : ObjectId("576a9efd9863482f6400005a")
}
]
}
Final Document:
{
"_id" : ObjectId("576aa0389863482f64051c81"),
"id_post" : 130155,
"_tags" : [
ObjectId("576a9efd9863482f64000044"),
ObjectId("576a9efd9863482f6400004b"),
ObjectId("576a9efd9863482f64000052"),
ObjectId("576a9efd9863482f6400005a")
]
}
I'm running a blog-style web application on AppFog (ex Nodester).
It's written in NodeJS + Express and uses Mongoose framework to persist to MongoDB.
MongoDB is version 1.8 and I don't know whether AppFog is going to upgrade it to 2.2 or not.
Why this intro? Well, right now my "posts" are shown in a basic "paginated" view: they're just fetched from Mongo, sorted by date descending, a page at a time. Here's a snippet:
Post
.find({pubblicato:true})
.populate("commenti")
.sort("-dataInserimento")
.skip(offset)
.limit(archivePageSize)
.exec(function(err,docs) {
var result = {};
result.postsArray = (!err) ? docs : [];
result.currentPage = currentPage;
result.pages = howManyPages;
cb(null, result);
});
Now, my goal is to GROUP BY 'dataInserimento' and show posts like a "diary", I mean:
1st page => 2012/10/08: I show 3 posts
2nd page => 2012/10/10: I show 2 posts (2012/10/09 has no posts, so I don't show an empty page)
3rd page => 2012/10/11: 35 posts and so on...
My idea is to first get the list of all dates by grouping (and maybe counting the posts for each day), then build the page links and, when a page (date) is visited, query as above, adding the date as a parameter.
SOLUTIONS:
The aggregation framework would be perfect for that, but I can't get my hands on that version of Mongo right now.
Using .group() in some way, but the fact that it doesn't work in sharded environments does NOT excite me! :-(
Writing a MAP-REDUCE! I think this is the right way to go, but I can't imagine how map() and reduce() should be written.
Can you help me with a little example, please?
Thanks
EDIT :
The answer of peshkira is correct, however, I don't know if I need exactly that.
I mean, I will have URLs like /archive/2012/10/01, /archive/2012/09/20, and so on.
On each page, it's enough to have the date to query for posts. But then I have to show "NEXT" or "PREV" links, so I need to know the next or previous day containing posts, if any. Maybe I can just query for posts with dates later or earlier than the current one, and take the first one's date?
Assuming you have something similar to:
{
"author" : "john doe",
"title" : "Post 1",
"article" : "test",
"created" : ISODate("2012-02-17T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 2",
"article" : "foo",
"created" : ISODate("2012-02-17T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 3",
"article" : "bar",
"created" : ISODate("2012-02-18T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 4",
"article" : "foo bar",
"created" : ISODate("2012-02-20T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 5",
"article" : "lol cat",
"created" : ISODate("2012-02-20T00:00:00Z")
}
then you can use map reduce as follows:
Map
It just emits the date as the key and the post title as the value. You can change the title to the _id, which will probably be more useful to you. If you store the time along with the date, you will want to use only the date (without the time) as the key; otherwise Mongo will group by date and time, not by date alone. In my test case all posts have the same time, 00:00:00, so it does not matter.
function map() {
emit(this.created, this.title);
}
Reduce
It does nothing more than push all the values for a key into an array, which is then wrapped in a result object, because Mongo does not allow an array to be the result of a reduce function.
function reduce(key, values) {
var array = [];
var res = {posts:array};
values.forEach(function (v) {res.posts.push(v);});
return res;
}
Execute
Using db.runCommand({mapreduce: "posts", map: map, reduce: reduce, out: {inline: 1}}) will output the following result:
{
"results" : [
{
"_id" : ISODate("2012-02-17T00:00:00Z"),
"value" : {
"posts" : [
"Post 2",
"Post 1"
]
}
},
{
"_id" : ISODate("2012-02-18T00:00:00Z"),
"value" : "Post 3"
},
{
"_id" : ISODate("2012-02-20T00:00:00Z"),
"value" : {
"posts" : [
"Post 5",
"Post 4"
]
}
}
],
...
}
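Note that in the output above the day with a single post has a bare string as its value: MongoDB does not call reduce for a key with only one emitted value, and it may also call reduce again on its own earlier output. A variant that keeps the result shape consistent and groups by day even when the dates carry a time could look like the sketch below. The names mapByDay and reducePosts are mine, and truncating to midnight UTC is an assumption about how the dates should be grouped.

```javascript
// Map: emit the day (midnight UTC) as the key and wrap the title in the same
// shape that reduce returns, so single-post days look like all the others.
function mapByDay() {
  var d = this.created;
  var day = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  emit(day, { posts: [this.title] });
}

// Reduce: concatenate the partial lists. Because input and output have the
// same shape, the function is safe to run again on its own results.
function reducePosts(key, values) {
  var res = { posts: [] };
  values.forEach(function (v) {
    res.posts = res.posts.concat(v.posts);
  });
  return res;
}
```

These would be passed to the same command as before, with map: mapByDay and reduce: reducePosts.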
I hope this helps
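For the NEXT and PREV links mentioned in the question's edit, querying for the first post after the current day and the last post before it should be enough, without any grouping. A rough sketch, assuming the dates are stored in UTC and reusing the field names pubblicato and dataInserimento from the question; dayBounds and neighbourQueries are hypothetical helper names:

```javascript
// Start (inclusive) and end (exclusive) of a calendar day in UTC.
// month is 1-based, as in the /archive/2012/10/01 URLs.
function dayBounds(year, month, day) {
  return {
    start: new Date(Date.UTC(year, month - 1, day)),
    end: new Date(Date.UTC(year, month - 1, day + 1))
  };
}

// Filters and sort orders for the nearest published post after and before a day.
function neighbourQueries(bounds) {
  return {
    next: {
      filter: { pubblicato: true, dataInserimento: { $gte: bounds.end } },
      sort: 'dataInserimento'   // oldest first: the first post after the day
    },
    prev: {
      filter: { pubblicato: true, dataInserimento: { $lt: bounds.start } },
      sort: '-dataInserimento'  // newest first: the last post before the day
    }
  };
}

// Possible usage with the Mongoose model from the question:
//   var q = neighbourQueries(dayBounds(2012, 10, 1));
//   Post.findOne(q.next.filter).sort(q.next.sort).exec(function (err, post) {
//     // post.dataInserimento gives the date for the NEXT link, if post exists
//   });
```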
Let's say that I store documents like this in ElasticSearch:
{
'name':'user name',
'age':43,
'location':'CA, USA',
'bio':'into java, scala, python ..etc.',
'tags':['java','scala','python','django','lift']
}
And let's say that I search using location=CA. How can I sort the results according to the number of items in 'tags'?
I would like to list the people with the most tags on the first page.
You can do it by indexing an additional field that contains the number of tags, on which you can then easily sort your results. Otherwise, if you are willing to pay a small performance cost at query time, there's a nice solution that doesn't require reindexing your data: you can sort based on a script like this:
{
"query" : {
"match_all" : {}
},
"sort" : {
"_script" : {
"script" : "doc['tags'].values.length",
"type" : "number",
"order" : "desc"
}
}
}
As you can read from the script based sorting section:
Note, it is recommended, for single custom based script based sorting,
to use custom_score query instead as sorting based on score is faster.
That means that it'd be better to use a custom score query to influence your score, and then sort by score, like this:
{
"query" : {
"custom_score" : {
"query" : {
"match_all" : {}
},
"script" : "_score * doc['tags'].values.length"
}
}
}
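The first option mentioned above, indexing the number of tags as its own field, could look roughly like this. The field name tag_count and the helper withTagCount are assumptions for illustration; the field has to be filled in by the application before each document is indexed.

```javascript
// Hypothetical helper: returns a copy of the document with a tag_count field.
function withTagCount(doc) {
  var copy = {};
  Object.keys(doc).forEach(function (key) {
    copy[key] = doc[key];
  });
  copy.tag_count = Array.isArray(doc.tags) ? doc.tags.length : 0;
  return copy;
}

// Search body that sorts on the precomputed field, most tags first.
var searchByTagCount = {
  query: { match_all: {} },
  sort: [{ tag_count: { order: 'desc' } }]
};
```

Sorting on a plain numeric field avoids running a script per document at query time, at the cost of reindexing existing documents once.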