Understanding NoSQL Data Modeling - blog application - node.js

I am creating a blogging application in Node.js with a MongoDB database. I have used a relational database like MySQL before, but this is my first experience with a NoSQL database, so I would like to confirm my MongoDB data models before I move further.
I have decided that my blog database will have 3 collections:
post_collection - stores information about articles
comment_collection - stores information about comments on articles
user_info_collection - contains user information
PostDB
{
"_id": ObjectId(...),
"author": "author_name",
"date": new Date(...),
"tags": ["politics", "war"],
"post_title": "My first Article",
"post_content": "Big big article",
"likes": 23,
"access": "public"
}
CommentDB
{
"_id": ObjectId(...),
"post": "My first Article",
"comment_by": "User_name",
"comment": "My comments"
}
UserInfoDB
{
"_id": ObjectID(...),
"user": "User_name",
"password": "My_password"
}
I would appreciate your comments.

In your place, I would embed the Comments collection into the Posts collection.
The disadvantage of this is that if you have very many comments, a post document could reach the 16 MB size limit.
The advantage is that the data is pre-joined, so the query will be faster; you won't look for a comment without its post anyway.
Another thing is that you could store the _id of the UserInfo document in the comment's "comment_by" field (and the post's "author" field). That way each comment holds a reference to a unique user.
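For illustration, an embedded post might look like this (a sketch following the advice above; the layout of the comments array is an assumption, not something from the question):
{
"_id": ObjectId(...),
"author": ObjectId(...), // _id of a UserInfoDB document
"date": new Date(...),
"tags": ["politics", "war"],
"post_title": "My first Article",
"post_content": "Big big article",
"likes": 23,
"access": "public",
"comments": [
{ "comment_by": ObjectId(...), "comment": "My comments" }
]
}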

Related

How to create a linked list structure in MongoDB?

I have billions of documents in a collection. I'm trying to store a reference to the next document for a particular account id.
{"_id": "1234", "title": "Document1", accountId:145, "next": "1236"}
{"_id": "1235", "title": "Document2", accountId:146, "next": "1238"}
{"_id": "1236", "title": "Document1a", accountId:145, }
{"_id": "1238", "title": "Document2a", accountId:146,"next": "1240"} }
{"_id": "1239", "title": "Document3", accountId:147}
{"_id": "1240", "title": "Document2b", accountId:146} }
How do I get the list of documents, with a limit?
Since I'll need the whole 'history' of a document, including the next documents, I guess I'll have to perform a multitude of queries depending on the size of the list?
Any suggestions on how to create a performant index? A different structure for storing linked lists would also be interesting.
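One approach worth sketching (an assumption, not something from this thread): on MongoDB 3.4+, $graphLookup can walk the whole "next" chain in a single aggregation, so you don't need one query per hop. The collection name docs is a placeholder:
// Compound index so per-account scans and chain lookups stay cheap.
db.docs.createIndex({ accountId: 1, _id: 1 })
// Fetch a starting document plus its entire "history" in one query.
db.docs.aggregate([
  { $match: { _id: "1234" } },
  { $graphLookup: {
      from: "docs",              // traverse within the same collection
      startWith: "$next",        // begin at this document's "next" pointer
      connectFromField: "next",  // follow next -> _id edges
      connectToField: "_id",
      as: "history",
      maxDepth: 1000             // safety cap on chain length
  } }
])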

Making MongoDB more 'relational'

I like using MongoDB but can't quite swallow the non-relational aspect of it. As far as I can tell from mongo users and the docs, the advice is: "It's fine, just duplicate parts of your data."
Since I'm worried about scaling, and frankly about not remembering to update every part of the code that has to keep duplicated data in sync, it seems like a good trade-off to just do an extra query when my API has to return the data for a user with a summary of posts included:
{
"id": 1,
"name": "Default user",
"posts_summary": [
{
"id": 1,
"name": "I am making a blog post",
"description": "I write about some stuff and there are comments after it",
"tags_count": 3
},
{
"id": 2,
"name": "This is my second post",
"description": "In this one I write some more stuff",
"tags_count": 4
}
]
}
...when the posts data looks like this below:
//db.posts
{
"id": 1,
"owner": 1,
"name": "I am making a blog post",
"description": "I write about some stuff and there are comments after it",
"tags": ["Writing", "Blogs", "Stuff"]
},
{
"id": 2,
"owner": 1,
"name": "This is my second post",
"description": "In this one I write some mores tuff",
"tags": ["Writing", "Blogs", "Stuff", "Whatever"]
}
So behind the API, when the query to get the user succeeds, I do an additional query on the posts collection to get the "posts_summary" data I need, and add it in before the API sends the response.
It seems like a good trade-off considering the problems it will solve later. Is this what some mongo users do to get around it not being relational, or have I made a mistake in designing my schema?
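In code, that two-query merge looks roughly like this (a sketch with the standard Node.js driver; the collection names follow the documents above):
// Inside an async function, with `db` a connected database handle.
const user = await db.collection('users').findOne({ id: 1 });
const posts = await db.collection('posts')
  .find({ owner: user.id })
  .project({ _id: 0, id: 1, name: 1, description: 1, tags: 1 })
  .toArray();
// Build the summary the API returns, deriving tags_count on the fly.
user.posts_summary = posts.map(p => ({
  id: p.id,
  name: p.name,
  description: p.description,
  tags_count: p.tags.length
}));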
You can use schema objects as references to implement relational mapping using Mongoose:
http://mongoosejs.com/docs/populate.html
Using Mongoose, your schemas would look like:
const mongoose = require('mongoose');
const { Schema } = mongoose;

const UserSchema = new Schema({
  _id: Number,
  name: String,
  posts: [{ type: Number, ref: 'Post' }] // holds Post _ids
});

const PostSchema = new Schema({
  _id: Number,
  name: String,
  owner: { type: Number, ref: 'User' },
  description: String,
  tags: [String]
});
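With those schemas registered as models, populate() fills in the referenced posts (a sketch; the model names are assumptions):
const User = mongoose.model('User', UserSchema);
const Post = mongoose.model('Post', PostSchema);
// Inside an async function: replace the stored Post ids with full documents.
const user = await User.findOne({ name: 'Default user' }).populate('posts');
console.log(user.posts); // ready to map into posts_summary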

How to get criteria results with a specific key and value from a MongoDB collection?

Let's suppose we have a collection called posts:
{
"_id": ObjectId("5146bb52d8524270060001f3"),
"post_text":"This is a sample post" ,
"user_name": "mark",
"post_privacy": "public",
"post_likes_count": 0
},
{
"_id": ObjectId("5146bb52d8524270060001f4"),
"post_text": "This is a sample post",
"user_name": "pramod",
"post_privacy": "public",
"post_likes_count": 0
}
Let's assume we have the same table in MySQL and I want the result of this SQL query in Mongo:
select post_likes_count from posts where user_name = "mark";
How can we get the same result in MongoDB?
Ans:
db.posts.find({ user_name: "mark" }, { post_likes_count: 1, _id: 0 });
The second argument is a projection that restricts the output to post_likes_count, matching the SQL SELECT list.

View with geospatial and non geospatial keys with CouchDB

I'm using CouchDB with GeoCouch and I'm trying to understand whether it is possible to build a geospatial index and "query" the database using both a location and a value from another field.
Data
{
"_id": "1",
"profession": "medic",
"location": [15.12, 30.22]
}
{
"_id": "2",
"profession": "secretary",
"location": [15.12, 30.22]
}
{
"_id": "3",
"profession": "clown",
"location": [27.12, 2.2]
}
Questions
Is there any way to perform the following queries on these documents:
Find all documents with profession = "medic" near location [15.12, 30.22] (more important)
List all the different professions near this location [15.12, 30.22] (a plus)
In case that's not possible, what options do I have? I'm already considering switching to MongoDB, but I'd rather solve this a different way.
Notes
Data changes quickly, new documents might be added and many might be removed
References
Faceted search with geo-index using CouchDB
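One workaround worth sketching (an assumption, not a confirmed answer: GeoCouch indexes only the geometry): emit the profession as the spatial view's value, query by bounding box, and filter or group the professions client-side. The design-doc and view names are placeholders:
{
"_id": "_design/geo",
"spatial": {
"by_location": "function(doc) { if (doc.location) { emit({ type: 'Point', coordinates: doc.location }, doc.profession); } }"
}
}
Then query with a bounding box around [15.12, 30.22] and keep only the rows whose value is "medic" (or collect the distinct values for the list of professions):
curl 'http://<host>:5984/mydb/_design/geo/_spatial/by_location?bbox=15,30,16,31'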

How do I index all the revisions of a couchdb doc using elasticsearch river plugin

I know how to set up the river plugin and search across it. The problem is that if the same document is edited multiple times (multiple revisions), only the data from the latest revision is retained and the older data is lost. I intend to keep an index of all revisions for my entire CouchDB, so I don't have to keep the history in CouchDB itself and can retrieve the history of a doc using Elasticsearch rather than going to Futon.
I know the issue will be to uniquely determine a key for a CouchDB doc while indexing, but we can append the revision number to the key so that every key is unique.
I couldn't find a way to do that in any documentation. Does anyone have an idea how to do it?
Any suggestions/thoughts are welcome.
EDIT 1 :
To be more explicit: at the moment, Elasticsearch saves CouchDB docs like this:
{
"_index": "foo",
"_type": "foo",
"_id": "27fd33f3f51e16c0262e333f2002580a",
"_score": 1.0310782,
"_source": {
"barVal": "bar",
"_rev": "3-d10004227969c8073bc573c33e7e5cfd",
"_id": "27fd33f3f51e16c0262e333f2002580a",
...
}
}
Here the _id from CouchDB is the same as the _id in the search index. I want the search index _id to be concat(_id, _rev) from CouchDB.
EDIT 2 (after trying out @DaveS's solution):
So I tried the following, but it didn't work - the search still indexes based on CouchDB's _id.
What I did:
curl -XDELETE 127.0.0.1:9200/_all
curl -XPUT 'localhost:9200/foo_test' -d '{
"mappings": {
"foo_test": {
"_id": {
"path": "newId",
"index": "not_analyzed",
"store": "yes"
}
}
}
}'
curl -XPUT 'localhost:9200/_river/foo_test/_meta' -d '{
"type": "couchdb",
"couchdb": {
"host": "127.0.0.1",
"port": 5984,
"db": "foo_test",
"script": "ctx.doc.newId = ctx.doc._id + ctx.doc._rev",
"filter": null
},
"index": {
"index": "foo_test",
"type": "foo_test",
"bulk_size": "100",
"bulk_timeout": "10ms"
}
}'
And after this, when I search for a doc I added, I get:
{
"_index": "foo_test",
"_type": "foo_test",
"_id": "53fa6fcf981a01b05387e680ac4a2efa",
"_score": 8.238497,
"_source": {
"_rev": "4-8f8808f84eebd0984d269318ad21de93",
"content": {
"foo": "bar",
"foo3": "bar3",
"foo2": "bar2"
},
"_id": "53fa6fcf981a01b05387e680ac4a2efa",
"newId": "53fa6fcf981a01b05387e680ac4a2efa4-8f8808f84eebd0984d269318ad21de93"
}
}
@DaveS - Hope this helps in explaining that Elasticsearch is not using the new path to define its "_id" field.
EDIT 3 - for @dadoonet. Hope this helps.
This is how you get all the older revision info for a CouchDB doc. Then you can iterate through the revisions that are still available, get their data, and index them.
Get a list of all revisions of a doc id:
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?revs_info=true
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
"_rev":"2-16e89e657d637c67749c8dd9375e662f",
"foo":"bar",
"foo2":"bar2",
"_revs_info":[
{"rev":"2-16e89e657d637c67749c8dd9375e662f",
"status":"available"},
{"rev":"1-4c6114c65e295552ab1019e2b046b10e",
"status":"available"}]}
And then you can retrieve each version (if its status is available):
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=1-4c6114c65e295552ab1019e2b046b10e
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
"_rev":"1-4c6114c65e295552ab1019e2b046b10e",
"foo":"bar"}
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=2-16e89e657d637c67749c8dd9375e662f
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
"_rev":"2-16e89e657d637c67749c8dd9375e662f",
"foo":"bar",
"foo2":"bar2"}
I don't think you can.
As far as I remember, CouchDB does not hold the older versions of a document: after a compaction, old versions are removed.
That said, even if it were doable in CouchDB, you cannot store different versions of a document in Elasticsearch under the same ID.
To do that, you have to define an ID for each new document, for example:
DOCID_REVNUM
That way, new revisions won't update the existing document.
The CouchDB river does not do that as of now.
I suggest that you manage it in CouchDB (i.e., create a new doc for each new version of a document) and let the standard CouchDB river index each one as another document.
Hope this helps.
You might consider adjusting your mapping to pull the _id field from a generated field, e.g. from the docs:
{
"couchdoc" : {
"_id" : {
"path" : "doc_rev_id"
}
}
}
Then "just" modify the river to concatenate the strings and add the result into the document in my_concat_field. One way to do that might be to use the script filter plugin that the couchdb river provides. E.g. something like this:
{
"type" : "couchdb",
"couchdb" : {
"script" : "ctx.doc.doc_rev_id = ctx.doc._id + '_' + ctx.doc._rev"
}
}
You'd take the above snippet and PUT it to the river's endpoint, possibly with the rest of the definition, e.g. via curl -XPUT 'localhost:9200/_river/my_db/_meta' -d '<snippet from above>'. Take care to escape the quotes as necessary.
