I am developing a search engine for my website and I want to add the following features to it:
Full-text search
A "did you mean" feature
Data stored in MongoDB
I want to build a RESTful backend. I will add data to MongoDB manually and it will be indexed (which should I prefer: MongoDB's own indexing, or a separate search indexing library like Lucene?). I also want to use Node.js. This is what I have found from my research so far. Any ideas about the architecture would be appreciated.
Thanks in advance
I'm using Node.js / MongoDB / Elasticsearch (based on Lucene). It's an excellent combination. The flow is very smooth as well, since all three components (can) deal with JSON as their native format, so there is no need for transforming DTOs, etc.
Have a look:
http://www.elasticsearch.org/
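For a feel of how the pieces fit together, here is a minimal sketch in Node.js, assuming the official MongoDB driver and the @elastic/elasticsearch client, with placeholder database/index names (exact parameter names vary slightly between client versions):

// Copy a MongoDB document into Elasticsearch and run a full-text search.
const { MongoClient } = require('mongodb');
const { Client } = require('@elastic/elasticsearch');

async function indexAndSearch() {
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const posts = mongo.db('mysite').collection('posts');
  const es = new Client({ node: 'http://localhost:9200' });

  // Both sides speak JSON, so no DTO mapping is needed.
  const doc = await posts.findOne({});
  await es.index({
    index: 'posts',
    id: doc._id.toString(),
    document: { title: doc.title, body: doc.body },
  });

  // Full-text match query plus a term suggester for a basic "did you mean".
  const result = await es.search({
    index: 'posts',
    query: { match: { body: 'serch engines' } },
    suggest: {
      didYouMean: { text: 'serch engines', term: { field: 'body' } },
    },
  });
  console.log(result.hits.hits, result.suggest);
}

The term suggester at the end is one simple way to get a "did you mean" feature on top of the same index.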
I personally use Sphinx and MongoDB; it is a great pair and I have had no problems with it.
I back MongoDB onto a MySQL instance which Sphinx then indexes quickly. You should never need to make _id itself searchable (nobody is going to know the _id of one of your objects and search for it), so you can just stash it in MySQL as a string field and it will work just fine.
When I pull the results back out of Sphinx, all I do is convert the id (in PHP) to a new MongoId, or in your case an ObjectId, and then simply query on that object id for the rest of the data. It couldn't be simpler: no problems, no hassle. And I can offload reindexing the delta indexes to my MySQL instance, keeping my MongoDB instance doing what it needs to: serving up data for the user.
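In Node.js terms, that second step looks roughly like this (database and collection names are placeholders):

// Take the string ids Sphinx returned and fetch the full documents from MongoDB.
const { MongoClient, ObjectId } = require('mongodb');

async function fetchBySphinxIds(sphinxIds) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const posts = client.db('mysite').collection('posts');

  // Convert the string ids stored in MySQL/Sphinx back into ObjectIds.
  const ids = sphinxIds.map((id) => new ObjectId(id));
  return posts.find({ _id: { $in: ids } }).toArray();
}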
How does one implement live likes and dislikes [or, say, a view count] in CouchDB/Couchbase in the most efficient way?
Yes, one can use a reduce to calculate the count each time, and on the front end just increment or decrement locally with a single API call to fetch the result.
But every post may have, say, millions of views, likes and dislikes.
If we have millions of such posts [on a social networking site], the index will simply be too big.
In terms of Cloudant, the described use case requires a bit of care:
Fast writes
Ever-growing data set
Potentially global queries with aggregations
The key here is to use an immutable data model--don't update any existing documents, only create new ones. This means that you won't have to suffer update conflicts as the load increases.
So a post is its own document in one database, and the likes are stored separately. For likes, you have a few options. The classic CouchDB solution would be to have a separate database with "likes" documents containing the post id of the post they refer to, with a view emitting the post id, aggregated by the built-in _count. This would be a pretty efficient solution in this case, but yes, indexes do occupy space on Couch-like databases (just as with any other database).
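A sketch of that classic approach, with assumed database, field and view names:

// One "like" document per like in a separate "likes" database, plus a view
// that counts them per post using the built-in _count reduce.
const designDoc = {
  _id: '_design/likes',
  views: {
    by_post: {
      map: "function (doc) { if (doc.type === 'like') { emit(doc.post_id, null); } }",
      reduce: '_count', // built-in reduce: no custom counting code needed
    },
  },
};

// Querying the like count for one post over the plain HTTP API
// (Node 18+ for global fetch):
async function likeCount(postId) {
  const url = 'http://localhost:5984/likes/_design/likes/_view/by_post' +
    '?group=true&key=' + encodeURIComponent(JSON.stringify(postId));
  const { rows } = await (await fetch(url)).json();
  return rows.length ? rows[0].value : 0;
}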
The second option would be to exploit the _id field, as this is an index you get for free. If you prefix the like-documents' ids with the liked post's id, you can do an _all_docs query with a start and end key to get all the likes for that post.
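A sketch of that second option, assuming like-document _ids of the form "<post id>:<suffix>":

// A key range over _all_docs returns exactly the likes of one post.
// "\ufff0" is the usual high-sorting end-key sentinel in CouchDB.
async function likesForPost(postId) {
  const url = 'http://localhost:5984/likes/_all_docs' +
    '?startkey=' + encodeURIComponent(JSON.stringify(postId + ':')) +
    '&endkey=' + encodeURIComponent(JSON.stringify(postId + ':\ufff0'));
  const { rows } = await (await fetch(url)).json();
  return rows.length; // number of like documents for this post
}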
Third: recent CouchDB versions and Cloudant have the concept of partitioned databases, which, very loosely speaking, can be viewed as a formalised version of option two above. You nominate a partition key which is used to ensure a degree of storage locality behind the scenes -- all documents within the same partition are stored in the same shard. This means that retrieval is faster -- and on Cloudant, also cheaper. In your case you'd create a partitioned "likes" database with the partition key being the post id. Glynn Bird wrote up a great intro to partitioned DBs here.
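A sketch of the partitioned variant, again with assumed names, where document _ids look like "<post id>:<suffix>":

async function partitionedLikeCount(postId) {
  // Writing a like: just create a new document inside the post's partition.
  await fetch('http://localhost:5984/likes', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ _id: postId + ':' + Date.now(), type: 'like' }),
  });

  // Reading: the partition info endpoint reports how many documents
  // (i.e. likes) live in that partition, without a global index scan.
  const res = await fetch('http://localhost:5984/likes/_partition/' + postId);
  const info = await res.json();
  return info.doc_count;
}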
Your remaining issue is that of ever-growth. At Cloudant, we'd expect to get to know you well once your data volume goes beyond single digit TBs. If you'd expect to reach this kind of volume, it's worth tackling that up-front. Any of the likes schemes above could most likely be time-boxed and aggregated once a quarter/month/week or whatever suits your model.
I am trying to build a blogging site (sort of). Users can write big blog posts (or text) and also have customisation options like font, size and colour of text, etc. (kind of like posts on Stack Overflow, and a little more). I am looking to use MongoDB or Couchbase for the database part. Now I am confused about a few things:
Where should I store the blogs or posts: in the database or in text files? If in the database, how will I store the fonts, sizes and colours (a user can have different fonts and sizes for different parts of a post)? The posts can sometimes be very big, so is it advisable to store such large texts in the database? Storing them as text files looks like the easier option, but I am worried about the site's performance, as loading text files can be slow on websites. Just for knowledge's sake, how does Google store Google Docs files?
Should I use any other database which is more suited to handling the kind of things I mentioned?
Full-text search of posts is not a feature I am looking into right now, but it might be later, so please take that into consideration as well.
Please help me.
Honestly, MongoDB has been the best database for our Node.js projects. It used to have a 4 MB maximum BSON document size; the limit is 16 MB in current versions. This is actually a fair amount of text: roughly 16 million single-byte characters fit in a 16 MB document, minus the BSON overhead.
Be aware that you can also split text across multiple BSON documents very easily using GridFS.
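A sketch of what that looks like with the Node.js driver's GridFSBucket (bucket and database names are examples):

const { MongoClient, GridFSBucket } = require('mongodb');

async function saveLargePost(postId, bigHtmlString) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('blog');
  const bucket = new GridFSBucket(db, { bucketName: 'postBodies' });

  // GridFS stores the content in 255 kB chunks behind the scenes,
  // so the 16 MB document limit no longer applies to the body itself.
  await new Promise((resolve, reject) => {
    const upload = bucket.openUploadStream(String(postId));
    upload.on('finish', resolve).on('error', reject);
    upload.end(Buffer.from(bigHtmlString));
  });
}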
I saw you were entertaining the idea of using flat files. While this may be easy and fast, you will have a hard time indexing the text for later use. MongoDB has the ability to index all your text and implementing search will be a fairly easy feature to add.
MongoDB is pretty fast, and I have no doubt it will be the fastest database solution for you. Development with Node.js + MongoDB has taken months off projects for my firm compared to SQL-based databases. I have also seen some pretty impressive performance reviews for it; keep in mind those reviews are from last year, and I have seen even more impressive ones since, but that was what I could find easily today.
I come from a MySQL background, and I am aware of the typical security concerns when using MySQL.
Now I am using MongoDB (the Java driver).
What are the security concerns, and what are possible ways of avoiding security problems?
Specifically these areas:
1) Do I need to do anything special for each GET/POST request?
2) I store cookies from my application on the client side and read them later (currently the only information I store is the user's location, nothing sensitive). Is there anything I should be careful about?
3) I have text boxes and text areas in my forms which users submit. Do I need to check for anything before saving the data in Mongo?
Can anybody provide any instances of security problems with existing applications in production?
It is in fact possible to perform injections with Mongo. My experience with it is in Ruby, but consider the following:
Request: /foo?id=1234
id = query_param["id"]
collection.find({_id: id})
# collection.find({_id: 1234})
Seems innocuous enough, right? Depending on your HTTP library, though, you may end up parsing certain query strings as data structures:
Request: /foo?id[$gt]=0
# query_param["id"] => {"$gt": 0}
collection.find({_id: id})
# collection.find({_id: {"$gt": 0}})
This is likely less of a danger in strongly typed languages, but it's still a concern to watch out for.
The typical remedy here is to ensure that you always cast your inbound parameter data to the type you expect it to be, and fail hard when the types mismatch. This applies to cookie data, as well as any other data from untrusted sources; aggressive casting will prevent a clever user from modifying your query by passing in operator hashes instead of a value.
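A sketch of that advice in Node.js/Express terms, since Express's default query parser also turns ?id[$gt]=0 into an object (route, database and collection names are hypothetical):

const express = require('express');
const { MongoClient, ObjectId } = require('mongodb');

const app = express();
const clientPromise = MongoClient.connect('mongodb://localhost:27017');

app.get('/foo', async (req, res) => {
  const id = req.query.id;

  // Refuse anything that is not a plain string of the expected shape,
  // so a crafted ?id[$gt]=0 can never reach the query as an operator hash.
  if (typeof id !== 'string' || !ObjectId.isValid(id)) {
    return res.status(400).send('bad id');
  }

  const users = (await clientPromise).db('app').collection('users');
  const doc = await users.findOne({ _id: new ObjectId(id) });
  res.json(doc);
});

app.listen(3000);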
The MongoDB documentation similarly says:
Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e $) is a reserved character used to represent operators (i.e. $inc.) Thus, you should ensure that your application’s users cannot inject operators into their inputs.
You might also get some value out of this answer.
Regarding programming:
Coming from a MySQL background, you are surely thinking about SQL injection and wondering whether there is something like that for MongoDB.
When you make the same mistake of generating commands as strings and then sending them to the database by using db.command(String), you will have the same security problems. But no MongoDB tutorial I have ever read even mentions this method.
When you follow the commonly taught practice of building DBObjects and passing them to the appropriate methods like collection.find and collection.update, it is the same as using parameterized queries in MySQL and thus protects you from most injection attempts.
Regarding configuration:
You also need, of course, to make sure that the database itself is configured properly so it does not allow unauthorized access. Note that the out-of-the-box configuration of MongoDB is usually not safe, because it allows unauthenticated access from anywhere. Either enable authentication, or make sure that your network firewalls only allow access to the MongoDB port from within your network. But this is a topic for dba.stackexchange.com.
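For reference, the relevant knobs in a YAML mongod.conf look roughly like this (values are examples; adjust for your own deployment):

# Minimal hardening sketch for mongod.conf
security:
  authorization: enabled      # require users to authenticate
net:
  bindIp: 127.0.0.1           # only listen on localhost / internal interfaces
  port: 27017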
I'm looking to store the profile details of the users in my app, like their date of birth, country, first name, last name, the date on which they joined, the location of their profile_pic, etc.
I am using Mongoose in an Express.js app. I will also be storing the username, the hashed password, and whether they have activated their account from the email link. I have a few questions I am confused about and am not really sure what to do.
Should I store all the users in one Mongoose model with an activated field to see whether the user has activated from the email link, or should I have two different collections and move a user from the unactivated collection to the activated one once they activate?
Second, I want to know whether I should store the profile details as fields in the same model, or create another model for them and use population. I am considering this because these details aren't anything I need to query on; they are basically just read and written, and I don't query by those fields. So I was thinking that having them in a different model might be slightly better, since I would only ever be querying by username or password. One other option I think is possible is putting them in subdocuments.
I will also be storing the user's preferences for my app, and I have the same question as above.
Please help me decide which option to choose. I have done a lot of reading but am still not sure. What is the standard thing to do here, and what would be better?
Thanks
Yes, store all users in one mongoose model, with an activated field. That way you simplify some likely database queries, such as getting a list of all your users (activated or not) or checking whether a username has been taken.
You should also store the details of the profile as fields in the same model. If you don't query using those parameters, don't index them. Putting them as subdocuments, in my opinion, is not that useful, and it only makes sense if you plan on having sets of profile details (more than one) per user.
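A sketch of that single-model layout in Mongoose (field names are examples drawn from the question):

const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
  username: { type: String, required: true, unique: true, index: true },
  passwordHash: { type: String, required: true },
  activated: { type: Boolean, default: false }, // flipped when the email link is clicked

  // Profile details live on the same document; they are read and written
  // together with the user but never queried on, so they get no extra indexes.
  profile: {
    firstName: String,
    lastName: String,
    dateOfBirth: Date,
    country: String,
    profilePicUrl: String,
  },

  preferences: mongoose.Schema.Types.Mixed,
}, { timestamps: true });

module.exports = mongoose.model('User', userSchema);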
Nowadays, when starting a web/mobile app project in which search is going to be an important feature, is it better to go with Lucene from the start, or to quickly deploy a MySQL-based solution and hope for the best?
I faced the same decision in November 2010. I'm a fan of MySQL and tried to build a search application on MySQL first, and it worked well...
...and fast (or so I thought at the time): searching 200,000 documents took no more than 2-3 seconds.
I avoided spending time on Lucene/Solr because I wanted to use that time for developing the application. And Lucene was new to me: I didn't know whether it was good enough, I didn't really know what it was...
In the end: you can't change the habits of a lifetime.
However, I ran into various problems with fuzzy search (which is difficult to implement in MySQL) and "more like this" (which has to be coded from scratch in an application using MySQL, whereas with Solr you simply use the "more like this" feature out of the box).
Eventually the number of documents rose to a million, and MySQL then needed more than 15 seconds to search through them.
So I decided to start with Lucene, and it felt like I had opened a door to a new world.
Lots of features (which I previously had to hand-code in the application) are now provided by Solr and work out of the box. The full-text searches are much, much faster: less than 50 ms across 1 million documents, and less than 1 ms if the result is cached.
So the invested time has paid off.
So if you are thinking about full-text search: go with Lucene if you have more than a small amount of data.
By the way, I'm using a hybrid setup: the data is kept in MySQL, and Lucene is only an index with (nearly) no stored data, to keep that index small and fast.
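A sketch of the query side of such a hybrid, assuming a Solr core that stores only ids and a MySQL table holding the real data (core, table and column names are made up; Node 18+ for fetch):

const mysql = require('mysql2/promise');

async function search(q) {
  // Ask Solr only for matching ids.
  const solrUrl = 'http://localhost:8983/solr/documents/select?wt=json&fl=id&rows=50' +
    '&q=' + encodeURIComponent(q);
  const { response } = await (await fetch(solrUrl)).json();
  const ids = response.docs.map((d) => d.id);
  if (ids.length === 0) return [];

  // Fetch the full rows from MySQL, where the data actually lives.
  const db = await mysql.createConnection({
    host: 'localhost', user: 'app', password: 'secret', database: 'app',
  });
  const [rows] = await db.query('SELECT * FROM documents WHERE id IN (?)', [ids]);
  return rows;
}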
Generally speaking, if you are going to have full-text search, you will almost surely need Lucene, or Sphinx + MySQL (or Lucene + MySQL, storing the indexable fields in Lucene and returning an id for a MySQL row). Either of them is an excellent choice.
If you are only going to do "normal" searches (i.e. on integer, char or date columns), MySQL partitioning will suffice.
You need to specify what you are going to search for, and how often you will be reindexing your DB (if you are going to reindex a lot, I'd go with Sphinx).
You are asking whether to go with Lucene or MySQL. But Lucene is a library, and MySQL is a server. You should really be deciding between the SOLR search engine and MySQL. In that case, the right answer is likely to be both. Manage all the data in MySQL. Run processes to regularly extract changed data, transform it into SOLR's search format, and load it into the search engine. Using SOLR is much more straightforward than using Lucene directly, and if you need to modify its behavior in some way, you can still write plugins for SOLR, so there is no loss of flexibility.
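A sketch of such a sync step, assuming a MySQL table with an updated_at column and a Solr core named "documents" (names are placeholders):

const mysql = require('mysql2/promise');

async function syncToSolr(since) {
  // Extract rows that changed since the last run.
  const db = await mysql.createConnection({
    host: 'localhost', user: 'app', password: 'secret', database: 'app',
  });
  const [rows] = await db.query(
    'SELECT id, title, body FROM documents WHERE updated_at > ?', [since]);

  // Solr's update handler accepts a JSON array of documents;
  // commit=true makes them visible to searches immediately.
  await fetch('http://localhost:8983/solr/documents/update?commit=true', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(rows),
  });
}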
But it would be the kiss of death to try to manage your data in SOLR. The read-edit-update cycle works great with SQL databases, but it is not what SOLR is about. SOLR is fast, flexible text search. You can stick image URLs into SOLR as non-indexed stored fields for convenience when preparing search results.