I'm working on a "real-time" website using Node.js. Currently I'm using Redis because I need high performance for read access. The write accesses are not really significant for my use case.
In addition, Redis does not have a query language for searching. So I create my indexes manually and use some unions/intersections/... to find values.
I think it would be easier to use MongoDB, with its built-in query system and an ORM-like library (Mongoose, for example). The problem is that I'm not sure MongoDB is the best choice for my use case.
What is your advice about the NoSQL DB that I need? Redis? CouchDB? MongoDB? Cassandra? etc.
To repeat: I want really good performance for read accesses and searches (the write accesses are not significant), with the simplest possible setup (an ORM-like library? a built-in query system? etc.).
Thanks.
I believe that Redis would be the better solution, for the following reasons.
You require fast read access, and Redis provides the fastest option since the keys, if not most of the data set, are held in memory.
Although MongoDB is easier to query in the general case, your problem domain is narrow, and once you decide how you would like to query the data you can put the correct data structures and indexes in place.
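To make that concrete, the usual way to hand-roll such indexes is one set per attribute value, intersected at query time. A rough sketch with the node-redis (v4) client; the item:*/idx:* key names are made up for illustration:

const { createClient } = require('redis');

async function main() {
  // Assumes node-redis v4 (promise-based, camelCase commands).
  const client = createClient();
  await client.connect();

  // Store each item as a hash and add its id to one index set per attribute.
  await client.hSet('item:42', { title: 'Blue large shirt', color: 'blue', size: 'L' });
  await client.sAdd('idx:color:blue', '42');
  await client.sAdd('idx:size:L', '42');

  // "Query" = intersect the per-attribute sets, then fetch the matching hashes.
  const ids = await client.sInter(['idx:color:blue', 'idx:size:L']);
  const items = await Promise.all(ids.map((id) => client.hGetAll(`item:${id}`)));
  console.log(items);

  await client.quit();
}

main().catch(console.error);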
I would say that Redis is a good fit for your DB, and you should look at something like Solr or Elasticsearch to provide your searching.
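In that setup the usual pattern is to keep Redis as the primary store and mirror the searchable fields into Elasticsearch. A rough sketch, assuming the @elastic/elasticsearch (v8) client; the 'articles' index and its fields are made up:

const { Client } = require('@elastic/elasticsearch');

// Elasticsearch handles search; Redis stays the source of truth for reads by id.
const es = new Client({ node: 'http://localhost:9200' });

// On write (infrequent in this use case), mirror the document into the search index.
async function indexArticle(id, article) {
  await es.index({ index: 'articles', id, document: article });
}

// On search, let Elasticsearch find the ids, then fetch the full records from Redis.
async function searchArticles(text) {
  const result = await es.search({
    index: 'articles',
    query: { match: { title: text } },
  });
  return result.hits.hits.map((hit) => hit._id);
}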
CouchDB will do better in a write-heavy environment. I don't use it myself, though.
MongoDB will do better in a read-heavy environment.
For search and indexing:
MongoDB would require a separate index for each of your search criteria for good performance (at least, that is what I remember).
Proper indexing is important in MongoDB. And there are no joins!
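For illustration, here is roughly what declaring those indexes looks like with Mongoose; the model and field names are made up:

const mongoose = require('mongoose');

// Made-up product catalogue schema, just to show where indexes are declared.
const productSchema = new mongoose.Schema({
  name: String,
  category: String,
  price: Number,
});

// One index per search criterion...
productSchema.index({ category: 1 });
// ...or a compound index when two fields are always queried together.
productSchema.index({ category: 1, price: -1 });

const Product = mongoose.model('Product', productSchema);
module.exports = Product;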
Here are some links you might go through:
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs-mongodb-benchmark/
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Hope these help you find the right DB.
Good luck!
I am trying to back up a database through SQLAlchemy and save it as a file. I tried using an extension, Flask-AlchemyDumps, but it appears to no longer be supported.
I must be missing something obvious, as this is surely something a lot of developers want to do. Does anyone know how I should be backing up the database?
Thanks in advance
J Kirkman
SQLAlchemy is an ORM which sits between your code and the database. It's useful if you want to interact with specific rows and relationships without having to keep track of lots of ids and joins.
What you're looking for is a way to dump the entire contents of your DB to disk, presumably so you can restore it later/elsewhere. This is a bulk action, which is your first clue that an ORM may not be a suitable tool. (ORMs tend to be fast enough for small to medium operations, but slow and not ideal for actions which affect tens of thousands of rows at once.) And indeed, this isn't usually something you'd use an ORM for; it's a feature of your DB, presumably Postgres or MySQL. If you happen to be using Heroku, you can use their command line tool to do this.
We are currently using MongoDB as the primary store for a big online sales site, and we are now focusing on scaling it across multiple machines.
The site backend is written in Node.js and we are using Mongoose as the ODM.
I can see many blog posts praising Cassandra, and I am starting to think about switching to it. But I am still not sure whether this is a really good decision, because I haven't found any good ODM/ORM library for Cassandra and Node.js (and writing raw queries can be a pain; writing a well-tested ORM/ODM ourselves would also be time-consuming). So I am not sure how much benefit I would get from the switch. We are using Elasticsearch as the search engine, and it works excellently in combination with MongoDB; I am wondering whether it will work as well with Cassandra.
If you have any experience with this, it would be very helpful.
Thank you!
Cassandra is a very nicely designed database which can handle a lot of scenarios. MongoDB is also a really good DB engine. So let me just compare a couple of the main points for you.
Always-on systems
Cassandra is really great when you need to provide 24x7 operations in multiple data centers. If you have more than one data center with multiple servers in each of them, then Cassandra is great for you. Cassandra can sync writes to more than one data center and maintain the desired data consistency across complex setups. Recovery and re-syncing are also quite easy.
On the other hand, MongoDB is easy to operate. If you have one data center and only a couple of servers, it might be a perfect fit (although the global write lock might become a pain over time). In simple deployments it's easy to maintain and monitor.
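To illustrate the multi-data-center point: the Node.js cassandra-driver lets you pin a client to its local data center and pick a per-query consistency level such as LOCAL_QUORUM, so a write is acknowledged by the local replicas while the other data centers catch up asynchronously. A rough sketch; the contact points, keyspace, and table are made up:

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['10.0.0.1', '10.0.0.2'],
  localDataCenter: 'dc1', // route requests to the local data center first
  keyspace: 'shop',
});

async function saveOrder(orderId, userId, total) {
  // LOCAL_QUORUM: a quorum of replicas in dc1 must ack; other DCs sync in the background.
  await client.execute(
    'INSERT INTO orders (order_id, user_id, total) VALUES (?, ?, ?)',
    [orderId, userId, total],
    { prepare: true, consistency: cassandra.types.consistencies.localQuorum }
  );
}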
Scalability
To continue the above statements: Cassandra is linearly scalable. There is literally no limit to how big the cluster can be. Your writes will always stay fast, while reads might become more complicated over time, depending on the structure of your data.
Denormalization of data
With Cassandra your writes and reads can be extremely fast if you create a structure that reflects what you need to get out of your data. There is no query language (well, there is, but it's not exactly SQL) that you can use to reorganize your result set with aggregates, groupings, etc. Some things are doable and some are not; that is very specific to the Cassandra data model. You will have to implement a lot of things yourself and write the results back to the DB, e.g. counters for aggregation, different groupings, etc.
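As an example of such pre-computed aggregates, a common approach is a dedicated counter table that is bumped on every write, so reads never have to aggregate on the fly. A rough sketch with the Node.js cassandra-driver, reusing a connected client like the one in the earlier sketch; the table and column names are made up:

// Assumes a table like:
//   CREATE TABLE daily_sales (day text PRIMARY KEY, orders counter);
async function recordSale(client, day) {
  await client.execute(
    'UPDATE daily_sales SET orders = orders + 1 WHERE day = ?',
    [day],
    { prepare: true }
  );
}

// Reads fetch the pre-computed number directly instead of aggregating at query time.
async function ordersForDay(client, day) {
  const result = await client.execute(
    'SELECT orders FROM daily_sales WHERE day = ?',
    [day],
    { prepare: true }
  );
  // Counter values come back as Long objects in the driver.
  return result.rows.length ? result.rows[0].orders.toNumber() : 0;
}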
In comparison, MongoDB is easy to use, easier to learn, and more flexible, both for development (in terms of the learning curve and effort) and for implementing business logic (in terms of time and effort). That is, kind of, the reason why there are ORM engines for MongoDB and only a couple of (very limited) ones for Cassandra.
To summarize: both DBs are really good... if you embrace their limitations. If you have only 100 GB of data and you need a flexible, easy-to-implement DB engine, I would stick with MongoDB; alternatively, take a look at RethinkDB, which has a very similar model and a much better (in my personal opinion) clustering/data-center replication implementation.
Cassandra is a great option if you will need to store terabytes of data soon and deploy your apps across multiple data centers, while accepting the cost of the additional effort needed to implement the same features and maintain similar capabilities.
Don't take it personally that I used the word "only" while describing your data set. Yes, it's not big: my company stores more than 20 TB these days... so yeah, 100 GB is really not that much.
To stop everyone from pointing out that I should compare some other features or other differences between these two: this is just a rough, high-level overview of the things I consider relevant to the problem, not a full comparison or analysis. But feel free to point out what I have missed and I will be happy to include new stuff in this answer.
As I was reading up on CouchDB I stumbled upon a question about transactions and CouchDB. Apparently the way to handle concurrent updates in CouchDB is to pull the latest revision of a document and compare it to the revision you are currently working with. This can present problems if data is changing quickly. The other way is to use map/reduce and separate the transactional data out into multiple documents. This also seems less than optimal.
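To spell out what I mean by the compare-and-retry approach, it looks roughly like this (a sketch using the nano CouchDB client; the document id and fields are made up):

const nano = require('nano')('http://localhost:5984');
const users = nano.db.use('users');

async function incrementPageViews(id) {
  // Keep retrying while someone else wins the update race (HTTP 409 conflict).
  for (;;) {
    const doc = await users.get(id);          // fetch the latest revision (_rev)
    doc.page_views = (doc.page_views || 0) + 1;
    try {
      await users.insert(doc);                // rejected with 409 if _rev is stale
      return;
    } catch (err) {
      if (err.statusCode !== 409) throw err;  // only retry on conflicts
    }
  }
}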
I was thinking about using redis for this sort of data. The increment and decrement functions seem fairly amazing for this sort of purpose.
So I could just write some sort of string for a transactional key like:
// some user document
{
  name: "guy",
  id: 10,
  page_views: "redis user:page_views:10"
}
Then if I read something like "redis" inside a piece of transactional data, I know to go get that information from Redis. I suppose I could decide these things beforehand, but since a document-oriented database's primary mission is to be flexible and not bind data to columns, I figured there might be an easier way.
Is there an easy way to link Redis data to CouchDB? Should I be doing this all manually, and only for the few fields that come up? Any other thoughts? Would it be better to update this transactional data "eventually" in the user document, or simply not store it there at all?
Both Redis and CouchDB are "easy" (that is, simple). So in that regard, what you are describing is easy. Of course, by using two databases, you have increased the complexity of your application. But on the other hand, the CouchDB+Redis combination is gaining popularity.
The only tool I know that integrates the two is Mikeal Rogers's redcouch. It is a simple tool. Perhaps you could extend it to add what you need (and send a pull request!).
A more broad consideration is that Redis does not have the full replication feature set that CouchDB does. So Redis might restrict your future options with CouchDB. Specifically, Redis does not support multi-master replication. In contrast with CouchDB, you will always have a centralized Redis database. (Correct me if I'm wrong—I am stronger with CouchDB than with Redis.)
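If you do wire them together by hand, the pattern from your question is simple enough in practice: keep the canonical document in CouchDB and let the hot counter live in Redis under a key derived from the document. A rough sketch assuming the node-redis (v4) and nano clients; the database and key names are made up:

const { createClient } = require('redis');
const nano = require('nano')('http://localhost:5984');

const redis = createClient();
const users = nano.db.use('users');

async function pageView(userId) {
  // Cheap atomic increment in Redis; no CouchDB revision dance needed.
  await redis.incr(`user:page_views:${userId}`);
}

async function getUser(userId) {
  const doc = await users.get(String(userId));
  // The CouchDB document only stores a pointer ("redis user:page_views:10"),
  // so resolve the live value from Redis when reading.
  const views = await redis.get(`user:page_views:${userId}`);
  return { ...doc, page_views: Number(views) || 0 };
}

async function main() {
  await redis.connect();
  await pageView(10);
  console.log(await getUser(10));
}

main().catch(console.error);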
I've been learning Node.js, so I decided to make a simple ad network, but I can't seem to decide on a database to use. I've been messing around with Redis, but I can't seem to find a way to query the database by specific criteria; instead I can only get the value of a key, or a list or set stored at a key.
Am I missing something, or should I be using a more robust database like MongoDB?
I would recommend reading this tutorial about Redis in order to understand its concepts and data types. I also had trouble understanding why there is no querying support similar to other (No)SQL databases until I read a few articles and tried to test and compare Redis with other solutions. Maybe it isn't the right database for your use case: although it is very fast and supports advanced data structures, it lacks the querying that is crucial for you. If you are looking for a database which allows you to query your data, then you should try MongoDB or maybe Riak.
Redis is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
If you are able to (i.e. it's easy to implement), you should use these primitives (strings, hashes, lists, sets and sorted sets). The main advantage of Redis is that it is lightning fast, but it is a rather primitive key-value store (Redis is a little more advanced than that). This also means that it cannot be queried like, for example, SQL.
It would probably be easier to use a more advanced store, for example MongoDB, which is a document-oriented database. The trade-off you make in this case is PERFORMANCE, but I believe you should only tackle that if it actually becomes a problem, which it probably will not, because MongoDB is also pretty fast and has the advantage that it can be queried. I think it is advisable to have proper indexes for your queries (read > write) to keep them fast, as sketched below.
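Here is roughly what "querying by specific criteria" looks like in MongoDB via Mongoose, with an index backing the read path; the model and field names are made up:

const mongoose = require('mongoose');

// Made-up schema for an ad network, just to show declarative querying.
const adSchema = new mongoose.Schema({
  advertiser: String,
  category: String,
  active: Boolean,
  bid: Number,
});
// Index the fields you filter and sort on so reads stay fast.
adSchema.index({ category: 1, active: 1, bid: -1 });

const Ad = mongoose.model('Ad', adSchema);

async function topAds(category) {
  // Declarative criteria instead of hand-maintained Redis sets.
  // (Connect first, e.g. await mongoose.connect('mongodb://localhost/adnetwork').)
  return Ad.find({ category, active: true })
    .sort({ bid: -1 })
    .limit(10)
    .lean(); // plain objects, skipping Mongoose document overhead
}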
I think the main answer comes down to how you structure your data. Check this article about NoSQL data modelling; for me it was very helpful: NoSql Data Modelling.
A second good article about data modeling, which also compares SQL and NoSQL, is the following: The Relational model anti pattern.
I am a complete newbie to this world of document DBs.
So... why are these DBs better than an RDBMS (like MySQL or PostgreSQL) for very large amounts of data?
A document database implements indexing that is designed to handle these kinds of records; that is what it is built for. A normal relational database is not designed for saving "documents": you have to work hard to search across your document data, because each document can have a different format, and that is a lot of work. If you choose a document DB, all of that comes built in, because the database exists precisely for documents and ships with the functions they need.
You want to distribute your data over multiple machines when you have a lot of data. That means that joins become really slow, because joining data that lives on different machines requires a lot of communication between them.
You can store data in a MongoDB/CouchDB document in a hierarchical way, so there is less need for joins.
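As a concrete illustration of hierarchical storage: instead of an orders table joined to an order_items table, a document database can keep the line items embedded in the order, so a single read returns everything. A sketch with made-up fields:

// an order with its line items embedded, so no join is needed to read it
{
  _id: "order:1001",
  customer: { id: 42, name: "Jane Doe" },
  placed_at: "2014-03-01T10:15:00Z",
  items: [
    { sku: "A-1", title: "Blue shirt", qty: 2, price: 19.99 },
    { sku: "B-7", title: "Socks", qty: 1, price: 4.50 }
  ],
  total: 44.48
}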
But it depends on your use case(s). I think relational databases do a better job when it comes to reporting.
MongoDB and CouchDB don't support transactions. Do you or your customers need transactions?
What do you want to do? Analyze a lot of data (business intelligence/reporting), or handle a lot of small modifications per second, i.e. HVSP ("High Volume Simple Processing")?